[BiO BB] how to work on two txt files simultaneously by handle corresponding lines from each file
Jose Maria Gonzalez Izarzugaza
biopctgi at yahoo.es
Wed Jul 20 11:19:29 EDT 2005
Hello Alex,
I think that what you want is to modify long1 with short1, long2 with
short2 and so on.
I recommend replacing your two loops with this one.
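# Walk both arrays with one index so that $short[$seq] is paired with $long[$seq].
for my $seq (0 .. $#long) {
    my $short = $short[$seq];
    my $long  = $long[$seq];
    # Pick a 0-based start where the short sequence still fits (0..192 for 200 and 8).
    # Note: length($long) % 193 is 7 for a 200-mer, so it only gave offsets 0..6.
    $offset = int(rand(length($long) - length($short) + 1));
    substr($long, $offset, length($short), $short);
    printf "%3d", $offset + 1;    # report the 1-based start site
    print "\n", $long, "\n";
}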
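In case it helps, here is a minimal self-contained sketch of the same pairing idea that you can run without the two files. The helper names replace_at_random and replace_pairwise are made up for illustration, and the toy sequences only stand in for your real data, so treat it as one possible layout rather than the definitive way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): overwrite part of $long
# with $short at a random position where $short still fits.
# Returns the 1-based start site and the modified sequence.
sub replace_at_random {
    my ($long, $short) = @_;
    my $max_start = length($long) - length($short);   # largest valid 0-based offset
    die "short sequence is longer than long sequence\n" if $max_start < 0;
    my $offset = int(rand($max_start + 1));           # integer in 0 .. $max_start
    substr($long, $offset, length($short), $short);   # 4-argument substr replaces in place
    return ($offset + 1, $long);
}

# Hypothetical helper: pair the i-th short sequence with the i-th long one.
sub replace_pairwise {
    my ($shorts, $longs) = @_;
    die "the two lists differ in length\n" unless @$shorts == @$longs;
    return map { [ replace_at_random($longs->[$_], $shorts->[$_]) ] } 0 .. $#$longs;
}

# Toy data standing in for the contents of the two files.
my @short = ('TTGACATA', 'CCCCCCCC');
my @long  = ('A' x 200, 'G' x 200);

for my $result (replace_pairwise(\@short, \@long)) {
    my ($site, $seq) = @$result;
    printf "%3d\n%s\n", $site, $seq;
}
```

With the real data, @short and @long would be filled by the two reading loops you already have. The length check is there because pairing by index only makes sense when both files have the same number of lines.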
Good Luck!
Txema
Alex Zhang wrote:
>Dear All,
>
>Sorry to bother you again.
>
>I have two txt files to handle. One is
>"short_sequences" and the other
>one is "long_sequences". "short_sequences" holds
>100 short sequences (8 nucleotides long) and
>"long_sequences" holds 100 long sequences
>(200 nucleotides long).
>
>For example, the first short sequence is "TTGACATA"
>and the first long sequence is
>"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
>GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
>CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
>GAACCTTGGACTAACCACTGTCTGGATA".
>
>Basically, I want to generate a random position as a
>starting site to replace a substring
>in the long sequence with a short sequence. In this
>example, we can choose a starting site
>as 5th nucleotide in the long sequence, after
>replacing using "TTGACATA", the replaced
>long sequence is
>"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
>GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
>CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
>GAACCTTGGACTAACCACTGTCTGGATA".
>
>Then I want to replace the 2nd long sequence with the 2nd
>short sequence and then repeat this over and over
>again until the last long sequence is reached and
>replaced. I think the only problem is that the
>starting site should not be larger than 193.
>Otherwise, there are
>not enough nucleotides in the long sequence for
>replacement.
>
>Furthermore, I want to keep track of the starting
>replacement site for each long sequence.
>
>
>I am copying my code below.
>******************************************
>use strict;
>use warnings;
>
>my (@short, @long, $offset); # the 'short' array will hold the short sequences
>                             # while the 'long' array holds the long sequences
>
>open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
>while(<FILE1>){
> chomp;
> push(@short, $_);
>}
>close FILE1; #Close the file
>
>open(FILE2, '<', "long_sequences.txt") || die "Can't open long_sequences.txt: $!\n";
>while(<FILE2>){
> chomp;
> push(@long, $_);
>}
>close FILE2; #Close the file
>
>
># replacement
>foreach my $short(@short){
> foreach my $long(@long){
> $offset = int(rand(length($long)%193));
> substr($long,$offset,length($short),$short);
> printf "%3d", $offset+1;
> print "\n", $long, "\n";
>
> }
>}
>********************************************
>
>But I just realized that there is a problem with the
>two loops: each short sequence will be used to
>replace all long sequences, not just the
>corresponding one.
>
>So I seek your suggestions on how to handle two files
>simultaneously for my case.
>
>Thank you very much and look forward to your reply!
>
>Best Regards,
> Alex
>