[BiO BB] how to work on two txt files simultaneously by handle corresponding lines from each file

Wed Jul 20 11:19:29 EDT 2005

Hello Alex,

I think that what you want is to modify long1 with short1, long2 with 
short2 and so on.

I recommend replacing your two loops with this one.

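for (my $seq = 0; $seq < scalar @long; $seq++) {
    my $short = $short[$seq];
    my $long  = $long[$seq];
    # Pick a 0-based start so that the short sequence always fits.
    # length($long) % 193 is only 7 for a 200-nt sequence; the range
    # below is 0..192, i.e. starting sites 1..193.
    my $offset = int(rand(length($long) - length($short) + 1));
    substr($long, $offset, length($short), $short);
    printf "%3d", $offset + 1;    # report the 1-based starting site
    print "\n", $long, "\n";
}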
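
If the files ever get too big to keep in arrays, you could also read them
in parallel, one line from each per iteration. Here is a minimal sketch of
that idea, not run against your data. The sub names implant and
process_pairs are my own invention, and it assumes one sequence per line
in both files.

```perl
use strict;
use warnings;

# Overwrite a random window of $long with $short.
# Returns the modified copy and the 1-based starting site.
sub implant {
    my ($long, $short) = @_;
    my $max_start = length($long) - length($short);  # last 0-based start that fits
    die "short sequence is longer than long sequence\n" if $max_start < 0;
    my $offset = int(rand($max_start + 1));          # 0 .. $max_start
    substr($long, $offset, length($short), $short);
    return ($long, $offset + 1);
}

# Read the two files in lock-step and pair line N with line N.
# Stops as soon as either file runs out of lines.
sub process_pairs {
    my ($short_file, $long_file) = @_;
    open(my $short_fh, '<', $short_file) or die "Can't open $short_file: $!\n";
    open(my $long_fh,  '<', $long_file)  or die "Can't open $long_file: $!\n";
    my @results;
    while (1) {
        my $short = <$short_fh>;
        my $long  = <$long_fh>;
        last unless defined $short and defined $long;
        chomp($short, $long);
        push @results, [ implant($long, $short) ];
    }
    close $short_fh;
    close $long_fh;
    return @results;
}

# Run only when called as a script with both file names given.
if (!caller() && @ARGV == 2) {
    for my $pair (process_pairs(@ARGV)) {
        my ($seq, $site) = @$pair;
        printf "%3d\n%s\n", $site, $seq;
    }
}
```

Saved under a name of your choice (say implant.pl), it would be called as
"perl implant.pl short_sequences.txt long_sequences.txt". Each starting
site is returned together with its sequence, so you can keep track of it
however you like.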

Good Luck!
Txema

Alex Zhang wrote:

>Dear All,
>
>Sorry to bother you again.
>
>I have two txt files to handle. One is
>"short_sequences" and the other
>one is "long_sequences". "short_sequences" holds
>100 short sequences (8 nucleotides long) and
>"long_sequences" holds 100 long sequences
>(200 nucleotides long).
>
>For example, the first short sequence is "TTGACATA"
>and the first long sequence is
>"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
>GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
>CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
>GAACCTTGGACTAACCACTGTCTGGATA".
>
>Basically, I want to generate a random position as a
>starting site to replace a substring
>in the long sequence with a short sequence. In this
>example, we can choose a starting site
>as 5th nucleotide in the long sequence, after
>replacing using "TTGACATA", the replaced
>long sequence is
>"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
>GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
>CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
>GAACCTTGGACTAACCACTGTCTGGATA".
>
>Then I want to replace the 2nd long sequence with the 2nd
>short sequence and then repeat this over and over
>again until the last long sequence is reached and
>replaced. I think the only problem is that the
>starting site should not be larger than 193.
>Otherwise, there are
>not enough nucleotides in the long sequence for
>replacement.
>
>Furthermore, I want to keep track of the starting
>replacement site for each long sequence.
>
>
>I am copying my code below.
>******************************************
>use strict;
>use warnings;
>
>my (@short, @long, $offset); # the 'short' array will hold the short
>                             # sequences while 'long' array the long sequences
>
>open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
>while(<FILE1>){
>   chomp;
>   push(@short, $_);
>}
>close FILE1; #Close the file
>
>open(FILE2, '<', "long_sequences.txt")  || die "Can't open long_sequences.txt: $!\n";
>while(<FILE2>){
>   chomp;
>   push(@long, $_);
>}
>close FILE2; #Close the file
>
>
># replacement
>foreach my $short(@short){
>   foreach my $long(@long){
>       $offset = int(rand(length($long)%193));
>       substr($long,$offset,length($short),$short);
>       printf "%3d", $offset+1;
>       print "\n", $long, "\n";
>
>   }
>}
>********************************************
>
>But I just realized that there is a problem with the
>two loops: each short sequence is used to modify
>every long sequence, not just the corresponding one.
>
>So I seek your suggestions on how to handle two files
>simultaneously for my case. 
>
>Thank you very much and look forward to your reply!
>
>Best Regards,
>    Alex
>