[BiO BB] Poly A tail length - script help please

Cook, Malcolm MEC at Stowers-Institute.org
Wed Sep 10 16:12:16 EDT 2003


But that only measures the run of A's and N's at the very end of the
sequence; it does not compute the 'longest stretch'.

The attached Perl script does, and will let you run:

> polyfind [-all] *.seq > polyfind.results
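
The attachment itself is scrubbed from this archive, so the following is
only a minimal sketch of what "longest stretch" could mean in practice. It
is not the attached script. The helper name longest_an_run, the
command-line handling, and the assumption of one sequence per file
(possibly wrapped over several lines) are all illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the length of the longest unbroken run of A or N in a sequence.
# Hypothetical helper for illustration, not taken from the attached script.
sub longest_an_run {
    my ($sequence) = @_;
    my $longest = 0;
    # Visit every maximal run of A/N and remember the longest one seen.
    while ($sequence =~ /([AN]+)/g) {
        my $len = length $1;
        $longest = $len if $len > $longest;
    }
    return $longest;
}

# Print "<length> <file>" for each file named on the command line.
# Assumption: each file holds a single sequence in upper case.
foreach my $file (@ARGV) {
    my $fh;
    unless (open($fh, '<', $file)) {
        warn "Warning: unable to open $file: $!. Skipping.\n";
        next;
    }
    my $sequence = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;                    # drop newlines and whitespace
    printf "%d %s\n", longest_an_run($sequence), $file;
}
```

Saved under a hypothetical name such as longest_an.pl, this could be run as
"perl longest_an.pl *.seq > results.txt". Unlike the reverse-and-match
approach quoted below, it also counts a run that sits in the middle of the
sequence, which may or may not be what is wanted for a true poly-A tail.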

Enjoy,

Malcolm Cook

> -----Original Message-----
> From: Joseph Landman [mailto:landman at scalableinformatics.com]
> Sent: Tuesday, September 09, 2003 6:58 PM
> To: BiO BB
> Cc: biodevelopers
> Subject: Re: [BiO BB] Poly A tail length - script help please
> 
> 
> First one is free ... 
> 
>         #!/usr/bin/perl
>
>         use strict;
>         use warnings;
>
>         my ($directory, $directory_handle, $file, @files, $sequence);
>         my ($file_handle, $poly_a_tail, $rseq);
>
>         $directory = "./";    # directory to open
>         if (!(opendir $directory_handle, $directory))
>            {
>              die "FATAL ERROR: Unable to open directory = $directory\n";
>            }
>
>         # select only the .seq files
>         @files = grep { /\.seq$/ } readdir($directory_handle);
>         closedir($directory_handle);
>
>         # loop over these selected files
>         foreach $file (@files)
>           {
>             # try to open the file
>             if (!(open($file_handle, "<", $directory.$file)))
>                {
>                  # if we cannot open it, warn the user and skip to the next file
>                  warn "Warning: unable to open file = $file. Skipping.\n";
>                  next;
>                }
>               else
>                {
>                  # assume one line per file, or we will have to modify this
>                  $sequence = <$file_handle>;
>                  $sequence = "" unless defined $sequence;    # empty file
>                  chomp($sequence);
>                  # now time to bring out the heavy artillery
>                  $rseq = reverse $sequence;    # poly-a is now at the head
>                  # match A's and/or N's at the front (empty string if none)
>                  ($poly_a_tail) = $rseq =~ /^([AN]*)/;
>                  printf "%i %s\n", length($poly_a_tail), $file;    # tell the world ...
>                  close($file_handle);
>                }
>           }
> 
> 
> 
> On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> > Thanks for the scripting tips!  I have a 'counting' issue which I need to
> > quickly resolve.  A typical sequence input file (5 - 700 bases) looks like:
> > 
> > AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
> > 
> > I have over 500 files, named *.seq.  I would like to create a script which:
> > 
> > a.  runs through all the files,
> > b.  counts the length of the 'poly A' tail (defined as the longest stretch
> > of A or N)
> > c. sends the output to a file, eg.
> > 
> > 25 1.seq
> > 87 2.seq
> > 13 3.seq
> > 
> > Example valid poly A tails :
> > 
> > AAAANANANANAAANNAAAAAA
> > 
> > AAAAAAAAAAAAAA
> > 
> > NNNNNNNNNNNNN
> > 
> > AAANNNNNNNNNNNAAAAAAAAA
> > 
> > Thank you so much for your expertise!
> > 
> > Tristan
> -- 
> Joseph Landman, Ph.D
> Scalable Informatics LLC
> email: landman at scalableinformatics.com
>   web: http://scalableinformatics.com
> phone: +1 734 612 4615
> 
> 
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: polyafind
Type: application/octet-stream
Size: 3438 bytes
Desc: polyafind
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030910/7473c37a/attachment.obj>

