[BiO BB] Poly A tail length - script help please
Cook, Malcolm
MEC at Stowers-Institute.org
Wed Sep 10 16:12:16 EDT 2003
But that does not compute the 'longest stretch'.
The attached perl script does, and will allow you to write:
> polyfind [-all] *.seq > polyfind.results
Enjoy,
Malcolm Cook
> -----Original Message-----
> From: Joseph Landman [mailto:landman at scalableinformatics.com]
> Sent: Tuesday, September 09, 2003 6:58 PM
> To: BiO BB
> Cc: biodevelopers
> Subject: Re: [BiO BB] Poly A tail length - script help please
>
>
> First one is free ...
>
> #!/usr/bin/perl
>
> use strict;
>
> my ($directory,$directory_handle,$file,@files,$sequence);
> my ($file_handle,$poly_a_tail,$rseq);
>
> $directory = "./"; # directory to open
> if (!(opendir $directory_handle,$directory))
> {
> die "FATAL ERROR: Unable to open directory = ".$directory."\n";
> }
>
> # select only the .seq files
> @files = grep { /\.seq$/ } readdir($directory_handle);
>
> # loop over these selected files
> foreach $file (@files)
> {
> # try to open the file
> if (!(open($file_handle,"< ".$file)))
> {
> # if we cannot open it, warn the user, and skip to the next file
> warn "Warning: unable to open file = ".$file."\. Skipping\.\n";
> next;
> }
> else
> {
> # assume one line per file, or we will have to modify this
> chomp($sequence=<$file_handle>);
> # now time to bring out the heavy artillery
> $rseq=reverse $sequence; # poly-a is now at the head
> # match A's and/or N's at the front; use an empty string if there are none
> $poly_a_tail = ($rseq =~ /^([AN]+)/) ? $1 : "";
> printf "%i %s\n",length($poly_a_tail),$file; # tell the world ...
> close($file_handle);
> }
> }
>
>
>
> On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> > Thanks for the scripting tips! I have a 'counting' issue which I need to
> > quickly resolve. A typical sequence input file (5 - 700 bases) looks like:
> >
> > AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
> >
> > I have over 500 files, named *.seq. I would like to create a script which:
> >
> > a. runs through all the files,
> > b. counts the length of the 'poly A' tail (defined as the longest
> > stretch of A or N)
> > c. sends the output to a file, eg.
> >
> > 25 1.seq
> > 87 2.seq
> > 13 3.seq
> >
> > Example valid poly A tails :
> >
> > AAAANANANANAAANNAAAAAA
> >
> > AAAAAAAAAAAAAA
> >
> > NNNNNNNNNNNNN
> >
> > AAANNNNNNNNNNNAAAAAAAAA
> >
> > Thank you so much for your expertise!
> >
> > Tristan
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC
> email: landman at scalableinformatics.com
> web: http://scalableinformatics.com
> phone: +1 734 612 4615
>
>
> _______________________________________________
> BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: polyafind
Type: application/octet-stream
Size: 3438 bytes
Desc: polyafind
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030910/7473c37a/attachment.obj>
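The attachment itself is not preserved in this archive. As an illustration of the distinction drawn in the reply (the quoted script measures only the run of A/N at the end of the sequence, while the request was for the longest stretch anywhere), here is a minimal Perl sketch. It is not the attached script: the sub names longest_an_stretch and trailing_an_tail are hypothetical, the files are assumed to hold plain sequence text, and the -all option mentioned above is left out because its meaning is not described in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of the longest run of consecutive A or N characters anywhere
# in the sequence (hypothetical helper, not taken from the attachment).
sub longest_an_stretch {
    my ($sequence) = @_;
    my $longest = 0;
    # Each /g match is one maximal run of A/N; remember the longest one.
    while ($sequence =~ /([AN]+)/g) {
        my $len = length($1);
        $longest = $len if $len > $longest;
    }
    return $longest;
}

# Length of the A/N run anchored at the end of the sequence, which is
# what the quoted script reports (0 if the sequence does not end in A/N).
sub trailing_an_tail {
    my ($sequence) = @_;
    return ($sequence =~ /([AN]+)$/) ? length($1) : 0;
}

# Print "<length> <file>" for every file named on the command line.
foreach my $file (@ARGV) {
    my $fh;
    unless (open($fh, '<', $file)) {
        warn "Warning: unable to open file = $file. Skipping.\n";
        next;
    }
    # Slurp the whole file and drop whitespace, so a sequence wrapped
    # over several lines is treated as one string.
    my $sequence = do { local $/; <$fh> };
    close($fh);
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;
    printf "%i %s\n", longest_an_stretch(uc $sequence), $file;
}
```

Saved under a name of your choosing (longest_an.pl here is only an example), it could be run as: perl longest_an.pl *.seq > polyfind.results. Swapping longest_an_stretch for trailing_an_tail in the printf line gives the end-anchored tail length instead.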