[BiO BB] FW/Re: Fasta convertion in large EST assemblies

Bossers, A. A.Bossers at id.dlo.nl
Wed Apr 2 01:31:52 EST 2003

Dear all,
thanks for the quick replies and help with the FASTA conversion problem. I
had already started fiddling around in Perl to convert the FASTA files into
files acceptable to tgicl for EST assembly, but Eitan provided the simplest
solution: a one-line Perl 'script' that did exactly what I needed. BIG THANKS.
The script strips everything after the sequence name on each header line
(as long as the name itself contains no spaces) and leaves all sequence or
quality data on the lines that follow untouched. His solution is below.
I still don't understand why tgicl doesn't accept files in valid FASTA
format, but it no longer bothers me. My EST assembly is one step further.
Thanks again to everyone who sent me Perl solutions!
-----Original Message-----
From: Eitan Rubin [mailto:Eitan.Rubin at weizmann.ac.il]
Sent: Tuesday, April 1, 2003 20:28
To: A.Bossers at ID.DLO.NL
Subject: Fasta convertion

  If I am not mistaken, your question is "how do I convert format A below to
format B". If this is indeed what you need, the following should do the trick:
perl -pe 's/^>(\S+).*/>$1/;' old_format_file > new_format_file
Format A:
>seqname1 some text with spaces
>seqname2 some other text etc
Format B:
>seqname1
>seqname2
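
For anyone who would rather do this in Python, here is a minimal sketch of what should be an equivalent substitution. It is not part of the original reply; the function name strip_fasta_descriptions and the sample sequence lines are made up for illustration only.

```python
import re

def strip_fasta_descriptions(lines):
    """Yield lines with each FASTA header cut down to '>' plus its first token."""
    for line in lines:
        # Same pattern as the Perl one-liner: keep the first run of
        # non-whitespace characters after '>' and drop the rest of the header.
        # Lines that do not start with '>' pass through unchanged.
        yield re.sub(r"^>(\S+).*", r">\1", line)

# Illustrative input only: headers from "Format A" plus invented sequence lines.
sample = [
    ">seqname1 some text with spaces\n",
    "ACGTACGT\n",
    ">seqname2 some other text etc\n",
    "TTGACA\n",
]

converted = list(strip_fasta_descriptions(sample))
print("".join(converted), end="")
```

To process a real file, the same generator can be fed an open file handle instead of the sample list, writing each yielded line to the output file.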
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456
Fax: +972-8-9346006