[Bioclusters] Call for comments in fasta files

Michael Cariaso bioclusters@bioinformatics.org
Tue, 03 Feb 2004 00:47:06 -0500

[This is off-topic, and I shouldn't be encouraging it]

Michael James wrote:
> I propose that the FASTA format be extended so that programs using it:
> 1) Strip and store as a comment anything on a line after a # sign.
> 2) Ignore lines with nothing [but whitespace] left after stripping.
> As far as blast is concerned
>  this would involve modifying formatdb
>  so it takes all such comments
>  and includes them in the existing  ".nal" file.
> No change needed to the main blastall binary
>  as this file already contains # comments.
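For concreteness, here is a rough sketch of what a reader following the two proposed rules would look like. It is a minimal illustration, not formatdb's actual behaviour; the function name parse_fasta_with_comments is hypothetical. Note that, read literally, rule 1 also strips anything after a # on a header line, which is exactly the kind of side effect existing parsers would trip over.

```python
def parse_fasta_with_comments(lines):
    """Hypothetical reader for the proposed '#' comment rules.

    Returns (records, comments): records is a list of (header, sequence)
    tuples, comments collects every stripped '#' comment in file order.
    Sequence lines appearing before the first header are ignored.
    """
    records = []
    comments = []
    header = None
    seq_parts = []
    for raw in lines:
        # Rule 1: everything after the first '#' is a comment.
        line, sep, comment = raw.partition("#")
        if sep:
            comments.append(comment.strip())
        line = line.strip()
        # Rule 2: skip lines with nothing but whitespace left.
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header = line[1:]
            seq_parts = []
        else:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records, comments
```

Usage: `parse_fasta_with_comments(open("db.fa"))` gives back the usual records plus the comment list that formatdb would then copy into the ".nal" file.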

There is a lot more code out there than just blast which would require 
adjustment. We've all got a few favorite annotations we'd like to see in 
every header. Personally, I need species info. Why not pick a less 
destructive route?

Any code which writes a fasta file (called X) should feel free to put 
extra information into X.annot in the same directory. And any program 
that loads a fasta file (named Y) should feel free to check for Y.annot 
in the same directory. May as well make it a valid XML file format while 
you're at it.
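A minimal sketch of the sidecar idea, assuming a made-up XML layout (an `annotations` root, one `sequence` element per id, one `tag` element per key/value). None of these element names are a standard; they and the helper names are hypothetical. The point is that the fasta file itself stays untouched, and a reader that doesn't know about X.annot loses nothing.

```python
import os
import xml.etree.ElementTree as ET


def annot_path(fasta_path):
    # Sidecar convention: X -> X.annot in the same directory.
    return fasta_path + ".annot"


def write_annotations(fasta_path, annotations):
    """Write {seq_id: {tag: value}} to the hypothetical X.annot XML file."""
    root = ET.Element("annotations", {"fasta": os.path.basename(fasta_path)})
    for seq_id, tags in annotations.items():
        seq_el = ET.SubElement(root, "sequence", {"id": seq_id})
        for tag, value in tags.items():
            ET.SubElement(seq_el, "tag", {"name": tag}).text = value
    ET.ElementTree(root).write(
        annot_path(fasta_path), encoding="utf-8", xml_declaration=True
    )


def read_annotations(fasta_path):
    """Return the sidecar annotations, or {} if no Y.annot file exists."""
    path = annot_path(fasta_path)
    if not os.path.exists(path):
        return {}
    root = ET.parse(path).getroot()
    result = {}
    for seq_el in root.findall("sequence"):
        result[seq_el.get("id")] = {
            t.get("name"): (t.text or "") for t in seq_el.findall("tag")
        }
    return result
```

Programs that care call read_annotations() after loading the fasta; everything else keeps working unchanged.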

I seem to recall parsing a lot of fairly regularly annotated fasta files 
that were annotated with /tag='value' in the comment line. This was also 
a nice solution, albeit some sort of %27 (URL-style percent) escaping 
for the "'" character would have helped.
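Roughly what that /tag='value' scheme looks like with percent-escaping added, as a sketch only. The escaping rule here is an assumption: "%" becomes %25 and the apostrophe becomes %27 (hex for character 39), and values are decoded with Python's urllib.parse.unquote. The helper names are hypothetical.

```python
import re
from urllib.parse import unquote

# Matches /tag='value'; escaped values never contain a raw apostrophe.
TAG_RE = re.compile(r"/(\w+)='([^']*)'")


def format_header(seq_id, tags):
    """Build a '>id /tag='value' ...' header line (hypothetical helper)."""
    parts = [seq_id]
    for tag, value in tags.items():
        # Escape '%' first so unquote() round-trips, then the apostrophe.
        escaped = value.replace("%", "%25").replace("'", "%27")
        parts.append(f"/{tag}='{escaped}'")
    return ">" + " ".join(parts)


def parse_header(header):
    """Return (seq_id, {tag: value}) from a header line."""
    body = header[1:] if header.startswith(">") else header
    seq_id = body.split(None, 1)[0] if body.strip() else ""
    tags = {m.group(1): unquote(m.group(2)) for m in TAG_RE.finditer(body)}
    return seq_id, tags
```

For example, a note of `5' UTR 100%` is written as `/note='5%27 UTR 100%25'` and parses back to the original string.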

And feel free to do whatever you like in-house. It's a dirty little 
secret that we've all kludged a few quick systems that simply would not 

> Your comments?

you asked ;)