[Bioclusters] Call for comments in fasta files
Michael Cariaso
bioclusters@bioinformatics.org
Tue, 03 Feb 2004 00:47:06 -0500
[This is offtopic, and I shouldn't be encouraging it]
Michael James wrote:
> I propose that the FASTA format be extended so that programs using it:
> 1) Strip and store as a comment anything on a line after a # sign.
> 2) Ignore lines with nothing [but whitespace] left after stripping.
>
> As far as blast is concerned
> this would involve modifying formatdb
> so it takes all such comments
> and includes them in the existing ".nal" file.
> No change needed to the main blastall binary
> as this file already contains # comments.
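For concreteness, the two rules quoted above amount to roughly the
following. This is only a sketch in Python; the function name and the
return shape are my own assumptions, not anything formatdb or blastall
does today. Note that a "#" inside an existing header line gets cut off
too, which is part of why the change is not harmless.

```python
def read_fasta_with_comments(lines):
    """Parse FASTA lines, applying the two proposed rules.

    Returns (records, comments): records is a list of (header, sequence)
    tuples, comments is a list of the stripped "#" comments in file order.
    """
    records = []   # each entry is [header, sequence] while being built
    comments = []
    for raw in lines:
        # Rule 1: anything after a "#" is stripped and stored as a comment.
        text, sep, comment = raw.rstrip("\n").partition("#")
        if sep:
            comments.append(comment.strip())
        text = text.strip()
        # Rule 2: ignore lines with nothing but whitespace left.
        if not text:
            continue
        if text.startswith(">"):
            records.append([text[1:], ""])
        elif records:
            # Sequence data before any header is silently skipped.
            records[-1][1] += text
    return [tuple(r) for r in records], comments
```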
There is a lot more code out there than just blast which would require
adjustment. We've all got a few favorite annotations we'd like to see in
every header. Personally I need species info. Why not pick a less
destructive route?
Any code which writes a fasta file (called X) should feel free to put
extra information into X.annot in the same directory. And any program
that loads a fasta (named Y) should feel free to check for Y.annot in
the same directory. May as well make it a valid XML file format while
you're at it.
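A minimal sketch of that sidecar idea in Python follows. The XML layout
(an "annotations" root holding "sequence" and "tag" elements) and the
function names are assumptions made up for illustration; no existing
tool reads or writes this format.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def annot_path(fasta_path):
    # "X.annot" lives next to "X": append the suffix to the full file name.
    p = Path(fasta_path)
    return p.with_name(p.name + ".annot")

def write_annotations(fasta_path, annotations):
    """Write {sequence_id: {tag: value}} to the sidecar file as XML."""
    root = ET.Element("annotations")          # hypothetical element names
    for seq_id, tags in annotations.items():
        seq = ET.SubElement(root, "sequence", id=seq_id)
        for name, value in tags.items():
            ET.SubElement(seq, "tag", name=name).text = value
    ET.ElementTree(root).write(
        str(annot_path(fasta_path)), encoding="utf-8", xml_declaration=True
    )

def load_annotations(fasta_path):
    """Return the sidecar annotations, or {} when no sidecar exists."""
    path = annot_path(fasta_path)
    if not path.exists():
        return {}
    root = ET.parse(str(path)).getroot()
    return {
        seq.get("id"): {t.get("name"): t.text or "" for t in seq.findall("tag")}
        for seq in root.findall("sequence")
    }
```

A program that knows nothing about .annot files keeps working on the
untouched fasta, which is the whole point.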
I seem to recall parsing a lot of fasta files that were fairly
regularly annotated with /tag='value' in the comment line. This was also
a nice solution, albeit some sort of escaping for the "'" character
(URL-style %27, say) would have helped.
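Something like the following Python sketch would handle that style.
The exact tag grammar is my guess from memory (a word-character tag
name, single-quoted value), and the helper names are invented here.
Standard percent-encoding writes the apostrophe as %27, the
hexadecimal form of ASCII 39.

```python
import re
from urllib.parse import quote, unquote

# Assumed grammar: /name='value', with no raw apostrophe inside the value.
TAG_RE = re.compile(r"/(\w+)='([^']*)'")

def format_tags(tags):
    """Render {name: value} as /name='value' pairs, percent-escaping
    everything unsafe (including "'", "/" and "%") but keeping spaces."""
    return " ".join(
        f"/{name}='{quote(value, safe=' ')}'" for name, value in tags.items()
    )

def parse_tags(header):
    """Extract and unescape every /name='value' pair from a header line."""
    return {name: unquote(value) for name, value in TAG_RE.findall(header)}
```

For example, format_tags({"species": "Darwin's finch"}) gives
/species='Darwin%27s finch', and parse_tags() on a header containing
that text returns the original value.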
And feel free to do whatever you like in-house. It's a dirty little
secret that we've all kludged a few quick systems that simply would not
die.
> Your comments?
you asked ;)