[Biococoa-dev] Annotation

John Timmer jtimmer at bellatlantic.net
Mon Feb 21 11:01:45 EST 2005


> 
> Second, while searching the web a bit, I came along the BSML XML format which
> seems to become a kind of standard for new sequence formats. It would perhaps
> be nice (and wise) to have a look at the documents they made because they
> (obviously) studied the annotation/feature issue very well.
> You can find more info at: http://www.bsml.org/
> Now, just to make sure, bsml is a file format and one we could implement of
> course, internally the dictionary approach is for us the way to go, but it
> might be an idea to adhere to their nomenclature and/or tree/hierarchy. I came
> already across some nice ideas to keep in mind:
> 
>>  As research proceeds on a given biological molecule, certain segments of the
>> sequence become interesting for a variety of reasons. Sequence annotation is
>> used to capture this extra information about the sequence data. Positional
>> annotation refers to annotations that are specific to a portion of a
>> sequence. In BSML, positional annotation is captured through Feature tags.
>> Feature tags are child tags of a sequence tag, and therefore a Feature is
>> related to a single sequence. For example, the following tag indicates that
>> the region between 1513 and 1962 encodes a particular gene:
>>  
>>  <Feature id="FTR4" title="Leucine TNRA" class="GENE">
>>  <Qualifier value-type="gene"/>
>>  <Interval-loc startpos="1513" endpos="1962"
>>  complement="0"/>
>>  </Feature> 
>  
> So a feature is defined as a "positional annotation" which is a nice
> definition that I had in mind as well. Of course features give the extra
> problem that they have to be kept in sync during editing. Therefore it's
> perhaps better to internally have a dictionary of annotations and a dictionary
> of features.   
 

This is nice, and we should try for compatibility, but it is a bit difficult
to make work as a dictionary.  The nice part is that it has a unique ID, name,
and class.  The bad part is that they're all part of the same compound field,
so they don't work nicely as the dictionary key.
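
To make the compound-field problem concrete, here is a minimal sketch of pulling the id out of a Feature tag to use as the key, with the rest kept as the value. It is in Python purely for illustration and is not BioCocoa code; the function name feature_to_entry and the choice of the id (rather than the title or class) as the key are assumptions.

```python
import xml.etree.ElementTree as ET

# The BSML example quoted above, with the attribute quoting made well-formed.
BSML_FEATURE = """
<Feature id="FTR4" title="Leucine TNRA" class="GENE">
  <Qualifier value-type="gene"/>
  <Interval-loc startpos="1513" endpos="1962" complement="0"/>
</Feature>
"""

def feature_to_entry(xml_text):
    """Split one Feature tag into a (key, value) pair for a dictionary.

    Hypothetical helper: keying on the id attribute is an assumption.
    """
    elem = ET.fromstring(xml_text)
    loc = elem.find("Interval-loc")
    key = elem.get("id")
    value = {
        "title": elem.get("title"),
        "class": elem.get("class"),
        "start": int(loc.get("startpos")),
        "end": int(loc.get("endpos")),
        "complement": loc.get("complement") == "1",
    }
    return key, value

key, value = feature_to_entry(BSML_FEATURE)
features = {key: value}
```

With this split, the title and class remain available in the value, but finding every feature of a given class still means checking each entry.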

A related issue is that it would be really nice to be able to get
annotations for every exon or every ORF without having to enumerate through
the keys of the whole dictionary and check a field in each.  There are two
ways I can think of doing this.  One is to keep arrays within the annotation
wrapper for each feature type and put things into the appropriate one as
they're added.  The alternative would be to make sure we write the appropriate
code to do the enumeration.  Personally, for performance reasons, I'd favor
the first.
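
A rough sketch of the two options side by side, again in Python for illustration only. The class name AnnotationStore, its method names, and the record fields are all hypothetical; the real wrapper would presumably be an Objective-C class built on the Foundation collections.

```python
from collections import defaultdict

class AnnotationStore:
    """Hypothetical wrapper: a dictionary of features plus a per-type index."""

    def __init__(self):
        self._by_id = {}                   # feature id -> feature record
        self._by_type = defaultdict(list)  # feature type -> ids, in insertion order

    def add(self, feature_id, feature_type, start, end):
        if feature_id in self._by_id:
            raise KeyError(f"duplicate feature id: {feature_id}")
        self._by_id[feature_id] = {"type": feature_type, "start": start, "end": end}
        # Keep the per-type list in step with the dictionary as things are added.
        self._by_type[feature_type].append(feature_id)

    def remove(self, feature_id):
        record = self._by_id.pop(feature_id)
        self._by_type[record["type"]].remove(feature_id)

    def of_type(self, feature_type):
        """First option: read the pre-built list, no scan of the dictionary."""
        return [self._by_id[i] for i in self._by_type.get(feature_type, [])]

    def of_type_by_scan(self, feature_type):
        """Second option: enumerate every entry and check its type field."""
        return [r for r in self._by_id.values() if r["type"] == feature_type]

store = AnnotationStore()
store.add("FTR1", "exon", 100, 250)
store.add("FTR2", "ORF", 90, 900)
store.add("FTR3", "exon", 400, 520)
exons = store.of_type("exon")
```

Both methods return the same records. The first costs extra bookkeeping on every add and remove, while the second walks the whole dictionary on every query.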

JT



_______________________________________________
This mind intentionally left blank
