[Biococoa-dev] SequenceIO

Charles Parnot charles.parnot at gmail.com
Wed Jun 29 00:33:49 EDT 2005


On Jun 28, 2005, at 5:15 PM, Koen van der Drift wrote:

> Hi,
>
> Where is everyone? Enjoying a vacation, or hard at work, or passed  
> out from the heatwave :)

working on the Xgrid stuff for Tiger... Version 0.1 will be done this  
week, so at least I will be able to use all these computers again for  
my calculations!!

> I started thinking again about the IO classes. Right now,
> BCSequenceReader returns a dictionary containing one or more
> sequences as the values, and either a description or a title as the
> keys. This allows files containing multiple sequences to be read
> into the dictionary. Accessing the sequences, however, is not so
> straightforward: the user first needs to get the key for the
> sequence from an array of keys, and then use that key to obtain
> the sequence from the dictionary. This seems rather cumbersome to me.
>
> Therefore I propose that BCSequenceReader simply returns an array
> of objects. We can either store BCSequence objects in the array or
> create some kind of wrapper for each sequence, e.g. a new SequenceIO
> class. Annotations and features are now handled in the BCSequence
> class, so they can be added in the IO code.

It seems natural that BCSequenceReader should now return
BCSequence objects. Yes, I'm totally for it. The annotations will
come with them.

One of the problems at this point is that we have not fully decided on
a strong, clear BCAnnotation object. Alex has started something, but I
am not sure he has finished it yet.
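A minimal sketch of the array-based return value proposed above, written in Python rather than BioCocoa's Objective-C. `Sequence` and `read_fasta` are hypothetical stand-ins, not BioCocoa API, and annotations are a plain dictionary because BCAnnotation is not settled yet. Each FASTA header is stored under the ">" key, as Koen suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    """Hypothetical stand-in for BCSequence: residues plus annotations."""
    residues: str
    annotations: dict = field(default_factory=dict)


def read_fasta(text):
    """Parse FASTA text into a list of Sequence objects, in file order.

    Each record gets one annotation: the key '>' mapped to whatever
    follows the '>' on its header line. Residue lines appearing before
    the first header are ignored in this sketch.
    """
    sequences = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                sequences.append(Sequence("".join(chunks), {">": header}))
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    # Close the last record.
    if header is not None:
        sequences.append(Sequence("".join(chunks), {">": header}))
    return sequences


records = read_fasta(">seq1 first\nACGT\nGG\n>seq2\nMKV\n")
for record in records:
    print(record.annotations[">"], record.residues)
```

With a list, callers simply index or iterate over the records instead of fetching a key first and then looking it up. File order is preserved (an NSDictionary makes no ordering guarantee), and two records with identical descriptions can no longer collide on the same key.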


> So for a simple fasta file we would have an array of sequences,
> each with one annotation: the key @">" and, as the value, whatever
> string follows the ">" on the header line. For a more complicated
> sequence format, e.g. SwissProt, all annotations are read in line by
> line, using the file-specific keys (@"ID", @"AC", @"DT", etc.). Then,
> when the reader hits the sequence, we can create a BCSequence object,
> and at the end store the annotations in the BCSequence. I suggest the
> keys should be whatever the file format uses, but for some common
> annotations, like author and organism, we could supply more
> human-readable accessor methods.

Equivalents for the keys can easily be added later, and will be
needed anyway. And yes, keeping the annotations specific to the
format is fine for now. I guess we should add an 'originalFileFormat'
key or something similar to help identify the keys.
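A rough sketch of the line-by-line approach, again in Python rather than Objective-C. It assumes a simplified SwissProt layout: a two-letter line code, three spaces, then the value; residue lines following the `SQ` line; `//` ending each record. `Sequence`, `read_swissprot` and `organism` are hypothetical names. Repeated codes (SwissProt records carry several `DT` lines, for example) are collected in a list, and each record is tagged with the suggested `originalFileFormat` key.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    """Hypothetical stand-in for BCSequence."""
    residues: str
    annotations: dict = field(default_factory=dict)


def read_swissprot(text):
    """Read simplified SwissProt-style records line by line.

    Annotation lines keep their file-specific two-letter keys ('ID',
    'AC', 'DT', ...), each mapped to a list of values. The SQ line only
    marks where residues start; its summary is not kept in this sketch.
    """
    sequences = []
    annotations = {}
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of record: tag it with its source format and store it.
            annotations["originalFileFormat"] = "SwissProt"
            sequences.append(Sequence("".join(residues), annotations))
            annotations, residues, in_sequence = {}, [], False
        elif in_sequence:
            # Residue lines are blocks of letters separated by spaces.
            residues.append(line.replace(" ", ""))
        elif line.startswith("SQ"):
            in_sequence = True
        elif len(line) > 2:
            key, value = line[:2], line[5:].strip()
            annotations.setdefault(key, []).append(value)
    return sequences


def organism(sequence):
    """Hypothetical readable accessor built on the raw 'OS' key."""
    if sequence.annotations.get("originalFileFormat") == "SwissProt":
        return " ".join(sequence.annotations.get("OS", []))
    return None


sample = (
    "ID   TEST_HUMAN   Reviewed;   5 AA.\n"
    "AC   P12345;\n"
    "OS   Homo sapiens (Human).\n"
    "SQ   SEQUENCE   5 AA;\n"
    "     MKVLA\n"
    "//\n"
)
records = read_swissprot(sample)
print(organism(records[0]), records[0].residues)
```

The `organism` helper shows how a readable accessor can sit on top of the format-specific keys without renaming them. The `originalFileFormat` check is what tells it which key to look up.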


> cheers,
>
> - Koen.

I will be back soon with more stuff... At least, one of us is working  
on BioCocoa!

charles

--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
charles.parnot at gmail.com