[Biococoa-dev] SequenceIO

Koen van der Drift kvddrift at earthlink.net
Wed Jun 29 18:15:47 EDT 2005


On Jun 29, 2005, at 1:11 PM, Charles Parnot wrote:

>> Right now the BCAnnotation object mimics a dictionary, and one could
>> argue that we should just use a wrapper around a dictionary. But
>> that's not the most important issue.
>
> I think that encapsulating the BCAnnotation is a good thing. At this
> point, it is a dictionary, but the implementation could change in the
> future while keeping the same interface/header, so that framework users
> don't have to change their code.

Good point, I will leave it as it is.
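To illustrate the encapsulation argument, here is a minimal Python sketch. BioCocoa itself is Objective-C, and the class and method names below are hypothetical stand-ins for BCAnnotation, not its real API. Callers only go through the accessor methods, so the backing dictionary could later be replaced without touching their code.

```python
class Annotation:
    """Hypothetical stand-in for BCAnnotation: a name plus its content."""

    def __init__(self, name, content):
        # The backing store happens to be a dict today; callers never see it.
        self._storage = {"name": name, "content": content}

    def name(self):
        return self._storage["name"]

    def content(self):
        return self._storage["content"]

    def set_content(self, content):
        self._storage["content"] = content


annotation = Annotation("DE", "Hypothetical protein.")
annotation.set_content("Putative kinase.")
```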

>
>
>> Basically, they are stored in BCSequence as another dictionary, using 
>> the key of the annotation as the key and a BCAnnotation object as 
>> value. Maybe it is easier this way when looking up annotations, but 
>> it seems overcomplicated to me. If we keep an annotations wrapper
>> object, why not store them in an NSArray?
>
> It is not really complicated. It is redundant, yes. But maybe 
> redundancy in this case is also convenient, as we can quickly access 
> the list of annotation names without looping through the NSArray. 
> Though there would be convenience methods to do that even with NSArray,
> e.g. KVC and 'valueForKeyPath'.

Complicated was indeed a wrong word choice. Again, I will leave this as 
it is.
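The dictionary-versus-array trade-off can be sketched in Python. The names are hypothetical, and a plain dict and list stand in for NSDictionary and NSArray. With the dictionary, the list of names and a lookup by name come for free. With the list, a comprehension plays the role that valueForKey:/valueForKeyPath: plays on an NSArray, and a lookup needs a scan.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in for BCAnnotation."""
    name: str
    content: str


annotations = [Annotation("ID", "TEST_HUMAN"), Annotation("DE", "Hypothetical protein.")]

# Option 1: dictionary keyed by annotation name (redundant but convenient).
by_name = {a.name: a for a in annotations}
names_from_dict = list(by_name)   # names without an explicit loop
de_from_dict = by_name["DE"]      # direct lookup by name

# Option 2: a plain list of annotation objects.
names_from_list = [a.name for a in annotations]  # like [array valueForKey:@"name"]
de_from_list = next(a for a in annotations if a.name == "DE")  # linear scan
```

Note that building the dictionary keeps only the last annotation for any repeated name.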

Changing the code was not that difficult, and I will commit the files
soon so everyone can see what is going on. That being said, I am
running into the following problem. Some file formats have many lines
of annotations, e.g. the test2.txt file in the Translation example. As
you can see, some lines share the same identifier (DT, OC, etc.). If I
use the identifier as the key, the final dictionary will only contain
the last line, because each new line overwrites the existing key.

I can think of a few solutions. The first, which is what I do now, is
to append each value to the existing one, leaving only one entry per
identifier. This works fine, but could cause problems when we want to
write the files out, because we no longer know where the individual
lines begin and end. We could of course put some kind of marker in
between the strings, so we know where each one begins. Another
solution would be to number identifiers that span multiple lines: ID1,
ID2, ID3, etc. The problem here is that this makes searching for a
specific key harder, since a lookup for DT would no longer match DT1,
DT2, and so on. My preference for now is the first solution, but if
anyone has a better suggestion, please shout.
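A minimal Python sketch of the first solution, assuming SwissProt/EMBL-style lines where the identifier occupies the first two columns and the value starts at column 6. The sample lines are made up, not copied from test2.txt. The newline separator is one possible choice of marker; it cannot occur inside a value because the file was already split on newlines.

```python
SEPARATOR = "\n"  # marker between appended values (hypothetical choice)


def parse_annotations(lines):
    """Collect lines into {identifier: value}, appending repeated identifiers."""
    annotations = {}
    for line in lines:
        key, value = line[:2], line[5:].rstrip()
        if key in annotations:
            annotations[key] += SEPARATOR + value  # keep the line boundary
        else:
            annotations[key] = value
    return annotations


def write_annotations(annotations):
    """Recreate one output line per original line, grouped by identifier."""
    out = []
    for key, joined in annotations.items():  # dicts keep insertion order
        for value in joined.split(SEPARATOR):
            out.append(f"{key}   {value}")
    return out


lines = [
    "ID   TEST_HUMAN     STANDARD;      PRT;   100 AA.",
    "DT   01-JAN-1990 (Rel. 13, Created)",
    "DT   01-JAN-1995 (Rel. 31, Last annotation update)",
    "OC   Eukaryota; Metazoa; Chordata;",
    "OC   Mammalia; Primates.",
]
annotations = parse_annotations(lines)
```

The round trip only reproduces the original file when lines with the same identifier are contiguous. If identifiers interleave, their relative order is lost, which is a limitation of keying by identifier alone.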

Another issue is nested annotations. Again, see the test2.txt file and
look for RN (reference). Each RN line is followed by a set of
identifiers for that reference, and then by the next reference. I guess
I could put the subannotations in a new dictionary, and store that as
the content of the RN annotation. A similar issue can be found in NCBI
files (see test4.txt).
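The nested-dictionary idea could look like this in Python. This is a sketch under several assumptions. The reference sub-identifiers (RP, RC, RX, RG, RA, RT, RL) follow the SwissProt convention. Any other identifier ends the current reference. References are keyed by their RN value, such as "[1]". The sample lines are invented. NCBI GenBank flat files mark sub-entries by indentation (REFERENCE followed by AUTHORS, TITLE, ...), so a parser for those would need different key extraction.

```python
REFERENCE_KEYS = {"RP", "RC", "RX", "RG", "RA", "RT", "RL"}  # assumed SwissProt-style


def append_value(store, key, value, sep="\n"):
    """Append value under key, keeping line boundaries (the append approach)."""
    store[key] = store[key] + sep + value if key in store else value


def parse_with_references(lines):
    annotations = {}
    current_ref = None  # sub-dictionary of the reference being filled, if any
    for line in lines:
        key, value = line[:2], line[5:].rstrip()
        if key == "RN":
            # Start a new reference; its sub-annotations live in their own dict.
            current_ref = {}
            annotations.setdefault("RN", {})[value] = current_ref
        elif key in REFERENCE_KEYS and current_ref is not None:
            append_value(current_ref, key, value)
        else:
            current_ref = None  # any other identifier closes the reference
            append_value(annotations, key, value)
    return annotations


lines = [
    "DE   Hypothetical protein.",
    "RN   [1]",
    "RA   Author A., Author B.;",
    "RL   J. Hypothetical 1:1-10(2000).",
    "RN   [2]",
    "RA   Author C.;",
    "RA   Author D.;",
    "RL   J. Hypothetical 2:11-20(2001).",
    "CC   -!- FUNCTION: Unknown.",
]
annotations = parse_with_references(lines)
```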

>
> In any case, one of the things we agreed on at the WWDC (and you would
> not know, sorry :-),

Still waiting for minutes and the presentation ;-)


> is that there probably won't be a performance issue with annotations, 
> so the way we do it does not really matter so much. NSArray, 
> NSDictionary: tomato, tomato. So the bottom line is: I will do 
> whatever the majority decides on this one.


It should not be so difficult to change, so for now let's stick with
what we have.

cheers,

- Koen.
