[Biophp-dev] My brain hurts.

biophp-dev@bioinformatics.org
Thu, 01 May 2003 20:28:17 PST


> Rather than "clobber" the existing code, I'm attaching my "replacement"
> files here for review.  They are:

Actually, please commit them to CVS.  That will make it much easier
for me to work with them, and it is always possible to revert to older
versions.  Once they are in CVS, I can also make changes and commit them
without getting out of sync with what you are doing.


> seq_factory.inc.php - the new seq_factory object (feed it information,
> call the "createSeq" method, and you get back a seq object).

Just from looking at the code: it looks as if seq_factory does not know
what parser type it is dealing with.  I thought that every parser could
return its own data structure and that the 'translation' would take place
only in seq_factory.  Now it looks as if every parser has to return an
'id', 'sequence', and 'seqlength'.  If seq_factory knows what is coming,
we could even use the current GenBank parser (just let it return a seq
object, and seq_factory will pass it through).  That should be easy to add.
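A minimal sketch of that idea, assuming seq_factory is told the parser type when it is created.  The class names Seq and SeqFactory, the constructor argument and the 'fasta' type name are made up for illustration; this is not the code in seq_factory.inc.php.

```php
<?php
// Hypothetical minimal seq object; the real biophp one has more attributes.
class Seq
{
    public $id = '';
    public $sequence = '';
    public $seqlength = 0;
}

// Hypothetical seq_factory that knows which parser produced the record.
class SeqFactory
{
    private $parserType;

    public function __construct($parserType)
    {
        $this->parserType = $parserType;
    }

    public function createSeq($record)
    {
        // A parser that already returns seq objects (like the current
        // GenBank parser) is simply passed through untouched.
        if ($record instanceof Seq) {
            return $record;
        }
        $seq = new Seq();
        // One case per parser type; each knows its own native structure.
        switch ($this->parserType) {
            case 'fasta':
                $seq->id = $record['id'];
                $seq->sequence = $record['sequence'];
                break;
            default:
                throw new InvalidArgumentException("Unknown parser type: {$this->parserType}");
        }
        $seq->seqlength = strlen($seq->sequence);
        return $seq;
    }
}
```

Supporting another format would then mean adding one case to the switch, without touching the other parsers.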

And: bravo, you are using 4 spaces instead of tabs!  


> 1) The parsers must be classes.
> 2) Memory-based parsers MUST accept an array of lines as a data source.
> 3) Parsers SHOULD also accept raw text, filehandles, or filenames
> (only relevant when autodetection is being bypassed).

Is the code for dealing with this already there?  I did not notice
anything about streams.  It would not be pretty to have to write
different parsers for arrays of lines and streams.
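One way to avoid separate parsers is to reduce every source to a single iterable of lines, so the parsing loop is written once.  This is a sketch assuming PHP generators; lineSource(), streamLines(), fileLines() and countHeaders() are hypothetical names, and the explicit $kind argument stands in for the case where autodetection is bypassed.

```php
<?php
// Yield lines from an open stream lazily, without the trailing newline.
function streamLines($handle)
{
    while (($line = fgets($handle)) !== false) {
        yield rtrim($line, "\r\n");
    }
}

// Open a file on first iteration, yield its lines, close it at the end.
function fileLines($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        yield from streamLines($handle);
    } finally {
        fclose($handle);
    }
}

// $kind: 'lines' (array), 'text' (raw string), 'handle' (stream), 'file' (path).
function lineSource($source, $kind)
{
    switch ($kind) {
        case 'lines':
            return $source;
        case 'text':
            return preg_split('/\r\n|\n|\r/', $source);
        case 'handle':
            return streamLines($source);
        case 'file':
            return fileLines($source);
        default:
            throw new InvalidArgumentException("Unknown source kind: $kind");
    }
}

// A parser loop written once: it only ever sees foreach over lines.
function countHeaders($lines)
{
    $count = 0;
    foreach ($lines as $line) {
        if (strpos($line, '>') === 0) {
            $count++;
        }
    }
    return $count;
}
```

Arrays and generators are both valid in foreach, so the parser itself never needs to know whether it is reading from memory or from a stream.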

> 4) Parsers MUST have a "fetchNext()" method, which returns the next
> parsed record (starting with the first one, obviously) as an array, made
> up of whatever key=>value pairs are available in the format.  The keys
> MUST be named after the attributes in the seq object (e.g. "id"), and
> SHOULD begin with "id" and "sequence".  This method MUST return false if
> there are no more records.
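A minimal sketch of a parser following rules 1, 2 and 4, to check the contract reads the way it is meant.  FASTA is used only because it is simple; the class name FastaParser and taking the first word of the header as 'id' are assumptions for illustration.

```php
<?php
// Hypothetical parser: built from an array of lines, fetchNext() returns
// one record keyed by seq attribute names, or false when none remain.
class FastaParser
{
    private $lines;
    private $pos = 0;

    public function __construct(array $lines)
    {
        $this->lines = array_values($lines);
    }

    public function fetchNext()
    {
        $count = count($this->lines);
        // Skip anything before the next header line.
        while ($this->pos < $count && strpos($this->lines[$this->pos], '>') !== 0) {
            $this->pos++;
        }
        if ($this->pos >= $count) {
            return false; // no more records
        }
        $header = substr(rtrim($this->lines[$this->pos]), 1);
        $this->pos++;
        // Collect sequence lines up to the next header or end of input.
        $sequence = '';
        while ($this->pos < $count && strpos($this->lines[$this->pos], '>') !== 0) {
            $sequence .= trim($this->lines[$this->pos]);
            $this->pos++;
        }
        // Assumption: the first whitespace-delimited word of the header is the id.
        $parts = preg_split('/\s+/', trim($header), 2);
        return ['id' => $parts[0], 'sequence' => $sequence];
    }
}
```

A caller would loop with while (($record = $parser->fetchNext()) !== false), using a strict comparison so that only false, not an empty record, ends the loop.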

I don't think you have to require a particular naming scheme.  seq_factory
could do the translation, as long as it knows what is coming and how to
translate it.
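A sketch of that translation, assuming each parser keeps its own native key names and seq_factory holds one key map per format.  The class name RecordTranslator, the format names, and the native keys on the left of each map ('name', 'residues', 'ID', 'SQ') are invented for illustration.

```php
<?php
// Hypothetical translator: maps each format's native record keys onto
// seq attribute names. Format names and native keys are illustrative only.
class RecordTranslator
{
    private static $keyMaps = [
        'fasta'     => ['name' => 'id', 'residues' => 'sequence'],
        'swissprot' => ['ID' => 'id', 'SQ' => 'sequence'],
    ];

    public static function translate($format, array $record)
    {
        if (!isset(self::$keyMaps[$format])) {
            throw new InvalidArgumentException("No key map for format '$format'");
        }
        $map = self::$keyMaps[$format];
        $translated = [];
        foreach ($record as $key => $value) {
            // Keys without a mapping keep their native name, so nothing is dropped.
            $attr = isset($map[$key]) ? $map[$key] : $key;
            $translated[$attr] = $value;
        }
        return $translated;
    }
}
```

With this approach a new format needs only its parser plus one map entry, and the parsers stay free to use whatever names suit their format.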

OK, the parser fetches a record and does a (usually inherent)
parserObj->moveNext(), so I guess fetchNext() is the right name.

BTW, do we go for fetchNext() or fetch_Next()?  Although I used the
latter, I'd actually prefer the former.