[Biophp-dev] export/write object

Nico Stuurman biophp-dev@bioinformatics.org
Wed, 14 May 2003 14:21:05 -0700


>
> Some combination of "write" (or "export") and "seq" seems appropriate
> to me for this particular section.  I kind of like "export" just
> because it doesn't imply that the destination of the data going out
> is a file (or printout :-) ), but that's pure semantic niggling and
> doesn't really matter...
>


OK, IOexport (and IOimport) it is.


>> But first a structure for the IOwrite class. I would go for a
>> constructor that takes an argument specifying the type of output
>> desired (string, array, file, filehandle?, or simply always return a
>> string?), and the type of sequence file desired (fasta, swissprot,
>> genbank, etc.).  There should be an IO->write->add($seq) function
>> that calls seq_factory, which should translate the items of object
>> $seq into items that can be directly incorporated in the output.  The
>> actual 'write' methods could almost be just a template where php's
>> variable interpolation can do the work.
>
> Hmmm, how's this:
>
> 1) Add a "getAsArray()" method to the seq object, which returns an
> array containing all of the 'set' attributes and their values
> (key=attribute ["sequence", "id", etc.], value=value of that
> attribute).  This will also substitute as a "wrapper" for all of the
> other interface methods at once (i.e. so the user doesn't have to do
> "getId(); getSequence();" (etc...) if they want all of the seq
> object's data.)
>

Isn't this functionality supposed to be part of seq_factory()?  Maybe I
still don't get the concepts behind this structure.
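
To make point 1 concrete, here is a minimal sketch of what a
getAsArray() method might look like.  The class name SeqSketch and its
three properties are assumptions for illustration only; the real seq
object may hold different attributes.

```php
<?php
// Hypothetical stand-in for the seq object; the property names are assumed.
class SeqSketch
{
    public $id = null;
    public $sequence = null;
    public $moltype = null;

    // Return only the attributes that have been set, keyed by attribute name.
    public function getAsArray()
    {
        $set = array();
        foreach (get_object_vars($this) as $name => $value) {
            if ($value !== null) {
                $set[$name] = $value;
            }
        }
        return $set;
    }
}

$seq = new SeqSketch();
$seq->id = 'demo1';
$seq->sequence = 'ACGT';
$attributes = $seq->getAsArray(); // 'moltype' is omitted because it was never set
```

Done this way, the method lives on the seq object itself and does not
depend on seq_factory.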


> 2) The IOwrite (or IOWriteSeq?) should include methods to set the
> destination (as you describe above - string, array, file, handle...)
> and type (this way the user can use the same instance of the writer
> object to produce multiple files if desired).
>

OK.

> 3) The IOwrite object can have a "stack" where the extracted
> attributes get stored as "generic arrays" (this way someone can write
> a file converter [e.g. genbank to fasta, or clustal to phylip] without
> the extra baggage of creating seq objects [which are only going to be
> read back out of and destroyed anyway in that case] - the
> 'fetchRawRecord()' method of the Parse object is for this sort of
> thing).
>
> 4) If given (to an "add()" method) an "array" of attributes, IOWrite
> just shoves them on the stack. If passed a seq object, IOWrite calls
> its "getAsArray()" method and shoves the results of that on the
> stack.  (The "stack" is necessary when export is to interleaved file
> formats).  We MIGHT include a "write()" (or some similar name) method
> to allow bypassing the "stack" and writing immediately for
> non-interleaved formats (returns false if called while set to an
> interleaved format).
>

How important are interleaved formats going to be?  They complicate
matters quite a bit, and if we can do without....  I would be all for a
'write' method.  Also, how is an interleaved format going to be
'written'?  By calling the 'write' method?
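
As a rough sketch of points 3 and 4, the stack, add() and write() could
fit together as below.  Everything except the names add(), write() and
getAsArray() is an assumption: the class names, the flush() method (one
possible answer to how stacked records get written), the list of
interleaved formats, and the choice to always return a string.  Only
fasta output is filled in.

```php
<?php
// Hypothetical seq object exposing getAsArray(), as proposed in point 1.
class DemoSeq
{
    private $id;
    private $sequence;

    public function __construct($id, $sequence)
    {
        $this->id = $id;
        $this->sequence = $sequence;
    }

    public function getAsArray()
    {
        return array('id' => $this->id, 'sequence' => $this->sequence);
    }
}

// Hypothetical exporter; this sketch only knows how to emit fasta.
class IOexportSketch
{
    private $format;
    private $stack = array();
    // Assumed list of formats that need every record before output starts.
    private static $interleaved = array('clustal');

    public function __construct($format)
    {
        $this->format = $format;
    }

    public function setFormat($format)
    {
        $this->format = $format;
    }

    // Accept either a seq object or a plain attribute array.
    private function toArray($item)
    {
        if (is_object($item) && method_exists($item, 'getAsArray')) {
            $item = $item->getAsArray();
        }
        return is_array($item) ? $item : false;
    }

    // Push one record onto the stack for later output.
    public function add($item)
    {
        $record = $this->toArray($item);
        if ($record === false) {
            return false;
        }
        $this->stack[] = $record;
        return true;
    }

    // Bypass the stack; returns false for interleaved or unsupported formats.
    public function write($item)
    {
        if (in_array($this->format, self::$interleaved, true)) {
            return false;
        }
        $record = $this->toArray($item);
        if ($record === false || $this->format !== 'fasta') {
            return false;
        }
        return '>' . $record['id'] . "\n" . $record['sequence'] . "\n";
    }

    // Emit every stacked record as one string, then empty the stack.
    public function flush()
    {
        if ($this->format !== 'fasta') {
            return false;
        }
        $out = '';
        foreach ($this->stack as $record) {
            $out .= '>' . $record['id'] . "\n" . $record['sequence'] . "\n";
        }
        $this->stack = array();
        return $out;
    }
}

$writer = new IOexportSketch('fasta');
$immediate = $writer->write(new DemoSeq('s1', 'ACGT'));
$writer->add(array('id' => 's2', 'sequence' => 'GG'));
$stacked = $writer->flush();
```

With this split, an interleaved format would only ever be produced by
the stack-draining call, never by write().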


> 5) Perhaps I should move the "translation" layer back out of
> seq_factory and into a separate class.  The "Translate" class wouldn't
> need to be instantiated, but it would make a variety of minor
> "correction" functions available everywhere as, e.g.
> "Translate::toSeq()".  More an "ease of re-use" issue than anything
> technical, though.  There's no reason I can't make "_convertTerms()"
> into a public method and have people call it from outside as
> "seq_factory::convertTerms();"
>
> If I DID make a separate "Translate" class to be used like this, it
> might also include things like "Translate::NCBIDeflineExtract($field)"
> which one could use to get, e.g., just the accession number out of an
> NCBI Defline.
>

I can't fully judge the advantages/disadvantages here.


> It might also be worth the trouble to move a lot of the "common"
> functions that are currently in the class files but not part of the
> classes (e.g. the "complement()" function in seq.inc.php) where they
> can be accessed by other objects (or have the file be utilized by
> itself by other projects).  (I think doing that will also make the
> actual seq objects [and others] take up fewer resources since there'll
> only be one copy of the "common" methods rather than a copy in each
> instance of the classes).
>

Hmm.  Doesn't it make more sense to make them part of the seq objects?


> I'd strongly advocate getting interface methods implemented in the seq
> object soon - as I read up on Object Oriented design I keep seeing it
> said that you're "supposed to" use them instead of setting variables
> directly (even for public variables, it would seem), and I'm beginning
> to see why - when you have people using an interface method to set
> variables, you can do things like validity checking, error correction,
> and transparently handling internal changes (e.g. changing variable
> names [e.g. to meet PEAR standards on naming], "splitting" variables,
> moving variables into an array for easier handling, etc.) without
> breaking other objects, etc.
>
> For example, right now everyone is expected to set $seq->sequence and
> $seq->moltype directly, which means I can easily accidentally
> $seq->sequence='ZXKUQYB'; $seq->moltype='DNA';
>
> whereas if people are able to use a "setSequence()" method, we can add
> auto-detection of the type whenever the sequence is set (and
> "setMolType()" can check the existing sequence to see if it's valid
> for that type...)
>

Right, although it all adds overhead.
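
For illustration, a minimal sketch of the setSequence()/setMolType()
idea follows.  The class name CheckedSeq, the getMolType() accessor and
the alphabets used for detection are assumptions, and the detection is
deliberately crude; it is not how the existing seq class works.

```php
<?php
// Hypothetical seq object that guards its variables behind setters.
class CheckedSeq
{
    private $sequence = '';
    private $moltype = null;

    // Assumed alphabets; real detection would need IUPAC ambiguity codes.
    private static $alphabets = array(
        'DNA'     => '/^[ACGTN]*$/',
        'RNA'     => '/^[ACGUN]*$/',
        'protein' => '/^[A-Z*]*$/',
    );

    // Normalise the sequence, reject junk, and auto-detect the type.
    public function setSequence($sequence)
    {
        $sequence = strtoupper(trim($sequence));
        if (!preg_match('/^[A-Z*]+$/', $sequence)) {
            return false;
        }
        $this->sequence = $sequence;
        // The first alphabet that fits wins, so DNA is tried before protein.
        foreach (self::$alphabets as $type => $pattern) {
            if (preg_match($pattern, $sequence)) {
                $this->moltype = $type;
                break;
            }
        }
        return true;
    }

    // Refuse a type that the stored sequence cannot belong to.
    public function setMolType($moltype)
    {
        if (!isset(self::$alphabets[$moltype])
            || !preg_match(self::$alphabets[$moltype], $this->sequence)) {
            return false;
        }
        $this->moltype = $moltype;
        return true;
    }

    public function getMolType()
    {
        return $this->moltype;
    }
}

$seq = new CheckedSeq();
$seq->setSequence('ZXKUQYB');       // detected as protein
$rejected = $seq->setMolType('DNA'); // false: the letters are not nucleotides
```

The overhead here is one or two regular-expression matches per call to
a setter.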

> I was thinking about editing my old sequence class to make it "seq
> compatible" and dropping it in as "alt_seq.inc.php", where we can
> compare them side-by-side and merge the useful features of each.
> Thoughts?
>

Good plan.



Nico