[Bioclusters] Nightly updated BLAST databases

Joseph Landman bioclusters@bioinformatics.org
16 Dec 2002 21:02:32 -0500

On Mon, 2002-12-16 at 20:40, Elia Stupka wrote:
> > Generally this is not so hard.  You can even incorporate the update
> > into a queuing system, as long as you use an O(1) data distribution
> > system, such as the old ccp I had architected, or some newer stuff.  
> > Use a priority based mechanism to schedule the update to occur between
> > computing runs.  This requires some tuning/tweaking of the queuing
> > system, but it is generally not that hard to do.
> That was quite enlightening to a non system's person like me. One thing I
> would still say though is that I personally wouldn't like to implement an
> automatic update mainly because one would like to be able to reproduce a
> whole bioinformatics protocol (especially if it is for publication) and in
> order to do that one should know what version of a database his process
> was running against.

I would think that the repeatability of the computational pipeline is an
important factor, so a purely automatic update scheme might not fit into
this model.  The alternative is to tag the database that was used at a
metadata level (I had been using things like date, time, and MD5 sum in
a small XML structure to identify the DB).  I think automatic updates
combined with that tagging is the best of all worlds, though it requires
lots of disk space on a server somewhere to hold the older versions, and
clever data distribution mechanisms.  But it allows you to version your
pipeline protocols.  This could be quite interesting from a data quality
perspective.
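
To make the tagging idea concrete, here is a minimal sketch in Python of
the kind of record I mean.  It is an illustration only, not the code I
actually used: the function names and the XML element names (blastdb,
name, date, time, md5) are assumptions made up for this example.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_database(path, timestamp=None):
    """Build a small XML tag (date, time, MD5 sum) identifying one DB file.

    The element names are hypothetical; any stable layout would do.
    """
    path = Path(path)
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    root = ET.Element("blastdb")
    ET.SubElement(root, "name").text = path.name
    ET.SubElement(root, "date").text = timestamp.strftime("%Y-%m-%d")
    ET.SubElement(root, "time").text = timestamp.strftime("%H:%M:%S")
    ET.SubElement(root, "md5").text = md5_of_file(path)
    return ET.tostring(root, encoding="unicode")
```

Calling describe_database() on a freshly updated database file and
storing the returned string next to each pipeline run would be one way
to tell later which version of the DB a given result came from.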

> Nonetheless very very interesting, thanks! We might tweak it to the
> fullest, by automating as you suggest and then storing the information in
> our pipeline, to be able to track it...

I am working on a better distribution mechanism.  I'll let you know when
it is nearly ready for prime time.  It should fit in with this scheme
(the scheduler priority bubble).


Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615