[Bioclusters] breaking up NCBI databases

Jeremy Mann bioclusters@bioinformatics.org
Thu, 1 May 2003 17:29:52 -0500 (CDT)


I am curious if anyone knows of a commercial or open source solution for
breaking up the NCBI databases into various sizes. Our present solution is
NFS-mounting /ncbi on each cluster node. Today we tried submitting
numerous BLAST jobs with PBS. Boy, talk about a complete performance
drain (namely nfsd). I haven't seen so many switch lights non-stop in a
LONG time ;)

I have been experimenting with mpiBLAST (using 4 test nodes). So far it's
worked extremely well. I like the fact that its formatdb creates equal
segments for however many nodes you specify. But that database is only
usable with mpiBLAST, and I want to use one version of the database for
all implementations (we also use wwwblast and the command-line tools).
Since the NCBI BLAST databases are distributed pre-formatted, we have to
use fastacmd to dump them back to unformatted FASTA, THEN run the
mpiBLAST formatdb. Now do this for 20 nodes ;( If we chose this method we
would have to update once a week instead of our present nightly updates.
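For anyone curious, the revert-and-reformat step looks roughly like the
sketch below. This is just an illustration of the workflow described
above, not a tested recipe: it assumes the NCBI toolkit's fastacmd and
mpiBLAST's mpiformatdb are on your PATH, that the database is the
nucleotide "nr", and that 22 fragments are wanted; the exact flag syntax
varies between toolkit and mpiBLAST releases, so check your local docs.

```shell
#!/bin/sh
# Hypothetical sketch of reverting a pre-formatted NCBI database to
# FASTA and re-segmenting it for mpiBLAST. Database name, fragment
# count, and flag spellings are assumptions -- verify against the
# fastacmd and mpiformatdb help output for your installed versions.

DB=nr        # assumed database name
NFRAGS=22    # one fragment per cluster node

# Dump the entire pre-formatted BLAST database back to FASTA
# (-D requests a full dump; the argument form differs by toolkit version).
fastacmd -d $DB -D 1 > $DB.fasta

# Re-format the FASTA into equal fragments, one per node
# (-N / --nfrags = number of fragments; -p F for nucleotide, T for protein).
mpiformatdb -N $NFRAGS -i $DB.fasta -p F
```

The resulting fragments are what mpiBLAST distributes to the nodes; the
catch, as noted above, is that the plain tools and wwwblast can't use
them directly.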

There has been talk lately on this list about installing an extra hard
drive in each node and using it as the database drive. After today I am
completely sold on doing this, seeing how drives are very, very cheap.

I guess the ending question is: we would like to use one database (with
equal segments for 22 nodes) for the parallel, WWW, and command-line
BLAST programs. Does such a thing exist, or am I just wishing?

Oh, one more catch: we also use SAM, Pfam, and GCG, which rely on our
existing NCBI databases.

-- 
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672