[Bioclusters] requesting design advice on grid-optimized genome
annotation system
Gary Van Domselaar
gary at primary.bioinformatics.org
Sun Oct 23 14:46:26 EDT 2005
Hey Gang,
I'm designing a system for automatic prokaryotic genome annotation. The
system will need to annotate the coding regions (typically several thousand),
in part by BLASTing them against multiple reference databases, like COGs,
UNIPROT, NCBI nr, etc. I'm wondering about the most efficient way to do this
using my Xserve cluster and mpi-blast. I'm cool with prestaging the
mpi-blast-formatted databases onto the compute nodes, and my intuition
tells me it would be best to blast the set of coding regions against one
reference database at a time, i.e. blast all coding regions against COGs,
then again against UNIPROT, etc. That way the reference databases can
stay resident in RAM for the entire blast run against the genome coding
regions. Does this sound right? Will the databases actually stay in RAM?
Should I call the mpi-blast executable once on the entire list of coding
regions, or would multiple mpi-blast calls (one per coding region) achieve
the same thing (keeping the database resident in RAM)? Any advice on how to
implement this system for optimal mpi-blasting would be sincerely
appreciated.
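
To make the "one database at a time" plan concrete, here is a minimal
sketch of what I have in mind: all coding regions go into one multi-FASTA
query file, and mpi-blast is called once per reference database. It
assumes mpi-blast accepts blastall-style options (-p, -d, -i, -o) and is
launched through mpirun. The file name, database names, process count and
helper function are placeholders I made up for illustration, and the
sketch only builds the command lines, it does not run them. Whether this
actually keeps each database resident in RAM is exactly what I am asking.

```python
import shlex


def build_mpiblast_commands(query_fasta, databases, nprocs, program="blastp"):
    """Build one mpirun/mpiblast command (as an argv list) per database.

    Assumption: mpiblast takes blastall-style options (-p, -d, -i, -o).
    All names passed in are hypothetical placeholders.
    """
    commands = []
    for db in databases:
        commands.append([
            "mpirun", "-np", str(nprocs),
            "mpiblast",
            "-p", program,             # BLAST program to run
            "-d", db,                  # one reference database per call
            "-i", query_fasta,         # every coding region in a single file
            "-o", f"{db}.blast.out",   # one result file per database
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical query file and database names.
    for cmd in build_mpiblast_commands(
        "coding_regions.faa", ["cogs", "uniprot", "nr"], nprocs=16
    ):
        print(shlex.join(cmd))
```

The alternative (one call per coding region) would put the loop over
queries on the outside, giving several thousand invocations per database
instead of one.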
May the force be with you,
g.
--
Gary Van Domselaar, PhD
Associate Director, Bioinformatics.Org
gary at bioinformatics.org