[Bioclusters] Version of Blast that run on a cluster?
Joe Landman
landman at scalableinformatics.com
Thu Jan 6 15:06:17 EST 2005
Dan Roberts wrote:
>Yes, our Beowulf cluster is already built and in place. It is an
>Intel/Xeon flavor and we are currently using the Torque/Maui
>batch/queuing mechanism.
>We currently have several modeling codes running on this cluster. I
>imagine that for now we would have to serve the blastable files from NFS
>storage via a Gigabit LAN attachment.
>
>Items that I am wondering about may include:
>1>What is the best version of Blast to run either on one single compute
>node or many compute nodes at once?
>
NCBI BLAST runs on one node at a time, in 32 or 64 bit mode if the
machine/OS is capable of 64 bit (Xeons in general are not, though
EM64T is a clone of AMD64 and can largely run the same code).
However, due to the memory architecture of EM64T systems (all memory
traffic runs through a northbridge), you will not get the full benefits
of the AMD64 architecture, especially on memory bandwidth bound code.

The 64 bit code is faster on Opterons by 10-30% as compared to
"identically compiled" 32 bit code. I am not sure about EM64T. Older
Xeons can only run 32 bit code, so for them the point is somewhat moot.

mpiblast is available in 32 and 64 bit flavors and allows you to run
across multiple nodes on your cluster. Given the machine you describe,
I would suggest the 32 bit blast binaries
(http://download.scalableinformatics.com/downloads/ncbi) for single
machine work, and the 32 bit binaries of mpiblast for cross cluster
blasting (http://download.scalableinformatics.com/downloads/mpiblast).
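
To make the single-node versus cross-cluster distinction concrete, here is a minimal sketch that only builds the two command lines and prints them; it runs nothing. The database name, query file, output names, and process count are placeholders, and launching through `mpirun -np` is an assumption that depends on the local MPI installation. Note that mpiblast expects the database to have been split into fragments beforehand with its own formatting tool.

```python
import shlex


def blastall_cmd(program, db, query, out):
    """Command line for a single-node NCBI BLAST (blastall) run."""
    return ["blastall", "-p", program, "-d", db, "-i", query, "-o", out]


def mpiblast_cmd(nprocs, program, db, query, out):
    """Command line for a multi-node mpiblast run started through mpirun.

    The mpirun launcher form is an assumption; some sites use a different
    launcher or start MPI jobs from inside a batch script.
    """
    return ["mpirun", "-np", str(nprocs), "mpiblast",
            "-p", program, "-d", db, "-i", query, "-o", out]


def render(cmd):
    """Render an argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Placeholder inputs, not taken from any real site.
single = blastall_cmd("blastn", "nt", "query.fa", "single.out")
multi = mpiblast_cmd(8, "blastn", "nt", "query.fa", "multi.out")

print(render(single))
print(render(multi))
```

The search options are the same in both cases; only the launcher and the binary change, which is why it is easy to switch between the two from a wrapper.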
>2>Does anyone have any direct experience in having a single Beowulf cluster
>with both 64 and 32 bit compute nodes?
Yes. Add in some HP-UX and that describes a cluster we built 10 months ago ...
> What might be nice to have is
>several different versions of BLAST locally installed and 32 and 64 bit
>queues defined for each version.
>
Well, it might be nice ... but do you want your end users (unless you are
the end user) to know about the 32 bit vs 64 bit codes? Most end users
I have met don't care; their main concerns are performance and accuracy,
not to mention ease of running. You can have multiple queues set up,
and dispatch the runs to the queues as needed, but there is (IMO)
significant value in hiding this from most of the end users (who could
not care less). The ones who really want at it can have at it, but
most people will want to sit in front of an interface that
abstracts the cluster/grid for them.

You as an administrator do care about these things, in which case it is
not so hard to set up queues for this. I would be a bit concerned about
Torque though, as it has some issues in a high throughput mode (or at
least it and its predecessors have had issues in the recent past with
large numbers of jobs sitting in queue).
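
A minimal sketch of what "hiding this from the end user" could look like: a small wrapper picks the queue and binary directory, and the user only supplies a job script and a database size. The queue names, install paths, the `BLAST_BINDIR` variable, and the 2 GiB routing threshold are all hypothetical choices for illustration; only `qsub -q` (queue) and `qsub -v` (environment variables) are standard options.

```python
# Hypothetical queue names and install paths; adjust to the local setup.
QUEUES = {
    "32": {"queue": "blast32", "bindir": "/opt/blast32/bin"},
    "64": {"queue": "blast64", "bindir": "/opt/blast64/bin"},
}

# Illustrative threshold: send databases larger than this to 64 bit nodes.
LARGE_DB_BYTES = 2 * 2**30


def pick_bits(db_size_bytes, have_64bit_nodes=True):
    """Choose the 64 bit queue only for large databases, if such nodes exist."""
    if have_64bit_nodes and db_size_bytes > LARGE_DB_BYTES:
        return "64"
    return "32"


def qsub_cmd(job_script, db_size_bytes, have_64bit_nodes=True):
    """Build a qsub command that routes the job to the matching queue.

    The job script is expected to read BLAST_BINDIR (a made-up variable
    name) to find the right binaries.
    """
    cfg = QUEUES[pick_bits(db_size_bytes, have_64bit_nodes)]
    return ["qsub", "-q", cfg["queue"],
            "-v", "BLAST_BINDIR=" + cfg["bindir"], job_script]


# The end user never mentions 32 vs 64 bit.
print(" ".join(qsub_cmd("run_blast.sh", db_size_bytes=500 * 2**20)))
print(" ".join(qsub_cmd("run_blast.sh", db_size_bytes=3 * 2**30)))
```

Users who really want to choose for themselves can still call qsub directly against either queue.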
>Thanks!
>Dan
>
>PS: A bit off-topic, but has anyone used the AVAKI datagrid to successfully
>synchronize the bucketfuls of BLAST data files that always seem to
>accumulate?