[Bioclusters] Version of Blast that run on a cluster?

Joe Landman landman at scalableinformatics.com
Thu Jan 6 15:06:17 EST 2005

Dan Roberts wrote:
>Yes, our Beowulf cluster is already built and in place.  It is an
>Intel/Xeon flavor and we are currently using the Torque/Maui
>batch/queuing mechanism.
>We currently have several modeling codes running on this cluster.  I
>imagine that for now we would have to serve the blastable files from NFS
>storage via a Gigabit LAN attachment.
>Items that I am wondering about may include:
>1>What is the best version of Blast to run either on one single compute
>node or many compute nodes at once?

NCBI BLAST will run on one node at a time, in 32-bit mode, or in 64-bit 
mode if the machine and OS support it (Xeons in general do not, though 
EM64T is a clone of AMD64 and can largely run the same code).  
However, due to the memory architecture of EM64T (everything runs 
through a northbridge), you will not get the full benefits of the AMD64 
architecture, especially on memory-bandwidth-bound code.

On Opterons, the 64-bit code is 10-30% faster than "identically 
compiled" 32-bit code.  I am not sure about EM64T.  Older Xeons can only 
run 32-bit code, so the point is somewhat moot there.
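To make the 32- vs 64-bit choice concrete, here is a minimal Python sketch of how a wrapper script on each node might pick the matching BLAST build. The install paths in BLAST_DIRS are hypothetical placeholders, and the sketch assumes that platform.machine() reporting "x86_64" (or "AMD64") means the running kernel can execute 64-bit binaries, which holds for both Opteron (AMD64) and EM64T Xeons.

```python
import platform
from typing import Optional

# Hypothetical install locations: substitute wherever the 32- and 64-bit
# NCBI BLAST builds actually live on your cluster.
BLAST_DIRS = {
    32: "/opt/ncbi/32/bin",
    64: "/opt/ncbi/64/bin",
}

# Machine strings that usually indicate an x86-64 (AMD64/EM64T) kernel.
X86_64_NAMES = {"x86_64", "amd64"}


def word_size(machine: str) -> int:
    """Return 64 for an x86-64 machine string, otherwise 32."""
    return 64 if machine.lower() in X86_64_NAMES else 32


def blast_dir(machine: Optional[str] = None) -> str:
    """Pick the BLAST binary directory that matches this node."""
    if machine is None:
        # Reports the running kernel's architecture, not the CPU's.
        machine = platform.machine()
    return BLAST_DIRS[word_size(machine)]


if __name__ == "__main__":
    print(platform.machine(), "->", blast_dir())
```

Because platform.machine() reports the running kernel's architecture rather than the CPU's, an EM64T or Opteron box booted with a 32-bit kernel reports something like "i686" and correctly gets the 32-bit build.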

mpiblast is available in 32- and 64-bit flavors and lets you run 
across multiple nodes on your cluster.  Given the machine you describe, 
I would suggest the 32-bit BLAST binaries 
(http://download.scalableinformatics.com/downloads/ncbi) for single 
machine work, and the 32-bit mpiblast binaries for cross-cluster 
blasting (http://download.scalableinformatics.com/downloads/mpiblast).
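For the single-node vs cross-cluster split above, a hypothetical helper that writes a Torque (PBS) batch script might look like the following. The queue name, the processors-per-node default, and the mpirun flags are assumptions (the "-machinefile $PBS_NODEFILE" form is typical of MPICH-style launchers, but check your MPI); the BLAST arguments use the blastall-style -p/-d/-i/-o options that NCBI blastall and mpiblast share.

```python
from dataclasses import dataclass


@dataclass
class BlastJob:
    program: str            # e.g. "blastn"
    database: str           # database name as formatted by formatdb/mpiformatdb
    query: str              # FASTA query file
    output: str             # report file
    nodes: int = 1          # more than 1 means spread the search with mpiblast
    ppn: int = 2            # processors per node (assumes dual-CPU nodes)
    queue: str = "blast32"  # hypothetical queue name


def pbs_script(job: BlastJob) -> str:
    """Render a Torque (PBS) batch script for one BLAST job.

    A single-node job runs NCBI blastall with -a threads; a multi-node
    job runs mpiblast under mpirun across every allocated processor.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N blast_{job.program}",
        f"#PBS -q {job.queue}",
        f"#PBS -l nodes={job.nodes}:ppn={job.ppn}",
        "cd $PBS_O_WORKDIR",
    ]
    args = f"-p {job.program} -d {job.database} -i {job.query} -o {job.output}"
    if job.nodes == 1:
        # Single machine: let blastall use every CPU on the node.
        lines.append(f"blastall {args} -a {job.ppn}")
    else:
        # Cross-cluster: one MPI process per allocated processor.
        nprocs = job.nodes * job.ppn
        lines.append(
            f"mpirun -np {nprocs} -machinefile $PBS_NODEFILE mpiblast {args}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pbs_script(BlastJob("blastn", "nt", "query.fa", "out.txt", nodes=4)))
```

The rendered text would be saved to a file and handed to qsub. For the mpiblast case, the database must first be split into fragments with mpiformatdb, and the sensible process count relative to the number of fragments depends on the mpiblast version you install.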

>2>Anyone have any direct experience in having a single Beowulf cluster
>with both 64 and 32 bit compute nodes?

Yes.  Add in some HP-UX and that describes a cluster we built 10 months ago ...

>  What might be nice to have is
>several different versions of BLAST locally installed and 32 and 64 bit
>queues defined for each version.

Well, it might be nice ... but do you want your end users (unless you 
are the end user) to know about 32-bit vs 64-bit codes?  Most end users 
I have met don't care; their main concerns are performance and accuracy, 
not to mention ease of running.  You can set up multiple queues and 
dispatch the runs to them as needed, but there is (IMO) significant 
value in hiding this from most end users (who could not care less).  
The ones who really want to get at it can, but most people will want to 
sit in front of an interface that abstracts the cluster/grid for them.
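As an illustration of hiding the queue choice from end users, here is a sketch of the dispatch logic a front end might apply on their behalf. The queue names blast64 and blast32 are hypothetical, and the free-slot counts are simply passed in; how you gather them from Torque/Maui is left open.

```python
from typing import Dict, Mapping

# Hypothetical queue names: one per binary flavor, as an administrator
# might define them in Torque.
QUEUE_64 = "blast64"
QUEUE_32 = "blast32"


def choose_queue(free_slots: Mapping[str, int]) -> str:
    """Pick a queue on the user's behalf.

    Prefers the 64-bit queue (faster code on Opterons) and falls back to
    the 32-bit queue whenever the 64-bit nodes report no free slots; the
    job then simply waits in the 32-bit queue if that is busy too.
    """
    if free_slots.get(QUEUE_64, 0) > 0:
        return QUEUE_64
    return QUEUE_32


def submit_for_user(program: str, database: str, query: str,
                    free_slots: Mapping[str, int]) -> Dict[str, str]:
    """Record a request: the user names only the search, and the queue
    (and therefore 32 vs 64 bit) is decided here, out of sight."""
    return {
        "program": program,
        "database": database,
        "query": query,
        "queue": choose_queue(free_slots),
    }


if __name__ == "__main__":
    print(submit_for_user("blastn", "nt", "query.fa", {"blast64": 0, "blast32": 4}))
```

The user-facing call never mentions word size, so the queue layout can change behind the interface without retraining anyone.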

You as an administrator do care about these things, and it is not so 
hard to set up queues for them.  I would be a bit concerned about 
Torque, though, as it has some issues in high-throughput mode (or at 
least it and its predecessors have had issues in the recent past with 
large numbers of jobs sitting in the queue).
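The post does not prescribe a workaround, but one common way to keep the number of queued jobs down is to pack many query files into a bounded number of batch jobs, each of which runs its share of queries one after another. A minimal sketch, where max_jobs is a hypothetical tuning knob you would set to whatever your scheduler handles comfortably:

```python
from typing import List, Sequence


def pack_queries(query_files: Sequence[str], max_jobs: int) -> List[List[str]]:
    """Split query files into at most max_jobs round-robin groups.

    Each group would become one batch job that runs its queries
    sequentially, so the scheduler sees at most max_jobs entries
    instead of one entry per query.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    n = min(max_jobs, len(query_files))
    groups: List[List[str]] = [[] for _ in range(n)]
    for i, path in enumerate(query_files):
        groups[i % n].append(path)
    return groups


if __name__ == "__main__":
    for group in pack_queries([f"q{i}.fa" for i in range(10)], 3):
        print(group)
```

Round-robin balances the number of files per job, not their sizes; if query lengths vary a lot, grouping by total sequence length would even out run times better.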

>PS: a bit off-topic, has anyone used the AVAKI datagrid to successfully
>synchronize the bucketfuls of BLAST datafiles that always seem to
>Bioclusters maillist  -  Bioclusters at bioinformatics.org
