[Bioclusters] Parallel blast SGE

Juan Perin bioclusters@bioinformatics.org
Wed, 15 Sep 2004 11:15:22 -0400


I am running iNquiry, which uses Sun Grid Engine (SGE) for parallel job
processing.  BTBLASTALL is the wrapper program that takes a given blast job
from the command line and creates multiple 'blastall' commands to be run on
individual nodes.  BTBLASTALL decides how many nodes to use and how many
blastall threads to run based on the number of database chunks produced by
the parsing algorithm supplied by Bioteam.

My current run is searching a 1.5 GB FASTA database that was formatted into
14 chunks.  BTBLASTALL then created 14 blastall calls which, for some
reason, were each sent to a single CPU on the dual-processor Xserve G5s.
So instead of each blastall job searching its chunk with both CPUs of a
node devoted to it, which would make much fuller use of our 16 nodes
(32 processors), we are running on 14 CPUs on 7 machines.
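
To put numbers on that, here is a rough sketch of the arithmetic in Python.
The function name and the "two jobs packed per node" placement are only my
way of describing what I see on the cluster, not anything taken from
BTBLASTALL or SGE themselves.

```python
import math

def cluster_usage(nodes, cpus_per_node, jobs, jobs_per_node):
    """Usage figures for single-CPU jobs packed jobs_per_node to a machine.

    Returns (total_cpus, busy_cpus, nodes_used, utilization).
    """
    total_cpus = nodes * cpus_per_node
    # Each job occupies exactly one CPU in the placement observed.
    busy_cpus = min(jobs, total_cpus)
    # Jobs are packed onto as few machines as the packing allows.
    nodes_used = min(nodes, math.ceil(jobs / jobs_per_node))
    return total_cpus, busy_cpus, nodes_used, busy_cpus / total_cpus

# 16 dual-CPU nodes, 14 blastall jobs (one per chunk), two jobs per node
total, busy, used, util = cluster_usage(16, 2, 14, 2)
print(f"{busy} of {total} CPUs busy on {used} of 16 nodes")
```

So 18 of the 32 CPUs, and 9 of the 16 machines, are sitting idle.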

I assumed that adding the '-a' option with a value of 2 would create more
threads and force SGE to spread the jobs across more nodes, but BTBLASTALL
already does this automatically.  I then tried -a 4, which seems to have
created more threads, and the run is now on 8 machines instead of 7.
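
For what it is worth, here is how I am thinking about the -a values, as a
small Python sketch.  It assumes that -a N gives each blastall job N
threads, that a job's threads all stay on one machine, and that each job
would get a node to itself; those are my assumptions, and thread_demand is
just a name I made up, so please correct me if the model is wrong.

```python
def thread_demand(jobs, threads_per_job, nodes, cpus_per_node):
    """Compare threads requested with CPUs that could actually be kept busy.

    Assumes one job per node and that threads cannot span machines.
    Returns (total_threads, usable_cpus, total_cpus).
    """
    total_threads = jobs * threads_per_job
    total_cpus = nodes * cpus_per_node
    # A job cannot use more CPUs than its node has.
    cpus_per_job = min(threads_per_job, cpus_per_node)
    # No more jobs can run one-per-node than there are nodes.
    usable_cpus = min(jobs, nodes) * cpus_per_job
    return total_threads, usable_cpus, total_cpus

for a in (1, 2, 4):
    threads, usable, total = thread_demand(14, a, 16, 2)
    print(f"-a {a}: {threads} threads, at most {usable} of {total} CPUs busy")
```

Under those assumptions, going above -a 2 adds threads but no usable CPUs on
dual-processor nodes, and 14 chunks can occupy at most 28 of the 32 CPUs.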

My question: does anyone know whether it would be advantageous to pass a
higher number to the -a option, or whether there is some way to configure
SGE to run on more machines in this case?  This is the only job running on
the cluster right now, so I would like to run it 'all out'.  The search
runs the 1.5 GB database against itself to look for duplications.

THANKS

Juan Perin