[Bioclusters] blastall and SGE

Juan Carlos Perin bioclusters@bioinformatics.org
Wed, 29 Sep 2004 15:22:02 -0400

Sun Grid Engine doesn't seem to use all the idle resources that it
should.  When I run btblastall on the command line for a search against
NT (which has been partitioned into 15 segments), only three machines
actually get blastall jobs queued.  During the run I also never see
processor usage go above 24%.  I would expect more of the idle nodes
to receive blastall jobs, each with one of the 15 segments of the
database.  This is very disappointing, considering that a single G5 can
search the NT database in under 3 minutes, while the run across
multiple nodes takes well over ten minutes.

I would also like to tweak or configure SGE to use idle resources more
efficiently, but I don't know how.  On a related note, even a regular
blastall job run from the command line on a single machine seems to be
restricted to a certain amount of CPU (usually no more than 60%).  I'm
wondering if there is a way to allow greater CPU usage overall.

The only work-around that really seems to work is running btblastall on
the command line with a database that has been forced into many more
segments: 30 or 32 (one for every processor) rather than 15 (one for
every node).  On the command line, this seems to distribute jobs a
little more efficiently, and it uses more CPU power than any other run.
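
As a rough illustration of the segment arithmetic above, here is a small
Python sketch.  It assumes 15 nodes with 2 processors each (which matches
the 30-segment case), and the function names are made up for the example.
The round-robin placement is only a stand-in to count idle processors; it
is not how SGE actually schedules jobs.

```python
# Sketch only: counts how many processor slots sit idle for a given
# number of database segments. Assumes one blastall job per segment and
# one job per processor slot; node and CPU counts are illustrative.

def segments_per_processor(nodes, cpus_per_node):
    """Number of segments needed so every processor gets one."""
    return nodes * cpus_per_node

def assign_segments(num_segments, nodes, cpus_per_node):
    """Place segments round-robin over (node, cpu) slots."""
    slots = [(n, c) for n in range(nodes) for c in range(cpus_per_node)]
    plan = {slot: [] for slot in slots}
    for seg in range(num_segments):
        plan[slots[seg % len(slots)]].append(seg)
    return plan

def idle_slots(plan):
    """Count processor slots that received no segment at all."""
    return sum(1 for segs in plan.values() if not segs)

if __name__ == "__main__":
    nodes, cpus = 15, 2  # assumed cluster shape
    for nseg in (15, segments_per_processor(nodes, cpus)):
        plan = assign_segments(nseg, nodes, cpus)
        print(nseg, "segments ->", idle_slots(plan), "idle processors")
```

Under these assumptions, 15 segments can occupy at most half of the 30
processors, while 30 segments leave none idle.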

Any thoughts would be VERY helpful.