[Bioclusters] blastall and SGE
Juan Carlos Perin
bioclusters@bioinformatics.org
Wed, 29 Sep 2004 15:22:02 -0400
Sun Grid Engine doesn't seem to use all of the idle resources that it
should. When I run btblastall on the command line for a search against
NT (which has been partitioned into 15 segments), only three machines
actually get blastall jobs queued. During this process I also never
see processor usage go above 24%. I would hope, or expect, that more
of the idle nodes would each receive a blastall job with one of
the 15 segments of the DB. This is very disappointing, considering that
a single G5 can search the NT database in under 3 minutes, while
running on multiple nodes takes well over ten minutes.
I would also like to tweak or configure SGE to use idle resources more
efficiently, but I don't know how. On the same note, it seems that
even a regular blastall job run from the command line on a single
machine is somehow capped in CPU usage (usually no more than 60%).
I'm wondering if there is a way to allow greater CPU usage overall.
The only workaround that really seems to help is running btblastall on
the command line against a database that has been split into more
segments: 30 or 32 (one for every processor) rather than 15 (one for
every node). On the command line, this seems to distribute jobs a
little more efficiently and to use more CPU power than any other run.
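
The arithmetic behind the segment count can be sketched roughly as follows. This is only an illustration under stated assumptions: every segment takes the same time to search, each segment occupies exactly one processor, and the cluster has 30 processors in total (15 nodes with two processors each, which the post implies but does not state outright). The function name max_slot_utilization is made up for this example and is not part of SGE or btblastall.

```python
import math


def max_slot_utilization(segments: int, slots: int) -> float:
    """Upper bound on the fraction of processor slots kept busy.

    Assumes (hypothetically) that every database segment takes the same
    time to search and that one segment occupies exactly one slot.
    """
    if segments <= 0 or slots <= 0:
        raise ValueError("segments and slots must be positive")
    # Segments run in "waves" of at most `slots` concurrent jobs.
    waves = math.ceil(segments / slots)
    # Busy slot-time divided by total slot-time over all waves.
    return segments / (waves * slots)


if __name__ == "__main__":
    # 30 slots is an assumption: 15 nodes with two processors each.
    for segments in (15, 30, 32):
        bound = max_slot_utilization(segments, 30)
        print(f"{segments} segments on 30 slots: at most {bound:.0%} busy")
```

Under these assumptions, 15 segments can never keep more than half of 30 processors busy, no matter how the scheduler places the jobs, which is consistent with the finer split helping. It does not explain why only three machines receive jobs; that would have to come from the queue or scheduler configuration.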
Any thoughts would be VERY helpful.
Thanks,
Juan