[Bioclusters] mpiblast w/ SGE problem?

Joe Landman landman at scalableinformatics.com
Fri Oct 19 11:52:12 EDT 2007


Hi Iddo:

   I have been busy and saw these go by, but had not responded yet.  Here 
is a short response, though we can do a longer one later on.

   First, we have a simple script on our download site that drives 
mpiBLAST through SGE, if you would like to use it.  It handles pretty 
much everything for you.

   On to your issues.

Iddo Friedberg wrote:

> #!/bin/bash
> 
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> 
> export MPI_DIR=/opt/mpich/gnu/
> # export BLASTDB=/share/bio/ncbi/db/
> #export BLASTDB=/home/thumper1/users/idoerg/databases/STRING
> #export BLASTMAT=/opt/Bio/ncbi/data
> export MPIBLAST_CONFIG=/home/thumper1/users/idoerg/mpiblast.conf
> export THOME=/home/thumper1/users/idoerg
> export P4_GLOBMEMSIZE=256000000
> export TMP=/tmp
> 
> $MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
>         /opt/Bio/mpiblast/bin/mpiblast -d protein.sequences.v7.0.fa \
>         -i $THOME/databases/STRING/top_c60_test_100.tfa -m 7 -p blastp \
>         -o $THOME/databases/STRING/top_c60_test_100.blast.xml

We usually put something like

	echo "Machines file looks like this: "
	cat $TMPDIR/machines

in the job script, so the job output shows which machines it was given.  
Also, if this is Rocks, and that is the Bioroll, I am not sure how they 
built mpiBLAST.  Our RPMs have been up and in use for years without 
serious incident (apart from some oddities when rebuilding on particular 
gcc/glibc combos).
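
A slightly longer version of that debugging output could look like the sketch below. It assumes the PE start script writes one hostname per line to $TMPDIR/machines (a host listed N times has N slots), and that JOB_ID, TMPDIR and NSLOTS are set by SGE inside the job. The function name count_slots_per_host is made up for illustration.

```shell
#!/bin/bash
# Sketch only: count how many slots each host gets in a machines file.
# Assumes one hostname per line (a host listed N times has N slots).
count_slots_per_host() {
    local machines_file="$1"
    if [ ! -r "$machines_file" ]; then
        echo "cannot read machines file: $machines_file" >&2
        return 1
    fi
    # Tally the first field of every non-empty line, then sort by host name.
    awk 'NF { count[$1]++ } END { for (h in count) print h, count[h] }' \
        "$machines_file" | sort
}

# Only print the report inside an SGE job, where JOB_ID and TMPDIR are set.
if [ -n "${JOB_ID:-}" ] && [ -n "${TMPDIR:-}" ]; then
    echo "NSLOTS=${NSLOTS:-unset}"
    echo "Machines file looks like this:"
    cat "$TMPDIR/machines"
    echo "Slots per host:"
    count_slots_per_host "$TMPDIR/machines"
fi
```

If every slot shows up under a single host name here, the job really was handed one node, and the place to look is the PE configuration rather than mpiBLAST.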

[...]

> Yes.
> 
> qsub -pe mpich 12 mpiblast_sge.sh
> 
> qstat actually shows that 12 slots are allocated. But it only runs on one
> node!

For this job, what does

	qstat -f -r -ne

show?

>> The other thing to look out for is the format of the machines file
>> that SGE creates -- it may or may not include the fully qualified
>> domain name and it may or may not be in the exact format that your
>> particular MPI installation expects. You can control the format of
>> the machines file by just looking at the source code for the script
>> that is being run as the pe_starter method within the configuration
>> of your parallel environment.

This is what I am after with the qstat command above: it will tell us 
which machines have been allocated to the run.
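
The allocation can also be read from inside the job. The sketch below assumes the usual SGE hostfile layout, where each line of the file named by $PE_HOSTFILE starts with a hostname followed by the slot count, and expands it into one line per slot, which is the format an MPICH machines file commonly uses. The function name pe_hostfile_to_machines is hypothetical, and this is not necessarily what the start script of your PE does.

```shell
#!/bin/bash
# Sketch only: expand an SGE PE hostfile into an MPICH-style machines file.
# Assumes each hostfile line starts with "<hostname> <slots>".
pe_hostfile_to_machines() {
    local hostfile="$1" host slots rest i
    while read -r host slots rest || [ -n "$host" ]; do
        [ -n "$host" ] || continue          # skip blank lines
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"                    # one line per allocated slot
            i=$((i + 1))
        done
    done < "$hostfile"
}

# Inside a parallel SGE job, PE_HOSTFILE points at the allocation.
if [ -n "${PE_HOSTFILE:-}" ]; then
    echo "Hosts allocated by SGE:"
    pe_hostfile_to_machines "$PE_HOSTFILE"
fi
```

Comparing this output with the contents of $TMPDIR/machines shows whether the names were changed (for example, short names versus fully qualified names) on the way to mpirun.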

Rocks has had a problem in the past with the naming of the head node. 
Unfortunately, it gives the head node the same name as its external 
interface, and this often causes name resolution problems.  It also 
makes changing the system difficult; the Rocks team usually suggests 
reloading rather than fixing.

SGE is very sensitive to name resolution issues, so we often go back 
and fix the naming by hand to prevent them.
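
A quick way to look for this kind of problem is to check that every name in the machines file resolves on the node where the job starts. This is only a sketch: it assumes getent is available on the nodes and that the machines file has a hostname in the first field of each line. resolve_host and check_machines_resolve are names invented for the example.

```shell
#!/bin/bash
# Sketch only: report hosts from a machines file that do not resolve.
# resolve_host is a hypothetical helper; getent consults /etc/hosts, DNS, etc.
resolve_host() {
    getent hosts "$1" > /dev/null 2>&1
}

check_machines_resolve() {
    local machines_file="$1" host failed=0
    # Check each distinct host name (first field of each line) once.
    for host in $(awk 'NF { print $1 }' "$machines_file" | sort -u); do
        if resolve_host "$host"; then
            echo "ok: $host"
        else
            echo "UNRESOLVED: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Only run inside an SGE job, where JOB_ID and TMPDIR are set.
if [ -n "${JOB_ID:-}" ] && [ -n "${TMPDIR:-}" ]; then
    echo "This node calls itself: $(hostname)"
    check_machines_resolve "$TMPDIR/machines"
fi
```

A name that resolves can still point at the wrong interface, so this catches only the simplest failures.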

> i checked that, and tried ssh-ing to those nodes as they appeared  in the
> machines file.. and the passwordless ssh worked.

Seeing the machines file would help.  Also, what is the allocation rule 
for that pe?

	qconf -sp mpich
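
The line of interest in that output is allocation_rule. As a rough sketch, assuming the usual "key value" layout of qconf -sp output, the rule can be pulled out and explained as below. The two function names are made up for the example; the rule descriptions follow the sge_pe(5) man page. Note that a rule of $pe_slots puts every slot of the job on a single host.

```shell
#!/bin/bash
# Sketch only: pull the allocation_rule value out of "qconf -sp" output
# read from standard input.
get_allocation_rule() {
    awk '$1 == "allocation_rule" { print $2 }'
}

# Describe what a rule means for slot placement (per sge_pe(5)).
describe_allocation_rule() {
    case "$1" in
        '$pe_slots')    echo "all slots on a single host" ;;
        '$fill_up')     echo "fill one host, then move to the next" ;;
        '$round_robin') echo "spread slots across hosts one at a time" ;;
        *[!0-9]*|'')    echo "unrecognized rule: $1" ;;
        *)              echo "exactly $1 slots per host" ;;
    esac
}

# Only query SGE when qconf is actually installed on this machine.
if command -v qconf > /dev/null 2>&1; then
    rule=$(qconf -sp mpich | get_allocation_rule)
    echo "allocation_rule=$rule: $(describe_allocation_rule "$rule")"
fi
```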

Joe

> 
> ??
> 
> thanks,
> 
> Iddo
> 
>> -Chris
>>
>>
>>
>> On Oct 18, 2007, at 4:08 PM, Iddo Friedberg wrote:
>>
>>> hi,
>>>
>>> I am trying to run mpiblast on a ROCKS cluster using SGE. mpiblast
>>> seems to be running well, but all slots are being run on a single
>>> node for some reason! Can anyone help? Full disclosure: newbie to
>>> mpi, mpiblast, and SGE.
>>>
>>> Here is the command line I use:
>>>
>>> % qsub -pe mpich 10 mpiblast_sge.sh
>>>
>>> And here is the  shell script mpiblast_sge.sh
>>> ----------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> #$ -cwd
>>> #$ -j y
>>> #$ -S /bin/bash
>>>
>>> export MPI_DIR=/opt/mpich/gnu/
>>> # export BLASTDB=/share/bio/ncbi/db/
>>> export BLASTDB=/home/thumper1/users/idoerg/databases/STRING
>>> export BLASTMAT=/opt/Bio/ncbi/data
>>> export THOME=/home/thumper1/users/idoerg
>>> export P4_GLOBMEMSIZE=256000000
>>>
>>> $MPI_DIR/bin/mpirun -np $NSLOTS \
>>>         -machinefile /home/thumper1/users/idoerg/tmp/machines \
>>>         /opt/Bio/mpiblast/bin/mpiblast -d protein.sequences.v7.0.fa \
>>>         -i $THOME/databases/STRING/top_c60_test_1000.tfa -m 7 -p blastp \
>>>         -o $THOME/databases/STRING/top_c60_test_1000.blast.xml
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Iddo
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> I. Friedberg
>>>
>>> "The only problem with troubleshooting is that
>>> sometimes trouble shoots back."
>>> _______________________________________________
>>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
> 
> 
> 


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

