Hi Iddo:

I have been busy and saw these go by without responding. Here is a short response; we can do a longer one later on.

First, we have a simple script that drives mpiBLAST through SGE, if you would like to use it. It is available from our download site and handles pretty much everything for you.

On to your issues.

Iddo Friedberg wrote:

> #!/bin/bash
>
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
>
> export MPI_DIR=/opt/mpich/gnu/
> # export BLASTDB=/share/bio/ncbi/db/
> #export BLASTDB=/home/thumper1/users/idoerg/databases/STRING
> #export BLASTMAT=/opt/Bio/ncbi/data
> export MPIBLAST_CONFIG=/home/thumper1/users/idoerg/mpiblast.conf
> export THOME=/home/thumper1/users/idoerg
> export P4_GLOBMEMSIZE=256000000
> export $TMP=/tmp
>
> $MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines
> /opt/Bio/mpiblast/bin/mpiblast -d protein.sequences.v7.0.fa \
> -i $THOME/databases/STRING/top_c60_test_100.tfa -m 7 -p blastp -o
> $THOME/databases/STRING/top_c60_test_100.blast.xml

One of the things we usually do is put

    echo "Machines file looks like this: "
    cat $TMPDIR/machines

in the job script, so the job output shows which hosts it was given.

Also, if this is Rocks, and that is the Bioroll, I am not sure how they built mpiBLAST. Our RPMs have been up and in use for years without serious incident (apart from some oddities when rebuilding on particular gcc/glibc combinations).

[...]

> Yes.
>
> qsub -pe mpich 12 mpiblast_sge.sh
>
> qstat actually shows that 12 slots are allocated. But it only runs on one
> node!

For this job, what does

    qstat -f -r -ne

show?

>> The other thing to look out for is the format of the machines file
>> that SGE creates -- it may or may not include the fully qualified
>> domain name and it may or may not be in the exact format that your
>> particular MPI installation expects. You can control the format of
>> the machines file by just looking at the source code for the script
>> that is being run as the pe_starter method within the configuration
>> of your parallel environment.
This is what I am after with the commands above: they will tell us which machines have been allocated to the run.

Rocks has had a problem in the past with the naming of the head node. Unfortunately, it sets the name of the head node to be identical to that of the external interface, and this often causes some rather interesting name resolution issues. It also makes changing the system difficult; the Rocks team usually suggests reloading rather than fixing. SGE is very sensitive to name resolution issues, so we often go back and fix the naming by hand to prevent these problems from arising.

> i checked that, and tried ssh-ing to those nodes as they appeared in the
> machines file.. and the passwordless ssh worked.

Seeing the machines file would help. Also, what is the allocation rule for that PE?

    qconf -sp mpich

If the allocation rule is $pe_slots, SGE places all of a job's slots on a single host, which would match what you are seeing; $fill_up or $round_robin let the job span hosts.

Joe

>
> ??
>
> thanks,
>
> Iddo
>
>
>> -Chris
>>
>>
>> On Oct 18, 2007, at 4:08 PM, Iddo Friedberg wrote:
>>
>>> hi,
>>>
>>> I am trying to run mpiblast on a ROCKS cluster using SGE. mpiblast
>>> seems to be running well, but all slots are being run on a single
>>> node for some reason! Can anyone help? Full disclosure: newbie to
>>> mpi, mpiblast, and SGE.
>>>
>>> Here is the command line I use:
>>>
>>> % qsub -pe mpich 10 mpiblast_sge.sh
>>>
>>> And here is the shell script mpiblast_sge.sh
>>> ----------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> #$ -cwd
>>> #$ -j y
>>> #$ -S /bin/bash
>>>
>>> export MPI_DIR=/opt/mpich/gnu/
>>> # export BLASTDB=/share/bio/ncbi/db/
>>> export BLASTDB=/home/thumper1/users/idoerg/databases/STRING
>>> export BLASTMAT=/opt/Bio/ncbi/data
>>> export THOME=/home/thumper1/users/idoerg
>>> export P4_GLOBMEMSIZE=256000000
>>>
>>> $MPI_DIR/bin/mpirun -np $NSLOTS -machinefile
>>> /home/thumper1/users/idoerg/tmp/machines /opt/Bio/mpiblast/bin/
>>> mpiblast -d
>>> protein.sequences.v7.0.fa \
>>> -i $THOME/databases/STRING/top_c60_test_1000.tfa -m 7 -p
>>> blastp -o
>>> $THOME/databases/STRING/top_c60_test_1000.blast.xml
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Thanks,
>>>
>>> Iddo
>>>
>>> --
>>>
>>> I. Friedberg
>>>
>>> "The only problem with troubleshooting is that
>>> sometimes trouble shoots back."
>>> _______________________________________________
>>> Bioclusters maillist - Bioclusters at bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
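
P.S. Here is a minimal sketch of the machines-file check mentioned above, written as a fragment to place near the top of the SGE job script, before the mpirun line. The summarize_machines helper is a hypothetical name, not part of SGE, MPICH, or mpiBLAST. It assumes the machines file lists one hostname per line, repeated once per slot; if your PE start script writes a different format (for example host:N), the counting would need adjusting.

```shell
#!/bin/bash
# Hypothetical helper (not from SGE or mpiBLAST): print "host count" for each
# distinct host in a machines file. Assumes one hostname per line, repeated
# once per slot.
summarize_machines() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "machines file not readable: $file" >&2
        return 1
    fi
    # Count occurrences of each hostname, skipping blank lines; sort for stable output.
    awk 'NF { slots[$1]++ } END { for (h in slots) print h, slots[h] }' "$file" | sort
}

# $TMPDIR/machines is the path the job script above passes to mpirun.
# The guard keeps this fragment harmless when run outside an SGE job.
if [ -n "${TMPDIR:-}" ] && [ -r "$TMPDIR/machines" ]; then
    echo "Machines file looks like this:"
    cat "$TMPDIR/machines"
    echo "Slots per host (NSLOTS=${NSLOTS:-unset}):"
    summarize_machines "$TMPDIR/machines"
fi
```

If every slot shows up under a single hostname, the problem is most likely in how the parallel environment allocates slots rather than in mpirun or mpiBLAST.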