[Bioclusters] qsub/mpirun problems

Zhiliang Hu zhu at iastate.edu
Tue Sep 9 14:18:28 EDT 2008


I dug further into my qsub/mpirun problems and have now run into an interesting situation:

(1)
I used to have the following qsub/mpirun setup, which worked for half a year (I reported its initial success on this list last December):
--------------------------------------
qsub -l nodes=6:ppn=2 \
     -e /path/to/locationA \
     -o /path/to/locationA \
     /path/to/program

  where "program" is:

  /path/to/bin/mpirun \
    /path/to/mpiblast \
      -p blastn \
      -d seq.db \
      -i /path/to/input.seq \
      -o /path/to/output.txt
--------------------------------------
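The mpiBLAST message presumably means that mpirun started fewer than 3 processes inside the job. A minimal diagnostic sketch, assuming Torque's standard behaviour of listing one line per allocated processor slot in $PBS_NODEFILE; the helper names count_slots and count_hosts are hypothetical. Placed near the top of the "program" script, it logs to the qsub -e file what Torque actually allocated:

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helpers) for a Torque job script.
# Torque lists one line per allocated processor slot in $PBS_NODEFILE,
# so nodes=6:ppn=2 should yield 12 lines naming 6 distinct hosts.

# Number of processor slots (lines) in a node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Number of distinct host names in a node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only report when running inside a Torque job; output goes to stderr,
# i.e. the file given with qsub -e.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "PBS_NODEFILE=$PBS_NODEFILE" >&2
    echo "slots: $(count_slots "$PBS_NODEFILE")" >&2
    echo "hosts: $(count_hosts "$PBS_NODEFILE")" >&2
fi
```

If these report 12 slots on 6 hosts and mpiBLAST still complains, that would point at how mpirun picks up the allocation rather than at Torque itself.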
After we fixed some hardware issues (I can't see anything relevant there; I mention it only because that is when the problem started), it now complains in Torque's "..ER" file: "Sorry, mpiBLAST must be run on 3 or more nodes". (It also shows up in the node's /undelivered/ errors.)

(2) 
If I modify the "program" as follows and run it from the command line, it works fine:
----------------------------------------------
  /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines \
    /path/to/mpiblast \
      -p blastn \
      -d seq.db \
      -i /path/to/input.seq \
      -o /path/to/output.txt
----------------------------------------------
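The working command in (2) differs from (1) mainly in passing -np and -machinefile explicitly. A sketch, assuming this mpirun accepts those two options (as the command above shows), of a job script that takes both from Torque's allocation instead of the static /path/to/mpimachines, so the process count always matches what qsub requested. The #PBS lines are equivalent to the qsub options used in (1); slot_count is a hypothetical helper:

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -e /path/to/locationA
#PBS -o /path/to/locationA
# Sketch of the "program" job script: size mpirun from Torque's own
# allocation ($PBS_NODEFILE) instead of /path/to/mpimachines.

# Number of processor slots Torque granted (one line per slot).
slot_count() {
    wc -l < "$1" | tr -d ' '
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    np=$(slot_count "$PBS_NODEFILE")
    # The error in (1) says mpiBLAST will not start on fewer than 3.
    if [ "$np" -lt 3 ]; then
        echo "only $np slots allocated; mpiBLAST needs 3 or more" >&2
        exit 1
    fi
    exec /path/to/bin/mpirun -np "$np" -machinefile "$PBS_NODEFILE" \
        /path/to/mpiblast -p blastn -d seq.db \
        -i /path/to/input.seq -o /path/to/output.txt
fi
```

Unlike a static machinefile, $PBS_NODEFILE names only the nodes Torque actually assigned to this job, and it changes from job to job.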

(3)
I do not think this is right, but as a trial I ran it as follows:
--------------------------------------
qsub -l nodes=6:ppn=2 \
     -e /path/to/locationA \
     -o /path/to/locationA \
     /path/to/program

  where "program" is:

  /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines \
    /path/to/mpiblast \
      -p blastn \
      -d seq.db \
      -i /path/to/input.seq \
      -o /path/to/output.txt
--------------------------------------
It fails with the error: "pls:tm: failed to poll for a spawned proc, return status = 17002".
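The "pls:tm" prefix suggests the mpirun here is Open MPI launching through Torque's TM interface, which can only start processes on nodes allocated to the job. Under that assumption, one thing worth checking is whether /path/to/mpimachines names hosts outside the allocation. A minimal sketch with a hypothetical helper, hosts_outside_allocation:

```shell
#!/bin/sh
# Hypothetical check: print hosts named in a static machinefile that
# are not part of the current Torque allocation.

# Print hosts in machinefile $1 that do not appear in node file $2.
hosts_outside_allocation() {
    mf=$(mktemp) && nf=$(mktemp) || return 1
    # First field only, so "host slots=N" style lines also work;
    # blank lines and '#' comments are skipped.
    awk 'NF && $1 !~ /^#/ {print $1}' "$1" | sort -u > "$mf"
    awk 'NF && $1 !~ /^#/ {print $1}' "$2" | sort -u > "$nf"
    # comm -23 keeps lines found only in the first sorted file.
    comm -23 "$mf" "$nf"
    rm -f "$mf" "$nf"
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    hosts_outside_allocation /path/to/mpimachines "$PBS_NODEFILE" >&2
fi
```

Any host printed by this check is one the job was not given, and therefore a candidate cause of the spawn failure.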

I am hoping that some improvement to (1) will make it work again, but this is beyond my knowledge, so I am seeking help here.

Thank you in advance,

Zhiliang



