[Bioclusters] weird MPI problem

Jeremy Mann bioclusters@bioinformatics.org
Fri, 23 May 2003 09:28:09 -0500 (CDT)


> Sometimes mpi can leave behind shared memory segments or semaphores.
> ipcs and ipcrm will show the status and remove ipc objects, respectively.
> There are various ipc scripts out there that help you clean up lots of
> objects.
>
> Some I haven't tried are at http://herdtools.sourceforge.net
>
> cl_ipcs -- check on the Unix IPC information on remote machines
> cl_ipcrm -- remove Unix IPC segments from remote machines
>
> Hope this helps,
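
For reference, the cleanup described in the quoted reply can be sketched in shell. This is only a sketch: it assumes the Linux (util-linux) `ipcs` output layout, where the second column is the numeric id and the third is the owner, and the helper names `owned_ipc_ids` and `print_ipc_cleanup` are made up for illustration. It is a dry run that prints `ipcrm` commands instead of executing them.

```shell
#!/bin/sh
# Assumes Linux (util-linux) `ipcs` output, where column 2 is the numeric
# id and column 3 is the owner for both `ipcs -m` and `ipcs -s`.

# owned_ipc_ids USER: read ipcs output on stdin, print the ids owned by USER.
# Header and separator lines are skipped because their second field is
# not numeric.
owned_ipc_ids() {
    awk -v user="$1" '$3 == user && $2 ~ /^[0-9]+$/ { print $2 }'
}

# print_ipc_cleanup USER: dry run that prints ipcrm commands for USER's
# shared memory segments and semaphore sets without removing anything.
print_ipc_cleanup() {
    ipcs -m | owned_ipc_ids "$1" | sed 's/^/ipcrm -m /'
    ipcs -s | owned_ipc_ids "$1" | sed 's/^/ipcrm -s /'
}
```

Running `print_ipc_cleanup "$USER"` on a node would list the commands to review before running any of them by hand.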

Leftover IPC objects don't seem to be the cause here, since the MPI examples
work; the problem only occurs with mpiblast. Now I am getting another,
bizarre error. Usually I don't pass the -f flag to mpiblast, since it has
always found its configuration file in /usr/local. If I leave the flag off
and turn debugging mode on, I get this:

jeremy@bioinf:~/dnaseqs$ mpirun -v -np 4 /usr/local/bin/mpiblast -p blastp -d nr -i nr-protein.fasta -o out3
running /usr/local/bin/mpiblast on 4 LINUX ch_p4 processors
Created /home/user/jeremy/dnaseqs/PI27696

0       0.027609        Bailing out with signal 11
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_27830:  p4_error: : 0
31      20.0474         93      Bailing out with00.0 signal .2
4704799977      Bailing out with 5      signal 2
Bailing out with signal 2
[3] MPI Abort by user Aborting program !
p3_30777:  p4_error: : 0
[3] Aborting program!
[2] MPI Abort by user Aborting program !
p2_21060:  p4_error: : 0
[2] Aborting program!
[1] MPI Abort by user Aborting program !
p1_6241:  p4_error: : 0
[1] Aborting program!
/usr/local/mpich/bin/mpirun: line 1: 27830 Broken pipe             /usr/local/bin/mpiblast "-p" "blastp" "-d" "nr" "-i" "nr-protein.fasta" "-o" "out3" -p4pg /home/user/jeremy/dnaseqs/PI27696 -p4wd /home/user/jeremy/dnaseqs

Now if I pass the -f flag to mpiblast (to tell it where the configuration
file is) I get this:

jeremy@bioinf:~/dnaseqs$ mpirun -v -np 4 /usr/local/bin/mpiblast -f /usr/local/mpiblast.conf -p blastp -d nr -i nr-protein.fasta -o out3
running /usr/local/bin/mpiblast on 4 LINUX ch_p4 processors
Created /home/user/jeremy/dnaseqs/PI27849

[blastall] ERROR: Threshold for extending hits, default if zero
      blastp 11, blastn 0, blastx 12, tblastn 13
      tblastx 13, megablast 0 [/usr/local/mpiblast.conf] is bad or out of range [? to ?]
0       0.036638        Bailing out with signal 11
[0] MPI Abort by user Aborting program !
[0] Aborting program!

My mpiblast.conf file:

/usr/local/shared
/ncbi/mpiblast
/usr/local/blast
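
A quick sanity check on a file like this can be sketched in shell. It assumes that every non-empty line is meant to be a directory path that must exist on the machine where the check runs; that is only a reading of the file above, not something confirmed against the mpiblast documentation. `check_conf_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# check_conf_dirs FILE
# Hypothetical helper: treats each non-empty line of FILE as a directory
# path (an assumption about the config format) and reports any line that
# is not an existing directory.
# Returns 0 if every line is a directory, 1 if any is missing,
# 2 if FILE itself cannot be read.
check_conf_dirs() {
    conf=$1
    status=0
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while IFS= read -r dir || [ -n "$dir" ]; do
        # Skip blank lines.
        [ -z "$dir" ] && continue
        if [ -d "$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            status=1
        fi
    done < "$conf"
    return $status
}
```

For example, `check_conf_dirs /usr/local/mpiblast.conf` could be run on each compute node as well as the head node, to see whether the listed paths are visible everywhere.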


Now if I run the PI test that came with mpich, it runs perfectly:

jeremy@bioinf:~$ mpirun -np 42 cpi
Process 0 of 42 on node1.hydrodyn.beowulf
pi is approximately 3.1415926535897469, Error is 0.0000000000000462
wall clock time = 4.257846
Process 21 of 42 on node1.hydrodyn.beowulf
Process 1 of 42 on node2.hydrodyn.beowulf
Process 3 of 42 on node4.hydrodyn.beowulf
Process 35 of 42 on node15.hydrodyn.beowulf
Process 2 of 42 on node3.hydrodyn.beowulf
Process 8 of 42 on node9.hydrodyn.beowulf
Process 29 of 42 on node9.hydrodyn.beowulf
Process 13 of 42 on node14.hydrodyn.beowulf
Process 39 of 42 on node19.hydrodyn.beowulf
Process 14 of 42 on node15.hydrodyn.beowulf
Process 10 of 42 on node11.hydrodyn.beowulf
Process 12 of 42 on node13.hydrodyn.beowulf
Process 33 of 42 on node13.hydrodyn.beowulf
Process 17 of 42 on node18.hydrodyn.beowulf
Process 9 of 42 on node10.hydrodyn.beowulf
Process 27 of 42 on node7.hydrodyn.beowulf
Process 34 of 42 on node14.hydrodyn.beowulf
Process 24 of 42 on node4.hydrodyn.beowulf
Process 32 of 42 on node12.hydrodyn.beowulf
Process 26 of 42 on node6.hydrodyn.beowulf
Process 31 of 42 on node11.hydrodyn.beowulf
Process 18 of 42 on node19.hydrodyn.beowulf
Process 15 of 42 on node16.hydrodyn.beowulf
Process 38 of 42 on node18.hydrodyn.beowulf
Process 28 of 42 on node8.hydrodyn.beowulf
Process 20 of 42 on node21.hydrodyn.beowulf
Process 19 of 42 on node20.hydrodyn.beowulf
Process 41 of 42 on node21.hydrodyn.beowulf
Process 22 of 42 on node2.hydrodyn.beowulf
Process 23 of 42 on node3.hydrodyn.beowulf
Process 4 of 42 on node5.hydrodyn.beowulf
Process 6 of 42 on node7.hydrodyn.beowulf
Process 36 of 42 on node16.hydrodyn.beowulf
Process 40 of 42 on node20.hydrodyn.beowulf
Process 25 of 42 on node5.hydrodyn.beowulf
Process 5 of 42 on node6.hydrodyn.beowulf
Process 11 of 42 on node12.hydrodyn.beowulf
Process 30 of 42 on node10.hydrodyn.beowulf
Process 7 of 42 on node8.hydrodyn.beowulf
Process 16 of 42 on node17.hydrodyn.beowulf
Process 37 of 42 on node17.hydrodyn.beowulf


Any other suggestions?


-- 
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672