[Bioclusters] Re: new on using clusters: problem running mpiblast (2)
Zhiliang Hu
hu at animalgenome.org
Fri Sep 21 11:06:35 EDT 2007
Thanks Joe!
I tested with 'which' and 'whereis'; both find 'orted' on my system.
So I tried again with the full path to "mpirun" (I am sorry, I should have
done this earlier):
> /opt/openmpi.gcc/bin/mpirun -np 3 -machinefile machines
/home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa
which produced error:
----------------------
1 0.0799131 Bailing out with signal 11
[node002:19427] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0 0.0862 Bailing out with signal 15
[node001:24948] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2 0.0861399 Bailing out with signal 15
[node003:15941] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------
I have another openMPI installation, so I also tried:
> /opt/openmpi121.gcc/bin/mpirun -np 3 -machinefile machines
/home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa
which gives different errors:
----------------------
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open ras tm: file not found (ignored)
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open pls tm: file not found (ignored)
1 0.0736248 Bailing out with signal 11
[node002:19464] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0 0.0795131 Bailing out with signal 15
[node001:24985] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2 0.0794392 Bailing out with signal 15
[node003:15979] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------
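The signal numbers reported in both runs can be decoded from the shell. This is a minimal sketch, assuming Linux signal numbering on the nodes. Reading the output as "rank 1 crashed first and the other ranks were then terminated" is an inference, not something the logs state outright.

```shell
# Print the names of the signals reported by mpiblast.
kill -l 11   # SEGV: segmentation fault (reported by rank 1)
kill -l 15   # TERM: termination request (reported by ranks 0 and 2)
```

If that reading is right, the segmentation fault on rank 1 is the failure to look into, since the earlier "orted: command not found" message no longer appears once mpirun is called by its full path.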
By the way, from the head node, 'ssh node001 which orted' does not
find it, but 'ssh node001 whereis orted' does (for both MPI
installations). Also, after I log in with 'ssh node001', both 'which' and
'whereis' find it for the two MPI installations.
I do have '/opt/openmpi121.gcc/bin' and '/opt/openmpi.gcc/bin' on
my path (I am using bash; trying 'tcsh' produced more errors).
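The difference between 'ssh node001 which orted' and running 'which' after logging in suggests that PATH is set in a startup file that a non-interactive shell does not read. The sketch below repeats that check for every host in a machinefile. The function name check_nodes and the RSH variable are made up for illustration and are not part of Open MPI; it assumes one host per line and passwordless ssh.

```shell
#!/bin/sh
# Sketch: report where each node resolves a command in a NON-interactive
# shell, the same kind of shell a remote launch over ssh gets.
# RSH and check_nodes are illustrative names, not part of Open MPI.
RSH=${RSH:-ssh}

check_nodes() {
    # $1 = machinefile (one host per line), $2 = command to look for
    machinefile=$1
    cmd=$2
    while read -r host rest; do
        # Skip blank lines and comment lines.
        case $host in ''|'#'*) continue ;; esac
        # </dev/null keeps ssh from consuming the rest of the machinefile.
        if path=$($RSH "$host" "command -v $cmd" </dev/null 2>/dev/null) \
            && [ -n "$path" ]; then
            echo "$host: $path"
        else
            echo "$host: $cmd NOT FOUND in non-interactive PATH"
        fi
    done < "$machinefile"
}

# Example: check_nodes machines orted
```

A node that reports NOT FOUND here but finds the command after an interactive login probably sets PATH in a file that only login or interactive shells source. Open MPI's mpirun also accepts a --prefix option naming the installation directory to use on the remote nodes, which may avoid relying on the remote PATH at all.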
I hope this provides more useful clues to dig further.
Zhiliang
On Thu, 20 Sep 2007, Joe Landman wrote:
> Date: Thu, 20 Sep 2007 16:13:50 -0400
> From: Joe Landman <landman at scalableinformatics.com>
> To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
> Subject: Re: [Bioclusters] Re: new on using clusters: problem running mpiblast
> (2)
>
> Zhiliang Hu wrote:
>
>> ---------------------------------------
>> bash: orted: command not found
>> bash: orted: command not found
>
>
> Ah-hah!
>
> Could you do a
>
> which orted
>
> on the head node from where you launch the mpiblast, and then
>
> ssh node001 which orted
>
> and report that back?
>
>> [ansci.iastate.edu:03916] ERROR: A daemon on node node001 failed to
>> start as expected.
>
> This suggests that a) orted wasn't found, and b) since that is required
> to let OpenMPI set up the remote process, the remote process doesn't get
> started.
>
>> [ansci.iastate.edu:03916] ERROR: There may be more information available
>> from
>> [ansci.iastate.edu:03916] ERROR: the remote shell (see above).
>> [ansci.iastate.edu:03916] ERROR: The daemon exited unexpectedly with
>> status 127.
>
> If you don't see orted on the remote system, you might need to contact
> your systems administrator to make sure the right path is mounted on the
> remote node.
>
> If you built OpenMPI yourself, you need to make sure your path variable
> includes the $openmpi/bin directory.
>
> Basically this looks like OpenMPI is not in your path, which is why it
> can't find orted, and this is why mpiblast isn't booting up on the node.