[Bioclusters] Re: new on using clusters: problem running mpiblast (2)

Zhiliang Hu hu at animalgenome.org
Tue Sep 4 15:42:54 EDT 2007


I found replies to my post in the "Bioclusters" list archive through a
Google search, but they never reached my mailbox.  Let me follow up on
the messages as captured on the web:

- I recompiled mpiblast and made sure the binary is located on an
NFS-shared file path, and got the same errors.

- When I added "--debug" to the mpirun command line I got the same error,
with no extra messages.

- I did run a small C "hello" program from my colleagues, which worked
fine (got responses from every node).
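
One way to double-check the shared-path point above would be to have every
rank report what it actually sees at that path. The following is only a
rough Python sketch under assumptions: the helper names (group_by_checksum,
collect) are made up for illustration, and it assumes sh, hostname and
md5sum exist on every node. The mpirun path, "-np" count and mpiblast path
would be the ones from the original command.

```python
import subprocess
from collections import defaultdict


def group_by_checksum(lines):
    """Group hostnames by the checksum they reported.

    Each line is expected to look like '<host> <md5-or-MISSING>';
    anything else (blank lines, stray MPI messages) is ignored.
    """
    groups = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            host, digest = fields
            groups[digest].add(host)
    return dict(groups)


def collect(mpirun, nprocs, binary):
    """Hypothetical helper: run a tiny shell check on every rank via mpirun."""
    # "$0" inside the script is the binary path passed after the script text.
    script = (
        'if [ -x "$0" ]; then sum=$(md5sum < "$0" | cut -d" " -f1); '
        'else sum=MISSING; fi; echo "$(hostname) $sum"'
    )
    cmd = [mpirun, "-np", str(nprocs), "sh", "-c", script, binary]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return group_by_checksum(out.splitlines())
```

If collect("/opt/openmpi.gcc/bin/mpirun", 14, "/usr/local/bin/mpiblast")
returned a single checksum covering all hosts, the binary would look the
same everywhere; a "MISSING" key or more than one checksum would point to a
node that sees something different.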

One of my colleagues suspects that mpiblast may not have been compiled
correctly against OpenMPI. I am looking at
https://wiki.rocksclusters.org/wiki/index.php/MPI-Blast_with_OpenMPI but
that page is for Rocks OS, while I have CentOS 5. My vendor suggests that
my environment setup might have some problems. I have been checking
"everything" and am still in a fog ;-)
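
A possible way to test the "compiled against the wrong MPI" suspicion is to
look at which MPI shared libraries the binary resolves to at run time. The
sketch below parses the output of ldd; it is a minimal illustration, not a
definitive check. The function names are invented here, matching on "mpi"
in the library name is a heuristic, and a statically linked mpiblast would
show no MPI libraries at all. The prefix /opt/openmpi.gcc is taken from the
mpirun path in the original command.

```python
import subprocess


def mpi_libraries(ldd_output):
    """Return (name, resolved_path) pairs for MPI-looking libraries.

    A typical ldd line looks like
        libmpi.so.0 => /opt/openmpi.gcc/lib/libmpi.so.0 (0x00002b...)
    or, when the loader cannot find the library,
        libmpi.so.0 => not found
    """
    found = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "=>" and "mpi" in parts[0].lower():
            # Drop the trailing "(0x...)" load address, keep "not found" intact.
            path = " ".join(parts[2:]).split(" (")[0]
            found.append((parts[0], path))
    return found


def linked_under(ldd_output, prefix):
    """True only if MPI libraries are present and all resolve under prefix."""
    libs = mpi_libraries(ldd_output)
    return bool(libs) and all(path.startswith(prefix) for _, path in libs)


def report(binary, prefix):
    """Hypothetical helper: run ldd on the binary and print what was found."""
    out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    for name, path in mpi_libraries(out):
        print(name, "->", path)
    print("all MPI libraries under", prefix, ":", linked_under(out, prefix))
```

Calling report("/usr/local/bin/mpiblast", "/opt/openmpi.gcc") on the head
node and on a compute node would show whether the binary picks up the same
OpenMPI installation that mpirun comes from, or a different MPI (or none)
because of the library search path in the environment.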

Please let me know if you have any more suggestions. Thanks in advance!

Zhiliang


> Hi Zhiliang,
>
> the command-line looks reasonable, does mpiblast generate any additional
> output when the --debug option is added to the command-line?
> Is the /usr/local/bin filesystem replicated on each node?  i.e. does
> every node have a copy of mpiblast located at /usr/local/bin/mpiblast?
> I personally have not tested mpiBLAST with OpenMPI, although mpiBLAST
> doesn't do anything too fancy with MPI so it really ought to work.
>
> -Aaron
>
>
> Zhiliang Hu wrote:
>> I am new on using clusters.
>>
>> I have just installed mpiblast 1.4.0 with ncbi toolbox (June 2005)
>> from source codes on a linux cluster [x86_64/x86_64 (GNU/Linux), CentOS].
>> The installation seemed to be successful.
>>
>> Now when I try the following:
>>
>> > /opt/openmpi.gcc/bin/mpirun -np 14
>>       /usr/local/bin/mpiblast -p blastn
>>                               -i /raid/pub/ncbi/blast/db/BTrsSNP
>>                               -d bta.genome.chr
>>                               -o out1
>>                               -e 0.0000000001
>>                               -W 38 -v 1 -b 1
>>
>> and immediately got following errors:
>> ----------------
>> MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with
>> errorcode 0
>> mpirun noticed that job rank 1 with PID 28131 on node xxxx.xxxxxx.xxx
>> exited on signal 15 (Terminated).
>> 12 additional processes aborted (not shown)
>> --------------------------------
>>
>> Maybe I am missing something obvious?  Could anyone point to the right
>> place for tracing the problem? ...
>>
>> Zhiliang


