Somehow I found replies to my post in the "Bioclusters" list archive through a Google search, but I didn't get them in my mailbox. Anyway, let me follow up on the messages captured on the web:

- I recompiled and made sure "mpiblast" is located on an NFS-shared file path, and got the same errors.
- When I added "--debug" to the mpirun command line I got the same error, with no extra messages.
- I did run a small C "hello" program from my colleagues, which worked fine (got responses from every node).

One of my colleagues suspects that mpiblast was not compiled correctly against OpenMPI. I am looking at https://wiki.rocksclusters.org/wiki/index.php/MPI-Blast_with_OpenMPI, but that one is for Rocks OS while I have CentOS 5. My vendor suggests my environment setup might have some problems... I have been checking "everything" and am still in a mist ;-)

Please let me know if you have more suggestions...

Thanks in advance!

Zhiliang

> Hi Zhiliang,
>
> the command-line looks reasonable, does mpiblast generate any additional
> output when the --debug option is added to the command-line?
> Is the /usr/local/bin filesystem replicated on each node? i.e. does
> every node have a copy of mpiblast located at /usr/local/bin/mpiblast?
> I personally have not tested mpiBLAST with OpenMPI, although mpiBLAST
> doesn't do anything too fancy with MPI so it really ought to work.
>
> -Aaron
>
>
> Zhiliang Hu wrote:
>> I am new to using clusters.
>>
>> I have just installed mpiblast 1.4.0 with the NCBI toolbox (June 2005)
>> from source code on a Linux cluster [x86_64/x86_64 (GNU/Linux), CentOS].
>> The installation seemed to be successful.
>>
>> Now when I try the following:
>>
>> > /opt/openmpi.gcc/bin/mpirun -np 14
>>     /usr/local/bin/mpiblast -p blastn
>>     -i /raid/pub/ncbi/blast/db/BTrsSNP
>>     -d bta.genome.chr
>>     -o out1
>>     -e 0.0000000001
>>     -W 38 -v 1 -b 1
>>
>> I immediately get the following errors:
>> ----------------
>> MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with
>> errorcode 0
>> mpirun noticed that job rank 1 with PID 28131 on node xxxx.xxxxxx.xxx
>> exited on signal 15 (Terminated).
>> 12 additional processes aborted (not shown)
>> --------------------------------
>>
>> Maybe I am missing something obvious? Could anyone point me to the right
>> place to start tracing the problem? ...
>>
>> Zhiliang
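On Aaron's question of whether every node actually sees /usr/local/bin/mpiblast, here is a minimal per-node check sketch. It assumes a POSIX `sh` on every node and that the script itself is saved somewhere on the shared (NFS) path. The name `check_binary.sh` and the function `check_binary` are hypothetical. The script only tests whether the file exists and is executable on each node; it says nothing about whether mpiblast was built correctly against OpenMPI.

```shell
#!/bin/sh
# check_binary.sh (hypothetical name): launched once per rank under mpirun,
# it reports for the node it runs on whether a binary exists there and is
# executable. Assumes a POSIX sh; defaults to the path used in this thread.

check_binary() {
    bin="${1:-/usr/local/bin/mpiblast}"
    node=$(uname -n)                  # node name, same as `hostname` prints
    if [ -f "$bin" ] && [ -x "$bin" ]; then
        echo "$node: OK $bin"
    else
        echo "$node: FAIL $bin (missing or not executable)"
    fi
}

check_binary "$@"
```

It could be launched the same way as the failing job, for example `/opt/openmpi.gcc/bin/mpirun -np 14 sh /path/to/shared/check_binary.sh /usr/local/bin/mpiblast`, where `/path/to/shared` is a placeholder. Each rank prints one line, so a node reporting FAIL points to a missing or non-executable copy there. If the binary was linked dynamically, running `ldd /usr/local/bin/mpiblast` on a node shows which `libmpi` it resolves to, which bears on the question of whether it was built against OpenMPI.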