[Bioclusters] mpiBLAST errors
Joe Landman
bioclusters@bioinformatics.org
Mon, 02 Aug 2004 22:08:06 -0400
Hi Neil:
If you simply skip the one node which seems to stall, does the run
work? For PBS, you might need to create a queue which skips this node,
or simply mark the node down using pbsnodes.
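
A rough sketch of the second option, assuming the stalled host is called
node07 (a placeholder, not a name from your cluster) and that you run it on
the PBS server with manager or operator rights. The run helper and the
DRY_RUN switch are only illustrative, so that nothing is changed until you
have looked at the commands. Note that -o marks the node offline rather
than down; either way the scheduler should stop placing jobs on it.

```shell
#!/bin/sh
# BAD_NODE is a placeholder; substitute the hostname of the node that stalls.
BAD_NODE="${BAD_NODE:-node07}"
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Mark the node offline so the scheduler stops placing jobs on it.
run pbsnodes -o "$BAD_NODE"
# List the nodes currently marked down or offline, to confirm the change.
run pbsnodes -l
# Later, once the node is repaired, clear the offline flag:
# run pbsnodes -c "$BAD_NODE"
```

If the large job then completes on the remaining nodes, that points at the
one node rather than at mpiBLAST itself.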
Joe
Neil Saunders wrote:
>We are running mpiBLAST (1.2.1) on 3 different clusters, with LAM-MPI
>and openPBS.
>
>2 of the clusters are fine, but one has recently developed some rather
>bizarre output errors. Small BLAST jobs (10s of sequences versus protein
>nr database) run fine, but larger jobs (e.g. all proteins from a typical
>microbial genome v. nr) have problems. The BLAST output file starts to
>write, but is truncated. The nodes appear to run lamboot and lamhalt
>fine, but one node seems to stall and we see this kind of error message:
>
>----------------------------------------------------------------------------
>Unknown message tag (-32766) received by 1
>UUnknUkUnknknonwonwonwown nm nm nm mesessessesssagaegaegaege t at at
>atag g( g( g( (-3-23-23-232767667667666) )r )r )r
>reececeeceeceiivevievievedd b db db byy 2 y3 y4 5
>-----------------------------------------------------------------------------
>One of the processes started by mpirun has exited with a nonzero exit
>code. This typically indicates that the process finished in error.
>If your process did not finish in error, be sure to include a "return
>0" or "exit(0)" in your C code before exiting the application.
>
>PID 8303 failed on node n0 (10.0.92.100) due to signal 9.
>-----------------------------------------------------------------------------
>
>
>Has anyone seen anything like this before or have any ideas what the
>error signifies? I suspect the head node of this cluster may have a
>different version of LAM-MPI to the slaves - could this be an issue?
>mpiBLAST seemed to compile cleanly with lam 7.0.6.
>
>thanks for any ideas,
>
>Neil Saunders
>
>
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615