We are running mpiBLAST (1.2.1) on three different clusters, with LAM-MPI and openPBS. Two of the clusters are fine, but one has recently developed some rather bizarre output errors. Small BLAST jobs (tens of sequences versus the protein nr database) run fine, but larger jobs (e.g. all proteins from a typical microbial genome vs. nr) have problems: the BLAST output file starts to write, but is truncated.

The nodes appear to run lamboot and lamhalt fine, but one node seems to stall and we see this kind of error message:

----------------------------------------------------------------------------
Unknown message tag (-32766) received by 1
UUnknUkUnknknonwonwonwown nm nm nm mesessessesssagaegaegaege t at at atag g( g( g( (-3-23-23-232767667667666) )r )r )r reececeeceeceiivevievievedd b db db byy 2 y3 y4 5
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 8303 failed on node n0 (10.0.92.100) due to signal 9.
-----------------------------------------------------------------------------

Has anyone seen anything like this before, or have any ideas what the error signifies? I suspect the head node of this cluster may have a different version of LAM-MPI to the slaves. Could this be an issue? mpiBLAST seemed to compile cleanly with LAM 7.0.6.

Thanks for any ideas,
Neil Saunders

--
School of Biotechnology and Biomolecular Sciences,
The University of New South Wales,
Sydney 2052, Australia
http://psychro.bioinformatics.unsw.edu.au/neil/index.php
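
One way to test the version-mismatch suspicion above would be to collect the LAM version reported by each node and compare them. The following is only a minimal Python sketch, not a tested procedure. It assumes passwordless ssh to every node, and that a `laminfo` command is on each node's PATH and prints a dotted version number near the top of its output. The host names and both helper functions are hypothetical placeholders.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: the first dotted number in the command output is the LAM version.
VERSION_RE = re.compile(r"\d+\.\d+(?:\.\d+)*")


def extract_version(text):
    """Return the first dotted version number found in text, or 'unknown'."""
    match = VERSION_RE.search(text)
    return match.group(0) if match else "unknown"


def group_by_version(versions):
    """Map each reported version string to the sorted list of nodes reporting it."""
    groups = defaultdict(list)
    for node, version in versions.items():
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def query_node(node, command=("laminfo",)):
    """Run the version command on a node over ssh (assumed setup) and return stdout."""
    result = subprocess.run(
        ["ssh", node, *command],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical host names; replace with the head node and the slaves.
    nodes = ["head", "node1", "node2"]
    versions = {node: extract_version(query_node(node)) for node in nodes}
    groups = group_by_version(versions)
    for version, members in groups.items():
        print(version, ":", ", ".join(members))
    if len(groups) > 1:
        print("Nodes report different versions.")
```

If more than one version group is printed, the head node and slaves are not running the same LAM installation, which would be worth ruling out before looking elsewhere.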