[Bioclusters] mpiblast users, I need your help

Sun Jan 23 03:29:43 EST 2005

Seems the mpiblast-users mailing list just isn't
getting much attention, so I'll retry here.

I've got a small test case that reproducibly crashes
for me. I was hoping a few others on the list could
run it and tell me how it behaves for them. 

Here is the script: 

lamboot -v lamhosts 
/usr/bin/mpirun -np 6 /usr/bin/mpiblast -p blastn -d nt \
    -o blast.out -i query.mfa 2> mpirun.err > mpirun.out
lamhalt 

lamhosts looks like this: 
burns 
smithers1 
smithers2 
smithers3 

The nt database is based on the NCBI nt.gz file.

This is being run under LAM/MPI 7.1.1 with mpiBLAST 1.3.

Due to its size (24k) I've posted the query sequence
at  
http://cariaso.is-a-geek.com/~cariaso/files/mpiblastcrash.seq

Some other sequences work just fine, but many others crash
in the same way.

When run with --debug, the crashes always occur after
lines like these, ending at outputResults(). 

[4] 105.211 MPI startup time is 0 
[0] 105.252 Receive was successful -- about to merge (4)
[0] 105.255 Query results have been merged
[0] 105.255 4 / 4 frags have been searched for query 0. Writing results
[0] 105.419 Setting bioseq cache
[0] 105.419 bioseq_cache->data.ptrvalue is: 0x60101188
[0] 105.42 outputResults()

The blast.out file remains at zero bytes. All nodes
still show mpiblast processes running under 'ps', but
'top' shows that they are no longer consuming any
CPU.
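
For anyone trying to reproduce this, here is a minimal sketch of a
wrapper that reports the stall instead of waiting on it indefinitely.
It assumes GNU coreutils 'timeout' is installed. The function name
run_with_deadline and the 600 second limit in the usage comment are
made up for illustration and are not part of mpiblast or LAM/MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of mpiBLAST or LAM/MPI): run a command
# under a time limit and classify the outcome from the exit status and
# the size of the expected output file.
# Usage: run_with_deadline SECONDS OUTFILE COMMAND [ARGS...]
run_with_deadline() {
    limit="$1"
    outfile="$2"
    shift 2
    # GNU timeout exits with status 124 when the limit is reached.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung"       # command was still running at the deadline
    elif [ ! -s "$outfile" ]; then
        echo "empty"      # command exited, but the output file is zero bytes
    else
        echo "ok"         # command exited and wrote some output
    fi
}

# Example, assuming the files from the script above and an arbitrary
# 600 second limit (the redirections need a shell, hence sh -c):
# run_with_deadline 600 blast.out sh -c '/usr/bin/mpirun -np 6 /usr/bin/mpiblast -p blastn -d nt -o blast.out -i query.mfa 2> mpirun.err > mpirun.out'
```

With the behavior described above I would expect this to print "hung",
since the processes never exit on their own. It only terminates the
local mpirun, so lamhalt is still needed afterwards.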

Help!

> From: jsarchuleta
> Also, I assume that outputResults() is always the last
> thing displayed, and that Node 0 ("burns") is also at
> 0%.

Yes, outputResults() is always the last thing displayed.

Once I reach the outputResults() message, CPU load on
all machines, including node 0, is at 0%.
And I've waited far longer than the search should need.

=====
Mike Cariaso