Hi Glen,

thank you for your quick response.

On Thursday 06 April 2006 15:53, Glen Otero wrote:
>
> I usually don't see these types of errors. Here are a few questions:
>
> How did you format the database for mpiblast?

/usr/local/bin/mpiformatdb --nfrags=28 -i Hs.seq.uniq
was the latest call, but I had used
mpiformatdb -N 28 -i Hs.seq.uniq
earlier.

> Is the mpiblast database on a shared filesystem, like NFS (I don't
> think symlinks will work)?

Currently, I have created /export/data/blastdb/ on the frontend; this was
rsynced to /state/partition1/blastdb on the compute nodes. On the frontend,
I had a directory /state/partition1 (on the root partition...) containing a
symlink to /export/data/blastdb.

I have just switched to a bind mount on the frontend (no more symlinking),
but this was not successful either.

The first tests were done via NFS, which did not work either.

> How did you launch the job, SGE?

In the future, we surely want to use mpiblast in an SGE environment; for
now, it was started from the command line.

> Can you try a smaller job using just the 6 compute nodes (and
> formatting the db into 6 pieces)?

Wow, I get a new one now:
===============================
bastian at frontend:/state/partition1/blastdb> mpiformatdb -N 6 -i Hs.seq.uniq
[...]
[... semi-manual distributing data to /state/partition1/blastdb of all nodes ...]
bastian at frontend:/state/partition1/blastdb> cd ~/tmp03
bastian at frontend:~/tmp03> /opt/mpich/gnu/sbin/cleanipcs
bastian at frontend:~/tmp03> cluster-fork /opt/mpich/gnu/sbin/cleanipcs
[...]
bastian at frontend:~/tmp03> mpirun -np 6 /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results
54p3_2934: p4_error: : 0
3 0.078125 Bailing out with signal 11
[3] MPI Abort by user Aborting program !
[3] Aborting program!
2p1_28697: p4_error: interrupt SIGx: 13
p5_17962: p4_error: : 0
0.0742188 Bailing out with signal 11
[5] MPI Abort by user Aborting program !
[5] Aborting program!
p4_21219: p4_error: : 0
rm_l_4_21279: (0.367188) net_send: could not write to fd=5, errno = 104
0.078125 Bailing out with signal 11
[4] MPI Abort by user Aborting program !
[4] Aborting program!
p2_13443: p4_error: : 0
0.078125 Bailing out with signal 11
[2] MPI Abort by user Aborting program !
[2] Aborting program!
rm_l_3_2994: (0.644531) net_send: could not write to fd=5, errno = 104
p1_28697: (7.242188) net_send: could not write to fd=5, errno = 32
rm_l_2_13503: (6.929688) net_send: could not write to fd=5, errno = 104
p2_13443: (6.929688) net_send: could not write to fd=5, errno = 32
p5_17962: (6.093750) net_send: could not write to fd=5, errno = 32
===============================

Signal 11 seems to be a segfault? Something's going awfully wrong here...

> Can you try a smaller blast job using p53, p53db from
> ftp://ftp.bioinformatics.org/pub/biobrew/ and blastp?

This works! :)) This is the first time I have seen mpiblast actually working :)

Unfortunately, we are looking to blast against the 17 GB GenBank...

Any more ideas?

Thx again,
   Bastian

-- 
Bastian Friedrich                  bastian at bastian-friedrich.de
Adress & Fon available on my HP   http://www.bastian-friedrich.de/
\~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
\ Computers make very fast, very accurate mistakes.