[Bioclusters] mpiBLAST recovery?

Iddo Friedberg idoerg at gmail.com
Fri Nov 9 03:48:10 EST 2007


Hi,

I am using mpiBLAST via Sun Grid Engine. Is there a way to recover and
rerun mpiBLAST after a node goes down (and subsequently comes back up)? I
have a downed node, and everything seems to have frozen since it went down.
It will probably not be back up until tomorrow, when our sysadmin comes in.
I'd hate to lose the work I have already accumulated.

Just to apprise you of the situation, the downed node is called ikelite-3-8.


I ssh'd to one of the working nodes (ikelite-3-5) and did the following:

[idoerg@ikelite-3-5 ~]$ ps -lef | grep mpiblast
0 S idoerg     768   767  0  78   0 - 13774 rt_sig Nov08 ?        00:00:00 tcsh -c /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 \-p4amslave \-p4yourname ikelite-3-5 \-p4rmrank 14
0 R idoerg     835   768 98  85   0 - 77560 -      Nov08 ?        12:13:36 /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 -p4amslave -p4yourname ikelite-3-5 -p4rmrank 14
1 S idoerg     838   835  0  76   0 - 66503 -      Nov08 ?        00:00:00 /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 -p4amslave -p4yourname ikelite-3-5 -p4rmrank 14
0 S idoerg     842   841  0  78   0 - 13774 rt_sig Nov08 ?        00:00:00 tcsh -c /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 \-p4amslave \-p4yourname ikelite-3-5 \-p4rmrank 15
0 S idoerg     909   842 99  85   0 - 77592 -      Nov08 ?        12:16:27 /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 -p4amslave -p4yourname ikelite-3-5 -p4rmrank 15
1 S idoerg     910   909  0  76   0 - 66503 -      Nov08 ?        00:00:00 /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 -p4amslave -p4yourname ikelite-3-5 -p4rmrank 15


So as you can see, these processes still point back to ikelite-3-8 (port
34397), but of course they cannot reach it, since ikelite-3-8 is down.
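Reading that ps listing by eye is error-prone, so here is a minimal sketch of pulling out which MPI rank each process belongs to and which host it reports back to. It assumes the argument layout visible above (contact host, port, -p4amslave, -p4yourname, -p4rmrank); the -p4 flags suggest MPICH's ch_p4 device, but that is an inference, not something mpiBLAST documents here. The names parse_ps_lef, P4Slave and P4_SLAVE are hypothetical helpers, not part of mpiBLAST, MPICH or SGE.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for the p4 slave command lines shown in the ps output:
#   mpiblast <contact-host> <port> -p4amslave -p4yourname <host> -p4rmrank <rank>
# The tcsh wrapper line escapes the dashes as "\-", so the backslash is optional.
P4_SLAVE = re.compile(
    r"mpiblast\s+(?P<master>\S+)\s+(?P<port>\d+)\s+\\?-p4amslave"
    r".*?\\?-p4rmrank\s+(?P<rank>\d+)"
)


@dataclass
class P4Slave:
    pid: int
    ppid: int
    state: str      # ps "S" column: R = running, S = sleeping
    cpu_time: str   # ps "TIME" column (accumulated CPU time)
    master: str     # host named as the first mpiblast argument
    port: int
    rank: int
    wrapper: bool   # True for the "tcsh -c ..." wrapper, not mpiblast itself


def parse_ps_lef(text: str) -> list[P4Slave]:
    """Keep only the mpiBLAST p4 slave entries from single-line `ps -lef` output."""
    slaves = []
    for line in text.splitlines():
        # Columns: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
        fields = line.split(None, 14)
        if len(fields) < 15:
            continue
        m = P4_SLAVE.search(fields[14])
        if m is None:
            continue
        slaves.append(P4Slave(
            pid=int(fields[3]),
            ppid=int(fields[4]),
            state=fields[1],
            cpu_time=fields[13],
            master=m["master"],
            port=int(m["port"]),
            rank=int(m["rank"]),
            wrapper=fields[14].startswith("tcsh "),
        ))
    return slaves


# Two lines copied from the listing above (rank 14 only).
SAMPLE = """\
0 S idoerg 768 767 0 78 0 - 13774 rt_sig Nov08 ? 00:00:00 tcsh -c /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 \\-p4amslave \\-p4yourname ikelite-3-5 \\-p4rmrank 14
0 R idoerg 835 768 98 85 0 - 77560 - Nov08 ? 12:13:36 /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 -p4amslave -p4yourname ikelite-3-5 -p4rmrank 14
"""

if __name__ == "__main__":
    for s in parse_ps_lef(SAMPLE):
        print(s.rank, s.pid, s.state, s.cpu_time, s.master, s.wrapper)
```

Grouping the result by rank makes it easy to see that every rank on ikelite-3-5 names ikelite-3-8 as its contact host, even though rank 14's worker is still in state R and accumulating CPU time.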

Thanks for any help!

Iddo

-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."

