[Bioclusters] megablast crashes Linux machines?

Joe Landman landman at scalableinformatics.com
Fri May 27 18:24:01 EDT 2005

Hi Peter:

On Fri, 27 May 2005, peter_webb at agilent.com wrote:

> Every time I run it, two nodes crash.  Not the same two nodes every
> time, so doesn't look like a hardware problem.
> I'm going to drill down and see if I can find a small sample that
> reliably takes down the machine.  Meantime, I thought I'd ask, have
> others seen this?  We are running on SuperMicro dual Xeon nodes, the O/S
> is RHEL4 WS.

Sounds suspiciously like a memory problem.  We had a case recently with 6
motherboards (expensive beasts at that) that were unable to drive memory at
the rated specs when running under load.  It took a few hours of memtest86,
or a few good Gaussian runs, to catch it.

I might suggest running memtest over the weekend anyway.  Odds are it 
won't find anything, but still ...

How was megablast built?  How did you compile it?  Also, what does your 
swap space look like?

    swapon -s

will tell us.  What do the logs say right before the crash (if anything)?
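
If you want to collect this from every node in one go, here is a minimal
sketch that totals the swap figures.  It assumes the usual five-column
layout that "swapon -s" and /proc/swaps print (Filename, Type, Size, Used,
Priority, with sizes in KB); the function name swap_summary is just
something I made up for illustration.

```shell
#!/bin/sh
# swap_summary: read "swapon -s" style output on stdin and print totals.
# Assumes a single header line followed by one line per swap area, with
# Size in column 3 and Used in column 4, both in KB.
swap_summary() {
    awk 'NR > 1 { size += $3; used += $4 }
         END { printf "swap_total_kb=%d swap_used_kb=%d\n", size, used }'
}

# Typical use on a node (either form should work):
#   swapon -s | swap_summary
#   swap_summary < /proc/swaps
```

A node reporting swap_total_kb=0 has no swap configured at all, which would
be worth knowing.  For the logs, the tail of /var/log/messages from before
the reboot is the first place I would look, assuming the default syslog
setup.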

Is it a kernel panic?  A hard lock?

How did you build the cluster OS?

