[Bioclusters] Opteron Perl64 segfault issues

Joseph Landman bioclusters@bioinformatics.org
05 Aug 2003 11:51:52 -0400


Hi Nathan:

  I took the liberty of modifying your Perl script to instrument it for
process monitoring (memory size) and to timestamp each iteration.  If you
have a memory leak, you will see the process grow monotonically over
time.
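To make "monotonically bigger" concrete, here is a minimal Perl sketch that checks a list of per-iteration sizes (in MB, as printed by the instrumented script) for sustained growth. The helper name `trend`, the warm-up count and the 1 MB tolerance are my own assumptions for illustration, not anything Perl provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given per-iteration sizes in MB, skip the first $skip
# warm-up samples, then report 'growing' only if every later sample exceeds
# the previous one by more than $tol MB (the monotonic pattern of a leak).
sub trend {
    my ($sizes, $skip, $tol) = @_;
    my @s = @{$sizes}[$skip .. $#$sizes];
    return 'too few samples' if @s < 2;
    for my $k (1 .. $#s) {
        return 'not growing' if $s[$k] - $s[$k - 1] <= $tol;
    }
    return 'growing';
}

# Sizes from the Athlon run in this message; sample 0 is taken before the hash exists.
print trend([3.840, 809.250, 873.254, 873.254, 873.254], 1, 1.0), "\n";    # prints "not growing"
```

A leaking process would instead show every post-warm-up sample climbing, which this check reports as 'growing'.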

  I am running this now on a Red Hat 8.0 machine with a custom kernel
and a default-compiled Perl.  The script as written needs about 900 MB
to run, though this is with the 32-bit Perl.  Memory usage seems to have
hit a plateau after 3 iterations.

  Have you run memory tests on this machine?  I highly recommend
memtest86 (http://www.memtest86.com), though I do not know whether it runs
on an Opteron.  Let memtest grind over a weekend.  If it finds faults,
you have a likely culprit.

  Given what you have described, other possibilities that come to mind
include:

1) memory leaks (hence the instrumentation) in Perl or in the core glibc
libraries
2) Perl bugs (in the GC stage or in memory management generally)

  Segfaults generally happen when a process stamps on memory it is not
supposed to touch.  Sometimes this happens in connection with an
out-of-memory condition.  While the test runs, watch the system log
(/var/log/messages) for a "process killed" message or something similar;
that would be a clue.
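If the log is large, a small filter saves some scrolling. A minimal Perl sketch, assuming the kernel writes lines containing phrases such as "Out of Memory", "Killed process" or "segfault"; the exact wording differs between kernel versions, so the patterns (and the script name `scanlog.pl`) are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed patterns: kernel wording for OOM kills and segfaults varies by version,
# so this matches loosely and case-insensitively.
my $pattern = qr/out of memory|killed process|segfault/i;

# Return the lines read from $fh that match any of the patterns above.
sub suspicious_lines {
    my ($fh) = @_;
    return grep { /$pattern/ } <$fh>;
}

# Usage: perl scanlog.pl /var/log/messages   (reading the log usually needs root)
for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot open $file: $!";
    print suspicious_lines($fh);
    close $fh;
}
```

An empty result does not rule out memory exhaustion; it only means nothing matched these particular phrases.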

  Note: if turning off swap makes the segfault happen sooner, and adding
swap (for example, by creating a swapfile) makes it happen later, then
you likely have a memory leak somewhere.
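For that comparison, it helps to record how much swap each run actually had, because the result only means something if the configurations really differ. A minimal Perl sketch, assuming a Linux `/proc/meminfo` that reports `SwapTotal:` and `SwapFree:` in kB; `parse_meminfo` is a hypothetical helper, not a standard module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull "Key:   value kB" pairs out of /proc/meminfo text.
sub parse_meminfo {
    my ($text) = @_;
    my %kb;
    for my $line (split /\n/, $text) {
        $kb{$1} = $2 if $line =~ /^(\w+):\s+(\d+)\s+kB/;
    }
    return \%kb;
}

# Only read the real file when it exists (Linux); report swap in MB.
if (-r '/proc/meminfo') {
    open my $fh, '<', '/proc/meminfo' or die "cannot read /proc/meminfo: $!";
    my $m = parse_meminfo(do { local $/; <$fh> });
    close $fh;
    printf "swap total %.1f MB, free %.1f MB\n",
        ($m->{SwapTotal} || 0) / 1024, ($m->{SwapFree} || 0) / 1024;
}
```

Printing this once at the start of each run, next to the time stamps, makes it easy to match a segfault time against the swap configuration that was in effect.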

Here is the instrumented code:

#!/usr/bin/perl
use POSIX qw(strftime);
$number = 10000000;
$five_percent=int($number/20);
$iteration=0;
$|=1;

while (1) 
  {
    $time_stamp= strftime "%F %T",localtime;
    printf "Getting process data [%s]\n",$time_stamp;
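    # SZ (the 10th field of "ps -l" output) is the process size in pages
    @process=split(/\s+/,`ps --no-header -lwm -p $$`);
    $size=$process[9];
    $time_stamp= strftime "%F %T",localtime;
    # convert pages to MB, assuming 4096-byte pages
    printf "Iteration %i, size=%-.3f MB [%s]\n",$iteration,$size*4096/(1024*1024),$time_stamp;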
    for ($i = 1; $i <= $number;  $i++) 
       {
        print ".." if (($i % $five_percent) == 0);
        $hash{$i}++;
        printf "%i",int(20*($i/$number)) if (($i % $five_percent) == 0);
       }
    $time_stamp= strftime "%F %T",localtime;
    printf "\n\tstarting undef [%s]...",$time_stamp;
    $iteration++;
    undef %hash;
    $time_stamp= strftime "%F %T",localtime;
    printf "done [%s]\n",$time_stamp;
  }
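The `ps`-based probe above relies on SZ landing in field 10 of the split output and on 4096-byte pages. On Linux the same kind of number can be read directly from `/proc`, without spawning `ps` on every iteration. A minimal sketch, assuming the kernel exposes `VmSize:` and `VmRSS:` lines (in kB) in `/proc/<pid>/status`; `proc_mem_kb` is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for the ps-based probe: read VmSize and VmRSS (kB)
# from /proc/<pid>/status. Returns an empty hash if the file is unavailable.
sub proc_mem_kb {
    my ($pid) = @_;
    my %mem;
    open my $fh, '<', "/proc/$pid/status" or return \%mem;
    while (my $line = <$fh>) {
        $mem{$1} = $2 if $line =~ /^(VmSize|VmRSS):\s+(\d+)\s+kB/;
    }
    close $fh;
    return \%mem;
}

my $m = proc_mem_kb($$);
printf "VmSize=%.3f MB VmRSS=%.3f MB\n",
    ($m->{VmSize} || 0) / 1024, ($m->{VmRSS} || 0) / 1024;
```

Inside the loop, `proc_mem_kb($$)->{VmSize} / 1024` could stand in for the `$size*4096/(1024*1024)` calculation; VmSize should track roughly what SZ reports, while VmRSS shows how much of it is actually resident.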


Several iterations on my Athlon system show this:

[landman@squash.scalableinformatics.com:~]
49 >./test1.pl
Getting process data [2003-08-05 11:38:38]
Iteration 0, size=3.840 MB [2003-08-05 11:38:38]
..1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20
        starting undef [2003-08-05 11:39:19]...done [2003-08-05 11:39:30]
Getting process data [2003-08-05 11:39:30]
Iteration 1, size=809.250 MB [2003-08-05 11:39:40]
..1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20
        starting undef [2003-08-05 11:40:25]...done [2003-08-05 11:40:39]
Getting process data [2003-08-05 11:40:39]
Iteration 2, size=873.254 MB [2003-08-05 11:40:49]
..1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20
        starting undef [2003-08-05 11:41:35]...done [2003-08-05 11:41:50]
Getting process data [2003-08-05 11:41:50]
Iteration 3, size=873.254 MB [2003-08-05 11:41:59]
..1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20
        starting undef [2003-08-05 11:42:47]...done [2003-08-05 11:43:03]
Getting process data [2003-08-05 11:43:03]
Iteration 4, size=873.254 MB [2003-08-05 11:43:12]
..1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20
        starting undef [2003-08-05 11:44:00]...done [2003-08-05 11:44:16]

Joe

On Tue, 2003-08-05 at 09:42, Nathan O. Siemers wrote:
> Hello All:
> 
> 
> We are anticipating the purchase of an AMD opteron linux cluster to 
> replace our old IA-32 systems.  We have purchased a test box (Penguin) 
> running SUSE Linux and perl 5.8.0.  The summary of the software 
> configuration is included at the end of this message.
> 
> We have encountered an issue with the perl implementation on the 
> machine: I can reproducibly segfault perl with this code:
> 
> ____________________________________________________________
> #!/usr/bin/perl
> $number = 10000000;
> while (1) {
>      for ($i = 1; $i <= $number;  $i++) {
>      $hash{$i}++;
>      }
>      undef %hash;
> }
> _____________________________________________________________
> 
> One needs about 2g of ram on the machine to run the code, and it will 
> never terminate.  On our opteron system, this code will produce a 
> segmentation violation after a day or two of running.  The code simply 
> produces a large perl hash data structure and interacts with it in very 
> simple ways....
> 
> Instability in perl is a show stopper for us.  We currently do not know 
> if this behavior is related to:
> 
>      the 5.8.0 release of perl (our other systems are running 5.6.x), or 
> the way in which SUSE compiled it (I notice that threads are built into 
> their version).
> 
>      bug in the AMD CPU or motherboard, etc.
> 
>      problems with the AMD opteron Linux shared libraries or other 
> aspects of the linux port.
> 
>       Can anyone test this on their boxen (let it run for 4 days to be a 
> fair test) or shed some insight on where the problem may be?  (Yes we 
> *do* often keep such big hashes active in perl for long periods of time 
> for word-based seq identity searches).
> 
> 	Thanks,
> 
> 
> 	Nathan Siemers
> 
> uname -a:
> 
> Linux opt 2.4.19-SMP #1 SMP Mon Mar 31 23:48:08 UTC 2003 x86_64 unknown
> 
> 
>      and perl:
> 
> perl -V
> 
> Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
>    Platform:
>      osname=linux, osvers=2.4.19, archname=x86_64-linux-thread-multi
>      uname='linux jarre 2.4.19 #1 smp mon mar 24 16:17:59 utc 2003 x86_64 unknown
>   '
>      config_args='-ds -e -Dprefix=/usr -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true'
>      hint=recommended, useposix=true, d_sigaction=define
>      usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
>      useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>      use64bitint=define use64bitall=define uselongdouble=undef
>      usemymalloc=n, bincompat5005=undef
>    Compiler:
>      cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
>      optimize='-O2 --pipe',
>      cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing'
>      ccversion='', gccversion='3.2.2 (SuSE Linux)', gccosandvers=''
>      intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
>      d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
>      ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>      alignbytes=8, prototype=define
>    Linker and Libraries:
>      ld='cc', ldflags =' -L/usr/local/lib64'
>      libpth=/lib64 /usr/lib64 /usr/local/lib64
>      libs=-lm -ldl -lcrypt -lpthread
>      perllibs=-lm -ldl -lcrypt -lpthread
>      libc=/lib64//lib64/libc.so.6, so=so, useshrplib=true, libperl=libperl.so
>      gnulibc_version='2.2.5'
>    Dynamic Linking:
>      dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/x86_64-linux-thread-multi/CORE'
>      cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'
> 
> 
> Characteristics of this binary (from libperl):
>    Compile-time options: MULTIPLICITY USE_ITHREADS USE_64_BIT_INT USE_64_BIT_ALL
>                          USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
>    Built under linux
>    Compiled at Mar 27 2003 16:06:33
>    @INC:
>      /usr/lib/perl5/5.8.0/x86_64-linux-thread-multi
>      /usr/lib/perl5/5.8.0
>      /usr/lib/perl5/site_perl/5.8.0/x86_64-linux-thread-multi
>      /usr/lib/perl5/site_perl/5.8.0
>      /usr/lib/perl5/site_perl
>      .
> 
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615