SOLVED Re: [Bioclusters] Opteron Perl64 segfault issues

Tim Harsch bioclusters@bioinformatics.org
Wed, 27 Aug 2003 14:20:53 -0700


I would like some clarification, because I have some processes that do not
use the -A F parameter yet whose FASTA files do not use NCBI-structured
deflines.  I'm worried this issue may be causing problems I'm not yet aware
of.  Can you summarize the exact symptoms, and include
blast-help@ncbi.nlm.nih.gov in your reply so that they have a chance to fix
the problem in future releases?

Also, setting -A F obviates the need for the workarounds you described,
right?

----- Original Message ----- 
From: "Nathan O. Siemers" <Nathan.Siemers@bms.com>
To: <bioclusters@bioinformatics.org>
Sent: Wednesday, August 27, 2003 5:48 AM
Subject: SOLVED Re: [Bioclusters] Opteron Perl64 segfault issues


>
>
> Sorry for the Opteron spam, but I hope this will help folks doing this
> in the future ;)
>
> We now believe that the aberrant behavior of NCBI BLAST in some
> configurations can be traced entirely to a single-character change in
> the source code...
>
> In recent releases of the NCBI toolkit, the formatdb option to create
> ASN.1-structured deflines (-A) is turned on by default, a departure
> from previous behavior.  Unpredictable (and wrong!) things happen when
> this option is selected and the sequences given to formatdb do not
> follow the arcane NCBI FASTA naming conventions (foo|bar|etc|blah).  In
> our case, we were using very simple names:
>
>  >name1
>  >name2
>  >name3
>  etc.
>
> (NCBI would have demanded something like >lcl|name1.)
>
> This is not compatible with the new default behavior of formatdb.
>
> Solution:  if your FASTA names do not follow the NCBI naming structure
> exactly, use the -A F option of formatdb and/or change the default in
> formatdb.c.
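For anyone who would rather keep the new ASN.1 defline default than pass -A F, the parenthetical above suggests rewriting bare names into NCBI local-ID form. The following is a minimal Python sketch, not an NCBI tool. It assumes that `>lcl|<name>` is acceptable to formatdb, as the note above implies. It also uses a heuristic of its own: any identifier that already contains `|` is treated as NCBI-style and left alone. The script name `add_lcl.py` is hypothetical.

```python
import sys


def add_lcl_prefix(line: str) -> str:
    """Prefix a bare FASTA identifier with 'lcl|' (NCBI local ID).

    Heuristic (an assumption, not NCBI's own defline parser): if the first
    space-delimited token of a defline already contains '|', the line is
    treated as NCBI-style and returned unchanged.
    """
    if not line.startswith(">"):
        return line  # sequence lines pass through untouched
    ident, sep, rest = line[1:].partition(" ")
    if "|" in ident:
        return line
    return ">lcl|" + ident + sep + rest


def main(argv: list) -> None:
    """Usage (hypothetical): python add_lcl.py IN.fasta OUT.fasta"""
    if len(argv) != 2:
        print("usage: add_lcl.py IN.fasta OUT.fasta", file=sys.stderr)
        return
    with open(argv[0]) as src, open(argv[1], "w") as dst:
        for line in src:
            dst.write(add_lcl_prefix(line))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this approach, `>name1` becomes `>lcl|name1`, and any description after the first space is preserved. Lines such as `>gi|12345` are left as they are.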
>
> NCBI toolkit versions somewhere after 2.2.1 have this problem.
>
> Classic NCBI.
>
> Nathan
>
> Nathan O. Siemers wrote:
> > All:
> >
> >     Joe Landman from Scalable Informatics, Lawrence Hannon from IBM, and
> > I have been working on issues running BLAST on the AMD Opteron platform.
> > Below I've summarized the results of validating the blastall and
> > formatdb code (with much help from Joe and Lawrence).  There are quirks
> > in the latest versions of the NCBI toolkit that produce corrupt BLAST
> > results in some situations.  They appear only with some (large)
> > databases, and we are not yet sure exactly what causes this behavior.
> > We have tentative workarounds, listed below.
> >
> >
> > Thanks to everyone who has helped me over the past few weeks.  The
> > bottom line is that *none* of the problems I have seen could actually
> > be traced to the Opteron hardware (other than a RAM chip) or the Linux
> > OS.  This is great news for Opteron.
> >
> >
> >
> > SUMMARY
> >
> > Builds of formatdb and blastall from NCBI Toolkit version 2.2.6 can
> > produce corrupted output when used with some formatdb parameters; this
> > happened in every build tested so far on the 64-bit AMD Opteron
> > platform.  Symptoms include failure to produce a correctly named .nal
> > or .pal file when databases are split into volumes, and pointer errors
> > that produce incorrect results and alignments with some large
> > databases.  NCBI Toolkit 2.2.1 does not show this behavior.  We have
> > reproduced some of these errors on SGI MIPS IRIX platforms with SGI
> > compilers, suggesting that the errors are specific neither to Opteron
> > nor to the compiler.
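The missing-alias-file symptom can be checked mechanically after a formatdb run. This Python sketch assumes formatdb's usual multi-volume layout: one `<name>.NN.nhr` file (`.phr` for protein) per volume, plus a `<name>.nal` (or `.pal`) alias file. That layout, and the helper names `missing_alias` and `check_db_dir`, are assumptions for illustration and are not taken from this thread.

```python
import fnmatch
import os
from typing import List, Optional


def missing_alias(filenames: List[str], name: str,
                  protein: bool = False) -> Optional[str]:
    """Return a message if volumes of `name` exist without an alias file.

    Assumed naming: volumes <name>.00.nhr, <name>.01.nhr, ... and the
    alias file <name>.nal ('p' instead of 'n' for protein databases).
    """
    ext = "p" if protein else "n"
    volumes = fnmatch.filter(filenames, f"{name}.[0-9][0-9].{ext}hr")
    alias = f"{name}.{ext}al"
    if volumes and alias not in filenames:
        return f"{len(volumes)} volume(s) of {name} but no {alias}"
    return None  # single-volume database, or alias present


def check_db_dir(db_dir: str, name: str,
                 protein: bool = False) -> Optional[str]:
    """Apply missing_alias() to the files in a database directory."""
    return missing_alias(os.listdir(db_dir), name, protein)
```

Keeping the check on a plain list of filenames makes it easy to test without touching the disk. `check_db_dir` is only a thin wrapper around `os.listdir`.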
> >
> >
> > Current workarounds are to:
> >
> >     1.  explicitly name the formatdb output database with the -n option
> >
> >     2.  use the '-o T' option in formatdb to alter the way blast indices
> >         are created.
> >
> >     Alternatively:
> >
> >     3.  Use the 2.2.1 version of the blastall tools.
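Workarounds 1 and 2 amount to always passing `-n` and `-o T`. Below is a small Python sketch of a wrapper that assembles the formatdb command line used in these tests (`-p F -n <name> -i <fasta_file>`). It can optionally add the `-A F` setting described in the follow-up message. The function names are hypothetical, and only flags quoted in this thread are used.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def formatdb_args(fasta: str, name: str, protein: bool = False,
                  index: bool = True,
                  asn1_deflines: Optional[bool] = None) -> List[str]:
    """Build a formatdb argument list from the options quoted in this thread."""
    args = ["formatdb", "-p", "T" if protein else "F",
            "-n", name,          # workaround 1: name the output database explicitly
            "-i", fasta]
    if index:
        args += ["-o", "T"]      # workaround 2: '-o T' indexing
    if asn1_deflines is not None:
        # '-A F' disables ASN.1-structured deflines (per the follow-up message)
        args += ["-A", "T" if asn1_deflines else "F"]
    return args


def run_formatdb(args: List[str]) -> None:
    """Run formatdb if it is on PATH; raise FileNotFoundError otherwise."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} not found on PATH")
    print("running:", shlex.join(args))
    subprocess.run(args, check=True)  # raises CalledProcessError on failure
```

For example, `formatdb_args("htg.fa", "htg", asn1_deflines=False)` yields `["formatdb", "-p", "F", "-n", "htg", "-i", "htg.fa", "-o", "T", "-A", "F"]`. Building the list separately from running it keeps the command easy to log and test.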
> >
> > _______________________________________
> >
> > TESTS
> >
> > Machine, OS, libs:
> >
> > 2-CPU AMD Opteron (Penguin), 6 GB RAM, SuSE Linux 8, 2.4.19 SMP Linux
> > kernel.
> >
> > Current configuration:
> >
> > opt:/gcgblast # gcc -v
> > Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs
> > Configured with: ../configure --enable-threads=posix --prefix=/usr
> > --with-local-prefix=/usr/local --infodir=/usr/share/info
> > --mandir=/usr/share/man --libdir=/usr/lib64
> > --enable-languages=c,c++,f77,objc,java,ada --enable-libgcj
> > --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
> > --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux
> > Thread model: posix
> > gcc version 3.2.2 (SuSE Linux)
> >
> > (gcc-3.2.2-26.x86_64.rpm)
> > (glibc-2.2.5-184.x86_64.rpm)
> >
> > ldd /usr/local/bin/blastall:
> >
> >         libm.so.6 => /lib64/libm.so.6 (0x0000002a9566d000)
> >         libpthread.so.0 => /lib64/libpthread.so.0 (0x0000002a957c6000)
> >         libc.so.6 => /lib64/libc.so.6 (0x0000002a958e2000)
> >         /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
> >
> >
> > _______________________________________
> >
> >
> > Databases:
> >
> > ncbi:  Human genome scaffold broken into 100 KB pieces with 50 KB overlap
> > (5.9 GB)
> >
> > sncbi:  same as above, but with long sequence names converted to a
> > shorter form (some names were very long, and I wanted to make sure this
> > was not a name-indexing problem)
> >
> > htg:  20 August download of NCBI htg sequence file (11G uncompressed)
> >
> > _______________________________________
> >
> > Formatdb options:
> >
> > o:  using '-o T' option for indexing
> >
> > no_o:     no -o option
> >
> > Other formatdb options used:  '-p F -n <name> -i <fasta_file>'
> >
> > _______________________________________
> >
> > blastall options:  '-p tblastn -v 3 -b 3 -a 2 -d <db> -i <input_file>'
> >
> > _______________________________________
> >
> > Input file:  12 protein sequences from fly RefSeq:
> >  >BMSPROT:NP_478140
> >  >BMSPROT:NP_523807
> >  >BMSPROT:NP_609725
> >  >BMSPROT:NP_524716
> >  >BMSPROT:NP_524665
> >  >BMSPROT:NP_524468
> >  >BMSPROT:NP_523392
> >  >BMSPROT:NP_572997
> >  >BMSPROT:NP_524671
> >  >BMSPROT:NP_608480
> >  >BMSPROT:NP_524763
> >  >BMSPROT:NP_524817
> >
> > (I've checked: the 'BMSPROT:' prefix doesn't seem to affect the analysis.)
> > _______________________________________
> >
> > R E S U L T S
> > ____________________________________________________________________
> >
> > NCBI Toolkit  ncbi-o  ncbi-no_o  sncbi-o  sncbi-no_o  htg-o  htg-no_o
> >
> > 2.2.1         pass    pass       pass     pass        pass   pass
> >
> > 2.2.6         pass    FAIL*      pass     FAIL*       pass   pass
> >
> > ____________________________________________________________________
> >
> >
> > * FAIL symptoms include the error message
> >   '[blastall] ERROR: ncbiapi [000.000] BMSPROT:NP_478140: ObjMgrChoice: pointer [0] type [1] not found',
> >   missing sequence names for database hits in the BLAST summary, and
> >   sporadic nonsense alignments.
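One quick way to spot the first of these symptoms is to search saved blastall output for the error text, with stdout and stderr captured to a log. The Python sketch below does this. Its regular expression is built only from the single message quoted above, so it will not catch the other symptoms (missing hit names, nonsense alignments).

```python
import re
import sys

# Built from the one error message quoted in this thread; other failure
# symptoms (missing hit names, nonsense alignments) are not detected.
OBJMGR_ERROR = re.compile(
    r"\[blastall\] ERROR: .*ObjMgrChoice: pointer \[\d+\] type \[\d+\] not found"
)


def find_objmgr_errors(lines):
    """Return (line_number, text) pairs for lines matching the error."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if OBJMGR_ERROR.search(line)
    ]


def main(paths):
    """Scan each log file named on the command line and report matches."""
    for path in paths:
        with open(path) as log:
            for n, text in find_objmgr_errors(log):
                print(f"{path}:{n}: {text}")


if __name__ == "__main__":
    main(sys.argv[1:])
```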
> >
> > CONFIGURATION
> >
> > IBM/Siemers Opteron linux.ncbi.mk directives for 2.2.6 (April 2003),
> > SuSE 8.1 Opteron Linux:
> >
> > NCBI_DEFAULT_LCL = lnx
> > NCBI_MAKE_SHELL = /bin/sh
> > NCBI_CC = gcc -pipe -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -O3 \
> >     -DOS_UNIX_PPCLINUX -I../include -I/usr/X11R6/include \
> >     -L/usr/X11R6/lib64 -DWIN_MOTIF
> > # should probably be /usr/X11R6/lib64 above on SUSE 8.1
> > NCBI_CFLAGS1 = -c
> > NCBI_LDFLAGS1 =
> > NCBI_OPTFLAG =
> >
> > Opteron linux.ncbi.mk directives for 2.2.1 NCBI Toolkit:
> >
> > NCBI_DEFAULT_LCL = lnx
> > NCBI_MAKE_SHELL = /bin/sh
> > NCBI_CC = gcc -pipe -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64
> > NCBI_CFLAGS1 = -c -DOS_UNIX_PPCLINUX
> > NCBI_LDFLAGS1 = -O2
> > NCBI_OPTFLAG = -O2
> >
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
>
> -- 
> Nathan Siemers|Associate Director|Applied Genomics|Bristol-Myers Squibb
> Pharmaceutical Research Institute|HW3-0.07|P.O. Box 5400|
> Princeton, NJ 08543-5400|(609)818-6568|nathan.siemers@bms.com
>