SOLVED Re: [Bioclusters] Opteron Perl64 segfault issues
Nathan O. Siemers
bioclusters@bioinformatics.org
Wed, 27 Aug 2003 08:48:54 -0400
Sorry for the Opteron spam, but I hope this will help folks doing this
in the future ;)
We now believe that the aberrant behavior of NCBI blast in some
configurations can be traced entirely to a single-character change in
the source code...
In recent releases of the NCBI toolkit, the formatdb option to create
ASN.1 structured deflines (-A) has been turned on by default, a
divergence from previous behavior. Unpredictable (and wrong!) things
happen when this option is selected and the sequences input to formatdb
do not follow the arcane NCBI fasta naming convention
(foo|bar|etc|blah). In our case, we were using very simple names:
>name1
>name2
>name3
and so on (NCBI would have demanded something like >lcl|name1 ).
This is not compatible with the new default behavior of formatdb.
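For anyone who would rather fix the names than the formatdb flags, here
is a minimal Python sketch that prepends lcl| to bare deflines. It is
not part of the toolkit. The function names are made up for
illustration, and the test "first token contains a |" is only my rough
stand-in for the real NCBI naming rules, so treat it as an assumption.

```python
def needs_local_prefix(defline):
    """Heuristic: True if the first token after '>' has no '|' in it.

    This only approximates "does not follow NCBI naming"; the real
    rules are stricter. Hypothetical helper, not a toolkit function.
    """
    parts = defline[1:].split()
    return bool(parts) and "|" not in parts[0]


def add_local_prefix(lines):
    """Yield FASTA lines, rewriting bare '>name' deflines as '>lcl|name'."""
    for line in lines:
        if line.startswith(">") and needs_local_prefix(line):
            yield ">lcl|" + line[1:]
        else:
            # Sequence lines and already-structured deflines pass through.
            yield line


if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders): python fix.py < in.fa > out.fa
    sys.stdout.writelines(add_local_prefix(sys.stdin))
```

Run it as a filter over the fasta file before formatdb. Names that
already contain a bar are left alone.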
Solution: if you do not follow the NCBI fasta naming structure exactly,
use the -A F option of formatdb and/or change the default in formatdb.c.
NCBI toolkit versions somewhere after 2.2.1 have this problem.
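If you drive formatdb from scripts, a small wrapper that always passes
-A F explicitly avoids depending on whatever default your build has. A
rough sketch follows. Only the flags mentioned in this thread (-i, -n,
-p, -A) are used; the function name and file names are hypothetical,
and I am assuming your formatdb is new enough to accept -A at all
(older releases may reject it).

```python
import subprocess


def formatdb_command(fasta_path, db_name, protein=False):
    """Build a formatdb argument list with structured deflines turned off.

    Hypothetical helper. '-A F' is passed explicitly so the result does
    not depend on the compiled-in default.
    """
    return [
        "formatdb",
        "-i", fasta_path,               # input fasta file
        "-n", db_name,                  # explicit output database name
        "-p", "T" if protein else "F",  # F = nucleotide database
        "-A", "F",                      # no ASN.1 structured deflines
    ]


def run_formatdb(fasta_path, db_name, protein=False):
    """Run formatdb; raises CalledProcessError on a non-zero exit."""
    subprocess.run(formatdb_command(fasta_path, db_name, protein), check=True)
```

For example, formatdb_command("scaffold.fa", "ncbi") gives the argument
list for a nucleotide database named ncbi; run_formatdb does the same
and executes it.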
Classic NCBI.
Nathan
Nathan O. Siemers wrote:
> All:
>
> Joe Landman from Scalable Informatics, Lawrence Hannon from IBM, and
> I have been working on issues running blast on the AMD opteron platform.
> I've summarized my results (with much help from Joe and Lawrence) in
> validating the blastall and formatdb code. There are quirks with the
> latest versions of the NCBI toolkit, producing corrupt blast results in
> some situations. They only appear with some (large) databases but we
> are not sure what exactly causes this behavior at the present time. We
> have tentative workarounds, listed below.
>
>
> Thanks to everyone who has helped me over the past few weeks - the
> bottom line is that *none* of the problems I have seen over the past
> weeks could actually be traced to problems with Opteron hardware (other
> than a RAM chip) or Linux OS. This is great news for Opteron.
>
>
>
> SUMMARY
>
> Builds of formatdb and blastall from the NCBI Toolkit version 2.2.6
> can produce corrupted output when used with some formatdb parameters.
> This happened in every build tested so far on the 64-bit AMD Opteron
> platform.
> Symptoms include failure to produce a correctly named .nal or .pal
> file when databases are split up into volumes. Pointer errors produce
> incorrect results and alignments with some large databases. NCBI
> Toolkit 2.2.1 does not show this behavior. Some of these errors have
> been reproduced by us on SGI MIPS IRIX platforms with SGI compilers,
> suggesting that the errors are neither Opteron nor compiler specific.
>
>
>
>
>
> Current workarounds are to:
>
> 1. explicitly name the formatdb output database with the -n option
>
> 2. use the '-o T' option in formatdb to alter the way blast indices
> are created.
>
> Alternatively:
>
> 3. Use the 2.2.1 version of the blastall tools.
>
>
>
>
>
> _______________________________________
>
> TESTS
>
> Machine, OS, libs:
>
> 2 CPU AMD Opteron (Penguin), 6G RAM, SUSE Linux 8, 2.4.19 SMP Linux
> Kernel.
>
> Current configuration:
>
> opt:/gcgblast # gcc -v
> Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs
> Configured with: ../configure --enable-threads=posix --prefix=/usr
> --with-local-prefix=/usr/local --infodir=/usr/share/info
> --mandir=/usr/share/man --libdir=/usr/lib64
> --enable-languages=c,c++,f77,objc,java,ada --enable-libgcj
> --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
> --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux
> Thread model: posix
> gcc version 3.2.2 (SuSE Linux)
>
> (gcc-3.2.2-26.x86_64.rpm)
> (glibc-2.2.5-184.x86_64.rpm)
>
> ldd /usr/local/bin/blastall:
>
> libm.so.6 => /lib64/libm.so.6 (0x0000002a9566d000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x0000002a957c6000)
> libc.so.6 => /lib64/libc.so.6 (0x0000002a958e2000)
> /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
>
>
> _______________________________________
>
>
> Databases:
>
> ncbi: Human genome scaffold broken into 100KB pieces, 50KB overlap
> (5.9G)
>
> sncbi: same as above but long sequence names converted to shorter form
> (some names were very long and I wanted to make sure this was not a
> name indexing problem)
>
> htg: 20 August download of NCBI htg sequence file (11G uncompressed)
>
> _______________________________________
>
> Formatdb options:
>
> o: using '-o T' option for indexing
>
> no_o: no -o option
>
> Other formatdb options used: '-p F -n <name> -i <fasta_file>'
>
> _______________________________________
>
> blastall options: '-p tblastn -v 3 -b 3 -a 2 -d <db> -i <input_file>'
>
> _______________________________________
>
> Input file: 12 protein sequences from fly refseq:
> >BMSPROT:NP_478140
> >BMSPROT:NP_523807
> >BMSPROT:NP_609725
> >BMSPROT:NP_524716
> >BMSPROT:NP_524665
> >BMSPROT:NP_524468
> >BMSPROT:NP_523392
> >BMSPROT:NP_572997
> >BMSPROT:NP_524671
> >BMSPROT:NP_608480
> >BMSPROT:NP_524763
> >BMSPROT:NP_524817
>
> (I've checked, the 'BMSPROT:' prefix doesn't seem to affect the analysis).
> _______________________________________
>
> R E S U L T S
> ____________________________________________________________________
>
> NCBI Toolkit  ncbi-o  ncbi-no_o  sncbi-o  sncbi-no_o  htg-o  htg-no_o
>
> 2.2.1         pass    pass       pass     pass        pass   pass
>
> 2.2.6         pass    FAIL*      pass     FAIL*       pass   pass
>
> ____________________________________________________________________
>
>
> * - FAIL symptoms include the error message '[blastall] ERROR: ncbiapi
> [000.000] BMSPROT:NP_478140: ObjMgrChoice: pointer [0] type [1] not
> found', missing names for db hits in the BLAST summary, and sporadic
> nonsense alignments.
>
> CONFIGURATION
>
> IBM, Siemers Opteron linux.ncbi.mk directives for 2.2.6 (April 2003),
> SUSE 8.1 Opteron Linux
>
> NCBI_DEFAULT_LCL = lnx
> NCBI_MAKE_SHELL = /bin/sh
> NCBI_CC = gcc -pipe -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -O3 \
>     -DOS_UNIX_PPCLINUX -I../include -I/usr/X11R6/include \
>     -L/usr/X11R6/lib64 -DWIN_MOTIF
> # should probably be /usr/X11R6/lib64 above on SUSE 8.1
> NCBI_CFLAGS1 = -c
> NCBI_LDFLAGS1 =
> NCBI_OPTFLAG =
>
> Opteron linux.ncbi.mk directives for 2.2.1 NCBI Toolkit:
>
>
> NCBI_DEFAULT_LCL = lnx
> NCBI_MAKE_SHELL = /bin/sh
> NCBI_CC = gcc -pipe -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64
> NCBI_CFLAGS1 = -c -DOS_UNIX_PPCLINUX
> NCBI_LDFLAGS1 = -O2
> NCBI_OPTFLAG = -O2
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
--
Nathan Siemers|Associate Director|Applied Genomics|
Bristol-Myers Squibb Pharmaceutical Research Institute|
HW3-0.07|P.O. Box 5400|Princeton, NJ 08543-5400|
(609)818-6568|nathan.siemers@bms.com