I had the same problem last April, and communications with NCBI led me
to the following:

* The databases need to be formatted with the -A F flag, meaning 'Do not
  create ASN.1 structured deflines'. This is an optional flag, according
  to the formatdb documentation, and the default is F, but it still needs
  to be explicitly set in the formatdb command.

SeqPortNew errors, in my experience, are typically also a symptom of
other problems in the blast run. If I ran against multiple databases
like:

    blastall -p blastx -i inputseq -d "db1 db2"

I would see SeqPortNew errors and obviously incorrect alignments. If I
had happened to use the '-b 0' flag (don't show me any alignments) I
might not have noticed the incorrect results at all, and thus never
realized that there was a serious problem with the db formatting.

This was Blast 2.2.5. I don't know whether the bug has been fixed in
Blast 2.2.6 (the current version), but I still use -A F to be safe.

If you want more information I can send you the email thread.

Susan.

----------------------------------------------------------------------
Susan Chacko                     Helix Systems
12B/2N207                        Ph:  301-435-2982
National Institutes of Health    Fax: 301-402-2190
Bethesda, MD 20814               Email: susanc@nih.gov

On Dec 11, 2003, at 6:34 PM, Samir Pandurangi wrote:

> I'm blasting against databases I've created myself with formatdb. Each
> one of the deflines contains a one-word unique identifier.
> When I run formatdb with the -o T option, I get the following errors
> from blastn:
>
> [blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
> SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
> [blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
> SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
> [blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
> SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
> [blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
> SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>
> However, when I remove the -o T option, these errors disappear, but I
> run into problems with duplicate target hits (where the HSPs are split
> under multiple hits with the same identifier). An example:
>
> Blastn run without -o T, exhibiting duplicate target problem:
> *******************************************
>> NM_033178
>           Length = 2560
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>              |||||||||||||||||||
> Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>              |||||||||||||||||||
> Sbjct: 1247  ggctggctggctggctggc 1265
>
>
>> NM_033178
>           Length = 2560
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>              |||||||||||||||||||
> Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>              |||||||||||||||||||
> Sbjct: 1247  ggctggctggctggctggc 1265
> **************************************
>
> Blastn run with same database except with -o T formatdb option (no
> duplicate seqIds):
> ****************************************
>
>> NM_033178
>           Length = 2558
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>
> Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>
> Sbjct: 1247  ctggctggctggctggctg 1265
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>
> Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>
> Sbjct: 1247  ctggctggctggctggctg 1265
>
>
> ************************************
> Does anyone know what is happening here?
> --
> Samir
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
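
Since SeqPortNew errors can go unnoticed when alignments are suppressed with '-b 0', it may help to scan the captured blastall error output for them after each run. The following is only a minimal sketch in Python: it assumes nothing beyond the error-line format shown in the quoted output above, and the function name find_seqport_errors is hypothetical, not part of any NCBI tool.

```python
import re

# Matches the tail of error lines such as:
#   SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
# The pattern is an assumption based on the messages quoted in this thread.
SEQPORT_RE = re.compile(
    r"SeqPortNew:\s+(\S+)\s+start\((\d+)\)\s+>=\s+len\((\d+)\)"
)


def find_seqport_errors(stderr_text):
    """Return unique (seq_id, start, length) tuples in first-seen order."""
    seen = []
    for match in SEQPORT_RE.finditer(stderr_text):
        record = (match.group(1), int(match.group(2)), int(match.group(3)))
        if record not in seen:
            seen.append(record)
    return seen


if __name__ == "__main__":
    # Sample text copied from the errors reported above.
    sample = (
        "[blastallnew] ERROR: ncbiapi [000.000] "
        "Jf_2959984_fasta.screen.Contig45: "
        "SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)\n"
        "[blastallnew] ERROR: ncbiapi [000.000] "
        "Jf_2959984_fasta.screen.Contig45: "
        "SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)\n"
    )
    for seq_id, start, length in find_seqport_errors(sample):
        print(f"{seq_id}: start {start} is past sequence length {length}")
```

Any non-empty result would be a signal to inspect the alignments and the database formatting before trusting the run.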