[Bioclusters] formatdb -o T/blast problem

Susan Chacko bioclusters@bioinformatics.org
Fri, 12 Dec 2003 10:00:52 -0500


I had the same problem last April, and communications with NCBI led me
to the following:

* The databases need to be formatted with the -A F flag, meaning 'Do not
create ASN.1 structured deflines'. This is an optional flag, according to
the formatdb documentation, and the default is F, but it still needs to be
explicitly set in the formatdb command.
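
For illustration, here is a minimal sketch of what such a formatdb command
line might look like. The helper name build_formatdb_cmd and the file name
mydb.fasta are made up for the example, and -p F (nucleotide input) is an
assumption chosen to match a blastn database; the helper only prints the
command rather than running it.

```shell
# Print (rather than run) a formatdb command line for a FASTA file.
# build_formatdb_cmd is a hypothetical helper, not part of the NCBI toolkit.
build_formatdb_cmd() {
    fasta="$1"
    # -p F : input is nucleotide (assumption; use -p T for protein)
    # -o T : parse SeqIds and create indexes
    # -A F : do not create ASN.1 structured deflines (set explicitly)
    echo "formatdb -i $fasta -p F -o T -A F"
}

# Example: show the command for a hypothetical file mydb.fasta
build_formatdb_cmd mydb.fasta
```

To actually format the database, run the printed command by hand once it
looks right for your data.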

SeqPortNew errors, in my experience, are typically also a symptom of other
problems in the blast run. If I ran against multiple databases like:
blastall -p blastx -i inputseq -d "db1 db2"
I would see SeqPortNew errors and obviously incorrect alignments. If I had
happened to use the '-b 0' flag (don't show me any alignments), I might not
have noticed the incorrect results at all, and thus never realized that
there was a serious problem with the db formatting.
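
Since the bad results are easy to miss, one rough way to check a run is to
count the two symptoms after the fact. The sketch below assumes the report
was saved with -o and stderr was captured to a file; the function name
count_blast_symptoms and the file names are hypothetical, and matching on
'Identities = 0/' is only a heuristic for obviously broken alignments.

```shell
# Hypothetical helper: count symptoms of a badly formatted database in a
# saved BLAST report and in the stderr log captured from the same run.
count_blast_symptoms() {
    report="$1"   # text report, e.g. written with blastall -o report.txt
    errlog="$2"   # stderr, e.g. captured with 2> blast.err
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    seqport=$(grep -c 'SeqPortNew' "$errlog" || true)
    zero_id=$(grep -c 'Identities = 0/' "$report" || true)
    echo "SeqPortNew=$seqport zero_identity=$zero_id"
}

# Usage (file names are placeholders):
#   blastall -p blastx -i inputseq -d "db1 db2" -o report.txt 2> blast.err
#   count_blast_symptoms report.txt blast.err
```

Any non-zero count is a hint to look at the database formatting before
trusting the hits.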

This was Blast 2.2.5. I don't know whether the bug has been fixed in Blast
2.2.6 (the current version), but I still use -A F to be safe.

If you want more information I can send you the email thread.

Susan.
----------------------------------------------------------------------
Susan Chacko
Helix Systems
12B/2N207                        Ph:    301-435-2982
National Institutes of Health    Fax:   301-402-2190
Bethesda, MD 20814               Email: susanc@nih.gov

On Dec 11, 2003, at 6:34 PM, Samir Pandurangi wrote:

> I'm blasting against databases I've created myself with formatdb. Each
> one of the deflines contains a one-word unique identifier.
> When I run formatdb with the -o T option, I get the following errors
> from blastn:
>
> [blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45: SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
> [blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45: SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
> [blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45: SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
> [blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45: SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>
> However, when I remove the -o T option, these errors disappear, but I
> run into problems with duplicate target hits (where the HSPs are split
> under multiple hits with the same identifier). An example:
>
> Blastn run without -o T, exhibiting duplicate target problem:
> *******************************************
>> NM_033178
>           Length = 2560
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>              |||||||||||||||||||
> Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>              |||||||||||||||||||
> Sbjct: 1247  ggctggctggctggctggc 1265
>
>
>> NM_033178
>           Length = 2560
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>              |||||||||||||||||||
> Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 19/19 (100%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>              |||||||||||||||||||
> Sbjct: 1247  ggctggctggctggctggc 1265
> **************************************
>
> Blastn run with the same database, except with the -o T formatdb option
> (no duplicate seqIds):
> ****************************************
>
>> NM_033178
>           Length = 2558
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>
> Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>
> Sbjct: 1247  ctggctggctggctggctg 1265
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Minus
>
>
> Query: 14026 gccagccagccagccagcc 14044
>
> Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
>  Score = 38.2 bits (19), Expect = 8.9
>  Identities = 0/19 (0%)
>  Strand = Plus / Plus
>
>
> Query: 46274 ggctggctggctggctggc 46292
>
> Sbjct: 1247  ctggctggctggctggctg 1265
>
>
> ************************************
> Does anyone know what is happening here?
> --
> Samir
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters