[Bioclusters] formatdb -o T/blast problem
Samir Pandurangi
bioclusters@bioinformatics.org
Thu, 11 Dec 2003 15:34:54 -0800
I'm blasting against databases I've created myself with formatdb. Each one
of the deflines contains a one-word unique identifier.
When I run formatdb with the -o T option, I get the following errors from
blastn:
[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
However, when I remove the -o T option, these errors disappear, but I run
into problems with duplicate target hits (where the HSPs are split across
multiple hits with the same identifier). An example:
Blastn run without -o T, exhibiting duplicate target problem:
*******************************************
>NM_033178
Length = 2560
Score = 38.2 bits (19), Expect = 8.9
Identities = 19/19 (100%)
Strand = Plus / Minus
Query: 14026 gccagccagccagccagcc 14044
|||||||||||||||||||
Sbjct: 1265 gccagccagccagccagcc 1247
Score = 38.2 bits (19), Expect = 8.9
Identities = 19/19 (100%)
Strand = Plus / Plus
Query: 46274 ggctggctggctggctggc 46292
|||||||||||||||||||
Sbjct: 1247 ggctggctggctggctggc 1265
>NM_033178
Length = 2560
Score = 38.2 bits (19), Expect = 8.9
Identities = 19/19 (100%)
Strand = Plus / Minus
Query: 14026 gccagccagccagccagcc 14044
|||||||||||||||||||
Sbjct: 1265 gccagccagccagccagcc 1247
Score = 38.2 bits (19), Expect = 8.9
Identities = 19/19 (100%)
Strand = Plus / Plus
Query: 46274 ggctggctggctggctggc 46292
|||||||||||||||||||
Sbjct: 1247 ggctggctggctggctggc 1265
**************************************
Blastn run against the same database, except formatted with the -o T
formatdb option (no duplicate seqIds):
****************************************
>NM_033178
Length = 2558
Score = 38.2 bits (19), Expect = 8.9
Identities = 0/19 (0%)
Strand = Plus / Minus
Query: 14026 gccagccagccagccagcc 14044
Sbjct: 1265 cagccagccagccagccag 1247
Score = 38.2 bits (19), Expect = 8.9
Identities = 0/19 (0%)
Strand = Plus / Plus
Query: 46274 ggctggctggctggctggc 46292
Sbjct: 1247 ctggctggctggctggctg 1265
Score = 38.2 bits (19), Expect = 8.9
Identities = 0/19 (0%)
Strand = Plus / Minus
Query: 14026 gccagccagccagccagcc 14044
Sbjct: 1265 cagccagccagccagccag 1247
Score = 38.2 bits (19), Expect = 8.9
Identities = 0/19 (0%)
Strand = Plus / Plus
Query: 46274 ggctggctggctggctggc 46292
Sbjct: 1247 ctggctggctggctggctg 1265
************************************
Does anyone know what is happening here?
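One check that may help narrow this down: the -o T option makes formatdb parse the SeqIds in the deflines and index on them, so an identifier that appears more than once in the FASTA input is one plausible (unconfirmed) cause. The differing lengths reported for NM_033178 above (2560 vs. 2558) would be consistent with two distinct records sharing that identifier. The following is a minimal sketch, not part of the BLAST toolkit, that lists repeated identifiers and their sequence lengths. The function name is hypothetical, and it assumes the identifier is the first whitespace-delimited word of each defline.

```python
from collections import defaultdict


def duplicate_fasta_ids(lines):
    """Return {identifier: [sequence lengths]} for identifiers seen more than once.

    Assumption: the identifier is the first whitespace-delimited word
    after '>' on each defline.
    """
    lengths = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current].append(0)  # start a new record for this identifier
        elif current is not None:
            lengths[current][-1] += len(line)  # accumulate residues for the record
    # Keep only identifiers that occur in more than one record
    return {seq_id: lens for seq_id, lens in lengths.items() if len(lens) > 1}
```

To use it, pass an open file handle for the FASTA file that was given to formatdb, e.g. duplicate_fasta_ids(open("mydb.fasta")), where mydb.fasta is a placeholder name. An empty result would suggest the identifiers really are unique and the cause lies elsewhere.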
--
Samir