[Bioclusters] formatdb -o T/blast problem
Joe Landman
bioclusters@bioinformatics.org
Thu, 11 Dec 2003 19:54:31 -0500
Hi Samir:
This is from old memory (and likely incorrect), but I seem to recall that
blast uses a fixed number of characters from the identifier when generating
the db hash index. If you subset your database down to, say, 4 sequences,
do you still see the same error? If so, can you change the identifiers to
something short and unique, and check whether the error persists?
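
A rough sketch of that test in Python, in case it saves some typing. It
keeps the first few records of a FASTA file and swaps each defline for a
short unique id. The function names, the "s1", "s2", ... id scheme, and the
default limit of 4 are illustrative assumptions, not part of the BLAST
toolkit, and the parser is deliberately minimal.

```python
import io


def read_fasta(handle):
    """Yield (defline, sequence) pairs from a FASTA-formatted text handle."""
    defline, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:].strip(), []
        elif defline is not None and line.strip():
            chunks.append(line.strip())
    if defline is not None:
        yield defline, "".join(chunks)


def subset_and_rename(in_handle, out_handle, limit=4, prefix="s"):
    """Write the first `limit` records under short ids.

    Returns a dict mapping each new id to the original defline so that
    hits can be traced back afterwards. The id scheme is hypothetical.
    """
    mapping = {}
    for n, (defline, seq) in enumerate(read_fasta(in_handle), start=1):
        if n > limit:
            break
        new_id = f"{prefix}{n}"
        mapping[new_id] = defline
        out_handle.write(f">{new_id}\n")
        # Wrap sequence lines at 60 characters.
        for i in range(0, len(seq), 60):
            out_handle.write(seq[i:i + 60] + "\n")
    return mapping


# Small in-memory demonstration with made-up records.
demo_in = io.StringIO(">NM_000001 example\nACGTACGT\n>NM_000002\nGGCC\n")
demo_out = io.StringIO()
demo_map = subset_and_rename(demo_in, demo_out, limit=4)
```

With real files, open the source FASTA for reading and a new file for
writing, pass both handles to subset_and_rename, then rerun formatdb with
-o T on the new file and repeat the blastn search.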
Joe
Samir Pandurangi wrote:
>I'm blasting against databases I've created myself with formatdb. Each
>one of the deflines contains a one-word unique identifier.
>When I run formatdb with the -o T option, I get the following errors from
>blastn:
>
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>
>However, when I remove the -o T option, these errors disappear, but I run
>into problems with duplicate target hits (where the HSPs are split under
>multiple hits with the same identifier). An example:
>
>Blastn run without -o T, exhibiting duplicate target problem:
>*******************************************
>
>
>>NM_033178
>>
>>
> Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
> |||||||||||||||||||
>Sbjct: 1265 gccagccagccagccagcc 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
> |||||||||||||||||||
>Sbjct: 1247 ggctggctggctggctggc 1265
>
>
>
>
>>NM_033178
>>
>>
> Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
> |||||||||||||||||||
>Sbjct: 1265 gccagccagccagccagcc 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
> |||||||||||||||||||
>Sbjct: 1247 ggctggctggctggctggc 1265
>**************************************
>
>Blastn run with the same database except with the -o T formatdb option (no
>duplicate seqIds):
>****************************************
>
>
>
>>NM_033178
>>
>>
> Length = 2558
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265 cagccagccagccagccag 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247 ctggctggctggctggctg 1265
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265 cagccagccagccagccag 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247 ctggctggctggctggctg 1265
>
>
>************************************
>Does anyone know what is happening here?
>--
>Samir
>
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615