[Bioclusters] Re: Top hits, not so top

Andrew.Mather at dpi.vic.gov.au Andrew.Mather at dpi.vic.gov.au
Thu Jan 5 02:16:20 EST 2006


Hi All,

One of my users has encountered some odd behaviour when trying to blast a 
100MB query sequence against a human genomic sequence database.

His message is below, but basically, he's finding he gets different 
results depending on how many alignments andone-line descriptors he asks 
to see.  The input sequence, database, e-values etc remain constant, it's 
only the -v and -b options that change.

We're using Blastall v2.2.11 (newer one is in testing) on Intel machines 
running RHEL3.

Can anyone point me in an appropriate direction for things to look at 
please ?

Thanks,
Andrew

Bioinformatics Advanced Scientific Computing, 
Animal Genetics and Genomics, PIRVic Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob  0413 009 761


----------------
There are 10 kinds of people...those who understand binary and those who 
don't.

----- Forwarded by Andrew Mather/NRE on 05/01/06 06:08 PM -----


The following problem was discovered with BLAST jobs where you ask for 
different numbers of hits to be presented. 

A 100 MB segement of bovine Chromosome fasta was 
BLASTED onto Human with different -v and -b options to print the "Top" 
hits. 

The following were all run on -intel ques 

Consider this where we ask for the top hit only 

/usr/local/bin/blastall -p blastn -i test.out -d "/blastdb/human_genomic" 
-e 1e-25 -v 1 -b 1 -a4 -o one.b 
lastn 

This is what I get 
                                                                           
                                           Score    E 
                            Sequences producing significant alignments:   
                   (bits) Value 

ref|NC_000001.8|NC_000001 Homo sapiens chromosome 1, complete se...   224 
  8e-54 

Now if I ask for 5 top hits ..... 


/usr/local/bin/blastall -p blastn -i test.out -d "/blastdb/human_genomic" 
-e 1e-25 -v 5 -b 5 -a4 -o five.blastn 

This is what I get 

                                                                          
                                              Score    E 
                             Sequences producing significant alignments:   
                   (bits) Value 

ref|NC_000004.9|NC_000004 Homo sapiens chromosome 4, complete se...   281 
  4e-71 
ref|NC_000001.8|NC_000001 Homo sapiens chromosome 1, complete se...   224 
  8e-54 
ref|NC_000013.9|NC_000013 Homo sapiens chromosome 13, complete s...   222 
  3e-53 
ref|NC_000005.8|NC_000005 Homo sapiens chromosome 5, complete se...   214 
  8e-51 
ref|NC_000009.9|NC_000009 Homo sapiens chromosome 9, complete se...   206 
  2e-48 

Note, there is now a hit on Chromosome 4 that is above the top hit on 
Chromosome 1 that was seen previously.  The hit on chromosome 1 is the 
same as was seen previously though. 

It gets worse ...  what if we ask for the top 7 hits? 

/usr/local/bin/blastall -p blastn -i test.out -d 
"/blastdb/human_genomic" -e 1e-25 -v 7 -b 7 -a4 -o seven.blastn 

This is what we get 

                                                                 Score   
 E 
Sequences producing significant alignments:                      (bits) 
Value 

ref|NC_000004.9|NC_000004 Homo sapiens chromosome 4, complete se...   281 
  4e-71 
ref|NC_000011.8|NC_000011 Homo sapiens chromosome 11, complete s...   230 
  1e-55 
ref|NC_000001.8|NC_000001 Homo sapiens chromosome 1, complete se...   224 
  8e-54 
ref|NC_000013.9|NC_000013 Homo sapiens chromosome 13, complete s...   222 
  3e-53 
ref|NC_000005.8|NC_000005 Homo sapiens chromosome 5, complete se...   214 
  8e-51 
ref|NC_000009.9|NC_000009 Homo sapiens chromosome 9, complete se...   206 
  2e-48 
ref|NC_000023.8|NC_000023 Homo sapiens chromosome X, complete se...   198 
  5e-46 


Now we have a hit on Chromosome 11 that pops up between the top hit on 
Chromosome 4 and our original hit on Chromosome 1. 

Does it get any worse? I tried the top 10 hits ... 

/usr/local/bin/blastall -p blastn -i test.out -d "/blastdb/human_genomic" 
-e 1e-25 -v 10 -b 10 -a4 -o ten 
.blastn 

Here are the results 

                                                                 Score   
 E 
Sequences producing significant alignments:                      (bits) 
Value 

ref|NC_000004.9|NC_000004 Homo sapiens chromosome 4, complete se...   281 
  4e-71 
ref|NC_000011.8|NC_000011 Homo sapiens chromosome 11, complete s...   230 
  1e-55 
ref|NC_000001.8|NC_000001 Homo sapiens chromosome 1, complete se...   224 
  8e-54 
ref|NC_000013.9|NC_000013 Homo sapiens chromosome 13, complete s...   222 
  3e-53 
ref|NC_000005.8|NC_000005 Homo sapiens chromosome 5, complete se...   214 
  8e-51 
ref|NC_000009.9|NC_000009 Homo sapiens chromosome 9, complete se...   206 
  2e-48 
ref|NC_000008.9|NC_000008 Homo sapiens chromosome 8, complete se...   202 
  3e-47 
ref|NC_000023.8|NC_000023 Homo sapiens chromosome X, complete se...   198 
  5e-46 
ref|NC_000002.9|NC_000002 Homo sapiens chromosome 2, complete se...   198 
  5e-46 
ref|NC_000014.7|NC_000014 Homo sapiens chromosome 14, complete s...   194 
  7e-45 


No more hits jumping in near the top... but we do have a hit that pops 
in between 9 and X (8). 

So what is going on?  The same BLAST job on the same machine ... just 
changing the -v and -b option to pick the number of hits to display. 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://bioinformatics.org/pipermail/bioclusters/attachments/20060105/4e67163a/attachment.html


More information about the Bioclusters mailing list