[Bioclusters] sensitivity & blast

Chris Dwan cdwan at bioteam.net
Wed Apr 6 14:24:54 EDT 2005


> Could you suggest whether we are on the right track?  What is the right
> approach to set a uniform sensitivity for all inputs?

E-values already normalize for a number of factors, including query 
size.  Removing that normalization is possible, but it is not a good 
idea unless you know exactly what you're doing.

E-values for identical HSPs (same alignment, same score) grow in 
proportion to the product of the sizes of the query and the target 
set.  The rationale is that the same hit becomes more and more likely 
to occur by random chance in a larger sample of sequence.  Said HSPs 
become less and less statistically interesting as the query and the 
target set grow.
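
To put rough numbers on that, here is a minimal sketch of the 
Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is 
the query length, n is the target set length, and S is the raw score.  
The function name and the lambda and K defaults are illustrative 
assumptions, not values taken from your search, and real BLAST uses 
edge-corrected "effective" lengths, so treat the output as an 
approximation only.

```python
import math

def expected_hits(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S).

    lam and k are illustrative placeholders; the real values depend on
    the scoring matrix and gap penalties used in the search.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Same HSP (raw score 60) in a small search and in a large one.
small = expected_hits(60, query_len=300, db_len=1_000_000)
large = expected_hits(60, query_len=3_000, db_len=10_000_000)

# The search space m * n grew 100-fold, and so does the E-value.
ratio = large / small
print(f"small: {small:.3g}  large: {large:.3g}  ratio: {ratio:.1f}")
```

The alignment itself has not changed between the two calls; only the 
search space has.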

This explains your observation that you must increase the E-value 
threshold to keep getting the same hits as the inputs get larger.
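
As a rough sketch of how far the threshold has to move: because E is 
proportional to the product of the query and target set sizes, the 
cutoff that retains a given HSP scales by the same ratio.  The helper 
below is hypothetical (it is not part of BLAST), and it ignores the 
effective-length corrections BLAST applies, so it is only a 
back-of-the-envelope estimate.

```python
import math

def rescaled_threshold(old_cutoff, old_query_len, old_db_len,
                       new_query_len, new_db_len):
    """Scale an E-value cutoff by the change in search space (m * n).

    Hypothetical helper: assumes E is exactly proportional to
    query_len * db_len, which ignores BLAST's length corrections.
    """
    old_space = old_query_len * old_db_len
    new_space = new_query_len * new_db_len
    return old_cutoff * new_space / old_space

# A hit that just passed at 1e-5 against a 1 Mb target set needs a
# cutoff about 100 times looser against a 100 Mb target set.
new_cutoff = rescaled_threshold(1e-5, 300, 1_000_000, 300, 100_000_000)
print(f"new cutoff: {new_cutoff:.3g}")
```

Loosening the cutoff this way keeps the same hits in the report, but 
it also admits every other HSP of comparable score, which is exactly 
the loss of significance described above.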

The question you seem to be asking is "find me all of the HSPs that fit 
some criterion, regardless of their statistical significance."  The 
question that BLAST is designed to answer is "find me most of the 
statistically significant HSPs for some particular search, and extend 
them to build up gapped local alignments."

If you're willing to share your goal in running these searches, the 
list might be able to suggest alternative tools better suited to your 
problem.

-Chris Dwan
  The BioTeam


