[Bioclusters] sensitivity & blast

Wed Apr 6 13:47:19 EDT 2005

Hello,

We ran into an issue involving blastall, which I suspect folks on this list
might know the answer to.  (I am fairly new to using blastall.)  Blastall
seems to be sensitive to the input sequence size when detecting HSPs
(high-scoring segment pairs).  In other words, depending on the length of the
input, it sometimes does not report all HSPs (even with very large -b and -v
values).

We want to standardize blastall's behavior across all input sizes.  I am
trying out the following two methods, both of which seem to produce the
"right" results:

(1) scaling the "-e" E-value threshold with the input size
    e.g., if m = input sequence length, run blastall with
          "-e 10m" (that is, 10 times m)
    rationale: the E-value is proportional to the product mn, where n is
    the length of the target

(2) fixing the effective search space (-Y), which seems to hold constant
some of the statistical parameters in blastall's calculations
    e.g., "-Y 168000000000" for a human genome target
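
To make the reasoning behind both methods concrete, here is a minimal Python
sketch. It assumes the simplified Karlin-Altschul form E = K * m * n *
exp(-lambda * S) and ignores the effective-length corrections that BLAST
applies, so the numbers are only indicative. The lambda and K values, the
target size n, and the helper names are all assumptions made for
illustration; only the -e and -Y options and the values 10m and 168000000000
come from the methods above.

```python
import math

# Assumed, illustrative Karlin-Altschul parameters. The real values depend on
# the scoring scheme and should be taken from the BLAST output instead.
LAMBDA = 1.374
K = 0.711


def expect_value(score, m, n, lam=LAMBDA, k=K):
    """Simplified E = K * m * n * exp(-lambda * S), without length corrections."""
    return k * m * n * math.exp(-lam * score)


def score_cutoff(e_threshold, m, n, lam=LAMBDA, k=K):
    """Raw score at which the simplified E-value equals e_threshold."""
    return math.log(k * m * n / e_threshold) / lam


def method1_args(m, per_residue_e=10.0):
    """Method (1): scale the -e threshold linearly with the query length m."""
    return ["-e", str(per_residue_e * m)]


def method2_args(search_space=168_000_000_000):
    """Method (2): pin the effective search space with -Y."""
    return ["-Y", str(search_space)]


if __name__ == "__main__":
    n = 3_000_000_000  # hypothetical target size, roughly a human genome
    for m in (100, 1_000, 10_000):
        fixed = score_cutoff(10.0, m, n)       # constant -e 10
        scaled = score_cutoff(10.0 * m, m, n)  # -e 10m, as in method (1)
        print(m, round(fixed, 2), round(scaled, 2),
              method1_args(m), method2_args())
```

Under this simplified model, a constant -e makes the minimum reportable score
grow with m, which would explain why longer inputs lose weaker HSPs. Scaling
-e by m cancels the m term, so the score cutoff no longer depends on the query
length. Fixing -Y should have a similar effect from the other side, assuming
blastall substitutes that value for the computed search space in the E-value
calculation.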

Could you tell us whether we are on the right track?  What is the right
approach for setting a uniform sensitivity across all inputs?

Many thanks for your help in advance.

           Lik Mui