[Bioclusters] sensitivity & blast

Thu Apr 7 03:35:33 EDT 2005

Chris and Pam,

Thanks for your insights in the emails.

About what we are trying to do: we are trying to select 70mer DNA oligos for
microarrays.  We want the "best" oligo set, one that (1) minimizes
cross-hybridization with non-self sequences in the genome while (2) maximizing
target binding.

The troubling point which led to my earlier question is:

(1) from results based on feeding query sequences of varying length to
blastall, we select 70mers based on the 2 goals above

(2) when we feed the 70mers into blastall again, we get different HSPs from
those in the first run, even though the e-value is fixed at the default of 10.

From your feedback, setting the "-Y" value seems to be a sensible way to
remove the dependence on the input size.  But won't restricting the search
space this way reduce the probability of finding the best HSPs?
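
For concreteness, here is a minimal sketch of how the second run might be
invoked with a fixed search space.  The flags -p, -d, -i, -e and -Y are
blastall's own; the helper name, the file names and the value 1e9 are made up
for illustration, and the command is only built here, not run.

```python
def blastall_command(query_fasta, database, evalue=10.0, search_space=None):
    """Build a blastall (legacy NCBI BLAST) command line as an argument list.

    Hypothetical helper for illustration; it does not execute anything.
    """
    cmd = [
        "blastall",
        "-p", "blastn",      # nucleotide vs. nucleotide search
        "-d", database,      # formatted target database
        "-i", query_fasta,   # query sequences (e.g. the 70mers)
        "-e", str(evalue),   # expectation value threshold
    ]
    if search_space is not None:
        # -Y fixes the effective search space, so the E-value assigned to a
        # given score no longer depends on the length of each query.
        cmd += ["-Y", f"{search_space:.0f}"]
    return cmd


if __name__ == "__main__":
    # Same assumed search space for the long queries and for the 70mers.
    print(" ".join(blastall_command("oligos_70mer.fa", "genome", search_space=1e9)))
```

Using the same -Y value in both runs should make the two sets of E-values
comparable; which value is appropriate is a separate question.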

Also: because we know the expected value E depends on K*m*n*exp(-lambda*S)
(m = query length, n = database length, S = score), why not find a base E for
a given query length, and then vary the -e value by mE?
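
To make the scaling idea concrete, here is a rough sketch based on the
Karlin-Altschul form E = K*m*n*exp(-lambda*S).  The function names are
hypothetical, the K and lambda values are placeholders rather than the values
blastall would report, and the sketch uses raw lengths and ignores the
effective-length corrections that BLAST applies.

```python
import math


def expect_value(score, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S) for query length m, database length n."""
    return K * m * n * math.exp(-lam * score)


def score_cutoff(evalue, m, n, K, lam):
    """Raw score S at which the expected value equals the given threshold."""
    return math.log(K * m * n / evalue) / lam


def scaled_evalue(base_evalue, base_m, m):
    """Scale a base threshold linearly with query length (the proposed rule)."""
    return base_evalue * m / base_m


if __name__ == "__main__":
    K, lam = 0.7, 1.3   # placeholder statistical parameters (assumed)
    n = 3_000_000_000   # assumed database size in bases
    base_m, base_e = 70, 10.0

    for m in (70, 200, 700):
        e = scaled_evalue(base_e, base_m, m)
        s = score_cutoff(e, m, n, K, lam)
        print(f"m={m:4d}  -e {e:7.2f}  score cutoff {s:.3f}")
```

Under these assumptions, scaling the threshold in proportion to m keeps the
implied score cutoff the same for every query length, which is the uniform
sensitivity we are after.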

Chris, you mentioned that there are other tools we should look at.  Please
advise on this.

                  Lik

Quoting Chris Dwan <cdwan at bioteam.net>:
> > Could you suggest whether we are on the right track?  What is the right
> > approach to set a uniform sensitivity for all inputs?
>
> E-values already incorporate statistics to eliminate (normalize for) a
> number of factors, including query size.  Getting rid of that
> normalization is possible, but not necessarily a good idea unless you
> know exactly what you're doing.
>
> E values for identical HSPs grow with the product of the sizes of the
> query and the target set.  The rationale is that the same hit will be
> more and more likely to occur by random chance in a larger sample of
> sequence.  Said HSPs will be less and less statistically interesting as
> the query and the target set grow.
>
> This leads to your observation that you must increase the E-value
> threshold to keep getting the same hits.
>
> The question you seem to be asking is "find me all of the HSPs that fit
> some criterion, regardless of their statistical significance."  The
> question that BLAST is designed to answer is "find me most of the
> statistically significant HSPs for some particular search, and extend
> them to build up gapped local alignments."
>
> If you're willing to share your goal in running these searches, the
> list might be able to suggest alternative tools better suited to your
> problem.
>
> -Chris Dwan
>   The BioTeam
>
>