[Bioclusters] sensitivity & blast
L. Mui
lmui at stanford.edu
Thu Apr 7 03:35:33 EDT 2005
Chris and Pam,
Thanks for your insights in the emails.
About what we are trying to do: we are trying to select 70mer DNA oligos for
microarrays. We try to select the "best" oligo set, which (1) minimizes
cross-hybridization with non-self sequences in the genome while (2) maximizing
target binding.
The troubling point which led to my earlier question is:
(1) from the results of feeding query sequences of varying length to
blastall, we select 70mers based on the 2 goals above
(2) when we feed the 70mers into blastall again, we get different HSPs, even
though the E-value threshold is fixed at the default of 10 in both runs.
From your feedback, setting the "-Y" value seems to be a sensible way to
remove the dependence on the input size. Won't this restriction of the
search space reduce the probability of finding the best HSPs?
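
To make the "-Y" idea concrete, here is a rough sketch of how the same fixed search space could be passed to every run. The helper function, the file and database names, and the search-space value are hypothetical placeholders; only the blastall flags themselves (-p, -d, -i, -e, -Y, -o) are taken from the program.

```python
def blastall_command(query_fasta, database, evalue=10.0,
                     search_space=None, output=None):
    """Build a blastall argument list (hypothetical helper, nothing is executed)."""
    cmd = ["blastall", "-p", "blastn",
           "-d", database,
           "-i", query_fasta,
           "-e", str(evalue)]
    if search_space is not None:
        # -Y fixes the effective search space, so the reported E-value
        # no longer depends on the query length.
        cmd += ["-Y", str(search_space)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# Placeholder names and value, for illustration only.
cmd = blastall_command("oligos_70mer.fa", "genome_db",
                       search_space=1000000000, output="oligos.blast")
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) once for the long queries and once for the 70mers, with the same search_space value in both runs.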
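Also: because we know the expected number of hits has the form
E = K*m*n*exp(-lambda*S), where m is the query length, n the database size and
S the HSP score, why not find a base E for a given query length, and then
scale the (-e) threshold in proportion to m (i.e. by mE)?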
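
The scaling idea can be sketched as below. This is only an illustration under assumptions: the K and lambda defaults are placeholder values that would have to be read from the actual BLAST output, the function names are made up, and BLAST uses effective (length-adjusted) values of m and n, so a strictly proportional scaling is only approximate.

```python
import math


def expected_hits(score, m, n, K=0.711, lam=1.374):
    """E = K*m*n*exp(-lambda*S); K and lam are placeholder values."""
    return K * m * n * math.exp(-lam * score)


def scaled_threshold(base_e, base_len, query_len):
    """Scale the -e cutoff linearly with query length (the proposed 'mE' rule)."""
    return base_e * query_len / base_len


def min_score(e_cutoff, m, n, K=0.711, lam=1.374):
    """Smallest raw score S whose E-value is still <= e_cutoff."""
    return math.log(K * m * n / e_cutoff) / lam


n = 3_000_000_000  # hypothetical database size in bases
for m in (70, 300, 1000):
    cutoff = scaled_threshold(10.0, 70, m)
    # With a linearly scaled cutoff the minimum reportable score is the
    # same for every query length.
    print(m, cutoff, round(min_score(cutoff, m, n), 3))
```

Under this simplified model a fixed cutoff of 10 demands a higher score from longer queries, while the scaled cutoff keeps the required score constant.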
Chris, you mentioned that there are other tools we should look at. Please
advise on this.
Lik
Quoting Chris Dwan <cdwan at bioteam.net>:
> > Could you suggest whether we are on the right track? What is the right
> > approach to set a uniform sensitivity for all inputs?
>
> E-values already incorporate statistics to eliminate (normalize for) a
> number of factors, including query size. Getting rid of that
> normalization is possible, but not necessarily a good idea unless you
> know exactly what you're doing.
>
> E values for identical HSPs grow with the product of the sizes of the
> query and the target set. The rationale is that the same hit will be
> more and more likely to occur by random chance in a larger sample of
> sequence. Said HSPs will be less and less statistically interesting as
> the query and the target set grow.
>
> This leads to your observation that you must increase the E-value
> threshold to keep getting the same hits.
>
> The question you seem to be asking is "find me all of the HSPs that fit
> some criterion, regardless of their statistical significance." The
> question that BLAST is designed to answer is "find me most of the
> statistically significant HSPs for some particular search, and extend
> them to build up gapped local alignments."
>
> If you're willing to share your goal in running these searches, the
> list might be able to suggest alternative tools better suited to your
> problem.
>
> -Chris Dwan
> The BioTeam
>
>