[Bioclusters] sensitivity & blast

Pamela Culpepper pculpep at hotmail.com
Thu Apr 7 11:37:21 EDT 2005


>From: "L. Mui" <lmui at stanford.edu>
>Reply-To: "Clustering,  compute farming & distributed computing in life 
>science informatics" <bioclusters at bioinformatics.org>
>To: Chris Dwan <cdwan at bioteam.net>, pculpep at hotmail.com
>CC: "Clustering,  compute farming & distributed computing in life science 
>informatics" <bioclusters at bioinformatics.org>
>Subject: Re: [Bioclusters] sensitivity & blast
>Date: Thu,  7 Apr 2005 00:35:33 -0700
>Chris and Pam,
>Thanks for your insights in the emails.
>About what we are trying to do: we are trying to select 70mer DNA oligos for 
>microarrays.  We try to select the "best" oligo set which (1) minimizes
>cross-hybridization with non-self seq in genome while (2) maximizing target
>The troubling point which led to my earlier question is:
>(1) from results based on feeding query sequences of varying length to
>blastall, we select 70mers based on the 2 goals above
>(2) when we feed the 70mers into blastall again, we get different HSPs, even 
>though the e-value is fixed at the default 10.
>From your feedback, to remove the dependence on the input size, setting a 
>"-Y" value seems to be a sensible approach.  Won't this restriction of
>search space reduce the prob of finding the best HSPs?
>Also: because we know the expected E value depends on (kmn)(exp(-Ls)), why not 
>find a base E for a given query length, and then vary the (-e) value by mE 
>Chris, you mentioned that there are other tools we should look at.  Please
>advise on this.
>                   Lik
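The idea Lik describes (pick a base E cutoff at one query length, then vary -e in proportion to the query length m) could be sketched roughly as below. This is only an illustration under stated assumptions: the helper names, the database name and the file name are hypothetical, the linear scaling ignores the length corrections BLAST applies internally, and the code only builds a blastall argument list without running anything. The -p, -d, -i, -e and -Y options are the legacy blastall ones.

```python
def scaled_evalue_cutoff(base_cutoff, base_query_len, query_len):
    """Scale an E-value cutoff linearly with query length.

    Assumes E is proportional to m (the query length) for a fixed
    database and a fixed HSP score, as in E = K*m*n*exp(-lambda*S).
    """
    return base_cutoff * query_len / base_query_len


def blastall_args(query_path, db, cutoff, search_space=None):
    """Build (but do not run) a blastall command line.

    search_space, if given, is passed as -Y so that the same effective
    search space is used regardless of the query length.
    """
    args = ["blastall", "-p", "blastn", "-d", db,
            "-i", query_path, "-e", f"{cutoff:g}"]
    if search_space is not None:
        args += ["-Y", str(search_space)]
    return args


# Hypothetical example: a cutoff of 10 chosen for 1000-base queries,
# rescaled for a 70mer.
cutoff_70 = scaled_evalue_cutoff(10.0, 1000, 70)
cmd = blastall_args("oligos_70mer.fa", "genome_db", cutoff_70)
print(" ".join(cmd))
```

Scaling -e and fixing -Y are two routes to the same goal, so in practice one would probably use one or the other rather than both.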
>Quoting Chris Dwan <cdwan at bioteam.net>:
> > > Could you suggest whether we are on the right track?  What is the 
> > > approach to set a uniform sensitivity for all inputs?
> >
> > E-values already incorporate statistics to eliminate (normalize for) a
> > number of factors, including query size.  Getting rid of that
> > normalization is possible, but not necessarily a good idea unless you
> > know exactly what you're doing.
> >
> > E values for identical HSPs grow with the product of the sizes of the
> > query and the target set.  The rationale is that the same hit will be
> > more and more likely to occur by random chance in a larger sample of
> > sequence.  Said HSPs will be less and less statistically interesting as
> > the query and the target set grow.
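
To make the scaling concrete, here is a small sketch of the E = K*m*n*exp(-lambda*S) relation mentioned earlier in the thread, applied to one and the same HSP score at two query lengths. The K and lambda values, the database size and the score are placeholders chosen for illustration, not values taken from any real search, and BLAST itself works with corrected (effective) lengths that this sketch ignores.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=1.0):
    """Expectation E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values; the real ones depend on the
    scoring scheme used for the search.
    """
    return k * query_len * db_len * math.exp(-lam * score)


db_len = 3_000_000_000   # hypothetical target set size in bases
score = 30               # the same raw HSP score in both searches

e_long = expected_hits(score, 1000, db_len)   # HSP found with a 1000-base query
e_short = expected_hits(score, 70, db_len)    # same HSP found with a 70mer query

# The ratio of the two E-values equals the ratio of the query lengths.
print(e_long / e_short)
```

With a fixed -e threshold, an HSP whose E-value falls just under the cutoff for the 70mer can land above it for the longer query, which would be one way to get different hit lists from the two runs.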
> >
> > This leads to your observation that you must increase the E-value
> > threshold to keep getting the same hits.
> >
> > The question you seem to be asking is "find me all of the HSPs that fit
> > some criterion, regardless of their statistical significance."  The
> > question that BLAST is designed to answer is "find me most of the
> > statistically significant HSPs for some particular search, and extend
> > them to build up gapped local alignments."
> >
> > If you're willing to share your goal in running these searches, the
> > list might be able to suggest alternative tools better suited to your
> > problem.
> >
> > -Chris Dwan
> >   The BioTeam
> >
> >