[Bioclusters] sensitivity & blast
Pamela Culpepper
pculpep at hotmail.com
Thu Apr 7 11:37:21 EDT 2005
Try BLAT.
>From: "L. Mui" <lmui at stanford.edu>
>Reply-To: "Clustering, compute farming & distributed computing in life
>science informatics" <bioclusters at bioinformatics.org>
>To: Chris Dwan <cdwan at bioteam.net>, pculpep at hotmail.com
>CC: "Clustering, compute farming & distributed computing in life science
>informatics" <bioclusters at bioinformatics.org>
>Subject: Re: [Bioclusters] sensitivity & blast
>Date: Thu, 7 Apr 2005 00:35:33 -0700
>
>Chris and Pam,
>
>Thanks for your insights in the emails.
>
>About what we are trying to do: we are trying to select 70mer DNA oligos for
>microarrays. We try to select the "best" oligo set, which (1) minimizes
>cross-hybridization with non-self sequences in the genome while (2) maximizing
>target binding.
>
>The troubling point which led to my earlier question is:
>
>(1) from results based on feeding query sequences of varying length to
>blastall, we select 70mers based on the 2 goals above
>
>(2) when we feed the 70mers into blastall again, we get different HSPs when
>the E-value threshold is fixed at the default of 10.
>
>From your feedback, setting the "-Y" value seems to be a sensible way to
>remove the dependence on the input size. Won't this restriction of the
>search space reduce the probability of finding the best HSPs?
>
>Also: because we know the expected E-value depends on K*m*n*exp(-lambda*S),
>why not find a base E for a given query length, and then vary the (-e) value
>by mE?
>
>Chris, you mentioned that there are other tools we should look at. Please
>advise on this.
>
> Lik
>
>
>Quoting Chris Dwan <cdwan at bioteam.net>:
 > > > Could you suggest whether we are on the right track? What is the
 > > > right approach to set a uniform sensitivity for all inputs?
> >
> > E-values already incorporate statistics to eliminate (normalize for) a
> > number of factors, including query size. Getting rid of that
> > normalization is possible, but not necessarily a good idea unless you
> > know exactly what you're doing.
> >
> > E values for identical HSPs grow with the product of the sizes of the
> > query and the target set. The rationale is that the same hit will be
> > more and more likely to occur by random chance in a larger sample of
> > sequence. Said HSPs will be less and less statistically interesting as
> > the query and the target set grow.
> >
> > This leads to your observation that you must increase the E-value
> > threshold to keep getting the same hits.
> >
> > The question you seem to be asking is "find me all of the HSPs that fit
> > some criterion, regardless of their statistical significance." The
> > question that BLAST is designed to answer is "find me most of the
> > statistically significant HSPs for some particular search, and extend
> > them to build up gapped local alignments."
> >
> > If you're willing to share your goal in running these searches, the
> > list might be able to suggest alternative tools better suited to your
> > problem.
> >
> > -Chris Dwan
> > The BioTeam
> >
> >
>
>
>_______________________________________________
>Bioclusters maillist - Bioclusters at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
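
For readers following the thread, the relation Lik cites, E = K*m*n*exp(-lambda*S), can be sketched in a few lines of Python to show why a fixed E-value threshold admits different HSPs as the query length m changes, and what scaling the threshold with m would do. This is only a rough sketch: the LAMBDA and K values, the genome size, and all function names are illustrative assumptions rather than anything given in the thread, and the edge-effect (effective length) corrections that BLAST applies are ignored.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed for this sketch,
# not taken from the thread). Real values depend on the scoring scheme.
LAMBDA = 1.374
K = 0.711


def expected_hits(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """E = K * m * n * exp(-lambda * S), without length corrections."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def score_cutoff(e_threshold, query_len, db_len, lam=LAMBDA, k=K):
    """Smallest integer raw score S whose E-value is <= e_threshold."""
    return math.ceil(math.log(k * query_len * db_len / e_threshold) / lam)


def scaled_threshold(base_e, base_len, query_len):
    """E threshold for another query length that keeps the raw-score
    cutoff equal to the one implied by base_e at base_len."""
    return base_e * query_len / base_len


if __name__ == "__main__":
    db_len = 3_000_000_000  # hypothetical target size in bases
    for m in (70, 500, 2000):
        fixed = score_cutoff(10.0, m, db_len)
        scaled = score_cutoff(scaled_threshold(10.0, 70, m), m, db_len)
        print(f"m={m}: cutoff at fixed E=10 -> {fixed}, at scaled E -> {scaled}")
```

Under these assumptions, a fixed threshold demands a higher raw score from a longer query, so an HSP reported for a 70mer can drop out when the same region is searched as part of a longer sequence. Scaling the threshold in proportion to m keeps the raw-score cutoff constant, at the cost of giving up the statistical normalization Chris describes.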