Chris and Pam,

Thanks for your insights in the emails.

About what we are trying to do: we are trying to select 70mer DNA oligos
for microarrays. We try to select the "best" oligo set, which
(1) minimizes cross-hybridization with non-self sequence in the genome
while (2) maximizing target binding.

The troubling point which led to my earlier question is:

(1) From results based on feeding query sequences of varying length to
    blastall, we select 70mers based on the 2 goals above.
(2) When we feed the 70mers into blastall again, we get different HSPs
    with the E-value threshold fixed at the default of 10.

From your feedback, setting the "-Y" value seems to be a sensible way to
remove the dependence on the input size. Won't this restriction of the
search space reduce the probability of finding the best HSPs?

Also: because we know the expected E-value depends on
Kmn * exp(-lambda*S), why not find a base E for a given query length,
and then vary the (-e) value by mE?

Chris, you mentioned that there are other tools we should look at.
Please advise on this.

Lik

Quoting Chris Dwan <cdwan at bioteam.net>:

> > Could you suggest whether we are on the right track? What is the right
> > approach to set a uniform sensitivity for all inputs?
>
> E-values already incorporate statistics to eliminate (normalize for) a
> number of factors, including query size. Getting rid of that
> normalization is possible, but not necessarily a good idea unless you
> know exactly what you're doing.
>
> E values for identical HSPs grow with the product of the sizes of the
> query and the target set. The rationale is that the same hit will be
> more and more likely to occur by random chance in a larger sample of
> sequence. Said HSPs will be less and less statistically interesting as
> the query and the target set grow.
>
> This leads to your observation that you must increase the E-value
> threshold to keep getting the same hits.
>
> The question you seem to be asking is "find me all of the HSPs that fit
> some criterion, regardless of their statistical significance." The
> question that BLAST is designed to answer is "find me most of the
> statistically significant HSPs for some particular search, and extend
> them to build up gapped local alignments."
>
> If you're willing to share your goal in running these searches, the
> list might be able to suggest alternative tools better suited to your
> problem.
>
> -Chris Dwan
> The BioTeam
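
To make the E-value scaling discussed in this thread concrete, here is a
minimal Python sketch of one reading of the "base E" idea: pick an E
threshold for a reference query length, then scale it linearly with the
actual query length so that the implied raw-score cutoff stays the same.
It assumes the plain Karlin-Altschul form E = K*m*n*exp(-lambda*S). The
function names are hypothetical, the K and lambda values are only
illustrative (check the Lambda/K lines of your own BLAST report), and
the database size is made up. BLAST itself uses edge-corrected
"effective" lengths, so real E-values will not scale exactly this way.

```python
import math


def expected_hits(score, query_len, db_len, K, lam):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return K * query_len * db_len * math.exp(-lam * score)


def score_cutoff(e_threshold, query_len, db_len, K, lam):
    """Raw score S at which E equals e_threshold: S = ln(K*m*n / E) / lambda."""
    return math.log(K * query_len * db_len / e_threshold) / lam


def scaled_threshold(base_e, base_query_len, query_len):
    """Scale an E threshold in proportion to query length.

    For a fixed database size, E is proportional to m, so multiplying the
    threshold by m / m_base keeps the implied score cutoff unchanged.
    """
    return base_e * query_len / base_query_len


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from any real run).
    K, LAM = 0.711, 1.374
    DB_LEN = 3_000_000_000

    base_cutoff = score_cutoff(10.0, 70, DB_LEN, K, LAM)
    for m in (70, 500, 2000):
        e = scaled_threshold(10.0, 70, m)
        cutoff = score_cutoff(e, m, DB_LEN, K, LAM)
        print(f"m={m:5d}  -e {e:8.2f}  score cutoff {cutoff:.3f} "
              f"(base {base_cutoff:.3f})")
```

Under these assumptions the score cutoff printed for every query length
matches the 70mer baseline, which is the "uniform sensitivity" being
asked about. Whether that is preferable to fixing the search space with
-Y is a separate question that this sketch does not settle.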