[Bioclusters] sensitivity & blast

Chris Dwan cdwan at bioteam.net
Wed Apr 6 16:58:36 EDT 2005


BLAST is not a black box, and its function need not be determined by 
experiment:

- An excellent reference on the algorithm:  
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
- The source code:  ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.Z
- O'Reilly has published an entire book on BLAST; one of its authors is 
active on this list.

Yes, the search space defaults to the product of the query length (m) 
and the target set length (n).  The -Y option overrides that search 
space.
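A minimal sketch of the point above, under simplifying assumptions: it uses raw sequence lengths, whereas real BLAST uses *effective* lengths shortened by an edge-effect correction. The function names, the database size and the bit score are hypothetical, chosen only for illustration. It shows that by default the E value for the same hit grows with query length, while a fixed -Y value makes it identical across queries.

```python
from typing import Optional


def search_space(query_len: int, db_len: int, y: Optional[float] = None) -> float:
    """Search space used to turn a bit score into an E value.

    Assumption: raw lengths are used; BLAST itself applies an
    edge-effect (length) correction that this sketch ignores.
    """
    if y is not None:
        # -Y supplied: one fixed value is used for every query.
        return float(y)
    # Default: query length (m) times total target set length (n).
    return float(query_len) * float(db_len)


def e_value(bits: float, space: float) -> float:
    # E = search space * 2^(-bit score)
    return space * 2.0 ** (-bits)


if __name__ == "__main__":
    db_len = 1_000_000  # hypothetical database size, in residues
    bits = 30.0         # hypothetical bit score, same hit for both queries
    for m in (100, 1000):
        default_e = e_value(bits, search_space(m, db_len))
        fixed_e = e_value(bits, search_space(m, db_len, y=12345))
        print(f"m={m}: default E={default_e:.3g}, with -Y 12345 E={fixed_e:.3g}")
```

With the default search space, the tenfold longer query yields a tenfold larger E value; with -Y both queries get the same E value.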

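The alignment score depends only on the alignment itself and the scoring 
system (the substitution matrix plus gap penalties).
The bit score normalizes out the statistical parameters (lambda and K) 
that are specific to that scoring system.
The E value additionally normalizes out the query and target set sizes.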
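To make the three layers concrete, here is a sketch of the textbook Karlin-Altschul relations: bit score S' = (lambda*S - ln K) / ln 2 and E = K*m*n*exp(-lambda*S) = m*n*2^(-S'). It ignores the gapped-statistics and length-correction details of the real implementation. The (raw score, lambda, K) triples are made-up placeholders, not actual parameters for any matrix, and the function names are hypothetical. The check shows that once lambda and K are folded into the bit score, only the search space is left in the E value.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_from_raw(raw_score: float, lam: float, k: float, space: float) -> float:
    """Karlin-Altschul form: E = K * m * n * exp(-lambda * S)."""
    return k * space * math.exp(-lam * raw_score)


def e_from_bits(bits: float, space: float) -> float:
    """Same E value, with lambda and K already folded into the bit score."""
    return space * 2.0 ** (-bits)


if __name__ == "__main__":
    space = 1e8  # hypothetical m * n
    # Illustrative (raw score, lambda, K) triples; placeholders only.
    for raw, lam, k in ((50, 0.267, 0.041), (120, 0.110, 0.020)):
        bits = bit_score(raw, lam, k)
        print(f"raw={raw} bits={bits:.2f} "
              f"E(raw)={e_from_raw(raw, lam, k, space):.3g} "
              f"E(bits)={e_from_bits(bits, space):.3g}")
```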

Keep in mind as well:  BLAST is a heuristic algorithm with no 
knowledge of any structure beyond primary sequence.  If increased 
sensitivity is the goal, you will get much greater mileage by using an 
algorithm that takes structure into account, or one that uses more 
than pairwise alignments.

However, taken very literally, your answer is correct.  If the goal is 
to remove query length as a factor in E value, the "-Y" option is the 
way to go.

-Chris Dwan
  The BioTeam

On Apr 6, 2005, at 4:39 PM, Pamela Culpepper wrote:

> orks as follows.
> In the absence of -Y, the "effective search space" is the product of 
> the query sequence length and the total database length.  It affects 
> the calculation of the expectation value but not the score.  It will 
> thus vary with the query sequence length.
> Using "-Y 12345" sets the above "effective search space" to 12345, 
> constant for each query sequence.   To make the 


