[Bioclusters] sensitivity & blast

Pamela Culpepper pculpep at hotmail.com
Wed Apr 6 17:27:49 EDT 2005


Chris,

You might be interested in what we are working on --

http://www.lifeformulae.com

Pam

>From: Chris Dwan <cdwan at bioteam.net>
>Reply-To: "Clustering,  compute farming & distributed computing in life 
>science informatics" <bioclusters at bioinformatics.org>
>To: "Clustering,  compute farming & distributed computing in life science 
>informatics" <bioclusters at bioinformatics.org>
>Subject: Re: [Bioclusters] sensitivity & blast
>Date: Wed, 6 Apr 2005 16:58:36 -0400
>
>
>BLAST is not a black box, and its function need not be determined by 
>experiment:
>
>- An excellent reference on the algorithm:  
>http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
>- The source code:  ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.Z
>- O'Reilly published an entire book on BLAST, whose author is active on 
>this list.
>
>Yes, the search space defaults to the product of the query length (m) and 
>the target set length (n).  The -Y option overrides that search space.
>
>Alignment Score depends only on the alignments and the substitution matrix.
>Bit score normalizes for values specific to the substitution matrix.
>Expect value normalizes out query and target set size.
>
>Keep in mind as well:  BLAST is a heuristic algorithm with no knowledge of 
>any structure beyond primary sequence.  If increased sensitivity is the 
>goal, you will get much greater mileage by using an algorithm which takes 
>structure into account, or one which utilizes more than pairwise 
>alignments.
>
>However, taken very literally, your answer is correct.  If the goal is to 
>remove query length as a factor in E value, the "-Y" option is the way to 
>go.
>
>-Chris Dwan
>  The BioTeam
>
>On Apr 6, 2005, at 4:39 PM, Pamela Culpepper wrote:
>
>>orks as follows.
>>In the absence of -Y, the "effective search space" is the product of the 
>>query sequence length
>>and the total database length.  It affects the calculation of the 
>>expectation value but not the score.
>>It will thus vary with the query sequence length.
>>Using "-Y 12345" sets the above "effective search space" to 12345, 
>>constant for each query
>>sequence.   To make the
>
>_______________________________________________
>Bioclusters maillist  -  Bioclusters at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
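
For anyone following the thread, here is a minimal sketch of the arithmetic
discussed above, assuming the standard Karlin-Altschul relations: bit score
S' = (lambda*S - ln K) / ln 2, and E = (search space) * 2^(-S'). The lambda
and K values, the database length, and the raw score are illustrative
assumptions, not numbers from this thread. Real BLAST also applies
edge-effect length corrections, so its default search space is not exactly
m * n. The function names are made up for the example.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score using the scoring-system
    parameters lambda and K (Karlin-Altschul statistics)."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits, search_space):
    """Expected number of chance hits scoring at least `bits`
    in a search space of the given size."""
    return search_space * 2.0 ** (-bits)

# Illustrative assumptions only: typical-looking gapped parameters,
# a hypothetical 1 Mb target set, and a hypothetical raw score.
LAMBDA, K = 0.267, 0.041
DB_LEN = 1_000_000
RAW_SCORE = 100

bits = bit_score(RAW_SCORE, LAMBDA, K)

for query_len in (100, 1000):
    # Default behaviour described in the thread: search space = m * n,
    # so E grows with query length while the bit score stays the same.
    default_e = expect_value(bits, query_len * DB_LEN)
    # Equivalent of "-Y 12345": the search space is pinned to a constant,
    # so E no longer depends on query length.
    fixed_e = expect_value(bits, 12345)
    print(f"m={query_len:5d}  bits={bits:.1f}  "
          f"E(default)={default_e:.3g}  E(-Y 12345)={fixed_e:.3g}")
```

The bit score is identical for both query lengths. Only the default E value
changes, by the same factor as the query length, which is the behaviour the
"-Y" option removes.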



