[BiO BB] Inconsistent Blast Results

Tue Feb 12 10:50:46 EST 2008

Rebekah,

The reason for this is the way Blast calculates e-values.  The e-value is a function of the score: the higher the score, the lower the e-value.  The score drops as the alignment gets worse, and the e-value also depends on the length of the query sequence and the size of the database.  So, to obtain a lower e-value, say 1e-10, the alignment for the HSP must be better than the alignment for an HSP that generates an e-value of 1e-7.  If the alignment is allowed to be worse, chances are that more of the query sequence will show up in the HSPs, thus creating different output.  Because of the length dependence, a short query sequence that is 10% diverged from the hit will have a higher e-value than a query sequence 5 times longer with the same divergence.
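
To make the length dependence concrete, here is a rough sketch of the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length, n the database length, and S the raw score. The lambda and K values, the +1/-3 match/mismatch scoring, and the database size are illustrative assumptions, not values taken from any particular run (the real lambda and K are printed at the end of a Blast report). It also ignores gaps and the effective-length correction that Blast applies.

```python
import math

def raw_score(aln_len, divergence, match=1, mismatch=-3):
    """Ungapped raw score for an alignment with the given fraction of mismatches.

    The +1/-3 scoring is an assumed nucleotide scheme, used only for illustration.
    """
    mismatches = round(aln_len * divergence)
    return (aln_len - mismatches) * match + mismatches * mismatch

def evalue(score, query_len, db_len, lam=1.37, k=0.711):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    lam and k are assumed, illustrative constants.
    """
    return k * query_len * db_len * math.exp(-lam * score)

db_len = 1_000_000_000  # hypothetical database size in bases

# Same 10% divergence, aligned over the full query in both cases.
short_e = evalue(raw_score(100, 0.10), 100, db_len)
long_e = evalue(raw_score(500, 0.10), 500, db_len)

print(f"100 bp query: E = {short_e:.3g}")
print(f"500 bp query: E = {long_e:.3g}")
```

The longer query has a larger search space (m is 5 times bigger), but its score grows with the alignment length and enters the formula through the exponential, so its e-value comes out far lower.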

I hope this helps.  Unfortunately, comparing different e-values in Blast can be a little like comparing apples to oranges.  I have found that this can be circumvented by using a sliding e-value, that is, an e-value cutoff that changes with the length of the query.  You can use this to make sure all query sequences, regardless of length, meet a certain criterion, such as at least 50% similarity over the entire length of the query sequence.  It gets a little more complicated, but at least it is comparing apples to apples.
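
One possible reading of a sliding e-value, sketched under assumptions: for each query, compute the e-value that an ungapped alignment covering the whole query at some minimum identity would get, and use that as the cutoff for that query. The function names, the lambda and K constants, the +1/-3 scoring, and the 90% identity floor are all hypothetical choices for illustration, not a description of the exact procedure mentioned above.

```python
import math

def sliding_cutoff(query_len, db_len, min_identity=0.90,
                   match=1, mismatch=-3, lam=1.37, k=0.711):
    """E-value of a full-length ungapped alignment at min_identity.

    All scoring parameters and constants here are assumed, illustrative values.
    """
    mismatches = round(query_len * (1 - min_identity))
    score = (query_len - mismatches) * match + mismatches * mismatch
    return k * query_len * db_len * math.exp(-lam * score)

def passes(hit_evalue, query_len, db_len, **kwargs):
    """True if a hit is at least as good as the length-specific cutoff."""
    return hit_evalue <= sliding_cutoff(query_len, db_len, **kwargs)

db_len = 1_000_000_000  # hypothetical database size in bases

# The same e-value can be acceptable for a short query and not for a long one.
print(passes(1e-30, 100, db_len))
print(passes(1e-30, 500, db_len))
```

The identity floor has to be high enough that the expected score stays positive under the chosen scoring scheme; with +1/-3 scoring, an identity of 75% or less gives a score of zero or below, and the resulting cutoff is no longer meaningful.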

Thanks,
John Pace
PhD Candidate
University of Texas at Arlington

> Date: Fri, 8 Feb 2008 20:56:41 -0500
> From: rebekah.rogers at gmail.com
> To: bbb at bioinformatics.org
> Subject: [BiO BB] Inconsistent Blast Results
>
> Hi:
>
> I'm currently running blast 2.2.14 locally on my mac. I've noticed
> that the printout from a blastn run at an E cutoff of 10^-10 reads
> differently than a blast run at an E cutoff of 10^-7 when hits worse
> than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of
> 10^-11 appear that weren't there before and even the relative strength
> of different hits can change.
>
> I'm not certain I understand why this is true and it has a huge impact
> on my results. I know that the Eval is dependent on certain constants
> taken from the compared sequences, but I don't understand how this
> could possibly change when I'm using the exact same input file and
> database.
>
> Does anyone have an explanation?
>
> -Rebekah
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb