[Bioclusters] Blast Source

Tue, 15 Apr 2003 18:47:32 -0500 (CDT)

Depending on the exact goals of your analysis, BLAST can be a poor choice
for finding matches with very low sequence identity.  It was designed as,
and remains, an excellent, fast approximation to exhaustive pairwise
search (a la Smith & Waterman).
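
For reference, here is a minimal sketch of the exhaustive search that BLAST
approximates: a Smith-Waterman local alignment score computed by dynamic
programming. The function name and the scoring values (match, mismatch, and
a linear gap penalty) are assumptions chosen for illustration; real searches
use a substitution matrix and affine gap costs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (toy scores, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of the matrix is filled in, so the cost grows with
len(a) * len(b) for each pair compared. That is the work BLAST's seeding
heuristic is meant to avoid against a large database.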

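If you managed to turn the word size all the way down to 1 (or 2), you
would have essentially complete sensitivity, since nearly every position
would seed a hit.  Effectively, this would disable the heuristic by which
BLAST achieves its speedup.  Another parameter to play with is the
neighborhood word score threshold.  This parameter defines how close (in
alignment score) a word must be to a query word for BLAST to accept it as
a seed, alongside perfect matches, in the hit generation phase.  Note that
neighborhood words apply to protein searches; nucleotide BLAST seeds on
exact word matches only.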
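
To make the two knobs concrete, here is a rough sketch of seed generation.
It is not BLAST's implementation: all function names are made up for the
example, and toy_score (one point per identical position) stands in for the
substitution matrix a real search would use.

```python
from itertools import product


def words(seq, word_size):
    """Set of all overlapping words of length word_size in seq."""
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def exact_seed_hits(query, subject, word_size):
    """Words present in both sequences: the exact-match seeds."""
    return words(query, word_size) & words(subject, word_size)


def toy_score(a, b):
    """Hypothetical word score: +1 per identical position, 0 otherwise."""
    return sum(x == y for x, y in zip(a, b))


def neighborhood(word, alphabet, threshold):
    """All words of the same length scoring >= threshold against word."""
    return {
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(word))
        if toy_score(word, "".join(candidate)) >= threshold
    }


def neighborhood_seed_hits(query, subject, word_size, alphabet, threshold):
    """Subject words that fall in the neighborhood of some query word."""
    subject_words = words(subject, word_size)
    hits = set()
    for w in words(query, word_size):
        hits |= neighborhood(w, alphabet, threshold) & subject_words
    return hits


# Two 8-mers that differ only at one central position.
query, subject = "ACGTTGCA", "ACGATGCA"
for size in (5, 4, 3):
    print(size, sorted(exact_seed_hits(query, subject, size)))
print(sorted(neighborhood_seed_hits(query, subject, 5, "ACGT", threshold=4)))
```

With a word size of 5 these two sequences share no exact word at all, so
no seed is generated; shrinking the word size, or allowing neighbors one
mismatch away, brings the seeds back at the cost of many more candidate
hits to extend.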

Really though, if you're searching for interestingly distant pairs,
another methodology might be in order.  If the sequences in question share
only 25% identity, it's unlikely that you're going to find them above the
noise, even if you manage to turn the "sensitivity" knob on BLAST all the
way up.

Statistical methods like motifs, PSSMs, PSI-BLAST, and HMMER have all been
used to greatly increase the sensitivity of such searches over pairwise
techniques.  Beyond these are structurally based methods, which are
popping up all over the place these days, as we finally have enough good
structure data to construct meaningful patterns.

Good luck.

-Chris Dwan 
 Center for Computational Genomics and Bioinformatics
 University of Minnesota

> A colleague of mine is trying to use BLAST to find very loose matches.
> He would like to change the minimum seed length, which is hard-coded
> into BLAST, from 7 to 5.  Does anyone know a backdoor way, i.e. an
> undocumented parameter, to do this?  Or where we might find the BLAST
> source code so we can make this change manually?