[BiO BB] Papers/techniques on early culling of Smith-Waterman operation?

Thu Jun 15 19:32:16 EDT 2006

This sounds quite similar to a banded Smith-Waterman, which cuts off the
corners of the matrix using the assumption that the most significant score
will likely be close to the main diagonal. A banded alignment is used in
things like crossmatch, SSAHA, and some other programs.

Marty

On 6/15/06, Theodore H. Smith <delete at elfdata.com> wrote:
>
>
> Hi people,
>
> Do you know of any technique to end a Smith-Waterman operation early,
> by saying that if the operation cannot find a match of a certain
> desired minimum strength, then we won't do anymore? We'll end the
> operation early? We would at every row, check the "Best score we can
> possibly get based upon the current best score". If the "Best score
> we could possibly get" is below our desired score, we end the SW
> operation.
>
> Or do you know of any papers on this subject? Or even can you think
> of good keywords for me to search with? Perhaps the way I'm
> describing it isn't the best way.
>
> So, let's say you have a standard SW operation. You have a square
> Matrix, and you run from left to right across the matrix, line by
> line down the matrix. Let's say, the matrix has 90 rows, as we are
> trying to match a protein sequence of 90 letters.
>
> Now, is there anyway, some way, by logical analysis or guesswork, to
> tell perhaps by the time we get to row 70, we know that we'll not be
> able to get any better score matches than the one we got, and thus
> cull the SW operation? So that we don't search through rows 71 to 90?
> The numbers 70 or 90 don't matter, as long as we can check at every
> row, whether the "best match score" gets better or not.
>
> I have something like this for a global alignment producer, and it
> slices the time down to a small fraction!! Even though it only kills
> the last third or so, of the alignment, it still cuts down the time
> required down to less than 66%, perhaps even down to 20%.
>
> Basically, I've spent a few months, as a hobby, putting my inventive
> capacity into making a SmithWaterman-based, BLAST-like searcher. I do
> things like this because I can, although I think this is probably the
> most complex thing I'll ever work on in my entire life.
>
> My BLAST-like searcher, stores all of it's data in a tree (a "trie"
> actually). Protein sequences that are shared across many proteins,
> are stored in the root or earlier branches of the tree. So any work
> done on the early parts of the tree is reused across many many
> proteins. This is good, it saves CPU-time. Most of the work then, is
> done processing the leaves of the tree.
>
> This is why the last few rows of the SW Matrix are so important,
> because most of the work is done there to process them! This is why I
> want to shave the last few rows off, when possible. But of course,
> the idea is to process the entire leaf, if the "best score match" is
> going to get better. For that reason some kind of quick hueristic
> predictive trick would be desired, so that we don't cull good leaves.
>
> Any ideas anyone? This would mean so much to me if I can find a
> paper, or if anyone even knows what I am talking about and can give
> me some help.
>
> If anyone doesn't know what I am talking about but feels like helping
> should you understand me, please let me know that I need to explain
> this in a different better way?
>
> I've spent many months on this algorithm. I have a nearly identical
> algorithm that already works to do global alignmnents, it just
> doesn't do local alignments, and the global aligner is pretty quick
> actually. It's just the CPU-saving tricks I've done with global
> alignment, don't seem to cross over to local alignment, and I'm
> scratching my brain wondering if these past few months were a waste
> or not.
>
> Because of this "loss of tricks" (that worked with global alignment)
> situation, I am asking for new tricks, here!
>
> Thankfully, I was smart enough to do this work part-time, as a
> hobby :) I also have a proper normal job that doesn't rely on the
> success of proposed inventions to make me money. I decided it wasn't
> a good idea to starve myself just for an idea that might never
> produce anything worthwhile.
>
> Help anyone! Thanks a lot :)
>
> --
> http://elfdata.com/plugin/
>
>
>
> _______________________________________________
> Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>

-- 
-- 
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042
-----------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20060615/63da4ecc/attachment.html>