[BiO BB] Papers/techniques on early culling of Smith-Waterman operation?

Fri Jun 16 09:57:08 EDT 2006

Hi Martin,

Yes, this is exactly the kind of trick that I am looking for.

Interestingly enough, I did this "banding" already, with my global  
alignment algorithm, because I know for certain that I cannot miss  
any matches, due to the fact that a global alignment must align the  
entire string.

I actually was unaware of the term "banding". Instead I made up my  
own name :), I called it a "Diamond cut", because we'd "cut" away  
part of the matrix, to leave a diamond-like shape to process. It  
looked like this:

.........
xxxxx....
.xxxxx...
..xxxxx..
...xxxxx.
....xxxxx

Well, slightly diamond-like.

I believed that I could not do this banding with Smith-Waterman,  
because I could miss some good matches.

Now it turns out that some people are still doing this banding with  
Smith-Waterman? I will look into crossmatch, and SSAHA right away :)

Thanks very much for the tip!

On 16 Jun 2006, at 00:32, Martin Gollery wrote:

> This sounds quite similar to a banded Smith-Waterman, which cuts  
> off the corners of the matrix using the assumption that the most  
> significant score will likely be close to the main diagonal. A  
> banded alignment is used in things like crossmatch, SSAHA, and some  
> other programs.
>
> Marty
>
> On 6/15/06, Theodore H. Smith <delete at elfdata.com> wrote:
> Hi people,
>
> Do you know of any technique to end a Smith-Waterman operation early,
> by saying that if the operation cannot find a match of a certain
> desired minimum strength, then we won't do anymore? We'll end the
> operation early? We would at every row, check the "Best score we can
> possibly get based upon the current best score". If the "Best score
> we could possibly get" is below our desired score, we end the SW
> operation.
>
> Or do you know of any papers on this subject? Or even can you think
> of good keywords for me to search with? Perhaps the way I'm
> describing it isn't the best way.
>
> So, let's say you have a standard SW operation. You have a square
> Matrix, and you run from left to right across the matrix, line by
> line down the matrix. Let's say, the matrix has 90 rows, as we are
> trying to match a protein sequence of 90 letters.
>
> Now, is there anyway, some way, by logical analysis or guesswork, to
> tell perhaps by the time we get to row 70, we know that we'll not be
> able to get any better score matches than the one we got, and thus
> cull the SW operation? So that we don't search through rows 71 to 90?
> The numbers 70 or 90 don't matter, as long as we can check at every
> row, whether the "best match score" gets better or not.
>
> I have something like this for a global alignment producer, and it
> slices the time down to a small fraction!! Even though it only kills
> the last third or so, of the alignment, it still cuts down the time
> required down to less than 66%, perhaps even down to 20%.
>
> Basically, I've spent a few months, as a hobby, putting my inventive
> capacity into making a SmithWaterman-based, BLAST-like searcher. I do
> things like this because I can, although I think this is probably the
> most complex thing I'll ever work on in my entire life.
>
> My BLAST-like searcher, stores all of it's data in a tree (a "trie"
> actually). Protein sequences that are shared across many proteins,
> are stored in the root or earlier branches of the tree. So any work
> done on the early parts of the tree is reused across many many
> proteins. This is good, it saves CPU-time. Most of the work then, is
> done processing the leaves of the tree.
>
> This is why the last few rows of the SW Matrix are so important,
> because most of the work is done there to process them! This is why I
> want to shave the last few rows off, when possible. But of course,
> the idea is to process the entire leaf, if the "best score match" is
> going to get better. For that reason some kind of quick hueristic
> predictive trick would be desired, so that we don't cull good leaves.
>
> Any ideas anyone? This would mean so much to me if I can find a
> paper, or if anyone even knows what I am talking about and can give
> me some help.
>
> If anyone doesn't know what I am talking about but feels like helping
> should you understand me, please let me know that I need to explain
> this in a different better way?
>
> I've spent many months on this algorithm. I have a nearly identical
> algorithm that already works to do global alignmnents, it just
> doesn't do local alignments, and the global aligner is pretty quick
> actually. It's just the CPU-saving tricks I've done with global
> alignment, don't seem to cross over to local alignment, and I'm
> scratching my brain wondering if these past few months were a waste
> or not.
>
> Because of this "loss of tricks" (that worked with global alignment)
> situation, I am asking for new tricks, here!
>
> Thankfully, I was smart enough to do this work part-time, as a
> hobby :) I also have a proper normal job that doesn't rely on the
> success of proposed inventions to make me money. I decided it wasn't
> a good idea to starve myself just for an idea that might never
> produce anything worthwhile.
>
> Help anyone! Thanks a lot :)
>
> --
> http://elfdata.com/plugin/
>
>
>
> _______________________________________________
> Bioinformatics.Org general forum  -   
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>
>
>
> -- 
> -- 
> Martin Gollery
> Associate Director
> Center For Bioinformatics
> University of Nevada at Reno
> Dept. of Biochemistry / MS330
> 775-784-7042
> -----------
> _______________________________________________
> Bioinformatics.Org general forum  -   
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

--
http://elfdata.com/plugin/