[BiO BB] Understanding Smith-Waterman scoring

Theodore H. Smith delete at elfdata.com
Fri Feb 10 09:13:29 EST 2006

Hi people,

I'm trying to learn about Smith-Waterman. There is one thing I  
haven't seen answered in explanations of the Smith-Waterman algorithm.

How does it score alignments that come in sections? Does it give a  
penalty if a sequence must be split up?

For example, let's say I had the protein AAAABBBB, and I wanted to  
see how this scored against the protein BBBBAAAA. Let's ignore the  
fact that it can be reversed, for the moment, just so I can
understand how Smith-Waterman should work.

Now, what would the match score be? Let's assume that A to A has a  
score of 1 and B to B also has a score of 1. It's a really simple
example. So matching AAAABBBB to itself would give a SW score of 8.

What would matching BBBBAAAA to AAAABBBB give?

I'd expect it to generate two "sections", like this:



But what should the overall score be? Is it still 8? Or should we  
give a penalty because we've had to split this up? Is it normal for
alignment tools to give penalties to segmented sequences? Also, is
there some kind of "minimum length" that a Smith-Waterman based  
aligner would allow? Would it say that you can't have sections below  
a certain length? Are there any tools which let you specify such a  
minimum section length?
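
To make the first example concrete, here is a minimal Python sketch of
the textbook Smith-Waterman recurrence, score only. The mismatch score
of -1 and the linear gap penalty of -1 are my own assumptions (the
example above only fixes the match score at 1), and sw_score is just an
illustrative name, not taken from any particular tool.

```python
def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a against b (score only).

    mismatch and gap are assumed values; only match=1 comes from the example.
    """
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # extend diagonally (match or mismatch)
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


print(sw_score("AAAABBBB", "AAAABBBB"))  # 8
print(sw_score("BBBBAAAA", "AAAABBBB"))  # 4
```

With these assumed values the self-comparison scores 8, and BBBBAAAA
against AAAABBBB scores 4. The recurrence reports a single best local
alignment, and the AAAA and BBBB sections appear in opposite orders in
the two sequences, so one alignment cannot contain both of them.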

If you don't like that example above of AAAABBBB (as it can be  
reversed), then try this example. Assume all the residues get a score
of 1 against themselves. Take the protein ABCDEFGH: if I did a
Smith-Waterman score comparison against DCHABGEF, would the score still
be 8? After all, all the residues are there, just in a different order.

I would expect this to get a score of zero or below.
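
As a way of checking that expectation, here is a minimal Python sketch
that fills the Smith-Waterman matrix and then traces back from the
highest-scoring cell to recover the aligned section. As before, the
mismatch score of -1 and the linear gap penalty of -1 are assumptions
of mine, and sw_align is a hypothetical name used only for
illustration.

```python
def sw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    mismatch and gap are assumed values; only match=1 comes from the example.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:  # ties keep the first cell found in scan order
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the score drops to zero.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(sw_align("ABCDEFGH", "DCHABGEF"))  # (2, 'AB', 'AB')
```

Under these assumptions the score is 2, not 8: the longest runs the
two sequences share in the same order are AB and EF, and this sketch
reports AB only because it is found first. The max with 0 in the
recurrence also means the score can never go below zero.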

It's a really basic question, sorry about that!
