[BiO BB] Understanding Smith-Waterman scoring
Peter Rice
pmr at ebi.ac.uk
Fri Feb 10 09:22:39 EST 2006
Theodore H. Smith wrote:
> How does it score alignments that come in sections? Does it give a
> penalty if a sequence must be split up?
You get one alignment.
If more than one "section" aligns ... with the parts in the same order in both
proteins ... you can have a misaligned region and/or gaps in the sequences.
There are penalty scores for the misalignments and the gaps.
There is also a Smith-Waterman-Eggert variation of the algorithm that finds a
second, third, fourth ... alignment that excludes all those already reported.
Smith-Waterman is a local alignment method, so any unaligned parts of either
sequence do not count in the score.
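
As a rough illustration of the scoring described above, here is a minimal
Python sketch of the Smith-Waterman score with a linear gap penalty. The
function name sw_score and the values match=+1, mismatch=-1, gap=-1 are
assumptions chosen for this example, not anything the algorithm prescribes.
Real protein comparisons would normally use a substitution matrix and often
affine gap penalties.

```python
def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not standard defaults.
    """
    prev = [0] * (len(b) + 1)   # previous row of the score matrix
    best = 0
    for x in a:
        curr = [0]              # column 0 is always 0
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # residue of a aligned to a gap
            left = curr[j - 1] + gap  # residue of b aligned to a gap
            # Clamping at 0 is what makes the alignment local:
            # a poorly scoring prefix is simply dropped.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(sw_score("ACGTTACG", "ACGTACG"))     # 6: seven matches, one gap
print(sw_score("XXXACGTYYY", "ZZACGTWW"))  # 4: unaligned flanks cost nothing
```

In the first call the extra T costs one gap penalty. In the second, the
unmatched ends of both sequences contribute nothing, because a cell score is
never allowed to drop below zero.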
> What would matching BBBBAAAA to AAAABBBB give?
Either AAAA matching AAAA or BBBB matching BBBB (unless A matched against B
has a positive score, in which case other results are possible).
> I'd expect it to generate two "sections", like this:
No, but you will get the second section from the Smith-Waterman-Eggert
algorithm. Each will have its own local alignment score.
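
To make the idea of a second, separately scored alignment concrete, here is a
simplified Python sketch. It is not the published Smith-Waterman-Eggert
procedure: it recomputes the whole matrix on every round and simply forbids
the cells used by alignments already reported, which is slower but shows the
principle. The function name local_alignments, the parameter k, and the
scoring values (+1, -1, -1) are all assumptions made for this example.

```python
def local_alignments(a, b, k=2, match=1, mismatch=-1, gap=-1):
    """Return up to k (score, segment_of_a, segment_of_b) tuples.

    Cells used by an earlier alignment are forced to 0 in later rounds,
    so each reported alignment uses a different set of cells.
    """
    used = set()      # (i, j) matrix cells consumed by earlier alignments
    results = []
    n, m = len(a), len(b)
    for _ in range(k):
        H = [[0] * (m + 1) for _ in range(n + 1)]
        best, best_pos = 0, None
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if (i, j) in used:
                    continue  # forbidden cell keeps score 0
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(0,
                              H[i - 1][j - 1] + s,
                              H[i - 1][j] + gap,
                              H[i][j - 1] + gap)
                if H[i][j] > best:
                    best, best_pos = H[i][j], (i, j)
        if best_pos is None:
            break             # nothing with a positive score is left
        # Trace back from the best cell until the score reaches 0.
        i, j = end_i, end_j = best_pos
        while H[i][j] > 0:
            used.add((i, j))
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        results.append((best, a[i:end_i], b[j:end_j]))
    return results


for score, seg_a, seg_b in local_alignments("BBBBAAAA", "AAAABBBB"):
    print(score, seg_a, seg_b)
```

With these assumed scores the two reported alignments are BBBB against BBBB
and AAAA against AAAA, each with its own score of 4. The scores are not added
together.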
> But what should the overall score be? Is it still 8? Or should we give
> a penalty because we've had to split this up? Is it normal for
> alignment tools to give penalties to segmented sequences? Also is there
> some kind of "minimum length" that a Smith-Waterman based aligner would
> allow? Would it say that you can't have sections below a certain
> length? Are there any tools which let you specify such a minimum
> section length?
> If you don't like that example above of AAAABBBB (as it can be
> reversed), then try this example. Assume all the proteins get a score
> of 1 against themselves. The protein: ABCDEFGH, if I did a Smith-
> Waterman score comparison against DCHABGEF, would the score still be 8?
> After all, all the proteins are there, just in a different order.
>
> I would expect this to get a score of zero or below.
Be careful not to confuse protein (the whole sequence) with amino acid or
residue (one character).
You will get at least 1 residue matching. Maybe more as some of the mismatches
will have a positive score.
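
For this particular pair, a small Python sketch shows how the result depends
on the scoring scheme. Everything here is an assumption made for
illustration: the function names, the gap penalty of -1, the identity scoring
(+1 for a match, -1 for a mismatch), and especially toy_similarity, which
pretends that C and G are "similar" residues. It is not a real substitution
matrix.

```python
def sw_score(a, b, score, gap=-1):
    """Best local alignment score; score(x, y) rates one residue pair."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            cell = max(0,
                       prev[j - 1] + score(x, y),  # align x with y
                       prev[j] + gap,              # gap opposite x
                       curr[j - 1] + gap)          # gap opposite y
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def identity(x, y):
    # Assumed scoring: +1 for identical residues, -1 otherwise.
    return 1 if x == y else -1


def toy_similarity(x, y):
    # Hypothetical: C and G count as a favourable mismatch (+1).
    if {x, y} == {"C", "G"}:
        return 1
    return identity(x, y)


print(sw_score("ABCDEFGH", "DCHABGEF", identity))        # 2
print(sw_score("ABCDEFGH", "DCHABGEF", toy_similarity))  # 4
```

With identity scoring the best local alignment is just AB against AB (or EF
against EF), giving 2 rather than 8 or 0. Once one mismatch is allowed a
positive score, ABC-DEF style alignments that bridge the two matching pairs
become worthwhile and the score rises to 4.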
Hope that helps. It is complicated :-)
Peter