[BiO BB] Understanding Smith-Waterman scoring

pmr at ebi.ac.uk
Fri Feb 10 12:32:32 EST 2006


>> Theodore H. Smith wrote:
> OK. I understand. The most popular tools in use today only find the
> best (or at least one) locally aligned section, but not all of them.
>
> Is this a problem in general? Or is it that multiple sections to be
> aligned, are quite rare in the kind of queries that biologists do today?

The algorithm guarantees one alignment, and it is always the "best"
(highest-scoring) one ... although in your AAAABBBB case there are two
possible answers with the same score. Changing the comparison matrix
(the scores for A:A, B:B and A:B) or the penalties for adding gaps
will of course change the scoring, and may give a different "best"
alignment.
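To make the effect of the scores concrete, here is a minimal Python
sketch of the Smith-Waterman fill step with a simple match/mismatch
score and a linear gap penalty. The values +2/-1/-2 and the name
sw_score are illustrative assumptions, not taken from any particular
tool. It returns the best local score and every cell where that score
is reached, so a tie shows up as more than one end point.

```python
from typing import List, Tuple


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1,
             gap: int = -2) -> Tuple[int, List[Tuple[int, int]]]:
    """Fill a Smith-Waterman matrix (linear gap penalty, illustrative scores).

    Returns the best local score and every (i, j) cell where it occurs,
    so equally good alignments are all visible.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a fresh local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # a[i-1] against a gap
                          H[i][j - 1] + gap)    # b[j-1] against a gap
            best = max(best, H[i][j])
    # Every cell that reaches the best score is the end of an optimal alignment.
    ends = [(i, j) for i in range(rows) for j in range(cols)
            if best > 0 and H[i][j] == best]
    return best, ends


print(sw_score("AAAA", "AAAABBBBAAAA"))
```

Here "AAAA" matches either run of A's equally well, so two end cells
share the top score of 8. Real tools use a full substitution matrix and
usually separate gap-open and gap-extend penalties rather than a single
gap cost, but the tie-breaking question is the same.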

There is also the closely related Needleman-Wunsch global alignment
algorithm. This guarantees one best alignment over the whole of both
sequences. In global alignment there are usually options to penalise gaps
at the ends of the sequences (end gaps are usually not penalised, as both
sequences are assumed to be incomplete). In local alignment
(Smith-Waterman) the alignment is just the region that is reported ...
there are no penalties for anything outside the aligned region (except
that extending the alignment further would always give a worse score).
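A comparable sketch of Needleman-Wunsch global scoring, with a flag for
leaving end gaps unpenalised, might look like this. Again the scores,
the flag name free_end_gaps and the function name nw_score are
illustrative assumptions. Compared with the local version there is no
zero floor, so the score covers both whole sequences; end gaps are
either charged along the first row and column, or made free by starting
those at zero and taking the best score on the last row or column.

```python
def nw_score(a: str, b: str, match: int = 2, mismatch: int = -1,
             gap: int = -2, free_end_gaps: bool = False) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_end_gaps:
        # Leading gaps are charged along the first row and column.
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # No zero floor here: the alignment must span both sequences.
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    if free_end_gaps:
        # Trailing gaps are free: take the best score on the last row or column.
        return max(max(F[n]), max(F[i][m] for i in range(n + 1)))
    return F[n][m]


print(nw_score("AAAA", "BBAAAABB"))
print(nw_score("AAAA", "BBAAAABB", free_end_gaps=True))
```

With end gaps penalised, the four matches only just pay for the four
end gaps (score 0); with free end gaps the same alignment scores 8.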

>> There is also a Smith-Waterman-Eggert variation of the algorithm
>> that finds a second, third, fourth ... alignment that excludes all
>> those already reported.
>
> Am I right in seeing that this isn't talked about as much as Smith-
> Waterman though? It sounds promising for the line of work I am doing,
> however. Thanks very much for telling me of Smith-Waterman-Eggert; it
> looks like a good lead.

Smith-Waterman is standard. There are utilities that report the
additional alignments, but users often fail to spot the possibility.
The first alignment reported is always the same for both methods.
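The Waterman-Eggert idea can be sketched as: find the best local
alignment, forbid every residue pair it aligned, and search again, so
that each later alignment shares no aligned pair with those already
reported. The sketch below recomputes the whole matrix each round for
simplicity (practical implementations avoid that), uses the same
illustrative +2/-1/-2 scores, and waterman_eggert and its helpers are
hypothetical names.

```python
from typing import List, Set, Tuple

NEG_INF = float("-inf")
Cell = Tuple[int, int]
Hit = Tuple[int, int, int, int, int]  # (score, a_start, a_end, b_start, b_end)


def _fill(a: str, b: str, forbidden: Set[Cell],
          match: int, mismatch: int, gap: int) -> List[List[int]]:
    """Smith-Waterman fill in which forbidden (i, j) pairs may not be aligned."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A forbidden pair cannot be used as a diagonal (aligned) step.
            diag = NEG_INF if (i, j) in forbidden else H[i - 1][j - 1] + s
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def _best_alignment(a: str, b: str, H: List[List[int]], forbidden: Set[Cell],
                    match: int, mismatch: int, gap: int
                    ) -> Tuple[Hit, List[Cell]]:
    """Trace back from the first highest-scoring cell; return hit and pairs."""
    best, bi, bj = 0, 0, 0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    i, j, pairs = bi, bj, []
    while H[i][j] > 0:  # a positive cell is never on row or column 0
        s = match if a[i - 1] == b[j - 1] else mismatch
        if (i, j) not in forbidden and H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    # The alignment covers a[i:bi] and b[j:bj] (0-based, end-exclusive).
    return (best, i, bi, j, bj), pairs


def waterman_eggert(a: str, b: str, k: int = 3, match: int = 2,
                    mismatch: int = -1, gap: int = -2) -> List[Hit]:
    """Report up to k local alignments that share no aligned residue pair."""
    forbidden: Set[Cell] = set()
    hits: List[Hit] = []
    for _ in range(k):
        H = _fill(a, b, forbidden, match, mismatch, gap)
        hit, pairs = _best_alignment(a, b, H, forbidden, match, mismatch, gap)
        if hit[0] <= 0:
            break  # nothing left with a positive score
        hits.append(hit)
        forbidden.update(pairs)
    return hits


for hit in waterman_eggert("AAAA", "AAAABBBBAAAA"):
    print(hit)
```

For "AAAA" against "AAAABBBBAAAA" the first round reports the same
alignment a plain Smith-Waterman run would find; the second round then
finds the other run of A's, which a single best-alignment search never
shows.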

> You might not be surprised to find out that I come from a software
> developer background. I won't make that mistake again.

Ah, in that case be careful what you describe as a "sequence" ...
mathematicians can have different ideas of what the word means :-)

>> You will get at least 1 residue matching. Maybe more as some of the
>> mismatches will have a positive score.

regards,

Peter Rice



