Multiple sequence alignment
From Bioinformatics.Org Wiki
Two common approaches to multiple sequence alignment (MSA) are progressive and iterative MSA. As the names imply, progressive MSA starts with a small alignment and progressively adds the remaining sequences to it, while iterative MSA realigns the sequences over multiple iterations of the process.
Progressive
Steps:
- Align every pair of sequences.
- Create a distance matrix from the pairwise alignments, with one distance per sequence pair.
- Create a phylogenetic “guide tree” from the distance matrix, placing the sequences at the terminal nodes.
- Start the alignment with the most similar pair of sequences.
- Use the guide tree to determine the next sequence (or group of sequences) to be added to the alignment.
- Preserve the gaps that are already in the alignment when adding to it.
- Repeat the previous two steps until every sequence has been added.
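The distance-matrix and guide-tree steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular program: the function names (`nw_align`, `distance`, `guide_tree_order`), the scoring values, the use of "1 minus fraction of identical columns" as the distance, and the average-linkage clustering are all choices made for the example. The sketch stops at the merge order and does not perform the profile alignment itself.

```python
from itertools import combinations


def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment (Needleman-Wunsch); returns two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def distance(a, b):
    """Assumed distance: 1 minus the fraction of identical alignment columns."""
    x, y = nw_align(a, b)
    identical = sum(1 for p, q in zip(x, y) if p == q)
    return 1.0 - identical / len(x)


def guide_tree_order(seqs):
    """Return the merges of an average-linkage guide tree, closest clusters first.

    Each merge is a pair of clusters; a cluster is a tuple of sequence indices.
    """
    dist = {frozenset((i, j)): distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def avg(c1, c2):
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    example = ["GATTACA", "GATTACA", "GATACA", "CCCGGGA"]
    for left, right in guide_tree_order(example):
        print("merge", left, "with", right)
```

The merge list is the order in which a progressive aligner would combine sequences or groups: the first entry is the most similar pair, and each later entry joins clusters that were built earlier.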
Progressive MSA is one of the fastest approaches. It is considerably faster than adapting pair-wise alignment to align all of the sequences simultaneously, which becomes a very slow process for more than a few sequences.
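To make the speed difference concrete, the arithmetic below compares table sizes. It assumes that the slow alternative is dynamic programming extended to all k sequences at once, which fills a k-dimensional table, and that every table in a progressive run is roughly (n+1) by (n+1) for sequences of length n. Both function names are hypothetical, and the progressive figure is only a rough estimate because alignments grow as gaps are added.

```python
def exact_dp_cells(k, n):
    """Cells in a k-dimensional dynamic-programming table for k sequences of length n."""
    return (n + 1) ** k


def progressive_cells(k, n):
    """Rough cell count for a progressive run.

    Counts one two-dimensional table per sequence pair (for the distance
    matrix) plus one per guide-tree merge, each assumed to be (n+1) x (n+1).
    """
    pairwise_tables = k * (k - 1) // 2
    merge_tables = k - 1
    return (pairwise_tables + merge_tables) * (n + 1) ** 2


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(k, exact_dp_cells(k, 100), progressive_cells(k, 100))
```

The simultaneous table grows exponentially with the number of sequences, while the progressive estimate grows only quadratically.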
One major disadvantage, however, is the reliance on a good alignment of the first two sequences. Errors there can propagate throughout the rest of the MSA. An alternative approach is iterative MSA (see below).
Iterative
In iterative MSA, the alignment is refined repeatedly: each iteration starts with the pair-wise re-alignment of sequences within subgroups, followed by the re-alignment of the subgroups to each other. The subgroups can be chosen via sequence relations on the guide tree, by random selection, and so on.
At heart, iterative MSA is an optimization method and may use machine learning approaches such as genetic algorithms and hidden Markov models. The disadvantages of iterative MSA are inherited from optimization methods: the process can get trapped in local optima and can be much slower than progressive MSA.
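The refinement loop can be sketched as a simple hill climber. This is an illustration under stated assumptions rather than the procedure of any specific tool: the subgroups are chosen at random, the objective is a sum-of-pairs score with made-up scoring values, a candidate is kept only if it scores strictly higher, and the names `sp_score`, `strip_gap_columns`, `align_groups` and `refine` are hypothetical. It skips the re-alignment within subgroups and only realigns the two subgroups to each other.

```python
import random


def pair_score(p, q, match=1, mismatch=-1, gap=-1):
    """Score one pair of aligned characters; two gaps facing each other score 0."""
    if p == '-' and q == '-':
        return 0
    if p == '-' or q == '-':
        return gap
    return match if p == q else mismatch


def sp_score(aln):
    """Sum-of-pairs score of an alignment (a list of equal-length gapped strings)."""
    return sum(pair_score(a[c], b[c])
               for i, a in enumerate(aln) for b in aln[i + 1:]
               for c in range(len(a)))


def strip_gap_columns(group):
    """Drop columns that contain only gaps within this subgroup."""
    keep = [c for c in range(len(group[0])) if any(s[c] != '-' for s in group)]
    return [''.join(s[c] for c in keep) for s in group]


def align_groups(g1, g2):
    """Align two sub-alignments column against column by dynamic programming."""
    n, m = len(g1[0]), len(g2[0])
    col1 = [[s[c] for s in g1] for c in range(n)]
    col2 = [[s[c] for s in g2] for c in range(m)]
    gap1, gap2 = ['-'] * len(g1), ['-'] * len(g2)

    def cs(x, y):  # score of one column of g1 placed against one column of g2
        return sum(pair_score(p, q) for p in x for q in y)

    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + cs(col1[i - 1], gap2)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + cs(gap1, col2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + cs(col1[i - 1], col2[j - 1]),
                          S[i - 1][j] + cs(col1[i - 1], gap2),
                          S[i][j - 1] + cs(gap1, col2[j - 1]))
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + cs(col1[i - 1], col2[j - 1]):
            out1.append(col1[i - 1]); out2.append(col2[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + cs(col1[i - 1], gap2):
            out1.append(col1[i - 1]); out2.append(gap2); i -= 1
        else:
            out1.append(gap1); out2.append(col2[j - 1]); j -= 1
    out1.reverse(); out2.reverse()
    rows1 = [''.join(col[r] for col in out1) for r in range(len(g1))]
    rows2 = [''.join(col[r] for col in out2) for r in range(len(g2))]
    return rows1 + rows2


def refine(aln, rounds=20, seed=0):
    """Randomly split the alignment (two or more rows) in two, realign, keep improvements."""
    rng = random.Random(seed)
    best = list(aln)
    for _ in range(rounds):
        idx = list(range(len(best)))
        rng.shuffle(idx)
        cut = rng.randint(1, len(idx) - 1)
        left, right = idx[:cut], idx[cut:]
        merged = align_groups(strip_gap_columns([best[i] for i in left]),
                              strip_gap_columns([best[i] for i in right]))
        candidate = [None] * len(best)
        for row, i in zip(merged, left + right):
            candidate[i] = row  # restore the original row order
        if sp_score(candidate) > sp_score(best):
            best = candidate
    return best


if __name__ == "__main__":
    start = ["GATTACA-", "-GATTACA", "GAT-ACA-"]
    result = refine(start)
    print(sp_score(start), sp_score(result))
    print("\n".join(result))
```

Because a candidate is accepted only when it improves the score, the loop can stall in a local optimum, and each round costs a full dynamic-programming pass, which illustrates both disadvantages noted above.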
Software
- Chimera - excellent molecular graphics package with support for a wide range of operations
- Clustal-W - the famous Clustal-W multiple alignment program
- Clustal-X - provides a window-based user interface to the Clustal-W multiple alignment program
- DCSE - a multiple alignment editor
- Friend - an Integrated Front-end Application for Bioinformatics
- Jalview - a Java multiple alignment editor
- Mauve - a multiple genome alignment and visualization package that considers large-scale rearrangements in addition to nucleotide substitution and indels
- ModView - a program to visualize and analyze multiple biomolecule structures and/or sequence alignments
- Musca - multiple sequence alignment of amino acid or nucleotide sequences; uses pattern discovery
- MUSCLE - a multiple alignment program reported by its author to be more accurate than T-Coffee and faster than Clustal-W
- SeaView - a graphical multiple sequence alignment editor
- ShadyBox - the first GUI-based WYSIWYG multiple sequence alignment drawing program for major Unix platforms
- UGENE - contains a multiple alignment editor with the MUSCLE alignment algorithm integrated
- BSEdit - a multiple DNA/RNA/protein sequence editor for Windows XP/Vista/Windows 7