MSAReveal Help

Getting Started:

Features:

Specifications

Errors Detected and Reported

Optional Advanced Features:

Credits
Versions

Overview

Collect amino acid sequences, e.g. from UniProt.Org. Instructions are provided.
Align sequences. Instructions are provided for doing this with Jalview, which is free, straightforward, and powerful. MSAReveal does not align sequences.
Save the alignment in a file in FASTA format.
Open the alignment file, select all, copy, and paste into MSAReveal.
Press the button Process Sequences.

Still Learning 1-Letter Amino Acid Codes?

No problem! MSAReveal shows you the 3-letter abbreviation in a tooltip whenever you touch a one-letter code in the color scheme options, or in the sequence alignment listing. When you touch a one-letter code column header in the statistics table, the full name of the amino acid is shown.

And here is a handy reference chart.

How To Download FASTA Sequences

We recommend downloading FASTA sequences from UniProt.Org:

At UniProt.Org, use the search slot at the top to describe a sequence. Examples: "yeast gal4", "sulfurreducens pila", "human pla2g6".
In the list of hits, click on the Entry code (in the left column of the table) for the sequence you want. (We recommend viewing the entire entry to confirm this is what you want.)
Click on the blue Sequence button at the left side of the page.

To save a single sequence:
Click on the blue FASTA button.
Open your browser's File menu, and click Save Page As.
You may wish to rename the file to add the name of the protein or taxon. Keeping the file type ".fasta" is a good idea.

To save several sequences in one file:
Click on the blue button Add to basket.
When you have added all the desired sequences to your basket, scroll to the top of the page and click on the blue Basket button.
In the box that opens, click on Download.

Select Uncompressed and click Go.
Select Save File and click OK.

You can now open your saved FASTA file (a plain text editor would be ideal, see below), select all, copy, and paste into MSAReveal.

NOTE that your sequences are not yet aligned. See How To Align Sequences.

FASTA files are plain text. You can edit them with a plain text editor, for example to separate or gather sequences. A plain text editor is one that does not "mark up" the text with formatting codes. On Windows, use Notepad. On a Mac, use the free program TextWrangler. Avoid WordPad, Word, TextEdit, and other "word processor" programs, because it is often tricky to force them to save as plain text.
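If you prefer a script to hand editing, gathering sequences from several FASTA files can be sketched in a few lines of Python. This is only an illustration: the function names parse_fasta and to_fasta are made up for this example, and the two short sequences are invented stand-ins for the contents of two saved files.

```python
def parse_fasta(text):
    """Split plain FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records, width=60):
    """Write (header, sequence) pairs back out as FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Invented FASTA texts standing in for the contents of two saved files
file_a = ">seq1 example A\nMKLV\nAAGT\n"
file_b = ">seq2 example B\nMKIVAAGS\n"

combined = parse_fasta(file_a) + parse_fasta(file_b)
print(to_fasta(combined))
```

The printed text can be pasted into Jalview or saved as a single ".fasta" file. Remember that gathering sequences does not align them.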

How To Align Sequences

We recommend the free program Jalview because it is straightforward, and preserves the full UniProt headers (including genus and species). Jalview requires that free Java be installed on your computer. Alignments done in UniProt suffer from FASTA headers that have only the UniProt Accession Number, without the taxon (genus and species). Instructions for Jalview:

You will need files containing FASTA sequences that have been saved on your computer. See How To Download FASTA Sequences.
Run Jalview.
Drag a file containing one or more FASTA sequences and drop into Jalview. A window should appear that displays the sequence(s) at the top.
Drag additional files into the SAME window if you wish to add more sequences.
At the top of the window containing your sequences, click on Web Service and then click on Alignment.
Choose an alignment algorithm (such as MAFFT, MUSCLE, or TCOFFEE) and click on "with defaults".
A second window opens and the alignment is performed. If you have many or long sequences, this might take a while.

A third window titled "So and so alignment" opens when the alignment is completed.
Open the File menu at the top left of the third window, and choose "Save As". You may want to double-click on Desktop to save it there temporarily. Use FASTA format, and name the file appropriately.
Your saved alignment is now ready to open (a plain text editor would be good), select all, copy and paste into MSAReveal.

Specifications:

Options:

Options (preferences) are remembered automatically between sessions, unless you have disabled "cookies" in your browser.

Sequences:

There is no maximum sequence length or maximum number of sequences. Tests have included human titin (34,350 amino acids) and an alignment with 99 sequences of length 345.
Various error conditions are detected and reported.
A number of sample sequence alignments (and one unaligned set) are provided. Press the button "Show Demos & Tests" above the sequence input box.

Headers:

UniProt headers work best but other header formats can be used.
Header formats can be mixed in the same group of sequences.
Genus and species will be tabulated when given in the header following "OS=" (UniProt format).
UniProt 6 or 10 character Accession Codes are detected (regardless of the surrounding characters) and tabulated with links to UniProt. UniParc Identifiers (beginning "UPI") are also used. If none of these are found, UniProt Entry Names are looked for.
The gene name is tabulated when given in the header following "GN=" (UniProt format).
If a 4-character PDB Entry Code is added to a header following "PDB=", it will be tabulated in the Statistics table and linked to display the 3D model in FirstGlance in Jmol. Demo: 9: Pilins.
When a description of an alignment is added to a header, it will be displayed above the sequences table.
When a description of an individual sequence is added to its header, it will be displayed when the Taxon of that sequence is touched with the mouse.
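To make the header rules above concrete, here is a rough Python sketch of how such fields could be pulled out of a header. It is not MSAReveal's own code. The accession pattern is the one UniProt publishes for its 6- and 10-character accession numbers; the helper names field and parse_header, the rule that a field value ends at the next "XX=" key, and the example header (including the placeholder PDB code "1abc") are assumptions made for illustration.

```python
import re

# UniProt's published pattern for 6- or 10-character accession numbers
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][0-9][A-Z0-9]{2}[0-9]){1,2}"
)


def field(header, key):
    """Return the text after 'key=' up to the next 'XX=' key, '>>', or line end."""
    m = re.search(rf"\b{key}=(.*?)(?=\s+[A-Z]{{2,3}}=|\s*>>|$)", header)
    return m.group(1).strip() if m else None


def parse_header(header):
    os_value = field(header, "OS")
    return {
        "accessions": sorted(set(ACCESSION.findall(header))),
        # Assumption: genus and species are the first two words after OS=
        "taxon": " ".join(os_value.split()[:2]) if os_value else None,
        "gene": field(header, "GN"),
        "pdb": field(header, "PDB"),
    }


# Invented header in UniProt style; "1abc" is a placeholder, not a real structure
example = ("sp|P12345|EXMP_YEAST Example protein OS=Saccharomyces cerevisiae "
           "(strain XYZ) OX=4932 GN=EXM1 PE=1 SV=1 PDB=1abc")
info = parse_header(example)
print(info)
```

Searching for the accession pattern anywhere in the header, rather than at a fixed position, mirrors the statement above that accession codes are detected regardless of the surrounding characters.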

Output:

Sequences can be displayed in a single horizontally-scrolling table, or broken into multiple tables ("wrapped") of specified length (default 100 amino acids each).
Touching any amino acid reports its sequence number in a tooltip, counting the first amino acid as number one.
The statistics table can be sorted by any column. Row numbers remain intact and can be used to cross-reference between the sequences table and list of full headers. The table can be "unsorted" by sorting on the row number column.
A single color scheme for amino acids is provided in this version. Others can be added by contacting emartz@microbio.umass.edu.
The state of checkboxes (colors applied or not, output wrapped or not) and other preferences are remembered between sessions and runs (using browser "cookies").
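The wrapping and numbering described above can be sketched as follows. This is an illustration, not MSAReveal's implementation: the function names are invented, and it assumes that dashes (gaps) are skipped when counting sequence numbers, which the list above does not state explicitly.

```python
def wrap(aligned, width=100):
    """Break one aligned sequence into blocks of at most `width` columns."""
    return [aligned[i:i + width] for i in range(0, len(aligned), width)]


def residue_number(aligned, column):
    """1-based sequence number of the amino acid at a 0-based alignment column.

    Assumption: dashes are not counted. Returns None when the column is a gap.
    """
    if aligned[column] == "-":
        return None
    return column + 1 - aligned[:column].count("-")


blocks = wrap("A" * 250)            # default block length of 100
print([len(b) for b in blocks])     # three blocks: 100, 100, 50

print(residue_number("MK--LV", 4))  # 'L' is the third amino acid
```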

Consensus:

A consensus is shown below the sequence alignment. Touching any position (column) in the consensus reports the frequencies of amino acids and dashes in that column in a tooltip.

A Black upper case letters: 100% identical.
A Gray upper case letters: all but one (when 4-9 sequences), or >=90% (when 10 or more sequences).
a Gray lower case letters: >50% (when 3 or more sequences).
. Gray period ("dot"): "similar", >=90% in a single similarity group (therefore 100% in a single similarity group if there are fewer than 10 sequences).

Similarity groups:
ILMV AC (hydrophobic, not aromatic)
FYW (aromatic)
NQ ST Y (polar, not charged)
DEKR H (charged)
GP (P is helix-breaking; turns frequently include one or both)
Note that Y is included in both aromatic and polar, not charged.
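The consensus rules above can be expressed as a short Python sketch. It is one possible reading, not MSAReveal's actual code. The assumptions are: the rules are tried in the order listed (so a lower case letter takes precedence over a dot), each line of the similarity list is a single group, dashes count toward the number of sequences in the denominator, and a column matching no rule is left blank. The function name and the five short sequences are invented.

```python
from collections import Counter

# Assumption: each listed line is one similarity group (Y appears in two)
GROUPS = ["ILMVAC", "FYW", "NQSTY", "DEKRH", "GP"]


def consensus_symbol(column):
    """Return (symbol, color) for one alignment column, one character per sequence."""
    n = len(column)
    counts = Counter(c for c in column.upper() if c != "-")
    if not counts:
        return (" ", None)
    residue, top = counts.most_common(1)[0]
    if top == n:
        return (residue, "black")            # 100% identical
    if (4 <= n <= 9 and top == n - 1) or (n >= 10 and 10 * top >= 9 * n):
        return (residue, "gray")             # all but one, or >=90%
    if n >= 3 and 2 * top > n:
        return (residue.lower(), "gray")     # >50%
    for group in GROUPS:
        in_group = sum(counts[aa] for aa in group)
        # >=90% in one group; with fewer than 10 sequences this means all of them
        if 10 * in_group >= 9 * n:
            return (".", "gray")
    return (" ", None)


# Invented five-sequence alignment
alignment = ["MKISA", "MKLSA", "MKVT-", "MRMTG", "MKISP"]
columns = ["".join(col) for col in zip(*alignment)]
consensus = "".join(consensus_symbol(col)[0] for col in columns)
print(repr(consensus))
```

Integer comparisons such as 10 * top >= 9 * n are used instead of 0.9 * n to avoid floating-point rounding at the 90% boundary.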

Statistics:

The length of each sequence (exclusive of gaps/dashes) is given.
The length of the sequences in the alignment, including gaps/dashes, is given in the Consensus line below the aligned sequences.
The number and percentage of residues identical to the first ("Reference") sequence are given. For the percentage, the denominator is the length of the sequence being compared, regardless of whether the reference sequence is shorter.
Counts and percentages of various residues and groups of residues. More amino acids or groups can be added on request (emartz@microbio.umass.edu).
Net charge near neutral pH.
Number of gaps (groups of one or more consecutive dashes), dashes ("gapped" positions), and dashes as percentage of the length (denominator includes dashes).
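Several of the statistics above are simple counts, sketched here in Python. This is an illustration under stated assumptions, not the program's own arithmetic: net charge is taken as (K + R) minus (D + E) with histidine treated as uncharged, the identity percentage uses the ungapped length of the compared sequence as its denominator, and the function name and the two short sequences are invented.

```python
import re


def sequence_stats(aligned, reference):
    """Per-sequence statistics for one aligned sequence versus the reference."""
    residues = aligned.replace("-", "")
    length = len(residues)
    # Identical positions: same letter in both rows, ignoring gap-on-gap
    identical = sum(1 for a, r in zip(aligned, reference) if a == r and a != "-")
    dashes = aligned.count("-")
    positive = sum(residues.count(c) for c in "KR")
    negative = sum(residues.count(c) for c in "DE")
    return {
        "length": length,
        "identical": identical,
        "identical_pct": round(100 * identical / length, 1) if length else 0.0,
        "net_charge": positive - negative,   # assumption: His counted as neutral
        "gaps": len(re.findall(r"-+", aligned)),  # runs of consecutive dashes
        "dashes": dashes,
        "dashes_pct": round(100 * dashes / len(aligned), 1) if aligned else 0.0,
    }


reference = "MKDE-LRV"
row = sequence_stats("MK-EALKV", reference)
print(row)
```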

Errors Detected and Reported

The following conditions are detected and reported. Each of these can be demonstrated with one of the Demo tests provided.

No header. Demo: Header Missing.
Illegal characters not representing amino acids. Demo: Illegal Characters.
Nucleic acid sequence instead of protein sequence. Demo: DNA/RNA.
Legal but ambiguous amino acid characters BJOUXZ. Demo: 1: With Gaps, Ambiguous AA.
A single sequence containing gaps (dashes), hence not an alignment. Demo: 1: With Gaps, Ambiguous AA.
Alignment having sequences of different lengths. Demo: Mismatched Lengths.
Header containing more than one distinct 6- or 10-character UniProt Accession Number. Demo: Multiple accession numbers.
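The checks listed above could be approximated as in the following Python sketch. It is not how MSAReveal is implemented. The function name, the message wording, and the sample records are invented, and the nucleic acid test (a sequence made only of the letters A, C, G, T, U, N) is a crude heuristic assumed for this example.

```python
import re

AMINO = set("ACDEFGHIKLMNPQRSTVWY")
AMBIGUOUS = set("BJOUXZ")
NUCLEIC = set("ACGTUN")  # assumption: crude heuristic for DNA/RNA

# UniProt's published pattern for 6- or 10-character accession numbers
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][0-9][A-Z0-9]{2}[0-9]){1,2}"
)


def check(records):
    """records: list of (header, sequence); header is None when missing."""
    problems = []
    for i, (header, seq) in enumerate(records, start=1):
        letters = set(seq.upper()) - {"-"}
        if not header:
            problems.append(f"sequence {i}: no header")
        elif len(set(ACCESSION.findall(header))) > 1:
            problems.append(f"sequence {i}: multiple accession numbers")
        illegal = letters - AMINO - AMBIGUOUS
        if illegal:
            problems.append(f"sequence {i}: illegal characters {''.join(sorted(illegal))}")
        if letters and letters <= NUCLEIC:
            problems.append(f"sequence {i}: looks like a nucleic acid")
        elif letters & AMBIGUOUS:
            problems.append(f"sequence {i}: ambiguous amino acid codes")
    if len(records) == 1 and "-" in records[0][1]:
        problems.append("single sequence contains gaps, so it is not an alignment")
    if len({len(seq) for _, seq in records}) > 1:
        problems.append("sequences have different lengths")
    return problems


# Invented records that trigger most of the checks
problems = check([
    ("sp|P12345|Q9XYZ1 demo", "MKLV-A"),
    (None, "ACGTACGT"),
    ("ok", "MKXZ1"),
])
for message in problems:
    print(message)
```

Because a very short peptide such as "ACG" would also pass the nucleic acid test, a real check would likely need a minimum length or a composition threshold.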

3D Structures (PDB Codes)

When a sequence has an empirical 3D structure in the Protein Data Bank, you may add "PDB=xxxx" to the header, where xxxx is the PDB accession code. Such PDB codes will appear in a "3D" column in the Statistics table, linked to display the corresponding structures in FirstGlance in Jmol. The addition must come before any ">>" or ">>>" description. Example: Demo "9: Pilins".

>>> & >>: Descriptions

Group Descriptions: If you add, for example, ">>> Aligned by MAFFT" to the end of a header, this will be displayed above the table of sequences, with a light green background. Such a group description would normally be added to only one header in a group of sequences. If several headers contain ">>>", the descriptions will be concatenated. Example: Gal4 Demo.

Sequence Descriptions: If you add, for example, ">> Mutant Y57W" to the end of a header, then when you touch the Taxon in this row with the mouse, this sequence description will be shown above the table of sequences, with a pink background. Example: Gal4 Demo.
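Separating the two kinds of description from the rest of a header can be sketched in Python as below. This is an illustration only: the function name split_descriptions and the example header (with the placeholder PDB code "1abc") are invented. The sketch accepts the two markers in either order, and tests for ">>>" before ">>" so that the longer marker is not mistaken for the shorter one.

```python
import re


def split_descriptions(header):
    """Return (base header, group description, sequence description)."""
    # The capturing group keeps the markers in the result list:
    # [base, marker, text, marker, text, ...]
    parts = re.split(r"(>>>|>>)", header)
    base = parts[0].strip()
    group_desc, seq_desc = [], []
    for marker, text in zip(parts[1::2], parts[2::2]):
        target = group_desc if marker == ">>>" else seq_desc
        target.append(text.strip())
    return base, " ".join(group_desc), " ".join(seq_desc)


# Invented header; "1abc" is a placeholder PDB code
header = ("sp|P12345|EXMP_YEAST Example protein PDB=1abc "
          ">> Mutant Y57W >>> Aligned by MAFFT")
base, group_description, sequence_description = split_descriptions(header)
print(base)
print(group_description)
print(sequence_description)
```

Everything before the first marker, including any "PDB=xxxx" addition, stays in the base header.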

">>>" and ">>" can be in either order, but both must be at the end of the header.