Alphabets

There are 20 (21) amino acids, so the standard protein alphabet has 20 letters. For some purposes, however, a reduced alphabet is more useful: it has fewer letters, and each letter represents a physico-chemical grouping of several residue types. When requested, PeCoP uses the following seven-letter alphabet:

Representative letter   Physico-chemical property   Included residue types
F                       Hydrophobic                 A, V, L, I, M, C
R                       Aromatic                    F, W, Y, H
O                       Polar                       S, T, N, Q
T                       Positive                    R, K
N                       Negative                    E, D
P                       Proline                     P
G                       Glycine                     G
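
The grouping in the table above can be applied to a sequence with a simple lookup. The following is a minimal sketch, not PeCoP's own code: the names `GROUPS`, `RESIDUE_TO_GROUP` and `to_reduced` are made up for illustration, and mapping unrecognised symbols (gaps, ambiguity codes) to `X` is an assumption.

```python
# Seven-letter reduced alphabet, copied from the table above.
GROUPS = {
    "F": "AVLIMC",  # hydrophobic
    "R": "FWYH",    # aromatic
    "O": "STNQ",    # polar
    "T": "RK",      # positive
    "N": "ED",      # negative
    "P": "P",       # proline
    "G": "G",       # glycine
}

# Invert the table: residue type -> representative letter.
RESIDUE_TO_GROUP = {res: rep for rep, members in GROUPS.items() for res in members}


def to_reduced(sequence, unknown="X"):
    """Translate a 20-letter sequence into the seven-letter alphabet.

    Symbols that are not one of the 20 residue types are replaced by
    `unknown` (an assumption made for this sketch).
    """
    return "".join(RESIDUE_TO_GROUP.get(res, unknown) for res in sequence.upper())


print(to_reduced("MKVLAG"))  # prints FTFFFG
```

Note that the representative letters reuse one-letter amino-acid codes (for example, `F` here means "hydrophobic", not phenylalanine), so a translated sequence should always be labelled as such.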


Calculation of conservation based on different alphabets

The information content may be calculated from any of the alphabetical representations of the protein explained above.


Display the protein using different alphabetical representations

It is possible to calculate the positional information content based on one alphabet (e.g. 7-letter) and display it using another (e.g. 20-letter).
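
As an illustration of scoring on one alphabet and displaying on another, the sketch below scores each alignment column on the 7-letter alphabet and labels it with the most common 20-letter residue. It assumes a common definition of information content (log2 of the alphabet size minus the Shannon entropy of the column, in bits); this page does not state which formula PeCoP uses. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

# Seven-letter alphabet from the table in the "Alphabets" section.
GROUPS = {"F": "AVLIMC", "R": "FWYH", "O": "STNQ", "T": "RK",
          "N": "ED", "P": "P", "G": "G"}
TO_REDUCED = {res: rep for rep, members in GROUPS.items() for res in members}


def information_content(column, alphabet_size):
    """Assumed definition: log2(alphabet size) minus Shannon entropy, in bits."""
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def score_and_display(aligned_sequences):
    """Return one (label, bits) pair per alignment column.

    bits  : information content computed on the 7-letter alphabet
    label : most common residue of the column in the 20-letter alphabet
    """
    rows = []
    for column in zip(*aligned_sequences):
        reduced = [TO_REDUCED[res] for res in column]  # gap-free input assumed
        label = Counter(column).most_common(1)[0][0]
        rows.append((label, information_content(reduced, 7)))
    return rows


alignment = ["MKSL", "LRTL", "IKPL"]  # toy, gap-free alignment (made-up data)
for label, bits in score_and_display(alignment):
    print(f"{label}  {bits:.2f} bits")
```

In the first column the residues M, L and I all differ, yet they fall in the same hydrophobic group, so the column is fully conserved on the 7-letter alphabet while still being displayed with a 20-letter residue.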


Priors

Whether to use priors in information content calculations depends on the question being asked. If the question is "How conserved is amino acid X in position j?", then you should not use priors. However, if the question is "How conserved is amino acid X in position j, given its background distribution?" (in other words, "How surprised are we to see X in position j?"), then priors should be used.
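
The difference between the two questions can be sketched numerically. The example assumes that "without priors" means log2(20) minus the Shannon entropy of the column, and that "with priors" means the relative entropy (Kullback-Leibler divergence) of the column against a background distribution; both are common choices, but this page does not say which formulas PeCoP implements. The background frequencies are made-up numbers, not measured values.

```python
import math
from collections import Counter


def ic_without_priors(column, alphabet_size=20):
    """log2(alphabet size) minus Shannon entropy of the column, in bits."""
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def ic_with_priors(column, background):
    """Relative entropy of the column against `background`, in bits."""
    total = len(column)
    return sum((n / total) * math.log2((n / total) / background[res])
               for res, n in Counter(column).items())


# Hypothetical background: only the two residues used below are listed.
# A real table would cover all 20 residue types and sum to 1.
background = {"L": 0.10, "W": 0.01}

for column in ("LLLL", "WWWW"):
    print(column,
          f"no priors: {ics:.2f}" if (ics := ic_without_priors(column)) is not None else "",
          f"with priors: {ic_with_priors(column, background):.2f}")
```

Without priors, a fully conserved L and a fully conserved W score the same. With priors, the conserved W scores higher, because a residue that is rare in the background is more surprising to find in every sequence.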


First & last vs. Plurality

This option sets the method used to decide, in the final consensus, whether a position is conserved.
First & Last: a position is marked as conserved if it is conserved in both the first and the last PSI-BLAST iteration.
Plurality: a position is marked as conserved by a vote. If a given number of PSI-BLAST iterations mark it as conserved, then it is conserved. The number is set by the user in the next field.
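
The two rules can be sketched as set operations over the positions marked as conserved in each iteration. This is an illustration only: the function names and the example positions are hypothetical, and reading "a given number" as "at least that many iterations" is an assumption.

```python
from collections import Counter


def first_and_last(per_iteration):
    """Positions conserved in both the first and the last iteration.

    per_iteration: list of sets of conserved positions, one set per
    PSI-BLAST iteration, in order.
    """
    return per_iteration[0] & per_iteration[-1]


def plurality(per_iteration, min_votes):
    """Positions marked as conserved in at least `min_votes` iterations."""
    votes = Counter(pos for conserved in per_iteration for pos in conserved)
    return {pos for pos, n in votes.items() if n >= min_votes}


# Made-up conserved positions for three iterations.
iterations = [{3, 7, 12}, {3, 7}, {3, 12, 20}]

print(sorted(first_and_last(iterations)))          # prints [3, 12]
print(sorted(plurality(iterations, min_votes=2)))  # prints [3, 7, 12]
```

In this example position 7 is dropped by First & Last because it is missing from the last iteration, but it is kept by Plurality with two votes.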


E-value

The E-value ("Expect value") is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. Essentially, the E-value describes the random background noise that exists for matches between sequences. The E-value is used as a convenient way to create a significance threshold for reporting results.
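
To make the dependence on database size concrete, the sketch below uses the Karlin-Altschul form commonly given for BLAST statistics, E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw alignment score. This page does not give a formula, so treat it as background rather than a description of PeCoP; the values of `K` and `lam`, the hit list and the threshold are illustrative placeholders.

```python
import math


def expect_value(raw_score, query_len, db_len, K, lam):
    """Expected number of chance hits scoring at least `raw_score`."""
    return K * query_len * db_len * math.exp(-lam * raw_score)


# Placeholder statistical parameters; real values depend on the scoring system.
K, LAM = 0.1, 0.3

small_db = expect_value(50, query_len=100, db_len=1_000_000, K=K, lam=LAM)
large_db = expect_value(50, query_len=100, db_len=2_000_000, K=K, lam=LAM)
print(f"{small_db:.3g} {large_db:.3g}")  # doubling the database doubles E

# Using the E-value as a reporting threshold (made-up hits and cutoff).
hits = {"hit_a": 1e-30, "hit_b": 0.002, "hit_c": 4.5}
threshold = 0.01
reported = [name for name, e in hits.items() if e <= threshold]
print(reported)  # prints ['hit_a', 'hit_b']
```

The same raw score is therefore less significant in a larger database, which is why the threshold is set on the E-value rather than on the score itself.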