Representative letter | Physico-chemical property | Included residue types |
---|---|---|
F | Hydrophobic | A, V, L, I, M, C |
R | Aromatic | F, W, Y, H |
O | Polar | S, T, N, Q |
T | Positive | R, K |
N | Negative | E, D |
P | Proline | P |
G | Glycine | G |
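As a sketch, the grouping in the table can be written as a lookup that converts a 20-letter protein sequence into the 7-letter alphabet. The names `SEVEN_LETTER_GROUPS`, `TO_SEVEN` and `reduce_sequence` are hypothetical and not part of the tool. Keeping gaps unchanged and rejecting non-standard residues are also assumptions.

```python
# Hypothetical encoding of the table above: representative letter -> member residues.
SEVEN_LETTER_GROUPS = {
    "F": "AVLIMC",  # hydrophobic
    "R": "FWYH",    # aromatic
    "O": "STNQ",    # polar
    "T": "RK",      # positive
    "N": "ED",      # negative
    "P": "P",       # proline
    "G": "G",       # glycine
}

# Invert the groups into a residue -> representative-letter lookup (20 entries).
TO_SEVEN = {aa: rep for rep, members in SEVEN_LETTER_GROUPS.items() for aa in members}


def reduce_sequence(seq: str, gap: str = "-") -> str:
    """Translate a 20-letter protein sequence into the 7-letter alphabet.

    Gap characters are kept as-is; non-standard residues (e.g. X, B, Z)
    raise KeyError because the table does not assign them to a group.
    """
    return "".join(ch if ch == gap else TO_SEVEN[ch] for ch in seq.upper())


print(reduce_sequence("MKF-WD"))  # FTR-RN
```

Several representative letters (F, R, T, N, P, G) are also amino-acid codes, so a reduced sequence should not be passed through the lookup a second time.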
Calculation of conservation based on different alphabets
The information content can be calculated using any of the alphabetical
representations of the protein described above.
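To show why the choice of alphabet matters, here is a minimal sketch of per-column information content. It assumes a uniform background (no priors), where information content is log2 of the alphabet size minus the Shannon entropy of the column, and that gaps are ignored. The function name and the example column are illustrative, not the tool's own.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int) -> float:
    """Information content (bits) of one alignment column.

    Assumes a uniform background (no priors): log2(alphabet_size) minus the
    Shannon entropy of the observed residues. Gaps ('-') are ignored.
    """
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


# Four different hydrophobic residues in one column.
print(information_content("ILVM", 20))  # ~2.32 bits: looks variable
# The same column in the 7-letter alphabet: I, L, V and M all map to F.
print(information_content("FFFF", 7))   # ~2.81 bits = log2(7): fully conserved
```

The same position looks variable in the 20-letter alphabet but fully conserved in the 7-letter one, because every substitution stays within the hydrophobic group.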
Display the protein using different alphabetical representations
It is possible to calculate the positional information content using one
alphabet (e.g. the 7-letter alphabet) and display it using another (e.g. the 20-letter alphabet).
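One plausible reading of this option is sketched below. It assumes a logo-style display, where each residue is drawn with a height equal to its frequency times the column's information content. Under that assumption, the total stack height comes from the 7-letter calculation, and the letters drawn are the original 20-letter residues. `letter_heights` and the example columns are hypothetical, and the tool's actual rendering may differ.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int) -> float:
    """Uniform-background information content (bits); gaps ('-') are ignored."""
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column_20: str, column_7: str) -> dict:
    """Hypothetical display rule: the stack height comes from the 7-letter
    column and is split among the 20-letter residues by their frequencies."""
    total = information_content(column_7, 7)
    residues = [ch for ch in column_20 if ch != "-"]
    n = len(residues)
    return {aa: total * c / n for aa, c in Counter(residues).items()}


# I, I, L and V all belong to the hydrophobic group (F) in the 7-letter alphabet.
heights = letter_heights("IILV", "FFFF")
print(heights)  # I gets half of the log2(7)-bit stack, L and V a quarter each
```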
Priors
Whether to use priors in information content calculations is a subtle
question, and the answer depends on what you are asking. If the question is
"How conserved is amino acid X at position j?", then you should
not use priors. If, however, the question is
"How conserved is amino acid X at position j, given its
background distribution?" (in other words, "How surprised are we to
see X at position j?"), then priors should be used. With priors, a position
made up entirely of a residue that is rare in the background scores higher
than a position made up entirely of a common residue.
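The "with priors" case can be sketched as the relative entropy (Kullback-Leibler divergence) of the column against a background distribution. This is an assumption about how priors enter the calculation; a tool may instead apply priors as pseudocounts. With a uniform background, the result equals the prior-free value (log2 of the alphabet size minus the entropy), so priors only make a difference when the background is skewed. The four-letter toy alphabet and its background frequencies below are invented for illustration and are not real amino-acid frequencies.

```python
import math
from collections import Counter


def relative_entropy(column: str, background: dict) -> float:
    """Information content (bits) relative to a background distribution:
    sum over residues a of p(a) * log2(p(a) / q(a)). Gaps ('-') are ignored."""
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum((c / n) * math.log2((c / n) / background[a])
               for a, c in Counter(residues).items())


# Toy 4-letter alphabet with made-up background frequencies (illustration only).
uniform = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
skewed = {"A": 0.70, "B": 0.10, "C": 0.10, "D": 0.10}

print(relative_entropy("AAAA", uniform))  # 2.0 = log2(4), same as without priors
print(relative_entropy("AAAA", skewed))   # ~0.51: A is common, little surprise
print(relative_entropy("BBBB", skewed))   # ~3.32: B is rare, much more surprising
```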
First & last vs. Plurality
This option sets the method the final consensus uses to decide
whether a position is conserved.
First & Last: a position is marked as conserved if it is conserved
in both the first and the last PSI-BLAST iteration.
Plurality: a position is marked as conserved by vote. If the required
number of PSI-BLAST iterations mark it as conserved, it is marked as conserved
in the consensus. The user sets this number in the next field.
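The two consensus rules can be sketched as follows. `is_conserved` and its arguments are hypothetical. The sketch assumes each PSI-BLAST iteration has already produced a yes/no conservation call for the position. It also reads "the required number" in Plurality mode as "at least that many".

```python
def is_conserved(per_iteration: list, mode: str = "first_last", min_votes: int = None) -> bool:
    """Combine per-iteration calls (True = conserved), ordered from the first
    to the last PSI-BLAST iteration, into a single consensus call."""
    if not per_iteration:
        raise ValueError("need at least one iteration")
    if mode == "first_last":
        # Conserved in both the first and the last iteration.
        return bool(per_iteration[0] and per_iteration[-1])
    if mode == "plurality":
        # Conserved if at least min_votes iterations call it conserved.
        if min_votes is None:
            raise ValueError("plurality mode needs min_votes")
        return sum(per_iteration) >= min_votes
    raise ValueError(f"unknown mode: {mode!r}")


calls = [True, False, False, True]  # calls from four iterations
print(is_conserved(calls, "first_last"))              # True
print(is_conserved(calls, "plurality", min_votes=3))  # False: only 2 votes
```

The two rules can disagree: First & Last ignores the middle iterations entirely, while Plurality weighs every iteration equally.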
E-value
The E-value (expect value) is a parameter that describes the number
of hits one can expect to see by chance alone when searching a database of a
particular size. Essentially, the E-value describes the random background noise
in matches between sequences: the lower a hit's E-value, the less likely it is
to be a chance match. The E-value therefore provides a convenient significance
threshold for reporting results.
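The E-value follows the Karlin-Altschul statistics used by BLAST: E = K·m·n·e^(-λS). Here S is the raw alignment score, m and n are the query and database lengths, and λ and K are parameters of the scoring system. Equivalently, with the normalized bit score S' = (λS - ln K)/ln 2, E = m·n·2^(-S'). The sketch below is simplified: it ignores the effective-length (edge-effect) corrections BLAST applies. The λ, K, length, and score values in the example are placeholders, not real parameters for any scoring matrix or database.

```python
import math


def evalue_from_raw(raw_score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same E-value expressed with the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters and lengths, for illustration only.
lam, k = 0.3, 0.05
query_len, db_len = 300, 1_000_000
s = 60

e_small_db = evalue_from_raw(s, lam, k, query_len, db_len)
e_large_db = evalue_from_raw(s, lam, k, query_len, 2 * db_len)
print(e_large_db / e_small_db)  # ~2: doubling the database doubles the E-value
print(math.isclose(e_small_db, evalue_from_bits(bit_score(s, lam, k), query_len, db_len)))  # True
```

Because E grows linearly with database size, the same alignment score is less significant in a larger database. Hits whose E-value exceeds the chosen threshold are not reported.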