MODEL

Motivation

A significant number of proteins have been found experimentally to fold via a two-state process in which only the fully unfolded and native states are ever populated. Our approach assumes two states in the protein folding process: early-stage and late-stage folding. The two states are closely related to the idea of a reduced conformational space for early-stage protein folding. The complete conformational space becomes available only after the chain has reached the conformation determined in the first step of folding. The existence of this first step determines, to some extent, the search for the final native conformation.

Defining conformational subspace

The early-stage folding model was built on the characteristics of the polypeptide chain treated as a chain of rigid peptide planes, which can create shapes with different radii of curvature depending on the dihedral angles between two sequential peptide bond planes (V-angles). Polypeptide conformations can therefore be treated as helicoidal. The β-structure represents a helical structure characterised by an infinitely large radius of curvature. The V-angle can vary from 0° (parallel mutual orientation) to 180° (antiparallel orientation). A V-angle close to zero produces a very tightly wound helix; increasing the V-angle increases the radius of curvature R, which becomes infinitely large for the β-structure and extended conformations. Instead of the φ,ψ conformational space, the V-R conformational space can be used for polypeptide structure representation.

Using the geometric parameters defined above, the Ramachandran map can be interpreted as shown in Fig.1. Fig.1A shows the distribution of the radius of curvature R (on a log scale) for polypeptide fragments (pentapeptides) over the whole conformational space. The distribution of V-angles is shown in Fig.1B.
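The V-angle described above can be illustrated with a short sketch. It assumes that each peptide plane is given by three atom positions listed in a consistent order, and that V is the angle between the two plane normals, so that parallel planes give 0° and antiparallel planes give 180°. The text does not state which atoms define each plane, so the function names and the input convention here are hypothetical.

```python
import math


def plane_normal(a, b, c):
    """Unit normal of the plane through three points given as (x, y, z) tuples."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    # Cross product u x v is perpendicular to the plane.
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("the three points are collinear and define no plane")
    return tuple(x / length for x in n)


def v_angle(plane1, plane2):
    """Angle in degrees (0..180) between the normals of two peptide planes.

    Assumption: each plane is a triple of atom positions in the same atom order,
    so the normals of two identically oriented planes point the same way.
    """
    n1 = plane_normal(*plane1)
    n2 = plane_normal(*plane2)
    cos_v = sum(p * q for p, q in zip(n1, n2))
    # Clamp against rounding errors before calling acos.
    cos_v = max(-1.0, min(1.0, cos_v))
    return math.degrees(math.acos(cos_v))
```

Because the result depends on the order in which the three atoms are listed, the same order has to be used for every peptide unit along the chain.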
When only low-energy conformations (Fig.2A) are taken into account, the dependency of R (log scale) on the V-angle can be approximated by a second-degree polynomial function (Fig.2B and Eq.1).
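A second-degree polynomial approximation of this kind can be obtained by ordinary least squares. The sketch below is a generic quadratic fit using only the standard library; it is not the authors' fitting procedure and does not reproduce the coefficients of Eq.1. The function name `fit_quadratic` and the choice of inputs (V-angles as `xs`, log R values as `ys`) are assumptions made for illustration.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a*x**2 + b*x + c; returns (a, b, c).

    Solves the 3x3 normal equations by Gauss-Jordan elimination.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    # Power sums sum(x^k) for k = 0..4 and moments sum(y * x^k) for k = 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    for col in range(3):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("degenerate data: fewer than three distinct x values")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

With V-angles spanning 0° to 180°, rescaling the inputs (for example to V/180) before fitting keeps the normal equations better conditioned.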
The distribution of points that satisfy this dependency (Fig.2A) is shown in Fig.2C. A convenient way to search for low-energy conformations appears to be the ellipse-shaped curve shown in Fig.2D and given by Eq.2, which links all low-energy conformations (Fig.2E).
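To make the idea of an ellipse path on the Ramachandran map concrete, the following sketch parametrises a generic tilted ellipse in the φ,ψ plane and finds the sampled ellipse point closest to a given φ,ψ pair, using the 360° periodicity of both angles. The centre, semi-axes and tilt are caller-supplied placeholders, not the parameters of Eq.2, and the nearest-point lookup is only one plausible way of mapping a conformation onto the path.

```python
import math


def ellipse_point(t_deg, center, semi_axes, tilt_deg):
    """Point (phi, psi) on a tilted ellipse for parameter t, all in degrees."""
    t = math.radians(t_deg)
    tilt = math.radians(tilt_deg)
    a, b = semi_axes
    x = a * math.cos(t)
    y = b * math.sin(t)
    # Rotate by the tilt angle, then shift to the ellipse centre.
    phi = center[0] + x * math.cos(tilt) - y * math.sin(tilt)
    psi = center[1] + x * math.sin(tilt) + y * math.cos(tilt)
    return phi, psi


def angular_diff(a, b):
    """Smallest signed difference a - b on a 360-degree periodic axis."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_on_ellipse(phi, psi, center, semi_axes, tilt_deg, step_deg=10):
    """Parameter t (degrees) of the sampled ellipse point closest to (phi, psi)."""
    best_t = None
    best_d2 = float("inf")
    for t in range(0, 360, step_deg):
        e_phi, e_psi = ellipse_point(t, center, semi_axes, tilt_deg)
        d2 = angular_diff(phi, e_phi) ** 2 + angular_diff(psi, e_psi) ** 2
        if d2 < best_d2:
            best_t, best_d2 = t, d2
    return best_t
```

The default 10° step divides the path into 36 fragments, so each conformation is reduced to a single index along the ellipse.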
In consequence, a geometric analysis of the polypeptide chain limited to the simple R(V) representation suggests a limited conformational subspace, which can be treated as the early-stage folding (in silico) conformational subspace. This accords with the generally accepted assumption that the backbone conformation is responsible for the early-stage folding step.

Independent support for the model from elements of information theory

It turned out that the limited conformational subspace balances the amount of information stored in the amino acid sequence against the amount needed for structure prediction. The amount of information stored in a trinucleotide appeared to be comparable with the amount of information necessary to select one amino acid from among the twenty. The same idea was applied to the relation between the amount of information carried by an amino acid and the amount of information necessary to predict a particular conformation determined by the φ,ψ angles (a point on the Ramachandran map). The amount of information [bit] carried by a particular amino acid can be calculated using Shannon's equation:

I_{i} = -\log_{2} p_{i}    (Eq.3)
where p_{i} expresses the probability of the i-th amino acid's presence in a sequence (approx. p = 1/20). The amount of information necessary to predict a particular pair of φ,ψ angles can be calculated according to the same equation, with p equal to the probability of occurrence of these angles (which is much lower than 1/20). The entropy of information, which measures the average amount of information necessary to predict the structure with 1° × 1° precision, can be calculated for each amino acid as follows:

SE = -\sum_{i} p_{i} \log_{2} p_{i}    (Eq.4)
where p_{i} represents the probability of occurrence of particular φ,ψ angles (which can be calculated from the PDB). These quantities were calculated taking into account the different frequencies of particular amino acids and the probability distribution over the whole Ramachandran map. For the whole Ramachandran map, the large disproportion between the two quantities leaves the prediction problem unsolved. Limiting the conformational space (Ramachandran map) to a subspace (for example the ellipse path) balances them: the amount of information carried by an amino acid (calculated according to its frequency) and the amount of information (entropy of information, Eq.4) necessary to predict a 10° fragment of the ellipse become comparable (Tab.1).
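The two information measures discussed above can be sketched in a few lines. The code assumes the standard Shannon definitions (information as the negative base-2 logarithm of a probability, entropy as the probability-weighted average of that information) and takes binned occurrence counts as input; the function names and the binning are illustrative and do not reproduce the values of Tab.1.

```python
import math


def information_bits(p):
    """Shannon information, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)


def bin_probabilities(counts):
    """Convert occurrence counts per bin into probabilities."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution.

    Empty bins (p == 0) contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


# Information carried by one of twenty equally likely amino acids.
amino_acid_bits = information_bits(1 / 20)

# Upper bounds on the entropy, reached only for a uniform distribution:
# 36 ellipse fragments of 10 degrees versus 360 x 360 one-degree map cells.
ellipse_max_bits = entropy_bits(bin_probabilities([1] * 36))
map_max_bits = entropy_bits(bin_probabilities([1] * (360 * 360)))
```

For equally likely amino acids the information content is log2(20), about 4.32 bits. A uniform distribution over 36 ellipse fragments gives at most log2(36), about 5.17 bits, whereas a uniform distribution over all 1° × 1° cells would require about 17 bits. Real distributions are not uniform, so these figures are only upper bounds that illustrate the disproportion.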
Tab.1. The amount of information carried by an individual amino acid in relation to