Contacts are classified according to the value of the calculated frustration index. If the frustration index is 0.78 or higher [11], the contact is defined as 'minimally frustrated', meaning that other amino acid pairs in that position would be energetically unfavorable. If the frustration index is lower than -1, the contact is defined as 'highly frustrated', meaning that other amino acid pairs in that position would be energetically more favorable for folding than the native one. If the frustration index lies between these limits, the contact is defined as 'neutral'.
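The classification rule above can be written as a minimal Python sketch. The function name and the state labels are hypothetical, chosen only for illustration; the cutoffs are the ones quoted in the text, with "0.78 or higher" read as an inclusive bound and "lower than -1" as a strict one.

```python
def classify_frustration(fi: float,
                         minimal_cutoff: float = 0.78,
                         high_cutoff: float = -1.0) -> str:
    """Map a frustration index to one of the three states described in the text."""
    if fi >= minimal_cutoff:      # "0.78 or higher"
        return "minimally frustrated"
    if fi < high_cutoff:          # "lower than -1"
        return "highly frustrated"
    return "neutral"              # everything in between
```

For example, `classify_frustration(-1.5)` falls in the highly frustrated class, while a value of exactly -1 is treated as neutral under this reading of the limits.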
Frustratometer can calculate frustration in 3 modes, which differ in how the decoy set is generated: 2 for contacts, i.e. the mutational and configurational frustration indexes (FIs), and 1 for individual residues, the single residue frustration index (SRFI) [17].
To explain the results, we start with the SRFI. Let’s take the case of Im7 (link to the job example).
FrustraEvo first calculates frustration patterns for all the structures in the submitted dataset and maps the frustration values from each structure to the corresponding sequence in the MSA, generating a Multiple Sequence Frustration Alignment (MSFA).
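The mapping step can be pictured with a short Python sketch. It assumes (this is not stated in the text) that each structure provides exactly one frustration state per residue, in the same order as the non-gap characters of its aligned sequence. The function name and the MIN/NEU/MAX state codes are hypothetical and are not taken from FrustraEvo itself.

```python
def map_states_to_alignment(aligned_seq, states, gap="-"):
    """Build one MSFA row: a frustration state per aligned residue, gaps kept as gaps."""
    n_residues = sum(1 for ch in aligned_seq if ch != gap)
    if n_residues != len(states):
        raise ValueError("number of states must match the number of non-gap residues")
    remaining = iter(states)
    # Walk the aligned sequence; consume one state for every non-gap character.
    return [gap if ch == gap else next(remaining) for ch in aligned_seq]


def build_msfa(msa, states_by_sequence, gap="-"):
    """Apply the mapping to every sequence of the MSA (dict: name -> aligned sequence)."""
    return {name: map_states_to_alignment(seq, states_by_sequence[name], gap)
            for name, seq in msa.items()}
```

Real structures often have missing residues, so a production version would need an explicit residue-to-column mapping rather than the positional one assumed here.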
The next step in the algorithm uses Information Theory concepts to measure how conserved the frustration states are in each of the MSFA columns.
The calculation of evolutionary frustration is described here for the single residue FI, but it can be applied analogously to contacts. Given a Multiple Sequence Alignment (MSA) and the corresponding structure for each sequence in the MSA, we can map local frustration values from the structures to each aligned residue within the MSA. Evolutionary frustration refers to the quantification of how conserved these values are, obtained by calculating the Information Content (IC) for each MSA column using the Shannon information content formulas.
The background frequencies for the minimally frustrated, neutral and highly frustrated states are set to ~40%, ~50% and ~10%, respectively, in line with the frequencies observed by Ferreiro et al. [11] for the single residue, configurational and mutational indexes. The more conserved the frustration state is at a given MSA position, the higher its FrustIC, and vice versa.
The MSA is processed so that only columns that have an amino acid (not a gap) in the reference structure are kept. The Frustration Information Content (FrustIC) is then calculated for each column of the MSA from the distribution of frustration states, using the Shannon information content formulas. As a result, for each position of the ungapped MSA, the information content contribution of each frustration state is calculated. The total information content of a given position is the sum of the individual contributions from each frustration state.
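As an illustration of these two steps, the sketch below ungaps an MSFA against a reference row and then computes per-state contributions for one column. The exact formula used by FrustraEvo is not reproduced in this text, so the sketch assumes a generic background-corrected (relative entropy) form, p * log2(p / q) per state, which may differ from the tool's own normalization. The function names, the MIN/NEU/MAX codes and the dictionary layout are hypothetical; the background values are the approximate ones quoted above.

```python
import math
from collections import Counter

# Approximate background frequencies quoted in the text (~40%, ~50%, ~10%).
BACKGROUND = {"MIN": 0.4, "NEU": 0.5, "MAX": 0.1}


def ungap_by_reference(msfa, reference_id, gap="-"):
    """Keep only the columns where the reference row is not a gap."""
    keep = [k for k, state in enumerate(msfa[reference_id]) if state != gap]
    return {name: [row[k] for k in keep] for name, row in msfa.items()}


def column_frustic(column, background=BACKGROUND):
    """Return (per-state contributions, total) for one column of states.

    Assumed form: contribution = p * log2(p / q), total = sum of contributions.
    """
    observed = [s for s in column if s in background]  # gaps are ignored
    if not observed:
        return {s: 0.0 for s in background}, 0.0
    counts = Counter(observed)
    contributions = {}
    for state, q in background.items():
        p = counts[state] / len(observed)
        contributions[state] = p * math.log2(p / q) if p > 0 else 0.0
    return contributions, sum(contributions.values())
```

Under this assumed form, a state that is observed less often than its background frequency contributes a negative amount, and with a uniform background the total reduces to log2(3) minus the Shannon entropy of the column, which peaks at log2(3).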
In the previous figure, you can see a typical sequence logo and, below it, a frustration logo that is derived in the same way. The frustration logo is interpreted analogously to the sequence logo. Positions in the MSFA with high conservation of their frustration state display tall bars (the maximum theoretical height is log2(3)=1.58), while those with no conservation are closer to 0 (or even negative if the distribution of states does not follow what is expected from the background frequencies used by the algorithm). Each bar is made up of 3 stacked segments that correspond to the amount of information content that each state contributes to the total height of the bar. We consider a position to be energetically conserved if its Frustration Information Content (FrustIC) is higher than 0.5.
In the Im7 example we can easily see that Y55 and Y56 are in energetic conflict in a majority of the family members (tall, red bars) and therefore might be functionally relevant (these residues are known to be important for binding Colicin E7). Some other positions seem to be important for local stability, e.g. positions 19, 22, 37, 38, 53 and 54, which correspond to hydrophobic residues that are part of the hydrophobic core of the protein.
In contrast, some positions are not energetically conserved (small bars), with or without accompanying sequence variability. These bars are small because the distribution of frustration states in that MSFA column is heterogeneous. This heterogeneity can be caused by sequence variability at the position itself, by sequence variability in the contacting residues, or by nearby insertions or deletions. It is also possible for positions with high sequence conservation to have no frustration conservation. In this case, it might be useful to explore the conformational variability of the region, because disordered or highly flexible regions can adopt multiple conformations in the static structures, with heterogeneous frustration values even when the sequence is exactly the same.
As for the frustration logo, a reference structure is taken to define the contacts to be evaluated. Given that the MSA was ungapped according to the reference structure, the frequency of having a contact between columns i, j of the MSA is calculated over the structures in the dataset, where i, j ∈ [1, N] and N is the number of columns in the ungapped MSA. Subsequently, a Frustration Information Content is calculated for each possible i,j contact, using the distribution of frustration states observed across the structures that contain the i,j contact. As a result, for each possible contact (i.e. each pair of columns within the ungapped MSA), the information content contribution of each frustration state is calculated. The total information content of a given contact is the sum of the individual contributions from each frustration state.
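The bookkeeping behind this step can be sketched in Python as follows. The data layout is an assumption made for illustration: each structure is represented as a dictionary that maps a column pair (i, j) of the ungapped MSA to the frustration state of that contact, and a pair is simply absent when the structure lacks the contact. The function name and the MIN/NEU/MAX codes are hypothetical.

```python
from collections import Counter


def contact_statistics(reference_contacts, structures):
    """For each reference contact, return its frequency and its state counts.

    frequency    : fraction of structures in which the contact is present
    state_counts : frustration states observed in the structures that have it
    """
    n_structures = len(structures)
    stats = {}
    for pair in reference_contacts:
        states = [contacts[pair] for contacts in structures if pair in contacts]
        frequency = len(states) / n_structures if n_structures else 0.0
        stats[pair] = {"frequency": frequency, "state_counts": Counter(states)}
    return stats
```

The state counts of each pair are the input to the same information content calculation that is applied to single residue columns, restricted to the structures that contain the contact.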
In what follows we show the contact map that results from using the mutational frustration index.
In the case of contact maps, we pay attention to 3 things for each contact: 1) the Frustration Information Content (FrustIC, not shown in the figure), 2) the frustration state that contributes most to the total FrustIC (upper diagonal matrix) and 3) the proportion of structures in the submitted dataset that contain the contact. In our analysis we often focus on contacts with FrustIC values higher than 0.5 that are present in more than 50% of the structures.
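A minimal Python sketch of this selection is shown below. It assumes that each contact has already been summarised as a dictionary with hypothetical `frustic` and `frequency` keys; both cutoffs are applied as strict inequalities, following the wording "higher than 0.5" and "more than 50%".

```python
def select_conserved_contacts(contacts, min_frustic=0.5, min_frequency=0.5):
    """Keep contacts whose FrustIC and frequency both exceed the cutoffs."""
    return {
        pair: info
        for pair, info in contacts.items()
        if info["frustic"] > min_frustic and info["frequency"] > min_frequency
    }
```

Keeping the cutoffs as parameters makes it easy to relax or tighten the selection for datasets with few structures.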
We map residues with FrustIC>0.5 onto the structure and color them according to the most informative frustration state in the corresponding MSFA column. Residues with FrustIC<=0.5 (i.e. not energetically conserved) are colored in black. Similarly, we map contacts from both the mutational and the configurational frustration indexes onto the structure, keeping those with FrustIC>0.5 that are present in at least 50% of the structures in the submitted dataset. These 3 representations offer complementary ways to analyse the evolutionary constraints that are present in the dataset.
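The residue coloring rule can be sketched in Python as follows. The input is assumed to be the per-state information content contributions of one MSFA column, keyed by hypothetical MIN/NEU/MAX codes. Red for highly frustrated positions follows the "tall, red bars" mentioned for the logo; green and gray for the other two states are assumptions and may not match the colors used by the server.

```python
# Red is taken from the text; green and gray are assumed for illustration.
STATE_COLORS = {"MAX": "red", "MIN": "green", "NEU": "gray"}


def residue_color(contributions, cutoff=0.5):
    """Return the display color for one residue of the reference structure."""
    total = sum(contributions.values())
    if total <= cutoff:
        return "black"  # FrustIC <= 0.5: not energetically conserved
    most_informative = max(contributions, key=contributions.get)
    return STATE_COLORS[most_informative]
```

The resulting color per residue could then be passed to any molecular viewer to paint the reference structure.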