Folding and Proteins
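Folding occurs in solvent 🡪 in a polar solvent a protein has no alternative but to fold, and it does so spontaneously.
A protein is a complex system, because its properties cannot be derived from the sum of the physicochemical properties of its residues. Proteins are also social entities: they carry out their function by interacting with other molecules.
Proteins can be composed of more than one polypeptide chain; in this case we say they are oligomeric (multimeric) proteins, hetero-oligomers when the chains differ from each other. Note that every polypeptide chain is itself a heteropolymer, since it is built from different amino acids.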
Stabilizing interactions in proteins:
- Dipole-Dipole interactions: between groups with asymmetric electron distributions.
- Ion-Ion interactions: interactions between oppositely charged groups.
- Van der Waals interactions: mainly occur between non-polar groups.
- Hydrogen bonding.
- Disulfide bonds.
Structural classes of proteins:
1. All-alpha proteins: they contain at least 70% alpha helices
2. All-beta proteins
3. Alpha+beta proteins: alpha helices and beta sheets occur separately along the protein 🡪 beta sheets are therefore mostly antiparallel
4. Alpha/beta proteins: alpha helices and beta sheets alternate along the protein 🡪 beta sheets are therefore mostly parallel
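A minimal sketch of how the class definitions above could be checked on a secondary-structure string. It assumes a DSSP-style annotation where `H` marks helix residues and `E` marks strand residues; the function names and the example strings are hypothetical. Only the 70% helix threshold comes from these notes. The alternation count is just a rough indicator of whether helices and strands are interleaved (alpha/beta) or segregated (alpha+beta), not an official criterion.

```python
from itertools import groupby


def helix_fraction(ss: str) -> float:
    """Fraction of residues annotated as alpha helix ('H')."""
    return ss.count("H") / len(ss) if ss else 0.0


def segment_order(ss: str) -> str:
    """Collapse runs of identical symbols and keep only helix/strand segments."""
    return "".join(symbol for symbol, _ in groupby(ss) if symbol in "HE")


def alternations(order: str) -> int:
    """Number of times a helix segment is followed by a strand segment or vice versa."""
    return sum(1 for a, b in zip(order, order[1:]) if a != b)


# Hypothetical annotations: interleaved (alpha/beta-like) vs segregated (alpha+beta-like)
interleaved = "EEEE--HHHHHH--EEEE--HHHHHH--EEEE"
segregated = "EEEE--EEEE--EEEE--HHHHHH--HHHHHH"

print(segment_order(interleaved), alternations(segment_order(interleaved)))
print(segment_order(segregated), alternations(segment_order(segregated)))
print(helix_fraction("HHHHHHHH--") >= 0.70)  # all-alpha by the 70% rule of these notes
```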
Protein Identity: proteins that share at least 30% sequence identity generally have the same structure. This is an important statistic, because if we want to train a machine learning model, we want to avoid having many proteins with the same structure in the training set. The statistics section of the PDB reports the number of non-redundant structures at different sequence identity thresholds.
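A minimal sketch of redundancy reduction by sequence identity. It assumes the sequences are already aligned (rows of the same alignment, with `-` for gaps) and computes identity over the columns where neither sequence has a gap; other denominators are also in use, so this choice is an assumption. The greedy filter and all names are illustrative, and the result depends on the input order.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def non_redundant(aligned: dict, threshold: float = 30.0) -> list:
    """Greedy filter: keep a sequence only if it is below the threshold
    against every sequence kept so far."""
    kept = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy alignment
toy = {
    "p1": "ACDEFGHIKL",
    "p2": "ACDEFGHI-L",  # identical to p1 on all ungapped columns
    "p3": "WWWWWYYYYY",  # nothing in common with p1
}
print(non_redundant(toy))
```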
Dihedral angles
The most mobile parts of a protein backbone are the dihedral (torsion) angles Phi and Psi. The peptide bond is very rigid because it is stabilized by resonance, which gives it partial double-bond character, so rotation around it is restricted; the average length of the peptide bond is 1.32 Å. The possible dihedral angles of a polypeptide are represented in the Ramachandran plot. It shows the favoured, allowed, generously allowed and forbidden regions of dihedral angles for each residue. The Ramachandran plot has the Phi angle (in degrees) on the x axis and the Psi angle (in degrees) on the y axis. Each dot represents a residue.
The Phi angle (φ, drawn as a line through a circle) is the torsion angle around the bond between the nitrogen and the alpha carbon; the Psi angle (ψ, drawn as a trident) is the torsion angle around the bond between the alpha carbon and the carbonyl carbon.
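A minimal sketch of how a dihedral angle can be computed from four atom positions, using only the standard library. Phi is taken over the atoms C(i-1), N(i), CA(i), C(i) and Psi over N(i), CA(i), C(i), N(i+1), which is the usual backbone convention. The helper names are hypothetical and the coordinates in the usage lines are toy values, not real protein atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle (degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c) -> float:
    """Torsion around the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next) -> float:
    """Torsion around the CA-C bond."""
    return dihedral(n, ca, c, n_next)


# Toy geometry: the two outer bonds are perpendicular when viewed along the central bond
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `phi` and `psi` to every residue of a structure gives the (Phi, Psi) pairs that are plotted as dots in the Ramachandran plot.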
Protein surface
Van der Waals volume: the van der Waals volume of an atom is the volume occupied by that atom, modelled as a sphere 🡪 two non-bonded atoms cannot approach each other (to interact) closer than the sum of their van der Waals radii. In a covalent bond, however, the space occupied by the two atoms is less than the sum of their van der Waals volumes, because the two spheres overlap.
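A small worked example of the overlap in a covalent bond, treating the two atoms as intersecting spheres. The lens formula is the standard expression for the intersection volume of two spheres. The numbers in the usage lines (a carbon van der Waals radius of 1.70 Å and a C-C single bond of 1.54 Å) are commonly quoted values used here as assumptions, and the function names are illustrative.

```python
import math


def sphere_volume(r: float) -> float:
    return 4.0 / 3.0 * math.pi * r ** 3


def overlap_volume(r1: float, r2: float, d: float) -> float:
    """Volume shared by two spheres of radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:          # spheres do not touch
        return 0.0
    if d <= abs(r1 - r2):     # the smaller sphere lies inside the larger one
        return sphere_volume(min(r1, r2))
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))


def bonded_pair_volume(r1: float, r2: float, d: float) -> float:
    """Space occupied by two overlapping atoms: sum of the spheres minus the overlap."""
    return sphere_volume(r1) + sphere_volume(r2) - overlap_volume(r1, r2, d)


# Two carbon atoms joined by a single bond (assumed values, in Å)
r_c, bond = 1.70, 1.54
print(2 * sphere_volume(r_c))              # naive sum of the two volumes
print(bonded_pair_volume(r_c, r_c, bond))  # smaller, because the spheres overlap
```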
The solvent accessible surface is computed using a spherical probe (the sphere represents the solvent, so it has the van der Waals radius of a solvent molecule). The probe is rolled over the van der Waals surface of the protein, and the surface traced by the centre of the sphere is the solvent accessible surface.
The solvent excluded surface, instead, is closer to the real surface of the protein: it is the boundary that separates protein and solvent, traced by the inward-facing side of the probe, so it approximates the van der Waals surface of the protein.
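A minimal sketch of a numerical estimate of the solvent accessible surface, in the spirit of the Shrake-Rupley point-testing approach: each atom sphere is enlarged by the probe radius, covered with test points, and only the points not buried inside a neighbouring enlarged sphere are counted. The probe radius of 1.4 Å (a common choice for water), the number of test points and all names are assumptions for illustration. The double loop is quadratic in the number of atoms, so it is only suitable for small examples.

```python
import math


def sphere_points(n: int) -> list:
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2 * i + 1) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sasa(atoms: list, probe: float = 1.4, n_points: int = 200) -> list:
    """Per-atom accessible area (Å^2). atoms: list of (x, y, z, vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        radius = ri + probe  # surface traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + radius * ux, yi + radius * uy, zi + radius * uz
            # A point is buried if it falls inside any other enlarged sphere
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * radius ** 2 * exposed / n_points)
    return areas


# Toy system: two atoms of radius 1.7 Å placed 1.5 Å apart
print(sasa([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]))
```

For an isolated atom every test point is exposed, so the result reduces to the area of a sphere of radius (van der Waals radius + probe radius).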
Protein domains
A protein domain is a portion of a protein characterized by a set of secondary structures with a specific organization in space.
PFAM is a database: a large collection of protein families, each represented by a multiple sequence alignment (MSA) and a hidden Markov model (HMM). PFAM models are HMMs trained to recognize protein domains. It is the most used database for detecting domains in full-length proteins.
PFAM 🡪 HMMs and MSA for protein family representation
PROSITE 🡪 small domains, motifs and conserved/active sites. Sequence analysis
INTERPRO 🡪 meta database for annotation
PROSITE: A database that contains motifs and small domains. It focuses on active sites, binding sites, etc. It contains patterns (regular expressions) and profiles. It is not used for whole domains.
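Since PROSITE patterns are essentially regular expressions, a pattern can be translated into a standard regex and searched in a sequence. The sketch below handles only the basic syntax (elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repetitions, `<` and `>` as terminal anchors); the function name is hypothetical and rarer constructs are not covered. The pattern in the usage lines, `N-{P}-[ST]-{P}`, is the N-glycosylation site pattern, and the sequence is a toy example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split the element into its residue part and an optional (n) or (n,m) repeat
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex += prefix + core + repeat + suffix
    return regex


n_glyc = prosite_to_regex("N-{P}-[ST]-{P}")
print(n_glyc)
hit = re.search(n_glyc, "MKNGTAL")  # toy sequence
print(hit.group(), hit.span())
```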
INTERPRO: It is a meta-database that integrates many databases (PFAM and PROSITE for example). It is mainly used for functional annotation.
CATH: Class Architecture Topology/fold Homologous superfamily. It is a database resource that provides information on the evolutionary relationships of protein domains.
SCOP
SCOP 🡪 structural classification of domains:
Similar to the CATH and Pfam databases, SCOP (Structural Classification of Proteins) classifies the individual structural domains of proteins rather than entire proteins, which may contain several different domains. It focuses on the relationships between proteins and groups them into families starting from their structure. It has a hierarchical classification system.
Protein Families (SCOP):
Families: clearly evolutionarily related. Almost all proteins in a family share at least 30% sequence identity 🡪 below 30% sequence identity we can find both proteins that share the same structure and proteins with completely different structures. Sometimes proteins share the same structure even below 10% sequence identity, but we have to superimpose the structures to find out. The 30% threshold comes from the methods used in sequence alignment 🡪 below 30% sequence identity those methods cannot reliably infer that two proteins have the same structure. It is important to note that some families contain proteins with the same function but different structures 🡪 in this case, to decide which structure a protein of such a family has, we look at the length of the protein and see which of the structures in that family fits best.
Superfamily: groups 2 or more families with a probable common evolutionary origin, even if their sequence identity is low. Proteins from different families of a superfamily typically have sequence identity below 30%. Proteins in a superfamily have similar structures, and sometimes (not always) share function.
Fold: major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Having the same fold does not imply that the proteins share evolutionary history; it is purely a structural classification and may be the result of convergent evolution. Folds provide a useful way to understand the limited number of structural solutions used by nature.
Class: secondary structure-based classification (alpha proteins, beta proteins, alpha+beta, alpha/beta)
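The hierarchy above (class, fold, superfamily, family) can be sketched as a small data structure. The example assumes SCOP-style concise classification strings of the form `class.fold.superfamily.family` (for example `a.1.1.2`, where class `a` denotes all-alpha proteins); the function names and the identifiers in the usage lines are illustrative.

```python
LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> dict:
    """Split a string such as 'a.1.1.2' into its four hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError("expected four dot-separated fields: " + sccs)
    return dict(zip(LEVELS, parts))


def deepest_shared_level(a: str, b: str):
    """Most specific level at which two domains are classified together, or None."""
    pa, pb = parse_sccs(a), parse_sccs(b)
    shared = None
    for level in LEVELS:
        if pa[level] != pb[level]:
            break
        shared = level
    return shared


print(parse_sccs("a.1.1.2"))
print(deepest_shared_level("a.1.1.2", "a.1.1.3"))  # same superfamily, different family
print(deepest_shared_level("a.1.1.2", "a.1.2.1"))  # same fold only
```

Two domains that share only the fold level are structurally similar without any implied common ancestry, while sharing the superfamily or family level implies a probable or clear evolutionary relationship.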