PubMed:14514716 / 92-107
Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure, and folding.
We recently reported a statistical analysis of structural data on glycosidic linkages. Here we extend this analysis to the glycan-protein linkage and to the peptide primary, secondary, and tertiary structures around N-glycosylation sites. We surveyed 506 glycoproteins in the Protein Data Bank crystallographic database, giving 2592 glycosylation sequons (1683 occupied), and generated a database of 626 nonredundant sequons, of which 386 are occupied. Deviations from the expected amino acid composition were seen around occupied asparagines, particularly an increased occurrence of aromatic residues before the asparagine and of threonine at position +2. Glycosylation alters the asparagine side chain torsion angle distribution and reduces its flexibility. There is an elevated probability of finding glycosylation sites at points where the secondary structure changes. An 11-class taxonomy was developed to describe protein surface geometry around glycosylation sites. Thirty-three percent of the occupied sites are on exposed convex surfaces, 10% are in deep recesses, and 20% are on the edge of grooves with the glycan filling the cleft. A surprisingly large number of glycosylated asparagine residues have low accessibility. The incidence of aromatic amino acids brought into close contact with the glycan by the folding process is higher than their normal levels on the surface or in the protein core. These data have significant implications for the control of sequon occupancy and the evolutionary selection of glycosylation sites, and are discussed in relation to mechanisms of protein fold stabilization and regional quality control of protein folding. Hydrophobic protein-glycan interactions and the low accessibility of glycosylation sites in folded proteins are common features and may be critical in mediating these functions.
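The sequon counts above can be made concrete with a minimal sketch. It is an illustration only, not the authors' pipeline. It assumes the canonical N-glycosylation sequon definition (Asn-X-Ser/Thr, where X is any residue except proline), and the names `find_sequons` and `occupancy` are hypothetical.

```python
import re

# Assumed canonical sequon: Asn-X-Ser/Thr, with X any residue except Pro.
# The lookahead keeps the match zero-width after N, so overlapping sequons
# (for example the two in "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each sequon found
    in a one-letter amino acid sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


def occupancy(occupied, total):
    """Fraction of sequons that carry a glycan."""
    return occupied / total


if __name__ == "__main__":
    # Toy sequence, not taken from any real protein.
    print(find_sequons("MNGTANPSNNTS"))
    # Counts quoted in the abstract.
    print(round(occupancy(1683, 2592), 3))  # all surveyed sequons
    print(round(occupancy(386, 626), 3))    # nonredundant set
```

With the counts quoted in the abstract, this gives an occupancy of roughly 65% for all surveyed sequons and roughly 62% for the nonredundant set.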