PMC:1867812 / 9996-19168

    2_test

    {"project":"2_test","denotations":[{"id":"17407611-12422224-1692915","span":{"begin":896,"end":897},"obj":"12422224"},{"id":"17407611-10984515-1692916","span":{"begin":921,"end":923},"obj":"10984515"},{"id":"17407611-11243792-1692917","span":{"begin":924,"end":926},"obj":"11243792"},{"id":"17407611-12422224-1692918","span":{"begin":2276,"end":2277},"obj":"12422224"},{"id":"17407611-12422224-1692919","span":{"begin":3774,"end":3775},"obj":"12422224"},{"id":"17407611-12368107-1692920","span":{"begin":4107,"end":4109},"obj":"12368107"},{"id":"17407611-9545386-1692921","span":{"begin":4127,"end":4129},"obj":"9545386"},{"id":"17407611-10087919-1692922","span":{"begin":4678,"end":4680},"obj":"10087919"},{"id":"17407611-12422224-1692923","span":{"begin":4829,"end":4830},"obj":"12422224"}],"text":"2 Analyzing Protein Folding Trajectories\n\n2.1 Protein Folding Trajectories\nAdvances in high-performance computing technologies and molecular dynamics have led to successful simulations of folding dynamics for (small) proteins at the atomistic level [8]. Such simulations result in a large number of folding trajectories, each of which consists of a series of 3D conformations of the protein under simulation. These conformations are usually sampled regularly (e.g., every 200 fs) during a simulation. In this article, we also refer to each conformation as a folding frame or simply a frame. Furthermore, to represent a protein conformation, we adopt one of the commonly used representation schemes, in which a conformation is represented as a sequence of α-carbons (Cα) located in 3D space.\nIn this article, we focus on the folding trajectories of two mini proteins: BBA5 (Protein Data Bank ID) [9] and GSGS (or Beta3s) [10,11]. Such trajectories were produced by the Folding@home research group at Stanford University [12].\nBBA5 is a 23-residue protein that folds on a microsecond timescale. 
The native structure (or fold) of BBA5 shows a β-hairpin involving residues 1–10 and centered on residues 4–5. It also includes an α-helix involving the remaining residues 11–23. By convention, residues are numbered in increasing order from the N-terminus to the C-terminus of a protein. Figure 1(a) illustrates the native conformation of BBA5. The two folding trajectories, referred to as T23 and T24, respectively, are of different lengths. T23 consists of a series of 192 conformations (or frames), while T24 has 150 frames. Each conformation is described at the atomistic level in the PDB format used by the Protein Data Bank. GSGS (or Beta3s) is a 20-residue peptide with an average folding time of microseconds. Its NMR conformation shows a three-stranded anti-parallel β-sheet with turns at residues 6–7 and 14–15. Figure 2(a) depicts this native conformation. There are a total of 5 GSGS folding trajectories: T1, T2, T3, T4, and T5. The number of conformations in each trajectory is listed in Table 1. Similar to BBA5, each conformation corresponds to one PDB file. Pande et al. explained in detail the simulation model and methods employed to produce these trajectories [8,9].\nFigure 1 Different conformations of the small protein BBA5, where only the Cα atoms are shown. (a) The native NMR structure of BBA5 based on data from the SCOP website. (b) The initial conformation of both folding trajectories. (c) The last conformation in the first trajectory. (d) The last conformation in the second trajectory.\nFigure 2 Different conformations of the GSGS peptide, where only the Cα atoms are shown. (a) The native NMR conformation of GSGS. (b) The initial conformation in all five folding trajectories. (c) The last conformation in the 1st trajectory. (d) The last conformation in the 3rd trajectory. 
(e) The last conformation in the 5th trajectory.\nTable 1 A brief description of the GSGS folding trajectories.\nTrajectory ID Total number of conformations\nT1 25,664\nT2 30,075\nT3 19,649\nT4 25,263\nT5 25,664\n\n2.2 Comparing Conformations of BBA5 and GSGS Across Trajectories\nAlthough both trajectories of BBA5 start from the same extended conformation as shown in Figure 1(b), when we examine the visualized frames, they seem to represent two very different folding processes. Figures 1(c) and 1(d) illustrate the last frame in the two trajectories T23 and T24, respectively. This also applies to the five GSGS folding trajectories, where each starts with the same conformation (Figure 2(b)) but ends at a different conformation (Figures 2(c), 2(d), and 2(e)).\nThis seeming difference might be attributed to the stochastic nature of the folding simulation process [8,9]. However, it is also desirable to characterize the similarities (or dissimilarities) across multiple trajectories.\nTo compare two trajectories, one must address the following key issue: how can we compare two protein conformations? Several measures are commonly used for this purpose, including RMSD (root mean squared distance) [13], contact order [14], and native contacts [15]. However, all these measures are designed to quantify the global topology of a conformation. Furthermore, based on our empirical analysis of these measures, we found that they are generally too coarse and thus often misleading. Even more importantly, such measures fail to identify similar local structures (or motifs) between conformations. This is especially crucial for small proteins like BBA5. As demonstrated in both experimental and theoretical studies, small proteins often fold hierarchically and begin locally [16]. For instance, it has been shown that BBA5 tends to first form secondary structures such as β-turns and α-helices, then adopt its global topology [9]. 
Finally, as suggested by Pande [8], both sterics (local motifs) and global topology might play an important role in protein folding. Therefore, a reasonable comparison of (small) protein conformations should consider both local and global structure. Moreover, it should also take the native topology of the protein under study into account.\nTo meet these requirements, we propose the following two-step approach to compare conformations of BBA5. First, we partition the 23 residues of BBA5 into four fragments: (i) F1: the N-terminal β-hairpin, residues 1–10; (ii) F2: the C-terminal α-helix, residues 11–23; (iii) F3: the first half of F1 and the second half of F2; and (iv) F4: the second half of F1 and the first half of F2, i.e., the middle section of the primary sequence. This segmentation is also summarized in Table 2. Second, we identify the secondary structure propensity in each fragment. Two conformations are said to be similar if they show the same secondary structure propensity in the same fragment. For instance, the pair of conformations in Figure 3(a) is similar because residues in F1, F2, and F4 of both conformations show a β-turn-like local motif. Note that the orientation of local motifs does not affect the comparison. For instance, in Figure 3(d), we consider the two conformations to have a similar structure in the F1 fragment, even though the β-turn motifs have different orientations.\nTable 2 Partitions along the primary sequence of BBA5.\nPartition Amino Acids Remark\nF1 1–10 β-hairpin\nF2 11–23 α-helix\nF3 1–6, 16–23 The 1st half of F1 and the 2nd half of F2\nF4 6–17 The 2nd half of F1 and the 1st half of F2\nFigure 3 Selected conformation pairs along the consensus partial folding pathway of BBA5. The figure illustrates four conformation pairs, each consisting of one conformation from each trajectory, along the consensus partial folding pathway identified in the two BBA5 trajectories. 
The same two-step approach is also applied to find similar GSGS conformations, except that a different segmentation strategy, based on the native GSGS structure, is adopted. A total of seven segments are used to identify the relative location of a motif in GSGS. Table 3 lists these segments, together with the residues involved in each segment and its biological meaning.\nTable 3 Partitions along the primary sequence of GSGS.\nPartition ID Amino Acids Remark\nF1 1–15 The 1st β-turn\nF2 1–7 The 1st β-strand\nF3 3–10 Critical region of the 1st β-turn\nF4 6–15 The 2nd β-strand\nF5 6–20 The 2nd β-turn\nF6 10–18 Critical region of the 2nd β-turn\nF7 14–20 The 3rd β-strand\nTo realize the comparison of conformations, two more issues must be addressed. First, how can one effectively capture and represent local motifs? Second, how can one represent the global topology of a conformation in terms of local motifs? To address the first issue, we leverage the non-local patterns in protein contact maps. For the second, we characterize the spatial arrangement among non-local patterns. Please see Section 3 for more details.\n\n2.3 Folding Trajectory Analysis: Objectives\nThere are two main goals we would like to achieve in analyzing the folding trajectories. First, we would like to address the following issues for individual trajectories: (1) to detect (or predict) significant folding events, including the formation of β-turns, α-helices, and native-like conformations; and (2) to recognize the temporal ordering of important folding events in the trajectory. For instance, between the two secondary structures α-helix and β-hairpin in BBA5, which forms earlier? What is the ordering of the two events that precede β-hairpin formation: the formation of two extended strands or the formation of the turn?\nIn contrast to the first goal, our second goal concerns multiple trajectories. Specifically, we would like to identify a sub-sequence of similar conformations across trajectories. 
This sub-sequence of conformations is referred to as the consensus partial folding pathway. The task is analogous to the Longest Common Sub-sequence (LCS) problem [17], but much more challenging for the following reasons. First, we are dealing with time series of 3D protein structures. Second, we are looking for similar conformations across trajectories, and our work on mining spatio-temporal data [5]."}
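Sections 2.1 and 2.2 represent each conformation as a sequence of Cα coordinates and state that local motifs are captured through non-local patterns in protein contact maps, with details deferred to Section 3 (not part of this excerpt). As a minimal sketch of that idea, and not the authors' method, the Python below builds a binary Cα contact map and looks for runs of contacts perpendicular to the diagonal, the pattern an anti-parallel β-hairpin typically leaves. The 8 Å cutoff, the minimum sequence separation of 3, and the helper names `contact_map` and `antiparallel_runs` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3) -> List[List[int]]:
    """Symmetric 0/1 contact map over C-alpha coordinates.

    Residues i and j are in contact when their C-alpha atoms lie within
    `cutoff` angstroms and |i - j| >= min_sep. Both defaults are
    illustrative assumptions, not values taken from the article.
    """
    n = len(ca)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca[i], ca[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def antiparallel_runs(cmap: List[List[int]], min_len: int = 2) -> List[Tuple[int, int, int]]:
    """Find runs of contacts (i, j), (i+1, j-1), ... in the upper triangle.

    Such runs lie perpendicular to the main diagonal, the pattern an
    anti-parallel beta-hairpin typically leaves in a contact map.
    Returns (first_i, first_j, run_length) triples.
    """
    n = len(cmap)
    runs = []
    for i in range(n):
        for j in range(i + 1, n):
            if not cmap[i][j]:
                continue
            # Skip contacts that extend a run already started at (i-1, j+1).
            if i > 0 and j + 1 < n and cmap[i - 1][j + 1]:
                continue
            length, a, b = 0, i, j
            while a < b and cmap[a][b]:
                length += 1
                a, b = a + 1, b - 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


# Toy 6-residue chain that folds back on itself (coordinates in angstroms).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
       (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(antiparallel_runs(contact_map(toy)))  # [(0, 5, 2)]
```

Skipping contacts that merely extend a run already started at (i-1, j+1) keeps each hairpin-like run from being reported once per residue pair.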
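Section 2.2 compares BBA5 conformations fragment by fragment (Table 2) and calls two conformations similar when they show the same secondary-structure propensity in the same fragment, regardless of motif orientation. The sketch below shows that bookkeeping, assuming a motif label per fragment has already been produced by some detector that is not shown. The label strings, the `MotifProfile` type, the helper names, and the reading of "similar" as "at least one fragment in common" are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Fragments of BBA5 as listed in Table 2 (1-based, inclusive residue ranges).
BBA5_FRAGMENTS: Dict[str, List[Tuple[int, int]]] = {
    "F1": [(1, 10)],            # beta-hairpin
    "F2": [(11, 23)],           # alpha-helix
    "F3": [(1, 6), (16, 23)],   # 1st half of F1 + 2nd half of F2
    "F4": [(6, 17)],            # 2nd half of F1 + 1st half of F2
}

# Hypothetical summary of one conformation: the motif label detected in each
# fragment, or None when no motif is recognized there.
MotifProfile = Dict[str, Optional[str]]


def fragments_of(residue: int) -> List[str]:
    """Fragments (in table order) that contain a given residue number."""
    return [name for name, ranges in BBA5_FRAGMENTS.items()
            if any(lo <= residue <= hi for lo, hi in ranges)]


def shared_motifs(a: MotifProfile, b: MotifProfile) -> Set[str]:
    """Fragments in which both conformations show the same non-empty label.

    Labels carry no orientation, so two beta-turns with different
    orientations still count as the same motif.
    """
    return {f for f, label in a.items() if label is not None and label == b.get(f)}


def is_similar(a: MotifProfile, b: MotifProfile) -> bool:
    """Assumed rule: similar if at least one fragment shows the same motif."""
    return bool(shared_motifs(a, b))


x = {"F1": "beta-turn", "F2": "beta-turn", "F3": None, "F4": "beta-turn"}
y = {"F1": "beta-turn", "F2": "beta-turn", "F3": "helix", "F4": "beta-turn"}
print(sorted(shared_motifs(x, y)))  # ['F1', 'F2', 'F4']
```

Because F3 and F4 overlap F1 and F2, a single residue can contribute to several fragments; residue 6, for example, falls in F1, F3, and F4.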
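Section 2.3 likens the consensus partial folding pathway to the Longest Common Sub-sequence problem [17] while noting that the real task is harder. The sketch below shows only the textbook LCS dynamic program with exact equality replaced by a caller-supplied similarity predicate. It illustrates the analogy under that assumption and is not the algorithm used in the article; `consensus_pathway` is a hypothetical name, and the numeric frames in the usage line are stand-ins for conformation summaries.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def consensus_pathway(a: Sequence[T], b: Sequence[T],
                      similar: Callable[[T, T], bool]) -> List[Tuple[int, int]]:
    """Longest common subsequence of two trajectories under a similarity test.

    Returns frame-index pairs (i, j) with i and j both strictly increasing,
    so matched frames keep their temporal order in both trajectories.
    """
    n, m = len(a), len(b)
    # best[i][j] = length of the longest match between a[i:] and b[j:].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if similar(a[i], b[j]):
                best[i][j] = best[i + 1][j + 1] + 1
            else:
                best[i][j] = max(best[i + 1][j], best[i][j + 1])
    # Walk forward through the table to recover one optimal set of pairs.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if similar(a[i], b[j]):
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


print(consensus_pathway([0.0, 1.0, 2.0, 3.0], [1.05, 2.9],
                        lambda p, q: abs(p - q) <= 0.15))  # [(1, 0), (3, 1)]
```

Taking a match whenever the current frames are similar stays optimal even when the predicate is not transitive: any optimal alignment of `a[i:]` and `b[j:]` that leaves (i, j) unmatched can swap that pair in without losing length.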
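Section 2.3 also asks for the temporal ordering of folding events within a single trajectory, for example whether the α-helix or the β-hairpin of BBA5 forms first. Assuming each frame has already been annotated with the set of events detected in it (the detector, the event labels, and the `min_persist` threshold are all hypothetical), onsets and their order can be read off as in this sketch.

```python
from typing import Dict, List, Sequence, Set


def first_onsets(frames: Sequence[Set[str]], min_persist: int = 1) -> Dict[str, int]:
    """Frame index at which each event first appears and then persists.

    frames[t] is the set of (hypothetical) event labels detected in frame t.
    An event's onset is the start of its first run lasting at least
    min_persist consecutive frames, which filters out one-frame fluctuations.
    """
    onsets: Dict[str, int] = {}
    run_start: Dict[str, int] = {}  # event -> start of its current run
    for t, events in enumerate(frames):
        # End the runs of events that are absent in this frame.
        for k in [k for k in run_start if k not in events]:
            del run_start[k]
        for e in events:
            run_start.setdefault(e, t)
        for e, start in run_start.items():
            if e not in onsets and t - start + 1 >= min_persist:
                onsets[e] = start
    return onsets


def event_order(onsets: Dict[str, int]) -> List[str]:
    """Events sorted by onset frame (ties broken alphabetically)."""
    return sorted(onsets, key=lambda e: (onsets[e], e))


# Invented toy annotations, not data from the BBA5 or GSGS trajectories.
frames = [set(), {"turn"}, {"turn", "helix"}, {"helix"},
          {"turn", "helix"}, {"turn", "helix", "hairpin"}]
print(event_order(first_onsets(frames)))      # ['turn', 'helix', 'hairpin']
print(first_onsets(frames, min_persist=2))    # {'turn': 1, 'helix': 2}
```

Raising `min_persist` trades sensitivity for robustness: with `min_persist=2`, the hairpin that appears only in the last frame no longer counts as formed.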