2.2 Comparing Conformations of BBA5 and GSGS Across Trajectories

Although both BBA5 trajectories start from the same extended conformation, shown in Figure 1(b), the visualized frames appear to follow two very different folding processes. Figures 1(c) and 1(d) show the last frame of trajectories T23 and T24, respectively. The same holds for the five GSGS folding trajectories: each starts from the same conformation (Figure 2(b)) but ends at a different conformation (Figures 2(c), 2(d), and 2(e)). This apparent difference might be attributed to the stochastic nature of the folding simulation process [8,9]. Nevertheless, it is desirable to characterize the similarities (or dissimilarities) across multiple trajectories.

To compare two trajectories, one must address the following key issue: how can we compare two protein conformations? Several measures are commonly used for this purpose, including RMSD (root-mean-square deviation) [13], contact order [14], and native contacts [15]. However, all of these measures are designed to quantify the global topology of a conformation. Furthermore, in our empirical analysis of these measures, we found them generally too coarse and therefore often misleading. More importantly, such measures fail to identify similar local structures (or motifs) between conformations. This is especially crucial for small proteins like BBA5. As demonstrated in both experimental and theoretical studies, small proteins often fold hierarchically, beginning locally [16]. For instance, it has been shown that BBA5 tends to first form secondary structures such as β-turns and α-helices, and only then to adopt its global topology [9]. Finally, as suggested by Pande [8], both sterics (local motifs) and global topology might play an important role in protein folding.
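For reference, the RMSD mentioned above is commonly defined as follows. This is the standard textbook form rather than a formula taken from this paper, and the symbols are our own notation: N is the number of compared atoms (for example, the Cα atoms), and a_i and b_i are the coordinates of atom i in the two conformations after optimal superposition.

```latex
\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{a}_i - \mathbf{b}_i \right\rVert^{2}}
```

Because the sum runs over all N atoms, a well-matched local motif spanning only a few residues can be outweighed by deviations elsewhere in the chain, which is consistent with the observation that such global measures may miss local similarities.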
Therefore, a more reasonable comparison of conformations of (small) proteins should consider both local and global structures. Moreover, it should take into account the native topology of the protein under study. To meet these requirements, we propose the following two-step approach to compare conformations of BBA5.

First, we partition the 23 residues of BBA5 into four fragments: (i) F1: the N-terminal β-hairpin, residues 1–10; (ii) F2: the C-terminal α-helix fragment, residues 11–23; (iii) F3: the first half of F1 and the second half of F2; and (iv) F4: the second half of F1 and the first half of F2, i.e., the middle section of the primary sequence. This segmentation is also summarized in Table 2.

Second, we recognize the secondary structure propensity in each fragment. Two conformations are said to be similar if they demonstrate the same secondary structure propensity in the same fragment. For instance, the pair of conformations in Figure 3(a) is similar because the residues in F1, F2, and F4 of both conformations indicate a β-turn-like local motif. Note that the orientation of local motifs does not affect the comparison. For instance, in Figure 3(d), we say that the two conformations have a similar structure in the F1 fragment, even though the β-turn motifs have different orientations.

Table 2. Partitions along the primary sequence of BBA5.

Partition    Amino Acids    Remark
F1           1–10           β-hairpin
F2           11–23          α-helix
F3           1–6, 16–23     The 1st half of F1 and the 2nd half of F2
F4           6–17           The 2nd half of F1 and the 1st half of F2

Figure 3. Selected conformation-pairs along the consensus partial folding pathway of BBA5. The figure illustrates four conformation-pairs, one from each trajectory, along the consensus partial folding pathway identified in the two BBA5 trajectories.

The same two-step approach is also applied to find similar GSGS conformations, except that a different segmentation strategy is adopted according to the native GSGS structure.
A total of seven segments are used to identify the relative location of a motif in GSGS. Table 3 lists these segments, together with the residues involved in each segment and its biological meaning.

Table 3. Partitions along the primary sequence of GSGS.

Partition ID    Amino Acids    Remark
F1              1–15           The 1st β-turn
F2              1–7            The 1st β-strand
F3              3–10           Critical region of the 1st β-turn
F4              6–15           The 2nd β-strand
F5              6–20           The 2nd β-turn
F6              10–18          Critical region of the 2nd β-turn
F7              14–20          The 3rd β-strand

To realize the comparison of conformations, two more issues must still be addressed. First, how can one effectively capture and represent local motifs? Second, how can we represent the global topology of a conformation in terms of local motifs? To address the first issue, we leverage the non-local patterns in protein contact maps. For the second, we characterize the spatial arrangement among non-local patterns. Please see Section 3 for more details.
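The two-step comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the residue ranges are copied from Tables 2 and 3, while the function names, the dictionary layout, and the propensity labels (such as "beta-turn") are hypothetical. The sketch also assumes that the propensity of each fragment has already been determined by some other means, and it reads "similar" as "sharing the same propensity in at least one fragment".

```python
from typing import Dict, List, Optional, Tuple

# Residue ranges (1-based, inclusive), copied from Tables 2 and 3.
Fragments = Dict[str, List[Tuple[int, int]]]

BBA5_FRAGMENTS: Fragments = {
    "F1": [(1, 10)],
    "F2": [(11, 23)],
    "F3": [(1, 6), (16, 23)],
    "F4": [(6, 17)],
}

GSGS_FRAGMENTS: Fragments = {
    "F1": [(1, 15)],
    "F2": [(1, 7)],
    "F3": [(3, 10)],
    "F4": [(6, 15)],
    "F5": [(6, 20)],
    "F6": [(10, 18)],
    "F7": [(14, 20)],
}

# Hypothetical representation of a conformation: fragment name -> propensity
# label, or a missing/None entry when no motif is recognized in the fragment.
Propensities = Dict[str, Optional[str]]


def fragment_residues(ranges: List[Tuple[int, int]]) -> List[int]:
    """Step 1: expand the inclusive residue ranges of one fragment."""
    return [r for start, end in ranges for r in range(start, end + 1)]


def shared_fragments(a: Propensities, b: Propensities,
                     fragments: Fragments) -> List[str]:
    """Step 2: list the fragments in which both conformations show the
    same propensity. Motif orientation is not encoded in the labels,
    so it cannot affect the comparison."""
    return [name for name in fragments
            if a.get(name) is not None and a.get(name) == b.get(name)]


def are_similar(a: Propensities, b: Propensities,
                fragments: Fragments) -> bool:
    """Assumed reading: similar if at least one fragment matches."""
    return len(shared_fragments(a, b, fragments)) > 0


# Illustrative labels only, loosely modeled on the Figure 3(a) description.
conf_a = {"F1": "beta-turn", "F2": "beta-turn", "F4": "beta-turn"}
conf_b = {"F1": "beta-turn", "F2": "beta-turn",
          "F3": "alpha-helix", "F4": "beta-turn"}

print(shared_fragments(conf_a, conf_b, BBA5_FRAGMENTS))  # ['F1', 'F2', 'F4']
print(are_similar(conf_a, conf_b, BBA5_FRAGMENTS))       # True
```

Keeping the partitions in a plain dictionary means the same comparison functions can be applied to GSGS by passing GSGS_FRAGMENTS instead of BBA5_FRAGMENTS, which mirrors the statement that only the segmentation strategy differs between the two proteins.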