3.3 Folding Trajectory Analysis

In this section, we describe our strategy for using SOAPs to summarize a folding trajectory and to address the two folding analysis issues described in Section 2.3.

SOAP-based Trajectory Summarization

The previous mining phase discovers a collection of frequent (minLink = 1) SOAPs and the associated episodes in each trajectory. It therefore identifies all the conformations in the trajectories that contain at least one frequent (minLink = 1) SOAP. For instance, the last conformation in trajectory T23 (Figure 1(c)) has two SOAPs of size 2, (5 8) (i.e., an association of a type 5 and a type 8 bit-pattern) and (7 8), and three SOAPs of size 1, (5), (7), and (8), while the last conformation in trajectory T24 has three SOAPs: (7 8), (7), and (8). This leads to our SOAP-based approach for folding trajectory summarization.

To summarize a folding trajectory, we perform the following three steps. First, for each conformation, we identify all the frequent SOAPs that appear in it and use these SOAPs to represent the conformation. Note that not every conformation contains frequent SOAPs, especially when minSupp is set high. Second, for each SOAP-representable conformation, we carry out two tasks on its associated SOAPs. We next use the folding trajectories of BBA5 to explain how these two tasks are carried out.

In the first task, for each SOAP, we mark the relative location of each involved bit-pattern in the primary sequence of BBA5. This is done by identifying the segment of BBA5 where the majority of a bit-pattern's α-carbons are located. The segment can be one of the following, as described in Section 2.2: F1, residues 1–10; F2, residues 11–23; F3, residues 6–17; and F4, residues 1–5 and 18–23. Let us again take the last conformation in T24 as an example. It can be summarized by three SOAPs: (7 8), (7), and (8). When we look at the list of α-carbons involved in these bit-patterns, we find that 7 is mainly located in F2 and 8 in F1.
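The majority rule used for location marking can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tie-breaking rule, and the residue lists are assumptions introduced here. Because the four segments overlap, the sketch counts a bit-pattern's α-carbons separately for each segment and picks the segment with the highest count.

```python
# Segments of BBA5 from Section 2.2, as sets of 1-based residue numbers.
SEGMENTS = {
    1: set(range(1, 11)),                      # F1: residues 1-10
    2: set(range(11, 24)),                     # F2: residues 11-23
    3: set(range(6, 18)),                      # F3: residues 6-17
    4: set(range(1, 6)) | set(range(18, 24)),  # F4: residues 1-5 and 18-23
}


def majority_segment(residues):
    """Return the index of the segment holding most of the given residues.

    Assumption: ties go to the lowest segment index; the text does not say
    how ties are resolved.
    """
    counts = {
        seg: sum(1 for r in residues if r in members)
        for seg, members in SEGMENTS.items()
    }
    return min(counts, key=lambda seg: (-counts[seg], seg))


def mark_soap(soap, residues_of):
    """Label each bit-pattern type in a SOAP as 'type.segment'."""
    return [f"{bp}.{majority_segment(residues_of[bp])}" for bp in soap]


# Hypothetical residue lists, chosen only to reproduce the example above;
# they are not taken from the BBA5 trajectories.
residues_of = {7: [12, 13, 15, 16, 19], 8: [2, 3, 4, 7, 8]}
print(mark_soap((7, 8), residues_of))  # ['7.2', '8.1']
```

With these illustrative inputs, bit-pattern 7 falls mostly in F2 and bit-pattern 8 mostly in F1, matching the example in the text.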
With these locations, we mark the three SOAPs as follows: (8.1 7.2), (7.2), and (8.1). Within each SOAP, we re-arrange the bit-patterns by their relative locations in BBA5, so that 8.1 precedes 7.2. This superimposes BBA5-specific spatial information onto a SOAP.

In the second task, we prune away redundant SOAPs after marking each bit-pattern with its relative location in BBA5. A SOAP is redundant if it is embedded in another SOAP. For instance, in the previous example, we can prune away (8.1) and (7.2), as both are embedded in (8.1 7.2). After pruning, most conformations in such a small protein can often be represented by a single SOAP.

We can take this summarization a step further by replacing each bit-pattern with its corresponding 3D motif, as illustrated in Figure 5. For instance, SOAP (8.1 7.2) is transformed into (β.1 α.2). We refer to such SOAPs as generalized SOAPs, and to the corresponding trajectory as a generalized trajectory. Note that in a generalized trajectory, multiple types of bit-patterns can be mapped to a single type of 3D motif. For instance, the α-motif corresponds to three types of bit-patterns: 4, 7, and 9 (Figure 5). Figure 9 shows a segment of each summarized BBA5 folding trajectory before and after being generalized with 3D motifs.

Figure 9. SOAP-based folding trajectory summarization. A sample segment in each of the two BBA5 folding trajectories is presented: (a) after superimposing the relative location of each bit-pattern and pruning away redundant SOAPs; (b) after further generalizing each bit-pattern to its corresponding 3D motif.

Detecting Folding Events and Recognizing Ordering Among Events

Once each folding trajectory is summarized into generalized SOAPs, it is fairly straightforward to detect folding events such as the formation of α-helix or β-turn like local structures. This can be done by simply locating the frames that contain the local motif(s) of interest.
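The pruning, generalization, and frame-lookup steps can be sketched together as follows. This is a minimal illustration under stated assumptions: "embedded in" is read as a proper-subset relation between sets of marked bit-patterns, the motif table lists only the bit-pattern types named in the text (4, 7, and 9 for the α-motif, and 8 for the β-motif as implied by the example), and the function names and the toy trajectory are hypothetical.

```python
# A marked bit-pattern is a (type, segment) pair; a SOAP is a tuple of them.
# Partial motif table: alpha for types 4, 7, 9 as stated in the text;
# beta for type 8 is inferred from the example (8.1 7.2) -> (β.1 α.2).
MOTIF_OF = {4: "α", 7: "α", 9: "α", 8: "β"}


def prune_redundant(soaps):
    """Drop every SOAP that is a proper subset of another SOAP (assumed
    meaning of 'embedded in')."""
    as_sets = [frozenset(s) for s in soaps]
    return [
        soap
        for soap, fs in zip(soaps, as_sets)
        if not any(fs < other for other in as_sets)
    ]


def generalize(soap):
    """Replace each bit-pattern type with its 3D motif, keeping the segment."""
    return tuple((MOTIF_OF[bp], seg) for bp, seg in soap)


def first_frame_with(trajectory, motif):
    """Return the first frame whose generalized SOAP contains the motif,
    or None if the motif never appears."""
    for frame, soap in trajectory:
        if soap is not None and any(m == motif for m, _ in soap):
            return frame
    return None


# The three marked SOAPs of the last conformation in T24.
marked = [((8, 1), (7, 2)), ((7, 2),), ((8, 1),)]
kept = prune_redundant(marked)
print(kept)                 # [((8, 1), (7, 2))]
print(generalize(kept[0]))  # (('β', 1), ('α', 2))

# Toy generalized trajectory (frame, SOAP), invented for illustration only.
toy = [(0, None), (1, (("α", 2),)), (2, (("β", 1), ("α", 2)))]
print(first_frame_with(toy, "α"), first_frame_with(toy, "β"))  # 1 2
```

Comparing the values returned by `first_frame_with` for two motifs gives the order in which the corresponding events first appear in a summarized trajectory.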
We can also easily identify native-like conformations by finding those that contain the generalized SOAP (β.1 α.2). Finally, based on the summarization, one can quickly identify the ordering of folding events in a trajectory. For instance, to check which secondary structure forms more rapidly, α-helix or β-hairpin, one can simply compare the first occurrences of these structures in the summarized trajectory (Figure 9(b)).

Identifying the Consensus Partial Folding Pathway Across Trajectories

To identify the consensus partial folding pathway, we compute the longest common subsequence (LCS) [17] between two summarized trajectories. One can use the summarization either before the 3D motif generalization (Figure 9(a)) or after it (Figure 9(b)). We use the latter in our analysis. Based on the LCS of generalized SOAPs, we construct the consensus folding pathway by identifying pairs of conformations, one from each trajectory, along the LCS of the two summarized trajectories. In other words, the resulting consensus pathway consists of a sequence of conformation pairs with similar 3D structures. Note that the comparison between 3D protein conformations (as described in Section 2.2) is done by using bit-patterns to model local structural motifs and associations of bit-patterns (SOAPs) to characterize the global structure. This forms a hierarchical comparison and is in accordance with the hierarchical folding process of small proteins.
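The pairing of conformations along the LCS can be sketched with the standard dynamic-programming algorithm. This is a minimal illustration rather than the authors' code: it assumes each summarized trajectory is a list with one generalized SOAP per conformation, written here as a string, and the function name and the two toy trajectories are hypothetical.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) with a[i] == b[j] that form a longest
    common subsequence of the sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Toy generalized trajectories, invented for illustration only.
t_a = ["α.2", "β.1", "β.1 α.2"]
t_b = ["β.1", "α.2", "β.1 α.2"]
print(lcs_pairs(t_a, t_b))  # [(1, 0), (2, 2)]
```

Each returned pair names one conformation from each trajectory that share the same generalized SOAP, so the list of pairs is one candidate consensus pathway. When several subsequences have the same maximal length, this sketch returns only one of them.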