3.2 Mining Spatio-temporal Object Association Patterns

The preprocessing steps transform a 3D protein conformation into a set of labeled 2D bit-patterns, which indirectly capture the local 3D structural characteristics of the conformation. For the two BBA5 trajectories, each conformation contains an average of 6 bit-patterns; for the five GSGS trajectories, the average is 4 bit-patterns per conformation.

As BBA5 and GSGS fold, the dynamics among their residues change constantly until the protein reaches equilibrium. Two residues that were previously in contact may later move out of contact, so bit-patterns present in one conformation may be absent from the next. The evolving set of contacting residues, and in turn of bit-patterns, is essentially the consequence of a variety of weak interactions among amino acids at different levels, including hydrogen bonds, electrostatic interactions, van der Waals packing and hydrophobic interactions [24]. A simple yet effective way to capture these (potential) interactions is to consider how close two amino acids are to each other in 3D, and we adopt this method here. Specifically, we consider interactions between local 3D motifs captured by labeled bit-patterns, which we refer to as "interactions among bit-patterns".

Let p_i and p_j be two bit-patterns in a protein conformation, and let p_i.listCα and p_j.listCα be the lists of α-carbons involved in p_i and p_j, respectively. We define p_i and p_j as interacting bit-patterns if at least one pair of α-carbons, one from p_i.listCα and the other from p_j.listCα, lies within a short distance δ of each other. Note that δ should be greater than the distance threshold used to identify contacting α-carbons when generating contact maps. In our analysis, we set δ = 10 Å.
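To make the δ-based definition concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each bit-pattern is stored as a hypothetical `BitPattern` record holding its label and the residue indices of its α-carbons (its listCα), and that α-carbon coordinates are available as a mapping from residue index to (x, y, z) in ångströms. "Within δ" is interpreted here as a distance ≤ δ. The names `interacting` and `interaction_pairs` and the toy coordinates are illustrative only.

```python
import math
from dataclasses import dataclass
from itertools import product

@dataclass
class BitPattern:
    # Hypothetical representation: the bit-pattern type (label) and the
    # residue indices of the alpha-carbons it involves (its listCα).
    label: int
    list_ca: list[int]

def interacting(p_i, p_j, ca_coords, delta=10.0):
    """True if at least one alpha-carbon of p_i lies within delta (angstroms)
    of at least one alpha-carbon of p_j. ca_coords maps residue index -> (x, y, z)."""
    return any(
        math.dist(ca_coords[a], ca_coords[b]) <= delta
        for a, b in product(p_i.list_ca, p_j.list_ca)
    )

def interaction_pairs(patterns, ca_coords, delta=10.0):
    """All index pairs (i, j) with i < j whose bit-patterns interact in this conformation."""
    return {
        (i, j)
        for i in range(len(patterns))
        for j in range(i + 1, len(patterns))
        if interacting(patterns[i], patterns[j], ca_coords, delta)
    }

# Toy conformation (made-up coordinates, in angstroms).
coords = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (20.0, 0.0, 0.0),
          3: (24.0, 0.0, 0.0), 4: (5.0, 5.0, 0.0)}
patterns = [BitPattern(7, [0, 1]), BitPattern(9, [2, 3]), BitPattern(9, [4])]
print(interaction_pairs(patterns, coords))  # {(0, 2)}
```

The pairwise check costs |p_i.listCα| · |p_j.listCα| distance evaluations per pair of bit-patterns, which is negligible for the handful of bit-patterns (4 to 6 on average) found in each conformation.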
It is noteworthy that this notion of interacting bit-patterns differs from our previous work, in which two bit-patterns are associated if their distance in the 2D contact map space is below a certain threshold. That criterion can be misleading in the context of protein folding analysis. As illustrated in Figure 8, the two bit-patterns BP #1 and BP #2 are only 2 amino acids apart in the 2D contact map, yet they can be relatively far apart in 3D. Conversely, although BP #2 and BP #3 are relatively far apart in the 2D contact map, they are close to each other in 3D. Measuring the distance between bit-patterns in the actual 3D conformation is therefore more reliable for capturing potential interactions among local motifs.

Figure 8 Discrepancy between distances in 2D and 3D spaces. Bit-patterns that are close to each other in the 2D contact map space, for instance, BP#1 and BP#2, can be distant from each other in 3D. Similarly, bit-patterns that are distant in 2D space, for instance, BP#1 and BP#3, can be close to each other in 3D.

So far, we have discussed our approach of using bit-patterns in contact maps to characterize local 3D motifs and, in turn, to represent a protein conformation during folding, and we have defined the notion of interacting bit-patterns in the folding context. We are now ready to present our method of summarizing folding trajectories to fulfill the two objectives described in Section 2.3. The main idea is to summarize a folding trajectory by characterizing the evolutionary behavior of interactions among different types of bit-patterns and, in turn, among local 3D motifs.

Definition of (minLink = 1) SOAP

As proposed in our previous work [5,25], such interactions can be modeled and captured by discovering different types of spatial object association patterns (SOAPs).
Essentially, SOAPs characterize the specific way in which objects, bit-patterns in this case, interact with each other at a given time. After a careful evaluation of the proposed SOAP types, we empirically selected (minLink = 1) SOAPs to model the interacting bit-patterns in the folding process. Let p = (g_1, g_2, ..., g_k) be a (minLink = 1) SOAP of size k, where each g_i is one of the 10 types of bit-patterns described above. In the context of folding trajectories, p prescribes that there exist k bit-patterns b_1, b_2, ..., b_k in a conformation such that b_i.label = g_i (1 ≤ i ≤ k), and that each b_i interacts with at least one of the remaining (k - 1) bit-patterns. Note that the k labels in p need not be distinct. For instance, one can have SOAPs such as (7 9 9), which involves one type 7 bit-pattern and two type 9 bit-patterns.

We further restrict ourselves to SOAPs that occur frequently during the folding process (frequent SOAPs), although we do not rule out studying rarely occurring SOAPs in future work. A SOAP is said to be frequent if it appears in at least minSupp frames of a trajectory. In our studies, we set minSupp = 5 for BBA5 and minSupp = 10 for GSGS.

SOAP Episodes

The next step is to capture the evolutionary nature of the folding process, which we do by identifying the evolutionary behavior of SOAPs. As mentioned earlier, small proteins like BBA5 and GSGS often fold hierarchically, beginning with locally folded structures. As they fold, new SOAPs can be created and existing ones can dissipate. To capture such evolutionary behavior, we proposed the concept of SOAP episodes, which provide an effective way to model the evolution of interactions among spatial objects over time [5]. To reiterate, a SOAP episode E is defined as E = (p, F_beg, F_end), where p is a SOAP composed of one or more bit-patterns that was created in frame F_beg and persisted until frame F_end.
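Assuming the per-frame presence of a SOAP is already known, the episode definition can be sketched in Python as splitting the trajectory into maximal runs of consecutive frames in which the SOAP occurs. This is an illustrative assumption: the hypothetical `soap_episodes` function ends an episode at the first frame where the SOAP is absent and tolerates no gaps, which may differ from the handling in [5]; frames are numbered from 0.

```python
def soap_episodes(soap, present):
    """Split a per-frame occurrence sequence into episodes (soap, f_beg, f_end).

    present[f] is True if the SOAP occurs in frame f (frames numbered from 0).
    Each maximal run of consecutive True frames becomes one episode; the result
    is ordered by beginning frame.
    """
    episodes = []
    f_beg = None
    for f, occurs in enumerate(present):
        if occurs and f_beg is None:
            f_beg = f                                  # SOAP is created in frame f
        elif not occurs and f_beg is not None:
            episodes.append((soap, f_beg, f - 1))      # SOAP persisted until frame f - 1
            f_beg = None
    if f_beg is not None:                              # still present in the last frame
        episodes.append((soap, f_beg, len(present) - 1))
    return episodes

# A SOAP that is created, dissipates and is created again yields several episodes.
print(soap_episodes((7, 9, 9), [False, True, True, False, True, False, False, True]))
# [((7, 9, 9), 1, 2), ((7, 9, 9), 4, 4), ((7, 9, 9), 7, 7)]
```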
Note that a given SOAP p can be created more than once during protein folding and can therefore have more than one episode. To discover frequent (minLink = 1) SOAPs and their episodes in the BBA5 and GSGS trajectories, we apply the SOAP mining algorithm described in our previous work [5]. In summary, this mining phase produces the following results: (i) a list of (minLink = 1) SOAPs of bit-patterns that appear in at least minSupp conformations of each folding trajectory (5 for BBA5 and 10 for GSGS); and (ii) for each of these SOAPs, a list of its episodes ordered by beginning frame F_beg.
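The first output of the mining phase, the list of frequent (minLink = 1) SOAPs, can be illustrated with a brute-force Python sketch. It is not the SOAP mining algorithm of [5]: it simply tests, for each candidate SOAP and each conformation, every combination of k bit-patterns with matching labels, which is exponential in k and only practical for small toy inputs. The names `soap_occurs` and `frequent_soaps`, the (labels, pairs) frame representation, the externally supplied list of candidate SOAPs, and the restriction to SOAPs of size k ≥ 2 are our assumptions; the interacting pairs could come from a δ-distance test such as the one defined above.

```python
from collections import Counter
from itertools import combinations

def soap_occurs(soap, labels, pairs):
    """Check whether a (minLink = 1) SOAP occurs in one conformation.

    soap   -- tuple of bit-pattern labels, e.g. (7, 9, 9); repeats allowed, size k >= 2
    labels -- labels[i] is the label of the i-th bit-pattern in the conformation
    pairs  -- set of interacting index pairs (i, j) with i < j
    """
    need = Counter(soap)
    candidates = [i for i, lab in enumerate(labels) if lab in need]
    for chosen in combinations(candidates, len(soap)):
        if Counter(labels[i] for i in chosen) != need:
            continue  # labels of the chosen bit-patterns do not match the SOAP
        # minLink = 1: each chosen bit-pattern interacts with at least one other chosen one.
        if all(any((min(a, b), max(a, b)) in pairs for b in chosen if b != a)
               for a in chosen):
            return True
    return False

def frequent_soaps(candidate_soaps, frames, min_supp):
    """Return {soap: support} for SOAPs occurring in at least min_supp frames.
    frames is a list of (labels, pairs), one entry per conformation."""
    result = {}
    for soap in candidate_soaps:
        supp = sum(soap_occurs(soap, labels, pairs) for labels, pairs in frames)
        if supp >= min_supp:
            result[soap] = supp
    return result

# Three-frame toy trajectory (made-up labels and interactions).
frames = [([7, 9, 9], {(0, 2)}), ([7, 9, 9], {(0, 1), (1, 2)}), ([9, 9], set())]
print(frequent_soaps([(7, 9), (9, 9), (7, 9, 9)], frames, min_supp=2))  # {(7, 9): 2}
```

With minSupp = 2, only (7, 9) is frequent in this toy trajectory; (7, 9, 9) occurs only in the second frame, where every type 9 bit-pattern is linked to another member of the SOAP. The actual analysis uses minSupp = 5 for BBA5 and 10 for GSGS.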