Determination of Important Rules

To select a set of informative and discriminative rules for knowledge extraction, most existing approaches rank association rules by the confidence value of each individual rule. However, a strong rule that is highly confident and represents general knowledge may not be a good discriminative rule for classification. A better measure of the importance of a rule should consider the following factors together: the correlation between a property and a class, the degree of classification power, confidence and support, top K coverage, and the uniqueness of the rule. As noted in the previous section, including the SSE content information in our ARBC approach has a positive effect on the classification accuracy (Table 4). The importance of a rule can therefore be quantified by integrating these factors together with the SSE content information. We defined an importance factor (I in Tables 6 and 7) as the average value of all the factors. To illustrate how informative the rules are for understanding interface features, some representative rules within the top 30% of I (ranked higher than 48) are listed in Table 6. The list is complemented by some rules ranked below 48 in order to explain overlapping rules and to compare association rules with rules generated from a decision tree. Similarly, rules describing the ENZ type with different structural features are listed in Table 7. Rules in Tables 6 and 7 are sorted by Type and I.
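As an illustration of the importance factor, the following minimal Python sketch averages the seven factors (Conf, Supp, C, G, K, U and S) of a rule, using the values of Rules 1 and 3 from Table 6. The function name, the use of None for "-", and the skip_missing flag are assumptions made for this sketch and are not part of the original method description. The default follows the Table 6 footnote ("-" counted as 0). The tabulated I values for rows containing "-" appear to match an average over only the factors that are present (for Rule 3, 3.962/6 ≈ 0.660), so the sketch exposes that reading as an option.

```python
def importance_factor(factors, skip_missing=False):
    """Average the factors of one rule.

    factors: [Conf, Supp, C, G, K, U, S]; None stands for "-" in the tables.
    skip_missing=False treats "-" as 0 (as stated in the Table 6 footnote).
    skip_missing=True averages only the factors that are present
    (hypothetical option, added because it matches the tabulated values).
    """
    present = [v for v in factors if v is not None]
    if skip_missing:
        return sum(present) / len(present) if present else 0.0
    return sum(present) / len(factors)


# Values copied from Table 6.
rule_1 = [0.811, 0.032, 1, 0.214, 1, 1, 1]     # tabulated I = 0.722
rule_3 = [0.725, 0.053, 1, 0.184, 1, 1, None]  # tabulated I = 0.660

print(round(importance_factor(rule_1), 3))                     # 0.722
print(round(importance_factor(rule_3), 3))                     # 0.566
print(round(importance_factor(rule_3, skip_missing=True), 3))  # 0.66
```

For rules without any "-" entry, such as Rule 1, both readings give the same value.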
Table 6 Representative examples of association rules for each type

| # | O | Rule description | Type | Conf | Supp | C | G | K | U | S | I |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | If 77.31 ≤ Loop < 80.56 | ENZ | 0.811 | 0.032 | 1 | 0.214 | 1 | 1 | 1 | 0.722 |
| 2 | 8 | If 17.57 ≤ Helix < 20.87 | ENZ | 0.545 | 0.032 | 1 | 0.102 | 1 | 1 | 1 | 0.668 |
| 3 | 9 | If SCOPClass = 7 | ENZ | 0.725 | 0.053 | 1 | 0.184 | 1 | 1 | - | 0.660 |
| 4 | 26 | If 67.59 ≤ Loop < 70.83 | ENZ | 0.526 | 0.032 | - | 0.048 | 1 | 1 | 1 | 0.601 |
| 5 | 28 | If 461.83 ≤ df-ASA < 681.42 AND 2.3 ≤ LCS < 2.73 | ENZ | 0.625 | 0.032 | - | 0.120 | 1 | 1 | - | 0.555 |
| 6 | 37 | If 57.87 ≤ Loop < 61.11 | ENZ | 0.467 | 0.037 | - | 0.045 | - | 1 | 1 | 0.510 |
| 7 | 2 | If SCOPClass = 1 AND 12.25 ≤ nFrag < 16 AND NoStrand | nonENZ | 0.882 | 0.032 | 1 | 0.250 | 1 | 1 | 1 | 0.738 |
| 8 | 11 | If 0.66 ≤ inPro < 0.87 | nonENZ | 0.597 | 0.042 | 1 | 0.129 | 1 | 1 | - | 0.628 |
| 9 | 15 | If 26.74 ≤ nAA < 35.32 AND 901.01 ≤ df-ASA < 1120.6 | nonENZ | 0.556 | 0.032 | 1 | 0.133 | 1 | 1 | - | 0.620 |
| 10 | 18 | If SCOPClass = 1 AND 1.87 ≤ LCS < 2.3 | nonENZ | 0.545 | 0.032 | 1 | 0.137 | 1 | 1 | - | 0.619 |
| 11 | 20 | If 1.43 ≤ LCS < 1.87 | nonENZ | 0.556 | 0.042 | 1 | 0.074 | 1 | 1 | - | 0.612 |
| 12 | 21 | If NoStrand AND 1.87 ≤ LCS < 2.3 | nonENZ | 0.515 | 0.037 | - | 0.113 | 1 | 1 | 1 | 0.611 |
| 13 | 36 | If 58.11 ≤ ASAPR < 59.52 | nonENZ | 0.476 | 0.032 | 1 | 0.065 | - | 1 | - | 0.515 |
| 14 | 38 | If 41.67 ≤ Loop < 44.91 | nonENZ | 0.423 | 0.032 | - | 0.046 | - | 1 | 1 | 0.500 |
| 15 | 40 | If SCOPClass = 1 AND NoStrand | nonENZ | 0.484 | 0.064 | - | 0.074 | - |  | 1 | 0.406 |
| 16 | 46 | If 125.14 ≤ nAtom < 165.52 AND 901.01 ≤ df-ASA < 1120.6 | nonENZ | 0.412 | 0.037 | - | 0.050 | - | 1 | - | 0.375 |
| 17 | 64 | If 0.42 ≤ HH < 0.44 | nonENZ | 0.347 | 0.037 | - | 0.009 | - | 1 | - | 0.348 |
| 18 | 5 | If 7.78 ≤ Strand < 10.27 | HET | 0.660 | 0.037 | 1 | 0.141 | 1 | 1 | 1 | 0.691 |
| 19 | 7 | If 2.8 ≤ Strand < 5.29 | HET | 0.565 | 0.037 | 1 | 0.089 | 1 | 1 | 1 | 0.670 |
| 20 | 12 | If 205.9 ≤ nAtom < 246.28 | HET | 0.574 | 0.037 | 1 | 0.143 | 1 | 1 | - | 0.626 |
| 21 | 25 | If 44.91 ≤ Loop < 48.15 | HET | 0.479 | 0.037 | 1 | 0.110 | - | 1 | 1 | 0.604 |
| 22 | 32 | If 3.6 ≤ LCS < 4.03 | HET | 0.461 | 0.037 | 1 | 0.100 | - | 1 | - | 0.520 |
| 23 | 33 | If 0.44 ≤ HH < 0.46 | HET | 0.467 | 0.045 | 1 | 0.070 | - | 1 | - | 0.516 |
| 24 | 63 | If SCOPClass = 1 AND NoStrand | HET | 0.282 | 0.037 | - | 0.074 | - | - | 1 | 0.348 |
| 25 | 31 | If SCOPClass = 3 AND 2.3 ≤ LCS < 2.73 | HOM | 0.470 | 0.033 | 1 | 0.100 | - | 1 | - | 0.521 |
| 26 | 98 | If 3.17 ≤ LCS < 3.6 | HOM | 0.337 | 0.035 | - | 0.034 | - | - | - | 0.135 |
| 27 | 133 | If 26.74 ≤ nAA < 35.32 | HOM | 0.237 | 0.039 | - | 0.041 | - | - | - | 0.106 |

Representative examples of 27 rules, mostly within the top 30%, are listed sorted by the columns Type and I. Rules whose order is below 48 are added to explain overlapping rules and to allow comparison with rules produced from a decision tree. #: Rule identifier; O: Order of a rule when ranked by importance factor; Rule description: The body of a rule; Type: The head of a rule, representing a PPI type; Conf: Confidence of a rule; Supp: Support of a rule; C: Rules selected by correlation-based feature subset selection [32]; G: The worth of a rule measured by the gain ratio [33] with respect to PPI types; K: Top K rules ranked within the top 30%; U: Unique rules; S: SSE content rules; I: Importance factor of a rule, calculated as an average of all factors (Conf, Supp, C, G, K, U and S); "-" is replaced with the value 0 when the importance factor is calculated.

Table 7 Representative examples of ENZ type presenting different structural features

| # | O | Rule description | Subtype | Conf | Supp | C | G | K | U | S | I |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 24 | If NoHelix | ENZ_A, ENZ_B, ENZ_C | 0.508 | 0.069 | - | 0.058 | 1 | 1 | 1 | 0.606 |
| 29 | 1 | If SCOPClass = 7 AND NoHelix | ENZ_A, ENZ_B | 1.000 | 0.032 | 1 | 0.315 | 1 | 1 | 1 | 0.764 |
| 30 | 17 | If 461.83 ≤ df-ASA < 681.42 AND NoHelix | ENZ_A, ENZ_B | 0.593 | 0.037 | - | 0.085 | 1 | 1 | 1 | 0.619 |
| 31 | 39 | If 461.83 ≤ df-ASA < 681.42 | ENZ_A, ENZ_B | 0.477 | 0.111 | 1 | 0.076 | - | - | - | 0.416 |
| 32 | 16 | If NoHelix AND nFrag < 4.75 | ENZ_A | 0.612 | 0.032 | - | 0.076 | 1 | 1 | 1 | 0.620 |
| 33 | 19 | If 4.75 ≤ nSSE < 6.62 AND NoHelix | ENZ_A | 0.588 | 0.032 | - | 0.072 | 1 | 1 | 1 | 0.538 |
| 34 | 51 | If 461.83 ≤ df-ASA < 681.42 AND 4.75 ≤ nSSE < 6.62 | ENZ_A | 0.417 | 0.032 | - | 0.018 | - | 1 | - | 0.367 |
| 35 | 77 | If 44.38 ≤ nAtom < 84.76 AND 461.83 ≤ df-ASA < 681.42 | ENZ_A | 0.396 | 0.058 | - | 0.023 | - | - | - | 0.159 |
| 36 | 34 | If 9.58 ≤ nAA < 18.16 AND 44.38 ≤ nAtom < 84.76 AND 461.83 ≤ df-ASA < 681.42 | ENZ_A | 0.500 | 0.032 | - | 0.045 | 1 | 1 | - | 0.515 |
| 37 | 60 | If 18.16 ≤ nAA < 26.74 AND 44.38 ≤ nAtom < 84.76 | ENZ_A | 0.357 | 0.032 | - | 0.015 | - | 1 | - | 0.351 |
| 38 | 10 | If 84.76 ≤ nAtom < 125.14 AND 461.83 ≤ df-ASA < 681.42 | ENZ_B | 0.617 | 0.053 | 1 | 0.145 | 1 | 1 | - | 0.636 |
| 39 | 13 | If 12.66 ≤ sRatio < 15.06 AND 461.83 ≤ df-ASA < 681.42 | ENZ_B | 0.600 | 0.032 | 1 | 0.113 | 1 | 1 | - | 0.624 |
| 40 | 14 | If 461.83 ≤ df-ASA < 681.42 AND 10.38 ≤ nSSE < 12.25 AND SCOPClass = 2 | ENZ_B | 0.857 | 0.032 | - | 0.230 | 1 | 1 | - | 0.624 |
| 41 | 27 | If SCOPClass = 2 AND 461.83 ≤ df-ASA < 681.42 AND 84.76 ≤ nAtom < 125.14 | ENZ_B | 0.789 | 0.032 | - | 0.176 | 1 | 1 | - | 0.599 |
| 42 | 35 | If 10.38 ≤ nSSE < 12.25 AND 12.25 ≤ nFrag < 16 | ENZ_B | 0.500 | 0.032 | - | 0.043 | 1 | 1 | - | 0.515 |
| 43 | 73 | If 84.76 ≤ nAtom < 125.14 AND SCOPClass = 2 | ENZ_B | 0.408 | 0.042 | - | 0.043 | - | - | - | 0.164 |
| 44 | 114 | If 84.76 ≤ nAtom < 125.14 AND 26.74 ≤ nAA < 35.32 | ENZ_B | 0.307 | 0.037 | - | 0.024 | - | - | - | 0.123 |
| 45 | 109 | If 681.42 ≤ df-ASA < 901.01 | ENZ_C | 0.317 | 0.048 | - | 0.013 | - | - | - | 0.126 |
| 46 | 137 | If 84.76 ≤ nAtom < 125.14 AND 681.42 ≤ df-ASA < 901.01 | ENZ_C | 0.252 | 0.032 | - | 0.009 | - | - | - | 0.098 |
| 47 | 146 | If SCOPClass = 4 | ENZ_C | 0.221 | 0.042 | - | 0.011 | - | - | - | 0.091 |
| 48 | 101 | If 35.32 ≤ nAA < 43.9 AND 125.14 ≤ nAtom < 165.52 | ENZ_D | 0.323 | 0.032 | - | 0.041 | - | - | - | 0.132 |
| 49 | 130 | If SCOPClass = 3 | ENZ_D | 0.238 | 0.069 | - | 0.016 | - | - | - | 0.108 |
| 50 | 141 | If 901.01 ≤ df-ASA < 1120.6 | ENZ_D | 0.207 | 0.032 | - | 0.050 | - | - | - | 0.096 |
| 51 | 54 | If 1120.6 ≤ df-ASA < 1340.19 | ENZ_E | 0.392 | 0.042 | - | 0.018 | - | 1 | - | 0.363 |

The column abbreviations are the same as in Table 6. The ENZ subtypes are defined in Figure 4. Note that ENZ_B includes both inhibitors and enzymes, while the other subtypes are formed exclusively by inhibitors (ENZ_A, ENZ_C and ENZ_E) or by enzymes (ENZ_D).

We have shown that the interaction sites are dominated by non-regular regions: especially for ENZ interactions, almost 2/3 of the sites on average are composed of non-helix and non-beta-strand regions (Figure 1). This is manifested in Rules 29 (Table 7), 1, 4 and 6, all of which require 50–80% content of non-regular regions for a site to be classified as ENZ. Some of the rules containing negation predicates are strong indicators of certain interaction types.
For example, "NoHelix" and "NoStrand" in the interaction sites imply ENZ (Rule 29) and nonENZ (Rules 7, 12 and 15), respectively. HET is characterized by relatively small portions of strands (Rules 18 and 19) and by "NoStrand" (Rule 24). We also observe that rules in which SSE content information is conjoined with other properties (Rules 29, 7, 12, 15 and 24 in Figure 2) or combined with other rules (Figure 3(a), (b) and (c)) are stronger discriminators for classifying PPI types than rules containing only SSE content information (Rules 1, 2, 4, 6, 14, 18, 19 and 21 in Figure 2). We note that the rules combining SSE information with SCOP classes (Rules 29 and 7 in Figure 2) are the most discriminative and informative for characterizing ENZ and nonENZ.

Figure 2 A scatter plot matrix for PPI types and association rules. This scatter plot matrix shows clusters as collections of points separated by association rules that encode SSE content information or a SCOP class. The colors on the left side of each plot (a cell) correspond to the four PPI types. The right side of each plot shows the distribution of points that satisfy the rule named at the head of the cell. Rules 29, 40, 1 and 3 separate ENZ and nonENZ from the other types with few errors. Rule 29 is a strong discriminator that separates ENZ from the other types completely.

Figure 3 2D plots for pairs of association rules. These plots place data points according to pairs of association rules. The X and Y axes each represent one rule and take two Boolean values: 0 represents negative data points that do not satisfy the rule of that axis, and 1 represents positive data points that satisfy it. The data points in the upper left corner satisfy the rule used for the Y axis, and the data points in the lower right corner satisfy the rule used for the X axis. The points in the upper right corner satisfy both rules.
The plots in Figure 3(a), (b) and (c) characterize the distribution of inhibitors in enzyme-inhibitor interactions. Rule 28 is used for the X axis in plots (a), (b) and (c), and Rules 1, 3 and 38 are used for the Y axis in those plots, respectively. Plot (a) is an example of a pair of rules that both include SSE information (e.g. helix and loop content). Plots (b) and (c) are examples of SSE content information (Rule 28: "NoHelix") combined with other properties (e.g. SCOPClass or the number of atoms). Plot (b) (Rule 3 versus Rule 28) is identical to the plot generated by Rule 29. Enzymes interacting with the group of inhibitors characterized by (a), (b) and (c) are shown in Figure 3(e) and (f). Enzymes and inhibitors described by Rules 40 and 29, respectively, are plotted in (d), where no point matches both rules. Plot (d) thus reflects a proper interpretation of the association rules with regard to interactions between enzymes and inhibitors.
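The quadrant reading of these rule-pair plots can be sketched in a few lines of Python. The sketch below counts how many data points fall into each (X, Y) corner for Rule 28 (X axis) and Rule 3 (Y axis), and checks that the upper right corner coincides with Rule 29, which is their conjunction in Table 7. The record fields (helix, scop_class), the reading of "NoHelix" as zero helix content, and the toy data points are all assumptions made for illustration; they are not data from the study.

```python
from collections import Counter


def rule_28(site):
    # Rule 28 (Table 7): NoHelix, assumed here to mean zero helix content.
    return site["helix"] == 0


def rule_3(site):
    # Rule 3 (Table 6): SCOPClass = 7.
    return site["scop_class"] == 7


def rule_29(site):
    # Rule 29 (Table 7): SCOPClass = 7 AND NoHelix.
    return rule_3(site) and rule_28(site)


def quadrant_counts(sites, rule_x, rule_y):
    """Count points per (x, y) corner; 1 = rule satisfied, 0 = not satisfied."""
    return Counter((int(rule_x(s)), int(rule_y(s))) for s in sites)


# Invented toy interaction sites, for illustration only.
toy_sites = [
    {"helix": 0.0, "scop_class": 7},
    {"helix": 0.0, "scop_class": 2},
    {"helix": 18.5, "scop_class": 7},
    {"helix": 30.0, "scop_class": 1},
    {"helix": 0.0, "scop_class": 7},
]

counts = quadrant_counts(toy_sites, rule_28, rule_3)
upper_right = counts[(1, 1)]  # points satisfying both rules
matches_rule_29 = sum(rule_29(s) for s in toy_sites)
print(upper_right, matches_rule_29)
```

An empty upper right corner, as described for plot (d), would correspond to counts[(1, 1)] being 0 for that pair of rules.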