2.3.1 Cluster analysis
A total of 555 preliminary clusters were obtained from the yeast PPI network and merged using a merge threshold of 1.0. Table 3 lists all 60 clusters that contain more than 4 proteins, together with their topological characteristics and their assigned molecular functions from the MIPS functional categories. To facilitate critical assessment, the table also gives, for each cluster, the percentage of proteins that are concordant with the major assigned function (hits), discordant with it (misses), or of unknown function. Among these 60 clusters, the largest contains 214 proteins and the smallest contains 5. On average, a cluster contains 40.1 proteins, and the average density of the cluster subgraphs extracted from the PPI network is 0.2145.
Table 3 also reports the -log p value of the major function identified in each cluster. These values measure the relative enrichment of a cluster for a given functional category: higher values of -log p indicate greater enrichment. Values greater than 2.0 indicate statistical significance at α < 0.01. Every cluster except cluster 56 (-log p = 1.8) exceeds this threshold, so the clusters are significantly enriched for biological function and can be considered functional modules. The results also demonstrate that the STM method can detect large but sparsely connected clusters as well as small, densely connected ones. In particular, as the size, density, and -log p values in Table 3 show, our method identifies large modules that have low density but are still biologically enriched.
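The two quantities discussed above, subgraph density and -log p, can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the implementation used in this study: density is taken to be the standard undirected graph density 2|E| / (|V|(|V| - 1)) of the induced subgraph, and the p-value is assumed to be a hypergeometric tail probability reported with a base-10 logarithm. All function names, parameter names, and the toy graph are hypothetical.

```python
from math import comb, log10


def subgraph_density(nodes, edges):
    """Density of the subgraph induced by `nodes`: 2|E| / (|V| (|V| - 1))."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Keep only edges with both endpoints inside the cluster. frozenset makes
    # (u, v) and (v, u) identical, and the set drops repeated edges.
    internal = {frozenset((u, v)) for u, v in edges
                if u in nodes and v in nodes and u != v}
    return 2 * len(internal) / (n * (n - 1))


def neg_log10_p(population, annotated, cluster_size, hits):
    """-log10 of the hypergeometric tail P(X >= hits).

    population:   number of proteins in the whole network
    annotated:    proteins in the network carrying the function
    cluster_size: number of proteins in the cluster
    hits:         proteins in the cluster carrying the function
    """
    upper = min(cluster_size, annotated)
    tail = sum(comb(annotated, i) * comb(population - annotated, cluster_size - i)
               for i in range(hits, upper + 1))
    if tail == 0:
        raise ValueError("hits exceeds the number of possible annotated members")
    total = comb(population, cluster_size)
    # Subtract logarithms of exact integers to avoid floating-point underflow
    # when the p-value is extremely small.
    return log10(total) - log10(tail)


# Toy example (hypothetical data, not taken from the yeast network).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
print(round(subgraph_density({"A", "B", "C", "D"}, toy_edges), 3))  # 0.667
print(round(neg_log10_p(10, 5, 5, 5), 3))                           # 2.401
```

Exact integer arithmetic with `math.comb` is used here so that the sketch needs no third-party packages. For networks of realistic size, a statistics library would normally be preferred.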
Table 3  STM clustering result on the yeast PPI dataset
Cluster  Size  Density  H (%)  D (%)  U (%)  -log p  Function
1  214  0.019  24.7  69.6  5.6  43.9  Nuclear transport
2  188  0.015  69.1  25.0  5.8  36.4  Cell cycle and DNA processing
3  181  0.022  22.0  72.3  5.5  17.2  Cytoplasmic and nuclear protein degradation
4  170  0.028  46.4  42.9  10.5  31.6  Transported compounds (substrates)
5  131  0.028  37.4  55.7  6.8  28.6  Vesicular transport (Golgi network, etc.)
6  125  0.030  60.8  33.6  5.6  32.2  tRNA synthesis
7  113  0.027  19.4  71.6  8.8  11.8  Actin cytoskeleton
8  79  0.045  17.7  73.4  8.8  12.3  Homeostasis of protons
9  78  0.033  26.9  62.8  10.2  12.5  Ribosome biogenesis
10  76  0.041  38.1  59.2  2.6  20.2  rRNA processing
11  72  0.030  5.6  84.7  9.7  6.2  Calcium binding
12  68  0.064  66.1  25.0  8.8  44.5  mRNA processing
13  61  0.041  40.9  52.4  6.5  11.5  Cytoskeleton
14  58  0.064  72.4  27.6  0.0  37.4  General transcription activities
15  53  0.048  15.0  71.6  13.2  7.9  MAPKKK cascade
16  50  0.064  66.0  32.0  2.0  33.5  rRNA processing
17  45  0.055  24.4  73.3  2.2  11.1  Metabolism of energy reserves
18  44  0.058  59.0  36.3  4.5  5.1  Metabolism
19  39  0.072  10.2  89.7  0.0  7.3  Cell-cell adhesion
20  36  0.125  58.3  36.1  5.5  16.9  Vesicular transport
21  29  0.091  55.1  44.8  0.0  8.3  Phosphate metabolism
22  28  0.074  14.2  78.5  7.1  4.5  Lysosomal and vacuolar protein degradation
23  27  0.119  29.6  66.6  3.7  7.3  Cytokinesis (cell division)/septum formation
24  26  0.153  53.8  46.1  0.0  28.6  Peroxisomal transport
25  25  0.090  28.0  68.0  4.0  4.6  Regulation of C-compound and carbohydrate utilization
26  25  0.116  68.0  28.0  4.0  12.9  Cell fate
27  22  0.151  59.0  36.3  4.5  11.4  DNA conformation modification
28  21  0.147  76.1  19.0  4.7  23.9  Mitochondrial transport
29  20  0.200  75.0  20.0  5.0  24.0  rRNA synthesis
30  19  0.228  78.9  15.7  5.2  17.9  Splicing
31  17  0.220  70.5  29.4  0.0  19.7  Microtubule cytoskeleton
32  17  0.183  23.5  76.4  0.0  8.2  Regulation of nitrogen utilization
33  15  0.304  86.6  13.3  0.0  31.3  Energy generation
34  14  0.142  50.0  42.8  7.1  9.0  Small GTPase mediated signal transduction
35  13  0.564  76.9  23.0  0.0  15.9  Mitosis
36  13  0.358  84.6  15.4  0.0  12.4  DNA conformation modification
37  13  0.410  69.2  23.0  7.6  17.6  3'-end processing
38  13  0.179  61.5  30.7  7.6  6.7  DNA recombination and DNA repair
39  12  0.196  16.6  75.0  8.3  3.9  Unspecified signal transduction
40  12  0.363  58.3  41.6  0.0  14.7  Posttranslational modification of amino acids
41  12  0.166  16.6  75.0  8.3  2.4  Autoproteolytic processing
42  11  0.218  54.5  45.4  0.0  2.9  Transcriptional control
43  11  0.200  72.7  27.2  0.0  8.2  Enzymatic activity regulation/enzyme regulator
44  10  0.466  80.0  20.0  0.0  14.8  Translation initiation
45  9  0.361  77.7  22.2  0.0  12.8  Translation initiation
46  8  0.321  50.0  37.5  12.5  5.6  Metabolism of energy reserves
47  8  0.321  75.0  25.0  0.0  9.0  Modification by ubiquitination, deubiquitination
48  8  0.321  37.5  62.5  0.0  3.7  Mitosis
49  7  0.333  42.8  57.1  0.0  3.5  DNA damage response
50  7  0.333  57.1  28.5  14.2  4.1  Vacuolar transport
51  7  0.285  28.5  71.4  0.0  4.4  Biosynthesis of serine
52  6  0.333  50.0  33.3  16.6  2.38  Modification by phosphorylation, dephosphorylation, etc.
53  5  0.400  100.0  0.0  0.0  7.0  Meiosis
54  5  0.600  100.0  0.0  0.0  7.0  Vacuolar transport
55  5  0.400  100.0  0.0  0.0  8.5  ER to Golgi transport
56  5  0.400  20.0  40.0  40.0  1.8  cAMP mediated signal transduction
57  5  0.500  40.0  40.0  20.0  3.1  Oxidative stress response
58  5  0.500  80.0  20.0  0.0  4.4  Intracellular signalling
59  5  0.600  40.0  60.0  0.0  4.2  Tetracyclic and pentacyclic triterpenes
60  5  0.400  60.0  40.0  0.0  4.1  Mitochondrial transport
The first column is the cluster identifier; Size is the number of proteins in the cluster; Density is the density of the cluster subgraph; H is the percentage of proteins concordant with the major function given in the last column; D is the percentage of proteins discordant with the major function; and U is the percentage of proteins not assigned to any function.
Figure 4 shows, for each cluster in Table 3, the distribution of the hit, miss, and unknown percentages of the member proteins relative to the assigned function. As Figure 4 shows, in most clusters at least half of the member proteins share the function assigned as the cluster's main function.
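The definitions of the H, D, and U columns can be made concrete with a minimal Python sketch. It assumes that each protein maps to a set of functional categories and that a protein with no categories counts as unknown. The function name, the annotation dictionary, and the protein identifiers are hypothetical and serve only as an illustration.

```python
def hit_miss_unknown(cluster, annotations, major_function):
    """Return the (H, D, U) percentages for one cluster.

    cluster:        iterable of protein identifiers
    annotations:    dict mapping a protein to a set of functional categories
    major_function: the function assigned to the cluster
    """
    cluster = list(cluster)
    if not cluster:
        raise ValueError("cluster must contain at least one protein")
    hits = misses = unknown = 0
    for protein in cluster:
        functions = annotations.get(protein)
        if not functions:
            unknown += 1          # no annotation at all
        elif major_function in functions:
            hits += 1             # concordant with the major function
        else:
            misses += 1           # annotated, but with other functions
    n = len(cluster)
    return tuple(round(100 * count / n, 1) for count in (hits, misses, unknown))


# Hypothetical 5-protein cluster with one hit, two misses, and two unknowns.
toy_annotations = {
    "P1": {"signal transduction"},
    "P2": {"metabolism"},
    "P3": {"cell cycle", "metabolism"},
    "P4": set(),
}
toy_cluster = ["P1", "P2", "P3", "P4", "P5"]
print(hit_miss_unknown(toy_cluster, toy_annotations, "signal transduction"))
# (20.0, 40.0, 40.0)
```

The sketch rounds to one decimal place, whereas some entries in Table 3 (for example 66.6) appear to be truncated, so the last digit may differ from the published values.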
Figure 4  Distribution of the three classes in the 60 clusters: the percentage of proteins concordant with the assigned function (hit), the percentage discordant with the assigned function (miss), and the percentage of unknown function (unknown).
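The summary figures quoted in this section can be checked directly against Table 3. The short Python sketch below transcribes the Size and -log p columns of the table and recomputes the cluster count, the largest and smallest sizes, the average size, and the clusters that do not exceed the 2.0 threshold. The variable names are hypothetical. The data are copied from Table 3 as printed.

```python
# Size column of Table 3, clusters 1 to 60.
sizes = [
    214, 188, 181, 170, 131, 125, 113, 79, 78, 76,
    72, 68, 61, 58, 53, 50, 45, 44, 39, 36,
    29, 28, 27, 26, 25, 25, 22, 21, 20, 19,
    17, 17, 15, 14, 13, 13, 13, 13, 12, 12,
    12, 11, 11, 10, 9, 8, 8, 8, 7, 7,
    7, 6, 5, 5, 5, 5, 5, 5, 5, 5,
]

# -log p column of Table 3, clusters 1 to 60.
neg_log_p = [
    43.9, 36.4, 17.2, 31.6, 28.6, 32.2, 11.8, 12.3, 12.5, 20.2,
    6.2, 44.5, 11.5, 37.4, 7.9, 33.5, 11.1, 5.1, 7.3, 16.9,
    8.3, 4.5, 7.3, 28.6, 4.6, 12.9, 11.4, 23.9, 24.0, 17.9,
    19.7, 8.2, 31.3, 9.0, 15.9, 12.4, 17.6, 6.7, 3.9, 14.7,
    2.4, 2.9, 8.2, 14.8, 12.8, 5.6, 9.0, 3.7, 3.5, 4.1,
    4.4, 2.38, 7.0, 7.0, 8.5, 1.8, 3.1, 4.4, 4.2, 4.1,
]

mean_size = round(sum(sizes) / len(sizes), 1)

# Cluster identifiers (1-based) whose -log p does not exceed the 2.0 threshold.
not_significant = [i + 1 for i, value in enumerate(neg_log_p) if value <= 2.0]

print(len(sizes), max(sizes), min(sizes))  # 60 214 5
print(mean_size)                           # 40.1
print(not_significant)                     # [56]
```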