2.3.1 Cluster analysis

555 preliminary clusters were obtained from the yeast PPI network and merged using a merge threshold of 1.0. Table 3 lists all 60 clusters that have more than 4 proteins, together with their topological characteristics and their assigned molecular functions from the MIPS functional categories. To facilitate critical assessment, the table also gives, for each cluster, the percentage of proteins that are concordant with the major assigned function (hits), discordant with it (misses), or of unknown function. Among these 60 clusters, the largest contains 210 proteins and the smallest contains 5. On average, a cluster contains 40.1 proteins, and the average density of the cluster subgraphs extracted from the PPI network is 0.2145.

The -log p value of the major function identified in each cluster is also shown. These values measure the relative enrichment of a cluster for a given functional category: higher values of -log p indicate greater enrichment. The high values of -log p (values greater than 2.0 indicate statistical significance at the 0.01 level) show that the clusters are significantly enriched for biological function and can be considered functional modules.

The results demonstrate that the STM method can detect large but sparsely connected clusters as well as small, densely connected clusters. In particular, the size, density, and P-value of the clusters in Table 3 show that our method can identify larger modules that have low density but are still biologically enriched.
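The two quantities used above, subgraph density and -log p, can be illustrated with a minimal sketch. The text does not spell out either formula, so the sketch makes two assumptions: density is the usual ratio of observed edges to possible edges in an undirected simple graph, 2E / (n(n - 1)), and the p-value is an upper-tail hypergeometric probability with a base-10 logarithm (base 10 is consistent with the stated correspondence between 2.0 and the 0.01 level). The function and parameter names are hypothetical and are not taken from the STM implementation.

```python
from math import comb, log10, inf


def subgraph_density(n_nodes, n_edges):
    """Density of an undirected simple graph: observed edges / possible edges.

    Assumed definition: 2E / (n * (n - 1)).
    """
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def hypergeom_tail(population, annotated, cluster_size, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, cluster_size).

    population   : number of proteins in the whole network
    annotated    : number of proteins in the network carrying the function
    cluster_size : number of proteins in the cluster
    hits         : number of cluster proteins carrying the function
    """
    upper = min(cluster_size, annotated)
    total = comb(population, cluster_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


def neg_log_p(population, annotated, cluster_size, hits):
    """-log10 of the enrichment p-value (assumed hypergeometric model)."""
    p = hypergeom_tail(population, annotated, cluster_size, hits)
    return inf if p == 0.0 else -log10(p)
```

Under the assumed density definition, a 5-protein cluster with 6 interactions has density 6/10 = 0.600, and a 13-protein cluster with 44 interactions has density 44/78 ≈ 0.564. Both values appear in Table 3, although the edge counts themselves are inferred here, not reported.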
Table 3 STM clustering result on the yeast PPI dataset

| Cluster | Size | Density | H | D | U | -Log p | Function |
|---|---|---|---|---|---|---|---|
| 1 | 214 | 0.019 | 24.7 | 69.6 | 5.6 | 43.9 | Nuclear transport |
| 2 | 188 | 0.015 | 69.1 | 25.0 | 5.8 | 36.4 | Cell cycle and DNA processing |
| 3 | 181 | 0.022 | 22.0 | 72.3 | 5.5 | 17.2 | Cytoplasmic and nuclear protein degradation |
| 4 | 170 | 0.028 | 46.4 | 42.9 | 10.5 | 31.6 | Transported compounds (substrates) |
| 5 | 131 | 0.028 | 37.4 | 55.7 | 6.8 | 28.6 | Vesicular transport (Golgi network, etc.) |
| 6 | 125 | 0.030 | 60.8 | 33.6 | 5.6 | 32.2 | tRNA synthesis |
| 7 | 113 | 0.027 | 19.4 | 71.6 | 8.8 | 11.8 | Actin cytoskeleton |
| 8 | 79 | 0.045 | 17.7 | 73.4 | 8.8 | 12.3 | Homeostasis of protons |
| 9 | 78 | 0.033 | 26.9 | 62.8 | 10.2 | 12.5 | Ribosome biogenesis |
| 10 | 76 | 0.041 | 38.1 | 59.2 | 2.6 | 20.2 | rRNA processing |
| 11 | 72 | 0.030 | 5.6 | 84.7 | 9.7 | 6.2 | Calcium binding |
| 12 | 68 | 0.064 | 66.1 | 25.0 | 8.8 | 44.5 | mRNA processing |
| 13 | 61 | 0.041 | 40.9 | 52.4 | 6.5 | 11.5 | Cytoskeleton |
| 14 | 58 | 0.064 | 72.4 | 27.6 | 0.0 | 37.4 | General transcription activities |
| 15 | 53 | 0.048 | 15.0 | 71.6 | 13.2 | 7.9 | MAPKKK cascade |
| 16 | 50 | 0.064 | 66.0 | 32.0 | 2.0 | 33.5 | rRNA processing |
| 17 | 45 | 0.055 | 24.4 | 73.3 | 2.2 | 11.1 | Metabolism of energy reserves |
| 18 | 44 | 0.058 | 59.0 | 36.3 | 4.5 | 5.1 | Metabolism |
| 19 | 39 | 0.072 | 10.2 | 89.7 | 0.0 | 7.3 | Cell-cell adhesion |
| 20 | 36 | 0.125 | 58.3 | 36.1 | 5.5 | 16.9 | Vesicular transport |
| 21 | 29 | 0.091 | 55.1 | 44.8 | 0.0 | 8.3 | Phosphate metabolism |
| 22 | 28 | 0.074 | 14.2 | 78.5 | 7.1 | 4.5 | Lysosomal and vacuolar protein degradation |
| 23 | 27 | 0.119 | 29.6 | 66.6 | 3.7 | 7.3 | Cytokinesis (cell division)/septum formation |
| 24 | 26 | 0.153 | 53.8 | 46.1 | 0.0 | 28.6 | Peroxisomal transport |
| 25 | 25 | 0.090 | 28.0 | 68.0 | 4.0 | 4.6 | Regulation of C-compound and carbohydrate utilization |
| 26 | 25 | 0.116 | 68.0 | 28.0 | 4.0 | 12.9 | Cell fate |
| 27 | 22 | 0.151 | 59.0 | 36.3 | 4.5 | 11.4 | DNA conformation modification |
| 28 | 21 | 0.147 | 76.1 | 19.0 | 4.7 | 23.9 | Mitochondrial transport |
| 29 | 20 | 0.200 | 75.0 | 20.0 | 5.0 | 24.0 | rRNA synthesis |
| 30 | 19 | 0.228 | 78.9 | 15.7 | 5.2 | 17.9 | Splicing |
| 31 | 17 | 0.220 | 70.5 | 29.4 | 0.0 | 19.7 | Microtubule cytoskeleton |
| 32 | 17 | 0.183 | 23.5 | 76.4 | 0.0 | 8.2 | Regulation of nitrogen utilization |
| 33 | 15 | 0.304 | 86.6 | 13.3 | 0.0 | 31.3 | Energy generation |
| 34 | 14 | 0.142 | 50.0 | 42.8 | 7.1 | 9.0 | Small GTPase mediated signal transduction |
| 35 | 13 | 0.564 | 76.9 | 23.0 | 0.0 | 15.9 | Mitosis |
| 36 | 13 | 0.358 | 84.6 | 15.4 | 0.0 | 12.4 | DNA conformation modification |
| 37 | 13 | 0.410 | 69.2 | 23.0 | 7.6 | 17.6 | 3'-end processing |
| 38 | 13 | 0.179 | 61.5 | 30.7 | 7.6 | 6.7 | DNA recombination and DNA repair |
| 39 | 12 | 0.196 | 16.6 | 75.0 | 8.3 | 3.9 | Unspecified signal transduction |
| 40 | 12 | 0.363 | 58.3 | 41.6 | 0.0 | 14.7 | Posttranslational modification of amino acids |
| 41 | 12 | 0.166 | 16.6 | 75.0 | 8.3 | 2.4 | Autoproteolytic processing |
| 42 | 11 | 0.218 | 54.5 | 45.4 | 0.0 | 2.9 | Transcriptional control |
| 43 | 11 | 0.200 | 72.7 | 27.2 | 0.0 | 8.2 | Enzymatic activity regulation/enzyme regulator |
| 44 | 10 | 0.466 | 80.0 | 20.0 | 0.0 | 14.8 | Translation initiation |
| 45 | 9 | 0.361 | 77.7 | 22.2 | 0.0 | 12.8 | Translation initiation |
| 46 | 8 | 0.321 | 50.0 | 37.5 | 12.5 | 5.6 | Metabolism of energy reserves |
| 47 | 8 | 0.321 | 75.0 | 25.0 | 0.0 | 9.0 | Modification by ubiquitination, deubiquitination |
| 48 | 8 | 0.321 | 37.5 | 62.5 | 0.0 | 3.7 | Mitosis |
| 49 | 7 | 0.333 | 42.8 | 57.1 | 0.0 | 3.5 | DNA damage response |
| 50 | 7 | 0.333 | 57.1 | 28.5 | 14.2 | 4.1 | Vacuolar transport |
| 51 | 7 | 0.285 | 28.5 | 71.4 | 0.0 | 4.4 | Biosynthesis of serine |
| 52 | 6 | 0.333 | 50.0 | 33.3 | 16.6 | 2.38 | Modification by phosphorylation, dephosphorylation, etc. |
| 53 | 5 | 0.400 | 100 | 0.0 | 0.0 | 7.0 | Meiosis |
| 54 | 5 | 0.600 | 100 | 0.0 | 0.0 | 7.0 | Vacuolar transport |
| 55 | 5 | 0.400 | 100 | 0.0 | 0.0 | 8.5 | ER to Golgi transport |
| 56 | 5 | 0.400 | 20.0 | 40.0 | 40.0 | 1.8 | cAMP mediated signal transduction |
| 57 | 5 | 0.500 | 40.0 | 40.0 | 20.0 | 3.1 | Oxidative stress response |
| 58 | 5 | 0.500 | 80.0 | 20.0 | 0.0 | 4.4 | Intracellular signalling |
| 59 | 5 | 0.600 | 40.0 | 60.0 | 0.0 | 4.2 | Tetracyclic and pentacyclic triterpenes |
| 60 | 5 | 0.400 | 60.0 | 40.0 | 0.0 | 4.1 | Mitochondrial transport |

The first column is the cluster identifier; the Size column gives the number of proteins in each cluster; the Density column gives the density of the cluster. The H, D and U columns together form the Distribution: H is the percentage of proteins concordant with the major function given in the last column, D is the percentage of proteins discordant with the major function, and U is the percentage of proteins not assigned to any function.
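The H, D and U columns defined above can be computed with a short routine, sketched here under stated assumptions. It treats a protein as a hit if the cluster's major function is among its annotated functions, as unknown if it has no annotation at all, and as a miss otherwise. The function name, the annotation dictionary layout, and the example protein identifiers are hypothetical and are not taken from the paper or from MIPS.

```python
def hdu_percentages(members, annotations, major_function):
    """Return (H, D, U) percentages for one cluster.

    members        : iterable of protein identifiers in the cluster
    annotations    : dict mapping protein identifier -> set of function labels
                     (a missing or empty entry is counted as unknown)
    major_function : the function label assigned to the cluster
    """
    members = list(members)
    if not members:
        raise ValueError("cluster must contain at least one protein")

    hits = misses = unknown = 0
    for protein in members:
        functions = annotations.get(protein, set())
        if not functions:
            unknown += 1          # no functional annotation at all
        elif major_function in functions:
            hits += 1             # concordant with the major function
        else:
            misses += 1           # annotated, but with other functions

    n = len(members)
    return tuple(100.0 * count / n for count in (hits, misses, unknown))


# Hypothetical 5-protein cluster: 1 hit, 2 misses, 2 unannotated proteins.
example_annotations = {
    "P1": {"cAMP mediated signal transduction"},
    "P2": {"Metabolism"},
    "P3": {"Mitosis"},
}
example = hdu_percentages(
    ["P1", "P2", "P3", "P4", "P5"],
    example_annotations,
    "cAMP mediated signal transduction",
)
```

With this made-up input the three percentages are 20.0, 40.0 and 40.0, which matches the pattern reported for cluster 56; the actual protein memberships are not given in the text.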
Figure 4 shows, for each cluster in Table 3, the percentage of member proteins that are hits, misses, or unknown with respect to the assigned function. As shown in Figure 4, most of the proteins in a cluster share the function assigned as the main function of that cluster.

Figure 4 Distribution of the three classes of 60 clusters: the hit percentage with the assigned function, the discordant percentage from the assigned function, and the unknown percentage.