1.3 Cluster assessment
The structures of the clusters identified by STM and by competing approaches are assessed using several metrics. The clustering coefficient, C(v), of a node v measures the connectivity among its direct neighbors:
$$C(v) = \frac{2\,\left|\bigcup_{i,j \in N(v)} (i,j)\right|}{d(v)\,(d(v)-1)} \qquad (4)$$
In Equation 4, N(v) is the set of direct neighbors of node v, d(v) is the number of direct neighbors of v, and the numerator counts the interactions (i, j) that exist between pairs of those neighbors. A node whose neighbors are densely interconnected has a high clustering coefficient. Degree centrality and betweenness centrality are commonly used to measure the importance of a node in a network. Degree centrality ranks nodes by the number of their direct neighbors, whereas betweenness centrality measures a node's importance to the flow of information through the network. The betweenness centrality, C_B(v), is a global measure of importance that assesses the proportion of shortest paths between all node pairs that pass through the node of interest.
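As an illustration of Equation 4, the following minimal Python sketch computes C(v) for an undirected network stored as a dictionary of neighbor sets. The function name, the toy network, and the convention of returning 0 when d(v) < 2 (where Equation 4 is undefined) are assumptions made for this example, not part of the original method description.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Clustering coefficient C(v) of Equation 4.

    adj maps each node to the set of its direct neighbors (undirected graph).
    """
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        # Assumption: Equation 4 is undefined for d(v) < 2; return 0 by convention.
        return 0.0
    # Count interactions (i, j) that exist between pairs of neighbors of v.
    links = sum(1 for i, j in combinations(neighbors, 2) if j in adj[i])
    return 2.0 * links / (d * (d - 1))


# Hypothetical toy network: a triangle a-b-c plus a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(clustering_coefficient(toy, "a"))  # one of three neighbor pairs is linked: 1/3
print(clustering_coefficient(toy, "b"))  # both neighbors are linked: 1.0
```

Node a has the highest degree in the toy network but a lower clustering coefficient than node b, which shows that the two measures capture different properties.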
The betweenness centrality, C_B(v), for a node of interest v is defined by:
$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\rho_{st}(v)}{\rho_{st}} \qquad (5)$$
In Equation 5, ρ_st is the number of shortest paths from node s to node t, and ρ_st(v) is the number of those shortest paths that pass through node v. The extent to which a cluster is associated with a specific biological function is evaluated using a p-value based on the hypergeometric distribution [7]. The p-value is the probability that a cluster would be enriched with proteins of a particular function by chance alone.
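Equation 5 can be sketched in Python with a breadth-first search that counts shortest paths. This is a brute-force illustration for small, unweighted, undirected networks, not an optimized algorithm and not necessarily the implementation used in this work. The function names and toy networks are hypothetical, and the sum is taken over ordered pairs (s, t), as Equation 5 is written, so each unordered pair contributes twice.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source; returns (dist, sigma), where sigma[w] is the
    number of shortest paths from source to w."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # w is reached for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj, v):
    """C_B(v) of Equation 5, summed over ordered pairs (s, t)."""
    dist_v, sigma_v = shortest_path_counts(adj, v)
    total = 0.0
    for s in adj:
        if s == v or s not in dist_v:
            continue
        dist_s, sigma_s = shortest_path_counts(adj, s)
        for t in adj:
            if t == s or t == v or t not in dist_s:
                continue
            # v lies on a shortest s-t path iff d(s,v) + d(v,t) = d(s,t).
            if dist_s[v] + dist_v[t] == dist_s[t]:
                # rho_st(v) = (paths s->v) * (paths v->t); rho_st = sigma_s[t]
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# Hypothetical toy networks.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

print(betweenness_centrality(path, "b"))   # 2.0: b is on the only a-c path, both directions
print(betweenness_centrality(cycle, "b"))  # 1.0: b is on one of two a-c paths, both directions
```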
The p-value is given by:
$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{C}{i}\binom{G-C}{n-i}}{\binom{G}{n}} \qquad (6)$$
In Equation 6, C is the size of the cluster, which contains k proteins with a given function; G is the size of the universal set of known proteins, which contains n proteins with that function. Because the p-values are often very small, their negative base-10 logarithms (denoted -log p) are reported instead. A -log p value of 2 or greater indicates statistical significance at α = 0.01. The density of a subgraph s in a PPI network is measured by:
$$D_s = \frac{2e}{n(n-1)} \qquad (7)$$
In Equation 7, n is the number of proteins (distinct from the n of Equation 6) and e is the number of interactions in the subgraph s of a PPI network.
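Equations 6 and 7 can be evaluated directly with the Python standard library, as in the minimal sketch below. The function names and the numerical inputs are hypothetical and chosen only for illustration. Exact rational arithmetic is used for Equation 6 so that subtracting the sum from 1 does not lose precision, and raising an error for n < 2 in Equation 7 is an assumption of this example.

```python
from fractions import Fraction
from math import comb, log10


def enrichment_p_value(G, C, n, k):
    """Hypergeometric p-value of Equation 6.

    G: number of known proteins, C: cluster size,
    n: proteins in G with the function, k: proteins in the cluster with it.
    """
    tail = sum(comb(C, i) * comb(G - C, n - i) for i in range(k))
    return float(1 - Fraction(tail, comb(G, n)))


def neg_log_p(p):
    """Negative base-10 logarithm of a p-value (-log p in the text)."""
    return -log10(p)


def subgraph_density(n, e):
    """Density D_s of Equation 7 for a subgraph with n proteins and e interactions."""
    if n < 2:
        # Assumption: Equation 7 is undefined for fewer than two proteins.
        raise ValueError("density requires at least two proteins")
    return 2.0 * e / (n * (n - 1))


# Hypothetical numbers: 10 known proteins, 3 with the function,
# and a cluster of 4 proteins that contains 2 of them.
p = enrichment_p_value(G=10, C=4, n=3, k=2)
print(p)                       # 1 - (20 + 60) / 120 = 1/3
print(neg_log_p(p) >= 2)       # False: not significant at alpha = 0.01
print(subgraph_density(4, 3))  # 0.5: 3 of the 6 possible interactions are present
```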