1.2 Clustering model
STM algorithm simulates the perturbation from each node to the other nodes in a network using Equation 2, which reflects the biological and topological properties of the node. Module representatives are the nodes that record the highest scores by Equation 2 on every node in a module, i.e., they are the most influential nodes in a module biologically and topologically. After the signal transduction simulation, each node selects the most influential nodes as the representatives of modules. From these representatives, preliminary modules can be formed by aggregating each node into each module that each of its representatives stands for. Finally, these preliminary modules are merged if there are substantial interconnections between them.
The pseudocode for the STM algorithm, which employs the signal transduction function of Equation 2 and a democratic representatives selection algorithm, is shown in Table 1. The algorithm involves four sequential processes:
Table 1  STM algorithm
Algorithm 1: STM(G)
1: V: set of nodes in Graph G
2: F(c): Transduction behavior function
3: S(v, w): arrived signal from node v to node w
4: C: the list of final clusters
5: PreClusters: the list of preliminary clusters
6: for each node pair(v, w) v, w ∈ V, v ≠ w do
7:    distance(v, w) ← the shortest path length from node v to node w
8:    set parameter c in function as F(c) as distance(v, w)
9:    signal(v, w) ← S(v→w)=d(v)∏i∈P(v,w)d(i)F(c) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWucqGGOaakcqWG2bGDcqGHsgIRcqWG3bWDcqGGPaqkcqGH9aqpdaWcaaqaaiabdsgaKjabcIcaOiabdAha2jabcMcaPaqaamaarababaGaemizaqMaeiikaGIaemyAaKMaeiykaKcaleaacqWGPbqAcqGHiiIZcqWGqbaucqGGOaakcqWG2bGDcqGGSaalcqWG3bWDcqGGPaqkaeqaniabg+GivdaaaOGaemOrayKaeiikaGIaem4yamMaeiykaKcaaa@4DCA@
10: end for
11: for each node v ∈ V do
12:       v. representative ← select the best scored node w for node v
13:       if cluster_w == null then
14:          make cluster_w
15:          cluster_w.add(v)
16:          PreClusters.add(cluster_w)
17:       else
18:          cluster_w.add(v)
19:       end if
20: end for
21: C ← Merge(PreClusters) Process 1: Compute signals transduced between all node pairs.
Process 2: Select cluster representatives for each node.
Process 3: Formation of preliminary clusters.
Process 4: Merge preliminary clusters.
Process 1 propagates signals from each source node and records the signal quantities on each target node for all node pairs according to Equation 2. The implementation of Process 1 is shown on lines 6–10 of the STM algorithm in Table 1.
In Process 2, each node elects the nodes from which it receives the highest signal value as the representatives of the clusters that the node will belong to. For example, in Figure 2, nodes A, B, C, D, E, and F will choose node A and nodes L, G, and N will choose node F, which are the best scored nodes on those nodes, as the representatives.
Each preliminary cluster is initialized by taking its representative as its initial member. Preliminary clusters are then augmented by accumulating each node toward the representatives chosen by each node. Lines from 11–20 in Table 1 contain the representative selection process and the preliminary cluster formation process. Notice that STM allows overlaps among clusters by opening the possibility of multiple representatives which have the tie score on a node, etc. For example, node G picks nodes F and H, which have the tie score on node G, as its representatives in Figure 2. Then, G will belong to the cluster formed by nodes F and the cluster formed by H. Therefore, overlaps occur between the cluster formed by node F, {F, G, L, N}, and the cluster formed by node H, {G, H, I, J, K, M}. STM identified three preliminary clusters, {A, B, C, D, E, F}, {F, G, L, N}, and {G, H, I, J, K, M}, based on the choice of representatives in Figure 2.
So far, STM considers only the biological influence and the least resistance paths between protein pairs in a network. Density, i.e., interconnectivity and intraconnectivity, of detected clusters should be another important aspect that we need to consider in modularization since the clusters that have high interconnections between them have high possibility being in the same functional module. In the final merge process described in Table 2, STM takes density among detected preliminary clusters into consideration by utilizing interconnectivity among detected preliminary clusters. Some preliminary clusters should be merged if they have substantial number of interconnections to improve clusters' quality. We propose to measure the degree of interconnectivity between clusters by the similarity of two clusters i and j defined below:
Table 2  Procedure: Merge(C)
1: C: the cluster list
2: MaxPair: the cluster pair(c, k) with max interconnections among all cluster pair
3: Max.value: interconnections between cluster pair c and k
4: MaxPair ← findMaxPair(C,null)
5: while Max.value ≥ do
6:    newCluster ← merge MaxPair c and k
7:    Replace cluster c with newCluster
8:    Remove cluster k
9:    MaxPair ← findMaxPair(C,newCluster)
10: end while
11: return C S i m i l a r i t y ( i , j ) = i n t e r c o n n e c t i v i t y ( i , j )  m i n s i z e ( i , j )         ( 3 )    MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWucqWGPbqAcqWGTbqBcqWGPbqAcqWGSbaBcqWGHbqycqWGYbGCcqWGPbqAcqWG0baDcqWG5bqEcqGGOaakcqWGPbqAcqGGSaalcqWGQbGAcqGGPaqkcqGH9aqpdaWcaaqaaGqaciab=LgaPjab=5gaUjab=rha0jabdwgaLjabdkhaYjabdogaJjabd+gaVjabd6gaUjabd6gaUjabdwgaLjabdogaJjabdsha0jabdMgaPjabdAha2jabdMgaPjabdsha0jabdMha5jabcIcaOiabdMgaPjabcYcaSiabdQgaQjabcMcaPaqaaiab=1gaTjab=LgaPjab=5gaUjabdohaZjabdMgaPjabdQha6jabdwgaLjabcIcaOiabdMgaPjabcYcaSiabdQgaQjabcMcaPaaacaWLjaGaaCzcamaabmaabaGaeG4mamdacaGLOaGaayzkaaaaaa@7054@
where interconnectivity (i, j) is the number of connections between clusters i and j, and minsize(i, j) is the size of the smaller cluster among clusters i and j. The Similarity(i, j) between two clusters i and j is the ratio of the number of the connections between them to the size of the smaller cluster. Highly interconnected clusters are iteratively merged based on the similarity of the clusters. The pair of clusters that have the highest similarity are merged in each iteration and the merge process iterates until the highest similarity of all cluster pairs is less than a given threshold. The selection of the threshold for merging clusters is a critical factor for the final cluster outcome. We can see that every member of the smaller cluster has at least one interaction with the members of the other cluster if inter connectivity (i, j) ≥ minsize(i, j) between cluster pair i and j. Therefore, we conclude that two clusters should be in the same functional module if every member of the smaller cluster has at least one interaction with the members of the other cluster. Theoretically and experimentally, we can see when interconnectivity (i, j) ≥ minsize(i, j), clusters i and j have substantial interconnections. Three clusters, {A, B, C, D, E, F}, {F, G, L, N}, {G, H, I, J, K, M}, are obtained after the Process 4 when 2.0 is used as the merge threshold. Two clusters, {A, B, C, D, E, F, G, L, N}, {G, H, I, J, K, M}, are obtained after the Merge process when 1.0 is used as the merge threshold.