PMC:4307189 / 15340-18274

    2_test

    {"project":"2_test","denotations":[{"id":"25599599-20100321-14868122","span":{"begin":418,"end":420},"obj":"20100321"},{"id":"25599599-20100321-14868123","span":{"begin":1040,"end":1042},"obj":"20100321"}],"text":"Network model\nFor statistical modeling of networks, exponential families of distributions offer robust and flexible parametric models [24]. These probabilistic models can be used to evaluate the probability that an edge is present in the network. They can also be used to quantify topological properties of networks by summarizing them in a parametric form and associating sufficient statistics with those parameters [19,24]. In this study, we use a special class of exponential family distributions known as ERGM (Exponential Random Graph Models), also known as the p1-model, which was introduced by Holland and Leinhardt [24].\nA gene-gene relationship network with g genes can be regarded as a random variable X taking values from a set G containing all $2^{g(g-1)}$ possible relationship networks [24,25]. Let u be a generic point of G which can alternatively be denoted as the realization of X by X = u. Let the binary outcome $u_{ij} = 1$ if gene $i$ interacts with gene $j$, or $u_{ij} = 0$ otherwise. Then u is a binary data matrix [19]. Let Pr(u) be the probability function on G given by (1) $\\Pr(u) = \\Pr(X = u) = \\frac{1}{\\kappa(\\theta)} \\exp\\left\\{\\sum_{p} \\theta_p z_p(u)\\right\\}$\nwhere $z_p(u)$ is the network statistic of type $p$, $\\theta_p$ is the parameter associated with $z_p(u)$, and κ(θ) is the normalizing constant that ensures Pr(u) is a proper probability distribution (sums to 1 over all u in G) [26]. The parameter θ is a vector of model parameters associated with network statistics and needs to be estimated. See [24] for further details.\nA major limitation of the p1-model is the difficulty of calculating the normalizing constant, κ(θ), since it is a sum over the entire graph space. 
Maximum likelihood estimation for this model becomes intractable because there are $2^{g(g-1)}$ possible directed graphs (or $2^{g(g-1)/2}$ undirected graphs), each having g nodes (genes). A technique called maximum pseudolikelihood estimation has been developed to address this problem [27]. This technique employs MCMC methods such as Gibbs or Metropolis-Hastings sampling algorithms [28].\nThe construction of the p1-model for a directed network is described in Additional file 1: Appendix I. For the gene-gene relationship network with undirected edges, the description of the p1-model can be simplified by using only two Bernoulli variables $Y_{ij0}$ and $Y_{ij1}$ instead of four, as follows: $Y_{ijk} = 1$ if $u_{ij} = k$, and $Y_{ijk} = 0$ otherwise\nThe simplified p1-model can then be defined using the following two equations to predict the probability of an edge being present between gene $i$ and gene $j$: (2) $\\log \\Pr(Y_{ij1} = 1) = \\lambda_{ij} + \\theta + \\alpha_i + \\alpha_j$\n(3) $\\log \\Pr(Y_{ij0} = 1) = \\lambda_{ij}$\nfor $i \u003c j$. Note that $\\lambda_{ij}$ is chosen to ensure $\\Pr(Y_{ij0} = 1) + \\Pr(Y_{ij1} = 1) = 1$. In this formulation, the expansiveness and attractiveness parameters were reduced to a single parameter, α, which represents the propensity of a gene to be connected in an undirected network. Hence, the p1-model seeks to find the probabilities of edge formation in a network considering its structural features explicitly."}
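
A minimal sketch of what equations (2) and (3) imply, assuming only the constraint stated in the text that the two probabilities sum to 1. Solving that constraint for λij gives λij = -log(1 + exp(θ + αi + αj)), so the edge probability reduces to a logistic function of θ + αi + αj. The function names and the parameter values below are hypothetical and chosen for illustration; they are not estimates from the study.

```python
import math


def edge_probability(theta, alpha_i, alpha_j):
    """Pr(Y_ij1 = 1) implied by equations (2) and (3).

    With eta = theta + alpha_i + alpha_j, the sum-to-one constraint
    gives Pr(edge) = exp(eta) / (1 + exp(eta)).
    """
    eta = theta + alpha_i + alpha_j
    return 1.0 / (1.0 + math.exp(-eta))


def lambda_ij(theta, alpha_i, alpha_j):
    """The lambda_ij that makes Pr(Y_ij0 = 1) + Pr(Y_ij1 = 1) = 1."""
    eta = theta + alpha_i + alpha_j
    return -math.log1p(math.exp(eta))


def graph_space_size(g, directed=False):
    """Number of possible networks on g nodes (genes)."""
    pairs = g * (g - 1)
    return 2 ** pairs if directed else 2 ** (pairs // 2)


# Hypothetical parameter values, for illustration only.
theta = -1.0
alpha = [0.5, 0.0, -0.5]

for i in range(len(alpha)):
    for j in range(i + 1, len(alpha)):
        p = edge_probability(theta, alpha[i], alpha[j])
        print(f"gene {i} - gene {j}: Pr(edge) = {p:.3f}")

# The graph space grows too quickly to enumerate for realistic g.
for g in (3, 5, 10):
    print(g, graph_space_size(g), graph_space_size(g, directed=True))
```

The last loop shows why summing over the whole graph space to obtain κ(θ) is impractical: the count doubles with every additional node pair.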