PMC:1481596 / 9548-21168 JSONTXT

Annnotations TAB JSON ListView MergeView

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1481596","sourcedb":"PMC","sourceid":"1481596","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1481596","text":"Complex overlap decomposition\nOur method of representing overlapping functional groups, which is depicted in Figure 1, builds on chordal and cograph graph theories. Chordal graphs constitute an important and well studied graph family [18,19]. A chord in a graph is any edge that connects two non-consecutive nodes of a cycle. A chordal graph is a graph which does not contain chordless cycles of length greater than three.\nFigure 1 Complex Overlap Decomposition. A simplified illustration of the Complex Overlap Decomposition (COD) method. An edge, (3, 4), connecting a pair of weak siblings is added to the graph. A fill-in edge between proteins 5 and 8 is added to eliminate all five 4-cycles in the graph: {5, 6, 8, 7}, {1, 5, 7, 8}, {2, 5, 7, 8}, {1, 5, 6, 8}, and {2, 5, 6, 8}. If the modified graph is chordal, all clique tree representations are computed (cf. Methods). Each clique tree representation results in a Tree of Complexes representation, where the Tree of Complexes is constructed by projecting each maximal clique in the modified graph, G*, to a functional group in the original graph G. For example, a four node maximal clique, {1, 2, 5, 8}, in G* is projected to a four node functional group in G, by removing a fill-in edge (5, 8). Each functional group is represented by a Boolean expression, such as (1 ∧ 2) ∧ (5 ∨ 8), which means that the functional group contains two variants of a complex, {1, 2, 5} and {1, 2, 8}. An important property of chordal graphs, which is explored directly in this paper, is that every chordal graph has a corresponding clique tree representation or clique tree [17]. The nodes in the tree are maximal cliques. Moreover, for every node in the graph, a set of maximal cliques that contain this node form a connected subgraph of the clique tree. Thus, there is a mapping between the nodes in the graph and subtrees in the clique tree. The \"connected subgraph\" requirement puts constraints on the topology of the clique tree. In fact, the topology of the tree is determined by the structure of overlaps between the maximal cliques in the graph. Thus, the clique tree captures information about the structure of the overlaps, which is lost in a simple clique intersection graph as shown in the example below.\nExample Consider a hypothetical protein interaction network in Figure 2(a). This network is chordal and its maximal cliques are listed in Figure 2(b). We want to contrast the clique tree representation in Figure 2(d) to a naive representation in Figure 2(c), where every pair of maximal cliques that contain a protein in common is connected by an edge.\nFigure 2 A Hypothetical Protein Interaction Network. (a) A hypothetical protein interaction network. (b) A list of all maximal cliques in the network. (c) A naive representation of overlaps between maximal cliques. Each maximal clique is a node and there is an edge between two maximal cliques if and only if they share a protein. (d) The clique tree representation. Once again, every maximal clique is a node, but the cliques are connected in such a way that the resulting graph is a tree. Moreover, cliques that contain a given protein form a connected subgraph. (e) This color scheme is used to show the subtree of every protein. For example, protein 3 is contained in maximal cliques A, B, and C, which is shown by placing yellow dots above the maximal cliques. While both representations show the overlap between maximal cliques, the interconnection pattern of cliques in the naive representation carries little additional information about the structure of this overlap. On the other hand, a very specific tree-like interconnection pattern in the clique tree representation can expose a special structure of such overlap. For example, consider maximal cliques B through F. In the naive representation, the overlap between these maximal cliques is collapsed to a clique. Thus, the representation treats the maximal cliques and overlaps between them equally. In particular, there is no way to tell that, for example, D occupies a more central position in the network than B. In the clique tree representation this information can be extracted from the relative position of cliques in the tree. For example, B is connected to F by a path that passes through C and D, which means that any protein shared by B and F is also contained in C and D. In other words, the overlap between B and F is entirely contained in the overlap between B and D, which in turn is entirely contained in the overlap between B and C. Thus, there is a correlation between the amount of overlap between maximal cliques and their distance in the clique tree.\nNice properties of the clique tree mentioned above make it a good choice for representation of overlaps between functional groups. However, not every protein interaction network is chordal and maximal cliques may not always be the best way to represent functional groups. For example, in Figure 1, cliques {1, 2, 3} and {1, 2, 4} may correspond to two variants of one complex, where proteins 3 and 4 replace each other, forming one rather than two functional groups. Therefore, in the COD decomposition we relax the assumption that every functional group is a single protein complex (maximal clique) and allow it to contain several protein complexes (maximal cliques). In doing so we have to ensure that a functional group is not just any collection of protein complexes but rather a set of closely related protein complexes which represent possible variants of one complex (such as complexes {1, 2, 3} and {1, 2, 4} in the above example). To capture this systematically we model a functional group with a graph from a family of graphs known as cographs.\nCographs are another well-studied graph family [20]. A cograph can be characterized by an absence of an induced subgraph which is a path of length four (P4), where the length is the number of nodes in the path. Thus, the diameter of a connected cograph is at most two. Subsequently, connected cographs are dense and cliquish, consistently with the assumption made by algorithms that delineate protein complexes. What makes cographs even more attractive is that for every cograph there exists a Boolean expression which describes all the maximal cliques in the graph. (In terms of modular decomposition used in [16] it means that a cograph can be decomposed by modular decomposition without leaving non-trivial non-decomposable prime module.) This Booolean expression describes in a compact and hierarchical way all the possible variants of protein complex within a functional group.\nThe main idea behind COD method is to provide a representation of a functional module, which is analogous to the clique tree, in which nodes are cographs (representing variants of protein complex within a functional group) rather than maximal cliques. If we knew in advance all the functional groups in the module, we could simply connect the proteins within each functional group turning it into a clique and, under the assumption that the resulting graph is chordal, apply clique tree construction algorithm to the graph. Since we do not have predefined functional groups, our algorithm identifies them by adding edges to the graph in such a way that each added edge connects a pair of nodes that putatively belong to the same functional group.\nThe COD method's edge addition strategy and its biological motivation builds on a concept of weak siblings. We call a pair of nodes weak siblings if and only if they are connected to the exactly the same set of neighbors, but are not connected to each other. In terms of protein interaction networks, weak siblings are proteins which interact with the same set of proteins but do not interact with each other. In particular, proteins that can substitute each other in a protein interaction network may have this property. Similarly, a pair of proteins that belong to the same complex but are not connected due to missing data or an experimental error will be represented as weak siblings. Since the weak siblings relationship suggests functional similarity, the COD method takes a first step towards delineation of functional groups by connecting every pair of weak siblings by an edge. As this modification may also eliminate some of the chordless cycles of length four (squares) in the graph, functional group delineation happens simultaneously with transformation of the protein interaction graph into a chordal graph.\nIf, after connecting all pairs of weak siblings, the resulting graph is not chordal, the COD method attempts to transform it to chordal by adding some additional edges. Consistently with our assumption that we connect only nodes corresponding to proteins that could be put in the same functional group, we impose restrictions on this \"fill-in\" process. Namely, we require that, each introduced edge connects a pair of nodes which are close to being weak siblings. In such case the new edge is a diagonal of one or more squares in the graph. We emphasize that adding edges between nodes of longer cycles has no such justification. To summarize our edge addition procedure, our method attempts to eliminate all the squares in the protein interaction network by adding a limited set of diagonals that satisfies following conditions (i) connects potentially functionally equivalent proteins, as measured by the overlap in neighborhoods or distance from being a pair of weak siblings; (ii) ensures that functional groups correspond to cographs; we argue that this condition is guaranteed if the set of added edges does not form a P4 in a maximal clique of the modified graph (cf. Methods);\nIf the modification step succeeds, i.e., the modified graph is chordal, all the clique tree representations of the modified graph are computed and then each clique tree is extended to a Tree of Complexes representation of the original graph. The COD algorithm keeps track of all the edge additions and uses this information to delineate functional groups by projecting each maximal clique onto original network and removing all introduced edges contained in the clique. For example, in the modified graph of Figure 1 a maximal clique with four nodes, {1, 2, 5, 8}, is projected to a functional group by removing an edge connecting proteins 5 and 8. This functional group contains two variants of a protein complex, {1, 2, 5} and {1, 2, 8}, which are compactly represented by a (1 ∧ 2) ∧ (5 ∨ 8) Boolean expression. If, on the other hand, the modified graph is not chordal, the COD method stops without producing the representation. Since the clique tree representation for a chordal graph is not unique, the Tree of Complexes representation that derives from it is not unique either. As all clique trees of a chordal graph have the same set of nodes (the nodes are the maximal cliques in the graph), the difference between clique trees comes from the topology of the tree. The clique tree topology is determined by the \"connected subgraph\" constraints and restriction power of these constraints depends on the structure of the underlying graph, i.e., there are graphs with a unique clique tree representation and there are graphs for which almost any tree that spans all the maximal cliques in the graph is a valid clique tree. As a result a protein interaction network may have several Tree of Complexes representations; all such representations will have the same functional groups but will differ in the way these functional groups are interconnected. For every protein interaction network analyzed bellow we explicitly state all the possible Tree of Complexes representations.","divisions":[{"label":"title","span":{"begin":0,"end":29}},{"label":"p","span":{"begin":30,"end":422}},{"label":"figure","span":{"begin":423,"end":1442}},{"label":"label","span":{"begin":423,"end":431}},{"label":"caption","span":{"begin":433,"end":1442}},{"label":"p","span":{"begin":433,"end":1442}},{"label":"p","span":{"begin":1443,"end":2258}},{"label":"p","span":{"begin":2259,"end":2611}},{"label":"figure","span":{"begin":2612,"end":3378}},{"label":"label","span":{"begin":2612,"end":2620}},{"label":"caption","span":{"begin":2622,"end":3378}},{"label":"p","span":{"begin":2622,"end":3378}},{"label":"p","span":{"begin":3379,"end":4647}},{"label":"p","span":{"begin":4648,"end":5702}},{"label":"p","span":{"begin":5703,"end":6585}},{"label":"p","span":{"begin":6586,"end":7332}},{"label":"p","span":{"begin":7333,"end":8454}},{"label":"p","span":{"begin":8455,"end":9639}}],"tracks":[{"project":"2_test","denotations":[{"id":"16722537-15287979-1694488","span":{"begin":6314,"end":6316},"obj":"15287979"}],"attributes":[{"subj":"16722537-15287979-1694488","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#93ece5","default":true}]}]}}