PMC:5872385 / 1900-7364
Annnotations
2_test
{"project":"2_test","denotations":[{"id":"29589566-23765498-14859306","span":{"begin":93,"end":94},"obj":"23765498"},{"id":"29589566-15364960-14859307","span":{"begin":402,"end":403},"obj":"15364960"},{"id":"29589566-11257102-14859308","span":{"begin":427,"end":428},"obj":"11257102"},{"id":"29589566-26734603-14859309","span":{"begin":464,"end":465},"obj":"26734603"},{"id":"29589566-24045689-14859310","span":{"begin":671,"end":672},"obj":"24045689"},{"id":"29589566-10592173-14859311","span":{"begin":1207,"end":1208},"obj":"10592173"},{"id":"29589566-28249561-14859311","span":{"begin":1207,"end":1208},"obj":"28249561"},{"id":"29589566-24211779-14859312","span":{"begin":1539,"end":1541},"obj":"24211779"},{"id":"29589566-22426491-14859313","span":{"begin":1790,"end":1792},"obj":"22426491"},{"id":"29589566-27081328-14859314","span":{"begin":2041,"end":2043},"obj":"27081328"},{"id":"29589566-24554629-14859315","span":{"begin":2045,"end":2047},"obj":"24554629"},{"id":"29589566-24554629-14859316","span":{"begin":2297,"end":2299},"obj":"24554629"},{"id":"29589566-23986566-14859317","span":{"begin":3097,"end":3099},"obj":"23986566"},{"id":"29589566-23986566-14859318","span":{"begin":3412,"end":3414},"obj":"23986566"},{"id":"29589566-25979476-14859319","span":{"begin":3416,"end":3418},"obj":"25979476"},{"id":"29589566-26510976-14859320","span":{"begin":3626,"end":3628},"obj":"26510976"},{"id":"29589566-27291302-14859321","span":{"begin":5119,"end":5121},"obj":"27291302"}],"text":"Background\nThe exponential increase of biological data (genomic, transcriptomic, proteomic) [1] and of biological interaction knowledge in Pathway Databases allows modeling cellular regulatory mechanisms. Modeling biological mechanisms is done, most of the time, using boolean or ordinary differential equation representations. Those approaches have shown their efficiency in cellular phenomena study [2], disease research [3, 4], and bio-production optimization [5]. However, those modeling approaches cannot take into account the large amount of OMIC data. This limitation requires that the researcher preselects the OMIC data and network, adding bias to the analysis [6]. A classical way to perform OMIC data preselection is to use differentially expressed genes [7], this leads to select genes by imposing common fixed thresholds while their activation threshold may be specific for each gene. As a consequence the selected pathways may not be specific for the biological problematic. A common way to perform network preselection consists on choosing specific pathways according to the type of data and the biological problematic. Moreover, several regulatory databases such as KEGG, CBN, and Reactome [8–10] allow to select specific (e.g. apoptosis) pathways directly. Nevertheless, this network preselection approach can hide unsuspected pathways, reducing the possibility to discover new ones.\nSome of the methods that identify subnetworks or network components, recognize specific pathways based on differentially expressed genes [11]. However, this kind of approaches considers pathways independently, and does not take into account the interactions between biological compounds. Other methods were developed to find involved pathways by identifying subgraphs or network clusters [12] from a regulatory network using topological informations and then use the gene expression profiles (GEPs) to identify a specific cluster. The majority of such methods uses protein-protein interaction (PPI) networks and GEPs to identify subgraphs [13, 14]. Those methods consider the interactions between biological compounds but infer protein states based on the associated GEP. That is, the built subgraph contains expressed proteins (obtained from associated genes expression) and their interactions [14]. These methods assume that a correleation between gene expression and protein activity exists, which is not necessarily true since an increase on gene expression can account of an increase of protein quantity, however in order to increase the activity of a protein another (e.g. phosphorylation) mechanism may need to be included. Methods using PPI networks are limited since they do not consider causality logic and different interaction roles. While the notion of causality is used by methods such as [15] to find a subgraph which maximizes the genes expression variation information; to our knowledge few subgraph identification methods based on GEPs consider direct interactions in regulatory networks, and much less include the different kind of interaction role (activation or inhibition) [16]. Moreover, the majority of those methods study protein interactions based on GEPs and without taking into account the difference between transcriptional and post-translational regulation. Finally, approaches that include the interaction role in their integrative analysis to link regulatory networks with GEPs [16, 17] use a local strategy, that is, they analyze sequentially each node in the graph with respect to its predecessors.\nIn this study we propose a method based on exhaustive and global graph coloring approaches [18]. These approaches are able to predict the graph coloring configurations, in terms of discrete states (e.g. active or inactive) of the molecular species of a biological network with respect to a set of experimental observations. In this work we extend those approaches by looking for harmonious or perfect colorations. The intuition behind the harmonious or perfectness notion is to point to reachable network discrete states that maximize the agreement between the molecular species active or inactive states and the directionality of the pathways reactions according to their activator or inhibitor control role. This can be expressed in natural language as follows: “for a given node in the graph we impose that its discrete active or inactive state is explained by a maximal number of regulators”. This statement is inspired from a hypothesis of redundancy in biological networks control, and we use Logic Programming to express this statement and search for coloring models where it holds for every node in the graph. Afterwards, we correlate the graph coloring models that maximize the perfectness notion and in this way build correlated graph components. After adding experimental data, our method is able to identify components of interest. We present an application of this method with transcriptomic data from myeloma cells (MC) of 602 MM patients and from normal plasma cells (NPC) of 9 healthy donors. Multiple myeloma is a hematologic malignancy representing 1% of all cancer [19] with a survival rate of 49.6% after 5 years. Our method of perfect graph colorings identification allowed us to identify 15 components. One of these components was statistically specific to MC in comparison to NPC. Using gene ontology enrichment analysis with the PANTHER tool we were able to associate this component to oncogenic phenomena."}