PMC:5369021 / 11468-16093
{"project":"2_test","denotations":[{"id":"28347313-25161226-14906455","span":{"begin":63,"end":65},"obj":"25161226"},{"id":"28347313-20709693-14906456","span":{"begin":3605,"end":3607},"obj":"20709693"},{"id":"28347313-22955619-14906457","span":{"begin":3621,"end":3623},"obj":"22955619"},{"id":"28347313-20442302-14906458","span":{"begin":3852,"end":3854},"obj":"20442302"},{"id":"28347313-16990858-14906459","span":{"begin":3856,"end":3858},"obj":"16990858"},{"id":"28347313-16827748-14906460","span":{"begin":3910,"end":3912},"obj":"16827748"}],"text":"Estimation of TF activity by the effect on their target genes [31]\nFig. 3 Flow chart of the approach by Schacht et al. The input data sets (marked in blue) are partly filtered and passed to a linear regression model (yellow), which calculates an activity value for each TF (green).\nThe idea of this method is to use the expression levels of a TF’s target genes to infer its integrated effect (see Fig. 3). As input, the method uses expression data and database-curated TF binding information, where the TF–gene network is restricted to genes regulated by more than 10 TFs and to TFs with at least 5 target genes. The model is closely related to the general framework described above and only adds a term for the sample-specific effect of a TF. Specifically, the activity of a TF is modelled linearly via its cumulative effect on its target genes, using either the expression of its target genes weighted by binding strength and normalized by the total binding strength, or the TF’s own gene expression level:\n$$ \\widehat{g}_{i,s} = c + \\sum_t \\beta_t\\, b_{t,i} \\left( \\theta_{a,t}\\, \\mathit{act}_{t,s} + \\theta_{g,t}\\, g_{t,s} \\right) $$\nwhere $\\widehat{g}_{i,s}$ denotes the predicted expression of gene $i$ in sample $s$, $c$ is an additive offset, $\\beta_t$ describes the estimated activity of TF $t$, and $b_{t,i}$ is the strength of the relation between TF $t$ and gene $i$, reflecting the binding affinity.
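To make the prediction formula concrete, the following minimal Python sketch evaluates it for one gene in one sample. The function name predict_expression and its argument layout (plain lists indexed by TF) are hypothetical and not taken from the authors' implementation; theta_g is derived as 1 - theta_a, which enforces the constraint that exactly one of the two inputs is used per TF.

```python
from typing import Sequence


def predict_expression(
    c: float,
    beta: Sequence[float],
    b_col: Sequence[float],
    act: Sequence[float],
    g_tf: Sequence[float],
    theta_a: Sequence[int],
) -> float:
    # Hypothetical helper: predicted expression g_hat[i, s] of one gene i in one sample s.
    # beta[t]: estimated activity of TF t; b_col[t]: binding strength b[t, i];
    # act[t]: activity act[t, s]; g_tf[t]: expression g[t, s] of the TF gene itself.
    total = c
    for t in range(len(beta)):
        if theta_a[t] not in (0, 1):
            raise ValueError('theta_a entries must be 0 or 1')
        theta_g = 1 - theta_a[t]  # enforces theta_a + theta_g = 1
        effect = theta_a[t] * act[t] + theta_g * g_tf[t]
        total += beta[t] * b_col[t] * effect
    return total


# TF 0 enters through its activity, TF 1 through its own expression.
print(predict_expression(0.5, [2.0, -1.0], [0.8, 0.4], [1.5, 0.2], [0.3, 2.0], [1, 0]))
```

For the example call, the prediction is 0.5 + 2.0 · 0.8 · 1.5 - 1.0 · 0.4 · 2.0 = 2.1, up to floating-point rounding.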
The estimated effect of a TF in a given sample is captured by the switch-like term in parentheses: either the activity definition\n$$ \\mathit{act}_{t,s} = \\frac{\\sum_i b_{t,i}\\, g_{i,s}}{\\sum_i b_{t,i}} $$\nor the expression of the TF gene itself, $g_{t,s}$, is used, subject to the constraints $\\theta_{a,t}, \\theta_{g,t} \\in \\{0, 1\\}$ and $\\theta_{a,t} + \\theta_{g,t} = 1$. This switch is a meta-parameter for finding the best model and has no biological interpretation. For each TF of the reduced network, the model outputs an activity value together with the chosen setting of the switch parameter.\nDuring the optimization, the sum of the error terms (the absolute differences between predicted and measured gene expression) is minimized by mixed-integer linear programming using the Gurobi 5.5 optimizer.¹ The authors state that the activity definition was used in 95% of their test cases, but that allowing the switch between both terms still yielded better optimization results. In the paper, the optimization task is greatly simplified because the model is computed for each gene separately and allows at most 6 regulating TFs. The TF–gene network, which encodes the strength of the relation between each TF and gene, is built for 1120 TFs from the commercial MetaCore™ database,² ChEA [36] and ENCODE [15].
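The activity definition and the per-gene fit can be sketched in Python with NumPy as follows. This is a toy illustration under stated assumptions, not the authors' implementation: instead of mixed-integer linear programming with Gurobi, it enumerates all 2^k switch settings for a gene with k regulating TFs (at most 64 for k = 6) and, for each setting, solves the least-absolute-deviation fit of c and the beta values exactly by brute force. This relies on the fact that, when the design matrix has full column rank p, some optimal L1 fit passes exactly through p of the samples. All function names are hypothetical, and the brute-force step only scales to a handful of samples.

```python
import itertools

import numpy as np


def activity(b: np.ndarray, g: np.ndarray) -> np.ndarray:
    # act[t, s] = sum_i b[t, i] * g[i, s] / sum_i b[t, i]
    # b: (n_tfs, n_genes) binding strengths, zero where TF t does not bind gene i
    # g: (n_genes, n_samples) expression; every TF is assumed to have targets.
    return (b @ g) / b.sum(axis=1, keepdims=True)


def lad_fit(x: np.ndarray, y: np.ndarray):
    # Exact least-absolute-deviation fit by brute force: if x has full column
    # rank p, some optimal L1 fit interpolates p samples, so try every p-subset.
    n, p = x.shape
    best_w, best_err = None, np.inf
    for rows in itertools.combinations(range(n), p):
        idx = list(rows)
        a = x[idx]
        if abs(np.linalg.det(a)) < 1e-12:  # skip (near-)singular subsets
            continue
        w = np.linalg.solve(a, y[idx])
        err = float(np.abs(y - x @ w).sum())  # sum of absolute errors
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


def fit_gene(y: np.ndarray, b_col: np.ndarray, act: np.ndarray, g_tf: np.ndarray):
    # y: (n_samples,) measured expression of one gene i
    # b_col: (k,) binding strengths b[t, i] of its k regulating TFs
    # act, g_tf: (k, n_samples) activity and own expression of those TFs
    k = len(b_col)
    best = None
    for theta_a in itertools.product((0, 1), repeat=k):
        th = np.array(theta_a)[:, None]
        effect = th * act + (1 - th) * g_tf  # switch: theta_g = 1 - theta_a
        x = np.column_stack([np.ones(len(y)), (b_col[:, None] * effect).T])
        w, err = lad_fit(x, y)
        if w is not None and (best is None or err < best[2]):
            best = (theta_a, w, err)
    # (theta_a per TF, [c, beta_1, ..., beta_k], sum of absolute errors)
    return best
```

The enumeration over switch settings stays cheap for at most 6 TFs; it is the subset search inside lad_fit that limits this sketch to toy sample sizes, which is where a dedicated solver such as the one used by the authors pays off.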
Because of the network restriction described above, the final model is based on only 521 TFs and 636 target genes.\nThe results were evaluated on expression data from 59 cell lines of the NCI-60 panel [37, 38] and from melanoma cell lines (“Mannheim cohort”) [39]. Sample-based leave-one-out and 10-fold cross-validation, comparing predicted with measured gene expression, yielded Pearson correlations of about 0.6 for both data sets. For the NCI-60 data, a gene set enrichment analysis of the target genes of TFs modelled by the activity definition yielded 64 significantly enriched concepts, including cell cycle, immune response and cell growth. In addition, a t-test between the melanoma cell lines and the other cell lines of the NCI-60 panel was used to identify differentially expressed genes related to melanogenesis. Regulation models were built for the resulting genes and used to predict gene expression in the melanoma cell line data set, yielding good prediction performance."}