PMC:5369021 / 22166-26730
Annotations
2_test
{"project":"2_test","denotations":[{"id":"28347313-21822212-14906467","span":{"begin":478,"end":480},"obj":"21822212"},{"id":"28347313-24263090-14906468","span":{"begin":1191,"end":1193},"obj":"24263090"},{"id":"28347313-23846655-14906469","span":{"begin":3424,"end":3426},"obj":"23846655"},{"id":"28347313-23539594-14906470","span":{"begin":4011,"end":4013},"obj":"23539594"},{"id":"28347313-14993899-14906471","span":{"begin":4053,"end":4055},"obj":"14993899"},{"id":"28347313-25190456-14906472","span":{"begin":4097,"end":4099},"obj":"25190456"}],"text":"RABIT [33]\nRegression Analysis with Background Integration (RABIT) is a method for finding expression regulators in cancer by a large scale analysis across diverse cancer types. It integrates TF binding information with tumor profiling data to search for TFs driving tumor-specific gene expression patterns (see Fig. 5). It can be applied to predict cancer-associated RNA-binding protein (RBP) recognition motifs which are key components in the determination of miRNA function [43].\nFig. 5 Flow chart of RABIT method. The input data sets (marked in blue) are passed to a linear regression model (yellow) which calculates sample specific activity values for each regulator and determines general regulatory activities (green) \nIn contrast to our general framework, RABIT can, like RACER, make use of CNV and DNA methylation data additionally integrating promoter CpG content and promoter degree information (total number of ChIP-seq peaks near the gene transcription start site) and takes RBP or TF binding information as regulatory input. The computational model consists of three steps (see Fig. 5). 
First, RABIT tests in each tumor whether the target genes, identified by the BETA method [44], show differential expression compared to the normal controls including a control for background effects from CNVs, promoter DNA methylation, promoter CpG content and promoter degree:\\documentclass[12pt]{minimal} \t\t\t\t\\usepackage{amsmath} \t\t\t\t\\usepackage{wasysym} \t\t\t\t\\usepackage{amsfonts} \t\t\t\t\\usepackage{amssymb} \t\t\t\t\\usepackage{amsbsy} \t\t\t\t\\usepackage{mathrsfs} \t\t\t\t\\usepackage{upgreek} \t\t\t\t\\setlength{\\oddsidemargin}{-69pt} \t\t\t\t\\begin{document}$$ \\widehat{g_i} = {\\displaystyle \\sum_f}{\\theta}_f{B}_{f, i} + {\\displaystyle \\sum_t}{\\beta}_t{b}_{t, i} $$\\end{document}gi^=∑fθfBf,i+∑tβtbt,iwhere \\documentclass[12pt]{minimal} \t\t\t\t\\usepackage{amsmath} \t\t\t\t\\usepackage{wasysym} \t\t\t\t\\usepackage{amsfonts} \t\t\t\t\\usepackage{amssymb} \t\t\t\t\\usepackage{amsbsy} \t\t\t\t\\usepackage{mathrsfs} \t\t\t\t\\usepackage{upgreek} \t\t\t\t\\setlength{\\oddsidemargin}{-69pt} \t\t\t\t\\begin{document}$$ \\widehat{g_i} $$\\end{document}gi^ represents the predicted differential gene expression between tumor and normal samples in gene i, B includes values of the f different background factors for gene i, b contains RBP or TF binding information and θ and β are the respective regression parameter vectors. The regression coefficients β are estimated by minimizing the squared difference between measured and predicted gene expression. The regulatory activity score for each TF/RBP is defined by a t-value (regression coefficient divided by standard error) and its significance by the corresponding t-test. If multiple profiles exist for the same TF from different conditions or cell lines, the profile with the highest absolute value of TF regulatory activity score is selected. In a second step, a stepwise forward selection is applied to find a subset of TFs among those screened in step one optimizing the model error. 
Lastly, TFs with insignificant cross-tumor correlation are removed from the results.\nComputationally, the regression coefficients are calculated via the efficient Frisch-Waugh-Lovell method. TF binding information is taken from 686 TF ChIP-seq profiles from ENCODE representing 150 TFs and 90 cell types. Additionally, recognition motifs for 133 RBPs and their putative targets are collected by searching recognition motifs over the 3’UTR regions [45]. An implementation of the RABIT method can be downloaded from http://rabit.dfci.harvard.edu/download.\nRABIT was applied to 7484 tumor profiles of 18 cancer types from TCGA using gene expression, somatic mutation, CNV and DNA methylation data. To systematically assess the results, the cancer relevance level of a TF was calculated as percentage of tumors with the TF target genes differentially regulated (averaged across all TCGA cancer types). A comparison to cancer gene databases, i.e., the NCI cancer gene index project [46], the Bushman Laboratory cancer driver gene list [47, 48], the COSMIC somatic mutation catalog [49] and the CCGD mouse cancer driver genes [50], showed a consistent picture. Further, RABIT’s performance was compared to other regression models like LAR or LASSO where RABIT had the best classification results when classifying all TFs into three categories by NCI cancer index and achieved better cross-validation error and shorter running time. The regulatory activity of RBPs showed that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3’UTR regions."}
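The first RABIT step described in the text above (regressing differential expression on background factors plus binding information, then scoring each regulator by a t-value) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, the binary encoding of binding, and the absence of an intercept (chosen to mirror the equation as written) are all assumptions made for illustration. The second function shows the Frisch-Waugh-Lovell idea mentioned in the text: residualizing on the background factors first yields the same binding coefficients as the full regression.

```python
import numpy as np


def regulator_t_values(g, B, b):
    """Ordinary least squares of g on [B, b] (no intercept, as in the equation).

    g: (n_genes,) differential expression, tumor vs. normal
    B: (n_genes, n_background) background factors (CNV, methylation, ...)
    b: (n_genes, n_regulators) binding information
    Returns the binding coefficients and their t-values
    (coefficient divided by standard error).
    """
    X = np.column_stack([B, b])
    coef, *_ = np.linalg.lstsq(X, g, rcond=None)
    resid = g - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    k = b.shape[1]
    return coef[-k:], coef[-k:] / se[-k:]


def fwl_binding_coefficients(g, B, b):
    """Frisch-Waugh-Lovell: project the background out of g and b,
    then regress the residuals on each other."""
    g_res = g - B @ np.linalg.lstsq(B, g, rcond=None)[0]
    b_res = b - B @ np.linalg.lstsq(B, b, rcond=None)[0]
    beta, *_ = np.linalg.lstsq(b_res, g_res, rcond=None)
    return beta


# Synthetic example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_genes = 200
B = rng.normal(size=(n_genes, 4))                        # background factors
b = rng.integers(0, 2, size=(n_genes, 3)).astype(float)  # binary binding calls
theta_true = np.array([0.5, -0.3, 0.2, 0.1])
beta_true = np.array([1.5, 0.0, -2.0])                   # regulator 1 is inactive
g = B @ theta_true + b @ beta_true + rng.normal(scale=0.5, size=n_genes)

beta_ols, t_values = regulator_t_values(g, B, b)
beta_fwl = fwl_binding_coefficients(g, B, b)
print("binding coefficients (OLS):", beta_ols)
print("binding coefficients (FWL):", beta_fwl)
print("t-values:", t_values)
```

In this toy setting the two coefficient estimates agree to numerical precision. The practical appeal of the residualized form is that the background projection is computed once and reused across many candidate regulators.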