PMC:1852316 / 29217-39381 JSONTXT


{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1852316","sourcedb":"PMC","sourceid":"1852316","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1852316","text":"Calculation of TF activities using FTF\nThe essential equation on which FTF is based was arrived at empirically after extensive numerical experimentation with synthetic data. With synthetic data we know the TRN, the TF activities, and the nature of the noise added to the expression data, and can therefore quantitatively assess the accuracy of FTF predictions. FTF is based on the following ansatz:\nT_n^r − T_n^s = ∑_{i=1}^{N_gene} H(m_i^r − m_i^s) b_{in} Ψ_{in},       (3)\nwhere T_n^r = activity of TF n at condition or time r, m_i^r = microarray response of gene i at condition r, b_{in} = TRN (b_{in} = +1/−1 for gene i up/down regulated by TF n, b_{in} = 0 for no regulation), H(x) = +1 for x \u003e 0, −1 for x \u003c 0, and 0 for x = 0, and Ψ_{in} = 2^{L_i}/(M_n(2^{L_i} − 1)) for L_i = number of TFs controlling gene i and M_n = number of genes TF n regulates. 
If there are N_expression times or conditions, then eq. (1) constitutes N_expression × (N_expression − 1)/2 equations for the N_expression activities T_n^r for each of the TFs. Therefore, the problem is overdetermined. In our approach the problem is solved via normal equations, i.e. using a least-squares approach, so that all the expression data are utilized and statistics can help to overcome data uncertainty.\nOnce TF activities are calculated in this manner, the linear (Pearson) correlation is calculated for all possible gene-TF pairs. This serves as a score used to construct probability distributions for the training set (known gene/TF interactions) and random set (all possible gene/TF pairs). Comparison of these probability distributions gives an indication of the quality of the preliminary TRN and expression data, and of the degree to which we can rely on the predictions of FTF. If the preliminary TRN is too small or of poor quality, or if there are too few expression datasets, the training versus random set probability distributions are difficult to distinguish. The scores can also be used to rank genes by how likely their expression data are to be inconsistent with the preliminary TRN.\nTo test FTF we generated a TRN that consists of 1000 genes and 100 TFs. The properties of the TRN are shown in Fig. 2. The synthetic expression data were generated from assumed random TF activities. Expression data for gene i were generated using m_i^r = ∑_{n=1}^{N_TF} Q_{in} b_{in} T_n^r. 
Here, m_i^r is the expression level of gene i at experiment r, T_n^r is the activity of TF n at experiment r, N_TF is the number of TFs, and Q_{in} is a measure of the binding affinity of TF n and gene i.\nFigure 2 Properties of TRNs used in the synthetic examples. Networks that consist of 1000 genes and 100 TFs are generated using the probability distribution for the number of genes regulated by a given TF shown in (a). The corresponding probability distribution for the number of regulators per gene is shown in (b). The average number of regulators per gene is 3.62, 5.22, and 7.02 for Networks 1, 2 and 3, respectively. Equal likelihood is chosen for up versus down regulation. To construct a synthetic TRN, for each TF we assigned u_n = c_1 + c_2 e^{−c_3 z}, where c_1, c_2, c_3 are constants (taken to be 0.02, 0.15, and 5, respectively) and z is a random number (between 0 and 1). Then for each gene/TF pair, we assigned a random number h_{in} (between 0 and 1). 
For parameter e, which determines how dense the synthetic TRN is, if h_{in} u_n \u003c e we set b_{in} = −1 (down regulation), and if e ≤ h_{in} u_n \u003c 2e we set b_{in} = 1 (up regulation), assuming the probability of up and down regulation is the same. The Q_{in} were allowed to vary 20-fold and were generated randomly (on a logarithmic scale). TF activities were assumed to be random as well. Our synthetic examples with large TRNs show that, despite the simplicity of the FTF approach, the constructed TF activity profiles are reliable. To test the approach, one can compare the constructed TF activities with those used to generate the synthetic expression data. For example, for a TRN that has the properties shown in Fig. 2, even when we eliminate 50% of the TRN to create a \"preliminary TRN\", 90% of the constructed TF activities have a Pearson correlation coefficient of at least 0.70 with the TF activities used to generate the synthetic expression data (when 20 or more microarray experimental conditions were used). Fig. 3 shows the dependence of the results on the number of experiments. This graph shows that, for practical reasons, it is not feasible to recover the full network. Fig. 4a shows the effect of network structure on the results. As the network gets denser, the percentage of the network that can be recovered decreases. Fig. 4b illustrates the dependence of the percentage of recovery on the degree of incompleteness in the preliminary TRN. As anticipated, more complete preliminary TRNs allow a higher percentage of the unknown part of the network to be recovered using expression data. These results suggest that in a real-world application such as E. coli (for which we probably know less than 40% of the TRN, based on the number of known gene/TF interactions and the expected number of TFs), one cannot expect to construct the full TRN using expression data alone, regardless of the number of expression datasets available.\nFigure 3 Reconstruction of TRNs. 
We used Network 1 of Fig. 2 to generate synthetic expression data. Then, we eliminated 50% of the network (randomly), and used FTF to reconstruct the deleted network. Panel a) shows the percentage of the deleted network recovered as a function of success rate, a measure of the likelihood that an interaction is correct, as estimated from the training set (known interactions). As the number of microarray experiments increases, a higher percentage of the network can be reconstructed. However, full reconstruction requires too many experiments. Panel b) shows success rate as a function of the absolute value of the linear correlation between the constructed TF activity profiles and gene expression data.\nFigure 4 Effect of TRN properties. a) We used Networks 1, 2 and 3 of Fig. 2 to generate 100 synthetic expression data sets, and eliminated 50% of the gene/TF interactions in the TRN. Shown is the percentage of the deleted network recovered as a function of success rate. As the number of interactions increases, the percentage of the network that can be recovered decreases. b) Same as a) except we used Network 1 and eliminated 25%, 50%, and 75% of the network. 
As expected, a higher percentage of the deleted network is recoverable when a more complete network is known.","divisions":[{"label":"title","span":{"begin":0,"end":38}},{"label":"p","span":{"begin":39,"end":391}},{"label":"p","span":{"begin":392,"end":1204}},{"label":"p","span":{"begin":1205,"end":3448}},{"label":"p","span":{"begin":3449,"end":4243}},{"label":"p","span":{"begin":4244,"end":5805}},{"label":"figure","span":{"begin":5806,"end":6286}},{"label":"label","span":{"begin":5806,"end":5814}},{"label":"caption","span":{"begin":5816,"end":6286}},{"label":"p","span":{"begin":5816,"end":6286}},{"label":"p","span":{"begin":6287,"end":8848}},{"label":"figure","span":{"begin":8849,"end":9598}},{"label":"label","span":{"begin":8849,"end":8857}},{"label":"caption","span":{"begin":8859,"end":9598}},{"label":"p","span":{"begin":8859,"end":9598}},{"label":"label","span":{"begin":9599,"end":9607}}],"tracks":[]}
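
The activity calculation described in the "text" field of the record above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, which the source does not give. The function names `sign` and `ftf_activities` and the list-of-lists data layout are hypothetical. The sketch relies on one fact and one assumption. The fact: when every pair of conditions (r, s) contributes an equation of the form T_r − T_s = d_rs with d_rs antisymmetric, the normal equations reduce to T_r = (1/N) ∑_s d_rs once the profile is constrained to sum to zero. The assumption: the zero-sum constraint itself, which is needed because the pairwise differences fix each profile only up to an additive constant.

```python
# Hypothetical sketch of the FTF ansatz, eq. (3); all names are illustrative.

def sign(x):
    """H(x): +1 for x > 0, -1 for x < 0, 0 for x == 0."""
    return (x > 0) - (x < 0)


def ftf_activities(m, b):
    """Least-squares TF activities from all pairwise condition differences.

    m[i][r]: expression of gene i at condition r.
    b[i][n]: +1 / -1 / 0 for up / down / no regulation of gene i by TF n.
    Returns T[n][r]; each profile sums to zero (assumed convention, since
    eq. (3) determines differences between conditions only).
    """
    n_genes, n_cond, n_tf = len(m), len(m[0]), len(b[0])
    # L[i]: number of TFs controlling gene i; M[n]: number of genes TF n regulates.
    L = [sum(1 for x in row if x != 0) for row in b]
    M = [sum(1 for i in range(n_genes) if b[i][n] != 0) for n in range(n_tf)]

    T = []
    for n in range(n_tf):
        profile = [0.0] * n_cond
        for i in range(n_genes):
            if b[i][n] == 0:
                continue
            psi = 2 ** L[i] / (M[n] * (2 ** L[i] - 1))
            for r in range(n_cond):
                # Sum over s of the gene-i term on the right-hand side of eq. (3).
                row_sum = sum(sign(m[i][r] - m[i][s]) for s in range(n_cond))
                profile[r] += b[i][n] * psi * row_sum
        # Closed-form solution of the normal equations under the zero-sum constraint.
        T.append([v / n_cond for v in profile])
    return T
```

For example, calling `ftf_activities([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], [[1], [-1]])` describes one TF that activates a rising gene and represses a falling gene, so the reconstructed activity should increase across the three conditions.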