Microarray analysis Kinetic cell models hold great promise for predicting cell behavior [28-32]. Unfortunately there is a lack of information about many of the rate and equilibrium constants for the reaction and transport processes involved [33,34]. Simultaneously calibrating all the reaction/transport rate parameters and discovering the gene/TF interaction network structure from available data does not appear to be feasible. Therefore, instead of using a kinetic approach as a basis of TRN construction, we have developed FTF (Fast Transcription Factor analyzer) for network construction via (1) TF activity estimation, (2) statistical arguments, and (3) a preliminary TRN. Once a reliable TRN is obtained using FTF, it can then be used to calibrate the rate and equilibrium constants that appear in transcription/translation kinetic models. An example of such an approach is available at [35]. FTF was designed based on the following notions: • a method based on TFs has the advantage that microarray noise, and errors in preliminary TRN, can be overcome by statistics – i.e. the regulation of many genes by a given TF; • due to data uncertainty, there is not usually enough information content in many single-gene responses to unambiguously determine the effect of all TFs regulating it; and • TRN discovery requires many automated trials of possible networks, so the algorithm must be efficient. Calculation of TF activities using FTF The essential equation on which FTF is based was arrived at empirically after extensive numerical experimentation with synthetic data. In this way we actually know the TRN, TF activities, and the nature of noise added to the expression data, and thereby could quantitatively assess the accuracy of FTF predictions. FTF is based on the following ansatz: T n r − T n s = ∑ i = 1 N g e n e H ( m i r − m i s ) b i n Ψ i n ,       ( 3 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGubavdaqhaaWcbaGaemOBa4gabaGaemOCaihaaOGaeyOeI0Iaemivaq1aa0baaSqaaiabd6gaUbqaaiabdohaZbaakiabg2da9maaqahabaGaemisaGKaeiikaGIaemyBa02aa0baaSqaaiabdMgaPbqaaiabdkhaYbaakiabgkHiTiabd2gaTnaaDaaaleaacqWGPbqAaeaacqWGZbWCaaGccqGGPaqkcqWGIbGydaWgaaWcbaGaemyAaKMaemOBa4gabeaakiabfI6aznaaBaaaleaacqWGPbqAcqWGUbGBaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaonaaBaaameaacqWGNbWzcqWGLbqzcqWGUbGBcqWGLbqzaeqaaaqdcqGHris5aOGaeiilaWIaaCzcaiaaxMaadaqadaqaaiabiodaZaGaayjkaiaawMcaaaaa@5D38@ where Tnr MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGubavdaqhaaWcbaGaemOBa4gabaGaemOCaihaaaaa@30DC@ = activity of TF n at condition or time r, mir MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGTbqBdaqhaaWcbaGaemyAaKgabaGaemOCaihaaaaa@3104@ = microarray response of gene i at condition r, bin = TRN (bin = +1/-1for gene i up/down regulated by TF n, bin = 0 for no regulation), H(x) = ± 1 for x > or < 0, = 0 for x = 0, and Ψin = 2Li MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIYaGmdaahaaWcbeqaaiabdYeamnaaBaaameaacqWGPbqAaeqaaaaaaaa@3074@/(Mn(2Li MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIYaGmdaahaaWcbeqaaiabdYeamnaaBaaameaacqWGPbqAaeqaaaaaaaa@3074@ - 1)) for Li = number of TFs controlling gene i and Mn = number of genes TF n regulates. If there are Nexpression times or conditions, then eq. (1) constitutes Nexpression × (Nexpression -1)/2 equations for the Nexpression activities Tnr MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGubavdaqhaaWcbaGaemOBa4gabaGaemOCaihaaaaa@30DC@ for each of the TFs. Therefore, the problem is overdetermined. In our approach the problem is solved via normal equations, i.e. using a least square approach so that all the expression data is utilized and thereby statistics can help to overcome data uncertainty. Once TF activities are calculated in this manner, the linear (Pearson) correlation is calculated for all possible gene-TF pairs. This serves as a score used to construct probability distributions for the training set (known gene/TF interactions) and random set (all possible gene/TF pairs). Comparison of these probability distributions gives an idea about the fitness of the preliminary TRN and expression data, and to which degree we can rely on the predictions of FTF. If the preliminary TRN is too small or of poor quality, or if there are too few expression datasets, the training versus random set probability distributions are difficult to distinguish. The scores can also be used to rank genes that are more likely to have expression data which is inconsistent with the preliminary TRN. To test FTF we generated a TRN that consists of 1000 genes and 100 TFs. The properties of the TRN are shown in Fig. 2. The synthetic expression data was generated by assumed random TF activities. Expression data for gene i was generated using mir=∑n=1NTFQinbinTnr MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGTbqBdaqhaaWcbaGaemyAaKgabaGaemOCaihaaOGaeyypa0ZaaabCaeaacqWGrbqudaWgaaWcbaGaemyAaKMaemOBa4gabeaakiabdkgaInaaBaaaleaacqWGPbqAcqWGUbGBaeqaaOGaemivaq1aa0baaSqaaiabd6gaUbqaaiabdkhaYbaaaeaacqWGUbGBcqGH9aqpcqaIXaqmaeaacqWGobGtdaWgaaadbaGaemivaqLaemOrayeabeaaa0GaeyyeIuoaaaa@47D2@. Here, mir MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGTbqBdaqhaaWcbaGaemyAaKgabaGaemOCaihaaaaa@3104@ is the expression level of gene i at experiment r, Tnr MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGubavdaqhaaWcbaGaemOBa4gabaGaemOCaihaaaaa@30DC@ is the activity of TF n at experiment r, NTF is the number of TFs, and Qin is a measure of the binding affinity of TF n and gene i. Figure 2 Properties of TRNs used in the synthetic examples. Networks that consist of 1000 genes and 100 TFs are generated using the probability distribution for the number of genes regulated by a given TF shown in (a). The corresponding probability distribution for the number of regulators per gene is shown in (b). The average number of regulators per gene is 3.62, 5.22, and 7.02 for Networks 1, 2 and 3, respectively. Equal likelihood is chosen for up versus down regulation. To construct a synthetic TRN, for each TF we assigned un = c1 + c2e−c3z MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGJbWydaWgaaWcbaGaeGOmaidabeaakiabdwgaLnaaCaaaleqabaGaeyOeI0Iaem4yam2aaSbaaWqaaiabiodaZaqabaWccqWG6bGEaaaaaa@3588@ where c1, c2, c3 are constants (taken to be 0.02, 0.15, and 5, respectively) and z is a random number (between 0 and 1). Then for each gene/TF pair, we assigned a random number hin (between 0 and 1). For parameter e, which determines how dense the synthetic TRN is, if hinun