3. Extensions of NCA
Despite its successful application to yeast data, NCA exhibits several shortcomings that prevent its application to a wide class of regulatory network inference problems. Several recent works in the literature address these issues, and this section reviews them. In these works, the core estimation method is identical to that of NCA, but enhancements have been added to make the NCA algorithm applicable to a broader range of setups.

3.1. Motif-Directed NCA
In the original NCA work [17], the prior information about the connectivity matrix, i.e., A(I), is provided by high-throughput experiments. However, high-throughput ChIP-on-chip data are not available for some common species, such as rodents and humans [25]. To address this limitation, Wang et al. [25] proposed a motif-directed NCA (mNCA) algorithm, which uses motif information to obtain the prior network structure and to infer TRNs. Because regulation between TFs and genes occurs only after TFs bind to the DNA sequence motifs in the gene’s promoter region [25], the motif information can be used to recover the interactions between TFs and genes. Moreover, since the prior topology information, whether from ChIP-on-chip data or from motif analysis, comes from biological experiments, it may contain many false positives and false negatives. A stability analysis is therefore proposed in [25] to extract stable TFAs from the NCA algorithm. Specifically, the authors of [25] intentionally perturb the connectivity information and use the Pearson correlation coefficient as a stability measure to determine whether the estimated TFAs are stable. Experimental results on muscle regeneration microarray data demonstrate that mNCA is able to reveal important TFAs, as well as the strength of their connectivity to the corresponding genes.
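The stability measure described above can be illustrated with a minimal Python sketch. It assumes that TFA estimates are stored as arrays of shape (number of TFs, number of experiments), one from the original connectivity and one from a perturbed connectivity. The function names `tfa_stability` and `perturb_connectivity`, and the choice of flipping a random fraction of connectivity entries, are hypothetical illustrations and are not taken from [25]; the NCA estimation step itself is not shown.

```python
import numpy as np


def tfa_stability(tfa_ref, tfa_perturbed):
    """Pearson correlation, per TF, between two TFA estimates.

    Both inputs have shape (n_tfs, n_experiments). A value close to 1
    suggests that the TF's activity profile is stable under the
    perturbation. A constant profile gives NaN (zero variance).
    """
    ref = tfa_ref - tfa_ref.mean(axis=1, keepdims=True)
    per = tfa_perturbed - tfa_perturbed.mean(axis=1, keepdims=True)
    num = (ref * per).sum(axis=1)
    den = np.sqrt((ref ** 2).sum(axis=1) * (per ** 2).sum(axis=1))
    return num / den


def perturb_connectivity(mask, flip_fraction, rng):
    """Flip a random fraction of entries of a boolean gene-by-TF mask.

    Assumption: this flipping scheme is only one possible way to perturb
    the prior connectivity; the scheme used in [25] may differ.
    """
    flat = np.asarray(mask, dtype=bool).copy().ravel()
    n_flip = int(round(flip_fraction * flat.size))
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = ~flat[idx]
    return flat.reshape(np.shape(mask))
```

In such a setup, NCA would be rerun on each perturbed mask, and a TF would be kept only if its correlation with the reference estimate stays above a chosen threshold across perturbations.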

3.2. Generalized NCA
The work in [26] proposed generalized NCA (gNCA) in an attempt to improve the NCA criteria. gNCA extends the system identification criteria required by NCA by additionally incorporating prior information about the regulatory matrix S, such as the regulatory information obtained from regulatory gene knockouts [26]. (A gene knockout (KO) is a genetic technique through which one or more genes of an organism are made inoperative, or “knocked out”.) To guarantee a unique decomposition, the gNCA criteria require A to have full column rank, S to have full row rank, and an additional condition that preserves the essential features of A and S to hold. In this way, given the topology information about S, the uniqueness of the decomposition may still be ensured by checking the gNCA criteria instead, even if the connectivity structure of A does not satisfy the NCA criteria. Even when the connectivity topology does satisfy the NCA criteria, gNCA reduces the number of parameters to be estimated by exploiting the prior information about S.

3.3. Revised NCA
The work in [27] also focuses on enhancing the NCA criteria. It proposed revised NCA (NCAr), in which the third criterion of NCA is revised to improve the applicability of NCA. As discussed earlier, to ensure a unique solution for the matrix factorization problem, the third criterion of NCA requires the matrix S to have full row rank, which implies that the number of TFs must be less than or equal to the number of experiments. This requirement significantly limits the number of TFs that can be included in the analysis. The work in [27] revises the third criterion based on the observation that most genes are regulated by far fewer TFs than the total number of TFs (i.e., the connectivity matrix A is row-wise sparse). Instead of being associated with the rank properties of the matrix S, the revised condition is related to the rank properties of reduced-size matrices. In particular, it requires that the number of experiments for each gene be greater than or equal to the number of TFs regulating that gene. The revised criterion makes NCA applicable to a wider class of TRN inference problems, since the number of TFs regulating a gene is generally less than five or six [27]. In this way, a large regulatory network can be uniquely inferred, even in the presence of a limited number of experiments.
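The difference between the two counting requirements can be made concrete with a short Python sketch. It assumes a boolean gene-by-TF connectivity mask and checks only the counting conditions stated in the text, not the full rank conditions on the reduced-size matrices from [27]. The function names are hypothetical.

```python
import numpy as np


def original_count_condition(n_tfs, n_experiments):
    """Necessary condition for S to have full row rank:
    the number of TFs cannot exceed the number of experiments."""
    return n_tfs <= n_experiments


def revised_count_condition(mask, n_experiments):
    """Counting condition of the revised criterion: every gene must be
    regulated by no more TFs than there are experiments.

    mask is a boolean array of shape (n_genes, n_tfs).
    """
    tfs_per_gene = np.asarray(mask, dtype=bool).sum(axis=1)
    return bool(np.all(tfs_per_gene <= n_experiments))


# Toy network (assumed data): 6 genes, 4 TFs, only 3 experiments.
mask = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
], dtype=bool)
n_experiments = 3

orig_ok = original_count_condition(mask.shape[1], n_experiments)
revised_ok = revised_count_condition(mask, n_experiments)
```

In this toy network there are more TFs than experiments, so the original counting requirement fails, while every gene has at most two regulators and the revised counting requirement holds.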

3.4. Generalized-Framework NCA
The original NCA work requires the biological system to satisfy all three criteria to ensure a unique decomposition up to a scaling factor. However, NCA only checks compliance for the initialized matrix A, and the matrix A derived at a later iteration may violate the NCA criteria. The work in [28] generalizes the NCA criteria so that identifiability can be determined directly from the connectivity (topology) information, rather than by checking the rank properties of the unknown connectivity matrix A. In other words, if a given connectivity topology, i.e., A(I0), meets the newly derived conditions, then all matrices A∈A(I0) satisfy the first and second criteria of NCA, which guarantees the feasibility of A during each iteration of the NCA algorithm. When the connectivity topology does not satisfy the newly derived conditions, or the TF matrix does not satisfy the third criterion of NCA (for example, when M>K, the linear independence of TFs is violated), the authors of [28] instead infer subnetworks by removing a selected TF node together with all of its associated genes until all of the system identification criteria are verified for the reduced subnetwork. The resulting algorithm is referred to as generalized-framework NCA (gfNCA) [28].
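The rank check that NCA applies to a numerical matrix A can be sketched in Python as follows. The sketch assumes the standard statement of the first two NCA criteria: A has full column rank, and the matrix obtained by removing any one TF together with all genes it regulates still has full column rank. It tests one concrete matrix, such as the initial A or the estimate at a given iteration; it does not implement the topology-based conditions of [28]. The function names are hypothetical.

```python
import numpy as np


def has_full_column_rank(M):
    """True if the 2-D array M has linearly independent columns."""
    n_rows, n_cols = M.shape
    if n_cols == 0:
        return True
    if n_rows < n_cols:
        return False  # too few rows; also covers the case of no rows
    return np.linalg.matrix_rank(M) == n_cols


def satisfies_first_two_criteria(A):
    """Check the first two NCA criteria for a gene-by-TF matrix A.

    Criterion 1: A has full column rank.
    Criterion 2: for every TF j, deleting column j and every gene (row)
    with a nonzero entry in column j leaves a matrix that still has
    full column rank.
    """
    A = np.asarray(A, dtype=float)
    if not has_full_column_rank(A):
        return False
    n_tfs = A.shape[1]
    for j in range(n_tfs):
        keep_genes = A[:, j] == 0          # genes not regulated by TF j
        keep_tfs = np.arange(n_tfs) != j   # all TFs except TF j
        reduced = A[keep_genes][:, keep_tfs]
        if not has_full_column_rank(reduced):
            return False
    return True
```

Because this check depends on the numerical values of A, it would have to be repeated at each iteration to detect a violation, which is the cost that a topology-only condition avoids.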