Principal process analysis of biological models Abstract Background Understanding the dynamical behaviour of biological systems is challenged by their large number of components and interactions. While efforts have been made in this direction to reduce model complexity, they often prove insufficient to grasp which and when model processes play a crucial role. Answering these questions is fundamental to unravel the functioning of living organisms. Results We design a method for dealing with model complexity, based on the analysis of dynamical models by means of Principal Process Analysis. We apply the method to a well-known model of circadian rhythms in mammals. The knowledge of the system trajectories allows us to decompose the system dynamics into processes that are active or inactive with respect to a certain threshold value. Process activities are graphically represented by Boolean and Dynamical Process Maps. We detect model processes that are always inactive, or inactive on some time interval. Eliminating these processes reduces the complex dynamics of the original model to the much simpler dynamics of the core processes, in a succession of sub-models that are easier to analyse. We quantify by means of global relative errors the extent to which the simplified models reproduce the main features of the original system dynamics and apply global sensitivity analysis to test the influence of model parameters on the errors. Conclusion The results obtained prove the robustness of the method. The analysis of the sub-model dynamics allows us to identify the source of circadian oscillations. We find that the negative feedback loop involving proteins PER, CRY, CLOCK-BMAL1 is the main oscillator, in agreement with previous modelling and experimental studies. In conclusion, Principal Process Analysis is a simple-to-use method, which constitutes an additional and useful tool for analysing the complex dynamical behaviour of biological systems. Background Mathematical modelling has been used for decades as an approach to understand the functioning of biological systems in terms of their internal processes and components. The latter form complex networks that vary in nature. For instance, biochemical networks include processes controlling the intracellular level of metabolites, RNAs and proteins, which allow cells to live and grow. A process either corresponds to a single biochemical reaction, for example protein phosphorylation, or encompasses many biochemical reactions like those involved in general cell functions (translation of proteins, transcription of RNAs...). In ecological networks, the processes can refer to events influencing the distribution and abundance of organisms, or to fluxes of energy and matter. Numerous kinetic models of these networks have been developed in computational biology, of increasing complexity due to advances in modelling and parameter estimation approaches (see [1, 2] for an example). Complexity arises from the high dimension of the networks, the large number of biological processes involved and their non linearity due to the complex feedback loops that regulate them. One approach often used to tackle the problem of complexity is model reduction (see [3] for a recent review). The simplified models are easier to analyse, while retaining the main features of the original ones and their biological significance. Briefly, methods of model reduction shorten the list of network species or of network reactions (e.g. [4, 5]), lump state variables (e.g. [6]) or decompose the system into slow and fast dynamics (e.g. [7–9]). The often used quasi-steady-state approximation falls in the latter category (e.g. [10]). Other approaches simplify the mathematical functions describing the molecular processes. For instance, piece-wise affine differential equations approximate by step functions the sigmoidal functions used to describe the regulation of gene expression. The dynamics of the simplified system can be easily analysed by means of state transition graphs [11]. However, these simplifications are generally restricted to models of gene expression and are more difficult to apply to other types of networks [12]. Reduction approaches have proven successful to significantly reduce model complexity, but they do not provide a mean to understand how the system dynamics emerges from the cascade of biological processes and regulatory mechanisms at work. This is especially true when the reduced models remain complex, with many coupled equations sharing common processes and involving complex feedback loops. For instance, regulatory mechanisms switch on certain biological processes at some times and off at others. It is thus important for a good understanding of the system behaviour to identify which and when processes significantly influence the system dynamics. In other words, instead of analysing a single reduced model in place of the original one, valid on the whole time interval, we may want to analyse series of simplified models highlighting the important processes of the original model during the periods of time in which they are active. This is how we address the problem of high dimensional model analysis in this study. We develop a mathematical and numerical approach based on the boolean concept of activity/inactivity. The method, called Principal Process Analysis (PPA), determines the contribution of each biological process to the output of the dynamical system. In models of biological networks, these processes appear in a linear additive manner in each ODE. We first identify the inactive processes and neglect them. In a second step, we treat processes whose activity varies along time: we define time windows in which these processes are either always active or always inactive. We eventually create sub-models for each time window that only contain the active processes. This procedure leads to the simplification of the system to its core mechanisms. The simplified system can be further studied, to understand the role of each active process in the system dynamics. PPA is a general approach that can be easily applied to any biological system described by ordinary differential equations (ODEs). It shares common features with a model reduction method focusing on major model parameters rather than processes [4], in which parameters that are not required for the system behaviour are removed. Another approach dedicated to chemical reactions identifies and removes chemical species that contribute less to the model output [5]. In this case, the problem is solved using optimization approaches (see also [13]). Despite these similarities, PPA is not a model reduction approach. It provides a mean to access to and dissect the more complex dynamics of the original model through the analysis of simplified versions in given time windows. Results are easily interpretable and do not require additional and complicated computations. Preliminary work on PPA has been described in an earlier conference paper [14], in which we applied PPA to two ODE models of biochemical networks whose simplification preserved their dynamical behaviour: the model of circadian rhythms in Drosophila [15] and the model of the regulation of the ERK signalling pathway [16]. Questions remained open though, concerning the scalability of the approach and its robustness: to which extent does PPA preserve model dynamics in systems of higher dimension, with many more biological processes involved and including interlocked feedback loops? And since the approach requires a priori knowledge of the parameter values, how sensitive are process activities or inactivities to the value of these parameters? In this study, we address these questions by studying a much more complex model of circadian rhythms in mammals, including 16 variables, 76 processes, and intertwined positive and negative feedback loops [17]. Parameter sensitivity analysis of the global relative error between the original and reduced systems allows us to assess the quality and robustness of our approach. The paper is organized as follows. “Methods” section describes the principle of Principal Process Analysis as well as global sensitivity analysis. “Model description” section introduces the model of mammalian circadian clock. We apply our approach to this complex model in “Principal Process Analysis of the circadian clock model” to “Influence of parameter values” sections, before concluding in “Conclusions” section. Methods We summarize below the basics of the method of Principal Process Analysis. We will use as running example the 14th variable of the mammalian circadian clock model analysed in “Model description” section (see also Appendix B. It describes how the concentration of the nuclear form of protein BMAL1 (BN=x14) changes: 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ {}\begin{aligned} \begin{array}{ll} \frac{dB_{N}}{dt}\,=\,&\,-\,V_{3B} \frac{B_{N}}{K_{p}+B_{N}}\,+\,\!V_{4B} \frac{B_{NP}}{K_{dp}+B_{NP}} \,+\,k_{5} B_{C} \,-\, k_{6} B_{N} \,-\, k_{7} B_{N} PC_{N}\\ &+k_{8} I_{N} - k_{dn} B_{N}. \end{array} \end{aligned} $$ \end{document}dBNdt=−V3BBNKp+BN+V4BBNPKdp+BNP+k5BC−k6BN−k7BNPCN+k8IN−kdnBN. Principal Process Analysis (PPA) Consider the following ODE model of biological network: 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \dot{x}=f\left(x,p\right) $$ \end{document}x˙=fx,p where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x=\left (x_{1},x_{2},\ldots,x_{n}\right) \epsilon \mathbb {R}^{n}$\end{document}x=x1,x2,…,xnεℝn is the vector of component concentrations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}$\end{document}x0=x01,x02,…,x0nεℝn the vector of their initial values and p\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\epsilon \mathbb {R}^{b}$\end{document}εℝb the vector of parameters. Each equation is decomposed into a sum of biological processes: 3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \dot{x}_{i}=\sum\limits_{j}f_{ij}\left(x,p\right) $$ \end{document}x˙i=∑jfijx,p where fij represents the jth process involved in the dynamical evolution of the ith variable of the system over a period of time [ t0,T]. Example 1 Equation (1) includes seven processes, each associated with a specific biological function. They take a positive or negative value, depending on whether they affect positively or negatively the variation of BMAL1 concentration. The equation of the protein is rewritten as: 4 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \dot{x}_{14}= f_{14,1}+f_{14,2}+f_{14,3}+f_{14,4}+f_{14,5}+f_{14,6}+f_{14,7} $$ \end{document}x˙14=f14,1+f14,2+f14,3+f14,4+f14,5+f14,6+f14,7 where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$f_{14,1}= - V_{3B} \frac {B_{N}}{K_{p}+B_{N}},...,f_{14,7}= - k_{dn} B_{N} $\end{document}f14,1=−V3BBNKp+BN,...,f14,7=−kdnBN. Figure 1a shows the dynamical evolution of processes f14,1 to f14,7 during a day. Nuclear import of BMAL1 is the fastest process of Eq. (1), while the basal degradation of the protein is the slowest. Fig. 1 Dynamics of processes that change the nuclear concentration of protein BMAL1 (BN, see Eqs. (1) and (4)) over a 24-h time window. a Absolute value of the processes along time (one colour per process). b Weights associated with the processes along time. The threshold δ is set at 0.1 Comparison criteria are needed to weigh the influence of the different processes fij on the time evolution of each variable xi. There are several alternatives. For instance, we can compare their absolute value (|fij(x,p)|), scale it by the ith initial condition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left (\frac {|f_{ij}(x(t),p)|}{x_{0i}} \right)$\end{document}|fij(x(t),p)|x0i, or scale it by the solution of the ith ODE \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left (\frac {|f_{ij}(x(t),p)|}{x(t)_{i}}\right)$\end{document}|fij(x(t),p)|x(t)i. In this work we associate a relative weight with each process to make it dimensionless: 5 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ W_{ij}(t,p)=\frac{|f_{ij}(x(t),p)|}{\sum_{j}|f_{ij}(x(t),p)|} $$ \end{document}Wij(t,p)=|fij(x(t),p)|∑j|fij(x(t),p)| where 0≤Wij(t,p)≤ 1 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sum _{j}W_{ij}(t,p)=1$\end{document}∑jWij(t,p)=1. Definition 1 Let the continuous function fij(x(t),p) be the jth process of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\dot {x}_{i}(t)$\end{document}x˙i(t) in t ε[t0,T] and let the threshold δε [0,1]. We call a process fij(x(t),p) always inactive when Wij(t,p)<δ∀ t ε [0,T]. We call a process fij(x(t),p) inactive at time t when Wij(t,p)<δ. We call a process fij(x(t),p) active at time t when Wij(t,p)≥δ. Switching time for a process fij(x(t),p) is the time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{ij}^{s}$\end{document}tijs at which Wij(t,p)=δ. A process can have 0,1,…,z switching times. The switching time set Si for the ith variable contains all the switching times \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{ij}^{s}$\end{document}tijs where j=1,..,k and s=1,…,z. The global switching time set S is the union of all Si. The choice of δ is important, since it determines above which weight a process can be considered active or inactive and, as we will see it later, if the process should be kept or omitted in the simplified model. An excessively high value might lead to an oversimplified model, without many dynamical features of the original model. Conversely a very low value might result in a model insufficiently simplified, which remains too complicated to analyse. From our experience, a convenient value is δε [0, 0.1], where the value of δ can be adjusted to the number of processes. For instance, if an ODE contains numerous processes of similar value, each individual process weighs little. In this case, δ should not be chosen too high to avoid omitting all these processes; it can be inversely proportional to the total number N of processes in the equation: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\delta \propto \frac {1}{N}$\end{document}δ∝1N. In this paper, fine-tuning the threshold value is not justified: there are not many processes per equation and they have very different values. We will always take δ=0.1. Example 2 We apply Eq. (5) to determine the dynamical weight of the seven processes in Eq. (1). Results are shown in Fig. 1b. As expected, the nuclear import, which is the fastest process, weighs more in the dynamical evolution of BMAL1 concentration, while the basal degradation of the protein weighs little. We determine the process activities using δ=0.1: The weight of processes W14,2, W14,6, W14,7 is always below δ: their related processes f14,2, f14,6, f14,7 are thus always inactive;The processes W14,1 and W14,3 are always above δ: f14,1 and f14,3 are active during the whole system dynamics;The weight of processes W14,4 and W14,5 crosses the threshold twice and the switching times \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{14,4}^{1}=4.4$\end{document}t14,41=4.4h, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{14,4}^{2}=20.7$\end{document}t14,42=20.7h, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{14,5}^{1}=0.8$\end{document}t14,51=0.8h and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{14,5}^{2}=20.3$\end{document}t14,52=20.3h are collected in the set S14. Visualization of process activities Graphical tools turn out to be useful to analyse the dynamical weights of complex systems such as the mammalian circadian clock model. We use three of them in PPA, which are described below. The Boolean Process Map summarizes qualitatively the knowledge of the process activity or inactivity along time for each variable. A black bar means that the process is active, while the white bar indicates an inactive process.Example: The Boolean Process Map in Fig. 2a represents the process activities deduced from the dynamical weights in Fig. 1b. We observe that there is always an active phosphorylation of BMAL1 in the nucleus, while the basal degradation can be considered always inactive. The nuclear export is solely active in the first and last periods of time. Fig. 2 Visual tools. aBoolean Process Map, bDynamical Process Map between times \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{1}_{14,4}$\end{document}t14,41 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{1}_{14,5}$\end{document}t14,51, c3-D Process Map for the variable x14 and its corresponding 2-D versionThe Dynamical Process Map is a network representation of the process activities. Variables (represented by boxes) are connected by processes (arrows). Three cases arise, which depend on the activity of processes shared by several variables: black-coloured arrows represent processes that are inactive for all variables involved, while active processes are displayed as red arrows. Yellow arrows are used for processes shared by several variables that have different activities: for instance, one process is considered active in one equation, but inactive in another one. Note that the model simplification by elimination of inactive processes, as will be described in “Model simplification by elimination of always inactive processes” section, will have for effect to remove black arrows in the Dynamical Process Map.Example 3 Figure 2b represents the Dynamical Process Map for x14, the nuclear concentration of Bmal1, in the time interval between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{1}_{14,4}$\end{document}t14,41 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{1}_{14,5}$\end{document}t14,51. Phosphorylation is an example of active process for the nuclear BMAL1 concentration (see the Boolean Process Map in Panel A). It is shown in red because it is also considered active at the same moment for the other variable sharing this process, the concentration of phosphorylated BMAL1.The 3-D Process Map represents the time-dependent evolution of the intensity of each process. Process activities are averaged per hour, which leads to a discretisation of time. Vertical bars represent process weights for each hour. Their color code represents the intensity of process weights relatively to the other weights.Example 4 Figure 2c describes the 3-D Process Map for the concentration of nuclear BMAL1. The phosphorylation of the protein, its nuclear import and its consumption for the formation of a large complex are the processes the most active over time. Model simplification by elimination of always inactive processes Eliminating processes that play a minor role in the system dynamics facilitates the analysis of large models. Since in the previous steps of PPA we have determined the process activities in system (2), we now neglect processes that are considered always inactive. This will give us g(xr), the function approximating f(x) in (2) with less processes. We thus introduce the ODE system (6), which approximates system (2): 6 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \dot{x}^{r}=g\left(x^{r},p^{r}\right) $$ \end{document}x˙r=gxr,pr where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x^{r}=\left (x_{1}^{r},x_{2}^{r},\ldots,x_{n}^{r}\right) \epsilon \mathbb {R}^{n}$\end{document}xr=x1r,x2r,…,xnrεℝn is the vector of component concentrations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}$\end{document}x0=x01,x02,…,x0nεℝn the vector of their initial values, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$p^{r} \epsilon \mathbb {R}^{c}$\end{document}prεℝc, where c≤b is the vector of parameters. The model simplification approach relies basically on the following theorem: if the vector fields of two systems are close (f(x)≈g(x)), then the solutions of the original and approximated systems are close during some time interval under the assumptions on the Lipschitz conditions listed in [18, p. 79, Th. 2.5]. Based on the dynamical weights determined in “Principal Process Analysis (PPA)” section and the threshold value δ, we apply the following rule to define g(xr,pr): if Wij(x(t),p) < δ∀ t ε [ t0,T]thengij = 0; if not, gij≡fij. We thus define xr as an approximation of x and pr as a subset of p. Example 5 We proceed to the simplification of processes in Eq. 1. Because f14,2, f14,6, f14,7 are always inactive, g14,2=0, g14,6=0, g14,7=0 and g14,1≡f14,1, g14,3≡f14,3, g14,4≡f14,4, g14,5≡f14,5. The resulting ODE for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x^{r}_{14}$\end{document}x14r is: 7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \frac{dB^{r}_{N}}{dt}=-V_{3B} \frac{B^{r}_{N}}{K_{p}+B^{r}_{N}}+ k_{5} B^{r}_{C} - k_{6} B^{r}_{N} - k_{7} B^{r}_{N} PC^{r}_{N}. $$ \end{document}dBNrdt=−V3BBNrKp+BNr+k5BCr−k6BNr−k7BNrPCNr. Note that Principal Process Analysis is applied to each ODE separately. As a consequence, processes shared by two equations can be active in one equation, but inactive in the other. Elimination of the inactive processes breaks mass balance in the simplified model. For our purpose, this is not an issue: the simplification does not aim at reducing the model, but rather analysing a sub-model of the original one, which describes the dynamics of the important phenomena. It is interesting to quantify the extent to which the simplified system (6) preserves the behaviour of the original one. This gives a better sense of how the active processes kept in the simplified model are responsible for the dynamics of the original system. In addition, this helps identifying potential problems related to the model simplification, for instance involving a wrong choice of the δ value. One can also imagine pathological cases, when the simplified system does not reproduce the main dynamical features of the original model: for instance, if it evolves towards a different basin of attraction or if the removal of a consumption term does not compensate a synthesis term any more, leading the simplified system to explode in finite time. It is non nonsensical in all these cases to analyse simplified models that behave so differently from the original ones. The δ threshold should be adjusted to a new value and Principle Process Analysis re-run until model simplification proves satisfactory according to the criteria described below. We present in Appendix A an a priori analysis of the error made when removing some inactive processes. This analysis gives a theoretical, but very conservative, bound on the error. In practice, we numerically compute the global relative error between the original and simplified models. Several forms of error are possible. We have chosen the following one, analysed over a period of time [ t0,T], in which yh and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$y^{r}_{h}$\end{document}yhr are the hth outputs of the original and simplified systems, respectively: 8 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ e_{h}=\frac{\int_{t_{0}}^{T}|y_{h}(t)-y_{h}^{r}(t)|dt}{\int_{t_{0}}^{T}|y_{h}(t)|dt}. $$ \end{document}eh=∫t0T|yh(t)−yhr(t)|dt∫t0T|yh(t)|dt. How to choose the model outputs? They can correspond to all model variables or combinations of them, if the latter are involved in some biological phenomena of interest for instance. In the case of the circadian clock model, six variables were specifically studied in the original papers [17, 19], which we will use as outputs to determine the global relative error between the original and simplified models: the concentrations of Per mRNA (MP), Cry mRNA (MC), Bmal1 mRNA (MB), total PER protein (PTot), total CRY protein (CTot) and total BMAL1 protein (BTot)1. Creation of sequences of sub-models In the previous step of PPA, the models are simplified by elimination of always inactive processes. Here we go one step further in the simplification, by eliminating processes that are inactive at times. This is achieved by decomposing the period of time during which the system evolves into time intervals. To that end we use the switching times tb (with b=1,…,d) determined in “Principal Process Analysis (PPA)” section: this allows creating a succession of sub-models for each time interval, which contain the core mechanisms in that period of time. To avoid creating large sequences of sub-models, we reduce the number of time windows by grouping proximal switching times with the easy-to-compute k-means clustering [20]. Hence the dswitching times included in the global switching time set S=[t1,t2,...,td] are grouped into z (≤d) clusters C= {C1,C2,...,Cz}, so as to minimize the within-cluster sum of square (or within-cluster inertia): 9 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \mathrm{argmin_{C}}\sum\limits_{v=1}^{z}\sum_{t\epsilon C_{v}}||t-\mu_{v}||^{2} $$ \end{document}argminC∑v=1z∑tεCv||t−μv||2 where μv is the mean of the switching times in Cv. The consequence is that processes with switching times belonging to cluster Cv are assumed to switch together at the same time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{r}_{v}=\mu _{v}$\end{document}tvr=μv, the mean switching time in cluster Cv. How to define the right number of clusters? A too large number of clusters will result in a low error, but also in numerous time windows that make the simplified models still too complex to analyse. Equation (10) describes how to take into account this trade-off between the number z of clusters and the error. It is related to the difference between the maximum and the minimum number of active processes during the temporal evolution of the system: if this difference is low, z should be chosen low as well. We thus define z, rounded to the nearest number, as: 10 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ z = \frac {\max\limits_{v}(n^{v}_{act})-\min\limits_{v}(n^{v}_{act})}{2}, $$ \end{document}z=maxv(nactv)−minv(nactv)2, where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$n^{v}_{act}$\end{document}nactv denotes the number of active processes in the vth time window. We eventually end up with a sequence of z+1 sub-models in the time interval [ 0,T], the first one being valid in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left [0,t^{r}_{1}\right ]$\end{document}0,t1r and the last one, in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left [t^{r}_{z},T\right ]$\end{document}tzr,T. Similarly to the global errors determined in “Model simplification by elimination of always inactive processes” section, we can also assess how the newly simplified models reproduce the dynamical behaviour of the original model in each time window \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left [t^{r}_{v-1},t^{r}_{v}\right ]$\end{document}tv−1r,tvr, by measuring the error: 11 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ e^{v}_{h}=\frac{\int_{t^{r}_{v-1}}^{t^{r}_{v}}|y_{h}(t)-y_{h}^{r}(t)|dt}{\int_{t^{r}_{v-1}}^{t^{r}_{v}}|y_{h}(t)|dt}. $$ \end{document}ehv=∫tv−1rtvr|yh(t)−yhr(t)|dt∫tv−1rtvr|yh(t)|dt. We compute the error (11) between the original model and each sub-model, with or without propagating errors: in the first case, for each time window \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left [t^{r}_{v-1},t^{r}_{v}\right ]$\end{document}tv−1r,tvr (v=1,…,z+1 with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{r}_{0}=t_{0}$\end{document}t0r=t0 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{r}_{z+1}=T$\end{document}tz+1r=T), the initial values of the h outputs of sub-model SMv are equal to the final values at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{r}_{v-1}$\end{document}tv−1r of sub-model SMv−1; in the second case, they are equal to the values of the original model at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t^{r}_{v-1}$\end{document}tv−1r. Global sensitivity analysis Principal Process Analysis is applied to models with given parameter values and initial conditions. It may be questioned whether the uncertainty of their values influences the simplification of the model and thus, the analysis of the system dynamics. While we have shown PPA to be robust to variations of initial conditions in [21], the question remains open for parameter values. To that aim, we perform global sensitivity analyses on the global relative errors between the original model and the reduced model (defined in Eq. (11)). Such an analysis consists in quantifying the parameter influence on the error, while varying the parameters simultaneously in given ranges. In contrast, in a local sensitivity analysis, parameters would vary one-at-a-time in the neighbourhood of their nominal value. First, we perform an analysis on each of the six errors defined for the six model outputs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$ \left (e^{v}_{M_{P}},e^{v}_{M_{C}},e^{v}_{M_{B}},e_{P_{Tot}},e^{v}_{C_{Tot}},e^{v}_{B_{Tot}}\right)$\end{document}eMPv,eMCv,eMBv,ePTot,eCTotv,eBTotv. Then, in a more detailed analysis, we compute the global relative error for each state variable, according to Eq.(11) (with yh=xi,i=1,…,16); sensitivity analyses are also performed on each of these 16 errors. The method used is based on factorial design [22], analysis of variance (ANOVA) and principal component analysis (PCA) [23]. We first explore the parameter space using a factorial design. We vary Nf=51 parameters of the model [17] (see “Model description” section). We choose Nl=2 levels for each parameter pf (or factor): \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$p^{-}_{f}= 0.8\: p_{f}$\end{document}pf−=0.8pf and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$p^{+}_{f}=1.2\: p_{f}$\end{document}pf+=1.2pf. A full factorial design, defined as all possible combinations of the parameter levels, would be necessary to estimate the main effects and interactions of all parameters. Such a full design corresponds to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$N_{l}^{N_{f}}=2^{51}$\end{document}NlNf=251 parameter combinations and would necessitate the same number of model simulations to compute the corresponding outputs, which are far too many. Thus we implement a fractional factorial design [24], which is a subset (fraction) of the full design of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$N_{j}