PMC:2654804 / 23267-23271 JSON TXT

Benchmarks for identification of ordinary differential equations from time series data Abstract Motivation: In recent years, the biological literature has seen a significant increase of reported methods for identifying both structure and parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and lack of commonly accepted benchmark problems makes it difficult to evaluate and compare different methods. Results: To enable better evaluation and comparisons between different methods, we propose how to specify identification problems as optimization problems with a model space of allowed reactions (e.g. reaction kinetics like Michaelis–Menten or S-systems), ranges for the parameters, time series data and an error function. We also define a file format for such problems. We then present a collection of more than 40 benchmark problems for ODE model identification of cellular systems. The collection includes realistic problems of different levels of difficulty w.r.t. size and quality of data. We consider both problems with simulated data from known systems, and problems with real data. Finally, we present results based on our identification algorithm for all benchmark problems. In comparison with publications on which we have based some of the benchmark problems, our approach allows all problems to be solved without the use of supercomputing. Availability: The benchmark problems are available at www.odeidentification.org Contact: peterg@chalmers.se Supplementary information: Supplementary data are available at Bioinformatics online. 1 INTRODUCTION We consider the problem of identifying both the structure and the parameters of an ordinary differential equation (ODE) system from time series data. In recent years, there has been a significant increase in the number of reported methods approaching this problem in the biological literature (see Table 1 and the references). Table 1. Benchmark problems for ODE system identification Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems. For systems of realistic size and with realistic amount and quality of data this is a hard problem and heuristic algorithms are therefore typically employed. Such algorithms cannot guarantee that they give the best answer to a given problem. An established way to evaluate the performance of heuristic algorithms is to try them on a sufficient number of realistic test cases. However, lack of well defined and commonly accepted benchmark problems makes it difficult to evaluate and compare different methods: Problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems. Very few test problems are usually considered. This serves the purpose of demonstrating feasibility, but says little about general performance. Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which make direct comparisons difficult. An example is a test problem from Kikuchi et al., 2003. Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset. To a considerable degree, the lack of benchmark problems is caused by a more fundamental underlying difficulty in how to specify such identification problems mathematically, and how to conveniently represent them in files. This is much more complex when both structure and parameters are unknown, compared with the well known case of parameter estimation in a given structure. In particular, it is not sufficient to give only data from one or several experiments, or a source model from which such data can be simulated, see Figure 1A. In addition to the data, a model space must be defined, to specify the possible forms of the right-hand side of the ODEs and appropriate parameter ranges. Clearly, if we allow few possible reactions and narrow parameter ranges we obtain a simpler identification problem, and if we allow many possible reactions and wide parameter ranges we obtain a more difficult problem. Furthermore, known prior information about parts of the model may be given. Finally, we must define an objective function that also includes some notion of model complexity. This is needed since we are searching among different structures, and a simple maximum likelihood criterion is not sufficient and will lead to over-fitting. Together, this defines the identification problem mathematically as an optimization problem, the solution of which is a model. An illustration of an identification problem is shown in Figure 1B, and the solution model is shown in Figure 1C. Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model. Based on these considerations, we present a collection of more than 40 fully specified and publicly available benchmark problems for ODE system identification. The problems have mainly been collected and translated from recent publications, but we have also included new and carefully designed problems. The collection includes both simple problems suitable for initial testing, and more complex problems with realistic size and quality of data. Most problems are based on simulated data from a known source model, which is important for evaluation purposes. However, several problems are also based on real data. The collection consists of problems of two different kinds. The first kind are problems whose solutions are models based on traditional chemical rate equations like unimolecular mass action and Michaelis–Menten's kinetics. An example of such a system with three variables is shown in Figure 1C. Typically for systems of this kind, the rate equations are composed as linear combinations of known reaction types that can be non-linear in both the parameters and in the state variables. The other kind of problem is where the solutions are defined on a fixed form, like S-systems which is currently the most popular form for identification. The S-system formalism (Savageau, 1976; Voit, 2000) is based on approximating kinetic laws with multivariate power-law functions. A model consists of n non-linear ODEs and the generic form of equation i reads (1) where X is a vector (length n) of dependent variables, α and β are vectors (length n) of non-negative rate constants and g and h are matrices (n × n) of kinetic orders, that can be negative as well as positive. The structure of the system is defined by the non-zero elements of g and h, and for realistic models most of these parameters are zero. To our knowledge, this is the first article to address how to best represent this kind of identification problems as optimization problems and to define a suitable file format. As far as we know, this is also the first collection of benchmark problems for ODE system identification, where both structure and parameters are unknown. Another paper that recognizes the importance of systematic evaluation of identification algorithms, is Mendes et al. (2003), which presents a system for generating realistic artificial gene networks and simulated data, and a large number of problems generated by this system. In comparison, our benchmark problems are typically smaller and address the more specific need of evaluating algorithms identifying both structure and parameters, fully specifying problems as optimization problems. For parameter estimation only, several benchmark problems are publicly available, e.g. in the data base EASY-FIT (Schittkowski, 2002). The two kinds of systems we consider, chemical rate equations and S-systems, have previously been used for identification of ODE models as indicated by references in Table 1. In addition to these kinds of systems, other fixed forms have been considered in the literature. One example is Generalized Mass Action (GMA) models (Tucker et al., 2007; Voit, 2000), and another is linear dynamical models with non-linear transfer functions of sigmoid type (D'haeseleer, 2000; Nelander et al., 2008; Wahde and Hertz, 2000). However, at this stage, we have chosen to focus on the more frequently used S-systems as one representative fixed form system. We finally note that identification of ODEs where more basic operands and operators are used as structural components have been considered (Bongard and Lipson, 2007). 2 SPECIFYING IDENTIFICATION PROBLEMS AS OPTIMIZATION PROBLEMS We now outline how we define identification problems mathematically. Our first and most important concern has been that an identification problem must be defined unambiguously as an optimization problem. This ensures that a problem has a well defined solution independently of any method for solving it, and that the task of modelling identification problems is separated from the algorithmic task of solving given problems. It is to be noted that if we fail to explicitly specify the entire problem, missing parts have to be supplied by the user or be implicitly defined by the algorithm, resulting in confusion and lack of reproducibility. Our second concern has been to find a reasonably simple standard form that can still represent a wide range of identification problems. We have then chosen the following way to structure and specify our problems. Data. Time series data for one or several experiments. In the case of several experiments, they may differ with respect to initial values of the variables and/or input functions. For each data-point, the standard deviation is also given in the problem specification. Model space. The model space determines the allowed form of the right-hand side of the ODEs. For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a subset of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reactions types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation k1 Xi Xj, it is implicitly assumed that i and j are not equal (to consider equality as in problem osc1 and osc2, we define an additional reaction type k1 Xi2). The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type {gi,j∈[−3, 3], gi,j≠0} (there is an interaction between variable i and j but the direction is unknown). Finally, we also define lower and upper bounds for the initial data-point in each time series. For noisy data, these bounds were set to ±2 SDs. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically bound tighter compared with the model parameters. Initial model. It is convenient to allow definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Also reactions from outside the model space can be included in the initial model. In principle, one can also think of prior information in the form of starting points for iterative algorithms and thus not technically a part of the defined problem. No such information is assumed known in our current problems. Error function. We have chosen to minimize (2) The first term is the negative log-likelihood of the experimental data, and the second term is a term that penalizes structural complexity of the model. This kind of error function is common, and is related to several different proposed methods for handling model complexity (Crampin et al., 2004). In detail, L is the log-likelihood, denotes the experimental data, k is a vector of parameters, λ is a constant and K is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms we can express the log-likelihood for one time series as (3) where i indexes the measurement points, and where Xj, and σj denotes simulated data, experimental data and SD for variable j, respectively. The total log-likelihood is defined by summing over all variables and all experiments. For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β. To establish the relationship with standard optimization terminology, the model space and and the initial model define the feasible set, and the data together with the error function define the objective function. 2.1 File format for identification problems In order to work with identification problems and to provide them as input to identification algorithms, we also need to represent such problems in files, and it is highly desirable that an entire problem can be represented in a single file. Since an identification problem is an optimization problem and not a model, common model formats such as SBML are not applicable, and no existing format known to us can be used for this purpose. However, the output of an identification algorithm is a model and can be represented in SBML. Also, if a partially known initial model is available in SBML, an identification algorithm can input that part in SBML and the rest of the identification problem in our format. The file format we have designed for the identification problems can be seen as a special case of a more general format that is currently developed as a separate project. The format attempts to be self documenting and easy to read both by the human and the computer. Compared with a typical XML-equivalent, more structure is explicit and it is easier to read. It bears some resemblance to how data structures are built in a computer programme, but without the explicit use of pointers. An extract of the format is given below: This particular extract describes first some variables and then a possible reaction in the model space. Finally, a sample from the first experiment is given. The format can also be used to specify a parameter estimation problem only. Method specific parameters (like random seed) can be defined in the same file or separately. Finally, we can easily extend the format to describe new classes of problems, for example with compartment modelling and system modifications like gene deletions. We anticipate that the exact mathematical form of the identification problems, as well as the file format, will be extended over time. Up-to-date information and detailed documentation is therefore available on the web site, as well as a simple parser for the file format. 3 THE COLLECTION OF BENCHMARK PROBLEMS An overview of the collection of benchmark problems is presented in Table 1. The problems have different sources, and in most cases several different problems are given for each system. The number of variables in combination with the amount of data and noise level, give a rough indication of the difficulty of each problem. The first problem for each system generally has a lower level of difficulty than the subsequent problems, and is hence a good starting point when evaluating new methods. A part of the collection is based on identification problems considered in the literature, which is dominated by problems based on S-systems, probably because these are relatively easy to represent. By including translations of these problems that were previously not easily accessible, the collection has an immediate value for those actively developing algorithms in this area. However, we balance the S-system dominance by also defining new problems based on traditional chemical reactions that are more frequently used in the biological modelling community. We note that research in this area is in an early stage, and in our experience identification algorithms for S-systems can relatively easily be modified for other kinds of models. Our proposed way of specifying identification problems can then facilitate and contribute to a more widespread use of problems based on other kinds of models. For problems that have been collected and translated from another publication, we have taken care to match the original problem as closely as available information and the standardization to our formalism allow, see the Supplementary Material for details. We have replaced some non-standard objective functions with our simple function, and relaxed a few unusual constraints. Noisy data have been simulated from random distributions, and in one case we have also reduced an excessive and unrealistic number of data-points in the original problem. However, the solutions in the next section confirm that the changes have not negatively affected the quality of the solutions when compared directly with the source models. We note that our problems never have more data or a smaller model space, so they are in this sense at least as difficult as the original problems. 3.1 Best known solutions In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site. Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs. For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made. For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal. Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material. For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems. In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values. Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems. Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor. 4 CONCLUSIONS We have proposed a way to unambiguously specify ODE identification problems as mathematical optimization problems, and have also defined a file format making it easy to document and exchange such problems. The benchmark problems include translations of problems that have been frequently used in the development of algorithms, as well as new problems based on chemical rate equations. The problems, their best solutions and the source models from which problem data was simulated are all available on our web site. This makes it easy for everyone to evaluate and compare algorithms in a reproducible way. To maintain a useful collection of problems of various types, we intend to extend the collection, and are interested in contributions and feedback from others, see the web site. Finally, our own solutions show that all the benchmark problems can be solved in reasonable time, in contrast to results for some of the original problems where supercomputing was required. Supplementary Material [Supplementary Data] ACKNOWLEDGEMENTS Thanks to Sven Nelander (MSKCC) and the reviewers for valuable comments. Funding: Chalmers Bioscience Program and Göteborg Mathematical Modelling Centre. Conflict of Interest: none declared.

Document structure show

article-title	Benchmarks for identification of ordinary differential equations from time series data
abstract	Motivation: In recent years, the biological literature has seen a significant increase of reported methods for identifying both structure and parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and lack of commonly accepted benchmark problems makes it difficult to evaluate and compare different methods. Results: To enable better evaluation and comparisons between different methods, we propose how to specify identification problems as optimization problems with a model space of allowed reactions (e.g. reaction kinetics like Michaelis–Menten or S-systems), ranges for the parameters, time series data and an error function. We also define a file format for such problems. We then present a collection of more than 40 benchmark problems for ODE model identification of cellular systems. The collection includes realistic problems of different levels of difficulty w.r.t. size and quality of data. We consider both problems with simulated data from known systems, and problems with real data. Finally, we present results based on our identification algorithm for all benchmark problems. In comparison with publications on which we have based some of the benchmark problems, our approach allows all problems to be solved without the use of supercomputing. Availability: The benchmark problems are available at www.odeidentification.org Contact: peterg@chalmers.se Supplementary information: Supplementary data are available at Bioinformatics online.
p	Motivation: In recent years, the biological literature has seen a significant increase of reported methods for identifying both structure and parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and lack of commonly accepted benchmark problems makes it difficult to evaluate and compare different methods.
p	Results: To enable better evaluation and comparisons between different methods, we propose how to specify identification problems as optimization problems with a model space of allowed reactions (e.g. reaction kinetics like Michaelis–Menten or S-systems), ranges for the parameters, time series data and an error function. We also define a file format for such problems.
p	We then present a collection of more than 40 benchmark problems for ODE model identification of cellular systems. The collection includes realistic problems of different levels of difficulty w.r.t. size and quality of data. We consider both problems with simulated data from known systems, and problems with real data. Finally, we present results based on our identification algorithm for all benchmark problems. In comparison with publications on which we have based some of the benchmark problems, our approach allows all problems to be solved without the use of supercomputing.
p	Availability: The benchmark problems are available at www.odeidentification.org
p	Contact: peterg@chalmers.se
p	Supplementary information: Supplementary data are available at Bioinformatics online.
body	1 INTRODUCTION We consider the problem of identifying both the structure and the parameters of an ordinary differential equation (ODE) system from time series data. In recent years, there has been a significant increase in the number of reported methods approaching this problem in the biological literature (see Table 1 and the references). Table 1. Benchmark problems for ODE system identification Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems. For systems of realistic size and with realistic amount and quality of data this is a hard problem and heuristic algorithms are therefore typically employed. Such algorithms cannot guarantee that they give the best answer to a given problem. An established way to evaluate the performance of heuristic algorithms is to try them on a sufficient number of realistic test cases. However, lack of well defined and commonly accepted benchmark problems makes it difficult to evaluate and compare different methods: Problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems. Very few test problems are usually considered. This serves the purpose of demonstrating feasibility, but says little about general performance. Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which make direct comparisons difficult. An example is a test problem from Kikuchi et al., 2003. Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset. To a considerable degree, the lack of benchmark problems is caused by a more fundamental underlying difficulty in how to specify such identification problems mathematically, and how to conveniently represent them in files. This is much more complex when both structure and parameters are unknown, compared with the well known case of parameter estimation in a given structure. In particular, it is not sufficient to give only data from one or several experiments, or a source model from which such data can be simulated, see Figure 1A. In addition to the data, a model space must be defined, to specify the possible forms of the right-hand side of the ODEs and appropriate parameter ranges. Clearly, if we allow few possible reactions and narrow parameter ranges we obtain a simpler identification problem, and if we allow many possible reactions and wide parameter ranges we obtain a more difficult problem. Furthermore, known prior information about parts of the model may be given. Finally, we must define an objective function that also includes some notion of model complexity. This is needed since we are searching among different structures, and a simple maximum likelihood criterion is not sufficient and will lead to over-fitting. Together, this defines the identification problem mathematically as an optimization problem, the solution of which is a model. An illustration of an identification problem is shown in Figure 1B, and the solution model is shown in Figure 1C. Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model. Based on these considerations, we present a collection of more than 40 fully specified and publicly available benchmark problems for ODE system identification. The problems have mainly been collected and translated from recent publications, but we have also included new and carefully designed problems. The collection includes both simple problems suitable for initial testing, and more complex problems with realistic size and quality of data. Most problems are based on simulated data from a known source model, which is important for evaluation purposes. However, several problems are also based on real data. The collection consists of problems of two different kinds. The first kind are problems whose solutions are models based on traditional chemical rate equations like unimolecular mass action and Michaelis–Menten's kinetics. An example of such a system with three variables is shown in Figure 1C. Typically for systems of this kind, the rate equations are composed as linear combinations of known reaction types that can be non-linear in both the parameters and in the state variables. The other kind of problem is where the solutions are defined on a fixed form, like S-systems which is currently the most popular form for identification. The S-system formalism (Savageau, 1976; Voit, 2000) is based on approximating kinetic laws with multivariate power-law functions. A model consists of n non-linear ODEs and the generic form of equation i reads (1) where X is a vector (length n) of dependent variables, α and β are vectors (length n) of non-negative rate constants and g and h are matrices (n × n) of kinetic orders, that can be negative as well as positive. The structure of the system is defined by the non-zero elements of g and h, and for realistic models most of these parameters are zero. To our knowledge, this is the first article to address how to best represent this kind of identification problems as optimization problems and to define a suitable file format. As far as we know, this is also the first collection of benchmark problems for ODE system identification, where both structure and parameters are unknown. Another paper that recognizes the importance of systematic evaluation of identification algorithms, is Mendes et al. (2003), which presents a system for generating realistic artificial gene networks and simulated data, and a large number of problems generated by this system. In comparison, our benchmark problems are typically smaller and address the more specific need of evaluating algorithms identifying both structure and parameters, fully specifying problems as optimization problems. For parameter estimation only, several benchmark problems are publicly available, e.g. in the data base EASY-FIT (Schittkowski, 2002). The two kinds of systems we consider, chemical rate equations and S-systems, have previously been used for identification of ODE models as indicated by references in Table 1. In addition to these kinds of systems, other fixed forms have been considered in the literature. One example is Generalized Mass Action (GMA) models (Tucker et al., 2007; Voit, 2000), and another is linear dynamical models with non-linear transfer functions of sigmoid type (D'haeseleer, 2000; Nelander et al., 2008; Wahde and Hertz, 2000). However, at this stage, we have chosen to focus on the more frequently used S-systems as one representative fixed form system. We finally note that identification of ODEs where more basic operands and operators are used as structural components have been considered (Bongard and Lipson, 2007). 2 SPECIFYING IDENTIFICATION PROBLEMS AS OPTIMIZATION PROBLEMS We now outline how we define identification problems mathematically. Our first and most important concern has been that an identification problem must be defined unambiguously as an optimization problem. This ensures that a problem has a well defined solution independently of any method for solving it, and that the task of modelling identification problems is separated from the algorithmic task of solving given problems. It is to be noted that if we fail to explicitly specify the entire problem, missing parts have to be supplied by the user or be implicitly defined by the algorithm, resulting in confusion and lack of reproducibility. Our second concern has been to find a reasonably simple standard form that can still represent a wide range of identification problems. We have then chosen the following way to structure and specify our problems. Data. Time series data for one or several experiments. In the case of several experiments, they may differ with respect to initial values of the variables and/or input functions. For each data-point, the standard deviation is also given in the problem specification. Model space. The model space determines the allowed form of the right-hand side of the ODEs. For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a subset of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reactions types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation k1 Xi Xj, it is implicitly assumed that i and j are not equal (to consider equality as in problem osc1 and osc2, we define an additional reaction type k1 Xi2). The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type {gi,j∈[−3, 3], gi,j≠0} (there is an interaction between variable i and j but the direction is unknown). Finally, we also define lower and upper bounds for the initial data-point in each time series. For noisy data, these bounds were set to ±2 SDs. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically bound tighter compared with the model parameters. Initial model. It is convenient to allow definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Also reactions from outside the model space can be included in the initial model. In principle, one can also think of prior information in the form of starting points for iterative algorithms and thus not technically a part of the defined problem. No such information is assumed known in our current problems. Error function. We have chosen to minimize (2) The first term is the negative log-likelihood of the experimental data, and the second term is a term that penalizes structural complexity of the model. This kind of error function is common, and is related to several different proposed methods for handling model complexity (Crampin et al., 2004). In detail, L is the log-likelihood, denotes the experimental data, k is a vector of parameters, λ is a constant and K is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms we can express the log-likelihood for one time series as (3) where i indexes the measurement points, and where Xj, and σj denotes simulated data, experimental data and SD for variable j, respectively. The total log-likelihood is defined by summing over all variables and all experiments. For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β. To establish the relationship with standard optimization terminology, the model space and and the initial model define the feasible set, and the data together with the error function define the objective function. 2.1 File format for identification problems In order to work with identification problems and to provide them as input to identification algorithms, we also need to represent such problems in files, and it is highly desirable that an entire problem can be represented in a single file. Since an identification problem is an optimization problem and not a model, common model formats such as SBML are not applicable, and no existing format known to us can be used for this purpose. However, the output of an identification algorithm is a model and can be represented in SBML. Also, if a partially known initial model is available in SBML, an identification algorithm can input that part in SBML and the rest of the identification problem in our format. The file format we have designed for the identification problems can be seen as a special case of a more general format that is currently developed as a separate project. The format attempts to be self documenting and easy to read both by the human and the computer. Compared with a typical XML-equivalent, more structure is explicit and it is easier to read. It bears some resemblance to how data structures are built in a computer programme, but without the explicit use of pointers. An extract of the format is given below: This particular extract describes first some variables and then a possible reaction in the model space. Finally, a sample from the first experiment is given. The format can also be used to specify a parameter estimation problem only. Method specific parameters (like random seed) can be defined in the same file or separately. Finally, we can easily extend the format to describe new classes of problems, for example with compartment modelling and system modifications like gene deletions. We anticipate that the exact mathematical form of the identification problems, as well as the file format, will be extended over time. Up-to-date information and detailed documentation is therefore available on the web site, as well as a simple parser for the file format. 3 THE COLLECTION OF BENCHMARK PROBLEMS An overview of the collection of benchmark problems is presented in Table 1. The problems have different sources, and in most cases several different problems are given for each system. The number of variables in combination with the amount of data and noise level, give a rough indication of the difficulty of each problem. The first problem for each system generally has a lower level of difficulty than the subsequent problems, and is hence a good starting point when evaluating new methods. A part of the collection is based on identification problems considered in the literature, which is dominated by problems based on S-systems, probably because these are relatively easy to represent. By including translations of these problems that were previously not easily accessible, the collection has an immediate value for those actively developing algorithms in this area. However, we balance the S-system dominance by also defining new problems based on traditional chemical reactions that are more frequently used in the biological modelling community. We note that research in this area is in an early stage, and in our experience identification algorithms for S-systems can relatively easily be modified for other kinds of models. Our proposed way of specifying identification problems can then facilitate and contribute to a more widespread use of problems based on other kinds of models. For problems that have been collected and translated from another publication, we have taken care to match the original problem as closely as available information and the standardization to our formalism allow, see the Supplementary Material for details. We have replaced some non-standard objective functions with our simple function, and relaxed a few unusual constraints. Noisy data have been simulated from random distributions, and in one case we have also reduced an excessive and unrealistic number of data-points in the original problem. However, the solutions in the next section confirm that the changes have not negatively affected the quality of the solutions when compared directly with the source models. We note that our problems never have more data or a smaller model space, so they are in this sense at least as difficult as the original problems. 3.1 Best known solutions In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site. Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs. For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made. For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal. Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material. For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems. In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values. Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems. Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor. 4 CONCLUSIONS We have proposed a way to unambiguously specify ODE identification problems as mathematical optimization problems, and have also defined a file format making it easy to document and exchange such problems. The benchmark problems include translations of problems that have been frequently used in the development of algorithms, as well as new problems based on chemical rate equations. The problems, their best solutions and the source models from which problem data was simulated are all available on our web site. This makes it easy for everyone to evaluate and compare algorithms in a reproducible way. To maintain a useful collection of problems of various types, we intend to extend the collection, and are interested in contributions and feedback from others, see the web site. Finally, our own solutions show that all the benchmark problems can be solved in reasonable time, in contrast to results for some of the original problems where supercomputing was required. Supplementary Material [Supplementary Data]
sec	1 INTRODUCTION We consider the problem of identifying both the structure and the parameters of an ordinary differential equation (ODE) system from time series data. In recent years, there has been a significant increase in the number of reported methods approaching this problem in the biological literature (see Table 1 and the references). Table 1. Benchmark problems for ODE system identification Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems. For systems of realistic size and with realistic amount and quality of data this is a hard problem and heuristic algorithms are therefore typically employed. Such algorithms cannot guarantee that they give the best answer to a given problem. An established way to evaluate the performance of heuristic algorithms is to try them on a sufficient number of realistic test cases. However, lack of well defined and commonly accepted benchmark problems makes it difficult to evaluate and compare different methods: Problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems. Very few test problems are usually considered. This serves the purpose of demonstrating feasibility, but says little about general performance. Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which make direct comparisons difficult. An example is a test problem from Kikuchi et al., 2003. Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset. To a considerable degree, the lack of benchmark problems is caused by a more fundamental underlying difficulty in how to specify such identification problems mathematically, and how to conveniently represent them in files. This is much more complex when both structure and parameters are unknown, compared with the well known case of parameter estimation in a given structure. In particular, it is not sufficient to give only data from one or several experiments, or a source model from which such data can be simulated, see Figure 1A. In addition to the data, a model space must be defined, to specify the possible forms of the right-hand side of the ODEs and appropriate parameter ranges. Clearly, if we allow few possible reactions and narrow parameter ranges we obtain a simpler identification problem, and if we allow many possible reactions and wide parameter ranges we obtain a more difficult problem. Furthermore, known prior information about parts of the model may be given. Finally, we must define an objective function that also includes some notion of model complexity. This is needed since we are searching among different structures, and a simple maximum likelihood criterion is not sufficient and will lead to over-fitting. Together, this defines the identification problem mathematically as an optimization problem, the solution of which is a model. An illustration of an identification problem is shown in Figure 1B, and the solution model is shown in Figure 1C. Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model. Based on these considerations, we present a collection of more than 40 fully specified and publicly available benchmark problems for ODE system identification. The problems have mainly been collected and translated from recent publications, but we have also included new and carefully designed problems. The collection includes both simple problems suitable for initial testing, and more complex problems with realistic size and quality of data. Most problems are based on simulated data from a known source model, which is important for evaluation purposes. However, several problems are also based on real data. The collection consists of problems of two different kinds. The first kind are problems whose solutions are models based on traditional chemical rate equations like unimolecular mass action and Michaelis–Menten's kinetics. An example of such a system with three variables is shown in Figure 1C. Typically for systems of this kind, the rate equations are composed as linear combinations of known reaction types that can be non-linear in both the parameters and in the state variables. The other kind of problem is where the solutions are defined on a fixed form, like S-systems which is currently the most popular form for identification. The S-system formalism (Savageau, 1976; Voit, 2000) is based on approximating kinetic laws with multivariate power-law functions. A model consists of n non-linear ODEs and the generic form of equation i reads (1) where X is a vector (length n) of dependent variables, α and β are vectors (length n) of non-negative rate constants and g and h are matrices (n × n) of kinetic orders, that can be negative as well as positive. The structure of the system is defined by the non-zero elements of g and h, and for realistic models most of these parameters are zero. To our knowledge, this is the first article to address how to best represent this kind of identification problems as optimization problems and to define a suitable file format. As far as we know, this is also the first collection of benchmark problems for ODE system identification, where both structure and parameters are unknown. Another paper that recognizes the importance of systematic evaluation of identification algorithms, is Mendes et al. (2003), which presents a system for generating realistic artificial gene networks and simulated data, and a large number of problems generated by this system. In comparison, our benchmark problems are typically smaller and address the more specific need of evaluating algorithms identifying both structure and parameters, fully specifying problems as optimization problems. For parameter estimation only, several benchmark problems are publicly available, e.g. in the data base EASY-FIT (Schittkowski, 2002). The two kinds of systems we consider, chemical rate equations and S-systems, have previously been used for identification of ODE models as indicated by references in Table 1. In addition to these kinds of systems, other fixed forms have been considered in the literature. One example is Generalized Mass Action (GMA) models (Tucker et al., 2007; Voit, 2000), and another is linear dynamical models with non-linear transfer functions of sigmoid type (D'haeseleer, 2000; Nelander et al., 2008; Wahde and Hertz, 2000). However, at this stage, we have chosen to focus on the more frequently used S-systems as one representative fixed form system. We finally note that identification of ODEs where more basic operands and operators are used as structural components have been considered (Bongard and Lipson, 2007).
title	1 INTRODUCTION
p	We consider the problem of identifying both the structure and the parameters of an ordinary differential equation (ODE) system from time series data. In recent years, there has been a significant increase in the number of reported methods approaching this problem in the biological literature (see Table 1 and the references). Table 1. Benchmark problems for ODE system identification Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
table-wrap	Table 1. Benchmark problems for ODE system identification Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
label	Table 1.
caption	Benchmark problems for ODE system identification
p	Benchmark problems for ODE system identification
table	Problem name Original problem reference Source system/Model #var #exp #pts Noise (%) simpleLin1 Simple linear system 3 3 13 0 simpleLin2 8 13 10 simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0 simpleFb2 4 7 5 simpleFb3 1 7 0 simpleFb4 (McKinney et al., 2006) 1 7 ≈5a osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0 osc2 (Karnaukhov et al., 2007) 1 41 3 metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0 metabol2 (Gennemark and Wedelin, 2007) 12 21 10 metabol3 (Gennemark and Wedelin, 2007) 12 21 20 3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0 ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0 ss_cascade2 4 41 0 ss_cascade3 8 41 5 ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0 ss_branch2 (Marino and Voit, 2006) 6 51 0 ss_branch3 (Tucker and Moulton, 2006) 5 20 0 ss_branch4 (Kutalik et al., 2007) 4 20 0 ss_branch5 (Kutalik et al., 2007) 4 20 2.5 ss_branch6 (Gonzalez et al., 2007) 5 31 0 ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0 ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20 ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0 ss_5genes4 (Kimura et al., 2005) 15 11 0 ss_5genes5 (Daisuke and Horton, 2006) 10 11 0 ss_5genes6 (Cho et al., 2006) 1 16 0 ss_5genes7 (Tucker and Moulton, 2006) 10 20 0 ss_5genes8 (Tsai and Wang, 2005) 8 41 0 (Liu and Wang, 2008) ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0 ss_15genes2 20 11 10 ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0 ss_30genes2 (Kimura et al., 2005) 20 11 10 ss_30genes3 (Liu and Wang, 2008) 8 41 0 cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a cytokine2 1 7 10a ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a ss_ethanolferm2 3 11–19 ≈30a ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a ss_sosrepair2 1 50 10a ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a ss_cadBA2 1 25 <20a ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a ss_clock2 1 12 ≈10a
tr	Problem name Original problem reference Source system/Model #var #exp #pts Noise (%)
th	Problem name
th	Original problem reference
th	Source system/Model
th	#var
th	#exp
th	#pts
th	Noise (%)
tr	simpleLin1 Simple linear system 3 3 13 0
td	simpleLin1
td	Simple linear system
td	3
td	3
td	13
td	0
tr	simpleLin2 8 13 10
td	simpleLin2
td	8
td	13
td	10
tr	simpleFb1 Simple feedback loop (McKinney et al., 2006) 3 4 7 0
td	simpleFb1
td	Simple feedback loop (McKinney et al., 2006)
td	3
td	4
td	7
td	0
tr	simpleFb2 4 7 5
td	simpleFb2
td	4
td	7
td	5
tr	simpleFb3 1 7 0
td	simpleFb3
td	1
td	7
td	0
tr	simpleFb4 (McKinney et al., 2006) 1 7 ≈5a
td	simpleFb4
td	(McKinney et al., 2006)
td	1
td	7
td	≈5a
tr	osc1 An oscillator (Karnaukhov et al., 2007) 3 1 41 0
td	osc1
td	An oscillator (Karnaukhov et al., 2007)
td	3
td	1
td	41
td	0
tr	osc2 (Karnaukhov et al., 2007) 1 41 3
td	osc2
td	(Karnaukhov et al., 2007)
td	1
td	41
td	3
tr	metabol1 (Gennemark and Wedelin, 2007) A metabolic pathway (Arkin and Ross, 1995) 5 12 7 0
td	metabol1
td	(Gennemark and Wedelin, 2007)
td	A metabolic pathway (Arkin and Ross, 1995)
td	5
td	12
td	7
td	0
tr	metabol2 (Gennemark and Wedelin, 2007) 12 21 10
td	metabol2
td	(Gennemark and Wedelin, 2007)
td	12
td	21
td	10
tr	metabol3 (Gennemark and Wedelin, 2007) 12 21 20
td	metabol3
td	(Gennemark and Wedelin, 2007)
td	12
td	21
td	20
tr	3genes1 A 3-step gene network (Moles et al., 2003) 8 16 21 0
td	3genes1
td	A 3-step gene network (Moles et al., 2003)
td	8
td	16
td	21
td	0
tr	ss_cascade1 (Tsai and Wang, 2005) A cascaded pathway (Voit, 2000) 3 8 41 0
td	ss_cascade1
td	(Tsai and Wang, 2005)
td	A cascaded pathway (Voit, 2000)
td	3
td	8
td	41
td	0
tr	ss_cascade2 4 41 0
td	ss_cascade2
td	4
td	41
td	0
tr	ss_cascade3 8 41 5
td	ss_cascade3
td	8
td	41
td	5
tr	ss_branch1 (Voit and Almeida, 2004) A branched pathway (Voit, 2000) 4 3 21 0
td	ss_branch1
td	(Voit and Almeida, 2004)
td	A branched pathway (Voit, 2000)
td	4
td	3
td	21
td	0
tr	ss_branch2 (Marino and Voit, 2006) 6 51 0
td	ss_branch2
td	(Marino and Voit, 2006)
td	6
td	51
td	0
tr	ss_branch3 (Tucker and Moulton, 2006) 5 20 0
td	ss_branch3
td	(Tucker and Moulton, 2006)
td	5
td	20
td	0
tr	ss_branch4 (Kutalik et al., 2007) 4 20 0
td	ss_branch4
td	(Kutalik et al., 2007)
td	4
td	20
td	0
tr	ss_branch5 (Kutalik et al., 2007) 4 20 2.5
td	ss_branch5
td	(Kutalik et al., 2007)
td	4
td	20
td	2.5
tr	ss_branch6 (Gonzalez et al., 2007) 5 31 0
td	ss_branch6
td	(Gonzalez et al., 2007)
td	5
td	31
td	0
tr	ss_5genes1 (Kikuchi et al., 2003) A genetic network (Hlavacek and Savageau, 1996) 5 10 11 0
td	ss_5genes1
td	(Kikuchi et al., 2003)
td	A genetic network (Hlavacek and Savageau, 1996)
td	5
td	10
td	11
td	0
tr	ss_5genes2 (Gennemark and Wedelin, 2007) 10 9 20
td	ss_5genes2
td	(Gennemark and Wedelin, 2007)
td	10
td	9
td	20
tr	ss_5genes3 (Gennemark and Wedelin, 2007) 10 3 0
td	ss_5genes3
td	(Gennemark and Wedelin, 2007)
td	10
td	3
td	0
tr	ss_5genes4 (Kimura et al., 2005) 15 11 0
td	ss_5genes4
td	(Kimura et al., 2005)
td	15
td	11
td	0
tr	ss_5genes5 (Daisuke and Horton, 2006) 10 11 0
td	ss_5genes5
td	(Daisuke and Horton, 2006)
td	10
td	11
td	0
tr	ss_5genes6 (Cho et al., 2006) 1 16 0
td	ss_5genes6
td	(Cho et al., 2006)
td	1
td	16
td	0
tr	ss_5genes7 (Tucker and Moulton, 2006) 10 20 0
td	ss_5genes7
td	(Tucker and Moulton, 2006)
td	10
td	20
td	0
tr	ss_5genes8 (Tsai and Wang, 2005) 8 41 0
td	ss_5genes8
td	(Tsai and Wang, 2005)
td	8
td	41
td	0
tr	(Liu and Wang, 2008)
td	(Liu and Wang, 2008)
tr	ss_15genes1 A genetic network (Maki et al., 2001) 15 10 11 0
td	ss_15genes1
td	A genetic network (Maki et al., 2001)
td	15
td	10
td	11
td	0
tr	ss_15genes2 20 11 10
td	ss_15genes2
td	20
td	11
td	10
tr	ss_30genes1 A genetic network (Maki et al., 2001) 30 15 11 0
td	ss_30genes1
td	A genetic network (Maki et al., 2001)
td	30
td	15
td	11
td	0
tr	ss_30genes2 (Kimura et al., 2005) 20 11 10
td	ss_30genes2
td	(Kimura et al., 2005)
td	20
td	11
td	10
tr	ss_30genes3 (Liu and Wang, 2008) 8 41 0
td	ss_30genes3
td	(Liu and Wang, 2008)
td	8
td	41
td	0
tr	cytokine1 (McKinney et al., 2006) Immunologic data (Rock et al., 2004) 4 1 7 10a
td	cytokine1
td	(McKinney et al., 2006)
td	Immunologic data (Rock et al., 2004)
td	4
td	1
td	7
td	10a
tr	cytokine2 1 7 10a
td	cytokine2
td	1
td	7
td	10a
tr	ss_ethanolferm1 (Liu and Wang, 2008) Ethanol fermentation (Wang et al., 2001) 4 2 11–15 ≈30a
td	ss_ethanolferm1
td	(Liu and Wang, 2008)
td	Ethanol fermentation (Wang et al., 2001)
td	4
td	2
td	11–15
td	≈30a
tr	ss_ethanolferm2 3 11–19 ≈30a
td	ss_ethanolferm2
td	3
td	11–19
td	≈30a
tr	ss_sosrepair1 (Cho et al., 2006) SOS repair system Escherichia coli (Ronen et al., 2002) 6 1 50 10a
td	ss_sosrepair1
td	(Cho et al., 2006)
td	SOS repair system Escherichia coli (Ronen et al., 2002)
td	6
td	1
td	50
td	10a
tr	ss_sosrepair2 1 50 10a
td	ss_sosrepair2
td	1
td	50
td	10a
tr	ss_cadBA1 (Gonzalez et al., 2007) cadBA network in E. coli (Kuper and Jung, 2005) 5 1 25 <20a
td	ss_cadBA1
td	(Gonzalez et al., 2007)
td	cadBA network in E. coli (Kuper and Jung, 2005)
td	5
td	1
td	25
td	<20a
tr	ss_cadBA2 1 25 <20a
td	ss_cadBA2
td	1
td	25
td	<20a
tr	ss_clock1 (Daisuke and Horton, 2006) Mice cell cycle (Barrett et al., 2005) 7 1 12 ≈10a
td	ss_clock1
td	(Daisuke and Horton, 2006)
td	Mice cell cycle (Barrett et al., 2005)
td	7
td	1
td	12
td	≈10a
tr	ss_clock2 1 12 ≈10a
td	ss_clock2
td	1
td	12
td	≈10a
table-wrap-foot	aEstimate from or assumption about data. See web site for further information. #var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series. Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
footnote	aEstimate from or assumption about data. See web site for further information.
p	aEstimate from or assumption about data. See web site for further information.
footnote	#var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series.
p	#var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series.
footnote	Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
p	Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
p	For systems of realistic size and with realistic amount and quality of data this is a hard problem and heuristic algorithms are therefore typically employed. Such algorithms cannot guarantee that they give the best answer to a given problem. An established way to evaluate the performance of heuristic algorithms is to try them on a sufficient number of realistic test cases. However, lack of well defined and commonly accepted benchmark problems makes it difficult to evaluate and compare different methods: Problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems. Very few test problems are usually considered. This serves the purpose of demonstrating feasibility, but says little about general performance. Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which make direct comparisons difficult. An example is a test problem from Kikuchi et al., 2003. Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset.
p	Problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems.
p	Very few test problems are usually considered. This serves the purpose of demonstrating feasibility, but says little about general performance.
p	Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which make direct comparisons difficult. An example is a test problem from Kikuchi et al., 2003. Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset.
p	To a considerable degree, the lack of benchmark problems is caused by a more fundamental underlying difficulty in how to specify such identification problems mathematically, and how to conveniently represent them in files. This is much more complex when both structure and parameters are unknown, compared with the well known case of parameter estimation in a given structure.
p	In particular, it is not sufficient to give only data from one or several experiments, or a source model from which such data can be simulated, see Figure 1A. In addition to the data, a model space must be defined, to specify the possible forms of the right-hand side of the ODEs and appropriate parameter ranges. Clearly, if we allow few possible reactions and narrow parameter ranges we obtain a simpler identification problem, and if we allow many possible reactions and wide parameter ranges we obtain a more difficult problem. Furthermore, known prior information about parts of the model may be given. Finally, we must define an objective function that also includes some notion of model complexity. This is needed since we are searching among different structures, and a simple maximum likelihood criterion is not sufficient and will lead to over-fitting. Together, this defines the identification problem mathematically as an optimization problem, the solution of which is a model. An illustration of an identification problem is shown in Figure 1B, and the solution model is shown in Figure 1C. Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model.
figure	Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model.
label	Fig. 1.
caption	Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model.
p	Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L=likelihood function, λ K=structural complexity term (λ=constant and K=number of model parameters) and an initial model with no prior information. (C) An example of a solution model.
p	Based on these considerations, we present a collection of more than 40 fully specified and publicly available benchmark problems for ODE system identification. The problems have mainly been collected and translated from recent publications, but we have also included new and carefully designed problems.
p	The collection includes both simple problems suitable for initial testing, and more complex problems with realistic size and quality of data. Most problems are based on simulated data from a known source model, which is important for evaluation purposes. However, several problems are also based on real data.
p	The collection consists of problems of two different kinds. The first kind are problems whose solutions are models based on traditional chemical rate equations like unimolecular mass action and Michaelis–Menten's kinetics. An example of such a system with three variables is shown in Figure 1C. Typically for systems of this kind, the rate equations are composed as linear combinations of known reaction types that can be non-linear in both the parameters and in the state variables.
p	The other kind of problem is where the solutions are defined on a fixed form, like S-systems which is currently the most popular form for identification. The S-system formalism (Savageau, 1976; Voit, 2000) is based on approximating kinetic laws with multivariate power-law functions. A model consists of n non-linear ODEs and the generic form of equation i reads (1) where X is a vector (length n) of dependent variables, α and β are vectors (length n) of non-negative rate constants and g and h are matrices (n × n) of kinetic orders, that can be negative as well as positive. The structure of the system is defined by the non-zero elements of g and h, and for realistic models most of these parameters are zero.
label	(1)
p	To our knowledge, this is the first article to address how to best represent this kind of identification problems as optimization problems and to define a suitable file format. As far as we know, this is also the first collection of benchmark problems for ODE system identification, where both structure and parameters are unknown.
p	Another paper that recognizes the importance of systematic evaluation of identification algorithms, is Mendes et al. (2003), which presents a system for generating realistic artificial gene networks and simulated data, and a large number of problems generated by this system. In comparison, our benchmark problems are typically smaller and address the more specific need of evaluating algorithms identifying both structure and parameters, fully specifying problems as optimization problems. For parameter estimation only, several benchmark problems are publicly available, e.g. in the data base EASY-FIT (Schittkowski, 2002).
p	The two kinds of systems we consider, chemical rate equations and S-systems, have previously been used for identification of ODE models as indicated by references in Table 1. In addition to these kinds of systems, other fixed forms have been considered in the literature. One example is Generalized Mass Action (GMA) models (Tucker et al., 2007; Voit, 2000), and another is linear dynamical models with non-linear transfer functions of sigmoid type (D'haeseleer, 2000; Nelander et al., 2008; Wahde and Hertz, 2000). However, at this stage, we have chosen to focus on the more frequently used S-systems as one representative fixed form system.
p	We finally note that identification of ODEs where more basic operands and operators are used as structural components have been considered (Bongard and Lipson, 2007).
sec	2 SPECIFYING IDENTIFICATION PROBLEMS AS OPTIMIZATION PROBLEMS We now outline how we define identification problems mathematically. Our first and most important concern has been that an identification problem must be defined unambiguously as an optimization problem. This ensures that a problem has a well defined solution independently of any method for solving it, and that the task of modelling identification problems is separated from the algorithmic task of solving given problems. It is to be noted that if we fail to explicitly specify the entire problem, missing parts have to be supplied by the user or be implicitly defined by the algorithm, resulting in confusion and lack of reproducibility. Our second concern has been to find a reasonably simple standard form that can still represent a wide range of identification problems. We have then chosen the following way to structure and specify our problems. Data. Time series data for one or several experiments. In the case of several experiments, they may differ with respect to initial values of the variables and/or input functions. For each data-point, the standard deviation is also given in the problem specification. Model space. The model space determines the allowed form of the right-hand side of the ODEs. For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a subset of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reactions types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation k1 Xi Xj, it is implicitly assumed that i and j are not equal (to consider equality as in problem osc1 and osc2, we define an additional reaction type k1 Xi2). The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type {gi,j∈[−3, 3], gi,j≠0} (there is an interaction between variable i and j but the direction is unknown). Finally, we also define lower and upper bounds for the initial data-point in each time series. For noisy data, these bounds were set to ±2 SDs. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically bound tighter compared with the model parameters. Initial model. It is convenient to allow definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Also reactions from outside the model space can be included in the initial model. In principle, one can also think of prior information in the form of starting points for iterative algorithms and thus not technically a part of the defined problem. No such information is assumed known in our current problems. Error function. We have chosen to minimize (2) The first term is the negative log-likelihood of the experimental data, and the second term is a term that penalizes structural complexity of the model. This kind of error function is common, and is related to several different proposed methods for handling model complexity (Crampin et al., 2004). In detail, L is the log-likelihood, denotes the experimental data, k is a vector of parameters, λ is a constant and K is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms we can express the log-likelihood for one time series as (3) where i indexes the measurement points, and where Xj, and σj denotes simulated data, experimental data and SD for variable j, respectively. The total log-likelihood is defined by summing over all variables and all experiments. For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β. To establish the relationship with standard optimization terminology, the model space and and the initial model define the feasible set, and the data together with the error function define the objective function. 2.1 File format for identification problems In order to work with identification problems and to provide them as input to identification algorithms, we also need to represent such problems in files, and it is highly desirable that an entire problem can be represented in a single file. Since an identification problem is an optimization problem and not a model, common model formats such as SBML are not applicable, and no existing format known to us can be used for this purpose. However, the output of an identification algorithm is a model and can be represented in SBML. Also, if a partially known initial model is available in SBML, an identification algorithm can input that part in SBML and the rest of the identification problem in our format. The file format we have designed for the identification problems can be seen as a special case of a more general format that is currently developed as a separate project. The format attempts to be self documenting and easy to read both by the human and the computer. Compared with a typical XML-equivalent, more structure is explicit and it is easier to read. It bears some resemblance to how data structures are built in a computer programme, but without the explicit use of pointers. An extract of the format is given below: This particular extract describes first some variables and then a possible reaction in the model space. Finally, a sample from the first experiment is given. The format can also be used to specify a parameter estimation problem only. Method specific parameters (like random seed) can be defined in the same file or separately. Finally, we can easily extend the format to describe new classes of problems, for example with compartment modelling and system modifications like gene deletions. We anticipate that the exact mathematical form of the identification problems, as well as the file format, will be extended over time. Up-to-date information and detailed documentation is therefore available on the web site, as well as a simple parser for the file format.
title	2 SPECIFYING IDENTIFICATION PROBLEMS AS OPTIMIZATION PROBLEMS
p	We now outline how we define identification problems mathematically. Our first and most important concern has been that an identification problem must be defined unambiguously as an optimization problem. This ensures that a problem has a well defined solution independently of any method for solving it, and that the task of modelling identification problems is separated from the algorithmic task of solving given problems. It is to be noted that if we fail to explicitly specify the entire problem, missing parts have to be supplied by the user or be implicitly defined by the algorithm, resulting in confusion and lack of reproducibility.
p	Our second concern has been to find a reasonably simple standard form that can still represent a wide range of identification problems. We have then chosen the following way to structure and specify our problems. Data. Time series data for one or several experiments. In the case of several experiments, they may differ with respect to initial values of the variables and/or input functions. For each data-point, the standard deviation is also given in the problem specification. Model space. The model space determines the allowed form of the right-hand side of the ODEs. For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a subset of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reactions types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation k1 Xi Xj, it is implicitly assumed that i and j are not equal (to consider equality as in problem osc1 and osc2, we define an additional reaction type k1 Xi2). The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type {gi,j∈[−3, 3], gi,j≠0} (there is an interaction between variable i and j but the direction is unknown). Finally, we also define lower and upper bounds for the initial data-point in each time series. For noisy data, these bounds were set to ±2 SDs. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically bound tighter compared with the model parameters. Initial model. It is convenient to allow definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Also reactions from outside the model space can be included in the initial model. In principle, one can also think of prior information in the form of starting points for iterative algorithms and thus not technically a part of the defined problem. No such information is assumed known in our current problems. Error function. We have chosen to minimize (2) The first term is the negative log-likelihood of the experimental data, and the second term is a term that penalizes structural complexity of the model. This kind of error function is common, and is related to several different proposed methods for handling model complexity (Crampin et al., 2004). In detail, L is the log-likelihood, denotes the experimental data, k is a vector of parameters, λ is a constant and K is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms we can express the log-likelihood for one time series as (3) where i indexes the measurement points, and where Xj, and σj denotes simulated data, experimental data and SD for variable j, respectively. The total log-likelihood is defined by summing over all variables and all experiments. For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β.
p	Data. Time series data for one or several experiments. In the case of several experiments, they may differ with respect to initial values of the variables and/or input functions. For each data-point, the standard deviation is also given in the problem specification.
p	Model space. The model space determines the allowed form of the right-hand side of the ODEs.
p	For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a subset of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reactions types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation k1 Xi Xj, it is implicitly assumed that i and j are not equal (to consider equality as in problem osc1 and osc2, we define an additional reaction type k1 Xi2).
p	The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type {gi,j∈[−3, 3], gi,j≠0} (there is an interaction between variable i and j but the direction is unknown).
p	Finally, we also define lower and upper bounds for the initial data-point in each time series. For noisy data, these bounds were set to ±2 SDs. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically bound tighter compared with the model parameters.
p	Initial model. It is convenient to allow definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Also reactions from outside the model space can be included in the initial model.
p	In principle, one can also think of prior information in the form of starting points for iterative algorithms and thus not technically a part of the defined problem. No such information is assumed known in our current problems.
p	Error function. We have chosen to minimize (2)
label	(2)
p	The first term is the negative log-likelihood of the experimental data, and the second term is a term that penalizes structural complexity of the model. This kind of error function is common, and is related to several different proposed methods for handling model complexity (Crampin et al., 2004).
p	In detail, L is the log-likelihood, denotes the experimental data, k is a vector of parameters, λ is a constant and K is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms we can express the log-likelihood for one time series as (3) where i indexes the measurement points, and where Xj, and σj denotes simulated data, experimental data and SD for variable j, respectively. The total log-likelihood is defined by summing over all variables and all experiments.
label	(3)
p	For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β.
p	To establish the relationship with standard optimization terminology, the model space and and the initial model define the feasible set, and the data together with the error function define the objective function.
sec	2.1 File format for identification problems In order to work with identification problems and to provide them as input to identification algorithms, we also need to represent such problems in files, and it is highly desirable that an entire problem can be represented in a single file. Since an identification problem is an optimization problem and not a model, common model formats such as SBML are not applicable, and no existing format known to us can be used for this purpose. However, the output of an identification algorithm is a model and can be represented in SBML. Also, if a partially known initial model is available in SBML, an identification algorithm can input that part in SBML and the rest of the identification problem in our format. The file format we have designed for the identification problems can be seen as a special case of a more general format that is currently developed as a separate project. The format attempts to be self documenting and easy to read both by the human and the computer. Compared with a typical XML-equivalent, more structure is explicit and it is easier to read. It bears some resemblance to how data structures are built in a computer programme, but without the explicit use of pointers. An extract of the format is given below: This particular extract describes first some variables and then a possible reaction in the model space. Finally, a sample from the first experiment is given. The format can also be used to specify a parameter estimation problem only. Method specific parameters (like random seed) can be defined in the same file or separately. Finally, we can easily extend the format to describe new classes of problems, for example with compartment modelling and system modifications like gene deletions. We anticipate that the exact mathematical form of the identification problems, as well as the file format, will be extended over time. Up-to-date information and detailed documentation is therefore available on the web site, as well as a simple parser for the file format.
title	2.1 File format for identification problems
p	In order to work with identification problems and to provide them as input to identification algorithms, we also need to represent such problems in files, and it is highly desirable that an entire problem can be represented in a single file. Since an identification problem is an optimization problem and not a model, common model formats such as SBML are not applicable, and no existing format known to us can be used for this purpose. However, the output of an identification algorithm is a model and can be represented in SBML. Also, if a partially known initial model is available in SBML, an identification algorithm can input that part in SBML and the rest of the identification problem in our format.
p	The file format we have designed for the identification problems can be seen as a special case of a more general format that is currently developed as a separate project. The format attempts to be self documenting and easy to read both by the human and the computer. Compared with a typical XML-equivalent, more structure is explicit and it is easier to read. It bears some resemblance to how data structures are built in a computer programme, but without the explicit use of pointers. An extract of the format is given below:
p	This particular extract describes first some variables and then a possible reaction in the model space. Finally, a sample from the first experiment is given.
p	The format can also be used to specify a parameter estimation problem only. Method specific parameters (like random seed) can be defined in the same file or separately. Finally, we can easily extend the format to describe new classes of problems, for example with compartment modelling and system modifications like gene deletions.
p	We anticipate that the exact mathematical form of the identification problems, as well as the file format, will be extended over time. Up-to-date information and detailed documentation is therefore available on the web site, as well as a simple parser for the file format.
sec	3 THE COLLECTION OF BENCHMARK PROBLEMS An overview of the collection of benchmark problems is presented in Table 1. The problems have different sources, and in most cases several different problems are given for each system. The number of variables in combination with the amount of data and noise level, give a rough indication of the difficulty of each problem. The first problem for each system generally has a lower level of difficulty than the subsequent problems, and is hence a good starting point when evaluating new methods. A part of the collection is based on identification problems considered in the literature, which is dominated by problems based on S-systems, probably because these are relatively easy to represent. By including translations of these problems that were previously not easily accessible, the collection has an immediate value for those actively developing algorithms in this area. However, we balance the S-system dominance by also defining new problems based on traditional chemical reactions that are more frequently used in the biological modelling community. We note that research in this area is in an early stage, and in our experience identification algorithms for S-systems can relatively easily be modified for other kinds of models. Our proposed way of specifying identification problems can then facilitate and contribute to a more widespread use of problems based on other kinds of models. For problems that have been collected and translated from another publication, we have taken care to match the original problem as closely as available information and the standardization to our formalism allow, see the Supplementary Material for details. We have replaced some non-standard objective functions with our simple function, and relaxed a few unusual constraints. Noisy data have been simulated from random distributions, and in one case we have also reduced an excessive and unrealistic number of data-points in the original problem. However, the solutions in the next section confirm that the changes have not negatively affected the quality of the solutions when compared directly with the source models. We note that our problems never have more data or a smaller model space, so they are in this sense at least as difficult as the original problems. 3.1 Best known solutions In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site. Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs. For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made. For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal. Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material. For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems. In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values. Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems. Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor.
title	3 THE COLLECTION OF BENCHMARK PROBLEMS
p	An overview of the collection of benchmark problems is presented in Table 1. The problems have different sources, and in most cases several different problems are given for each system.
p	The number of variables in combination with the amount of data and noise level, give a rough indication of the difficulty of each problem. The first problem for each system generally has a lower level of difficulty than the subsequent problems, and is hence a good starting point when evaluating new methods.
p	A part of the collection is based on identification problems considered in the literature, which is dominated by problems based on S-systems, probably because these are relatively easy to represent. By including translations of these problems that were previously not easily accessible, the collection has an immediate value for those actively developing algorithms in this area. However, we balance the S-system dominance by also defining new problems based on traditional chemical reactions that are more frequently used in the biological modelling community. We note that research in this area is in an early stage, and in our experience identification algorithms for S-systems can relatively easily be modified for other kinds of models. Our proposed way of specifying identification problems can then facilitate and contribute to a more widespread use of problems based on other kinds of models.
p	For problems that have been collected and translated from another publication, we have taken care to match the original problem as closely as available information and the standardization to our formalism allow, see the Supplementary Material for details. We have replaced some non-standard objective functions with our simple function, and relaxed a few unusual constraints. Noisy data have been simulated from random distributions, and in one case we have also reduced an excessive and unrealistic number of data-points in the original problem. However, the solutions in the next section confirm that the changes have not negatively affected the quality of the solutions when compared directly with the source models. We note that our problems never have more data or a smaller model space, so they are in this sense at least as difficult as the original problems.
sec	3.1 Best known solutions In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site. Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs. For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made. For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal. Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material. For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems. In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values. Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems. Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor.
title	3.1 Best known solutions
p	In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site. Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
table-wrap	Table 2. Best known solutions to the benchmark problems Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440 aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
label	Table 2.
caption	Best known solutions to the benchmark problems
p	Best known solutions to the benchmark problems
table	Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available) Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz) simpleLin1 8 8 0 0/0/0 5/5 17 simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86 simpleFb1 7 7 0 0/0/0 5/5 14 simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36 simpleFb3 7 7 0 0/0/0 3/5 41 simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– – osc1 6 6.149 0.1491 0/0/0 5/5 18 osc2 122.1 87.34 57.34 0/0/0 4/5 24 – – metabol1 30 30 0 0/0/0 3/3 5400 metabol2 1214 751.6 601.6 0/0/– 2/3 11 000 metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000 3genes1 39 39 0 0/0/0 3/3 35 000 ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900 ss_cascade2 14 14 0 0/0/0 4/5 130 ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860 ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200 ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000 ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000 ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000 ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – – ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100 ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8 ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100 ss_5genes3 23 23 0 0/0/0 5/5 380 ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000 ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200 ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000 ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000 ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000 0/0/2.5 7000 ss_15genes1 62 62 0 0/0/0 5/5 1500 ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000 ss_30genes1 128 128 0 0/0/0 5/5 7900 ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6 ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6 cytokine1 25.92 17.92 1/5c 11 – – cytokine2 42.17 32.17 1/5c 23 ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d – ss_ethanolferm2 1308 1292 1/5c 360 ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5 ss_sosrepair2 2823 2789 1/5c 470 ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540 ss_cadBA2 709.1 687.1 1/5c 260 ss_clock1 928.4 803.4 1/5c 360 – 15 000 ss_clock2 814.5 649.5 1/5c 440
tr	Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available)
th	Problem
th	Error of source model −L+λK
th	Best known solutiona
th	Solution of original problem (if available)
tr	Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz)
th	Error −L+λK
th	Residual −L
th	Structure FP/FN/LD
th	Stability between runs
th	Average time (s) (1 GHz)
th	Structure FP/FN/LD
th	Time (s) (1 GHz)
tr	simpleLin1 8 8 0 0/0/0 5/5 17
td	simpleLin1
td	8
td	8
td	0
td	0/0/0
td	5/5
td	17
tr	simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86
td	simpleLin2
td	187.4
td	140.2
td	132.2
td	0/0/–
td	5/5
td	86
tr	simpleFb1 7 7 0 0/0/0 5/5 14
td	simpleFb1
td	7
td	7
td	0
td	0/0/0
td	5/5
td	14
tr	simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36
td	simpleFb2
td	45.46
td	42.14
td	34.14
td	1/1/–
td	3/5
td	36
tr	simpleFb3 7 7 0 0/0/0 3/5 41
td	simpleFb3
td	7
td	7
td	0
td	0/0/0
td	3/5
td	41
tr	simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– –
td	simpleFb4
td	18.39
td	7.071
td	0.0713
td	0/0/–
td	2/5
td	47
td	0/0/–
td	–
tr	osc1 6 6.149 0.1491 0/0/0 5/5 18
td	osc1
td	6
td	6.149
td	0.1491
td	0/0/0
td	5/5
td	18
tr	osc2 122.1 87.34 57.34 0/0/0 4/5 24 – –
td	osc2
td	122.1
td	87.34
td	57.34
td	0/0/0
td	4/5
td	24
td	–
td	–
tr	metabol1 30 30 0 0/0/0 3/3 5400
td	metabol1
td	30
td	30
td	0
td	0/0/0
td	3/3
td	5400
tr	metabol2 1214 751.6 601.6 0/0/– 2/3 11 000
td	metabol2
td	1214
td	751.6
td	601.6
td	0/0/–
td	2/3
td	11 000
tr	metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000
td	metabol3
td	1442
td	861.1
td	696.1
td	2/1/–
td	1/5b
td	13 000
tr	3genes1 39 39 0 0/0/0 3/3 35 000
td	3genes1
td	39
td	39
td	0
td	0/0/0
td	3/3
td	35 000
tr	ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900
td	ss_cascade1
td	14
td	14
td	0
td	0/0/0
td	5/5
td	180
td	0/0/300
td	8900
tr	ss_cascade2 14 14 0 0/0/0 4/5 130
td	ss_cascade2
td	14
td	14
td	0
td	0/0/0
td	4/5
td	130
tr	ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860
td	ss_cascade3
td	498.9
td	476.7
td	462.7
td	1/1/–
td	1/5b
td	860
tr	ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200
td	ss_branch1
td	17
td	17
td	0
td	0/0/0
td	5/5
td	68
td	0/0/15
td	2200
tr	ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000
td	ss_branch2
td	17
td	17
td	0
td	0/0/0
td	5/5
td	110
td	0/0/5
td	35 000
tr	ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000
td	ss_branch3
td	17
td	17
td	0
td	0/0/0
td	5/5
td	150
td	0/0/2
td	15 000
tr	ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000
td	ss_branch4
td	17
td	17
td	0
td	0/0/0
td	5/5
td	71
td	0/0/0
td	25 000
tr	ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – –
td	ss_branch5
td	211.3
td	142.1
td	122.1
td	4/1/–
td	3/5
td	300
td	–
td	–
tr	ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100
td	ss_branch6
td	18
td	18
td	0
td	0/0/0
td	5/5
td	90
td	0/0/5
td	1100
tr	ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8
td	ss_5genes1
td	23
td	23
td	0
td	0/0/0
td	5/5
td	600
td	1/0/–
td	2.4E+8
tr	ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100
td	ss_5genes2
td	300.3
td	211.0
td	183.0
td	5/0/–
td	2/5b
td	2100
tr	ss_5genes3 23 23 0 0/0/0 5/5 380
td	ss_5genes3
td	23
td	23
td	0
td	0/0/0
td	5/5
td	380
tr	ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000
td	ss_5genes4
td	23
td	23
td	0
td	0/0/0
td	5/5
td	640
td	37/0/–
td	40 000
tr	ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200
td	ss_5genes5
td	23
td	23.00
td	1.14E−3
td	0/0/0.02
td	5/5
td	400
td	4/2/–
td	7200
tr	ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000
td	ss_5genes6
td	23
td	23
td	0
td	0/0/0
td	5/5
td	130
td	0/0/50
td	56 000
tr	ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000
td	ss_5genes7
td	23
td	23
td	0
td	0/0/0
td	5/5
td	190
td	0/0/0.2
td	15 000
tr	ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000
td	ss_5genes8
td	28
td	28
td	0
td	0/0/0
td	5/5
td	440
td	0/0/5
td	26 000
tr	0/0/2.5 7000
td	0/0/2.5
td	7000
tr	ss_15genes1 62 62 0 0/0/0 5/5 1500
td	ss_15genes1
td	62
td	62
td	0
td	0/0/0
td	5/5
td	1500
tr	ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000
td	ss_15genes2
td	2010
td	1783
td	1478
td	3/4/–
td	2/5b
td	18 000
tr	ss_30genes1 128 128 0 0/0/0 5/5 7900
td	ss_30genes1
td	128
td	128
td	0
td	0/0/0
td	5/5
td	7900
tr	ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6
td	ss_30genes2
td	4098
td	3628
td	2993
td	6/7/–
td	1/3b
td	94 000
td	242/10/–
td	6.2E+6
tr	ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6
td	ss_30genes3
td	128
td	128
td	0
td	0/0/0
td	3/3
td	18 000
td	0/0/25
td	8.6E+6
tr	cytokine1 25.92 17.92 1/5c 11 – –
td	cytokine1
td	25.92
td	17.92
td	1/5c
td	11
td	–
td	–
tr	cytokine2 42.17 32.17 1/5c 23
td	cytokine2
td	42.17
td	32.17
td	1/5c
td	23
tr	ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d –
td	ss_ethanolferm1
td	127.4
td	110.4
td	1/5c
td	190
td	err>127.4d
td	–
tr	ss_ethanolferm2 1308 1292 1/5c 360
td	ss_ethanolferm2
td	1308
td	1292
td	1/5c
td	360
tr	ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5
td	ss_sosrepair1
td	2642
td	2611
td	1/5c
td	510
td	err>2642d
td	2.7E+5
tr	ss_sosrepair2 2823 2789 1/5c 470
td	ss_sosrepair2
td	2823
td	2789
td	1/5c
td	470
tr	ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540
td	ss_cadBA1
td	750.6
td	726.6
td	1/5c
td	250
td	err>750.6d
td	>540
tr	ss_cadBA2 709.1 687.1 1/5c 260
td	ss_cadBA2
td	709.1
td	687.1
td	1/5c
td	260
tr	ss_clock1 928.4 803.4 1/5c 360 – 15 000
td	ss_clock1
td	928.4
td	803.4
td	1/5c
td	360
td	–
td	15 000
tr	ss_clock2 814.5 649.5 1/5c 440
td	ss_clock2
td	814.5
td	649.5
td	1/5c
td	440
table-wrap-foot	aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed. b The majority of runs have an error below the error of the source model, but differ slightly in structure. cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions. dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution. The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
footnote	aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed.
p	aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed.
footnote	b The majority of runs have an error below the error of the source model, but differ slightly in structure.
p	b The majority of runs have an error below the error of the source model, but differ slightly in structure.
footnote	cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions.
p	cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions.
footnote	dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution.
p	dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution.
footnote	The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
p	The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
p	For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made.
p	For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal.
p	Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material.
p	For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems.
p	In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values. Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems. Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor.
p	Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values.
p	Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems.
p	Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor.
sec	4 CONCLUSIONS We have proposed a way to unambiguously specify ODE identification problems as mathematical optimization problems, and have also defined a file format making it easy to document and exchange such problems. The benchmark problems include translations of problems that have been frequently used in the development of algorithms, as well as new problems based on chemical rate equations. The problems, their best solutions and the source models from which problem data was simulated are all available on our web site. This makes it easy for everyone to evaluate and compare algorithms in a reproducible way. To maintain a useful collection of problems of various types, we intend to extend the collection, and are interested in contributions and feedback from others, see the web site. Finally, our own solutions show that all the benchmark problems can be solved in reasonable time, in contrast to results for some of the original problems where supercomputing was required.
title	4 CONCLUSIONS
p	We have proposed a way to unambiguously specify ODE identification problems as mathematical optimization problems, and have also defined a file format making it easy to document and exchange such problems.
p	The benchmark problems include translations of problems that have been frequently used in the development of algorithms, as well as new problems based on chemical rate equations. The problems, their best solutions and the source models from which problem data was simulated are all available on our web site. This makes it easy for everyone to evaluate and compare algorithms in a reproducible way. To maintain a useful collection of problems of various types, we intend to extend the collection, and are interested in contributions and feedback from others, see the web site.
p	Finally, our own solutions show that all the benchmark problems can be solved in reasonable time, in contrast to results for some of the original problems where supercomputing was required.
sec	Supplementary Material [Supplementary Data]
title	Supplementary Material
caption	[Supplementary Data]
title	[Supplementary Data]
back	ACKNOWLEDGEMENTS Thanks to Sven Nelander (MSKCC) and the reviewers for valuable comments. Funding: Chalmers Bioscience Program and Göteborg Mathematical Modelling Centre. Conflict of Interest: none declared.
ack	ACKNOWLEDGEMENTS Thanks to Sven Nelander (MSKCC) and the reviewers for valuable comments. Funding: Chalmers Bioscience Program and Göteborg Mathematical Modelling Centre. Conflict of Interest: none declared.
title	ACKNOWLEDGEMENTS
p	Thanks to Sven Nelander (MSKCC) and the reviewers for valuable comments.
p	Funding: Chalmers Bioscience Program and Göteborg Mathematical Modelling Centre.
p	Conflict of Interest: none declared.

projects that have annotations to this span

There is no project

TAB JSON ListView MergeView

PMC:2654804 / 23267-23271 JSONTXT

Document structure show

projects that have annotations to this span

PMC:2654804 / 23267-23271 JSON TXT