Our second concern has been to find a reasonably simple standard form that can still represent a wide range of identification problems. We have therefore chosen the following way to structure and specify our problems.
Data. Time series data for one or several experiments. Multiple experiments may differ in the initial values of the variables and/or in the input functions. For each data point, the standard deviation is also given in the problem specification.
Model space. The model space determines the allowed form of the right-hand side of the ODEs.
For models based on traditional chemical rate equations, each ODE in the model is assumed to be a sum of a number of reactions. The possible reactions must belong to a set of predefined reaction types, where each allowed reaction type is specified by its name, a subset of possible input variables and ranges of allowed parameter values. The allowed reaction types can be specified individually for each state variable. As an example, in Figure 1B, the allowed reaction types are a unimolecular mass action reaction, a Michaelis–Menten reaction, and a simplified Hill equation. For reactions having multiple input variables, e.g. a bimolecular mass action reaction with equation $k_1 X_i X_j$, it is implicitly assumed that $i$ and $j$ are not equal (to allow equality, as in problems osc1 and osc2, we define an additional reaction type $k_1 X_i^2$).
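A reaction-type specification of this kind could be represented as follows. This is only a minimal sketch: the class `ReactionType`, the function `candidate_reactions`, the field names and the numeric parameter bounds are all hypothetical, and it assumes that the order of the input variables does not matter (as for mass action terms).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ReactionType:
    """Hypothetical container for one allowed reaction type."""
    name: str             # e.g. "bimolecular_mass_action"
    n_inputs: int         # number of distinct input variables
    allowed_inputs: tuple  # indices of variables that may act as inputs
    param_bounds: tuple    # one (lower, upper) pair per parameter


def candidate_reactions(rtype):
    """List every choice of distinct input variables for one reaction type."""
    # combinations() never repeats an index, which encodes the i != j rule.
    return [(rtype.name, inputs)
            for inputs in combinations(rtype.allowed_inputs, rtype.n_inputs)]


# Illustrative bounds only; the real ranges come from the problem specification.
bimolecular = ReactionType("bimolecular_mass_action", 2, (0, 1, 2), ((0.0, 10.0),))
# The squared term k1 * Xi**2 is a separate one-input type, as in the text.
squared = ReactionType("squared_mass_action", 1, (0, 1, 2), ((0.0, 10.0),))

for rtype in (bimolecular, squared):
    print(candidate_reactions(rtype))
```

Treating the squared term as its own reaction type keeps the "distinct inputs" rule simple for every other type.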
The model space of an S-system is simply defined by lower and upper bounds for each element in the parameter vectors (α and β) and matrices (g and h). Sometimes, additional constraints are required. For example, three of the benchmark problems include an additional constraint of type $\{g_{i,j} \in [-3, 3],\ g_{i,j} \neq 0\}$ (there is an interaction between variables $i$ and $j$ but the direction is unknown).
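To make the role of α, β, g and h concrete, the sketch below assumes the usual power-law form of an S-system, $dX_i/dt = \alpha_i \prod_j X_j^{g_{i,j}} - \beta_i \prod_j X_j^{h_{i,j}}$, which is not written out in this section. The function names `s_system_rhs` and `in_model_space` and all numeric values are hypothetical.

```python
import numpy as np


def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system in the assumed power-law form."""
    x = np.asarray(x, dtype=float)
    # (x ** g)[i, j] == x[j] ** g[i, j]; the product runs over j.
    production = alpha * np.prod(x ** g, axis=1)
    degradation = beta * np.prod(x ** h, axis=1)
    return production - degradation


def in_model_space(values, lower, upper, nonzero_mask=None):
    """Check box bounds and, optionally, a 'must be non-zero' constraint."""
    values = np.asarray(values, dtype=float)
    ok = np.all((values >= lower) & (values <= upper))
    if nonzero_mask is not None:
        ok = ok and np.all(values[nonzero_mask] != 0)
    return bool(ok)


# Illustrative two-variable system.
alpha = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, 1.0], [-1.0, 0.0]])
h = np.array([[1.0, 0.0], [0.0, 1.0]])

print(s_system_rhs([2.0, 3.0], alpha, beta, g, h))

# Constraint of the type described above: g[0, 1] lies in [-3, 3] and is non-zero.
must_interact = np.array([[False, True], [False, False]])
print(in_model_space(g, -3.0, 3.0, nonzero_mask=must_interact))
```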
Finally, we also define lower and upper bounds for the initial data point in each time series. For noisy data, these bounds were set to ±2 SDs around the measured initial value. Hence, for noisy data there is one additional parameter for each time series, but these parameters are typically more tightly bounded than the model parameters.
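As a small illustration of the ±2 SD rule, the following sketch computes the bounds for one initial data point. The function name `initial_value_bounds` and the example numbers are hypothetical, and the sketch assumes the interval is centred on the measured value.

```python
def initial_value_bounds(first_point, sd, n_sd=2.0):
    """Return (lower, upper) bounds for the initial value of one time series."""
    half_width = n_sd * sd
    return first_point - half_width, first_point + half_width


# Example: measured initial value 4.0 with standard deviation 0.5.
lower, upper = initial_value_bounds(4.0, 0.5)
print(lower, upper)
```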
Initial model. It is convenient to allow the definition of an initial model, corresponding to prior knowledge of the system. The initial model is described as known reactions (terms) on the right-hand side of the ODEs. Reactions from outside the model space can also be included in the initial model.
In principle, prior information could also take the form of starting points for iterative algorithms, which would not technically be part of the defined problem. No such information is assumed to be known in our current problems.
Error function. We have chosen to minimize
$$-L(\hat{X} \mid k) + \lambda K. \tag{2}$$
The first term is the negative log-likelihood of the experimental data, and the second term penalizes the structural complexity of the model. This kind of error function is common and is related to several different proposed methods for handling model complexity (Crampin et al., 2004).
In detail, $L$ is the log-likelihood, $\hat{X}$ denotes the experimental data, $k$ is a vector of parameters, $\lambda$ is a constant and $K$ is the number of parameters. By assuming independent and normally distributed measurement errors and disregarding constant terms, we can express the log-likelihood for one time series as
$$L(\hat{X}_j \mid k) = -\frac{1}{2} \sum_i \left( \frac{X_{j,i} - \hat{X}_{j,i}}{\sigma_{j,i}} \right)^2, \tag{3}$$
where $i$ indexes the measurement points, and where $X_j$, $\hat{X}_j$ and $\sigma_j$ denote the simulated data, experimental data and SD for variable $j$, respectively. The total log-likelihood $L(\hat{X} \mid k)$ is defined by summing over all variables and all experiments.
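The log-likelihood for one time series can be sketched numerically as below. The sketch assumes the standard Gaussian form, i.e. minus one half of the sum of squared, SD-scaled residuals with constant terms dropped; the function names `log_likelihood` and `total_log_likelihood` and the example values are hypothetical.

```python
import numpy as np


def log_likelihood(simulated, measured, sd):
    """Gaussian log-likelihood of one time series, constant terms dropped."""
    simulated, measured, sd = (np.asarray(a, dtype=float)
                               for a in (simulated, measured, sd))
    residuals = (simulated - measured) / sd
    # Assumed form: -1/2 * sum of squared standardized residuals.
    return -0.5 * float(np.sum(residuals ** 2))


def total_log_likelihood(series):
    """Sum over all (simulated, measured, sd) triples, i.e. all variables and experiments."""
    return sum(log_likelihood(sim, meas, sd) for sim, meas, sd in series)


# One variable measured at three time points.
one_series = ([1.0, 2.0, 3.0], [1.0, 1.0, 5.0], [1.0, 0.5, 2.0])
print(log_likelihood(*one_series))
print(total_log_likelihood([one_series, one_series]))
```

Because each data point carries its own standard deviation, precise measurements contribute more strongly to the fit than noisy ones.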
For models based on chemical rate equations, K is simply the total number of parameters on the right-hand side of the ODEs. For S-systems, it is natural to define K as the total number of non-zero elements in g and h plus the number of parameters in α and β.
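The complexity count for an S-system and the resulting penalized error can be sketched as follows. The sketch assumes that the penalty enters as the product of λ and K added to the negative log-likelihood; the function names `s_system_complexity` and `penalized_error` and all numeric values are hypothetical.

```python
import numpy as np


def s_system_complexity(alpha, beta, g, h):
    """K for an S-system: non-zero entries of g and h plus all of alpha and beta."""
    return int(np.count_nonzero(g) + np.count_nonzero(h)
               + np.size(alpha) + np.size(beta))


def penalized_error(log_lik, n_params, lam):
    """Negative log-likelihood plus the assumed complexity penalty lam * K."""
    return -log_lik + lam * n_params


# Illustrative two-variable S-system.
alpha = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, 1.0], [-1.0, 0.0]])
h = np.array([[1.0, 0.0], [0.0, 1.0]])

K = s_system_complexity(alpha, beta, g, h)
print(K)
print(penalized_error(log_lik=-2.5, n_params=K, lam=0.5))
```

Under this count, setting an element of g or h to zero removes an interaction and lowers the penalty, so sparser network structures are favoured when the fit is comparable.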