3 THE COLLECTION OF BENCHMARK PROBLEMS
An overview of the collection of benchmark problems is presented in Table 1. The problems have different sources, and in most cases several different problems are given for each system.
The number of variables in combination with the amount of data and noise level, give a rough indication of the difficulty of each problem. The first problem for each system generally has a lower level of difficulty than the subsequent problems, and is hence a good starting point when evaluating new methods.
A part of the collection is based on identification problems considered in the literature, which is dominated by problems based on S-systems, probably because these are relatively easy to represent. By including translations of these problems that were previously not easily accessible, the collection has an immediate value for those actively developing algorithms in this area. However, we balance the S-system dominance by also defining new problems based on traditional chemical reactions that are more frequently used in the biological modelling community. We note that research in this area is in an early stage, and in our experience identification algorithms for S-systems can relatively easily be modified for other kinds of models. Our proposed way of specifying identification problems can then facilitate and contribute to a more widespread use of problems based on other kinds of models.
For problems that have been collected and translated from another publication, we have taken care to match the original problem as closely as available information and the standardization to our formalism allow, see the Supplementary Material for details. We have replaced some non-standard objective functions with our simple function, and relaxed a few unusual constraints. Noisy data have been simulated from random distributions, and in one case we have also reduced an excessive and unrealistic number of data-points in the original problem. However, the solutions in the next section confirm that the changes have not negatively affected the quality of the solutions when compared directly with the source models. We note that our problems never have more data or a smaller model space, so they are in this sense at least as difficult as the original problems.

3.1 Best known solutions
In Table 2 we report the best known solutions for all benchmark problems, and we will now describe the detailed information in the table. The solutions have been found using our own identification algorithm (Gennemark and Wedelin, 2007). All solution models as well as source models are available in SBML on our web site.
Table 2. Best known solutions to the benchmark problems
Problem Error of source model −L+λK Best known solutiona Solution of original problem (if available)
Error −L+λK Residual −L Structure FP/FN/LD Stability between runs Average time (s) (1 GHz) Structure FP/FN/LD Time (s) (1 GHz)
simpleLin1 8 8 0 0/0/0 5/5 17
simpleLin2 187.4 140.2 132.2 0/0/– 5/5 86
simpleFb1 7 7 0 0/0/0 5/5 14
simpleFb2 45.46 42.14 34.14 1/1/– 3/5 36
simpleFb3 7 7 0 0/0/0 3/5 41
simpleFb4 18.39 7.071 0.0713 0/0/– 2/5 47 0/0/– –
osc1 6 6.149 0.1491 0/0/0 5/5 18
osc2 122.1 87.34 57.34 0/0/0 4/5 24 – –
metabol1 30 30 0 0/0/0 3/3 5400
metabol2 1214 751.6 601.6 0/0/– 2/3 11 000
metabol3 1442 861.1 696.1 2/1/– 1/5b 13 000
3genes1 39 39 0 0/0/0 3/3 35 000
ss_cascade1 14 14 0 0/0/0 5/5 180 0/0/300 8900
ss_cascade2 14 14 0 0/0/0 4/5 130
ss_cascade3 498.9 476.7 462.7 1/1/– 1/5b 860
ss_branch1 17 17 0 0/0/0 5/5 68 0/0/15 2200
ss_branch2 17 17 0 0/0/0 5/5 110 0/0/5 35 000
ss_branch3 17 17 0 0/0/0 5/5 150 0/0/2 15 000
ss_branch4 17 17 0 0/0/0 5/5 71 0/0/0 25 000
ss_branch5 211.3 142.1 122.1 4/1/– 3/5 300 – –
ss_branch6 18 18 0 0/0/0 5/5 90 0/0/5 1100
ss_5genes1 23 23 0 0/0/0 5/5 600 1/0/– 2.4E+8
ss_5genes2 300.3 211.0 183.0 5/0/– 2/5b 2100
ss_5genes3 23 23 0 0/0/0 5/5 380
ss_5genes4 23 23 0 0/0/0 5/5 640 37/0/– 40 000
ss_5genes5 23 23.00 1.14E−3 0/0/0.02 5/5 400 4/2/– 7200
ss_5genes6 23 23 0 0/0/0 5/5 130 0/0/50 56 000
ss_5genes7 23 23 0 0/0/0 5/5 190 0/0/0.2 15 000
ss_5genes8 28 28 0 0/0/0 5/5 440 0/0/5 26 000
0/0/2.5 7000
ss_15genes1 62 62 0 0/0/0 5/5 1500
ss_15genes2 2010 1783 1478 3/4/– 2/5b 18 000
ss_30genes1 128 128 0 0/0/0 5/5 7900
ss_30genes2 4098 3628 2993 6/7/– 1/3b 94 000 242/10/– 6.2E+6
ss_30genes3 128 128 0 0/0/0 3/3 18 000 0/0/25 8.6E+6
cytokine1 25.92 17.92 1/5c 11 – –
cytokine2 42.17 32.17 1/5c 23
ss_ethanolferm1 127.4 110.4 1/5c 190 err>127.4d –
ss_ethanolferm2 1308 1292 1/5c 360
ss_sosrepair1 2642 2611 1/5c 510 err>2642d 2.7E+5
ss_sosrepair2 2823 2789 1/5c 470
ss_cadBA1 750.6 726.6 1/5c 250 err>750.6d >540
ss_cadBA2 709.1 687.1 1/5c 260
ss_clock1 928.4 803.4 1/5c 360 – 15 000
ss_clock2 814.5 649.5 1/5c 440
aBest known solutions, based on comparsion to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem this is not confirmed.
b The majority of runs have an error below the error of the source model, but differ slightly in structure.
cReal datasets with relatively few experiments and/or data-points making many similar models reasonable, increasing the variability in found solutions.
dSince no source model is available, we have evaluated the original solution with our error function, and it has an error higher than our solution.
The error of the source model refers to the error of the model from which data was simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positives and negative (FP/FN) reactions are reported. Stability measures the frequency of runs for which the best structure is obtained. Computation time in seconds is given as the average of several runs.
For problems with noisy or otherwise insufficient data, an optimal solution has slightly different parameter values and sometimes also different structure than the source model. To clarify this for our solutions, Table 2 gives the error of the source model from which data was simulated. We can see that for all problems based on simulated data the found solutions have approximately equal or lower error than that of the source model. We also give the number of FP and FN interactions, to indicate if the found structure is the same as the source model. If the structure is the same as for the source model, we also give the largest deviation (LD) in percent for any parameter compared with the source model. For the problems based on real data, no source model is available, and no such comparisons can be made.
For our algorithm, as well as for most algorithms in the literature, the results may differ between runs depending on a random seed. With our approach, the main difference between runs is that sometimes the best structure is not found due to the heuristic nature of the search algorithm. However, for two solutions with the same structure, the difference in the parameters is negligible for all tested problems. To indicate the stability of the algorithm, we therefore report the proportion of runs where the best structure was found. Note, however, the fact that solutions are stable does not generally imply that they are optimal.
Computation times are reported as the average computation time for runs with different random seeds, scaled to a 1 GHz processor (the actual runs were performed on a Pentium IV, 2.13 GHz). The times can be improved by adjusting the method parameters for individual problems, but we report the computation time using our standard settings, see the Supplementary Material.
For the benchmark problems that have been adapted from problems in the literature, the final columns report available results for these original problems.
In addition to the numerical details, Table 2 allows us to draw the following qualitative conclusions about the benchmark problems. Every problem with data from a known source model is well formulated in the sense that the solution model is similar to the source model, in terms of the FP/FN/LD values.
Every solution model is at least as similar to the source model as any previously reported solution. This confirms that the translation to our standard format works well, and also that our solutions are indeed the best known solutions to the benchmark problems.
Every problem can be solved within hours on an ordinary computer with our approach, so there is no need for supercomputing. This was not previously established since some previous results for the original problems required running times of several months if run on a single processor.