For systems of realistic size and with data of realistic quantity and quality, this is a hard problem, and heuristic algorithms are therefore typically employed. Such algorithms cannot guarantee that they give the best answer to a given problem. An established way to evaluate the performance of heuristic algorithms is to try them on a sufficient number of realistic test cases. However, the lack of well-defined and commonly accepted benchmark problems makes it difficult to evaluate and compare different methods: problems are often incompletely and/or ambiguously specified, and personal communication is required to reconstruct essential aspects of the problems. Usually only very few test problems are considered; this serves the purpose of demonstrating feasibility, but says little about general performance. Even when problems originate from the same system, different papers typically consider slight modifications of the original problem, which makes direct comparisons difficult. An example is a test problem from Kikuchi et al. (2003). Several recent publications (Cho et al., 2006; Daisuke and Horton, 2006; Gennemark and Wedelin, 2007; Kimura et al., 2005; Liu and Wang, 2008; Tsai and Wang, 2005; Tucker and Moulton, 2006) consider the same system, but usually with a more informative dataset.