In our application of multiple linear regression, the response variable Y is the performance of the tools on a dataset, measured by the highest nucleotide-level correlation coefficient (nCC; see [9]) achieved by any of the tools on that dataset. We use the highest score to smooth out the weaknesses of individual tools. The predictor variables are features of a dataset that we consider possible factors in this performance. These features include: