Comparison with existing prediction methods

Table 3 shows the predictive results of SVM-PSSM-DT and four other state-of-the-art methods on the benchmark dataset through jackknife validation: DNAbinder(dimension 21) [21], DNAbinder(dimension 400) [21], DNA-Prot [74] and iDNA-Prot [16]. DNAbinder(dimension 21) and DNAbinder(dimension 400) encode features from PSSM-based evolutionary information and use an SVM to build the prediction model. iDNA-Prot applies the grey model to integrate features from the protein sequence into the general form of pseudo amino acid composition, which is then fed into a Random Forest classifier. DNA-Prot is a Random Forest classifier based on amino acid composition, predicted secondary structure and some physicochemical properties. The ROC curves of the proposed method and the four existing methods are shown in Figure 4.

Table 3 Results on benchmark dataset of different predictors through jackknife validation.

Method                        ACC (%)   MCC     SN (%)   SP (%)   AUC (%)
DNAbinder(dimension 21)^a     73.95     0.480   68.57    79.09    81.40
DNAbinder(dimension 400)^b    73.58     0.470   66.47    80.36    81.50
DNA-Prot^c                    72.55     0.440   82.67    59.76    78.90
iDNA-Prot^d                   75.40     0.500   83.81    64.73    76.10
PSSM-DT^f                     79.96     0.622   81.91    78.00    86.50

The first four rows of the table are state-of-the-art methods for the identification of DNA-binding proteins that were proposed previously and demonstrated to have good performance. The results of the four existing methods and of SVM-PSSM-DT were obtained by testing on the benchmark dataset through jackknife validation.
^a results obtained by in-house implementation of DNAbinder [21]
^b results obtained by in-house implementation of DNAbinder [21]
^c results obtained by in-house implementation of DNA-Prot [74]
^d results obtained by in-house implementation of iDNA-Prot [16]
^f results obtained by using PSSM-DT as protein representation

Figure 4 The ROC curves of several predictive methods on benchmark dataset.
The receiver operating characteristic (ROC) curves of SVM-PSSM-DT and several existing DNA-binding protein predictors were obtained by testing the models on the benchmark dataset through jackknife validation. The horizontal axis X is the false positive rate (1-SP) and the vertical axis Y is the true positive rate (SN). A good method yields a curve close to the point (0,1), which corresponds to a low false positive rate and a high true positive rate.

From Table 3 and Figure 4 we can see that SVM-PSSM-DT achieved the best overall performance, with an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%, outperforming the other four methods by 4.56-7.41 percentage points in ACC, 0.12-0.18 in MCC and 5.0-10.4 percentage points in AUC. This indicates that PSSM-DT can improve the predictive performance of DNA-binding protein identification from PSSM-based sequence information.
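The improvement margins quoted in this comparison can be re-derived directly from the values in Table 3. The short Python sketch below is only an illustrative check and is not part of the original study. The names `TABLE3` and `margin_range` are hypothetical helpers introduced here, and the numbers are transcribed from Table 3.

```python
# Illustrative check only: TABLE3 and margin_range are hypothetical names,
# not part of any published tool. Values are transcribed from Table 3
# (jackknife validation on the benchmark dataset).
TABLE3 = {
    "DNAbinder(dimension 21)":  {"ACC": 73.95, "MCC": 0.480, "AUC": 81.40},
    "DNAbinder(dimension 400)": {"ACC": 73.58, "MCC": 0.470, "AUC": 81.50},
    "DNA-Prot":                 {"ACC": 72.55, "MCC": 0.440, "AUC": 78.90},
    "iDNA-Prot":                {"ACC": 75.40, "MCC": 0.500, "AUC": 76.10},
    "PSSM-DT":                  {"ACC": 79.96, "MCC": 0.622, "AUC": 86.50},
}


def margin_range(table, proposed, metric):
    """Return (smallest, largest) gain of `proposed` over every other method."""
    gains = [
        table[proposed][metric] - row[metric]
        for name, row in table.items()
        if name != proposed
    ]
    # Round to three decimals to suppress floating-point noise.
    return round(min(gains), 3), round(max(gains), 3)


if __name__ == "__main__":
    for metric in ("ACC", "MCC", "AUC"):
        low, high = margin_range(TABLE3, "PSSM-DT", metric)
        print(f"{metric}: +{low} to +{high}")
```

ACC and AUC are reported in percent, so their differences are percentage points, whereas MCC is a unitless coefficient.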