Independent test

To further compare the predictive performance of SVM-PSSM-DT with other existing methods, we evaluated the proposed method on the independent dataset PDB186. This dataset was recently constructed by Lou et al. [75] to validate prediction quality. It consists of 93 DNA-binding proteins and an equal number of non-DNA-binding proteins selected from the PDB. Because some sequences in the benchmark dataset share high sequence identity with sequences in PDB186, the tool CD-HIT [77] was applied to remove from the benchmark dataset every sequence sharing more than 25% sequence identity with any sequence in the same subset of PDB186, thereby avoiding homology bias. Table 4 lists the predictive results of the proposed method and of several relevant existing methods known to us: iDNA-Prot [16], DNA-Prot [74], DNAbinder [21], DNABIND [34], DNA-Threader [78] and DBPPred [75].

Table 4 Results of different predictors on the independent dataset PDB186ᵃ

| Method | Acc (%) | MCC | Sn (%) | Sp (%) | AUC (%) |
|---|---|---|---|---|---|
| iDNA-Prot | 67.20 | 0.344 | 67.70 | 66.70 | 83.30 |
| DNA-Prot | 61.80 | 0.240 | 69.90 | 53.80 | 79.60 |
| DNAbinder | 60.80 | 0.216 | 57.00 | 64.50 | 60.70 |
| DNABIND | 67.70 | 0.355 | 66.70 | 68.80 | 69.40 |
| DNA-Threader | 59.70 | 0.279 | 23.70 | 95.70 | N/A |
| DBPPred | 76.90 | 0.538 | 79.60 | 74.20 | 79.10 |
| SVM-PSSM-DT | 80.00 | 0.647 | 87.09 | 72.83 | 87.40 |

The first six methods in the table are previously proposed predictors of DNA-binding proteins that have been shown to perform well. The results of these six methods and of SVM-PSSM-DT were obtained on PDB186 using models trained on the benchmark dataset.
ᵃ The results of iDNA-Prot [16], DNA-Prot [74], DNAbinder [21], DNABIND [34], DNA-Threader [78] and DBPPred [75] were taken from [75].

Moreover, to illustrate graphically how SVM-PSSM-DT compares with other existing state-of-the-art predictors, the corresponding ROC curves are plotted in Figure 5.
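The Acc, Sn, Sp and MCC columns of Table 4 follow the standard confusion-matrix definitions. The minimal sketch below computes them; the function name `metrics` is illustrative only. As a consistency check it uses counts for DBPPred (TP = 74, FN = 19, TN = 69, FP = 24) that are an assumption inferred from the reported Sn and Sp with 93 positive and 93 negative test proteins, not counts reported in [75].

```python
import math

def metrics(tp, fn, tn, fp):
    """Return Acc, Sn, Sp (as percentages) and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: fraction of DNA-binding proteins found
    sp = tn / (tn + fp)                      # specificity: fraction of non-DNA-binding proteins rejected
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Acc": 100 * acc, "Sn": 100 * sn, "Sp": 100 * sp, "MCC": mcc}

# Hypothetical DBPPred counts, inferred (not reported) from Sn = 79.60% and Sp = 74.20%
# assuming 93 DNA-binding and 93 non-DNA-binding proteins in PDB186.
m = metrics(tp=74, fn=19, tn=69, fp=24)
print({k: round(v, 3) for k, v in m.items()})
```

Rounded to the precision used in Table 4, these counts reproduce the DBPPred row (Acc 76.9%, Sn 79.6%, Sp 74.2%, MCC 0.538).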
The real-valued outputs of three predictors, DBPPred [75], DNAbinder [21] and DNABIND [23], were provided by [75], whereas those of the proposed method, iDNA-Prot and DNA-Prot were obtained by training each predictor on the benchmark dataset and testing it on the independent dataset PDB186.

Figure 5 The ROC curves of several predictive methods on the independent dataset. The receiver operating characteristic (ROC) curves of SVM-PSSM-DT and several other existing DNA-binding protein predictors were obtained by testing models trained on the benchmark dataset on the independent dataset PDB186. The horizontal axis is the false positive rate (1 − Sp) and the vertical axis is the true positive rate (Sn). A good method yields a curve close to the point (0, 1), i.e. a low false positive rate together with a high true positive rate.

From Table 4 and Figure 5, we can see that among the seven predictive methods, the proposed method achieves the highest ACC (80.00%), MCC (0.674) and AUC (87.40%), whereas DBPPred is the best-performing previously reported method (ACC = 76.90%, MCC = 0.538 and AUC = 79.10%). Compared with DBPPred, SVM-PSSM-DT therefore improves ACC by 3.10 percentage points, MCC by 0.136 and AUC by 8.30 percentage points on the independent test, indicating that SVM-PSSM-DT is an effective prediction model for DNA-binding protein identification.
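The ROC curves and AUC values above are derived from each predictor's real-valued outputs, but the exact procedure is not stated here. The following is a minimal, dependency-free sketch under the usual assumptions: the decision threshold is swept from the highest score to the lowest, tied scores are treated as a single step, and the AUC is the trapezoidal area under the resulting curve. The function names and the toy scores and labels are hypothetical and do not correspond to any predictor in Table 4.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) by sweeping the threshold over the scores.

    labels: 1 = DNA-binding (positive), 0 = non-DNA-binding (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]              # curve starts at (0, 0): nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)             # x axis: 1 - Sp
        tpr.append(tp / pos)             # y axis: Sn
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Toy example: three positives and three negatives with illustrative scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(auc(fpr, tpr))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9. In practice, scikit-learn's `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the same quantities.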