PMC:4331676 / 34063-37520
Annotations
2_test
{"project":"2_test","denotations":[{"id":"25708928-24475169-14842666","span":{"begin":226,"end":228},"obj":"24475169"},{"id":"25708928-19385697-14842667","span":{"begin":847,"end":849},"obj":"19385697"},{"id":"25708928-16551468-14842668","span":{"begin":877,"end":879},"obj":"16551468"},{"id":"25708928-19911048-14842669","span":{"begin":900,"end":902},"obj":"19911048"},{"id":"25708928-19385697-14842670","span":{"begin":1734,"end":1736},"obj":"19385697"},{"id":"25708928-16551468-14842671","span":{"begin":1763,"end":1765},"obj":"16551468"},{"id":"25708928-19911048-14842672","span":{"begin":1782,"end":1784},"obj":"19911048"},{"id":"25708928-24475169-14842673","span":{"begin":1799,"end":1801},"obj":"24475169"},{"id":"25708928-24475169-14842674","span":{"begin":1823,"end":1825},"obj":"24475169"},{"id":"25708928-24475169-14842675","span":{"begin":2102,"end":2104},"obj":"24475169"},{"id":"25708928-24475169-14842676","span":{"begin":2126,"end":2128},"obj":"24475169"},{"id":"25708928-20089514-14842677","span":{"begin":2159,"end":2161},"obj":"20089514"}],"text":"Independent test\nIn order to further compare the predictive performance of SVM-PSSM-DT with other existing methods, we evaluated the proposed method on the independent dataset PDB186. It was recently constructed by Lou et al [75] to validate the quality of predictions, which consists 93 DNA-binding proteins and equal number of non DNA-binding proteins selected from PDB. Since there are some sequences from the benchmark dataset that shared high sequence identity with the independent dataset PDB186, the tool CD-HIT [77] was applied to remove the sequences from the benchmark dataset having more than 25% sequence identity to any one in a same subset in the independent dataset PDB186 to avoid homology bias. 
Table 4 lists the predictive results of the proposed method and several relevant existing methods, including iDNA-Prot [16], DNA-Prot [74], DNAbinder [21], DNABIND [34], and DNA-Threader [78], to our best knowledge.\nTable 4 Results on Independent dataset PDB186 of different predictorsa\nMethods Acc(%) MCC Sn(%) Sp(%) AUC(%)\niDNA-Prot 67.20 0.344 67.70 66.70 83.30\nDNA-Prot 61.80 0.240 69.90 53.80 79.60\nDNAbinder 60.80 0.216 57.00 64.50 60.70\nDNABIND 67.70 0.355 66.70 68.80 69.40\nDNA-Threader 59.70 0.279 23.70 95.70 N/A\nDBPPred 76.90 0.538 79.60 74.20 79.10\nPSSM-DT 80.00 0.647 87.09 72.83 87.40\nThe six methods in the front of the table are six useful predicting methods for identification of DNA-binding proteins proposed in the past and were demonstrated to have good performance. The results of the six existing predicting methods and the SVM-PSSM-DT were achieved on the dataset PDB186 by their model trained on benchmark dataset.\naThe results of iDNA-Prot [16], DNA-Prot [74], DNAbinder[21], DNABIND [34], DNA-Threader [78] and DDPPred [75] were obtained from [75]. Moreover, to provide a graphic illustration to show the performance comparisons of the SVM-PSSM-DT with other existing state-of-the-art predictors, the corresponding ROC curves were drawn in Figure 5. The experimental real value results of three predictors are provided by [75], including DBPPred [75], DNAbinder [21] and DNABIND [23]. And the real value outputs of the proposed method, iDNA-Prot and DNA-Prot are obtained by testing their predictors trained on benchmark dataset on independent dataset PDB186.\nFigure 5 The ROC curves of several predictive methods on Independent dataset. 
The receiver operating characteristic (ROC) curves of SVM-PSSM-DT and several other existing DNA-binding protein predictors were obtained by testing the models trained on the benchmark dataset on the independent dataset PDB186. The horizontal axis X is the false positive rate (1-SP) and the vertical axis Y is the true positive rate (SN); a good method yields a curve close to the point (0,1), meaning a low false positive rate and a high true positive rate. From Table 4 and Figure 5 we can see that, among the seven predictive methods, the proposed method has the highest performance, with an ACC of 80.00%, an MCC of 0.674 and an AUC of 87.40%, while DBPPred is the previously reported method with the best predictive performance (ACC = 76.90%, MCC = 0.538 and AUC = 79.10%). Compared with DBPPred, SVM-PSSM-DT therefore improves the independent-test ACC by 3.105%, MCC by 0.136 and AUC by 8.30%, indicating that SVM-PSSM-DT is an effective prediction model for DNA-binding protein identification."}