PMC:4331676 / 27489-31210
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4331676","sourcedb":"PMC","sourceid":"4331676","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4331676","text":"Comparison with existing PSSM based encoding schemes\nIn this section, four protein encoding schemes based on PSSM are introduced for a comparison. They are the average score of the residues with respect to the column of certain AA type called AvePscore-20 [21], the average score of the residues of certain AA type with respect to the column of certain AA type called AvePscore-400 [74], the percentile value of the PSSM scores along with the column of certain AA type according to percent thresholds called Pscore-100 [75], and auto-correlation coefficient (ACC) transformation that can transform the PSSMs of different lengths into fixed-length vectors by measuring the correlation between two scores separated by a distance of lg along the sequence [76], respectively. Table 2 lists the predictive results of the proposed protein representation and other four considered protein representations on the benchmark dataset using jackknife validation.\nTable 2 Results on benchmark dataset of different PSSM based encoding schemes through jackknife validation.\nMethods Acc(%) MCC SN(%) SP(%) AUC(%)\nAvePscore-20a 73.95 0.480 68.57 79.09 81.40\nAvePscore-400b 73.58 0.470 66.47 80.36 81.50\nPscore-100c 73.12 0.463 72.76 73.45 80.50\nACCd 73.77 0.475 73.14 74.36 81.90\nPSSM-DTf 79.96 0.622 81.91 78.00 86.50\nThe four protein representation methods in the front of the table are four protein encoding methods for identification of DNA-binding proteins proposed in the past. The four methods and the current method PSSM-DT are based on PSSMs property of protein sequences, but the encoding method applied by them are different. The results were got by testing on benchmark dataset through jackknife validation.\naresults obtained by in-house implementation of AvePscore-20 [21]\nbresults obtained by in-house implementation of AvePscore-400 [74]\ncresults obtained by in-house implementation of Pscore-100 [75]\ndresults obtained by in-house implementation of ACC [76]\nfresults obtained by using PSSM-DT as protein representation Furthermore, to provide a graphic illustration to show the performance of the five protein representations, the corresponding ROC (receiver operating characteristic) curves were drawn in Figure 3, where the horizontal coordinate X is for the false positive rate or 1-SP and the vertical coordinate Y is for the true positive rate or SN. The best method would yield a point with the coordinate (0,1) meaning 0 false positive rate and 100% true positive rate. Therefore a perfect classification method would give a point with the coordinate (0,1) and a completely random guess would give a point along the diagonal from point (0,0) to (1,1). The area under the ROC curve called AUC is often used to indicate the performance quality of binary classification methods, where the larger the area, the better the predictive quality is.\nFigure 3 The ROC curves of several PSSM based protein encoding methods on benchmark dataset. The receiver operating characteristic (ROC) curves of PSSM-DT and several other existing protein encoding methods were got by testing the models on benchmark dataset through jackknife validation, where the horizontal coordinate X is for the false positive rate or 1-SP and the vertical coordinate Y is for the true positive rate or SN and a good method would yield a curve close to the coordinate (0,1) meaning low false positive rate and high true positive rate. As shown in Table 2 and Figure 3, the PSSM-DT based protein representation generated the highest performance and outperformed the other four protein representations based on PSSM, indicating that PSSM-DT based protein representation is effective for DNA-binding protein identification.","divisions":[{"label":"title","span":{"begin":0,"end":52}},{"label":"p","span":{"begin":53,"end":950}},{"label":"table-wrap","span":{"begin":951,"end":2048}},{"label":"label","span":{"begin":951,"end":958}},{"label":"caption","span":{"begin":960,"end":1059}},{"label":"p","span":{"begin":960,"end":1059}},{"label":"table","span":{"begin":1060,"end":1332}},{"label":"tr","span":{"begin":1060,"end":1102}},{"label":"th","span":{"begin":1060,"end":1067}},{"label":"th","span":{"begin":1069,"end":1075}},{"label":"th","span":{"begin":1077,"end":1080}},{"label":"th","span":{"begin":1082,"end":1087}},{"label":"th","span":{"begin":1089,"end":1094}},{"label":"th","span":{"begin":1096,"end":1102}},{"label":"tr","span":{"begin":1103,"end":1151}},{"label":"td","span":{"begin":1103,"end":1116}},{"label":"td","span":{"begin":1118,"end":1123}},{"label":"td","span":{"begin":1125,"end":1130}},{"label":"td","span":{"begin":1132,"end":1137}},{"label":"td","span":{"begin":1139,"end":1144}},{"label":"td","span":{"begin":1146,"end":1151}},{"label":"tr","span":{"begin":1152,"end":1201}},{"label":"td","span":{"begin":1152,"end":1166}},{"label":"td","span":{"begin":1168,"end":1173}},{"label":"td","span":{"begin":1175,"end":1180}},{"label":"td","span":{"begin":1182,"end":1187}},{"label":"td","span":{"begin":1189,"end":1194}},{"label":"td","span":{"begin":1196,"end":1201}},{"label":"tr","span":{"begin":1202,"end":1248}},{"label":"td","span":{"begin":1202,"end":1213}},{"label":"td","span":{"begin":1215,"end":1220}},{"label":"td","span":{"begin":1222,"end":1227}},{"label":"td","span":{"begin":1229,"end":1234}},{"label":"td","span":{"begin":1236,"end":1241}},{"label":"td","span":{"begin":1243,"end":1248}},{"label":"tr"
,"span":{"begin":1249,"end":1288}},{"label":"td","span":{"begin":1249,"end":1253}},{"label":"td","span":{"begin":1255,"end":1260}},{"label":"td","span":{"begin":1262,"end":1267}},{"label":"td","span":{"begin":1269,"end":1274}},{"label":"td","span":{"begin":1276,"end":1281}},{"label":"td","span":{"begin":1283,"end":1288}},{"label":"tr","span":{"begin":1289,"end":1332}},{"label":"td","span":{"begin":1289,"end":1297}},{"label":"td","span":{"begin":1299,"end":1304}},{"label":"td","span":{"begin":1306,"end":1311}},{"label":"td","span":{"begin":1313,"end":1318}},{"label":"td","span":{"begin":1320,"end":1325}},{"label":"td","span":{"begin":1327,"end":1332}},{"label":"table-wrap-foot","span":{"begin":1333,"end":2048}},{"label":"p","span":{"begin":1333,"end":1733}},{"label":"p","span":{"begin":1734,"end":1799}},{"label":"p","span":{"begin":1800,"end":1866}},{"label":"p","span":{"begin":1867,"end":1930}},{"label":"p","span":{"begin":1931,"end":1987}},{"label":"p","span":{"begin":1988,"end":2048}},{"label":"p","span":{"begin":2049,"end":2877}},{"label":"figure","span":{"begin":2878,"end":3435}},{"label":"label","span":{"begin":2878,"end":2886}},{"label":"caption","span":{"begin":2888,"end":3435}},{"label":"p","span":{"begin":2888,"end":3435}}],"tracks":[{"project":"2_test","denotations":[{"id":"25708928-19385697-14842659","span":{"begin":383,"end":385},"obj":"19385697"},{"id":"25708928-24475169-14842660","span":{"begin":520,"end":522},"obj":"24475169"},{"id":"25708928-19706744-14842661","span":{"begin":753,"end":755},"obj":"19706744"},{"id":"25708928-24475169-14842662","span":{"begin":1927,"end":1929},"obj":"24475169"},{"id":"25708928-19706744-14842663","span":{"begin":1984,"end":1986},"obj":"19706744"}],"attributes":[{"subj":"25708928-19385697-14842659","pred":"source","obj":"2_test"},{"subj":"25708928-24475169-14842660","pred":"source","obj":"2_test"},{"subj":"25708928-19706744-14842661","pred":"source","obj":"2_test"},{"subj":"25708928-24475169-14842662","pred":"source","obj":"2_test"},{"subj":"25708928-19706744-14842663","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#ec93ea","default":true}]}]}}
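Table 2 in the record above reports accuracy (Acc), Matthews correlation coefficient (MCC), sensitivity (SN) and specificity (SP). The section itself only pins down SN as the true positive rate and 1-SP as the false positive rate; assuming the other two follow the usual confusion-matrix definitions, all four can be computed as in this minimal sketch. The function name `binary_metrics` and the counts in the usage line are hypothetical, not values from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Acc, MCC, SN and SP from confusion-matrix counts (as fractions, not %).

    SN = TP / (TP + FN)  (true positive rate)
    SP = TN / (TN + FP)  (1 - false positive rate)
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "MCC": mcc, "SN": sn, "SP": sp}


m = binary_metrics(tp=8, tn=7, fp=3, fn=2)  # made-up counts for illustration
print({k: round(v, 3) for k, v in m.items()})
# {'Acc': 0.75, 'MCC': 0.503, 'SN': 0.8, 'SP': 0.7}
```

Table 2 lists Acc, SN and SP as percentages, so those three values would be multiplied by 100 to match its columns; MCC stays on its native [-1, 1] scale.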
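AvePscore-20 and AvePscore-400 are both described above as averages over the columns of the L x 20 PSSM. The sketch below is one plausible reading of those one-line descriptions, not the implementations from [21] or [74]. It assumes the PSSM is a list of L rows of 20 raw scores, that the columns follow the hypothetical order `AA_ORDER`, and that no score rescaling is applied (published variants often normalize PSSM scores first). AvePscore-20 averages each column over all residues; AvePscore-400 averages each column separately over the residues of each amino-acid type, giving 20 x 20 = 400 values.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # hypothetical column order of the PSSM


def ave_pscore_20(pssm):
    """Average of each of the 20 PSSM columns over all L residues."""
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(20)]


def ave_pscore_400(pssm, sequence):
    """For residue type a and column j, average pssm[k][j] over the
    positions k where sequence[k] == a. Types absent from the sequence
    contribute 20 zeros. Output is row-major: 20 types x 20 columns."""
    if len(pssm) != len(sequence):
        raise ValueError("PSSM and sequence lengths differ")
    features = []
    for aa in AA_ORDER:
        rows = [row for row, res in zip(pssm, sequence) if res == aa]
        if rows:
            features.extend(sum(row[j] for row in rows) / len(rows) for j in range(20))
        else:
            features.extend([0.0] * 20)
    return features


pssm = [[1] * 20, [3] * 20, list(range(20))]  # toy 3-residue PSSM
print(ave_pscore_20(pssm)[:3])  # [1.3333333333333333, 1.6666666666666667, 2.0]
print(len(ave_pscore_400(pssm, "AAR")))  # 400
```

Both vectors have a fixed length whatever L is, which is what lets proteins of different lengths share one classifier.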
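The ACC transformation is described above as turning PSSMs of different lengths into fixed-length vectors by relating two scores a distance lg apart. A common form in this family, assumed here for illustration rather than taken from [76], is the auto-covariance of each column j at lag lg, averaged over the L - lg position pairs. Dividing by the column variance would turn it into a correlation coefficient; this sketch leaves it unnormalized. The maximum lag `max_lag` (often written LG) is a free parameter whose value is not given in this section, and `acc_transform` is a hypothetical name.

```python
def acc_transform(pssm, max_lag):
    """Auto-covariance of every PSSM column for lags 1..max_lag.

    ACC(j, lg) = 1/(L - lg) * sum_{i=0}^{L-lg-1} (S[i][j] - m_j) * (S[i+lg][j] - m_j)

    where S is the L x 20 PSSM and m_j is the mean of column j.
    Output: 20 * max_lag values, ordered column by column.
    """
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for j in range(20):
        column = [row[j] for row in pssm]
        mean = sum(column) / length
        for lag in range(1, max_lag + 1):
            total = sum((column[i] - mean) * (column[i + lag] - mean)
                        for i in range(length - lag))
            features.append(total / (length - lag))
    return features


short = [[v] + [0] * 19 for v in (1, 2, 3, 4)]  # toy 4-residue PSSM
longer = short * 3                               # toy 12-residue PSSM
print(len(acc_transform(short, 2)), len(acc_transform(longer, 2)))  # 40 40
```

The two toy proteins differ in length but both map to 20 x 2 = 40 features, which is the property the section attributes to ACC.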
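Figure 3 plots the false positive rate (1-SP) against the true positive rate (SN) as the decision threshold moves, and AUC is the area under that curve. A minimal standard-library sketch, assuming each protein receives a real-valued score where higher means "DNA-binding" and labels are 1 (binding) or 0 (non-binding): samples with tied scores are consumed together as one step, and the area is computed with the trapezoidal rule. `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """(FPR, TPR) = (1 - SP, SN) points from the strictest to the loosest threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # consume every sample sharing this score before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
print(auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])))  # 1.0
print(auc(roc_points([0.5, 0.5], [1, 0])))                  # 0.5
```

The second call ranks every positive above every negative, so its curve passes through (0,1), the perfect-classifier point described above; the third gives identical scores to both classes and reduces to the (0,0)-(1,1) diagonal of a random guess.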
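Every result in this section comes from jackknife validation, i.e. leave-one-out: each protein in the benchmark dataset is held out once and predicted by a model trained on all the others. The loop itself is generic, as sketched below. The classifier used with PSSM-DT is not described in this section, so a toy nearest-centroid rule (`nearest_centroid`, hypothetical) stands in for it; any function taking `(train_x, train_y, query)` and returning a label could be passed instead.

```python
def jackknife(features, labels, train_and_predict):
    """Leave-one-out validation: predict sample i with a model built
    from every sample except i. Returns the list of predicted labels."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(train_and_predict(train_x, train_y, features[i]))
    return predictions


def nearest_centroid(train_x, train_y, query):
    """Hypothetical stand-in classifier: pick the class whose mean feature
    vector is closest to the query (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


x = [[0.0], [0.1], [1.0], [1.1]]  # toy one-feature samples
y = [0, 0, 1, 1]
print(jackknife(x, y, nearest_centroid))  # [0, 0, 1, 1]
```

Because each held-out sample never influences its own model, the predictions can be compared with the true labels to fill a confusion matrix, which is how jackknife figures such as those in Table 2 are usually tallied.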