PMC:4331676 / 13432-15488

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4331676","sourcedb":"PMC","sourceid":"4331676","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4331676","text":"PSSM distance transformation\nIt has been reported that dipeptides containing two residues separated by a distance along the sequence are important for protein functionality annotation in the work [65]. Additionally, the PSSM score can approximately measure how frequently an amino acid occurs at a position of a sequence. Accordingly, we present here a PSSM distance transformation (PSSM-DT) method to encode the feature vector representation from the PSSM information. PSSM-DT can transform the PSSM information into uniform numeric representation by approximately measuring the occurrence probabilities of any pairs of amino acid separated by a distance along the sequence in a sequence. PSSM-DT results in two kinds of features: PSSM distance transformation of pairs of same amino acids (PSSM-SDT) and PSSM distance transformation of pairs of different amino acids (PSSM-DDT). The PSSM-SDT features approximately measure the occurrence probabilities of pairs of same amino acids separated by a distance of lg along the sequence in a sequence, which can be calculated as below\n(3) PSSM - SDT( i , l g ) =  ∑ j = 1 L - l g S i , j * S i , j + l g / ( L - l g )\nwhere i is one type of the amino acid, L is the length of the sequence, Si,j is the PSSM score of amino acid i at position j. In such a way, 20*LG is the number of PSSM-SDT features, where LG is the maximum value of lg (lg = 1, 2,...,LG).\nThe PSSM-DDT features approximately measures the occurrence probabilities of pairs of different amino acids separated by a distance of lg along the sequence, which can be calculated by:\n(4) PSSM - DDT( i 1 , i 2 , l g ) =  ∑ j = 1 L - l g S i 1 , j * S i 2 , j + l g / ( L - l g )\nwhere i1 and i2 refer to two different types amino acids. 
Similarly, the total number of PSSM-DDT features can be calculated as 380*LG.\nPSSM-DT is the combination of variable PSSM-SDT and PSSM-DDT. Thus a sequence can be transformed into a uniform feature vector with a fixed dimension of 400*LG by using variable PSSM-DT from its PSSM profile.","divisions":[{"label":"title","span":{"begin":0,"end":28}},{"label":"p","span":{"begin":29,"end":1078}},{"label":"p","span":{"begin":1079,"end":1176}},{"label":"label","span":{"begin":1079,"end":1082}},{"label":"p","span":{"begin":1177,"end":1415}},{"label":"p","span":{"begin":1416,"end":1601}},{"label":"p","span":{"begin":1602,"end":1711}},{"label":"label","span":{"begin":1602,"end":1605}},{"label":"p","span":{"begin":1712,"end":1847}}],"tracks":[]}
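
The PSSM-SDT and PSSM-DDT definitions in Eqs. (3) and (4) of the text above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `pssm_dt`, the row-per-amino-acid matrix layout, and the ordering of the output vector are assumptions made here. The text also does not say whether the PSSM scores are normalized before the transformation, so the sketch uses whatever values it is given.

```python
from typing import List


def pssm_dt(pssm: List[List[float]], max_lg: int) -> List[float]:
    """Sketch of PSSM-DT: SDT features first, then DDT features.

    Assumed layout (not stated in the source): pssm[i][j] is the score
    of amino acid type i at sequence position j, so a real profile has
    20 rows and L columns.
    """
    n_types = len(pssm)
    length = len(pssm[0])
    if not 1 <= max_lg < length:
        raise ValueError("max_lg must satisfy 1 <= max_lg < L")

    sdt: List[float] = []  # same-type pairs, Eq. (3)
    ddt: List[float] = []  # different-type pairs, Eq. (4)
    for lg in range(1, max_lg + 1):
        span = length - lg  # number of position pairs (j, j + lg)
        for i1 in range(n_types):
            for i2 in range(n_types):
                total = sum(
                    pssm[i1][j] * pssm[i2][j + lg] for j in range(span)
                )
                # Average over the L - lg position pairs.
                (sdt if i1 == i2 else ddt).append(total / span)
    return sdt + ddt
```

With 20 amino acid types this yields 20*LG SDT values and 380*LG DDT values, for 400*LG features in total, matching the counts given in the text. A toy call with 3 types and LG = 2, such as `pssm_dt([[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2]], 2)`, returns 3*2 + 6*2 = 18 values.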