PMC:4331677 / 9521-11999

    {"project":"2_test","denotations":[{"id":"25707321-17510168-14839513","span":{"begin":156,"end":158},"obj":"17510168"},{"id":"25707321-15319262-14839514","span":{"begin":384,"end":386},"obj":"15319262"},{"id":"25707321-17510168-14839515","span":{"begin":387,"end":389},"obj":"17510168"},{"id":"25707321-17510168-14839516","span":{"begin":787,"end":789},"obj":"17510168"},{"id":"25707321-15319262-14839517","span":{"begin":1374,"end":1376},"obj":"15319262"},{"id":"25707321-17510168-14839518","span":{"begin":1377,"end":1379},"obj":"17510168"}],"text":"Representation of targets' binding sites based on target dictionary\nIn this work, we made use of the target dictionary provided by Nagamine and Sakakibara [18] to represent the binding sites. The dictionary is organized as three layers: the amino acids, trimers and trimer clusters. In the first layer, each amino acid is first represented by 237 items of physicochemical properties [19,18], such as residue volume, polarizability and solvation free energy. Principal component analysis (PCA [20]) is then applied to reduce the dimensionality. As a result, each amino acid is described by a 5-dimensional feature vector. In the second layer, the twenty amino acids are permuted and combined into 4200 trimers. Each trimer αtri(α01, α11, α12) is mapped into a 5-dimensional vector space [18] as follows:\n(1) $\\alpha_{tri}(\\alpha_{01}, \\alpha_{11}, \\alpha_{12}) = \\frac{\\alpha(\\alpha_{01}) + \\alpha(\\alpha_{11}) + \\alpha(\\alpha_{12})}{4}$\nwhere α(α01), α(α11), α(α12) and αtri(α01, α11, α12) are 5-dimensional vectors. Here α01 is the center (major) amino acid, and α11 and α12 are the left and right (subordinate) amino acids, respectively. The positions of α11 and α12 are interchangeable, so αtri(α01, α11, α12) and αtri(α01, α12, α11) are equivalent. In the third layer, hierarchical clustering (Ward's algorithm [21]) is used to group the 4200 trimers into 199 clusters [19,18]. Together, these clusters constitute the dictionary.\nTherefore, we first broke the binding site sequences into trimers. 
For example, the amino acid sequence NGMGN produces three trimers: G(NM), M(GG) and G(MN). Since G(NM) and G(MN) are equivalent, we merged them into a single trimer with a count of two. We then assigned all trimers to the 199 clusters and counted the frequency of each cluster in every binding site. Finally, the cluster frequencies were normalized to unit L2 norm to obtain a 199-dimensional feature vector for each binding site. For example, the sequence NGMGN can be represented as follows:\n(2) $B_s(NGMGN) = (\\ldots, \\underbrace{\\frac{2}{\\sqrt{5}}}_{c(G(NN),\\, G(MN),\\, \\ldots)}, \\ldots, \\underbrace{\\frac{1}{\\sqrt{5}}}_{c(M(GG),\\, M(AG),\\, \\ldots)}, \\ldots)$\nwhere Bs(·) denotes a binding site feature vector and c(·) denotes a cluster; for example, c(G(NN), G(MN), ...) represents the cluster that contains G(NN), G(MN), etc. Because the trimers in the same cluster have similar chemical properties, the clusters can be viewed as chemical \"groups\", on the basis of which the ligand binding sites are decomposed into fragments."}
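
The trimer decomposition and normalization described in the text can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `trimers` and `binding_site_vector` and the two-cluster mapping `TOY_CLUSTERS` are hypothetical stand-ins introduced here for illustration; the actual 199-cluster dictionary comes from [18] and is not reproduced. The sketch assumes that every interior residue of the sequence serves as a trimer center and that the two flanking residues are interchangeable, as the NGMGN example suggests.

```python
import math
from collections import Counter


def trimers(seq):
    """Return one canonical trimer (center, flank_a, flank_b) per interior residue.

    The two flanking residues are sorted so that G(NM) and G(MN)
    collapse to the same key, reflecting their equivalence.
    """
    result = []
    for i in range(1, len(seq) - 1):
        left, center, right = seq[i - 1], seq[i], seq[i + 1]
        flank_a, flank_b = sorted((left, right))
        result.append((center, flank_a, flank_b))
    return result


def binding_site_vector(seq, cluster_of, n_clusters):
    """Count trimers per cluster, then scale the counts to unit L2 norm.

    `cluster_of` maps a canonical trimer to a cluster index; a trimer
    missing from the mapping raises KeyError.
    """
    counts = Counter(cluster_of[t] for t in trimers(seq))
    vec = [float(counts.get(k, 0)) for k in range(n_clusters)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Hypothetical two-cluster mapping, only large enough for NGMGN.
TOY_CLUSTERS = {
    ("G", "M", "N"): 0,  # stands in for c(G(NN), G(MN), ...)
    ("M", "G", "G"): 1,  # stands in for c(M(GG), M(AG), ...)
}

vec = binding_site_vector("NGMGN", TOY_CLUSTERS, n_clusters=2)
print(vec)  # two entries: 2/sqrt(5) and 1/sqrt(5)
```

With the real dictionary, `n_clusters` would be 199 and most entries of the vector would be zero for a short binding site sequence.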