PMC:540021 / 3741-7472

    2_test

    {"project":"2_test","denotations":[{"id":"15608176-9783222-77178214","span":{"begin":384,"end":386},"obj":"9783222"},{"id":"15608176-12952881-77178215","span":{"begin":1089,"end":1091},"obj":"12952881"},{"id":"15608176-9783222-77178216","span":{"begin":2096,"end":2098},"obj":"9783222"},{"id":"15608176-11734650-77178217","span":{"begin":2940,"end":2942},"obj":"11734650"}],"text":"GENERATION AND STATISTICS OF THE DATABASE\n\nSubfamily generation\nThe division of a protein family into subfamilies is often performed by inspecting the phylogenetic tree of the family and deciding the subfamily membership of proteins. However, there are no clear criteria for dividing the tree into subfamilies, and it would also be time consuming for large-scale analysis. Sjolander (10,12) developed a method called BETE, which uses total relative entropy (TRE), the average relative entropy of all the columns in an alignment between two subfamilies. In this method, a neighbor-joining tree is constructed using TRE as distance measure. The subfamilies are defined using an encoding cost function that strives to minimize the number of subfamilies at the same time as it maximizes the sequence homogeneity within each subfamily. This method is completely automatic and hence can be used for large-scale analysis.\nSubfamilies for the Pfam families were generated using the BETE method. The size and sequence diversity of the subfamilies thus generated is similar to the PANTHER database (11), where expert curators divided the subfamilies after inspecting the phylogenetic tree of each family manually. Function shift between subfamilies was predicted by identifying two kinds of sites, namely CSS and RSS.\n\nConservation shifting sites\nPositions conserved in all members of a family are considered to be important for maintaining the structural scaffold or the core function. However, some positions may be conserved in different subfamilies but using different amino acids. 
Such positions are likely to be responsible for subfamily-specific functions. It is probable that these subfamilies have slight changes in function, such as different substrate specificities. Positions that exhibit such subfamily-specific conservation patterns are termed as CSS and can thus be used as indicators of function shift. CSS between the subfamilies were identified using the method developed by us (S. Abhiman and E. L. L. Sonnhammer, submitted for publication), which is similar to the method of Sjolander (10). Essentially, the amino acid distribution at each position in an alignment is computed and used to calculate the relative entropy between two subfamily alignments. The cumulative relative entropy is then converted into a Z-score, which is a normalized measure of conservation dissimilarity between two subfamilies.\n\nRate shifting sites\nSites in a protein evolve at different rates, with some functionally constrained sites evolving slowly and some others evolving faster. Some sites also evolve at different rates in different subfamilies of a family. Sites with such shifts in evolutionary rates between two subfamilies are referred to as RSS. Detecting a large number of such positions between two subfamilies suggests that the function has diverged between them. RSS between subfamilies in a family were determined using the LRT method (13). Each position in the alignment is analyzed individually and the program generates U-values that specify the likelihood that there is a rate change for each alignment position between the subfamilies under consideration.\n\nPrediction of functionally divergent subfamily comparisons\nIn each family, the subfamily pairs were compared all-against-all for CSS and RSS. Only subfamilies with at least four sequences were considered for this analysis. A function shift between a subfamily pair was predicted by using the percentage of CSS and RSS as variables in classification functions. 
These classification functions were derived from a previous analysis of functionally divergent subfamilies drawn from enzyme families (S. Abhiman and E. L. L. Sonnhammer, submitted for publication).\n"}
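
The CSS step described in the annotated text (per-position amino acid distributions, relative entropy between two subfamily alignments, conversion to a Z-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the text cites as unpublished. The function names, the uniform pseudocount, the symmetric sum of the two relative entropies, and the normalization of Z-scores across the columns of a single alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1.0):
    """Amino acid frequencies for one alignment column.

    Gaps and unknown characters are ignored. The uniform pseudocount is an
    assumption; it keeps every probability non-zero so the logarithm below
    is always defined.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) of p from q, in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def column_scores(subfamily_a, subfamily_b, pseudocount=1.0):
    """Score each aligned position of two subfamily alignments.

    Each subfamily is a list of equal-length aligned sequences. Summing the
    relative entropy in both directions is an assumed reading of the
    "cumulative relative entropy" mentioned in the text.
    """
    scores = []
    for col_a, col_b in zip(zip(*subfamily_a), zip(*subfamily_b)):
        p = column_distribution(col_a, pseudocount)
        q = column_distribution(col_b, pseudocount)
        scores.append(relative_entropy(p, q) + relative_entropy(q, p))
    return scores


def z_scores(values):
    """Standardize scores to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]
```

For example, calling `column_scores(["ACD"] * 4, ["AWD"] * 4)` and passing the result to `z_scores` gives the middle position, conserved as C in one subfamily and W in the other, the highest Z-score, while the two positions conserved identically in both subfamilies score zero before normalization.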