Genomics_Inform

PMC:3475487 JSON TXT 6 Projects

CysQ of Cryptosporidium parvum, a Protozoa, May Have Been Acquired from Bacteria by Horizontal Gene Transfer Abstract Horizontal gene transfer (HGT) is the movement of genetic material between kingdoms and is considered to play a positive role in adaptation. Cryptosporidium parvum is a parasitic protozoan that causes an infectious disease. Its genome sequencing reported 14 bacteria-like proteins in the nuclear genome. Among them, cgd2_1810, which has been annotated as CysQ, a sulfite synthesis pathway protein, is listed as one of the candidates of genes horizontally transferred from bacterial origin. In this report, we examined this issue using phylogenetic analysis. Our BLAST search showed that C. parvum CysQ protein had the highest similarity with that of proteobacteria. Analysis with NCBI's Conserved Domain Tree showed phylogenetic incongruence, in that C. parvum CysQ protein was located within a branch of proteobacteria in the cd01638 domain, a bacterial member of the inositol monophosphatase family. According to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the sulfate assimilation pathway, where CysQ plays an important role, is well conserved in most eukaryotes as well as prokaryotes. However, the Apicomplexa, including C. parvum, largely lack orthologous genes of the pathway, suggesting its loss in those protozoan lineages. Therefore, we conclude that C. parvum regained cysQ from proteobacteria by HGT, although its functional role is elusive. Introduction Horizontal gene transfer (HGT) is the movement of genetic material between different species of interkingdom, and lateral gene transfer is for intrakingdom movement [1]. The concept for gene transfer was first mentioned for an acquisition of virulence between bacterial strains [2], and then, transfers of multiple drug resistance between Shigellae and Escherichia coli strains were reported in Japan [3]. It is important to note that HGT not only affects correct reconstruction of a phylogenetic tree but also helps to understand reasons of its occurrence. In bacteria, it is widely known that genetic material of antibiotic resistance is transferred in the gastrointestinal tract [4], and in the case of unicellular eukaryotes, such as Giardia lamblia, Trichomonas vaginalis, and Entamoeba histolytica, the organisms that use fermentation or anaerobic metabolism in the low-oxygen environment overcame the environmental stress by taking genes from bacteria [5]. Moreover, the transferred gene plays a positive role in adaptation to a pathogenic way of life [6]. On the other hand, HGT has been criticized, in that its biological significance is overemphasized. If a gene has an essential role and participates in many interactions, its transfer might be less likely to occur than that of others or would be detrimental to the recipient. Thus, some transferred genes are considered nonfunctional [7-9]. Cryptosporidium is a parasitic protozoan of the phylum Apicomplexa. Among Cryptosporidium, C. parvum causes an infectious disease in humans and animals with diarrhea, called cryptosporidiosis. Although the disease is prevalent where water quality is poorly managed, there is no satisfactory treatment until now [10, 11]. Genome sequencing of C. parvum was completed in 2004, identifying major metabolic pathways through comparison with other parasites and also reporting those enzymes with high similarities to bacterial and plantal counterparts [12]. In some crucial biosynthesis pathways, C. parvum has enzymes that originated from various organisms, such as bacteria, plants, and algae. Phylogenomic analyses predicted a set of genes transferred from algae and eubacteria [13] and promising drug targets in nucleotide biosynthesis [14]. However, among 14 bacterial-like enzymes that were reported by Abrahamsen et al., only two enzymes received follow-up attention [15, 16]. CysQ, 3'-phosphoadenosine-5'-phosphatase, also known as 3'-phosphoadenosine-5'-phosphosulfate (PAPS) 3'-phosphatase, or 3'(2'), 5'-bisphosphate nucleotidase, was among the 14 bacterial-like enzymes initially reported by the genome analysis. It has been thought that it is needed during aerobic growth in E. coli to help control the levels of PAPS in cysteine biosynthesis [17]. Recently, CysQ protein has been considered as an important regulator that modulates the sulfate assimilation pathway by affecting levels of intermediates in plants, fungi, and bacteria [18, 19]. Despite its biological importance, no follow-up phylogenetic analysis of CysQ in C. parvum has been reported. In this study, we assumed that CysQ might have been transferred from bacteria to C. parvum by horizontal gene transfer. We constructed phylogenetic trees, based on a conserved domain of the protein, and inferred HGT from the phylogenetic incongruence. Methods Sequence source Cryptosporidium parvum strain Iowa type II (NCBI taxonomy accession 353152) was chosen for this study. One of its genes, cgd2_1810, encodes CysQ, a sulfite synthesis pathway protein (accession no. XP_001388206) [12]. The protein sequence was retrieved from the NCBI Protein Database (http://www.ncbi.nlm.nih.gov/protein). BLAST search Sequence similarity searches were performed using BLASTP 2.2.26+ [20] with C. parvum CysQ protein against a nonredundant protein sequence database. TBLASTN 2.2.26+ [21] was performed in order to search for the orthologs of the sulfur metabolism pathway in the C. parvum genome sequence using E. coli and Mycobacterium tuberculosis protein sequences as queries. Conserved domain analysis We obtained information of conserved domains using the NCBI online Conserved Domain-search tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 23]. After the matching CDD model and the corresponding Conserved Domain Tree (CDTree) were identified, the CysQ sequence was added to the matched model, and the corresponding CDTree was then recalculated. Phylogenetic tree analysis PhylomeDB is a collection of phylogenetic trees that have been precalculated automatically with a variety of options for a wide range of species [24]. We queried PhylomeDB and downloaded the phylogenetic trees that included CysQ protein. Pathway and ortholog analysis CysQ enzyme was found in sulfate assimilation on sulfur metabolism of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The list of organisms that harbored this pathway was compiled from KEGG orthology (KO) (http://www.genome.jp/kegg/ko.html) [25]. Results BLAST for sequence similarity The BLAST run using the query sequence of Cryptosporidium parvum reported the proteins of the genus Cryptosporidium as the best hits, followed by those of the order Eucoccidiorida, the genus Cryptosporidium belongs to. The next best hits belonged mostly to Gammaproteobacteria. Their gene descriptions corresponded to one of the alternative names of CysQ protein or a member protein of the inositol monophosphatase family, except for unclassified proteins and hypothetical proteins. The best bacterial hit had an identity of 40% and a bit score of 181 bits for the query sequence of 341 amino acids. From the taxonomy report of the BLAST result, 11 organisms among 110 were eukaryotes, and the other 99 were bacteria. The bacterial list was composed of 59 Proteobacteria species, including 53 Gammaproteobacteria and 31 Bacteroidetes species. Phylogenetic analyses While the BLAST analysis hinted HGT of the cysQ gene from bacteria to C. parvum, the hypothesis should be confirmed by phylogenetic analysis. A phylogenetic tree for CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree made by the E. coli protein sequence as a seed and maximum likelihood method with the Jones-Taylor-Thornton (JTT) evolutionary model. The phylogenetic tree with 170 orthologs comprised three eukaryotes-C. parvum, Arabidopsis thaliana, and Oryza sativa-one Archaea, and 166 Bacteria species. In the tree, C. parvum was branched with Proteobacteria, while the plantal proteins were the outgroup of prokaryotic proteins. In OrthoMCL (http://orthomcl.org), CysQ of C. parvum was located within the inositol monophosphatase family of Pfam (entry name OG5_129356) [26]. This ortholog group has only 70 orthologs from 54 different species and paralogs of Viridiplantae or T. vaginalis. Moreover, it included a larger portion of plants and fungi rather than bacteria, and no metazoan protein orthologs were included. Unlike PhylomeDB or OrthoMCL, the CDD of NCBI cataloged proteins sharing CysQ or related domains comprehensively. CysQ protein of C. parvum contains a CysQ domain (accession no. cd01638), which is one of the children of the Fig (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The Fig superfamily is a metal-dependent phosphatase that organizes two subsets of direct children in the hierarchy of the superfamily: FBPase glpX domain (cd01516) and IMPase-like domain (cd01637). Cd01637 has 9 children domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPaselike 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd10640), FBPase (cd00354), Arch FBPase 1 (cd 01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd10517). The whole hierarchy tree of the Fig superfamily comprises a total of 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 Archaea (Fig. 1A). Some domains (cd01516, cd01637, cd01638, cd01641, and cd0643) comprise predominantly bacterial proteins in their CDTree, whereas the other domains have a combined composition (cd000354, cd0517, and cd01639) or a high level of Archaea (cd01642 and cd01515). Domains cd01638, cd01641, and cd01643 are bacterial members of the IMPase family. All of them show a high proportion of Proteobacteria, at about 65%, 50%, and 43% respectively. In cd01638, C. parvum CysQ protein is located within the monophyletic gram-negative subtree, ranging from Pseudomonas sringae, Gammaproteobacteria, to Campylobacter jejuni, Epsilonproteobacteria (Fig. 1B). On the other hand, the gram-negative subtree is paraphyletic, in that it has 27 branches of Proteobacteria and Aquificae, Cyanobacteria, and Bacteroidetes, respectively. Taken together, the phylogenetic analysis strongly supports the hypothesis that the cysQ gene of C. parvum may have been acquired from Proteobacteria by horizontal gene transfer. Orthologs on sulfate assimilation pathway CysQ protein participates in sulfate assimilation on sulfur metabolism. In Fig. 2, we show a simplified version of the KEGG pathway, classifying the enzymes into three groups, according to their direction and steps: Class I for EC 2.7.7.4 (CysN) and EC 2.7.7.5 (CysD); Class II for EC 2.7.1.25 (CysC); and Class III for EC 3.1.3.7 (CysQ). If CysQ of C. parvum is a true CysQ enzyme, playing a role in sulfate assimilation in the parasite, the other components of the pathway should be present in it. On the contrary, we could not identify such genes in the annotated gene list. The KEGG pathway did not list C. parvum proteins in the sulfate assimilation pathway. We looked for the C. parvum proteins by searching the genome sequence using TBLASTN with M. tuberculosis CysN (Rv1286) and CysD (Rv1285) and E. coli CysN (b2751), CysD (b2752), and CysC (b2750) proteins as queries. Among class I and II proteins, only CysN showed marginal matches to cgd6_3990 (29% and 33% identities, respectively) to M. tuberculosis and E. coli sequences. Interestingly, this C. parvum protein was reported as elongation factor 1 alpha, not a sulfate adenylyltransferase. This protein had high similarities to other protozoan or fungal elongation factor 1 alpha proteins. Thus, we consider this as a false hit. The class III protein, CysQ, matched to cgd2_1810 (24% and 36% identities, respectively, for M. tuberculosis and E. coli proteins). This C. parvum gene was annotated "CysQ, sulfite synthesis pathway protein." As no other components of the sulfate assimilation pathway, except for CysQ, are found in C. parvum, we may conclude that the pathway does not function in this organism. We compiled the orthologs of the genes in this pathway using the KO database (Table 1). Eukaryotic kingdoms, except for protists harbored full ranges of orthologs in all three classes. Animals and plants showed similar trends in Class I and II, because two classes shared two orthologs (K13811: 3'-phosphoadenosine 5'-phosphosulfate synthase [PAPSS], K00955: bifunctional enzyme CysN/CysC [CysNC]), and even K13811 is specialized in animals and plants. Fungi also have many orthologs, like animals and plants, in Class I and II, but they have different orthologs (Class I, K00958, sulfate adenylyltransferase [E2.7.7.4C, met3]; Class II, K00860, adenylylsulfatekinase [CysC]). In prokaryotes, the proportion of Class I genes is higher than Class II. All Cyanobacteria, two-thirds of Proteobacteria, and Actinobacteria contained one of the orthologs in Class I, whereas Firmicutes, other bacteria, and the Archaea group have a few orthologs in Class I, II, and III. On the other hand, there were very few orthologs of the sulfate assimilation pathway in protists. We expanded the protist lineage, cataloging the proteins at the class or species level (Table 2). Some protists (Choanoflagellates, Entamoeba of Amoebozoa, and Diatoms) had at least one orthologous gene in each of three classes, while most Alveolates, Amoeboflagellate, Euglenozoa, and Diplomonads did not have any orthologs in three classes, many of which are known as parasites causing infectious diseases. While the sulfate assimilation pathway is generally well conserved in both prokaryotes and eukaryotes, in some protist lineages, the pathway is missing. Thus, we hypothesize that the pathway may have been lost during the evolution of the lineages. C. parvum, like other Aveolates, also may have lost it, and cge2_1810 can not function as CysQ properly. Its function remains elusive, as the sequence similarity to CysQ of M. tuberculosis or E. coli is rather low. Discussion A BLAST search of C. parvum CysQ (cgd2_1810) protein shows the highest similarity with those of proteobacteria. Although it implies HGT from bacteria to this eukaryote, sequence similarity alone is not enough as its basis for several known reasons [27]. In addition, a phylogenetic analysis should support it. For this, initially, we relied on the tree built by PhylomeDB. However, its species coverage was biased or undersampled. On the other hand, CDD of NCBI is well subdivided into kingdom and function groups. C. parvum CysQ protein was mapped into the subfamily of IMPase, which is a bacterial CysQ domain, and comprises only bacterial sequences. Furthermore, C. parvum was located near Gamma- and Alphaproteobacteria in the CDTree. Hence, it seems that these results demonstrate that gene transfer events occurred from bacteria to C. parvum in the evolutionary process. On the KEGG pathway, we raise the possibility that Alveolates, Euglenozoa, and Diplomonads of protozoa suffered from the losses of genes in the sulfate assimilation pathway. But, did C. parvum recover CysQ protein by HGT in the process of evolution? Sulfate assimilation shows highly conserved orthologs for each taxonomy lineage, and it plays important roles in sulfur metabolism, whereas Alveolates of protozoa, including Cryptosporidium, rarely have orthologous genes. For the pathogenic bacterium M. tuberculosis, sulfur-containing metabolites are essential to its pathogenesis and persistence in the host [18, 28], and the Database of Essential Genes (DEG) lists that Rv1286 (cysN) and Rv1285 (cysD), not Rv2131c (cysQ), are essential genes in sulfur metabolism of M. tuberculosis [29, 30]. Parasitic protozoa have diverse sulfur-containing amino acid metabolism that are considered to affect virulence and several stress response. On the other hand, C. parvum and Plasmodium falciparum lack a sulfur assimilation pathway, which is expected to be substituted from host cells [31]. In conclusion, although the sulfate assimilation pathway is missing in some protest lineages, C. parvum has a protein that is predicted as CysQ and has sequence similarity with that of proteobacteria, gram-negative bacteria. Moreover, the phylogenetic analysis supports the acquisition of cgd2_1810 from proteobacteria through horizontal gene transfer. Therefore, we can infer that C. parvum lost its genes in the sulfate assimilation pathway, including cysQ, during a parasitic way of life, and it acquired a copy of cysQ from bacteria by horizontal gene transfer. What is the biological role of this gene product? As the sole member, without other members, of the pathway, it can not assume the right role of CysQ. Its function is elusive at the moment. Fig. 1 Conserved Domain Tree (CDTree) involving CysQ (cgd2_1810) protein. (A) Whole hierarchy of Fig superfamily. Branch colors indicate Eukaryota (pink), Bacteria (bright blue), and Archaea (orange) domains; Cryptosporidium parvum sequence is added (red branch and circle). On the right side, color bar marked with domain accession names displays subfamilies in Fig superfamily. All CDs except for cd01637 are well clustered. cd01637 is scattered around the tree and highlighted in black ticks. (B) CDTree of cd01638 domain. The cd01638 clade that includes only bacteria is classified by taxonomic lineage. Bacteria are subdivided into two distinct parts, gram-negative and gram-positive bacteria. Each bacterial phylum is colored as shown in legend. CysQ protein of C. parvum is shown in red dashed branch with a red circle near Proteobacteria. Fig. 2 A simplified sulfate assimilation pathway. Inorganic sulfate is converted to adenosine 5'-phosphosulfate (APS) by ATP sulfurylase (EC 2.7.7.4 and/or EC 2.7.7.5, Class I). APS is phosphorylated by APS kinase (EC 2.7.1.25, Class II) to 3'-phosphoadenosine-5'-phosphosulfate (PAPS). PAPS is either transferred to 3'-phosphoadenosine-5'-phosphate (PAP) or reconverted to APS by 3'-phosphoadenosine-5'-phosphatase (EC 3.1.3.7, Class III). Table 1 Class comparison of ortholog conservation in cellular organismsa Values of conservation state are presented as percentage. aEach class (see the caption of Fig. 2 for abbreviations of classes) consists of several orthologs and it follows the reference pathway of KEGG Ortholog (KO); b+ symbol denotes the maximum percentage of KO that have orthologs in each Class: 0% < (+) ≤ 25%, 25% < (++) ≤ 50%, 50% < (+++) ≤ 75%, 75% < (++++) ≤ 100%; cMaximum percentage of KO; dAverage percentage of participating KOs in the pathway. Table 2 Protist orthologs involved in sulfate assimilation pathway aAbbreviations of ortholog are as follows: A (K13811), B (K00955), C (K00956), D (K00957), E (K00958), F (K00959), G (K00960), H (K01082), I (K15759), J (K15422).

Document structure show

Title	CysQ of Cryptosporidium parvum, a Protozoa, May Have Been Acquired from Bacteria by Horizontal Gene Transfer
Abstract	Horizontal gene transfer (HGT) is the movement of genetic material between kingdoms and is considered to play a positive role in adaptation. Cryptosporidium parvum is a parasitic protozoan that causes an infectious disease. Its genome sequencing reported 14 bacteria-like proteins in the nuclear genome. Among them, cgd2_1810, which has been annotated as CysQ, a sulfite synthesis pathway protein, is listed as one of the candidates of genes horizontally transferred from bacterial origin. In this report, we examined this issue using phylogenetic analysis. Our BLAST search showed that C. parvum CysQ protein had the highest similarity with that of proteobacteria. Analysis with NCBI's Conserved Domain Tree showed phylogenetic incongruence, in that C. parvum CysQ protein was located within a branch of proteobacteria in the cd01638 domain, a bacterial member of the inositol monophosphatase family. According to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the sulfate assimilation pathway, where CysQ plays an important role, is well conserved in most eukaryotes as well as prokaryotes. However, the Apicomplexa, including C. parvum, largely lack orthologous genes of the pathway, suggesting its loss in those protozoan lineages. Therefore, we conclude that C. parvum regained cysQ from proteobacteria by HGT, although its functional role is elusive.
Body	Introduction Horizontal gene transfer (HGT) is the movement of genetic material between different species of interkingdom, and lateral gene transfer is for intrakingdom movement [1]. The concept for gene transfer was first mentioned for an acquisition of virulence between bacterial strains [2], and then, transfers of multiple drug resistance between Shigellae and Escherichia coli strains were reported in Japan [3]. It is important to note that HGT not only affects correct reconstruction of a phylogenetic tree but also helps to understand reasons of its occurrence. In bacteria, it is widely known that genetic material of antibiotic resistance is transferred in the gastrointestinal tract [4], and in the case of unicellular eukaryotes, such as Giardia lamblia, Trichomonas vaginalis, and Entamoeba histolytica, the organisms that use fermentation or anaerobic metabolism in the low-oxygen environment overcame the environmental stress by taking genes from bacteria [5]. Moreover, the transferred gene plays a positive role in adaptation to a pathogenic way of life [6]. On the other hand, HGT has been criticized, in that its biological significance is overemphasized. If a gene has an essential role and participates in many interactions, its transfer might be less likely to occur than that of others or would be detrimental to the recipient. Thus, some transferred genes are considered nonfunctional [7-9]. Cryptosporidium is a parasitic protozoan of the phylum Apicomplexa. Among Cryptosporidium, C. parvum causes an infectious disease in humans and animals with diarrhea, called cryptosporidiosis. Although the disease is prevalent where water quality is poorly managed, there is no satisfactory treatment until now [10, 11]. Genome sequencing of C. parvum was completed in 2004, identifying major metabolic pathways through comparison with other parasites and also reporting those enzymes with high similarities to bacterial and plantal counterparts [12]. In some crucial biosynthesis pathways, C. parvum has enzymes that originated from various organisms, such as bacteria, plants, and algae. Phylogenomic analyses predicted a set of genes transferred from algae and eubacteria [13] and promising drug targets in nucleotide biosynthesis [14]. However, among 14 bacterial-like enzymes that were reported by Abrahamsen et al., only two enzymes received follow-up attention [15, 16]. CysQ, 3'-phosphoadenosine-5'-phosphatase, also known as 3'-phosphoadenosine-5'-phosphosulfate (PAPS) 3'-phosphatase, or 3'(2'), 5'-bisphosphate nucleotidase, was among the 14 bacterial-like enzymes initially reported by the genome analysis. It has been thought that it is needed during aerobic growth in E. coli to help control the levels of PAPS in cysteine biosynthesis [17]. Recently, CysQ protein has been considered as an important regulator that modulates the sulfate assimilation pathway by affecting levels of intermediates in plants, fungi, and bacteria [18, 19]. Despite its biological importance, no follow-up phylogenetic analysis of CysQ in C. parvum has been reported. In this study, we assumed that CysQ might have been transferred from bacteria to C. parvum by horizontal gene transfer. We constructed phylogenetic trees, based on a conserved domain of the protein, and inferred HGT from the phylogenetic incongruence. Methods Sequence source Cryptosporidium parvum strain Iowa type II (NCBI taxonomy accession 353152) was chosen for this study. One of its genes, cgd2_1810, encodes CysQ, a sulfite synthesis pathway protein (accession no. XP_001388206) [12]. The protein sequence was retrieved from the NCBI Protein Database (http://www.ncbi.nlm.nih.gov/protein). BLAST search Sequence similarity searches were performed using BLASTP 2.2.26+ [20] with C. parvum CysQ protein against a nonredundant protein sequence database. TBLASTN 2.2.26+ [21] was performed in order to search for the orthologs of the sulfur metabolism pathway in the C. parvum genome sequence using E. coli and Mycobacterium tuberculosis protein sequences as queries. Conserved domain analysis We obtained information of conserved domains using the NCBI online Conserved Domain-search tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 23]. After the matching CDD model and the corresponding Conserved Domain Tree (CDTree) were identified, the CysQ sequence was added to the matched model, and the corresponding CDTree was then recalculated. Phylogenetic tree analysis PhylomeDB is a collection of phylogenetic trees that have been precalculated automatically with a variety of options for a wide range of species [24]. We queried PhylomeDB and downloaded the phylogenetic trees that included CysQ protein. Pathway and ortholog analysis CysQ enzyme was found in sulfate assimilation on sulfur metabolism of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The list of organisms that harbored this pathway was compiled from KEGG orthology (KO) (http://www.genome.jp/kegg/ko.html) [25]. Results BLAST for sequence similarity The BLAST run using the query sequence of Cryptosporidium parvum reported the proteins of the genus Cryptosporidium as the best hits, followed by those of the order Eucoccidiorida, the genus Cryptosporidium belongs to. The next best hits belonged mostly to Gammaproteobacteria. Their gene descriptions corresponded to one of the alternative names of CysQ protein or a member protein of the inositol monophosphatase family, except for unclassified proteins and hypothetical proteins. The best bacterial hit had an identity of 40% and a bit score of 181 bits for the query sequence of 341 amino acids. From the taxonomy report of the BLAST result, 11 organisms among 110 were eukaryotes, and the other 99 were bacteria. The bacterial list was composed of 59 Proteobacteria species, including 53 Gammaproteobacteria and 31 Bacteroidetes species. Phylogenetic analyses While the BLAST analysis hinted HGT of the cysQ gene from bacteria to C. parvum, the hypothesis should be confirmed by phylogenetic analysis. A phylogenetic tree for CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree made by the E. coli protein sequence as a seed and maximum likelihood method with the Jones-Taylor-Thornton (JTT) evolutionary model. The phylogenetic tree with 170 orthologs comprised three eukaryotes-C. parvum, Arabidopsis thaliana, and Oryza sativa-one Archaea, and 166 Bacteria species. In the tree, C. parvum was branched with Proteobacteria, while the plantal proteins were the outgroup of prokaryotic proteins. In OrthoMCL (http://orthomcl.org), CysQ of C. parvum was located within the inositol monophosphatase family of Pfam (entry name OG5_129356) [26]. This ortholog group has only 70 orthologs from 54 different species and paralogs of Viridiplantae or T. vaginalis. Moreover, it included a larger portion of plants and fungi rather than bacteria, and no metazoan protein orthologs were included. Unlike PhylomeDB or OrthoMCL, the CDD of NCBI cataloged proteins sharing CysQ or related domains comprehensively. CysQ protein of C. parvum contains a CysQ domain (accession no. cd01638), which is one of the children of the Fig (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The Fig superfamily is a metal-dependent phosphatase that organizes two subsets of direct children in the hierarchy of the superfamily: FBPase glpX domain (cd01516) and IMPase-like domain (cd01637). Cd01637 has 9 children domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPaselike 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd10640), FBPase (cd00354), Arch FBPase 1 (cd 01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd10517). The whole hierarchy tree of the Fig superfamily comprises a total of 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 Archaea (Fig. 1A). Some domains (cd01516, cd01637, cd01638, cd01641, and cd0643) comprise predominantly bacterial proteins in their CDTree, whereas the other domains have a combined composition (cd000354, cd0517, and cd01639) or a high level of Archaea (cd01642 and cd01515). Domains cd01638, cd01641, and cd01643 are bacterial members of the IMPase family. All of them show a high proportion of Proteobacteria, at about 65%, 50%, and 43% respectively. In cd01638, C. parvum CysQ protein is located within the monophyletic gram-negative subtree, ranging from Pseudomonas sringae, Gammaproteobacteria, to Campylobacter jejuni, Epsilonproteobacteria (Fig. 1B). On the other hand, the gram-negative subtree is paraphyletic, in that it has 27 branches of Proteobacteria and Aquificae, Cyanobacteria, and Bacteroidetes, respectively. Taken together, the phylogenetic analysis strongly supports the hypothesis that the cysQ gene of C. parvum may have been acquired from Proteobacteria by horizontal gene transfer. Orthologs on sulfate assimilation pathway CysQ protein participates in sulfate assimilation on sulfur metabolism. In Fig. 2, we show a simplified version of the KEGG pathway, classifying the enzymes into three groups, according to their direction and steps: Class I for EC 2.7.7.4 (CysN) and EC 2.7.7.5 (CysD); Class II for EC 2.7.1.25 (CysC); and Class III for EC 3.1.3.7 (CysQ). If CysQ of C. parvum is a true CysQ enzyme, playing a role in sulfate assimilation in the parasite, the other components of the pathway should be present in it. On the contrary, we could not identify such genes in the annotated gene list. The KEGG pathway did not list C. parvum proteins in the sulfate assimilation pathway. We looked for the C. parvum proteins by searching the genome sequence using TBLASTN with M. tuberculosis CysN (Rv1286) and CysD (Rv1285) and E. coli CysN (b2751), CysD (b2752), and CysC (b2750) proteins as queries. Among class I and II proteins, only CysN showed marginal matches to cgd6_3990 (29% and 33% identities, respectively) to M. tuberculosis and E. coli sequences. Interestingly, this C. parvum protein was reported as elongation factor 1 alpha, not a sulfate adenylyltransferase. This protein had high similarities to other protozoan or fungal elongation factor 1 alpha proteins. Thus, we consider this as a false hit. The class III protein, CysQ, matched to cgd2_1810 (24% and 36% identities, respectively, for M. tuberculosis and E. coli proteins). This C. parvum gene was annotated "CysQ, sulfite synthesis pathway protein." As no other components of the sulfate assimilation pathway, except for CysQ, are found in C. parvum, we may conclude that the pathway does not function in this organism. We compiled the orthologs of the genes in this pathway using the KO database (Table 1). Eukaryotic kingdoms, except for protists harbored full ranges of orthologs in all three classes. Animals and plants showed similar trends in Class I and II, because two classes shared two orthologs (K13811: 3'-phosphoadenosine 5'-phosphosulfate synthase [PAPSS], K00955: bifunctional enzyme CysN/CysC [CysNC]), and even K13811 is specialized in animals and plants. Fungi also have many orthologs, like animals and plants, in Class I and II, but they have different orthologs (Class I, K00958, sulfate adenylyltransferase [E2.7.7.4C, met3]; Class II, K00860, adenylylsulfatekinase [CysC]). In prokaryotes, the proportion of Class I genes is higher than Class II. All Cyanobacteria, two-thirds of Proteobacteria, and Actinobacteria contained one of the orthologs in Class I, whereas Firmicutes, other bacteria, and the Archaea group have a few orthologs in Class I, II, and III. On the other hand, there were very few orthologs of the sulfate assimilation pathway in protists. We expanded the protist lineage, cataloging the proteins at the class or species level (Table 2). Some protists (Choanoflagellates, Entamoeba of Amoebozoa, and Diatoms) had at least one orthologous gene in each of three classes, while most Alveolates, Amoeboflagellate, Euglenozoa, and Diplomonads did not have any orthologs in three classes, many of which are known as parasites causing infectious diseases. While the sulfate assimilation pathway is generally well conserved in both prokaryotes and eukaryotes, in some protist lineages, the pathway is missing. Thus, we hypothesize that the pathway may have been lost during the evolution of the lineages. C. parvum, like other Aveolates, also may have lost it, and cge2_1810 can not function as CysQ properly. Its function remains elusive, as the sequence similarity to CysQ of M. tuberculosis or E. coli is rather low. Discussion A BLAST search of C. parvum CysQ (cgd2_1810) protein shows the highest similarity with those of proteobacteria. Although it implies HGT from bacteria to this eukaryote, sequence similarity alone is not enough as its basis for several known reasons [27]. In addition, a phylogenetic analysis should support it. For this, initially, we relied on the tree built by PhylomeDB. However, its species coverage was biased or undersampled. On the other hand, CDD of NCBI is well subdivided into kingdom and function groups. C. parvum CysQ protein was mapped into the subfamily of IMPase, which is a bacterial CysQ domain, and comprises only bacterial sequences. Furthermore, C. parvum was located near Gamma- and Alphaproteobacteria in the CDTree. Hence, it seems that these results demonstrate that gene transfer events occurred from bacteria to C. parvum in the evolutionary process. On the KEGG pathway, we raise the possibility that Alveolates, Euglenozoa, and Diplomonads of protozoa suffered from the losses of genes in the sulfate assimilation pathway. But, did C. parvum recover CysQ protein by HGT in the process of evolution? Sulfate assimilation shows highly conserved orthologs for each taxonomy lineage, and it plays important roles in sulfur metabolism, whereas Alveolates of protozoa, including Cryptosporidium, rarely have orthologous genes. For the pathogenic bacterium M. tuberculosis, sulfur-containing metabolites are essential to its pathogenesis and persistence in the host [18, 28], and the Database of Essential Genes (DEG) lists that Rv1286 (cysN) and Rv1285 (cysD), not Rv2131c (cysQ), are essential genes in sulfur metabolism of M. tuberculosis [29, 30]. Parasitic protozoa have diverse sulfur-containing amino acid metabolism that are considered to affect virulence and several stress response. On the other hand, C. parvum and Plasmodium falciparum lack a sulfur assimilation pathway, which is expected to be substituted from host cells [31]. In conclusion, although the sulfate assimilation pathway is missing in some protest lineages, C. parvum has a protein that is predicted as CysQ and has sequence similarity with that of proteobacteria, gram-negative bacteria. Moreover, the phylogenetic analysis supports the acquisition of cgd2_1810 from proteobacteria through horizontal gene transfer. Therefore, we can infer that C. parvum lost its genes in the sulfate assimilation pathway, including cysQ, during a parasitic way of life, and it acquired a copy of cysQ from bacteria by horizontal gene transfer. What is the biological role of this gene product? As the sole member, without other members, of the pathway, it can not assume the right role of CysQ. Its function is elusive at the moment.
Section	Introduction Horizontal gene transfer (HGT) is the movement of genetic material between different species of interkingdom, and lateral gene transfer is for intrakingdom movement [1]. The concept for gene transfer was first mentioned for an acquisition of virulence between bacterial strains [2], and then, transfers of multiple drug resistance between Shigellae and Escherichia coli strains were reported in Japan [3]. It is important to note that HGT not only affects correct reconstruction of a phylogenetic tree but also helps to understand reasons of its occurrence. In bacteria, it is widely known that genetic material of antibiotic resistance is transferred in the gastrointestinal tract [4], and in the case of unicellular eukaryotes, such as Giardia lamblia, Trichomonas vaginalis, and Entamoeba histolytica, the organisms that use fermentation or anaerobic metabolism in the low-oxygen environment overcame the environmental stress by taking genes from bacteria [5]. Moreover, the transferred gene plays a positive role in adaptation to a pathogenic way of life [6]. On the other hand, HGT has been criticized, in that its biological significance is overemphasized. If a gene has an essential role and participates in many interactions, its transfer might be less likely to occur than that of others or would be detrimental to the recipient. Thus, some transferred genes are considered nonfunctional [7-9]. Cryptosporidium is a parasitic protozoan of the phylum Apicomplexa. Among Cryptosporidium, C. parvum causes an infectious disease in humans and animals with diarrhea, called cryptosporidiosis. Although the disease is prevalent where water quality is poorly managed, there is no satisfactory treatment until now [10, 11]. Genome sequencing of C. parvum was completed in 2004, identifying major metabolic pathways through comparison with other parasites and also reporting those enzymes with high similarities to bacterial and plantal counterparts [12]. In some crucial biosynthesis pathways, C. parvum has enzymes that originated from various organisms, such as bacteria, plants, and algae. Phylogenomic analyses predicted a set of genes transferred from algae and eubacteria [13] and promising drug targets in nucleotide biosynthesis [14]. However, among 14 bacterial-like enzymes that were reported by Abrahamsen et al., only two enzymes received follow-up attention [15, 16]. CysQ, 3'-phosphoadenosine-5'-phosphatase, also known as 3'-phosphoadenosine-5'-phosphosulfate (PAPS) 3'-phosphatase, or 3'(2'), 5'-bisphosphate nucleotidase, was among the 14 bacterial-like enzymes initially reported by the genome analysis. It has been thought that it is needed during aerobic growth in E. coli to help control the levels of PAPS in cysteine biosynthesis [17]. Recently, CysQ protein has been considered as an important regulator that modulates the sulfate assimilation pathway by affecting levels of intermediates in plants, fungi, and bacteria [18, 19]. Despite its biological importance, no follow-up phylogenetic analysis of CysQ in C. parvum has been reported. In this study, we assumed that CysQ might have been transferred from bacteria to C. parvum by horizontal gene transfer. We constructed phylogenetic trees, based on a conserved domain of the protein, and inferred HGT from the phylogenetic incongruence.
Title	Introduction
Section	Methods Sequence source Cryptosporidium parvum strain Iowa type II (NCBI taxonomy accession 353152) was chosen for this study. One of its genes, cgd2_1810, encodes CysQ, a sulfite synthesis pathway protein (accession no. XP_001388206) [12]. The protein sequence was retrieved from the NCBI Protein Database (http://www.ncbi.nlm.nih.gov/protein). BLAST search Sequence similarity searches were performed using BLASTP 2.2.26+ [20] with C. parvum CysQ protein against a nonredundant protein sequence database. TBLASTN 2.2.26+ [21] was performed in order to search for the orthologs of the sulfur metabolism pathway in the C. parvum genome sequence using E. coli and Mycobacterium tuberculosis protein sequences as queries. Conserved domain analysis We obtained information of conserved domains using the NCBI online Conserved Domain-search tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 23]. After the matching CDD model and the corresponding Conserved Domain Tree (CDTree) were identified, the CysQ sequence was added to the matched model, and the corresponding CDTree was then recalculated. Phylogenetic tree analysis PhylomeDB is a collection of phylogenetic trees that have been precalculated automatically with a variety of options for a wide range of species [24]. We queried PhylomeDB and downloaded the phylogenetic trees that included CysQ protein. Pathway and ortholog analysis CysQ enzyme was found in sulfate assimilation on sulfur metabolism of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The list of organisms that harbored this pathway was compiled from KEGG orthology (KO) (http://www.genome.jp/kegg/ko.html) [25].
Title	Methods
Section	Sequence source Cryptosporidium parvum strain Iowa type II (NCBI taxonomy accession 353152) was chosen for this study. One of its genes, cgd2_1810, encodes CysQ, a sulfite synthesis pathway protein (accession no. XP_001388206) [12]. The protein sequence was retrieved from the NCBI Protein Database (http://www.ncbi.nlm.nih.gov/protein).
Title	Sequence source
Section	BLAST search Sequence similarity searches were performed using BLASTP 2.2.26+ [20] with C. parvum CysQ protein against a nonredundant protein sequence database. TBLASTN 2.2.26+ [21] was performed in order to search for the orthologs of the sulfur metabolism pathway in the C. parvum genome sequence using E. coli and Mycobacterium tuberculosis protein sequences as queries.
Title	BLAST search
Section	Conserved domain analysis We obtained information of conserved domains using the NCBI online Conserved Domain-search tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 23]. After the matching CDD model and the corresponding Conserved Domain Tree (CDTree) were identified, the CysQ sequence was added to the matched model, and the corresponding CDTree was then recalculated.
Title	Conserved domain analysis
Section	Phylogenetic tree analysis PhylomeDB is a collection of phylogenetic trees that have been precalculated automatically with a variety of options for a wide range of species [24]. We queried PhylomeDB and downloaded the phylogenetic trees that included CysQ protein.
Title	Phylogenetic tree analysis
Section	Pathway and ortholog analysis CysQ enzyme was found in sulfate assimilation on sulfur metabolism of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The list of organisms that harbored this pathway was compiled from KEGG orthology (KO) (http://www.genome.jp/kegg/ko.html) [25].
Title	Pathway and ortholog analysis
Section	Results BLAST for sequence similarity The BLAST run using the query sequence of Cryptosporidium parvum reported the proteins of the genus Cryptosporidium as the best hits, followed by those of the order Eucoccidiorida, the genus Cryptosporidium belongs to. The next best hits belonged mostly to Gammaproteobacteria. Their gene descriptions corresponded to one of the alternative names of CysQ protein or a member protein of the inositol monophosphatase family, except for unclassified proteins and hypothetical proteins. The best bacterial hit had an identity of 40% and a bit score of 181 bits for the query sequence of 341 amino acids. From the taxonomy report of the BLAST result, 11 organisms among 110 were eukaryotes, and the other 99 were bacteria. The bacterial list was composed of 59 Proteobacteria species, including 53 Gammaproteobacteria and 31 Bacteroidetes species. Phylogenetic analyses While the BLAST analysis hinted HGT of the cysQ gene from bacteria to C. parvum, the hypothesis should be confirmed by phylogenetic analysis. A phylogenetic tree for CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree made by the E. coli protein sequence as a seed and maximum likelihood method with the Jones-Taylor-Thornton (JTT) evolutionary model. The phylogenetic tree with 170 orthologs comprised three eukaryotes-C. parvum, Arabidopsis thaliana, and Oryza sativa-one Archaea, and 166 Bacteria species. In the tree, C. parvum was branched with Proteobacteria, while the plantal proteins were the outgroup of prokaryotic proteins. In OrthoMCL (http://orthomcl.org), CysQ of C. parvum was located within the inositol monophosphatase family of Pfam (entry name OG5_129356) [26]. This ortholog group has only 70 orthologs from 54 different species and paralogs of Viridiplantae or T. vaginalis. Moreover, it included a larger portion of plants and fungi rather than bacteria, and no metazoan protein orthologs were included. Unlike PhylomeDB or OrthoMCL, the CDD of NCBI cataloged proteins sharing CysQ or related domains comprehensively. CysQ protein of C. parvum contains a CysQ domain (accession no. cd01638), which is one of the children of the Fig (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The Fig superfamily is a metal-dependent phosphatase that organizes two subsets of direct children in the hierarchy of the superfamily: FBPase glpX domain (cd01516) and IMPase-like domain (cd01637). Cd01637 has 9 children domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPaselike 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd10640), FBPase (cd00354), Arch FBPase 1 (cd 01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd10517). The whole hierarchy tree of the Fig superfamily comprises a total of 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 Archaea (Fig. 1A). Some domains (cd01516, cd01637, cd01638, cd01641, and cd0643) comprise predominantly bacterial proteins in their CDTree, whereas the other domains have a combined composition (cd000354, cd0517, and cd01639) or a high level of Archaea (cd01642 and cd01515). Domains cd01638, cd01641, and cd01643 are bacterial members of the IMPase family. All of them show a high proportion of Proteobacteria, at about 65%, 50%, and 43% respectively. In cd01638, C. parvum CysQ protein is located within the monophyletic gram-negative subtree, ranging from Pseudomonas sringae, Gammaproteobacteria, to Campylobacter jejuni, Epsilonproteobacteria (Fig. 1B). On the other hand, the gram-negative subtree is paraphyletic, in that it has 27 branches of Proteobacteria and Aquificae, Cyanobacteria, and Bacteroidetes, respectively. Taken together, the phylogenetic analysis strongly supports the hypothesis that the cysQ gene of C. parvum may have been acquired from Proteobacteria by horizontal gene transfer. Orthologs on sulfate assimilation pathway CysQ protein participates in sulfate assimilation on sulfur metabolism. In Fig. 2, we show a simplified version of the KEGG pathway, classifying the enzymes into three groups, according to their direction and steps: Class I for EC 2.7.7.4 (CysN) and EC 2.7.7.5 (CysD); Class II for EC 2.7.1.25 (CysC); and Class III for EC 3.1.3.7 (CysQ). If CysQ of C. parvum is a true CysQ enzyme, playing a role in sulfate assimilation in the parasite, the other components of the pathway should be present in it. On the contrary, we could not identify such genes in the annotated gene list. The KEGG pathway did not list C. parvum proteins in the sulfate assimilation pathway. We looked for the C. parvum proteins by searching the genome sequence using TBLASTN with M. tuberculosis CysN (Rv1286) and CysD (Rv1285) and E. coli CysN (b2751), CysD (b2752), and CysC (b2750) proteins as queries. Among class I and II proteins, only CysN showed marginal matches to cgd6_3990 (29% and 33% identities, respectively) to M. tuberculosis and E. coli sequences. Interestingly, this C. parvum protein was reported as elongation factor 1 alpha, not a sulfate adenylyltransferase. This protein had high similarities to other protozoan or fungal elongation factor 1 alpha proteins. Thus, we consider this as a false hit. The class III protein, CysQ, matched to cgd2_1810 (24% and 36% identities, respectively, for M. tuberculosis and E. coli proteins). This C. parvum gene was annotated "CysQ, sulfite synthesis pathway protein." As no other components of the sulfate assimilation pathway, except for CysQ, are found in C. parvum, we may conclude that the pathway does not function in this organism. We compiled the orthologs of the genes in this pathway using the KO database (Table 1). Eukaryotic kingdoms, except for protists harbored full ranges of orthologs in all three classes. Animals and plants showed similar trends in Class I and II, because two classes shared two orthologs (K13811: 3'-phosphoadenosine 5'-phosphosulfate synthase [PAPSS], K00955: bifunctional enzyme CysN/CysC [CysNC]), and even K13811 is specialized in animals and plants. Fungi also have many orthologs, like animals and plants, in Class I and II, but they have different orthologs (Class I, K00958, sulfate adenylyltransferase [E2.7.7.4C, met3]; Class II, K00860, adenylylsulfatekinase [CysC]). In prokaryotes, the proportion of Class I genes is higher than Class II. All Cyanobacteria, two-thirds of Proteobacteria, and Actinobacteria contained one of the orthologs in Class I, whereas Firmicutes, other bacteria, and the Archaea group have a few orthologs in Class I, II, and III. On the other hand, there were very few orthologs of the sulfate assimilation pathway in protists. We expanded the protist lineage, cataloging the proteins at the class or species level (Table 2). Some protists (Choanoflagellates, Entamoeba of Amoebozoa, and Diatoms) had at least one orthologous gene in each of three classes, while most Alveolates, Amoeboflagellate, Euglenozoa, and Diplomonads did not have any orthologs in three classes, many of which are known as parasites causing infectious diseases. While the sulfate assimilation pathway is generally well conserved in both prokaryotes and eukaryotes, in some protist lineages, the pathway is missing. Thus, we hypothesize that the pathway may have been lost during the evolution of the lineages. C. parvum, like other Aveolates, also may have lost it, and cge2_1810 can not function as CysQ properly. Its function remains elusive, as the sequence similarity to CysQ of M. tuberculosis or E. coli is rather low.
Title	Results
Section	BLAST for sequence similarity The BLAST run using the query sequence of Cryptosporidium parvum reported the proteins of the genus Cryptosporidium as the best hits, followed by those of the order Eucoccidiorida, the genus Cryptosporidium belongs to. The next best hits belonged mostly to Gammaproteobacteria. Their gene descriptions corresponded to one of the alternative names of CysQ protein or a member protein of the inositol monophosphatase family, except for unclassified proteins and hypothetical proteins. The best bacterial hit had an identity of 40% and a bit score of 181 bits for the query sequence of 341 amino acids. From the taxonomy report of the BLAST result, 11 organisms among 110 were eukaryotes, and the other 99 were bacteria. The bacterial list was composed of 59 Proteobacteria species, including 53 Gammaproteobacteria and 31 Bacteroidetes species.
Title	BLAST for sequence similarity
Section	Phylogenetic analyses While the BLAST analysis hinted HGT of the cysQ gene from bacteria to C. parvum, the hypothesis should be confirmed by phylogenetic analysis. A phylogenetic tree for CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree made by the E. coli protein sequence as a seed and maximum likelihood method with the Jones-Taylor-Thornton (JTT) evolutionary model. The phylogenetic tree with 170 orthologs comprised three eukaryotes-C. parvum, Arabidopsis thaliana, and Oryza sativa-one Archaea, and 166 Bacteria species. In the tree, C. parvum was branched with Proteobacteria, while the plantal proteins were the outgroup of prokaryotic proteins. In OrthoMCL (http://orthomcl.org), CysQ of C. parvum was located within the inositol monophosphatase family of Pfam (entry name OG5_129356) [26]. This ortholog group has only 70 orthologs from 54 different species and paralogs of Viridiplantae or T. vaginalis. Moreover, it included a larger portion of plants and fungi rather than bacteria, and no metazoan protein orthologs were included. Unlike PhylomeDB or OrthoMCL, the CDD of NCBI cataloged proteins sharing CysQ or related domains comprehensively. CysQ protein of C. parvum contains a CysQ domain (accession no. cd01638), which is one of the children of the Fig (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The Fig superfamily is a metal-dependent phosphatase that organizes two subsets of direct children in the hierarchy of the superfamily: FBPase glpX domain (cd01516) and IMPase-like domain (cd01637). Cd01637 has 9 children domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPaselike 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd10640), FBPase (cd00354), Arch FBPase 1 (cd 01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd10517). The whole hierarchy tree of the Fig superfamily comprises a total of 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 Archaea (Fig. 1A). Some domains (cd01516, cd01637, cd01638, cd01641, and cd0643) comprise predominantly bacterial proteins in their CDTree, whereas the other domains have a combined composition (cd000354, cd0517, and cd01639) or a high level of Archaea (cd01642 and cd01515). Domains cd01638, cd01641, and cd01643 are bacterial members of the IMPase family. All of them show a high proportion of Proteobacteria, at about 65%, 50%, and 43% respectively. In cd01638, C. parvum CysQ protein is located within the monophyletic gram-negative subtree, ranging from Pseudomonas sringae, Gammaproteobacteria, to Campylobacter jejuni, Epsilonproteobacteria (Fig. 1B). On the other hand, the gram-negative subtree is paraphyletic, in that it has 27 branches of Proteobacteria and Aquificae, Cyanobacteria, and Bacteroidetes, respectively. Taken together, the phylogenetic analysis strongly supports the hypothesis that the cysQ gene of C. parvum may have been acquired from Proteobacteria by horizontal gene transfer.
Title	Phylogenetic analyses
Section	Orthologs on sulfate assimilation pathway CysQ protein participates in sulfate assimilation on sulfur metabolism. In Fig. 2, we show a simplified version of the KEGG pathway, classifying the enzymes into three groups, according to their direction and steps: Class I for EC 2.7.7.4 (CysN) and EC 2.7.7.5 (CysD); Class II for EC 2.7.1.25 (CysC); and Class III for EC 3.1.3.7 (CysQ). If CysQ of C. parvum is a true CysQ enzyme, playing a role in sulfate assimilation in the parasite, the other components of the pathway should be present in it. On the contrary, we could not identify such genes in the annotated gene list. The KEGG pathway did not list C. parvum proteins in the sulfate assimilation pathway. We looked for the C. parvum proteins by searching the genome sequence using TBLASTN with M. tuberculosis CysN (Rv1286) and CysD (Rv1285) and E. coli CysN (b2751), CysD (b2752), and CysC (b2750) proteins as queries. Among class I and II proteins, only CysN showed marginal matches to cgd6_3990 (29% and 33% identities, respectively) to M. tuberculosis and E. coli sequences. Interestingly, this C. parvum protein was reported as elongation factor 1 alpha, not a sulfate adenylyltransferase. This protein had high similarities to other protozoan or fungal elongation factor 1 alpha proteins. Thus, we consider this as a false hit. The class III protein, CysQ, matched to cgd2_1810 (24% and 36% identities, respectively, for M. tuberculosis and E. coli proteins). This C. parvum gene was annotated "CysQ, sulfite synthesis pathway protein." As no other components of the sulfate assimilation pathway, except for CysQ, are found in C. parvum, we may conclude that the pathway does not function in this organism. We compiled the orthologs of the genes in this pathway using the KO database (Table 1). Eukaryotic kingdoms, except for protists harbored full ranges of orthologs in all three classes. Animals and plants showed similar trends in Class I and II, because two classes shared two orthologs (K13811: 3'-phosphoadenosine 5'-phosphosulfate synthase [PAPSS], K00955: bifunctional enzyme CysN/CysC [CysNC]), and even K13811 is specialized in animals and plants. Fungi also have many orthologs, like animals and plants, in Class I and II, but they have different orthologs (Class I, K00958, sulfate adenylyltransferase [E2.7.7.4C, met3]; Class II, K00860, adenylylsulfatekinase [CysC]). In prokaryotes, the proportion of Class I genes is higher than Class II. All Cyanobacteria, two-thirds of Proteobacteria, and Actinobacteria contained one of the orthologs in Class I, whereas Firmicutes, other bacteria, and the Archaea group have a few orthologs in Class I, II, and III. On the other hand, there were very few orthologs of the sulfate assimilation pathway in protists. We expanded the protist lineage, cataloging the proteins at the class or species level (Table 2). Some protists (Choanoflagellates, Entamoeba of Amoebozoa, and Diatoms) had at least one orthologous gene in each of three classes, while most Alveolates, Amoeboflagellate, Euglenozoa, and Diplomonads did not have any orthologs in three classes, many of which are known as parasites causing infectious diseases. While the sulfate assimilation pathway is generally well conserved in both prokaryotes and eukaryotes, in some protist lineages, the pathway is missing. Thus, we hypothesize that the pathway may have been lost during the evolution of the lineages. C. parvum, like other Aveolates, also may have lost it, and cge2_1810 can not function as CysQ properly. Its function remains elusive, as the sequence similarity to CysQ of M. tuberculosis or E. coli is rather low.
Title	Orthologs on sulfate assimilation pathway
Section	Discussion A BLAST search of C. parvum CysQ (cgd2_1810) protein shows the highest similarity with those of proteobacteria. Although it implies HGT from bacteria to this eukaryote, sequence similarity alone is not enough as its basis for several known reasons [27]. In addition, a phylogenetic analysis should support it. For this, initially, we relied on the tree built by PhylomeDB. However, its species coverage was biased or undersampled. On the other hand, CDD of NCBI is well subdivided into kingdom and function groups. C. parvum CysQ protein was mapped into the subfamily of IMPase, which is a bacterial CysQ domain, and comprises only bacterial sequences. Furthermore, C. parvum was located near Gamma- and Alphaproteobacteria in the CDTree. Hence, it seems that these results demonstrate that gene transfer events occurred from bacteria to C. parvum in the evolutionary process. On the KEGG pathway, we raise the possibility that Alveolates, Euglenozoa, and Diplomonads of protozoa suffered from the losses of genes in the sulfate assimilation pathway. But, did C. parvum recover CysQ protein by HGT in the process of evolution? Sulfate assimilation shows highly conserved orthologs for each taxonomy lineage, and it plays important roles in sulfur metabolism, whereas Alveolates of protozoa, including Cryptosporidium, rarely have orthologous genes. For the pathogenic bacterium M. tuberculosis, sulfur-containing metabolites are essential to its pathogenesis and persistence in the host [18, 28], and the Database of Essential Genes (DEG) lists that Rv1286 (cysN) and Rv1285 (cysD), not Rv2131c (cysQ), are essential genes in sulfur metabolism of M. tuberculosis [29, 30]. Parasitic protozoa have diverse sulfur-containing amino acid metabolism that are considered to affect virulence and several stress response. On the other hand, C. parvum and Plasmodium falciparum lack a sulfur assimilation pathway, which is expected to be substituted from host cells [31]. In conclusion, although the sulfate assimilation pathway is missing in some protest lineages, C. parvum has a protein that is predicted as CysQ and has sequence similarity with that of proteobacteria, gram-negative bacteria. Moreover, the phylogenetic analysis supports the acquisition of cgd2_1810 from proteobacteria through horizontal gene transfer. Therefore, we can infer that C. parvum lost its genes in the sulfate assimilation pathway, including cysQ, during a parasitic way of life, and it acquired a copy of cysQ from bacteria by horizontal gene transfer. What is the biological role of this gene product? As the sole member, without other members, of the pathway, it can not assume the right role of CysQ. Its function is elusive at the moment.
Title	Discussion
Figure caption	Fig. 1 Conserved Domain Tree (CDTree) involving CysQ (cgd2_1810) protein. (A) Whole hierarchy of Fig superfamily. Branch colors indicate Eukaryota (pink), Bacteria (bright blue), and Archaea (orange) domains; Cryptosporidium parvum sequence is added (red branch and circle). On the right side, color bar marked with domain accession names displays subfamilies in Fig superfamily. All CDs except for cd01637 are well clustered. cd01637 is scattered around the tree and highlighted in black ticks. (B) CDTree of cd01638 domain. The cd01638 clade that includes only bacteria is classified by taxonomic lineage. Bacteria are subdivided into two distinct parts, gram-negative and gram-positive bacteria. Each bacterial phylum is colored as shown in legend. CysQ protein of C. parvum is shown in red dashed branch with a red circle near Proteobacteria.
Figure caption	Fig. 2 A simplified sulfate assimilation pathway. Inorganic sulfate is converted to adenosine 5'-phosphosulfate (APS) by ATP sulfurylase (EC 2.7.7.4 and/or EC 2.7.7.5, Class I). APS is phosphorylated by APS kinase (EC 2.7.1.25, Class II) to 3'-phosphoadenosine-5'-phosphosulfate (PAPS). PAPS is either transferred to 3'-phosphoadenosine-5'-phosphate (PAP) or reconverted to APS by 3'-phosphoadenosine-5'-phosphatase (EC 3.1.3.7, Class III).
Table caption	Table 1 Class comparison of ortholog conservation in cellular organismsa Values of conservation state are presented as percentage. aEach class (see the caption of Fig. 2 for abbreviations of classes) consists of several orthologs and it follows the reference pathway of KEGG Ortholog (KO); b+ symbol denotes the maximum percentage of KO that have orthologs in each Class: 0% < (+) ≤ 25%, 25% < (++) ≤ 50%, 50% < (+++) ≤ 75%, 75% < (++++) ≤ 100%; cMaximum percentage of KO; dAverage percentage of participating KOs in the pathway.
Table caption	Table 2 Protist orthologs involved in sulfate assimilation pathway aAbbreviations of ortholog are as follows: A (K13811), B (K00955), C (K00956), D (K00957), E (K00958), F (K00959), G (K00960), H (K01082), I (K15759), J (K15422).

Annnotations

blinded

PMC:3475487 JSONTXT 6 Projects

Document structure show

Annnotations

PMC:3475487 JSON TXT 6 Projects