Results

BLAST for sequence similarity

The BLAST run using the Cryptosporidium parvum query sequence reported proteins of the genus Cryptosporidium as the best hits, followed by those of the order Eucoccidiorida, to which the genus Cryptosporidium belongs. The next best hits belonged mostly to Gammaproteobacteria. Their gene descriptions corresponded to one of the alternative names of CysQ protein or to a member of the inositol monophosphatase family, except for unclassified and hypothetical proteins. The best bacterial hit had an identity of 40% and a bit score of 181 bits for the query sequence of 341 amino acids. In the taxonomy report of the BLAST result, 11 of the 110 organisms were eukaryotes, and the other 99 were bacteria. The bacterial list comprised 59 Proteobacteria species (including 53 Gammaproteobacteria) and 31 Bacteroidetes species.

Phylogenetic analyses

While the BLAST analysis hinted at horizontal gene transfer (HGT) of the cysQ gene from bacteria to C. parvum, the hypothesis needed to be confirmed by phylogenetic analysis. A phylogenetic tree for CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree, which was built with the E. coli protein sequence as the seed, using the maximum likelihood method with the Jones-Taylor-Thornton (JTT) evolutionary model. The phylogenetic tree of 170 orthologs comprised three eukaryotes (C. parvum, Arabidopsis thaliana, and Oryza sativa), one archaeon, and 166 bacterial species. In the tree, C. parvum branched with Proteobacteria, while the plant proteins formed the outgroup of the prokaryotic proteins.

In OrthoMCL (http://orthomcl.org), CysQ of C. parvum was assigned to the ortholog group OG5_129356, which corresponds to the inositol monophosphatase family of Pfam [26]. This ortholog group has only 70 orthologs from 54 different species, with paralogs in Viridiplantae and T. vaginalis. Moreover, it included a larger proportion of plant and fungal proteins than bacterial ones, and it contained no metazoan orthologs.
Unlike PhylomeDB or OrthoMCL, the CDD of NCBI comprehensively catalogs proteins sharing CysQ or related domains. The CysQ protein of C. parvum contains a CysQ domain (accession no. cd01638), which is one of the children of the FIG (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The FIG superfamily consists of metal-dependent phosphatases and has two direct children in its hierarchy: the FBPase glpX domain (cd01516) and the IMPase-like domain (cd01637). cd01637 has nine child domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPase-like 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd01640), FBPase (cd00354), Arch FBPase 1 (cd01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd01517). The whole hierarchy tree of the FIG superfamily comprises a total of 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 Archaea (Fig. 1A). Some domains (cd01516, cd01637, cd01638, cd01641, and cd01643) comprise predominantly bacterial proteins in their CDTree, whereas the other domains have a mixed composition (cd00354, cd01517, and cd01639) or a high proportion of Archaea (cd01642 and cd01515).

Domains cd01638, cd01641, and cd01643 are the bacterial members of the IMPase family. All of them show a high proportion of Proteobacteria, at about 65%, 50%, and 43%, respectively. In cd01638, the C. parvum CysQ protein is located within the monophyletic gram-negative subtree, which ranges from Pseudomonas syringae (Gammaproteobacteria) to Campylobacter jejuni (Epsilonproteobacteria) (Fig. 1B). On the other hand, the gram-negative subtree is paraphyletic, in that it has 27 branches of Proteobacteria as well as branches of Aquificae, Cyanobacteria, and Bacteroidetes. Taken together, the phylogenetic analysis strongly supports the hypothesis that the cysQ gene of C. parvum may have been acquired from Proteobacteria by horizontal gene transfer.
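The taxonomic tallies quoted in the BLAST and phylogenetic analyses can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the counts are copied from the text, while the dictionary labels and the function names (`check_tallies`, `bacterial_fraction`) are illustrative assumptions and not part of any tool used in this study.

```python
# Taxon counts as reported in the text; labels and helper names are illustrative.
reported = {
    "BLAST taxonomy report": ({"eukaryotes": 11, "bacteria": 99}, 110),
    "PhylomeDB tree": ({"eukaryotes": 3, "archaea": 1, "bacteria": 166}, 170),
    "CDD hierarchy": ({"bacteria": 246, "eukaryotes": 95, "archaea": 19}, 360),
}


def check_tallies(data):
    """Return, per data set, whether the per-domain counts add up to the stated total."""
    return {name: sum(counts.values()) == total for name, (counts, total) in data.items()}


def bacterial_fraction(counts):
    """Fraction of organisms in a data set that are bacteria."""
    return counts["bacteria"] / sum(counts.values())


tallies = check_tallies(reported)
fractions = {name: round(bacterial_fraction(counts), 3) for name, (counts, _) in reported.items()}

for name in reported:
    print(f"{name}: totals consistent = {tallies[name]}, bacterial fraction = {fractions[name]}")
```

Under these counts, bacteria make up the large majority of each data set, which is the background against which the placement of the C. parvum protein among Proteobacteria is interpreted.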
Orthologs on sulfate assimilation pathway

CysQ protein participates in sulfate assimilation in sulfur metabolism. In Fig. 2, we show a simplified version of the KEGG pathway, classifying the enzymes into three groups according to their direction and steps: Class I for EC 2.7.7.4 (CysN) and EC 2.7.7.5 (CysD); Class II for EC 2.7.1.25 (CysC); and Class III for EC 3.1.3.7 (CysQ). If CysQ of C. parvum is a true CysQ enzyme that plays a role in sulfate assimilation in the parasite, the other components of the pathway should also be present. However, we could not identify such genes in the annotated gene list, and the KEGG pathway did not list any C. parvum proteins in the sulfate assimilation pathway.

We therefore searched the C. parvum genome sequence using TBLASTN with M. tuberculosis CysN (Rv1286) and CysD (Rv1285) and E. coli CysN (b2751), CysD (b2752), and CysC (b2750) proteins as queries. Among the class I and II proteins, only CysN showed marginal matches, both to cgd6_3990 (29% and 33% identity for the M. tuberculosis and E. coli sequences, respectively). Interestingly, this C. parvum protein is annotated as elongation factor 1 alpha, not as a sulfate adenylyltransferase, and it has high similarity to other protozoan or fungal elongation factor 1 alpha proteins. Thus, we consider it a false hit. The class III protein, CysQ, matched cgd2_1810 (24% and 36% identity for the M. tuberculosis and E. coli proteins, respectively). This C. parvum gene was annotated "CysQ, sulfite synthesis pathway protein." As no components of the sulfate assimilation pathway other than CysQ are found in C. parvum, we may conclude that the pathway does not function in this organism.

We compiled the orthologs of the genes in this pathway using the KO database (Table 1). Eukaryotic kingdoms, except for protists, harbored the full range of orthologs in all three classes.
Animals and plants showed similar trends in Class I and II, because the two classes shared two orthologs (K13811: 3'-phosphoadenosine 5'-phosphosulfate synthase [PAPSS], K00955: bifunctional enzyme CysN/CysC [CysNC]), and K13811 is even specific to animals and plants. Fungi, like animals and plants, also have many orthologs in Class I and II, but they are different orthologs (Class I, K00958, sulfate adenylyltransferase [E2.7.7.4C, met3]; Class II, K00860, adenylylsulfate kinase [CysC]). In prokaryotes, the proportion of Class I genes is higher than that of Class II genes. All Cyanobacteria, two-thirds of Proteobacteria, and Actinobacteria contained one of the orthologs in Class I, whereas Firmicutes, other bacteria, and the Archaea group have only a few orthologs in Class I, II, and III.

On the other hand, there were very few orthologs of the sulfate assimilation pathway in protists. We expanded the protist lineage, cataloging the proteins at the class or species level (Table 2). Some protists (Choanoflagellates, Entamoeba of Amoebozoa, and Diatoms) had at least one orthologous gene in each of the three classes, while most Alveolates, Amoeboflagellates, Euglenozoa, and Diplomonads, many of which are known as parasites causing infectious diseases, did not have any orthologs in the three classes. While the sulfate assimilation pathway is generally well conserved in both prokaryotes and eukaryotes, it is missing in some protist lineages. Thus, we hypothesize that the pathway may have been lost during the evolution of these lineages. C. parvum, like other Alveolates, may also have lost it, and cgd2_1810 cannot function properly as CysQ. Its function remains elusive, as its sequence similarity to CysQ of M. tuberculosis or E. coli is rather low.
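The reasoning behind the pathway assessment can be made explicit with a small sketch. It assumes, as a simplification, that the pathway is regarded as potentially functional only when at least one accepted ortholog is found in each of the three enzyme classes. The class definitions and the C. parvum hits follow the text, with the cgd6_3990 match discarded as a false hit; the function names and the second, fully populated organism are hypothetical and serve only as illustration.

```python
# Enzyme classes as defined in the text (Fig. 2).
PATHWAY_CLASSES = {
    "Class I": ["CysN", "CysD"],
    "Class II": ["CysC"],
    "Class III": ["CysQ"],
}

# Accepted C. parvum matches per enzyme, following the text: the CysN match to
# cgd6_3990 is excluded because the gene is annotated as elongation factor 1 alpha.
c_parvum_hits = {"CysN": [], "CysD": [], "CysC": [], "CysQ": ["cgd2_1810"]}

# Hypothetical organism with one placeholder gene per class, for contrast only.
hypothetical_hits = {"CysN": ["geneA"], "CysC": ["geneB"], "CysQ": ["geneC"]}


def classes_present(hits):
    """Map each class to True if at least one of its enzymes has an accepted hit."""
    return {
        cls: any(hits.get(enzyme) for enzyme in enzymes)
        for cls, enzymes in PATHWAY_CLASSES.items()
    }


def pathway_complete(hits):
    """Assumed criterion: every class must be represented by at least one hit."""
    return all(classes_present(hits).values())


print(classes_present(c_parvum_hits))
print("C. parvum pathway complete:", pathway_complete(c_parvum_hits))
print("Hypothetical organism complete:", pathway_complete(hypothetical_hits))
```

With the hits reported here, only Class III is represented in C. parvum, which is why the pathway is judged not to function in this organism under the assumed criterion.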