Phylogenetic analyses
While the BLAST analysis hinted at HGT of the cysQ gene from bacteria to C. parvum, this hypothesis needs to be confirmed by phylogenetic analysis. A phylogenetic tree of the CysQ protein was retrieved from PhylomeDB. We chose the Phy0018DKQ_ECOL5 tree, which was built with the E. coli protein sequence as the seed using the maximum likelihood method under the Jones-Taylor-Thornton (JTT) evolutionary model. This tree of 170 orthologs comprises three eukaryotes (C. parvum, Arabidopsis thaliana, and Oryza sativa), one archaeon, and 166 bacterial species. In the tree, C. parvum CysQ clustered with Proteobacteria, whereas the plant proteins formed an outgroup to the prokaryotic proteins.
In OrthoMCL (http://orthomcl.org), C. parvum CysQ was assigned to ortholog group OG5_129356, which corresponds to the Pfam inositol monophosphatase family [26]. This ortholog group contains only 70 orthologs from 54 species, together with paralogs from Viridiplantae or T. vaginalis. Moreover, it consists largely of plant and fungal proteins rather than bacterial ones and contains no metazoan orthologs.
Unlike PhylomeDB and OrthoMCL, the NCBI Conserved Domain Database (CDD) comprehensively catalogs proteins that share the CysQ domain or related domains. The C. parvum CysQ protein contains a CysQ domain (accession no. cd01638), one of the child domains of the Fig (FBPase/inositol monophosphatase [IMPase]/glpX-like domain) superfamily. The Fig superfamily consists of metal-dependent phosphatases and has two direct child domains in its hierarchy: the FBPase glpX domain (cd01516) and the IMPase-like domain (cd01637). The cd01637 domain has nine child domains: CysQ (cd01638), IMPase (cd01639), bacterial IMPase-like 1 (cd01641), bacterial IMPase-like 2 (cd01643), IPPase (cd01640), FBPase (cd00354), Arch FBPase 1 (cd01515), Arch FBPase 2 (cd01642), and PAP phosphatase (cd01517). The complete hierarchy of the Fig superfamily covers 360 cellular organisms: 246 bacteria, 95 eukaryotes, and 19 archaea (Fig. 1A).
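The Fig superfamily hierarchy and its taxonomic composition can be illustrated with a short Python sketch. It assumes a hand-built dictionary that copies the parent/child relationships and organism counts stated in the text, keyed by domain name; the names `HIERARCHY`, `descendants`, and `composition` are hypothetical and do not correspond to any NCBI CDD file format or API.

```python
# Hypothetical, hand-built copy of the hierarchy described in the text
# (parent name -> child domain names). This is NOT an NCBI CDD data structure.
HIERARCHY = {
    "Fig superfamily": ["FBPase glpX", "IMPase-like"],
    "IMPase-like": [
        "CysQ", "IMPase", "bacterial IMPase-like 1", "bacterial IMPase-like 2",
        "IPPase", "FBPase", "Arch FBPase 1", "Arch FBPase 2", "PAP phosphatase",
    ],
}


def descendants(node: str) -> list[str]:
    """Depth-first list of every domain below `node` in the hierarchy."""
    result = []
    for child in HIERARCHY.get(node, []):
        result.append(child)
        result.extend(descendants(child))
    return result


def composition(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's share of the total organism count, in percent."""
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


# Organism counts reported for the whole Fig superfamily hierarchy (Fig. 1A).
fig_counts = {"Bacteria": 246, "Eukaryota": 95, "Archaea": 19}

print(len(HIERARCHY["IMPase-like"]))        # 9 child domains of IMPase-like
print(len(descendants("Fig superfamily")))  # 11 domains below the superfamily
print(sum(fig_counts.values()))             # 360 organisms in total
for group, pct in composition(fig_counts).items():
    print(f"{group}: {pct:.1f}%")           # 68.3%, 26.4%, 5.3%
```

The same `composition` helper applies to any per-superkingdom tally, for example the 3 eukaryotes, 1 archaeon, and 166 bacteria of the PhylomeDB tree.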
Some domains (cd01516, cd01637, cd01638, cd01641, and cd01643) consist predominantly of bacterial proteins in their CDTrees, whereas the others have a mixed composition (cd00354, cd01517, and cd01639) or a high proportion of archaeal proteins (cd01642 and cd01515). Domains cd01638, cd01641, and cd01643 are the bacterial members of the IMPase family, and all three show a high proportion of Proteobacteria (about 65%, 50%, and 43%, respectively). In the cd01638 CDTree, the C. parvum CysQ protein lies within a monophyletic gram-negative subtree that ranges from Pseudomonas syringae (Gammaproteobacteria) to Campylobacter jejuni (Epsilonproteobacteria) (Fig. 1B). Within this subtree, however, Proteobacteria are paraphyletic: the subtree contains 27 branches of Proteobacteria together with branches of Aquificae, Cyanobacteria, and Bacteroidetes. Taken together, the phylogenetic analyses strongly support the hypothesis that C. parvum acquired its cysQ gene from Proteobacteria by horizontal gene transfer.
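The monophyly argument can be made concrete with a small script. In a rooted tree, a set of taxa is monophyletic exactly when it equals the leaf set of some subtree. The sketch below parses a simple Newick string and applies that test; the tree string, leaf labels, and function names (`parse_newick`, `is_monophyletic`, `smallest_enclosing_clade`) are hypothetical toy examples, not the cd01638 CDTree, and the parser assumes unquoted labels and no Newick comments.

```python
def parse_newick(text: str):
    """Parse a simple Newick string into nested lists (leaves are label strings).

    Assumes unquoted labels and no comments; branch lengths and internal-node
    labels are read and discarded.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def skip_to_delimiter() -> None:
        # Advance past a label or ":branch_length" token up to the next , ) or ;
        nonlocal pos
        while pos < len(text) and text[pos] not in ",);":
            pos += 1

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
            skip_to_delimiter()           # drop internal label and branch length
            return children
        start = pos
        while text[pos] not in ":,);":
            pos += 1
        label = text[start:pos]
        skip_to_delimiter()               # drop ":branch_length" if present
        return label

    return parse_node()


def leaves(node) -> frozenset:
    """Leaf labels below `node`."""
    if isinstance(node, str):
        return frozenset({node})
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every subtree (clade) in the rooted tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, taxa) -> bool:
    """True if some clade contains exactly `taxa` and nothing else."""
    return any(clade == frozenset(taxa) for clade in clades(tree))


def smallest_enclosing_clade(tree, taxa) -> frozenset:
    """Leaf set of the most recent common ancestor's clade of `taxa`."""
    candidates = [c for c in clades(tree) if frozenset(taxa) <= c]
    return min(candidates, key=len)


# Toy rooted tree with hypothetical labels and branch lengths.
TOY_TREE = ("(((P_syringae:0.1,C_parvum:0.2):0.05,"
            "(Aquifex_sp:0.3,C_jejuni:0.1):0.1):0.2,B_subtilis:0.4);")
tree = parse_newick(TOY_TREE)
subtree = {"P_syringae", "C_parvum", "Aquifex_sp", "C_jejuni"}
proteobacteria = {"P_syringae", "C_jejuni"}

print(is_monophyletic(tree, subtree))         # True
print(is_monophyletic(tree, proteobacteria))  # False
print(sorted(smallest_enclosing_clade(tree, proteobacteria) - proteobacteria))
# ['Aquifex_sp', 'C_parvum']
```

In the toy tree, the four-leaf subtree containing C_parvum is monophyletic, while the two proteobacterial leaves are not, because their smallest enclosing clade also includes C_parvum and Aquifex_sp. For real trees with quoted labels or support values, an established library such as Biopython's Bio.Phylo would be a more robust choice than this hand-written parser.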