> top > docs > PMC:3082918 > spans > 0-128

PMC:3082918 / 0-128 JSONTXT

Insights into the evolution of Archaea and eukaryotic protein modifier systems revealed by the genome of a novel archaeal group Abstract The domain Archaea has historically been divided into two phyla, the Crenarchaeota and Euryarchaeota. Although regarded as members of the Crenarchaeota based on small subunit rRNA phylogeny, environmental genomics and efforts for cultivation have recently revealed two novel phyla/divisions in the Archaea; the ‘Thaumarchaeota’ and ‘Korarchaeota’. Here, we show the genome sequence of Candidatus ‘Caldiarchaeum subterraneum’ that represents an uncultivated crenarchaeotic group. A composite genome was reconstructed from a metagenomic library previously prepared from a microbial mat at a geothermal water stream of a sub-surface gold mine. The genome was found to be clearly distinct from those of the known phyla/divisions, Crenarchaeota (hyperthermophiles), Euryarchaeota, Thaumarchaeota and Korarchaeota. The unique traits suggest that this crenarchaeotic group can be considered as a novel archaeal phylum/division. Moreover, C. subterraneum harbors an ubiquitin-like protein modifier system consisting of Ub, E1, E2 and small Zn RING finger family protein with structural motifs specific to eukaryotic system proteins, a system clearly distinct from the prokaryote-type system recently identified in Haloferax and Mycobacterium. The presence of such a eukaryote-type system is unprecedented in prokaryotes, and indicates that a prototype of the eukaryotic protein modifier system is present in the Archaea. INTRODUCTION The Archaea have long been presumed to consist of two phyla, the Crenarchaeota and Euryarchaeota. However, it has been established that diverse uncultivated lineages of Archaea inhabit every niche on this planet (1). Recent metagenomic analyses have revealed that two previously uncultivated Archaea, the group I marine crenarchaeote Candidatus (Ca.) ‘Cenarchaeum symbiosum’ and the hyperthermophilic deeply branching Ca. ‘Korarchaeum cryptofilum’, harbor both Crenarchaeota- and Euryarchaeota-specific genomic traits (2–5). Based on their unique phylogenetic positions and distinct genomic features, it has been proposed that C. symbiosum represents a novel phylum/division ‘Thaumarchaeota’ (4). The unique genomic features of K. cryptofilum also support the proposal of ‘Korarchaeota’ whose phylogenetic position had been discussed only based on SSU rRNA gene phylogenetic analysis (5). The proposal of ‘Thaumarchaeota’ has further been supported by the genome sequences of the marine archaeon Ca. ‘Nitrosopumilus maritimus’ and the moderately thermophilic archaeon Ca. ‘Nitrososphaera gargensis’ (6–9). On the other hand, the phylum ‘Nanoarchaeota’, represented by the obligate symbiont Ca. ‘Nanoarchaeum equitans’, has been proposed based on SSU rRNA gene phylogeny (10), but a later study using its genomic information suggested that the archaeal group is a fast evolving group within the Euryarchaeota (11). Proteasome-mediated protein degradation coupled with protein modification with ubiquitin (Ub) is one of the hallmarks of eukaryotes (12). In eukaryotes, proteasome-mediated proteolysis is regulated by the Ub system, which is responsible for the conjugation of Ub to target proteins via the function of Ub-activating (E1), Ub-conjugating (E2) and Ub-protein ligating (E3) enzymes (12). Ub, E1 and E2 are members of distinct protein superfamilies that include structurally related proteins termed Ub-like (Ubl), E1-like (E1l) and E2-like (E2l) proteins, respectively. Although only distantly related to their eukaryotic counterparts, Ubl, E1l and E2l proteins are present in prokaryotes (13–15). For simplicity, based on primary structure, we will refer to these proteins as the ‘prokaryote-type’ Ubl, E1l and E2l proteins. In prokaryotes, some of the prokaryote-type Ubls and E1ls are responsible for sulfur incorporation in the biosynthesis of thiamine, molybdenum/tungstate cofactors and siderophores, while functions of other prokaryote-type proteins remain obscure (13,15). Recently, two proteasome-mediated proteolysis systems utilizing prokaryote-type proteins have been identified; the prokaryotic Ub-like protein (Pup)-proteasome system in Mycobacterium tuberculosis and the Ub-like small archaeal modifier proteins (SAMPs)-proteasome system in the halophilic archaeon Haloferax volcanii (16–18). In the Haloferax system, two prokaryote-type Ubls of the ThiS/MoaD family, which generally had been presumed to contribute in thiamine and molybdenum/tungstate cofactor biosynthesis together with prokaryote-type E1ls, have been shown to be involved in protein degradation via protein conjugation in the absence of E2/E3 homologs (16,18). These studies provided the first evidence that Ub–proteasome protein degradation occurs in Archaea and Bacteria. As these systems utilize prokaryote-type components, it is of increasing interest whether the origin of the eukaryote-type system resides in the prokaryotes. The Hot Water Crenarchaeotic Group I (HWCGI) comprises putative thermophiles that have been detected in high-temperature environments such as terrestrial surface and subsurface hot springs, and deep sea hydrothermal environments, but have not yet been cultivated (7,19–22). The phylogroup is known to occupy a relatively deep position within crenarchaeotic lineages but distinct from hyperthermophilic Crenarchaeota or Thaumarchaeota in SSU rRNA gene phylogenetic analyses (7,21,22). From a geothermal water stream in a subsurface gold mine, we previously found unusual mat formation dominated by uncultured crenarchaeotic lineages including members of HWCGI, and constructed a metagenomic library to elucidate the physiology and genomic traits of these crenarchaeotes (21). Here, we present a composite genome sequence of a member of HWCGI, Ca. ‘Caldiarchaeum subterraneum’, from the metagenomic library, and its unique genomic features that are distinct from previously reported archaeal genomes. In particular, the genome has revealed the presence of a eukaryote-type protein modifier system, a trait that had been believed to be inherent in Eucarya. The C. subterraneum genome harbors unique features that are distinct from previously reported archaeal genomes. The genome set provides clear insight into the biology of the novel deeply branching crenarchaeotic lineage, as well as the evolution of Archaea especially in the lineages which include the HWCGI, hyperthermophilic Crenarchaeota, Thaumarchaeota and Korarchaeota. MATERIALS AND METHODS Sampling, sample preparation and fosmid library construction Sampling, DNA isolation and fosmid library construction have been previously described (21). The microbial mat community, in which HWCGI dominated, was taken from a geothermal water stream located at a depth of 320 m from the ground surface from a subsurface mine in Japan. High-molecular DNA up to 50 kb was extracted from microbial mat formation, and fosmid library using pCC1FOS (EPICENTRE, Madison, WI, USA) vector was constructed. Resulting totally 5280 fosmid clones were stored as glycerol stock in 96-well microtiter dishes at −80°C. Screening for archaeal genome fragments encoding SSU rRNA gene Genome fragments encoding archaeal SSU rRNA genes in the metagenomic library were reexamined by dot-blot hybridization with a digoxigenin-labeled DNA probe and anti digoxigenin antibody coupled to alkaline phosphatase using a DNA labeling and detection kit (Roche, Basel, Switzerland). SSU rRNA genes amplified from the genome fragments 10-H-8 (HWCGI (C. subterraneum); AB201309) and 45-H-12 [HWCGIII (Nitrosocaldus sp.); AB201308] obtained previously (7,21) were used as DNA probes. Archaeal SSU rRNA genes in the fosmids acquired by the dot-blot hybridization were amplified by PCR using primers A21F and U1492R (23,24) and directly sequenced from both strands. Sequencing and enrichments of archaeal genome fragments, and annotation All fosmid clones in the metagenomic library were extracted from E. coli culture, and paired-end sequences of each cloned genomic fragment were sequenced using Big Dye ver. 3.1 sequencing kit (Applied Biosystems, Foster City, CA, USA) in accordance with the manufacturer’s recommendations by an ABI3730 DNA sequencer (Applied Biosystems). The end-sequences from cloned genomic fragments were analyzed by BLAST algorithm targeted to NCBI/EMBL/DDBJ database. On the other hand, as a part of metagenomic assessment for the whole microbial community (Takami et al., unpublished data), 151 fosmid clones; 15 clones encoding SSU rRNA gene and 136 clones were randomly selected and sequenced by the whole-genome random-sequencing method described previously using ABI 3730 and the MegaBase 1000 (GE Healthcare, Piscataway, NJ, USA) (25,26). Fifty-two fosmid clones encoding putative archaeal genome fragments were grouped into four individual pools containing equal weight of 13 fosmids. Each fosmid pool was analyzed in a half plate of the 454 DNA Genome Sequencer 20 (GS20) (Roche) at Takara Bio Inc. (Otsu, Japan). Large contigs obtained by 454 pyrosequencing were analyzed using BLAST algorithm targeted to genomic fragments encoding archaeal SSU rRNA genes reported previously (21), complete sequences of 151 fosmid clones analyzed by Sanger method (Takami et al., unpublished data) and end-sequences of the genome fragments in the metagenomic library. Based on the homology search using BLAST, large scaffolds containing large contigs from 454 sequencing, complete fosmid clone sequences and fosmid-end sequences were manually constructed. In the second round of 454 sequencing, a total of 80 fosmids involving genome fragments extending previously sequenced regions and putative archaeal genome fragments were separated into four groups each containing 20 fosmids. The 20 fosmids in each group were analyzed in a half plate of the 454 GS20. Large contigs obtained from a total of four runs of GS20 were analyzed by BLAST targeting fosmid sequences analyzed by Sanger sequencing and fosmid end-sequences from the metagenomic library. A single large scaffold was manually constructed. Gap-regions in the scaffold were amplified by PCR with appropriate fosmids as templates, and the amplified fragments were analyzed using an ABI 3130xl DNA sequencer. Assembly in overlapping regions and gap regions was accomplished with Sequencher ver. 4.7 software (Gene Codes Corp, Ann Arbor, MI, USA). Finally, the large circular scaffold was constructed by the fosmid clone 10-H-8 (AB201309) reported previously (21), and JFF001_H02 (AP011633), JFF004_H08 (AP011650), JFF011_H10 (AP011675), JFF016_D08 (AP011689), JFF022_F09 (AP011708), JFF029_E04 (AP011723), JFF029_F10 (AP011724), JFF030_F06 (AP011727), JFF037_B02 (AP011745), JFF040_C01 (AP011751), JFF055_C09 (AP011796) analyzed by Sanger method (Takami et al., unpublished data), and JFF001_G10 (AP011862), JFF002_G05 (AP011850), JFF004_B03 (AP011868), JFF005_B08 (AP011872), JFF008_E07 (AP011864), JFF009_A08 (AP011867), JFF009_F01 (AP011875), JFF009_F10 (AP011844), JFF011_A11 (AP011858, AP011859), JFF012_C01 (AP011870), JFF013_A09 (AP011845), JFF015_C06 (AP011842), JFF015_C07 (AP011830), JFF015_E11 (AP011831), JFF017_C01 (AP011851), JFF021_E09 (AP011873), JFF021_G03 (AP011856), JFF022_C07 (AP011838), JFF025_E12 (AP011827), JFF027_H06 (AP011834), JFF028_A01 (AP011854), JFF028_A10 (AP011876), JFF028_E01 (AP011852), JFF029_A12 (AP011865), JFF029_F08 (AP011836), JFF030_C12 (AP011869), JFF030_H11 (AP011855), JFF031_B05 (AP011861), JFF032_D08 (AP011843), JFF033_A05 (AP011857), JFF033_F07 (AP011840), JFF033_G03 (AP011849), JFF034_A01 (AP011853), JFF035_A09 (AP011828), JFF035_E02 (AP011848), JFF036_A12 (AP011839), JFF036_E03 (AP011833), JFF036_H04 (AP011837), JFF039_F10 (AP011846), JFF040_F12 (AP011871), JFF042_C08 (AP011829), JFF049_D05 (AP011863), JFF050_B05 (AP011866), JFF051_A09 (AP011832), JFF051_C10 (AP011826), JFF052_D03 (AP011874), JFF052_E01 (AP011841), JFF052_H05 (AP011847), JFF053_A03 (AP011860) and JFF055_E04 (AP011835) analyzed by the GS20 in this study. Numbers in parentheses following each fosmid clone are accession numbers in DDBJ/EMBL/GenBank database. The predicted ORFs were initially defined by Glimmer program (http://www.cbcb.umd.edu/software/glimmer/), and putative functions for predicted ORFs were identified by comparing against all non-redundant (NR) sequences deposited in the NCBI database using BLASTP (27). Truncated ORFs and frame shifts found in the initial BLASTP search were confirmed by re-sequencing by the Sanger method. Clusters of Orthologous Groups (COGs) (28), archaeal Clusters of Orthologous Groups (arCOGs) (29) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (30) databases were used for further functional information. For the comparison of genome core genes, publically available archaeal genome sequences in the arCOG database were used, and arCOGs in K. cryptofilum were referred to from Elkins et al. (5). Assignments of arCOGs for C. subterraneum and N. maritimus were performed under the following condition; the BLAST E-value threshold was set at 10−3, and the homologous region covers >70% of the hit sequences in arCOGs. Proteins that were putatively separated or fused compared to those in the databaes were manually concatenated or divided, and reexamined. Forty-six tRNA genes were identified by using tRNAscan-SE (31) with Archaea-specific search mode and SPLITSX (32) with the following parameters: –p 0.55 –f 0 –h 3. Clusters of regularly interspaced repeats (CRISPR) were identified using the CRISPR Finder (33). Phylogenetic analyses The small and large subunit rRNA gene alignments were constructed by ARB software (34). Then, concatenated alignments were constructed using only unambiguously aligned region for phylogenetic analysis. The maximum likelihood tree was computed by using the program package PhyML with HKY85 (35). The support values for the internal nodes were estimated from 100 bootstrap replicates. Protein sequences; RNAP subunits, ribosomal proteins, D-type DNA polymerase (DNAP) small and large subunits and elongation factor II (EFII) were aligned by using CLUSTAL W 1.8 program (36), and ambiguous regions were automatically trimmed according to Gblocks (37,38). Two concatenated alignments were constructed for the phylogenetic analyses of ribosomal proteins (L10, L10e, L11, L13, L14, L15, L15e, L18e, L19e, L2, L22, L3, L30, L44e, L4e, L5, L6, L7Ae, S10, S11, S13, S19, S19e, S2, S27e, S3, S3Ae, S4, S4e, S5, S6e, S7, S8, S8e, S9, S17, S17e, L1, L18, L24, L31e, L32e, S12, S15, L23) and RNAP subunits (RpoA′, RpoA′′, RpoB′, RpoB′′, RpoD, RpoE′, RpoH and RpoK), and concatenated (SSU+LSU) DNAP. Maximum likelihood trees were constructed using the program package RAxML with WAG+I+G (39). The support values for the internal nodes were estimated from 200 bootstrap replicates. Almost full length of ef2 sequence from the Nitrosocaldus sp. (HWCGIII) was obtained by PCR amplification from the DNA assemblage. A primer set (5′-AATNGCNCAYGTNGAYCAYGGMAARAC-3′, and 5′-GTCTCWGMTGCAGGTATCTC-3′) for the amplification of ef2 was constructed based on DNA alignments of ef2 from crenarchaeal lineages including partial ef2 sequence from the Nitrosocaldus sp. (HWCGIII) (31-F-01; GI 106364417) that were obtained from the metagenomic fosmid library used in this study. Alignments of Ub-like protein family, E1-like protein family, E2-like protein family and JAMM protease family shown in Figure 2 were constructed by ClustalX (40) and edited manually based on the previously reported secondary structures of each protein family (13–15,41–44). RESULTS Archaeal diversity within the metagenomic library As a result of dot blot hybridization and previous PCR screening, a total of 21 and three fosmids-encoding SSU rRNA genes of HWCGI and HWCGIII (Ca. ‘Nitrosocaldus’ sp.; SSU rRNA gene similarity between ammonia oxidizing thaumarchaeon Ca. ‘Nitrosocaldus yellowstonii’ (21) and the HWCGIII sequences in the metagenomic library [AB201308] was 95%) lineages, respectively, were obtained from the metagenomic library. Among the 21 fosmids-harboring HWCGI SSU rRNA genes, 19 SSU rRNA gene sequences belonged to ribotype I represented by the SSU rRNA gene included in the fosmid clone 10-H-08, while the other two sequences constituted another single ribotype. Here, we named the predominant HWCGI archaeon represented by the 10-H-08 SSU rRNA gene ribotype as Ca. ‘C. subterraneum’ (Caldiarchaeum type I) (‘calidus’ and ‘subterraneum’ meaning hot and underground, respectively) and the other minor HWCGI population as ‘Caldiarchaeum type II’. Similarity between the two ribotypes of Caldiarchaeum SSU rRNA gene sequences was 96.6%. Sixteen of the C. subterraneum SSU rRNA genes, each harbored two introns. Three orthologous sequences with 99% similarity were observed among the 16 sequences of the first intron, while five sequences with 95–99% similarity were found for the second intron. No diversity was present among all exon SSU rRNA gene sequences in the C. subterraneum SSU rRNA. Reconstruction of a composite genome In order to investigate the genomic properties of the metagenomic library, paired- or one-end sequences of the genome fragments were obtained from 3375 fosmid clones, and 151 fosmids (136 randomly selected fosmids and 15 fosmids encoding SSU rRNA gene) were analyzed by Sanger method (Takami et al., unpublished data). Among a total 5965 end-sequences from these cloned fragments, 883 end-sequences (∼13.5 % of total end-sequences) displayed highest similarity with sequences derived from Archaea. Among these ‘archaeal’ sequences, fosmids were selected for 454 sequencing based on the following two criteria: (i) the presence of paired-ends sequences predicted to encode open reading frames (ORFs) most similar to archaeal sequences; or (ii) the presence of ORFs in either end encoding homologues of archaeal translation, transcription or replication genes. Large contigs obtained by initial 454 sequencing of the 52 fosmids were manually assembled with the sequences from the 151 fosmids described above, two genome fragments-encoding archaeal SSU rRNA genes obtained previously (21) and the end-sequences of all fosmids, followed by a BLAST search. In this step, a scaffold of >1 Mb including the C. subterraneum SSU rRNA gene was assembled, but we did not find a large scaffold with other archaeal SSU rRNA genes. For the second round of 454 sequencing, 80 fosmids that met the following criteria were further analyzed: (i) linkage with the scaffold including the C. subterraneum SSU rRNA gene sequence; (ii) presence of paired-ends predicted to encode ORFs most similar to archaeal sequences; and (iii) presence of ORFs in either end showing high similarity with archaeal sequences. After the second 454 sequencing, large contigs obtained from 454 sequencing, fosmids analyzed by Sanger method and end-sequences were manually assembled and subjected to BLAST search. As a result, a circular scaffold including complete sequences of 12 fosmid clones analyzed by Sanger sequencing was obtained. The similarities of overlapping regions were generally >99%. Afterwards, gap-regions were obtained by PCR with appropriate fosmid clones as templates, and the amplified fragments were sequenced by Sanger method. Finally, a composite circular genome sequence of C. subterraneum (1 680 938 bp) was assembled from a set of 62 complete or partial fosmid sequences (Figure 1). We also obtained 28 complete or partial fosmid sequences derived from C. subterraneum, and 10 of them completely overlapped with the composite circular genome. However, 18 sequences harbored distinct insertion (a total of 68 kb)/deletion regions compared to the composite circular genome, or consisted of two genomic regions distantly located on the composite circular genome. The similarities of these regions with the circular genome were >99%. The genomic heterogeneity is likely the result of recombination or rearrangement within a species because we could not obtain any evidence of inter-species genomic recombination in the distinct insertion regions. Figure 1. Circular representation of the C. subterraneum composite genome. From the inside, the first and second circles show the GC skew (values >0 or <0 are indicated in green and pink, respectively) and the G+C percent content (values greater or smaller than the average percentage in the overall chromosome are shown in blue and sky blue, respectively) in a 10-kb window with 100-bp step, respectively. The third and fourth circles show the presence of RNAs (rRNA and tRNA); CDSs aligned in the clock-wise and counterclock-wise directions are indicated in the upper and lower sides of the circle, respectively. Colors of CDSs indicate their functional categories; red for information storage and processing, green for metabolism, blue for cellular processes and signaling, and gray for poorly characterized function. General features The G + C content of the genome from C. subterraneum is 51.6%. A single rRNA gene set is identified but rRNA genes do not form an operon structure in the composite genome. Forty-five tRNAs were identified. A total of 1730 predicted ORFs were detected. Among these, 1054 of the predicted protein-encoding sequences (CDSs) could be assigned a function, 352 of the CDSs could be identified as hypothetical conserved proteins and the remaining 324 CDSs did not show significant similarity to any of the amino acid sequences in the protein databases (Supplementary Table S1). Mobile genes The genome contains three genes encoding transposases of the IS6 family and one of the IS4 family. Both of these transposase families, originally found in Bacteria, are distributed only in the Euryarchaeota and not in the Crenarchaeota within the archaeal domain (45). Four clustered regularly interspaced short palindromic repeats (CRISPR) and one CRISPR-related gene cluster, presumed to provide resistance against virus infection, are present (46). The genome encodes one prophage-like gene cluster. DNA replication, repair, cell cycle Caldiarchaeum subterraneum carries three orc1/cdc6 orthologues and a single minichromosome maintenance protein. The genome encodes multiple DNA-dependent DNA polymerases including two family B type enzymes; the BII type found only in crenarchaeal lineages (47) and the inactivated type (48), and both the small and large subunits of a D-type enzyme (Table 1). Genes for the large and small subunits of replication factor C form a gene cluster. Single genes each encoding the small subunit of primase, sliding clamp (PCNA), ATP dependent ligase, RNase HII, flap endonuclease (FEN1) and ERCC4-like helicase, are present. Genes for one truncated and two complete large subunits of primase are found. Unlike the Hef protein found in P. furiosus that consists of ERCC4-like helicase (COG1111) and XPF protein domains (ERCC4-type nuclease), which is the case in most of the euryarchaeotes, both domains are located separately on the genome of C. subterraneum as observed in Thaumarchaeota and a minority of euryarchaeotes (8,49,50). The ERCC4-like helicase domain (COG1111) is absent from the genome of Korarchaeota (8). Both topoisomerase IA and IB were found in C. subterraneum as in the case of Thaumarchaeota (8) (Table 1). One reverse gyrase gene, which had been considered a genomic signature for hyperthermophiles, but now also detected in thermophiles, is observed (51–53). Genes for chromatin-associated proteins, two Alba and one histone, are present. The archaeon possesses genes for euryarchaeal chromosome segregation proteins including SMC family ATPase, chromosome segregation and condensation protein B and kleisin family Rec8/ScpA/Scc1-like protein (chromosome segregation and condensation protein A) in a single, operon-like structure. The genome harbors one gene for the cell division protein FtsZ. Among the newly identified crenarchaeal cell division proteins CdvA, CdvB and CdvC that have been identified in Thaumarchaeota and hyperthermophilic Crenarchaeota (with the exception of the Thermoproteales), CdvB and CdvC are present but a gene for CdvA is absent in C. subterraneum (8,54). Table 1. Distribution patterns of representative components for DNA replication/repair, cell division, translation and transcription among Crenarchaeota, Euryarchaeota, Thaumarchaeota, Korarchaeota and C. subterraneum +, present; −, absent. aCharacterization of DNA polymerase is based on Ref. (47). bOnly C-terminal domain is found in C. symbiosum and N. maritimus. cOnly found in Thermofilum pendens and Caldivirga maquilingensis. dFusion form is observed in Thermococcales and Thermoplasmatales. eOnly found in N. gargensis. The genome contains genes for double-strand-break repair, direct repair, base excision repair and nucleotide excision repair including photolyase and family Y DNA polymerase, which have previously been found only in Sulfolobales among the hyperthermophilic crenarchaeotes (55,56). However, XPB helicase for excision repair, mismatch detection proteins MutS and MutL, mismatch glycosylase MIG and bacterial nucleotide excision repair protein UvrABC are absent. Translation and transcription Forty-six tRNAs corresponding to all 61 sense codons and one initiator codon can be identified. Thirteen tRNAs are predicted to be intron-containing tRNAs and three out of the 13 harbor multiple introns (tRNALeu UAA, tRNAGln CUG, tRNAThr GGU). The introns are located not only at anticodon loop regions (canonical position) but also various non-canonical positions (D-arm, V-arm and T-arm), as observed in other crenarchaeal species (57,58). The BHB structure, a well-known motif of archaeal tRNA splicing, is found at exon–intron junctions of tRNA and the corresponding heterotetrameric splicing endonuclease can be identified. Aminoacyl tRNA synthetases for all of the amino acids are encoded in the genome except for the enzyme for glutaminyl tRNA synthesis, however, glutaminyl tRNA formation is likely dependent on heterodimeric glutamyl-tRNA amidotransferase (GatD and GatE). A selenocysteine incorporation system is lacking, resembling other genomes from crenarchaeal lineages (59). The archaeal DNA-dependent RNA polymerase in C. subterraneum lacks the orthologue of the eukaryotic subunit RPB8 found in the hyperthermophilic crenarchaeotes and Korarchaeota (5,60), and possesses all other subunits found in the Archaea. RpoA is not fragmented as in eukaryotes, and is similar to those of Thaumarchaeota and Korarchaeota (Table 1). An ortholog of the eukaryotic RNA polymerase III subunit RPC34 is also found in C. subterraneum as in the hyperthermophilic Crenarchaeota, Thaumarchaeota and some of the Euryarchaeota but not the Korarchaeota (61). Archaeal homologs related to transcriptional initiation such as transcription factor B (TFB), TATA-binding protein (TBP) and transcription factor E (TFE) are present. A complete set of 28 archaeal SSU ribosomal proteins are present, including S25e, S26e and S30e, that are absent in the Euryarchaeota (4,8,62) (Table 1). A total of 34 LSU ribosomal proteins are present. Although L39e is conserved in the Euryarchaeota and hyperthermophilic Crenarchaeota, L39e, along with L13e, L35ae, L38e, L41e and LXa (L20a/L18s), was not present on the genome. The absence of L13e is a euryarchaeal feature, and that of L35ae and LXa (L20a/L18a) is common to the Thaumarchaeota and Korarchaeota (4,5,8,62). The lack of L39e has also been noted in the Korarchaeota (4). We observed that L14e and L34e, which are not conserved in the Thaumarchaeota, are present on the C. subterraneum genome (Table 1). Energy metabolism The predicted gene set suggests the potential of chemolithotrophic growth in C. subterraneum using hydrogen or carbon monoxide as an electron donor, and oxygen, nitrate or nitrite as an electron acceptor. One Ni–Fe NADP-reducing hydrogenase and one potential aerobic type carbon monoxide dehydrogenase were detected. However, the hydrogenase is phylogenetically similar to those of heterotrophic organisms and potential aerobic type carbon monoxide dehydrogenase lacks biochemical evidence (21). In the respiratory chain, one set of complex II (succinate dehydrogenase), an incomplete complex I (NADH dehydrogenase), cytochrome b, rieske protein, heme-copper terminal oxidase, membrane-bound nitrate reductase and periplasmic nitrite reductase are each present. Genes for cytochrome b, rieske protein and potential cytochrome c are distributed separately on the genome. The subunit II of heme-copper terminal oxidase harbors copper-binding motif residues that are signatures of cytochrome c oxidase but not quinone oxidase (63). Central metabolism An almost complete Emden-Meyerhof pathway and complete tricarboxylic acid (TCA) cycle are present, but phosphofructokinase that is necessary for glycolysis is missing. ATP citrate lyase and its alternatives such as citryl-CoA synthetase and citryl-CoA lyase (64) are also lacking in the genome. Therefore, the reductive TCA cycle most likely does not function. Genes encoding enzymes for the Calvin–Benson cycle and reductive acetyl-CoA pathway are also not observed. Recently, two carbon assimilation pathways; the 3-hydroxypropionate/4-hydroxybutyrate cycle and the dicarboxylate/4-hydroxybutyrate cycle have been recognized in crenarchaeal lineages. The two cycles utilize distinct carbon dioxide/bicarbonate-fixing pathways to convert acetyl-CoA to succinyl-CoA, but share a common route in converting the succinyl-CoA to two acetyl-CoA molecules (65–69). Enzymes responsible for the conversion of acetyl-CoA and bicarbonate into succinyl-CoA in the 3-hydroxypropionate/4-hydroxybutyrate cycle, methylmalonyl-CoA epimerase, methylmalonyl-CoA mutase and biotin carboxylase and L-chain subunit of acetyl-CoA carboxylase (65,69), are not found in C. subterraneum . In contrast, all enzymes converting acetyl-CoA into succinyl-CoA by fixing carbon dioxide and bicarbonate in the dicarboxylate/4-hydroxybutyrate cycle are present. Intriguingly however, although all other enzymes necessary for the regeneration of acetyl-CoA from succinyl-CoA are present, the gene for 4-hydroxybutyryl-CoA dehydratase cannot be found on the genome. The organism does not have the non-oxidative pentose phosphate pathway that is required for standard pentose/nucleic acid biosynthesis. However, three alternative pathways that replace the non-oxidative pentose phosphate pathway can be identified; the ribulose monophosphate (RuMP) pathway that converts fructose 6-phosphate to ribulose 5-phosphate (70,71), the archaeal 2-deoxyribose 5-phophate aldolase (DERA) pathway that can produce deoxyribose 1-phosphate from glyceraldehyde 3-phosphate and acetaldehyde (72), and the 6-deoxy-5-ketofructose-1-phosphate (DEFP) pathway that supplies 3-dehydroquinate (73,74). Protein folding and heat shock proteins The genome possesses gene sets of heat shock proteins such as sHsp, Hsp60, Hsp70 and HtpX. Homologues of Hsp70 related proteins such as DnaJ, DnaK and GrpE have only been found in mesophilic euryarchaeotes and the Thaumarchaeota among the Archaea (75). Genes for NAC protein, prefoldin, FKBP-type peptidyl-prolyl cis-trans isomerase and thioredoxin are present, but those for Lon and Clp protease are absent. Ub-like protein modifier system Among the various unique traits of C. subterraneum, an unparalleled finding is the presence of a potential protein-degradation pathway consisting of a eukaryote-type Ub conjugation system associated with proteasome and AAA+ family ATPase. As mentioned above, the structural features of the components of this system clearly distinguishes this system from the prokaryote-type systems recently identified in the Archaea and Bacteria. In the H. volcanii SAMPs-proteasome system, two of five prokaryote-type Ubls (ThiS/MoaD) identified in this haloarchaeon have been shown to conjugate with proteins and function as SAMPs, and conjugation between the SAMPs and a prokaryote-type E1l (MoeB) has been observed (18). C. subterraneum possesses four prokaryote-type Ubl (ThiS/MoaD) genes (CSUB_C0702, CSUB_C1012, CSUB_C0525 and CSUB_C1603) along with a molybdenum cofactor/tungstate cofactor biosynthesis pathway including a single prokaryote-type E1l (MoeB) gene (CSUB_C1135). These genes may be involved in a prokaryote-type protein modifier system similar to that found in H. volcanii (Figure 2A). Interestingly however, two of the prokaryote-type Ubls (CSUB_C0702 and CSUB_C1603) in C. subterraneum have 89 and 12 additional residues following the C-terminal Gly-Gly motif, in contrast to most archaeal prokaryote-type Ubl (MoaD) sequences which terminate after the Gly-Gly sequence (13) (Figure 2A). Figure 2. Sequence alignments of Ub, E1, E2 (super-) and JAMM family proteins. (A) Sequence alignments of eukaryotic and archaeal Ub superfamily proteins; proteins from Saccharomyces cerevisiae; S.cere Smt3 (6320718) and S.cere Rpl40 (6322043), from human; Human sumo2 (54792071), Human sumo1 (54792065), Human NEDD8 (5453760) and Human Ufm1 (7705300), from Cyanidioschyzon merolae; C.mero smt3 (CME004C), C.mero ubl (CML042C) and C.mero Rps27 (CMN125C), from Tetrahymena thermophila; T.ther ubl1 (229594936) and T.ther ubl2 (118367859), from Cryptosporidium parvum; C. parv ubl1 (126654302), C.parv Rps27 (66357428) and C.parv ubl2 (66363058), from Giardia lamblia; G.lamb sumo (159114790), G.lamb Epl40 (159108136), G.lamb ub1 (159112981), G.lamb ub2 (159111413), from Trypanosoma brucei; T.bruc ub (72387960) and T.bruc ubl (72387818), from C. subterraneum; eukaryote-type Ubl (CSUB_C1474) and prokaryote-type Ubls (ThiS/MoaD) (CSUB_C0525, CSUB_C0702, CSUB_C1012, CSUB_C1603), from H. volcanii; SAMPs, HVO_0202 (302595884) and HVO_2619 (302595883), from Bacillus subtilis; B.sub ThiS (CAB13025), from Streptomyces avermitilis; S.aver ThiS (BAC73805), from Nitrosomonas europaea; N.euro ThiS (CAD84196), from Escherichia coli; E.coli MoaB (AAN79339), from Pyrococcus furiosus; P.furi MoaB (1VJK_A), from Methanosarcina acetivorans; M.acet MoaB (AAM05120), from Aromatoleum aromaticum; A.arom NrfH (CAI07579) and from Pseudomonas syringae; P.syri NrfH (AAY39230). Asterisks indicate the C-terminal Gly-Gly motif. (B) Sequence alignments of adenylation and catalytic cysteine domains in E1 superfamily proteins; proteins from human; Human E1L (23510338), Human sumoE1 (60594167), Human UBA1 (23510338), Human UBA2 (4885649), Human UBA3 (38045942), Human UBA5 (13376212), Human ATG7 (119584500) and Human MOCS3 (7657339), from Schizosaccharomyces pombe; S.pomb E1L (162312305) and S.pomb UBA3 (19113852), from S. cerevisiae; S.cere Aos1 (6325438), S.cere UBA1 (6322639), S.cere UBA2 (6320598), S.cere ATG7 (6321965), S.cere UBA4 (6321903) and S.cere YgdLl (6322825), from T. thermophila; T.ther E1L (118383519), T.ther E1B (118351055), T.ther UBA4 (118351953) and T.ther YgdLl (118400480), from Trypanosoma cruzi; T.cruz E1 (71411317), from Plasmodium yoelii; P. yoel UBA2 (82595829) and P.uoel MoeB (83315401), from Trichomonas vaginalis; T.vagi APG7 (123446747), from C. subterraneum; E1l (CSUB_C1476) and MoeB (CSUB_C1135), from H. volacanii; HVO_0558 (292654724), Cupriavidus metallidurans; C.meta ThiF (4039868), from Clostridium perfringens; C.perf (86559649), from Shewanella sp. ANA3; S.ANA3 (117676291), from Rhizobium etli; R.etli (86359719), from Anabaena variabilis; A.vari (ABA25158), from Polaromonas naphthalenivorans; P.naph (121605347), from Nostoc sp. PCC7120; Nostoc (BAB77147), from Xanthomonas axonopodis; X.axon MoeB (21242767), from E. coli; E.coli MoeB (1JW9_B) from C. symbiosum; C.symb ThiF (ABK78649), from P. furiosus; P.furi MoeB (18977661), from Geobacillus kaustophilus; G.kaus MoeBl (56419161), Desulfuromonas acetoxidans; D.acet ThiF (95930339), from Desulfovibrio desulfuricans; D.desu ThiF (78357502), from Bacteroides thetaiotaomicron; B.thet (29349047), from M. tuberculosis; M.tube Rv (15609475), from Cytophaga hutchinsonii; C.hutc (110639176), and from Bacillus thuringiensis; B.thur (110639176). Asterisks and plus indicate adenylation active sites and thiolating cysteine, respectively. Mg2+ chelating motifs (CxxC) are shown by octothorpes. (C) Alignment of E2 superfamily proteins; proteins from human; Human E2A (32967280), Human E2D (5454146), Human E2N (61175265), Human E2G1 (13489085), Human E2G2 (29893557), Human E2K (163660385), Human E2H (4507783), Human E2M (4507791), Human E2J2 (37577124), Human E2J (37577122) and Human Tsg101 (5454140), from Arabidopsis thaliana; A.thal E2I (15230881), A.thal E2C (18403097) and A.thal E2J (18401338), from Chlamydomonas reinhardtii; C.rein E2K (159463008), from C. merolae; C.mero E2D (CMB015C) and C.mero E2N (CMR010C), from Plasmodium falciparum; P.fal E2D (124805463), from S. cerevisiae; S.cere E2A (6321380), S.cere E2D (6319556), S.cere E2N (6320297), S.cere E2I (6320139), S.cere E2C (6324915), S.cere E2G2 (6323664), S.cere E2K (6320382), S.cere E2H (6579192), S.cere E2M (6323337) and S.cere E2J2 (6320947), from S. pombe; S.pomb E2G1 (6323664), from T. thermophila; T.ther E2M (118382495), from T. vaginalis; T.vagi E2M (123484378), from G. lamblia; G. lamb E2D (159111264), from C. subterraneum; CSUB_C1475, from Ruegeria sp; Rueger (22726448), from Arthrobacter sp.; Arthro (A0AW81), from E. coli; E.coli (37927532), from Syntrophus aciditrophicus; S.acid (85859492), from Rhodobacter sphaeroides; R.spha (77387013), from Clostridium perfringens; C.perf (86559649), from Dechloromonas aromatica; D.arom (71847775), from Anabaena variabilis; A.vari (75705484), from Bacteroides thetaiotaomicron; B.thet (29339960), from Synechocystis sp. PCC6803; Synech (38423903), from Burkholderia cepacia; B.cepa (A4JA91), and from Rhizobium sp. NGR234; Rhizob (2496664). Astetisk and octothorpes indicate catalytic cysteine residue and residues forming a conserved stabilizing contact in E2 from eukaryotes, respectively. Flap histidine and asparagine residues are shown by plus. Identical and similar amino acids are shaded in black and gray, respectively. (D) Sequence alignment of JAMM family proteins; proteins from human; Human COPS5 (12654695) and Human PSMD14 (5031981), from A. thaliana; A.thal CSN5A (15219970), from S. cerevisiae; S. cere RPN11 (14318526), from T. brucei; T.bruc RPN11 (18463065) and T.bruc SCN5 (72393165), from G. lamblia; G.lamb RPN11 (159114272), from S. pombe; S.pomb AMSHP (19115685), from C. subterraneum; CSUB_C1473, from Archaeoglobus flugidus; A.flugi JAB (11499780), from Pyrococcus horikoshii; P.hori JAB (3257912), from Pseudomonas aeruginosa; P.aeru JAB (15597298), from Pyrobaculum aerophilum; Py.aer JAB (18313041), from E. coli; E.coli RadC (15801143), from B. subtilis; B.subt RadC (16079856), from M. acetivorans; M.acet RadC (20090827), from Thermotoga maritima; T.mari RadC (15644305), from Aquifex aeolicus; A.aeol (2984019); from Deinococcus radiodurans; D.radi (15805429), from Pseudomonas putida; P.puti (84994017), from Salinibacter rubber; S.rubb (83814538), from M. tuberculosis; M.tube (13880984), from Nocardia farcinica; N.farc (54014564), from Wolinella succinogenes; W.succ, and from Geobacter metallireducens; G.meta. Asterisks indicate the JAMM motif residues. Identical and similar amino acids are shaded in black and gray, respectively. In addition to these homologues, the C. subterraneum genome harbors an operon-like gene cluster encoding homologues of eukaryote-type Ubl, E1l and E2l (CSUB_C1474, CSUB_C1476 and CSUB_C1475, respectively), suggesting the presence of an unprecedented eukaryote-type Ubl system (Figures 2 and 3). Furthermore, while an apparent homologue of E3 is absent in the genome, a gene for a small Zn finger protein (CSUB_C1477) containing a RING finger motif (C-X2-C-X11-C-X2-C-X4-H-X2-C-X10-C-X2-C) that mediates the Ub ligase activity of RING-type E3s (76) is also found in the same operon-like gene cluster (Figure 3). Moreover, a gene for RPN11-like protein (RPN11l) (CSUB_C1473), which is the homologue of eukaryotic 26 S proteasome regulatory subunit constituting a part of the proteasome lid sub-complex that catalyzes de-ubiquitination of captured substrates (77,78), is juxtaposed to the operon-like structure in the reverse strand (Figures 2D and 3). The Ubl, E1l and E2l harbor the key residues necessary for their respective functions, and are much more similar to their eukaryotic counterparts than to the prokaryote-type proteins (Figure 2). Ubl found in C. subterraneum shares >30–35% identity with the eukaryotic Ub-ribosomal fusion proteins and Ub B, and harbors the Gly–Gly motif found at the C-terminal region of eukaryotic Ub/Ubl (Figure 2A). As nine residues follow the Gly-Gly motif in the C. subterraneum Ubl, this suggests that this organism possesses a post-translational modification system, generally presumed to be a trait of the eukaryotic Ub/Ubl system (79). The C. subterraneum E1l retains the second-catalytic-cysteine domain involved in Ub-E1 interaction and the adenylation domains found in eukaryote-type E1s (UBA2, UBA3) (80,81) (Figure 2B). The significant eukaryote-type feature in the C. subterraneum E1l is the presence of two insertion helices (Asp197–Ser208 and Ile224–Leu239) between the Ub-E1 interaction domain and second Mg2+-chelating domain, which are found only in eukaryote-type E1s such as UBA1, UBA2, UBA3 and Aos1 (15) (Figure 2B). The JAMM (JAB1/MPN/Mov34 metalloenzyme) motif is a highly conserved motif found in various metal proteases from all three domains of life (82). The motif is known to be essential for the de-ubiquitination of captured substrate by RPN11 to facilitate their degradation, and is conserved in the RPN11l found in C. subterraneum (83). The C. subterraneum protein also possesses a C-terminal extension that forms sheet structures, which is a specific characteristic of the eukaryotic RPN11 proteins associated with the proteasome, and not found in archaeal and bacterial JAMM proteins (84) (Figure 2D). However the C. subterraneum protein seemingly lacks the central region of the eukaryotic RPN11, consisting of ∼55 residues and including one helix. Figure 3. The gene cluster of the Ub-like protein modifier system in C. subterraneum. CDSs without gene annotation encode hypothetical proteins. CDSs; rpn11l (CSUB_C1473), ubl (CSUB_C1474), e2l (CSUB_C1475), e1l (CSUB_C1476) and srfp (CSUB_C1477) encode eukaryotic RPN11, Ubl, E2l and E1l and small RING finger protein, respectively. Phylogenetic analyses In order to confirm the phylogenetic position of HWCGI, we used the genomic information of C. subterraneum along with those from other archaeal complete genome sequences and environmental genome fragments to perform phylogenetic analyses based on (i) concatenated SSU+LSU rRNA genes; (ii) concatenated ribosomal proteins and RNA polymerase subunits; and (iii) translation elongation factor 2 (EFII) (Figure 4). Taken together, all of these phylogenetic analyses demonstrate that C. subterraneum forms a robust cluster with the Thaumarchaeota, and is distinct from the hyperthermophilic Crenarchaeota. The Korarchaeota is placed in a deeply branching lineage with affinity to the crenarchaeal cluster in the trees of SSU+LSU rRNA genes and EFII, and occupies the deepest position of the Archaea in the tree based on concatenated r-proteins+RNAP subunits sequence. Most orders in the Euryarchaeota are sturdily recovered in all of these trees (Figure 4). The phylogenetic positions of C. subterraneum based on these multiple gene phylogenetic analyses are consistent with those suggested from previously reported phylogenetic trees including environmental SSU rRNA gene sequences (7,21,22; Supplementary Figure S1). The results appear to conflict with the deep branching of Thaumarchaeota as a sister group of all other Archaea, and the potential of a mesophilic last archaeal common ancestor (4,8,9). Figure 4. Phylogenetic analyses of Archaea including C. subterraneum. (A) Maximum likelihood phylogenetic tree of concatenated (SSU+LSU) rRNA genes using 3063 identical nucleotide positions. Bacterial sequences were used as out-group. Numbers indicate bootstrap values from 100 replications. (B) Maximum likelihood phylogenetic tree of concatenated universally conserved 45 ribosomal proteins and nine RNA polymerase subunits using aligned identical 5993 amino acid residues. Eukaryotic sequences were used as out-group. Numbers indicate bootstrap values (%) from 200 replications. (C) Maximum likelihood phylogenetic tree made from archaeal translation EF2 proteins based on 590 identical residues. Numbers indicate bootstrap values (%) from 200 replications. Furthermore, in order to examine the origin of the ‘euryarchaeal genes’ in the novel creanarchaeal lineages, we performed phylogenetic analyses targeting DNAP, which is a signature of Euryarchaeota (47) (Table 1). The phylogenetic tree of concatenated SSU+LSU D-type DNAP presents a robust cluster of crenarchaeal lineages that can be considered as a sister group of the enzymes from Euryarchaeota (Figure 5). When the cluster of crenarchaeal sequences was placed as an outgroup of the euryarchaeal sequences, the tree topology does not contradict with the phylogenetic analyses for rRNA genes, r-proteins+RNAP subunits and EFII (Figures 4 and 5). It can thus be concluded that the D-type DNAPs in the novel crenarchaeal lineages were vertically inherited from the last common archaeal ancestor and did not originate in euryarchaeotes. Figure 5. Maximum likelihood phylogenetic tree of concatenated (SSU+LSU) DNAP. Number of identical amino acid residues used were 829. Numbers indicate bootstrap values (%) from 200 replications. Genome core In order to compare the gene complement among the novel crenarchaeal lineages, C. subterraneum, Thaumarchaeota and Korarchaeota, and to investigate the differences between C. subterraneum and hyperthermophilic Crenarchaeota, the numbers of arCOGs in these crenarchaeal lineages that are in common with the genome core genes of Euryarchaeota (E) and hyperthermophilic Crenarchaeota (HC) were examined (Figure 6, Supplementary Table S2). The CDSs in C. subterraneum were tentatively assigned to arCOGs based on BLASTP analysis (

Document structure show

projects that have annotations to this span

There is no project