PMC:3091627 / 6370-24119 JSONTXT

{"target":"http://pubannotation.org/docs/sourcedb/PMC/sourceid/3091627","sourcedb":"PMC","sourceid":"3091627","source_url":"https://www.ncbi.nlm.nih.gov/pmc/3091627","text":"Results and Discussion\n\nNon-ubiquitous gene distribution in relation to associated diseases\nHybridization results for the 120 studied DNAs used as a probe and the home-made macroarrays derived from the reference strain 26695 are presented in Additional file 1 (data based on the binary presence/absence analyses) and Figure 1 (data based on the multidimensional analysis of continuous values, see material and methods). Both presentations illustrate the distribution of each of the 254 genes (213 non-ubiquitous, and 41 ubiquitous, used for normalization) with respect to associated diseases. Each strain hybridization profile (Figure 1) is represented by a series of vertically aligned bar charts, whereas the horizontal lines represent each of the 254 genes. Each strain exhibited a unique profile. The most striking features were related to the distribution of the cagPAI genes: almost all H. pylori strains associated with metaplasia harbored a complete cagPAI, a result consistent with findings by Nilsson et al. [26]. However, a complete cagPAI was present in 70% of duodenal ulcer strains, and in 50% of chronic gastritis and of MALT lymphoma strains, confirming previously published findings for isolates collected in the West [27].\nFigure 1 Hybridization reactions on a DNA macroarray membrane containing 254 PCR products that are representative of H. pylori strain 26695 (41 ubiquitous genes + 213 non-ubiquitous or strain-specific genes). Bacterial DNAs from 120 isolates involved in various diseases, including chronic gastritis (yellow), intestinal metaplasia (pink), duodenal ulcer (blue) and gastric MZBL (green), were tested by hybridization. Isolates are listed on the horizontal axis, and the genes tested, on the vertical axis. 
Clustering (genesis software) was carried out using the continuous values from 120 heterologous hybridization experiments, where each value corresponds to the (log26695-logheterol.strain) value for each tested gene (see materials \u0026 methods). Colors of the line range from blue, if the gene is present, to red, if absent. The range of intermediate colors reflects the degree of hybridization and thus homology, but also the redundancy of the tested genes. This figure represents the clustering based on the complete set of 254 genes. Hierarchical clustering of the continuous values derived from the hybridization experiments of 120 French clinical isolates presenting different disease characteristics was performed (Figure 1). This allowed us to visualize a branch clustering almost exclusively isolates associated with MALT lymphoma. Furthermore, principal component analysis allowed us to identify a combination of 48 genes (Additional file 1), which proved to be the most informative during multidimensional analysis. We then performed hierarchical clustering based on the values of these 48 genes (Figure 2). Two main branches were detected, one consisting of a distinct cluster of 20 isolates, all totally deprived of the cagPAI. Eighteen of the isolates were associated with MALT lymphoma and two with gastritis. Interestingly, none of the peptic ulcer or metaplasia isolates clustered in this branch. The second branch splits into two main clusters, one corresponding to isolates that totally or partially lack cagPAI genes mostly associated with gastritis and the other clustering isolates associated with other diseases.\nFigure 2 Hybridization reactions on a DNA macroarray membrane: clustering based on the 48 most discriminatory genes identified as key combinations of variables (genes/axes) from Principal Component Analysis. These 48 genes are labeled in Addional file 1. 
To clarify the genetic determinism of the MALT lymphoma strains, we selected one strain that was representative of the MALT lymphoma cagPAI minus branch and determined its genome sequence. We selected strain B38, which was isolated from a 62-year-old man suffering from MALT lymphoma. It fulfilled various requirements: i) it belonged to the hpEurope phylogenetic branch according to MLST analysis (Suerbaum, personal communication), a property that was consistent with the five Helicobacter genome sequences previously published (26695, J99, HPAG1, P12, and G27); ii) it was genetically transformable; iii) it was plasmid free, and iv) it was capable of colonizing the mouse gastric mucosa. Its vacA status was s2m2 [18].\n\nMain features of the B38 genome\nThe genome of the B38 strain consists of a circular chromosome containing 1,576,758 base pairs (bp) and an average GC content of 39.2% (Figure 3). It is the smallest H. pylori genome sequenced to date (Table 1). The B38 genome sequence was first automatically and then manually annotated using the MaGe system [28]http://www.genoscope.cns.fr/agc/mage and was then compared with the other sequenced H. pylori genomes. It contains 1,528 CDSs with a coding density (85.0%) similar to that found in the other Helicobacter sequenced strains. Among the 1528 CDSs, 1393 were predicted to be protein-coding genes (complete CDSs) with an average length of 971 bp; 135 correspond to partial CDSs, of which 133 are pseudogenes (i.e. 133 fragments representing 62 genes) and two are remnant genes (corresponding to truncated genes for which we cannot find the missing sections in close proximity) (Table 1).\nTable 1 Summary of comparative features of Helicobacter genomes\naThese genomes have got a 9,369 bp (HPAG1), a 10,031 bp (G27), a 10,225 bp (P12), a 3,661 bp (Sheeba) plasmid and a 10,031 bp (G27) and a 10,225 bp (P12). 
Plasmids were not counted\nbRevised number with the MaGe system and manual curation\ncPercentage of fragments of genes/total CDSs\ndPercentage of fragmented genes/total CDSs\ne Number of copies\nFigure 3 Genome map of Helicobacter pylori strain B38. From outside to inside: -\tGC skew (window 2500, step 500) in blue. -\tTotal CDSs (green) with pseudogenes/partial genes (purple). -\tCDSs coding for hypothetical restriction/modification systems (purple), phage proteins (orange), or insertion sequences (ISHp609) (green). -\tTotal CDSs according to the matrix defined for gene identification (matrix n°1 in red, matrix N°2 in black, matrix n°3 in green). -\tRNA (rRNA in green, tRNA in purple and misc_RNA in red). -\tRule. -\tGC% (window 5000, step 2000) in yellow. Red arrow indicates the position of the origin of replication. Of the 1,528 annotated CDSs, a function was assigned to 989 CDSs (64.7%). For 784 of them (79.3%), a function was experimentally demonstrated either in the Helicobacter species (188, 12.3%) or in another organism (596, 39%). Two hundred and five CDSs (20.7% of 989) received a function based on the presence of a conserved amino acid motif, a structural feature, or limited homology. A total of 378 CDSs have homologs in previously reported sequences of the genus Helicobacter (43.6% of 378), in the epsilon proteobacteria (35.2% of 378), or in other distant bacteria (21.2% of 378). Protein function classification based on the cluster orthologous genes classification (COG) database allowed us to place 1189 of the 1528 CDSs (77.81%) in at least one of the COG functional groups (Table 2): 454 were assigned to cellular processes and signaling systems, 342 to information storage and processing, while 595 were involved in metabolism. 
The B38 genome exhibits the highest percentage of CDSs associated with a COG group (77.97% vs 73.38% for 26695, 76.48% for J99, 76.15% for HPAG1 and, 73.49 for Shi470), with the number of CDSs involved in defense mechanisms slightly higher than in the other sequenced Helicobacter strains.\nTable 2 Automatic distribution of protein functions, based on the COG classification, between Helicobacter strains\n*The CDSs were manually curated in the MaGe system for the elimination of artifacts. There are a significant number of restriction/modification systems present in H. pylori; their composition and activity have been shown to be strain-specific [29]. In the B38 strain, 63 CDSs were involved in restriction/modification systems. Among them, 30 elements were fragmented into pseudogenes corresponding to 12 potential genes, and three elements appeared to be partial genes (Additional file 2). Thus, the proportion of potentially active genes (52%) appeared to be higher in B38 than in strains J99 and 26695, in which only 30% of type II R-M systems were reported to be functional [30].\nThe B38 genome harbors five complete copies of the four-gene insertion sequence ISHp609. This insertion sequence was frequently found in H. pylori strains from Europe, Americas, India and Africa, but was almost always absent in strains from East Asia [31]. Three of the four genes (orf1, orf2, ORFA) demonstrated 100% of identity in the five B38 ISHp609 copies, whereas ORFB from one of the five B38 ISHp609 copies (HELPY1334) exhibited a single mutation. Among the sequenced genomes (Table 1), a single and complete copy of this element was found in strain HPAG1, but it differed slightly from that found in B38 (6, 8, and 9 mutations are present in orf1, ORFA, and ORFB of HPAG1, respectively). 
This consistency in the five copies of ISHp609 in B38 indicated that it has been acquired very recently, and that it is probably an active element that is capable of transposition, a property never experimentally demonstrated for a transposable element in H. pylori.\nAnother property associated with the B38 genome relates to the complete absence of four of the 45 genes encoding outer membrane proteins (OMPs) from the four conserved OMP families (Hop, Hor, Hof et Hom) (Additional file 3). B38 lacks babB, babC, sabB, and homB, four OMPs known to play a major role in adhesion to gastric epithelial cells and possibly in long-term persistence of strains in the human gastric mucosa when associated with peptic ulcer diseases or gastric metaplasia [32]. B38 lacks a high number of adhesin genes among the sequenced genomes.\n\nComparative genomics and genome evolution\nWe then analyzed the genomic rearrangements through pair-wise genomic synteny comparisons between B38 and the eight published Helicobacteriaceae genomes. For five of the isolates (namely, 26695, J99, G27, P12, HPAG1), we confirmed the previously reported relative colinearity of the H.pylori genomes. This colinearity is mainly interrupted by insertion elements, the cagPAI, and genes encoding hypothetical proteins [33]. However, unexpectedly, conserved synteny highlighted an almost complete colinearity never described so far, between B38 and Shi470 (Figure 4). Shi470 is a clinical isolate from the gastric antrum of an Amerindian resident of a remote Amazonian village in Peru, and was thought to be related to strains from East Asia [RefSeq:NC_010698]. This unexpected absence of major genomic rearrangements between the two genomes prompted us to compare the genome of these two isolates more closely, as a way of better understanding H. pylori genome evolution. 
B38 lacks 174 Shi470 genes, of which 70 genes cluster in three insertion blocks: one corresponds to the well characterized cagPAI; another to a block of 33 CDSs, mainly remnants from a conjugative plasmid (presence of TraG, VirB11, topoisomerase I, ComB3, homologs of conjugal plasmid transfer system); and the third corresponds to a block that includes 7 CDSs encoding hypothetical proteins, as well as one CDS encoding an exodeoxyribonuclease subunit which is unique to the Shi 470 isolate.\nFigure 4 Synteny lineplot pair-wise analyses between B38 and the H. pylori strain 26695, J99, HPAG1, Shi470, P12, G27, Helicobacter hepaticus, or Helicobacter acinonychis. Conversely, loss of synteny was also due to the presence of 110 CDSs in B38 that were not present in Shi470. Forty-three of these CDSs appeared as clusters within eight loci. Twenty corresponded to ISHp609 (5 complete and conserved copies of ISHp609 each comprising orf1, orf2, ORFA and ORFB) [31], which interrupts HELPY0571, HELPY0700 (both encoding restriction/modification systems), HELPY0838 (encoding a putative Rad50 ATPase), HELPY1330 (encoding a putative glycosyl-transferase), and HELPY1529 (a HAC prophage II protein homolog). In addition to these five ISHp609 insertions, loss of synteny was also due to the presence of CDSs in four other loci: i) a cluster of seven genes (HELPY1520 to HELPY1525 and HELPY1527, HELPY1528 to HELPY1533) encoding HacII prophage-like proteins similar to those found in H. 
acinonychis strain Sheeba [34]; however, the size of the prophage is much larger (32 CDS) in this species, suggesting that the prophage in B38 has been deleted, possibly following the insertion of one copy of ISHp609; ii) a cluster of six genes encoding hypothetical proteins of unknown function (HELPY0051 to HELPY0056); iii) a cluster of three CDSs that are absent in Shi470, HPAG1, J99, P12, and G127, but present in strain 26695, of which two encode alginate-O-acetylation proteins (HELPY0497-498); iv) a cluster of seven CDSs that encode a putative helicase (HELPY0989) and a putative serine kinase (HELPY0990), two functional proteins not found in all of the other sequenced strains.\n\nH. pylori core genome and strain-specific genes\nBLAST score ratio analyses and comparisons between the B38 strain and the six other sequenced genomes, which were analyzed and revised through the MaGe system (Table 1), allowed us to establish that the core of the H. pylori genome consists of 1,275 CDSs. This number is slightly higher than that recently published by McClain and colleagues who identified 1,237 genes, as it takes into account additional CDSs detected by the MaGe system [35]. This number is lower than that calculated from data presented in Additional file 1 (1,358 genes) based on the macroarray hybridization analysis of 120 isolates. This approach overestimated the number of ubiquitous CDSs, as all small CDS (\u003c350 bp) from the 26695 strain genome were excluded from the analysis, and thus were systematically counted as ubiquitous CDSs.\nTo identify strain-specific genes present in the B38 strain but absent from the other sequenced strains, we studied the putative orthologous relationship between two genomes i.e. gene couples who satisfy Bi-directional Best Hit (BBH) criteria. Criteria included a minimum of 30% sequence identity and 80% of the length of the smallest protein (Additional file 4). 
Only 16 CDSs were found to be unique to the B38 strain: nine seemed to be complete and thus putatively functional; six were shown to encode the putative HacII prophage-like proteins (HELPY1521-1522-1523-1524-1525-1527); three were found to encode hypothetical proteins (HELPY0409, HELPY0645 and HELPY0996), and seven corresponded to fragments of genes (partial genes) coding for either conserved hypothetical proteins, prophage-like sequences or for a restriction enzyme. Using the same methodology, we looked for genes that were present in the various H. pylori strains and absent in B38 (Additional file 5). If compared pair-wise, the number of CDSs absent in B38 was between 105 and 175. The only genes that were found to be exclusively absent in B38 corresponded to those of the cagPAI (Additional file 5), the well-known cluster of genes involved in the induction of a strong inflammatory response.\n\nSpecific properties associated with the genomes of strains belonging to the MALT lymphoma PAI minus cluster\nOf the 19 strains belonging to the MALT lymphoma PAI minus cluster, all 19 contained the vacAm2 allele; 16 exhibited an s2m2 genotype, indicating that they encode a non-functional cytotoxin, and three exhibited an s1m2 genotype [18]. We then investigated whether the properties found to be unique to strain B38 are shared by the strains belonging to the cluster of the MALT lymphoma PAI-minus cluster. The search for the presence of the HacII-like prophage was done through hybridization using internal fragments of HELPY1521, HELPY1525, and HELPY1526 as probes. Four of the 19 strains (21%, including B38) of the MALT lymphoma PAI minus cluster, contained HacII prophage-like sequences. By contrast, 1/24 (4%) strains isolated from patients with MALT lymphoma containing cagPAI, 2/33 (6%) strains from patients suffering from gastritis and 2/27 strains (7.4%) from those with duodenal ulcers contained HacII prophage-like sequences. 
Furthermore, the presence of the two adjacent HELPY0989 and HELPY0990 genes encoding a helicase and a serine kinase, respectively, not previously found in the other sequenced genomes as functional proteins were found in three of the 19 strains (16%) of the B38 cluster. These two genes were not detected in the other MALT lymphoma strains (cagPAI positive), nor within the 22 isolates associated with gastritis and peptic ulcers. Finally, three clustered conservative mutations in glmM (HELPY0072 - Ala332, Leu333), leading to the absence of amplification of the 294-bp internal fragment of the phosphoglucosamine mutase-encoding gene [36], were observed in five of the 19 MALT lymphoma PAI minus isolates (26%). However, these mutations were not found in any of the 120 clinical isolates of this study, nor were they found in more than 400 H. pylori isolates associated with gastritis, peptic ulcers or metaplasia that were tested with identical oligonucleotides (personal data). These conservative mutations may be indicative of a selective pressure to maintain these mutations, together with a property encoded by a gene present in close proximity to glmM, a property that has yet to be identified. 
Thus, although none of the unique properties of B38 were shared by all MALT strains of the cluster, characterizing a cagPAI minus isolate containing either glmM mutations or HELPY0989-0990 genes may be predictive of MALT lymphoma, as these two characteristics were found exclusively among the strains of this cluster.","divisions":[{"label":"title","span":{"begin":0,"end":22}},{"label":"sec","span":{"begin":24,"end":4357}},{"label":"title","span":{"begin":24,"end":91}},{"label":"p","span":{"begin":92,"end":1240}},{"label":"figure caption","span":{"begin":1241,"end":2280}},{"label":"p","span":{"begin":1251,"end":2280}},{"label":"p","span":{"begin":2281,"end":3378}},{"label":"figure caption","span":{"begin":3379,"end":3634}},{"label":"p","span":{"begin":3389,"end":3634}},{"label":"p","span":{"begin":3635,"end":4357}},{"label":"sec","span":{"begin":4359,"end":9874}},{"label":"title","span":{"begin":4359,"end":4390}},{"label":"p","span":{"begin":4391,"end":5286}},{"label":"table caption","span":{"begin":5287,"end":5696}},{"label":"p","span":{"begin":5296,"end":5351}},{"label":"p","span":{"begin":5352,"end":5532}},{"label":"p","span":{"begin":5533,"end":5589}},{"label":"p","span":{"begin":5590,"end":5634}},{"label":"p","span":{"begin":5635,"end":5677}},{"label":"p","span":{"begin":5678,"end":5696}},{"label":"figure caption","span":{"begin":5697,"end":6326}},{"label":"p","span":{"begin":5707,"end":6326}},{"label":"p","span":{"begin":6327,"end":7553}},{"label":"table caption","span":{"begin":7554,"end":7754}},{"label":"p","span":{"begin":7563,"end":7669}},{"label":"p","span":{"begin":7670,"end":7754}},{"label":"p","span":{"begin":7755,"end":8352}},{"label":"p","span":{"begin":8353,"end":9316}},{"label":"p","span":{"begin":9317,"end":9874}},{"label":"sec","span":{"begin":9876,"end":13058}},{"label":"title","span":{"begin":9876,"end":9917}},{"label":"p","span":{"begin":9918,"end":11380}},{"label":"figure 
caption","span":{"begin":11381,"end":11553}},{"label":"p","span":{"begin":11391,"end":11553}},{"label":"p","span":{"begin":11554,"end":13058}},{"label":"sec","span":{"begin":13060,"end":15186}},{"label":"title","span":{"begin":13060,"end":13107}},{"label":"p","span":{"begin":13108,"end":13918}},{"label":"p","span":{"begin":13919,"end":15186}},{"label":"title","span":{"begin":15188,"end":15295}}],"tracks":[{"project":"2_test","denotations":[{"id":"20537153-14573679-10781733","span":{"begin":1019,"end":1021},"obj":"14573679"},{"id":"20537153-16425389-10781734","span":{"begin":1236,"end":1238},"obj":"16425389"},{"id":"20537153-14742532-10781735","span":{"begin":4353,"end":4355},"obj":"14742532"},{"id":"20537153-16407324-10781736","span":{"begin":4702,"end":4704},"obj":"16407324"},{"id":"20537153-10944229-10781737","span":{"begin":7914,"end":7916},"obj":"10944229"},{"id":"20537153-11226310-10781738","span":{"begin":8348,"end":8350},"obj":"11226310"},{"id":"20537153-15516563-10781739","span":{"begin":8605,"end":8607},"obj":"15516563"},{"id":"20537153-16790815-10781740","span":{"begin":9800,"end":9802},"obj":"16790815"},{"id":"20537153-10682319-10781741","span":{"begin":10335,"end":10337},"obj":"10682319"},{"id":"20537153-15516563-10781742","span":{"begin":11848,"end":11850},"obj":"15516563"},{"id":"20537153-16789826-10781743","span":{"begin":12396,"end":12398},"obj":"16789826"},{"id":"20537153-19123947-10781744","span":{"begin":13548,"end":13550},"obj":"19123947"},{"id":"20537153-14742532-10781745","span":{"begin":15525,"end":15527},"obj":"14742532"},{"id":"20537153-9157493-10781746","span":{"begin":16866,"end":16868},"obj":"9157493"}],"attributes":[{"subj":"20537153-14573679-10781733","pred":"source","obj":"2_test"},{"subj":"20537153-16425389-10781734","pred":"source","obj":"2_test"},{"subj":"20537153-14742532-10781735","pred":"source","obj":"2_test"},{"subj":"20537153-16407324-10781736","pred":"source","obj":"2_test"},{"subj":"20537153-10944229-10781737","pred":"source
","obj":"2_test"},{"subj":"20537153-11226310-10781738","pred":"source","obj":"2_test"},{"subj":"20537153-15516563-10781739","pred":"source","obj":"2_test"},{"subj":"20537153-16790815-10781740","pred":"source","obj":"2_test"},{"subj":"20537153-10682319-10781741","pred":"source","obj":"2_test"},{"subj":"20537153-15516563-10781742","pred":"source","obj":"2_test"},{"subj":"20537153-16789826-10781743","pred":"source","obj":"2_test"},{"subj":"20537153-19123947-10781744","pred":"source","obj":"2_test"},{"subj":"20537153-14742532-10781745","pred":"source","obj":"2_test"},{"subj":"20537153-9157493-10781746","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#eca293","default":true}]}]}}