article-title
|
From array-based hybridization of Helicobacter pylori isolates to the complete genome sequence of an isolate associated with MALT lymphoma
|
abstract
|
Background
Helicobacter pylori infection is associated with several gastro-duodenal inflammatory diseases of various levels of severity. To determine whether certain combinations of genetic markers can be used to predict the clinical source of the infection, we analyzed well-documented and geographically homogeneous clinical isolates using a comparative genomics approach.
Results
A set of 254 H. pylori genes was used to perform array-based comparative genomic hybridization among 120 French H. pylori strains associated with chronic gastritis (n = 33), duodenal ulcers (n = 27), intestinal metaplasia (n = 17) or gastric extra-nodal marginal zone B-cell MALT lymphoma (n = 43). Hierarchical cluster analyses of the DNA hybridization values allowed us to identify a homogeneous subpopulation of strains that clustered exclusively with cagPAI minus MALT lymphoma isolates. The genome sequence of B38, a representative of this MALT lymphoma strain-cluster, was completed, fully annotated, and compared with the six previously released H. pylori genomes (i.e. J99, 26695, HPAG1, P12, G27 and Shi470). B38 has the smallest H. pylori genome described thus far (1,576,758 base pairs containing 1,528 CDSs); it contains the vacAs2m2 allele and lacks the genes encoding the major virulence factors (absence of cagPAI, babB, babC, sabB, and homB). Comparative genomics led to the identification of very few sequences that are unique to the B38 strain (9 intact CDSs and 7 pseudogenes). Pair-wise genomic synteny comparisons between B38 and the 6 sequenced H. pylori genomes revealed an almost complete co-linearity, never seen before, between the genomes of strain Shi470 (a Peruvian isolate) and B38.
Conclusion
These isolates are deprived of the main H. pylori virulence factors characterized previously, but are nonetheless associated with gastric neoplasia.
|
body
|
Background
Helicobacter pylori infections occur in approximately 50% of the human population and are associated with several inflammatory gastroduodenal diseases [1], including two types of gastric cancers: gastric adenocarcinoma [2] and gastric extra-nodal marginal zone B-cell MALT (mucosa-associated lymphoid tissue) lymphoma, first described by Isaacson et al. [3]. Evolution of this bacterial infection towards malignancy only occurs in approximately 1% of infected individuals, suggesting that both bacterial and host susceptibility factors are involved [4].
Since the discovery of H. pylori, several studies have focused on elucidating H. pylori pathogenicity mechanisms (microbial factors) that are associated with disease outcomes [5]. The cag-pathogenicity island (cagPAI) has been recognized as a major pro-inflammatory actor, but its association with MALT lymphoma strains has yet to be clearly shown [6]. The VacA vacuolating cytotoxin, thought to cause detectable alterations in gastric epithelial cells and immune cells, is also one of the most studied H. pylori virulence factors [7]. Based on its immunosuppressive properties demonstrated in in vitro studies, VacA has also been suggested to play a role in H. pylori persistence [8]. Adhesion of H. pylori to gastric epithelial cells is another bacterial trait contributing to the chronic state of the infection. BabA [9], SabA [10], HopZ [11], HomB [12] and 30 outer-membrane-like paralogs recognized as adhesins or potential adhesins are encoded by the H. pylori genome [13]. Several studies have highlighted their contribution to pathogen fitness in human populations [14,15]. Over the last twenty years, genes encoding these virulence factors have served as genotyping markers to establish correlations between these markers, alone or in combination, and clinical outcomes of H. pylori infections [16].
Few studies have been conducted in relation to gastric MALT lymphoma-associated strains. Koehler et al. reported that the vacAm2 allele predominated in MALT lymphoma-associated isolates [17]. In previous studies [18,19] including an identical collection of H. pylori gastric MALT lymphoma strains to that used here, the authors confirmed this finding and suggested that certain combinations of genomic markers may have a predictive value for determining whether gastric MALT lymphoma develops. All these data suggest the potential role for bacterial determinism in the clinical outcome of MALT lymphoma.
So far, comparative genomics involving sequenced H. pylori genomes have been limited to five clinical isolates isolated in the West and associated with gastritis (strain 26695 [20]), peptic ulcers (strains J99 [GenBank:AE001439.1] and P12 [EMBL:CP001217, EMBL:CP001218]), atrophic gastritis (HPAG1 [21]), or no known disease (strains G27 [22] and Shi470 [RefSeq:NC_010698]). However, no genome sequence of an H. pylori strain isolated from MALT lymphoma is currently available. Comparative genomics based on DNA-array analyses, first conducted by Salama et al. on 15 Caucasian isolates [23], led to the elucidation of the H. pylori core genome comprising the pool of ubiquitous H. pylori genes and strain-specific genes (non-ubiquitous). Gressmann et al. studied gene gain and loss during evolution by comparing the genomes of 56 globally representative strains of H. pylori; they reported that 25% of the genes were non-ubiquitous [24]. Through comparative genomics based on the analysis of 24 clinical isolates from various geographical origins (Western, Asian, African countries) using whole genome DNA arrays, we identified 213 non-ubiquitous or strain-specific genes [25]. In this study, we describe the distribution of these 213 non-ubiquitous genes (Additional file 1) within genomes from a large geographically homogeneous French collection of 120 well-characterized H. pylori strains associated with chronic gastritis, duodenal ulcer, intestinal metaplasia or gastric MALT lymphoma. A hierarchical clustering analysis of the DNA hybridization values identified a homogeneous phylogenetic subpopulation of strains containing all of the cagPAI minus MALT lymphoma isolates. The B38 isolate was selected as a representative of this MALT lymphoma-specific cluster. Its genome sequence was completed, fully annotated, and compared with previously sequenced and published H. pylori genomes.
Results and Discussion
Non-ubiquitous gene distribution in relation to associated diseases
Hybridization results obtained with the 120 studied DNAs, each used as a probe against the home-made macroarrays derived from the reference strain 26695, are presented in Additional file 1 (data based on the binary presence/absence analyses) and Figure 1 (data based on the multidimensional analysis of continuous values, see Methods). Both presentations illustrate the distribution of each of the 254 genes (213 non-ubiquitous, and 41 ubiquitous, used for normalization) with respect to associated diseases. Each strain hybridization profile (Figure 1) is represented by a series of vertically aligned bar charts, whereas the horizontal lines represent each of the 254 genes. Each strain exhibited a unique profile. The most striking features were related to the distribution of the cagPAI genes: almost all H. pylori strains associated with metaplasia harbored a complete cagPAI, a result consistent with findings by Nilsson et al. [26]. However, a complete cagPAI was present in 70% of duodenal ulcer strains, and in 50% of chronic gastritis and of MALT lymphoma strains, confirming previously published findings for isolates collected in the West [27].
Figure 1 Hybridization reactions on a DNA macroarray membrane containing 254 PCR products that are representative of H. pylori strain 26695 (41 ubiquitous genes + 213 non-ubiquitous or strain-specific genes). Bacterial DNAs from 120 isolates involved in various diseases, including chronic gastritis (yellow), intestinal metaplasia (pink), duodenal ulcer (blue) and gastric MZBL (green), were tested by hybridization. Isolates are listed on the horizontal axis, and the genes tested, on the vertical axis. Clustering (Genesis software) was carried out using the continuous values from 120 heterologous hybridization experiments, where each value corresponds to the (log26695 - logheterol.strain) value for each tested gene (see Methods). Colors of the line range from blue, if the gene is present, to red, if absent. The range of intermediate colors reflects the degree of hybridization and thus homology, but also the redundancy of the tested genes. This figure represents the clustering based on the complete set of 254 genes.
Hierarchical clustering of the continuous values derived from the hybridization experiments of 120 French clinical isolates presenting different disease characteristics was performed (Figure 1). This allowed us to visualize a branch clustering almost exclusively isolates associated with MALT lymphoma. Furthermore, principal component analysis allowed us to identify a combination of 48 genes (Additional file 1), which proved to be the most informative during multidimensional analysis. We then performed hierarchical clustering based on the values of these 48 genes (Figure 2). Two main branches were detected, one consisting of a distinct cluster of 20 isolates, all totally deprived of the cagPAI. Eighteen of the isolates were associated with MALT lymphoma and two with gastritis. Interestingly, none of the peptic ulcer or metaplasia isolates clustered in this branch. The second branch splits into two main clusters, one corresponding to isolates that totally or partially lack cagPAI genes mostly associated with gastritis and the other clustering isolates associated with other diseases.
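The idea of grouping strains by their hybridization profiles can be illustrated with a minimal sketch. It is not the Genesis software procedure used in the study: the distance (Euclidean) and the merging rule (average linkage) are assumptions, and the strain names and values in `toy_profiles` are invented for illustration only. Each value stands for a log-ratio for one gene, where a high value means a weak signal in the tested strain.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of {strain: [log-ratio per gene]} into n_clusters groups."""
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster strain pairs (assumed metric).
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical profiles over three genes; the first two genes mimic absent cagPAI genes.
toy_profiles = {
    "malt_a": [1.2, 1.1, 0.0],
    "malt_b": [1.0, 1.3, 0.1],
    "ulcer_a": [0.0, 0.1, 0.0],
    "ulcer_b": [0.1, 0.0, 0.2],
}
groups = average_linkage(toy_profiles, 2)
```

With these toy values the two strains lacking the first two genes end up in one group and the remaining two in the other.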
Figure 2 Hybridization reactions on a DNA macroarray membrane: clustering based on the 48 most discriminatory genes identified as key combinations of variables (genes/axes) from Principal Component Analysis. These 48 genes are labeled in Additional file 1.
To clarify the genetic determinism of the MALT lymphoma strains, we selected one strain that was representative of the MALT lymphoma cagPAI minus branch and determined its genome sequence. We selected strain B38, which was isolated from a 62-year-old man suffering from MALT lymphoma. It fulfilled various requirements: i) it belonged to the hpEurope phylogenetic branch according to MLST analysis (Suerbaum, personal communication), a property that was consistent with the five Helicobacter genome sequences previously published (26695, J99, HPAG1, P12, and G27); ii) it was genetically transformable; iii) it was plasmid free, and iv) it was capable of colonizing the mouse gastric mucosa. Its vacA status was s2m2 [18].
Main features of the B38 genome
The genome of the B38 strain consists of a circular chromosome containing 1,576,758 base pairs (bp) and an average GC content of 39.2% (Figure 3). It is the smallest H. pylori genome sequenced to date (Table 1). The B38 genome sequence was first automatically and then manually annotated using the MaGe system [28] (http://www.genoscope.cns.fr/agc/mage) and was then compared with the other sequenced H. pylori genomes. It contains 1,528 CDSs with a coding density (85.0%) similar to that found in the other Helicobacter sequenced strains. Among the 1,528 CDSs, 1,393 were predicted to be protein-coding genes (complete CDSs) with an average length of 971 bp; 135 correspond to partial CDSs, of which 133 are pseudogenes (i.e. 133 fragments representing 62 genes) and two are remnant genes (corresponding to truncated genes for which we cannot find the missing sections in close proximity) (Table 1).
Table 1 Summary of comparative features of Helicobacter genomes
a These genomes carry a plasmid of 9,369 bp (HPAG1), 10,031 bp (G27), 10,225 bp (P12) or 3,661 bp (Sheeba). Plasmids were not counted
b Revised number with the MaGe system and manual curation
c Percentage of fragments of genes/total CDSs
d Percentage of fragmented genes/total CDSs
e Number of copies
Figure 3 Genome map of Helicobacter pylori strain B38. From outside to inside: - GC skew (window 2500, step 500) in blue. - Total CDSs (green) with pseudogenes/partial genes (purple). - CDSs coding for hypothetical restriction/modification systems (purple), phage proteins (orange), or insertion sequences (ISHp609) (green). - Total CDSs according to the matrix defined for gene identification (matrix n°1 in red, matrix n°2 in black, matrix n°3 in green). - RNA (rRNA in green, tRNA in purple and misc_RNA in red). - Rule. - GC% (window 5000, step 2000) in yellow. Red arrow indicates the position of the origin of replication.
Of the 1,528 annotated CDSs, a function was assigned to 989 CDSs (64.7%). For 784 of them (79.3%), a function was experimentally demonstrated either in the Helicobacter species (188, 12.3% of all CDSs) or in another organism (596, 39% of all CDSs). Two hundred and five CDSs (20.7% of 989) received a function based on the presence of a conserved amino acid motif, a structural feature, or limited homology. A total of 378 CDSs have homologs in previously reported sequences of the genus Helicobacter (43.6% of 378), in the epsilon proteobacteria (35.2% of 378), or in other distant bacteria (21.2% of 378). Protein function classification based on the Clusters of Orthologous Groups (COG) database allowed us to place 1,189 of the 1,528 CDSs (77.81%) in at least one of the COG functional groups (Table 2): 454 were assigned to cellular processes and signaling systems, 342 to information storage and processing, while 595 were involved in metabolism. The B38 genome exhibits the highest percentage of CDSs associated with a COG group (77.97% vs 73.38% for 26695, 76.48% for J99, 76.15% for HPAG1 and 73.49% for Shi470), with the number of CDSs involved in defense mechanisms slightly higher than in the other sequenced Helicobacter strains.
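The annotation percentages quoted above use two different denominators (all 1,528 CDSs, or the 989 CDSs with an assigned function). The short sketch below recomputes them from the counts given in the text; the helper `pct` and the variable names are illustrative and not part of the original analysis.

```python
def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


total_cds = 1528        # annotated CDSs in B38
with_function = 989     # CDSs with an assigned function
experimental = 784      # function demonstrated experimentally
in_helicobacter = 188   # ... in a Helicobacter species
in_other_organism = 596 # ... in another organism
by_motif = 205          # function inferred from motif, structure or limited homology
in_cog = 1189           # CDSs placed in at least one COG group

# The sub-counts add up to their parent categories.
assert in_helicobacter + in_other_organism == experimental
assert experimental + by_motif == with_function

shares = {
    "function assigned / all CDSs": pct(with_function, total_cds),
    "experimental / function assigned": pct(experimental, with_function),
    "Helicobacter evidence / all CDSs": pct(in_helicobacter, total_cds),
    "other organism evidence / all CDSs": pct(in_other_organism, total_cds),
    "motif-based / function assigned": pct(by_motif, with_function),
    "COG-classified / all CDSs": pct(in_cog, total_cds, digits=2),
}
for label, value in shares.items():
    print(f"{label}: {value}%")
```

The three COG category counts (454, 342 and 595) sum to more than 1,189 because a CDS can belong to more than one functional group.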
Table 2 Automatic distribution of protein functions, based on the COG classification, between Helicobacter strains
*The CDSs were manually curated in the MaGe system for the elimination of artifacts.
There are a significant number of restriction/modification systems present in H. pylori; their composition and activity have been shown to be strain-specific [29]. In the B38 strain, 63 CDSs were involved in restriction/modification systems. Among them, 30 elements were fragmented into pseudogenes corresponding to 12 potential genes, and three elements appeared to be partial genes (Additional file 2). Thus, the proportion of potentially active genes (52%) appeared to be higher in B38 than in strains J99 and 26695, in which only 30% of type II R-M systems were reported to be functional [30].
The B38 genome harbors five complete copies of the four-gene insertion sequence ISHp609. This insertion sequence was frequently found in H. pylori strains from Europe, the Americas, India and Africa, but was almost always absent in strains from East Asia [31]. Three of the four genes (orf1, orf2, ORFA) showed 100% identity in the five B38 ISHp609 copies, whereas ORFB from one of the five B38 ISHp609 copies (HELPY1334) exhibited a single mutation. Among the sequenced genomes (Table 1), a single and complete copy of this element was found in strain HPAG1, but it differed slightly from that found in B38 (6, 8, and 9 mutations are present in orf1, ORFA, and ORFB of HPAG1, respectively). The near identity of the five copies of ISHp609 in B38 indicates that the element was acquired very recently, and that it is probably an active element capable of transposition, a property never experimentally demonstrated for a transposable element in H. pylori.
Another property of the B38 genome is the complete absence of four of the 45 genes encoding outer membrane proteins (OMPs) from the four conserved OMP families (Hop, Hor, Hof and Hom) (Additional file 3). B38 lacks babB, babC, sabB, and homB, four OMPs known to play a major role in adhesion to gastric epithelial cells and possibly in long-term persistence of strains in the human gastric mucosa when associated with peptic ulcer diseases or gastric metaplasia [32]. Compared with the other sequenced genomes, B38 lacks a high number of adhesin genes.
Comparative genomics and genome evolution
We then analyzed the genomic rearrangements through pair-wise genomic synteny comparisons between B38 and the eight published Helicobacteriaceae genomes. For five of the isolates (namely, 26695, J99, G27, P12, HPAG1), we confirmed the previously reported relative colinearity of the H. pylori genomes. This colinearity is mainly interrupted by insertion elements, the cagPAI, and genes encoding hypothetical proteins [33]. However, unexpectedly, conserved synteny highlighted an almost complete colinearity, never described so far, between B38 and Shi470 (Figure 4). Shi470 is a clinical isolate from the gastric antrum of an Amerindian resident of a remote Amazonian village in Peru, and was thought to be related to strains from East Asia [RefSeq:NC_010698]. This unexpected absence of major genomic rearrangements between the two genomes prompted us to compare the genomes of these two isolates more closely, as a way of better understanding H. pylori genome evolution. B38 lacks 174 Shi470 genes, of which 70 genes cluster in three insertion blocks: one corresponds to the well characterized cagPAI; another to a block of 33 CDSs, mainly remnants from a conjugative plasmid (presence of TraG, VirB11, topoisomerase I, ComB3, homologs of conjugal plasmid transfer system); and the third corresponds to a block that includes 7 CDSs encoding hypothetical proteins, as well as one CDS encoding an exodeoxyribonuclease subunit which is unique to the Shi470 isolate.
Figure 4 Synteny lineplot pair-wise analyses between B38 and the H. pylori strain 26695, J99, HPAG1, Shi470, P12, G27, Helicobacter hepaticus, or Helicobacter acinonychis.
Conversely, loss of synteny was also due to the presence of 110 CDSs in B38 that were not present in Shi470. Forty-three of these CDSs appeared as clusters within eight loci. Twenty corresponded to ISHp609 (5 complete and conserved copies of ISHp609, each comprising orf1, orf2, ORFA and ORFB) [31], which interrupts HELPY0571, HELPY0700 (both encoding restriction/modification systems), HELPY0838 (encoding a putative Rad50 ATPase), HELPY1330 (encoding a putative glycosyl-transferase), and HELPY1529 (a HAC prophage II protein homolog). In addition to these five ISHp609 insertions, loss of synteny was also due to the presence of CDSs in four other loci: i) a cluster of seven genes (HELPY1520 to HELPY1525 and HELPY1527, HELPY1528 to HELPY1533) encoding HacII prophage-like proteins similar to those found in H. acinonychis strain Sheeba [34]; however, the size of the prophage is much larger (32 CDSs) in this species, suggesting that the prophage in B38 has been partially deleted, possibly following the insertion of one copy of ISHp609; ii) a cluster of six genes encoding hypothetical proteins of unknown function (HELPY0051 to HELPY0056); iii) a cluster of three CDSs that are absent in Shi470, HPAG1, J99, P12, and G27, but present in strain 26695, of which two encode alginate-O-acetylation proteins (HELPY0497-498); iv) a cluster of seven CDSs that encode a putative helicase (HELPY0989) and a putative serine kinase (HELPY0990), two proteins not found in functional form in any of the other sequenced strains.
H. pylori core genome and strain-specific genes
BLAST score ratio analyses and comparisons between the B38 strain and the six other sequenced genomes, which were analyzed and revised through the MaGe system (Table 1), allowed us to establish that the core of the H. pylori genome consists of 1,275 CDSs. This number is slightly higher than that recently published by McClain and colleagues, who identified 1,237 genes, as it takes into account additional CDSs detected by the MaGe system [35]. This number is lower than that calculated from data presented in Additional file 1 (1,358 genes) based on the macroarray hybridization analysis of 120 isolates. The macroarray approach overestimated the number of ubiquitous CDSs, as all small CDSs (<350 bp) from the 26695 strain genome were excluded from the analysis, and thus were systematically counted as ubiquitous CDSs.
To identify strain-specific genes present in the B38 strain but absent from the other sequenced strains, we studied the putative orthologous relationships between pairs of genomes, i.e. gene pairs that satisfy Bi-directional Best Hit (BBH) criteria. Criteria included a minimum of 30% sequence identity and 80% of the length of the smallest protein (Additional file 4). Only 16 CDSs were found to be unique to the B38 strain. Nine seemed to be complete and thus putatively functional: six were shown to encode the putative HacII prophage-like proteins (HELPY1521-1522-1523-1524-1525-1527) and three were found to encode hypothetical proteins (HELPY0409, HELPY0645 and HELPY0996). The remaining seven corresponded to fragments of genes (partial genes) coding for either conserved hypothetical proteins, prophage-like sequences or a restriction enzyme. Using the same methodology, we looked for genes that were present in the various H. pylori strains and absent in B38 (Additional file 5). In pair-wise comparisons, the number of CDSs absent in B38 was between 105 and 175. The only genes that were found to be exclusively absent in B38 corresponded to those of the cagPAI (Additional file 5), the well-known cluster of genes involved in the induction of a strong inflammatory response.
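The BBH criteria described above can be sketched as follows. This is a minimal illustration, not the MaGe implementation: the hit tuple layout, the use of an alignment score to rank hits, and the reading of the 80% criterion as alignment length relative to the shorter protein are all assumptions, and the gene identifiers are hypothetical.

```python
def best_hits(hits, len_query, len_subject, min_identity=30.0, min_coverage=0.80):
    """Map each query to its highest-scoring subject among hits passing both filters.

    `hits` holds tuples (query, subject, percent_identity, alignment_length, score).
    """
    best = {}
    for query, subject, identity, aln_len, score in hits:
        shorter = min(len_query[query], len_subject[subject])
        if identity < min_identity or aln_len < min_coverage * shorter:
            continue  # fails the identity or the length criterion
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, len_a, len_b):
    """Return the set of (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, len_a, len_b)
    reverse = best_hits(b_vs_a, len_b, len_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def without_ortholog(len_a, pairs):
    """Genes of genome A that have no BBH partner in genome B."""
    return sorted(set(len_a) - {a for a, _ in pairs})


# Hypothetical protein lengths (amino acids) and hits for two toy genomes.
len_a = {"A1": 300, "A2": 200, "A3": 150}
len_b = {"B1": 310, "B2": 400}
a_vs_b = [("A1", "B1", 95.0, 298, 600.0),
          ("A2", "B1", 40.0, 180, 150.0),
          ("A3", "B2", 25.0, 140, 50.0)]
b_vs_a = [("B1", "A1", 95.0, 298, 600.0),
          ("B2", "A3", 25.0, 140, 50.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a, len_a, len_b)
unique_to_a = without_ortholog(len_a, pairs)
```

In this toy case A2 hits B1 above both thresholds, but B1's best hit is A1, so A2 has no bidirectional partner; A3 fails the identity threshold. Repeating the comparison against every other genome and intersecting the results would give the genes specific to one strain.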
Specific properties associated with the genomes of strains belonging to the MALT lymphoma PAI minus cluster
Of the 19 strains belonging to the MALT lymphoma PAI minus cluster, all 19 contained the vacAm2 allele; 16 exhibited an s2m2 genotype, indicating that they encode a non-functional cytotoxin, and three exhibited an s1m2 genotype [18]. We then investigated whether the properties found to be unique to strain B38 are shared by the other strains of the MALT lymphoma PAI minus cluster. The search for the presence of the HacII-like prophage was done through hybridization using internal fragments of HELPY1521, HELPY1525, and HELPY1526 as probes. Four of the 19 strains (21%, including B38) of the MALT lymphoma PAI minus cluster contained HacII prophage-like sequences. By contrast, 1/24 (4%) strains isolated from patients with MALT lymphoma containing cagPAI, 2/33 (6%) strains from patients suffering from gastritis and 2/27 strains (7.4%) from those with duodenal ulcers contained HacII prophage-like sequences. Furthermore, the two adjacent HELPY0989 and HELPY0990 genes, encoding a helicase and a serine kinase, respectively, and not previously found as functional proteins in the other sequenced genomes, were found in three of the 19 strains (16%) of the B38 cluster. These two genes were not detected in the other MALT lymphoma strains (cagPAI positive), nor within the 22 isolates associated with gastritis and peptic ulcers. Finally, three clustered conservative mutations in glmM (HELPY0072 - Ala332, Leu333), leading to the absence of amplification of the 294-bp internal fragment of the phosphoglucosamine mutase-encoding gene [36], were observed in five of the 19 MALT lymphoma PAI minus isolates (26%). However, these mutations were not found in any of the other clinical isolates of this study, nor were they found in more than 400 H. pylori isolates associated with gastritis, peptic ulcers or metaplasia that were tested with identical oligonucleotides (personal data). These conservative mutations may be indicative of a selective pressure to maintain them, together with a property encoded by a gene present in close proximity to glmM, a property that has yet to be identified. Thus, although none of the unique properties of B38 were shared by all MALT strains of the cluster, characterizing a cagPAI minus isolate containing either glmM mutations or HELPY0989-0990 genes may be predictive of MALT lymphoma, as these two characteristics were found exclusively among the strains of this cluster.
Conclusion
The study was initiated with the aim of gaining insight into the existence of bacterial determinism for gastric extra-nodal marginal zone B-cell MALT lymphoma. DNA hybridization of the whole genome of 120 clinical isolates revealed a cluster of 19 H. pylori strains, all completely deprived of cagPAI sequences and originating from patients with MALT lymphoma. We sequenced the genome of strain B38, a representative of this cluster, and describe the first genome sequence of a cagPAI minus H. pylori strain. The absence of the cagPAI, together with that of several non-ubiquitous genes, makes the B38 genome the smallest H. pylori genome described to date. The cagPAI minus B38 strain lacks a functional cytotoxin (vacAs2m2) as well as genes encoding the major adhesion factors (absence of babB, babC, sabB, and homB); thus, compared with well-known pro-inflammatory H. pylori isolates, it appears to be deprived of all known pathogenic determinants, but is nonetheless associated with gastric neoplasia. Further investigation is required to fully understand the difference in fitness between these strains with low pro-inflammatory profiles and the human host factors that may play a significant role in the development of gastric MALT lymphoma.
Methods
H. pylori strains, and growth
We examined 120 H. pylori strains isolated from patients from different areas of France enrolled in 3 multi-center studies carried out by 1) the Groupe d'Etude Français des Helicobacter (G.E.F.H.), 2) the Groupe d'Etude Français des Lymphomes Digestifs (G.E.L.D.) [37] and the Fédération Française de Cancérologie Digestive (F.F.C.D.) [38], and 3) the Groupe d'Etude des Lymphomes de l'Adulte (G.E.L.A.). Criteria for patient inclusion were age (>55 years) and a diagnosis of chronic gastritis (n = 33), duodenal ulcer without intestinal metaplasia (n = 27), or intestinal metaplasia without ulcer (n = 17). We identified 43 strains from patients with gastric MALT lymphoma. H. pylori was isolated from one biopsy specimen following biopsy homogenization and culture under microaerophilic conditions (5-6% O2, 8-10% CO2, 80-85% N2) on blood agar medium (BA; Oxoid blood agar base N°2) supplemented with 10% horse blood, as reported previously [39]. One colony was selected at random from each primary culture; it was then sub-cultured and used to prepare chromosomal DNA. This DNA was extracted from 48-hour-old confluent cells using the QIAamp Tissue kit (Qiagen, Chatsworth, CA) according to the manufacturer's recommendations.
In house DNA macroarray membrane preparation
A total of 254 PCR products were amplified in four 96-well microtiter plates, corresponding to 41 ubiquitous and 213 non-ubiquitous genes from the genome of strain 26695 as previously described [39]. Briefly, amplification reactions were performed in 2 × 100 μl reaction volumes, in which 2 μl of DNA corresponding to the recombinant plasmid containing the full-length CDS (CoDing Sequence) inserted into the pILL570-derivative vector was used as template. Each PCR product was sequenced to confirm the identity of the gene, and was then spotted in triplicate onto a nylon membrane (Qfilter, Genetix 22.2 × 22.2 cm, N+) using a Qpix robot (Genetix). Denaturated 26695 genomic DNA was spotted in triplicate at the four corners of the membrane (positive controls) and seven squares were left empty as negative controls. Following spot deposition, membranes were fixed for 15 minutes in 0.5 M NaOH 1.5 M NaCl, washed briefly in distilled water, and stored wet at -20°C until use [39].
Aliquots of 250 μl of DNA were labeled by random priming with 2 μl of 33P-dCTP. Labeling was performed for 3 hours at room temperature. Unincorporated radionucleotides were removed by purification on Quick Spin Sephadex G-25 columns (Roche Diagnostics). Immediately before being used for hybridization experiments, the sonicated, labeled, and purified chromosomal DNA was heat-denaturated and cooled on ice. Hybridization was conducted in 5 ml prewarmed (65°C) hybridization mixtures containing the heat denaturated probe, with overnight incubation. Membranes were then washed and exposed for 25 hours to a phosphoimager screen (Molecular Dynamics).
Screens were scanned on a Storm 860 machine (Molecular Dynamics). Image analysis and quantification of the hybridization intensity of each spot, expressed in pixels, were performed using the Xdots Reader program (COSE) [39]. The intensity of the background surrounding each spot was subtracted from that of the spot. Twenty-one homologous hybridizations were performed, and the average intensity of the 41 ubiquitous genes was calculated for each of these reference arrays. This value served to allocate a reference array to each heterologous hybridization (the averages of the ubiquitous spots from the heterologous and the homologous reference hybridizations were not significantly different by Student's t-test) and to calculate the ratio used for normalization. Following normalization, the data were analyzed either by attributing a binary score (presence/absence; Additional file 1) or by multidimensional analysis based on continuous intensity values (Figure 1 and Figure 2). To define the cutoff ratio for the presence/absence of a gene, we analyzed the results for the sequenced H. pylori J99 DNA hybridized with H. pylori 26695; the threshold for the presence of a gene was defined as >0.25. The multidimensional analyses (Genesis software), both the hierarchical clustering and the Principal Component Analysis, were performed using the 254 continuous values from each of the 120 heterologous hybridization experiments. Each value corresponds to the log10 of the normalized intensity for strain 26695 minus the log10 of the normalized intensity for the heterologous strain (i.e. log26695-logheterol.strain).
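The scoring steps described above can be illustrated with a short sketch. It is not the authors' pipeline (the analysis was done with Xdots Reader and Genesis), and it rests on two assumptions that the text does not spell out: that the normalization ratio is the mean ubiquitous-gene intensity of the reference array divided by that of the test array, and that the 0.25 cutoff applies to the ratio of normalized test intensity to reference intensity. All function names and the toy intensities are hypothetical.

```python
import math

PRESENCE_CUTOFF = 0.25  # threshold stated in the text; how it is applied here is an assumption


def background_corrected(spot, background):
    """Subtract the local background from a spot intensity (clamped at zero)."""
    return max(spot - background, 0.0)


def normalization_factor(ref_ubiquitous, test_ubiquitous):
    """Assumed form of the normalization ratio: mean(reference) / mean(test)
    over the ubiquitous-gene spots."""
    ref_mean = sum(ref_ubiquitous) / len(ref_ubiquitous)
    test_mean = sum(test_ubiquitous) / len(test_ubiquitous)
    return ref_mean / test_mean


def score_gene(ref_intensity, test_intensity, factor, floor=1.0):
    """Return (present, log_diff) for one gene.

    present  : binary score, True if normalized test / reference > cutoff
    log_diff : log10(reference) - log10(normalized test), the continuous value
    floor    : hypothetical lower bound that avoids log10(0) for empty spots
    """
    normalized = test_intensity * factor
    present = (normalized / ref_intensity) > PRESENCE_CUTOFF
    log_diff = math.log10(max(ref_intensity, floor)) - math.log10(max(normalized, floor))
    return present, log_diff


# Toy values (hypothetical pixel intensities after background subtraction).
factor = normalization_factor([1000.0, 1200.0], [500.0, 600.0])
strong = score_gene(1000.0, 400.0, factor)  # well-hybridizing gene
weak = score_gene(1000.0, 50.0, factor)     # poorly hybridizing gene
```

With this sign convention, a value near zero indicates hybridization comparable to that of strain 26695, and larger positive values indicate weaker hybridization of the heterologous strain.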
Sequencing and annotation of the B38 genome
Genomic DNA was randomly sheared by nebulization (HydroShear, GeneMachines) and the ends were enzymatically repaired. SmaI fragments (1.5-4 kb) were inserted into plasmid vector pBAM3/SmaI (derived from pBluescript KS and constructed by R. Heilig). Large (35-45 kb) DNA fragments generated from partial BamHI-restriction were inserted into the cosmid vector pHC79/BamHI.
Plasmid DNA was prepared with the TempliPhi DNA sequencing template amplification kit (GE Healthcare-Bio-Sciences). Cosmid DNA was purified with the Montage BAC Miniprep96 Kit (Millipore). Sequencing reactions were performed from both ends of DNA templates using ABI PRISM BigDye Terminator cycle sequencing ready reactions kits and were run on a 3700 or a 3730 xl Genetic Analyzer (Applied Biosystems).
Base calling of the sequence data was carried out using Phred [40]. Sequences not meeting our production quality criteria (at least 100 bases called with a quality over 20) were discarded. Sequences were screened against plasmid vector and E. coli sequences. The traces were assembled using Phrap and Consed [41]. Whole genome shotgun sequencing was performed to ensure approximately 11-fold coverage. Autofinish [42] was used to design primers for improving regions of low quality sequence and for primer walking along templates spanning the gaps between contigs. Several strategies were used to orientate contigs and to enable directed PCR-based approaches to span the gaps between contigs. These strategies included linking isolates and a Blast-based approach, which identified contigs with hits to the H. pylori strain 26695 genome. Various combined PCR techniques were used to amplify genomic or cosmid DNA in order to close the gaps between the final contigs. Outward-directed primers were designed for each of the contig ends; the primer sequences were subsequently checked and confirmed to be unique in the genome. This combined PCR process required approximately 200 PCR reactions pairing each of the primers. In addition, two cosmid isolates, each containing one rDNA operon copy, were completely sequenced by sub-cloning into a pSMART-LC vector (Lucigen Corp.). The error rate was less than 1 error per 10,000 bp in the final assembly. The complete genome sequence was obtained from 40,153 sequences, resulting in 14-fold coverage.
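As an illustration of the read-quality criterion and of the coverage figure quoted above, the following sketch applies the "at least 100 bases with a quality over 20" rule to a list of Phred scores and back-calculates the mean read length implied by 40,153 reads at 14-fold coverage of a 1,576,758 bp genome. The function names are hypothetical, and the implied read length is an arithmetic consequence of the reported numbers, not a value given by the authors.

```python
def phred_error_probability(q):
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_quality(quals, min_bases=100, min_q=20):
    """True if at least `min_bases` bases were called with a quality over `min_q`."""
    return sum(1 for q in quals if q > min_q) >= min_bases


def fold_coverage(n_reads, mean_read_len, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


def implied_mean_read_length(coverage, genome_size, n_reads):
    """Invert fold_coverage to obtain the mean usable read length."""
    return coverage * genome_size / n_reads


# Figures reported in the text for the B38 assembly.
B38_GENOME_SIZE = 1_576_758
B38_READS = 40_153
mean_len = implied_mean_read_length(14, B38_GENOME_SIZE, B38_READS)
```

Under these figures, the implied mean usable read length is roughly 550 bases per read; a quality score of 20 corresponds to an expected error rate of 1 in 100 base calls.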
AMIGene software was used to predict which CDSs were likely to encode proteins [43]. The set of predicted genes underwent automatic functional annotation using the set of tools listed in Vallenet et al. [28]. All these data (syntactic and functional annotations, results of comparative analysis) are stored in a relational database, called PyloriScope. Manual validation of the automatic annotation was performed using the MaGe (Magnifying Genomes, http://www.genoscope.cns.fr) web-based interface, which allows graphic visualization of the annotations enhanced by the synchronized representation of synteny groups in other genomes chosen for comparison.
Accession Numbers
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) accession number for the H. pylori strain B38 chromosome is [EMBL:FM991728].
All data and comparative genomics results concerning the H. pylori B38 genome are stored in PyloriScope (http://www.genoscope.cns.fr/agc/mage), a relational database that is available to the public.
Authors' contributions
JMT carried out the macroarrays and the molecular genetic studies, and participated in the genome assembly. CB-E carried out the major part of the manual annotation of the genome together with PL, HDR, and IB. CB and LM carried out the genome sequencing and assembly. CM, ZR and AL were involved in the automatic annotation, comparative genomics, and administration of the MaGe system. JYC, MAD and SC participated in the preparation of the home-made DNA arrays and in the statistical analyses. CB, AR-F, AC-M, DL, FM and JCD collected the clinical isolates. AL designed the study, analysed the results, and drafted the manuscript. JR analysed the results and drafted the manuscript. All authors read and approved the final manuscript.
Supplementary Material
Additional file 1
List of the 254 genes of Helicobacter pylori strain 26695 used for gene amplification and preparation of the home-made macroarray membranes. Distribution of each gene in the 120 French isolates of this study associated with gastritis (G), duodenal ulcer (DU), gastric MALT lymphoma (MALT) or metaplasia (META). The percentages are based on the binary analysis (presence/absence) according to the normalization process and the cutoff ratio described in Materials and Methods. Genes marked "HPXXXX+" were designated as ubiquitous genes based on a previous comparative analysis [25]; genes marked "HPXXXX" are the non-ubiquitous genes. The 48 most discriminatory genes, identified as key combinations of variables (genes/axes) from the Principal Component Analysis and used for the clustering analysis (Figure 2), are in bold.
Additional file 2
CDSs of B38 strain involved in restriction/modification systems classified according to the gene status.
Additional file 3
Distribution of the genes encoding outer membrane proteins (OMPs) in the 7 Helicobacter pylori genome sequences (B38, J99, 26695, HPAG1, Shi470, G27, P12). The genes are classified according to the hop, hor, hof, and hom gene families. The numbers refer to the name of the CDS in each genome (for example: 0009 in 26695 refers to HP0009, 0007 in B38 refers to HELPY0007). "x" indicates a complete absence of the gene. Two or three names separated by a "/" indicate the presence of a pseudogene.
Additional file 4
Number of CDSs in the B38 strain that are absent in the J99, 26695, HPAG1 or Shi470 Helicobacter pylori strains classified by protein functions.
Additional file 5
Number of CDSs (listed by protein functions) of the Helicobacter pylori J99, 26695, HPAG1, Shi470, G27 and P12 strains, respectively, that are absent in strain B38. * All strains: J99, 26695, HPAG1, Shi470, G27, and P12. ** The number depends on the strain chosen for reference.
|
sec
|
Background
Helicobacter pylori infections occur in approximately 50% of the human population and are associated with several inflammatory gastroduodenal diseases [1], including two types of gastric cancers: gastric adenocarcinoma [2] and gastric extra-nodal marginal zone B-cell MALT (mucosa-associated lymphoid tissue) lymphoma, first described by Isaacson et al. [3]. Evolution of this bacterial infection towards malignancy only occurs in approximately 1% of infected individuals, suggesting that both bacterial and host susceptibility factors are involved [4].
Since the discovery of H. pylori, several studies have focused on elucidating the H. pylori pathogenicity mechanisms (microbial factors) that are associated with disease outcomes [5]. The cag-pathogenicity island (cagPAI) has been recognized as a major pro-inflammatory actor, but its association with MALT lymphoma strains has yet to be clearly shown [6]. The VacA vacuolating cytotoxin, thought to cause detectable alterations in gastric epithelial cells and immune cells, is also one of the most studied H. pylori virulence factors [7]. On the basis of its immunosuppressive properties, demonstrated in in vitro studies, VacA has also been suggested to play a role in H. pylori persistence [8]. Adhesion of H. pylori to gastric epithelial cells is another bacterial trait contributing to the chronic state of the infection. BabA [9], SabA [10], HopZ [11], HomB [12] and 30 outer-membrane-like paralogs recognized as adhesins or potential adhesins are encoded by the H. pylori genome [13]. Several studies have highlighted their contribution to pathogen fitness in human populations [14,15]. Over the last twenty years, genes encoding these virulence factors have served as genotyping markers to establish correlations between these markers, alone or in combination, and clinical outcomes of H. pylori infections [16].
Few studies have been conducted in relation to gastric MALT lymphoma-associated strains. Koehler et al. reported that the vacAm2 allele predominated in MALT lymphoma-associated isolates [17]. In previous studies [18,19] that included the same collection of H. pylori gastric MALT lymphoma strains as that used here, the authors confirmed this finding and suggested that certain combinations of genomic markers may have a predictive value for determining whether gastric MALT lymphoma develops. All these data suggest a potential role for bacterial determinism in the clinical outcome of MALT lymphoma.
So far, comparative genomics involving sequenced H. pylori genomes have been limited to five clinical isolates isolated in the West and associated with gastritis (strain 26695 [20]), peptic ulcers (strains J99 [GenBank:AE001439.1] and P12 [EMBL:CP001217, EMBL:CP001218]), atrophic gastritis (HPAG1 [21]), or no known disease (strains G27 [22] and Shi470 [RefSeq:NC_010698]). However, no genome sequence of a H. pylori strain isolated from MALT lymphoma is currently available. Comparative genomics based on DNA-array analyses, first conducted by Salama et al. on 15 Caucasian isolates [23], led to the elucidation of the H. pylori core genome, comprising the pool of ubiquitous H. pylori genes, and of the strain-specific (non-ubiquitous) genes. Gressmann et al. studied gene gain and loss during evolution by comparing the genomes of 56 globally representative strains of H. pylori; they reported that 25% of the genes were non-ubiquitous [24]. Through comparative genomics based on the analysis of 24 clinical isolates from various geographical origins (Western, Asian, African countries) using whole genome DNA arrays, we identified 213 non-ubiquitous or strain-specific genes [25]. In this study, we describe the distribution of these 213 non-ubiquitous genes (Additional file 1) within genomes from a large, geographically homogeneous French collection of 120 well-characterized H. pylori strains associated with chronic gastritis, duodenal ulcer, intestinal metaplasia or gastric MALT lymphoma. A hierarchical clustering analysis of the DNA hybridization values identified a homogeneous phylogenetic subpopulation of strains containing all of the cagPAI minus MALT lymphoma isolates. The B38 isolate was selected as a representative of this MALT lymphoma-specific cluster. Its genome sequence was completed, fully annotated, and compared with previously sequenced and published H. pylori genomes.
|
sec
|
Results and Discussion
Non-ubiquitous gene distribution in relation to associated diseases
Hybridization results obtained with the 120 studied DNAs, each used as a probe on the home-made macroarrays derived from the reference strain 26695, are presented in Additional file 1 (data based on the binary presence/absence analyses) and Figure 1 (data based on the multidimensional analysis of continuous values, see Materials and Methods). Both presentations illustrate the distribution of each of the 254 genes (213 non-ubiquitous, and 41 ubiquitous, used for normalization) with respect to associated diseases. In Figure 1, each strain hybridization profile is represented by a vertical column of bars, whereas each horizontal line represents one of the 254 genes. Each strain exhibited a unique profile. The most striking features were related to the distribution of the cagPAI genes: almost all H. pylori strains associated with metaplasia harbored a complete cagPAI, a result consistent with findings by Nilsson et al. [26]. By contrast, a complete cagPAI was present in 70% of duodenal ulcer strains, and in 50% of chronic gastritis and of MALT lymphoma strains, confirming previously published findings for isolates collected in the West [27].
Figure 1 Hybridization reactions on a DNA macroarray membrane containing 254 PCR products that are representative of H. pylori strain 26695 (41 ubiquitous genes + 213 non-ubiquitous or strain-specific genes). Bacterial DNAs from 120 isolates involved in various diseases, including chronic gastritis (yellow), intestinal metaplasia (pink), duodenal ulcer (blue) and gastric MZBL (green), were tested by hybridization. Isolates are listed on the horizontal axis, and the genes tested on the vertical axis. Clustering (Genesis software) was carried out using the continuous values from 120 heterologous hybridization experiments, where each value corresponds to the (log26695-logheterol.strain) value for each tested gene (see Materials and Methods). Line colors range from blue, if the gene is present, to red, if absent. The range of intermediate colors reflects the degree of hybridization and thus homology, but also the redundancy of the tested genes. This figure represents the clustering based on the complete set of 254 genes.
Hierarchical clustering of the continuous values derived from the hybridization experiments of 120 French clinical isolates presenting different disease characteristics was performed (Figure 1). This allowed us to visualize a branch that clustered almost exclusively isolates associated with MALT lymphoma. Furthermore, principal component analysis allowed us to identify a combination of 48 genes (Additional file 1), which proved to be the most informative during multidimensional analysis. We then performed hierarchical clustering based on the values of these 48 genes (Figure 2). Two main branches were detected, one consisting of a distinct cluster of 20 isolates, all totally deprived of the cagPAI. Eighteen of the isolates were associated with MALT lymphoma and two with gastritis. Interestingly, none of the peptic ulcer or metaplasia isolates clustered in this branch. The second branch splits into two main clusters, one corresponding to isolates, mostly associated with gastritis, that totally or partially lack cagPAI genes, and the other clustering isolates associated with other diseases.
Figure 2 Hybridization reactions on a DNA macroarray membrane: clustering based on the 48 most discriminatory genes identified as key combinations of variables (genes/axes) from Principal Component Analysis. These 48 genes are labeled in Additional file 1.
To clarify the genetic determinism of the MALT lymphoma strains, we selected one strain that was representative of the MALT lymphoma cagPAI minus branch and determined its genome sequence. We selected strain B38, which was isolated from a 62-year-old man suffering from MALT lymphoma. It fulfilled various requirements: i) it belonged to the hpEurope phylogenetic branch according to MLST analysis (Suerbaum, personal communication), a property that was consistent with the five Helicobacter genome sequences previously published (26695, J99, HPAG1, P12, and G27); ii) it was genetically transformable; iii) it was plasmid free; and iv) it was capable of colonizing the mouse gastric mucosa. Its vacA status was s2m2 [18].
Main features of the B38 genome
The genome of the B38 strain consists of a circular chromosome of 1,576,758 base pairs (bp) with an average GC content of 39.2% (Figure 3). It is the smallest H. pylori genome sequenced to date (Table 1). The B38 genome sequence was first automatically and then manually annotated using the MaGe system [28] (http://www.genoscope.cns.fr/agc/mage) and was then compared with the other sequenced H. pylori genomes. It contains 1,528 CDSs with a coding density (85.0%) similar to that found in the other sequenced Helicobacter strains. Among the 1,528 CDSs, 1,393 were predicted to be protein-coding genes (complete CDSs) with an average length of 971 bp; 135 correspond to partial CDSs, of which 133 are pseudogenes (i.e. 133 fragments representing 62 genes) and two are remnant genes (truncated genes for which we cannot find the missing sections in close proximity) (Table 1).
Table 1 Summary of comparative features of Helicobacter genomes
a These genomes also carry a plasmid of 9,369 bp (HPAG1), 10,031 bp (G27), 10,225 bp (P12) or 3,661 bp (Sheeba). Plasmids were not counted
b Revised number with the MaGe system and manual curation
c Percentage of fragments of genes/total CDSs
d Percentage of fragmented genes/total CDSs
e Number of copies
Figure 3 Genome map of Helicobacter pylori strain B38. From outside to inside: GC skew (window 2500, step 500) in blue; total CDSs (green) with pseudogenes/partial genes (purple); CDSs coding for hypothetical restriction/modification systems (purple), phage proteins (orange), or insertion sequences (ISHp609) (green); total CDSs according to the matrix defined for gene identification (matrix no. 1 in red, matrix no. 2 in black, matrix no. 3 in green); RNA (rRNA in green, tRNA in purple and misc_RNA in red); rule; GC% (window 5000, step 2000) in yellow. The red arrow indicates the position of the origin of replication.
Of the 1,528 annotated CDSs, a function was assigned to 989 CDSs (64.7%). For 784 of them (79.3%), a function was experimentally demonstrated either in the Helicobacter species (188, 12.3%) or in another organism (596, 39%). Two hundred and five CDSs (20.7% of 989) received a function based on the presence of a conserved amino acid motif, a structural feature, or limited homology. A total of 378 CDSs have homologs in previously reported sequences of the genus Helicobacter (43.6% of 378), in the epsilon proteobacteria (35.2% of 378), or in other distant bacteria (21.2% of 378). Protein function classification based on the Clusters of Orthologous Groups (COG) database allowed us to place 1,189 of the 1,528 CDSs (77.81%) in at least one of the COG functional groups (Table 2): 454 were assigned to cellular processes and signaling systems, 342 to information storage and processing, while 595 were involved in metabolism. The B38 genome exhibits the highest percentage of CDSs associated with a COG group (77.97% vs 73.38% for 26695, 76.48% for J99, 76.15% for HPAG1 and 73.49% for Shi470), with the number of CDSs involved in defense mechanisms slightly higher than in the other sequenced Helicobacter strains.
Table 2 Automatic distribution of protein functions, based on the COG classification, between Helicobacter strains
* The CDSs were manually curated in the MaGe system for the elimination of artifacts.
H. pylori contains a significant number of restriction/modification systems; their composition and activity have been shown to be strain-specific [29]. In the B38 strain, 63 CDSs were involved in restriction/modification systems. Among them, 30 elements were fragmented into pseudogenes corresponding to 12 potential genes, and three elements appeared to be partial genes (Additional file 2). Thus, the proportion of potentially active genes (52%) appeared to be higher in B38 than in strains J99 and 26695, in which only 30% of type II R-M systems were reported to be functional [30].
The B38 genome harbors five complete copies of the four-gene insertion sequence ISHp609. This insertion sequence was frequently found in H. pylori strains from Europe, the Americas, India and Africa, but was almost always absent in strains from East Asia [31]. Three of the four genes (orf1, orf2, ORFA) were 100% identical in the five B38 ISHp609 copies, whereas ORFB from one of the five copies (HELPY1334) exhibited a single mutation. Among the sequenced genomes (Table 1), a single and complete copy of this element was found in strain HPAG1, but it differed slightly from that found in B38 (6, 8, and 9 mutations are present in orf1, ORFA, and ORFB of HPAG1, respectively). The near identity of the five copies of ISHp609 in B38 indicates that the element has been acquired very recently, and that it is probably an active element capable of transposition, a property never experimentally demonstrated for a transposable element in H. pylori.
Another property of the B38 genome is the complete absence of four of the 45 genes encoding outer membrane proteins (OMPs) from the four conserved OMP families (Hop, Hor, Hof and Hom) (Additional file 3). B38 lacks babB, babC, sabB, and homB, four OMPs known to play a major role in adhesion to gastric epithelial cells and possibly in long-term persistence of strains in the human gastric mucosa when associated with peptic ulcer diseases or gastric metaplasia [32]. Relative to the other sequenced genomes, B38 thus lacks a high number of adhesin genes.
Comparative genomics and genome evolution
We then analyzed the genomic rearrangements through pair-wise genomic synteny comparisons between B38 and the eight published Helicobacteraceae genomes. For five of the isolates (namely, 26695, J99, G27, P12, HPAG1), we confirmed the previously reported relative colinearity of the H. pylori genomes. This colinearity is mainly interrupted by insertion elements, the cagPAI, and genes encoding hypothetical proteins [33]. Unexpectedly, however, conserved synteny highlighted an almost complete colinearity, never described so far, between B38 and Shi470 (Figure 4). Shi470 is a clinical isolate from the gastric antrum of an Amerindian resident of a remote Amazonian village in Peru, and was thought to be related to strains from East Asia [RefSeq:NC_010698]. This unexpected absence of major genomic rearrangements between the two genomes prompted us to compare the genomes of these two isolates more closely, as a way of better understanding H. pylori genome evolution. B38 lacks 174 Shi470 genes, of which 70 cluster in three insertion blocks: one corresponds to the well characterized cagPAI; another to a block of 33 CDSs, mainly remnants from a conjugative plasmid (presence of TraG, VirB11, topoisomerase I, ComB3, homologs of the conjugal plasmid transfer system); and the third to a block that includes 7 CDSs encoding hypothetical proteins, as well as one CDS encoding an exodeoxyribonuclease subunit that is unique to the Shi470 isolate.
Figure 4 Pair-wise synteny line plots between B38 and H. pylori strains 26695, J99, HPAG1, Shi470, P12 and G27, Helicobacter hepaticus, or Helicobacter acinonychis.
Conversely, loss of synteny was also due to the presence of 110 CDSs in B38 that were not present in Shi470. Forty-three of these CDSs appeared as clusters within eight loci. Twenty corresponded to ISHp609 (5 complete and conserved copies of ISHp609, each comprising orf1, orf2, ORFA and ORFB) [31], which interrupts HELPY0571, HELPY0700 (both encoding restriction/modification systems), HELPY0838 (encoding a putative Rad50 ATPase), HELPY1330 (encoding a putative glycosyl-transferase), and HELPY1529 (a HAC prophage II protein homolog). In addition to these five ISHp609 insertions, loss of synteny was also due to the presence of CDSs in four other loci: i) a cluster of seven genes (HELPY1520 to HELPY1525 and HELPY1527, HELPY1528 to HELPY1533) encoding HacII prophage-like proteins similar to those found in H. acinonychis strain Sheeba [34]; however, the prophage is much larger (32 CDSs) in this species, suggesting that the prophage in B38 has undergone partial deletion, possibly following the insertion of one copy of ISHp609; ii) a cluster of six genes encoding hypothetical proteins of unknown function (HELPY0051 to HELPY0056); iii) a cluster of three CDSs that are absent in Shi470, HPAG1, J99, P12, and G27, but present in strain 26695, of which two encode alginate-O-acetylation proteins (HELPY0497-498); iv) a cluster of seven CDSs that includes genes encoding a putative helicase (HELPY0989) and a putative serine kinase (HELPY0990), two proteins not found in functional form in any of the other sequenced strains.
H. pylori core genome and strain-specific genes
BLAST score ratio analyses and comparisons between the B38 strain and the six other sequenced genomes, which were analyzed and revised through the MaGe system (Table 1), allowed us to establish that the core of the H. pylori genome consists of 1,275 CDSs. This number is slightly higher than that recently published by McClain and colleagues, who identified 1,237 genes, as it takes into account additional CDSs detected by the MaGe system [35]. It is, however, lower than the number calculated from the data presented in Additional file 1 (1,358 genes), which are based on the macroarray hybridization analysis of 120 isolates. The macroarray approach overestimated the number of ubiquitous CDSs because all small CDSs (<350 bp) from the 26695 strain genome were excluded from the analysis and were thus systematically counted as ubiquitous CDSs.
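For readers unfamiliar with the BLAST score ratio, a minimal sketch follows. It assumes the usual definition, in which the bit score of a CDS's best hit in another genome is divided by the bit score of that CDS aligned to itself, and it treats a CDS as "core" when the ratio reaches a cutoff in every compared genome. The 0.4 cutoff, the function names and the toy scores are illustrative assumptions; the text does not state the threshold used for the 1,275-CDS core.

```python
def blast_score_ratio(hit_bit_score, self_bit_score):
    """Best-hit bit score in another genome, scaled by the self-hit bit score."""
    if self_bit_score <= 0:
        raise ValueError("self bit score must be positive")
    return hit_bit_score / self_bit_score


def core_cds(self_scores, best_hit_scores, min_bsr=0.4):
    """Return the reference CDSs whose ratio reaches `min_bsr` in every genome.

    self_scores     : {cds: bit score of the CDS aligned to itself}
    best_hit_scores : {genome: {cds: best bit score found in that genome}}
    min_bsr         : illustrative cutoff (assumption, not taken from the text)
    """
    core = []
    for cds, self_score in self_scores.items():
        ratios = [
            blast_score_ratio(hits.get(cds, 0.0), self_score)  # no hit counts as 0
            for hits in best_hit_scores.values()
        ]
        if all(r >= min_bsr for r in ratios):
            core.append(cds)
    return sorted(core)


# Toy data: three hypothetical CDSs compared against two hypothetical genomes.
self_scores = {"cds1": 500.0, "cds2": 300.0, "cds3": 200.0}
best_hit_scores = {
    "genomeA": {"cds1": 480.0, "cds2": 90.0, "cds3": 190.0},
    "genomeB": {"cds1": 450.0, "cds2": 280.0},  # cds3 has no hit in genomeB
}
core = core_cds(self_scores, best_hit_scores)
```

In the toy data only `cds1` is retained: `cds2` falls below the cutoff in one genome and `cds3` is missing from the other.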
To identify strain-specific genes present in the B38 strain but absent from the other sequenced strains, we studied the putative orthologous relationships between pairs of genomes, i.e. gene pairs that satisfy the Bi-directional Best Hit (BBH) criteria. These criteria required a minimum of 30% sequence identity over at least 80% of the length of the smaller protein (Additional file 4). Only 16 CDSs were found to be unique to the B38 strain. Nine seemed to be complete and thus putatively functional: six encode the putative HacII prophage-like proteins (HELPY1521-1522-1523-1524-1525-1527) and three encode hypothetical proteins (HELPY0409, HELPY0645 and HELPY0996). The remaining seven corresponded to fragments of genes (partial genes) coding for conserved hypothetical proteins, prophage-like sequences or a restriction enzyme. Using the same methodology, we looked for genes that were present in the various H. pylori strains and absent in B38 (Additional file 5). In pair-wise comparisons, the number of CDSs absent in B38 ranged from 105 to 175. The only genes found to be absent exclusively in B38 were those of the cagPAI (Additional file 5), the well-known cluster of genes involved in the induction of a strong inflammatory response.
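The BBH criterion described above can be sketched as follows. This is an illustration only, not the MaGe implementation: the `Hit` record, the function names and the toy alignments are hypothetical, best hits are ranked by bit score (an assumption), and the 30% identity and 80% length thresholds are the ones quoted in the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    aln_len: int      # aligned length, in residues
    bit_score: float


def best_hits(hits):
    """Keep, for each query, the hit with the highest bit score."""
    best = {}
    for h in hits:
        if h.query not in best or h.bit_score > best[h.query].bit_score:
            best[h.query] = h
    return best


def passes(hit, lengths, min_identity=30.0, min_cov=0.80):
    """At least 30% identity over at least 80% of the smaller protein."""
    shorter = min(lengths[hit.query], lengths[hit.subject])
    return hit.identity >= min_identity and hit.aln_len >= min_cov * shorter


def bidirectional_best_hits(a_vs_b, b_vs_a, lengths):
    """Gene pairs (a, b) that are each other's best hit and pass the thresholds."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = []
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back and back.subject == gene_a and passes(hit, lengths) and passes(back, lengths):
            pairs.append((gene_a, hit.subject))
    return sorted(pairs)


def without_ortholog(genes_a, pairs):
    """Genes of genome A that have no BBH partner in genome B."""
    matched = {a for a, _ in pairs}
    return sorted(g for g in genes_a if g not in matched)


# Toy example with hypothetical proteins A1, A2 (genome A) and B1, B2 (genome B).
lengths = {"A1": 100, "A2": 200, "B1": 110, "B2": 300}
a_vs_b = [Hit("A1", "B1", 95.0, 98, 180.0), Hit("A1", "B2", 40.0, 60, 50.0),
          Hit("A2", "B2", 25.0, 190, 60.0)]
b_vs_a = [Hit("B1", "A1", 95.0, 98, 180.0), Hit("B2", "A2", 25.0, 190, 60.0)]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a, lengths)
specific = without_ortholog(["A1", "A2"], pairs)
```

A CDS would count as unique to one strain only if it lacks a BBH partner in every other genome, i.e. it appears in the `without_ortholog` result of each pair-wise comparison.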
Specific properties associated with the genomes of strains belonging to the MALT lymphoma PAI minus cluster
Of the 19 strains belonging to the MALT lymphoma PAI minus cluster, all 19 contained the vacAm2 allele; 16 exhibited an s2m2 genotype, indicating that they encode a non-functional cytotoxin, and three exhibited an s1m2 genotype [18]. We then investigated whether the properties found to be unique to strain B38 are shared by the other strains of the MALT lymphoma PAI minus cluster. The presence of the HacII-like prophage was assessed by hybridization using internal fragments of HELPY1521, HELPY1525, and HELPY1526 as probes. Four of the 19 strains (21%, including B38) of the MALT lymphoma PAI minus cluster contained HacII prophage-like sequences. By contrast, HacII prophage-like sequences were found in 1/24 (4%) cagPAI-containing strains isolated from patients with MALT lymphoma, 2/33 (6%) strains from patients suffering from gastritis and 2/27 (7.4%) strains from those with duodenal ulcers. Furthermore, the two adjacent genes HELPY0989 and HELPY0990, encoding a helicase and a serine kinase, respectively, and not previously found as functional proteins in the other sequenced genomes, were present in three of the 19 strains (16%) of the B38 cluster. These two genes were not detected in the other MALT lymphoma strains (cagPAI positive), nor within the 22 isolates associated with gastritis and peptic ulcers. Finally, three clustered conservative mutations in glmM (HELPY0072 - Ala332, Leu333), leading to the absence of amplification of the 294-bp internal fragment of the phosphoglucosamine mutase-encoding gene [36], were observed in five of the 19 MALT lymphoma PAI minus isolates (26%). These mutations were not found in any of the other clinical isolates of this study, nor were they found in more than 400 H. pylori isolates associated with gastritis, peptic ulcers or metaplasia that were tested with identical oligonucleotides (personal data).
The conservative nature of these mutations may indicate selective pressure to maintain them, together with a property, yet to be identified, encoded by a gene located close to glmM. Thus, although none of the unique properties of B38 was shared by all MALT strains of the cluster, finding either the glmM mutations or the HELPY0989-0990 genes in a cagPAI minus isolate may be predictive of MALT lymphoma, as these two characteristics were found exclusively among the strains of this cluster.
|
title
|
Results and Discussion
|
sec
|
title
|
Non-ubiquitous gene distribution in relation to associated diseases
|
p
|
Hybridization results for the 120 genomic DNAs, each used as a probe against the in-house macroarrays derived from the reference strain 26695, are presented in Additional file 1 (binary presence/absence data) and Figure 1 (multidimensional analysis of continuous values; see Materials and Methods). Both presentations illustrate the distribution of each of the 254 genes (213 non-ubiquitous genes, and 41 ubiquitous genes used for normalization) with respect to associated diseases. In Figure 1, each strain's hybridization profile is represented by a column of vertically aligned bars, and each horizontal line corresponds to one of the 254 genes. Each strain exhibited a unique profile. The most striking features were related to the distribution of the cagPAI genes: almost all H. pylori strains associated with metaplasia harbored a complete cagPAI, a result consistent with the findings of Nilsson et al. [26]. By contrast, a complete cagPAI was present in 70% of duodenal ulcer strains and in 50% of chronic gastritis and MALT lymphoma strains, confirming previously published findings for isolates collected in the West [27].
|
figure caption
|
Figure 1 Hybridization reactions on a DNA macroarray membrane containing 254 PCR products representative of H. pylori strain 26695 (41 ubiquitous genes + 213 non-ubiquitous or strain-specific genes). Bacterial DNAs from 120 isolates associated with various diseases, including chronic gastritis (yellow), intestinal metaplasia (pink), duodenal ulcer (blue) and gastric MZBL (green), were tested by hybridization. Isolates are listed on the horizontal axis, and the genes tested on the vertical axis. Clustering (Genesis software) was carried out using the continuous values from the 120 heterologous hybridization experiments, where each value corresponds to (log[26695] - log[heterologous strain]) for each tested gene (see Materials and Methods). Line colors range from blue, if the gene is present, to red, if it is absent. Intermediate colors reflect the degree of hybridization, and thus homology, but also the redundancy of the tested genes. This figure shows the clustering based on the complete set of 254 genes.
|
p
|
Hierarchical clustering was performed on the continuous values derived from the hybridization experiments of the 120 French clinical isolates, which were associated with different diseases (Figure 1). This revealed a branch composed almost exclusively of isolates associated with MALT lymphoma. Furthermore, principal component analysis identified a combination of 48 genes (Additional file 1) that proved to be the most informative in the multidimensional analysis. We then performed hierarchical clustering based on the values of these 48 genes (Figure 2). Two main branches were detected. The first consisted of a distinct cluster of 20 isolates, all completely lacking the cagPAI: 18 were associated with MALT lymphoma and two with gastritis. Interestingly, none of the peptic ulcer or metaplasia isolates clustered in this branch. The second branch split into two main clusters, one corresponding to isolates that totally or partially lack cagPAI genes and are mostly associated with gastritis, and the other grouping isolates associated with the other diseases.
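The log-ratio values and the clustering step can be illustrated with a minimal sketch. It assumes base-10 logarithms, Euclidean distance and average linkage, none of which is specified in the text (the analysis itself was run in the Genesis software), and the strain names and signal intensities are invented for illustration only.

```python
import math

def log_ratio_profile(ref_signals, test_signals):
    """Per-gene log10(reference signal) - log10(test signal).
    Values near 0 mean the gene hybridizes as well as in the reference;
    large positive values mean a weak or absent signal."""
    return [math.log10(r) - math.log10(t) for r, t in zip(ref_signals, test_signals)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).
    Returns the merges in order as (members_a, members_b, distance)."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
                dist = sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical signals for three genes: one ubiquitous gene, two cagPAI-like genes.
reference = [1000.0, 1000.0, 1000.0]          # strain 26695 (self-hybridization)
signals = {
    "strain_A": [900.0, 50.0, 40.0],          # both variable genes absent
    "strain_B": [950.0, 60.0, 45.0],          # both variable genes absent
    "strain_C": [1000.0, 900.0, 950.0],       # all genes present
}
profiles = {name: log_ratio_profile(reference, s) for name, s in signals.items()}
merges = average_linkage(profiles)
for members_a, members_b, dist in merges:
    print(members_a, "+", members_b, f"at distance {dist:.3f}")
```

In this toy example the two strains lacking the variable genes merge first, which mirrors how isolates without the cagPAI end up on a common branch.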
|
figure caption
|
Figure 2 Hybridization reactions on a DNA macroarray membrane: clustering based on the 48 most discriminatory genes identified as key combinations of variables (genes/axes) from Principal Component Analysis. These 48 genes are labeled in Additional file 1.
|
p
|
To clarify the genetic determinants of the MALT lymphoma strains, we selected one strain representative of the MALT lymphoma cagPAI minus branch and determined its genome sequence. We chose strain B38, which was isolated from a 62-year-old man suffering from MALT lymphoma. It fulfilled several requirements: i) it belonged to the hpEurope phylogenetic branch according to MLST analysis (Suerbaum, personal communication), a property consistent with the five Helicobacter genome sequences previously published (26695, J99, HPAG1, P12, and G27); ii) it was genetically transformable; iii) it was plasmid-free; and iv) it was capable of colonizing the mouse gastric mucosa. Its vacA status was s2m2 [18].
|
sec
|
title
|
Main features of the B38 genome
|
p
|
The genome of the B38 strain consists of a circular chromosome of 1,576,758 base pairs (bp) with an average GC content of 39.2% (Figure 3). It is the smallest H. pylori genome sequenced to date (Table 1). The B38 genome sequence was annotated first automatically and then manually using the MaGe system [28] (http://www.genoscope.cns.fr/agc/mage), and was then compared with the other sequenced H. pylori genomes. It contains 1,528 CDSs, with a coding density (85.0%) similar to that found in the other sequenced Helicobacter strains. Among the 1,528 CDSs, 1,393 were predicted to be protein-coding genes (complete CDSs) with an average length of 971 bp; the remaining 135 correspond to partial CDSs, of which 133 are pseudogene fragments (representing 62 genes) and two are remnant genes (truncated genes for which the missing sections cannot be found in close proximity) (Table 1).
|
table caption
|
Table 1 Summary of comparative features of Helicobacter genomes
a These genomes carry a plasmid of 9,369 bp (HPAG1), 10,031 bp (G27), 10,225 bp (P12) or 3,661 bp (Sheeba). Plasmids were not counted.
b Revised number with the MaGe system and manual curation
c Percentage of fragments of genes/total CDSs
d Percentage of fragmented genes/total CDSs
e Number of copies
|
figure caption
|
Figure 3 Genome map of Helicobacter pylori strain B38. From outside to inside: - GC skew (window 2500, step 500) in blue. - Total CDSs (green) with pseudogenes/partial genes (purple). - CDSs coding for hypothetical restriction/modification systems (purple), phage proteins (orange), or insertion sequences (ISHp609) (green). - Total CDSs according to the matrix defined for gene identification (matrix n°1 in red, matrix n°2 in black, matrix n°3 in green). - RNA (rRNA in green, tRNA in purple and misc_RNA in red). - Rule. - GC% (window 5000, step 2000) in yellow. The red arrow indicates the position of the origin of replication.
|
p
|
Of the 1,528 annotated CDSs, a function was assigned to 989 (64.7%). For 784 of them (79.3% of 989), the function has been experimentally demonstrated either in Helicobacter species (188; 12.3% of all CDSs) or in another organism (596; 39% of all CDSs). The remaining 205 CDSs (20.7% of 989) received a function based on the presence of a conserved amino acid motif, a structural feature, or limited homology. A total of 378 CDSs have homologs in previously reported sequences of the genus Helicobacter (43.6% of 378), in the epsilon proteobacteria (35.2% of 378), or in other distant bacteria (21.2% of 378). Protein function classification based on the Clusters of Orthologous Groups (COG) database allowed us to place 1,189 of the 1,528 CDSs (77.81%) in at least one of the COG functional groups (Table 2): 454 were assigned to cellular processes and signaling, 342 to information storage and processing, and 595 to metabolism. The B38 genome exhibits the highest percentage of CDSs associated with a COG group (77.97% vs 73.38% for 26695, 76.48% for J99, 76.15% for HPAG1, and 73.49% for Shi470), with the number of CDSs involved in defense mechanisms slightly higher than in the other sequenced Helicobacter strains.
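Because the percentages in this paragraph use two different denominators (all 1,528 CDSs in some cases, the 989 CDSs with an assigned function in others), a short arithmetic check may help. The sketch below only recomputes figures from the counts quoted in the text; the helper name `pct` is ours.

```python
TOTAL_CDS = 1528          # all annotated CDSs in B38
WITH_FUNCTION = 989       # CDSs with an assigned function
EXPERIMENTAL = 784        # function demonstrated experimentally
IN_HELICOBACTER = 188     # ... in Helicobacter species
IN_OTHER_ORGANISM = 596   # ... in another organism
BY_MOTIF_OR_HOMOLOGY = 205
IN_COG = 1189             # CDSs placed in at least one COG group

def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)

# The two experimental subgroups add up, as do the two routes to a function.
assert IN_HELICOBACTER + IN_OTHER_ORGANISM == EXPERIMENTAL
assert EXPERIMENTAL + BY_MOTIF_OR_HOMOLOGY == WITH_FUNCTION

summary = {
    "function assigned / all CDSs": pct(WITH_FUNCTION, TOTAL_CDS),
    "experimental / function assigned": pct(EXPERIMENTAL, WITH_FUNCTION),
    "motif or homology / function assigned": pct(BY_MOTIF_OR_HOMOLOGY, WITH_FUNCTION),
    "Helicobacter evidence / all CDSs": pct(IN_HELICOBACTER, TOTAL_CDS),
    "other-organism evidence / all CDSs": pct(IN_OTHER_ORGANISM, TOTAL_CDS),
    "in a COG group / all CDSs": pct(IN_COG, TOTAL_CDS, 2),
}
for label, value in summary.items():
    print(f"{label}: {value}%")
```

The three COG category counts (454, 342 and 595) sum to more than 1,189 because a CDS can belong to more than one functional group.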
|
table caption
|
Table 2 Automatic distribution of protein functions, based on the COG classification, between Helicobacter strains
*The CDSs were manually curated in the MaGe system for the elimination of artifacts.
|
p
|
There are a significant number of restriction/modification systems present in H. pylori; their composition and activity have been shown to be strain-specific [29]. In the B38 strain, 63 CDSs were involved in restriction/modification systems. Among them, 30 elements were fragmented into pseudogenes corresponding to 12 potential genes, and three elements appeared to be partial genes (Additional file 2). Thus, the proportion of potentially active genes (52%) appeared to be higher in B38 than in strains J99 and 26695, in which only 30% of type II R-M systems were reported to be functional [30].
|
p
|
The B38 genome harbors five complete copies of the four-gene insertion sequence ISHp609. This insertion sequence has frequently been found in H. pylori strains from Europe, the Americas, India and Africa, but is almost always absent from strains from East Asia [31]. Three of the four genes (orf1, orf2, ORFA) showed 100% identity across the five B38 ISHp609 copies, whereas ORFB from one of the five copies (HELPY1334) exhibited a single mutation. Among the sequenced genomes (Table 1), a single complete copy of this element was found in strain HPAG1, but it differed slightly from that found in B38 (6, 8, and 9 mutations are present in orf1, ORFA, and ORFB of HPAG1, respectively). The near identity of the five copies of ISHp609 in B38 suggests that the element was acquired very recently and that it is probably an active element capable of transposition, a property that has never been experimentally demonstrated for a transposable element in H. pylori.
|
p
|
Another property of the B38 genome is the complete absence of four of the 45 genes encoding outer membrane proteins (OMPs) from the four conserved OMP families (Hop, Hor, Hof and Hom) (Additional file 3). B38 lacks babB, babC, sabB, and homB, which encode four OMPs known to play a major role in adhesion to gastric epithelial cells and possibly in the long-term persistence of strains in the human gastric mucosa when associated with peptic ulcer diseases or gastric metaplasia [32]. Compared with the other sequenced genomes, B38 thus lacks a large number of adhesin genes.
|
sec
|
title
|
Comparative genomics and genome evolution
|
p
|
We then analyzed genomic rearrangements through pair-wise genomic synteny comparisons between B38 and the eight published Helicobacteriaceae genomes. For five of the isolates (namely 26695, J99, G27, P12, and HPAG1), we confirmed the previously reported relative colinearity of the H. pylori genomes. This colinearity is mainly interrupted by insertion elements, the cagPAI, and genes encoding hypothetical proteins [33]. Unexpectedly, however, the synteny analysis revealed an almost complete colinearity, never described before, between B38 and Shi470 (Figure 4). Shi470 is a clinical isolate from the gastric antrum of an Amerindian resident of a remote Amazonian village in Peru, and was thought to be related to strains from East Asia [RefSeq:NC_010698]. This unexpected absence of major genomic rearrangements between the two genomes prompted us to compare these two isolates more closely, as a way of better understanding H. pylori genome evolution. B38 lacks 174 Shi470 genes, of which 70 cluster in three insertion blocks: the first corresponds to the well-characterized cagPAI; the second is a block of 33 CDSs, mainly remnants of a conjugative plasmid (presence of TraG, VirB11, topoisomerase I, ComB3, and homologs of the conjugal plasmid transfer system); and the third is a block that includes 7 CDSs encoding hypothetical proteins, as well as one CDS encoding an exodeoxyribonuclease subunit that is unique to the Shi470 isolate.
|
figure caption
|
Figure 4 Synteny lineplots from pair-wise analyses between B38 and H. pylori strains 26695, J99, HPAG1, Shi470, P12 and G27, Helicobacter hepaticus, or Helicobacter acinonychis.
|
p
|
Conversely, loss of synteny was also due to the presence of 110 CDSs in B38 that were not present in Shi470. Forty-three of these CDSs appeared as clusters within eight loci. Twenty corresponded to ISHp609 (5 complete and conserved copies, each comprising orf1, orf2, ORFA and ORFB) [31], which interrupts HELPY0571 and HELPY0700 (both encoding restriction/modification systems), HELPY0838 (encoding a putative Rad50 ATPase), HELPY1330 (encoding a putative glycosyl-transferase), and HELPY1529 (a HAC prophage II protein homolog). In addition to these five ISHp609 insertions, loss of synteny was due to the presence of CDSs in four other loci: i) a cluster of seven genes (HELPY1520 to HELPY1525 and HELPY1527, HELPY1528 to HELPY1533) encoding HacII prophage-like proteins similar to those found in H. acinonychis strain Sheeba [34]; the prophage is much larger (32 CDSs) in that species, suggesting that the prophage in B38 has undergone deletion, possibly following the insertion of one copy of ISHp609; ii) a cluster of six genes encoding hypothetical proteins of unknown function (HELPY0051 to HELPY0056); iii) a cluster of three CDSs that are absent in Shi470, HPAG1, J99, P12, and G27, but present in strain 26695, of which two encode alginate-O-acetylation proteins (HELPY0497-498); iv) a cluster of seven CDSs that includes genes encoding a putative helicase (HELPY0989) and a putative serine kinase (HELPY0990), two functional proteins not found in any of the other sequenced strains.
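The notion of colinearity used in this section can be made concrete with a simple score. The sketch below is not the MaGe synteny method; it is an assumed, simplified measure that takes ortholog pairs as gene-order indices in two genomes and reports the fraction of orthologs lying on the longest run that keeps the same order in both. It ignores chromosome circularity and treats inverted blocks as breaks.

```python
from bisect import bisect_left

def collinear_fraction(ortholog_positions):
    """ortholog_positions: list of (index_in_genome_a, index_in_genome_b),
    one tuple per ortholog pair, with distinct indices in each genome.
    Returns the length of the longest order-preserving subset divided by
    the number of ortholog pairs (1.0 means fully colinear)."""
    if not ortholog_positions:
        raise ValueError("at least one ortholog pair is required")
    # Walk along genome A and read off the positions in genome B.
    order_in_b = [b for _, b in sorted(ortholog_positions)]
    # Longest strictly increasing subsequence, patience-sorting style.
    tails = []
    for position in order_in_b:
        i = bisect_left(tails, position)
        if i == len(tails):
            tails.append(position)
        else:
            tails[i] = position
    return len(tails) / len(order_in_b)

# Hypothetical gene orders: a fully colinear pair, and one with an inverted block.
fully_colinear = [(0, 0), (1, 1), (2, 2), (3, 3)]
with_inversion = [(0, 0), (1, 3), (2, 2), (3, 1), (4, 4)]

score_colinear = collinear_fraction(fully_colinear)
score_inversion = collinear_fraction(with_inversion)
print(score_colinear, score_inversion)
```

Insertion blocks such as the cagPAI or the ISHp609 copies do not lower this score, since genes without an ortholog in the other genome are simply not part of the input; only rearrangements of shared genes do.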
|
sec
|
title
|
H. pylori core genome and strain-specific genes
|
p
|
BLAST score ratio analyses and comparisons between the B38 strain and the six other sequenced genomes, which were analyzed and revised through the MaGe system (Table 1), allowed us to establish that the core of the H. pylori genome consists of 1,275 CDSs. This number is slightly higher than that recently published by McClain and colleagues, who identified 1,237 genes, as it takes into account additional CDSs detected by the MaGe system [35]. It is lower than the number calculated from the data presented in Additional file 1 (1,358 genes), which is based on the macroarray hybridization analysis of 120 isolates. The macroarray approach overestimated the number of ubiquitous CDSs, because all small CDSs (<350 bp) from the 26695 genome were excluded from the analysis and were thus systematically counted as ubiquitous.
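As a rough illustration of how a BLAST score ratio (BSR) can define a core genome: the BSR of a protein against another genome is its best-hit bit score in that genome divided by its bit score against itself, so it ranges from 0 (no hit) to 1 (identical). The sketch below treats a gene as core when its BSR reaches a cut-off in every compared genome. The cut-off of 0.4 is an assumption and is not stated in the text; gene names, genome names and scores are invented.

```python
def blast_score_ratio(best_hit_score, self_score):
    """Best-hit bit score in the target genome divided by the self-hit bit score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return best_hit_score / self_score

def core_genes(self_scores, hit_scores_by_genome, threshold=0.4):
    """self_scores: gene -> bit score of the gene against itself.
    hit_scores_by_genome: genome -> {gene -> best-hit bit score}; a gene
    missing from the inner dict is treated as having no hit (score 0).
    Returns the sorted genes whose BSR is >= threshold in every genome."""
    core = []
    for gene, self_score in self_scores.items():
        ratios = [
            blast_score_ratio(hits.get(gene, 0.0), self_score)
            for hits in hit_scores_by_genome.values()
        ]
        if all(ratio >= threshold for ratio in ratios):
            core.append(gene)
    return sorted(core)

# Hypothetical bit scores for three reference genes compared against two genomes.
self_scores = {"gene_1": 500.0, "gene_2": 300.0, "gene_3": 800.0}
hit_scores_by_genome = {
    "genome_X": {"gene_1": 480.0, "gene_2": 90.0, "gene_3": 760.0},
    "genome_Y": {"gene_1": 450.0, "gene_2": 280.0},   # gene_3 has no hit
}
core = core_genes(self_scores, hit_scores_by_genome)
print(core)
```

Here only `gene_1` is retained: `gene_2` falls below the cut-off in one genome (90/300 = 0.3) and `gene_3` has no hit in the other.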
|
p
|
To identify strain-specific genes present in the B38 strain but absent from the other sequenced strains, we studied the putative orthologous relationships between pairs of genomes, i.e., gene pairs that satisfy the Bi-directional Best Hit (BBH) criteria. These criteria included a minimum of 30% sequence identity over at least 80% of the length of the smaller protein (Additional file 4). Only 16 CDSs were found to be unique to the B38 strain. Nine appeared to be complete and thus putatively functional: six encode the putative HacII prophage-like proteins (HELPY1521-1522-1523-1524-1525-1527) and three encode hypothetical proteins (HELPY0409, HELPY0645 and HELPY0996). The remaining seven corresponded to gene fragments (partial genes) coding for conserved hypothetical proteins, prophage-like sequences, or a restriction enzyme. Using the same methodology, we looked for genes that were present in the other H. pylori strains but absent from B38 (Additional file 5). In pair-wise comparisons, the number of CDSs absent from B38 ranged from 105 to 175. The only genes found to be absent exclusively from B38 were those of the cagPAI (Additional file 5), the well-known cluster of genes involved in the induction of a strong inflammatory response.
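The BBH criteria described above can be illustrated with a minimal Python sketch. It is not the MaGe implementation: the `Hit` record, the function names, and the reading of the 80% criterion as "aligned length of at least 80% of the shorter protein" are assumptions made for illustration only.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One pairwise protein alignment (hypothetical record layout)."""
    query: str
    subject: str
    identity: float  # percent identity
    aln_len: int     # aligned length, in residues
    score: float     # alignment score used to rank hits


def best_hits(hits: Iterable[Hit], lengths: Dict[str, int],
              min_identity: float = 30.0, min_cov: float = 0.8) -> Dict[str, str]:
    """Return the best-scoring subject for each query among hits passing the filters."""
    best: Dict[str, Hit] = {}
    for h in hits:
        shortest = min(lengths[h.query], lengths[h.subject])
        # Assumed reading of the criteria: >=30% identity and an alignment
        # spanning >=80% of the shorter of the two proteins.
        if h.identity < min_identity or h.aln_len < min_cov * shortest:
            continue
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                            lengths: Dict[str, int]) -> Set[Tuple[str, str]]:
    """Keep gene pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(a_vs_b, lengths)
    rev = best_hits(b_vs_a, lengths)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def unique_to(genes_a: Iterable[str], pairs: Set[Tuple[str, str]]) -> Set[str]:
    """Genes of genome A that have no BBH partner in genome B."""
    matched = {a for a, _ in pairs}
    return set(genes_a) - matched
```

Under these assumptions, a CDS would be reported as unique to B38 only if `unique_to` returned it in every pair-wise comparison with the six other genomes.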
|
sec
|
|
title
|
Specific properties associated with the genomes of strains belonging to the MALT lymphoma PAI minus cluster
|
p
|
Of the 19 strains belonging to the MALT lymphoma PAI minus cluster, all 19 contained the vacAm2 allele; 16 exhibited an s2m2 genotype, indicating that they encode a non-functional cytotoxin, and three exhibited an s1m2 genotype [18]. We then investigated whether the properties found to be unique to strain B38 are shared by the other strains of the MALT lymphoma PAI minus cluster. The presence of the HacII-like prophage was assessed by hybridization using internal fragments of HELPY1521, HELPY1525, and HELPY1526 as probes. Four of the 19 strains (21%, including B38) of the MALT lymphoma PAI minus cluster contained HacII prophage-like sequences. By contrast, 1/24 (4%) strains isolated from patients with MALT lymphoma and containing the cagPAI, 2/33 (6%) strains from patients suffering from gastritis, and 2/27 (7.4%) strains from those with duodenal ulcers contained HacII prophage-like sequences. Furthermore, the two adjacent genes HELPY0989 and HELPY0990, which encode a helicase and a serine kinase, respectively, and which were not previously found as functional proteins in the other sequenced genomes, were present in three of the 19 strains (16%) of the B38 cluster. These two genes were not detected in the other MALT lymphoma strains (cagPAI positive), nor within the 22 isolates associated with gastritis and peptic ulcers. Finally, three clustered conservative mutations in glmM (HELPY0072 - Ala332, Leu333), leading to the absence of amplification of the 294-bp internal fragment of the phosphoglucosamine mutase-encoding gene [36], were observed in five of the 19 MALT lymphoma PAI minus isolates (26%). These mutations were not found in any of the other clinical isolates of this study, nor were they found in more than 400 H. pylori isolates associated with gastritis, peptic ulcers or metaplasia that were tested with identical oligonucleotides (personal data).
The conservation of these mutations may indicate selective pressure to maintain them, together with a property encoded by a gene located close to glmM that has yet to be identified. Thus, although none of the unique properties of B38 were shared by all MALT strains of the cluster, characterizing a cagPAI minus isolate containing either the glmM mutations or the HELPY0989-0990 genes may be predictive of MALT lymphoma, as these two characteristics were found exclusively among the strains of this cluster.
|
sec
|
|
title
|
Conclusion
|
p
|
The study was initiated with the aim of gaining insight into the existence of bacterial determinism for gastric extra-nodal marginal zone B-cell MALT lymphoma. DNA hybridization against the whole genome of 120 clinical isolates revealed a cluster of 19 H. pylori strains, all originating from patients with MALT lymphoma and all completely devoid of cagPAI sequences. We sequenced the genome of strain B38, a representative of this cluster, and describe the first genome sequence of a cagPAI minus H. pylori strain. The absence of the cagPAI, together with that of several non-ubiquitous genes, makes the B38 genome the smallest H. pylori genome described to date. The cagPAI minus B38 strain lacks a functional cytotoxin (vacAs2m2) as well as genes encoding the major adhesion factors (absence of babB, babC, sabB, and homB); thus, compared with well-known pro-inflammatory H. pylori isolates, it appears to be devoid of all known pathogenic determinants, but is nonetheless associated with gastric neoplasia. Further investigation is required to fully understand the difference in fitness between these strains with low pro-inflammatory profiles and the human host factors that may play a significant role in the development of gastric MALT lymphoma.
|
sec
|
|
title
|
Methods
|
sec
|
|
title
|
H. pylori strains and growth
|
p
|
We examined 120 H. pylori strains isolated from patients from different areas of France enrolled in 3 multi-center studies carried out by 1) the Groupe d'Etude Français des Helicobacter (G.E.F.H.), 2) the Groupe d'Etude Français des Lymphomes Digestifs (G.E.L.D.) [37] and the Fédération Française de Cancérologie Digestive (F.F.C.D.) [38], and 3) the Groupe d'Etude des Lymphomes de l'Adulte (G.E.L.A.). Criteria for patient inclusion were age (>55 years) and a diagnosis of chronic gastritis (n = 33), duodenal ulcer without intestinal metaplasia (n = 27), or intestinal metaplasia without ulcer (n = 17). In addition, 43 strains were from patients with gastric MALT lymphoma. H. pylori was isolated from one biopsy specimen following biopsy homogenization and culture under microaerophilic conditions (5-6% O2, 8-10% CO2, 80-85% N2) on blood agar medium (BA; Oxoid blood agar base No. 2) supplemented with 10% horse blood, as reported previously [39]. One colony was selected at random from each primary culture; it was then sub-cultured and used to prepare chromosomal DNA. This DNA was extracted from 48-hour-old confluent cells using the QIAamp Tissue kit (Qiagen, Chatsworth, CA) according to the manufacturer's recommendations.
|
sec
|
|
title
|
In-house DNA macroarray membrane preparation
|
p
|
A total of 254 PCR products, corresponding to 41 ubiquitous and 213 non-ubiquitous genes from the genome of strain 26695, were amplified in four 96-well microtiter plates as previously described [39]. Briefly, amplification reactions were performed in 2 × 100 μl reaction volumes, in which 2 μl of DNA corresponding to the recombinant plasmid containing the full-length CDS (coding sequence) inserted into the pILL570-derivative vector was used as template. Each PCR product was sequenced to confirm the identity of the gene, and was then spotted in triplicate onto a nylon membrane (Qfilter, Genetix 22.2 × 22.2 cm, N+) using a Qpix robot (Genetix). Denatured 26695 genomic DNA was spotted in triplicate at the four corners of the membrane (positive controls) and seven squares were left empty as negative controls. Following spot deposition, membranes were fixed for 15 minutes in 0.5 M NaOH, 1.5 M NaCl, washed briefly in distilled water, and stored wet at -20°C until use [39].
|
p
|
Aliquots of 250 μl of DNA were labeled by random priming with 2 μl of 33P-dCTP. Labeling was performed for 3 hours at room temperature. Unincorporated radionucleotides were removed by purification on Quick Spin Sephadex G-25 columns (Roche Diagnostics). Immediately before being used for hybridization experiments, the sonicated, labeled, and purified chromosomal DNA was heat-denatured and cooled on ice. Hybridization was conducted overnight in 5 ml of prewarmed (65°C) hybridization mixture containing the heat-denatured probe. Membranes were then washed and exposed for 25 hours to a phosphorimager screen (Molecular Dynamics).
|
p
|
Screens were scanned on a Storm 860 machine (Molecular Dynamics). Image analysis and quantification of hybridization intensities for each spot were performed using the Xdots Reader program (COSE) and expressed in pixels [39]. The intensity of the background surrounding each spot was subtracted from that of the spot. Twenty-one homologous hybridizations were performed. The average intensity of the 41 ubiquitous genes was calculated for each reference array. This number served to allocate a reference array to each heterologous hybridization (the averages of the ubiquitous spots from the heterologous and the homologous reference hybridizations were not significantly different by Student's t-test) and to calculate the ratio used for normalization. Following normalization, the data were analyzed by attributing a binary score (presence/absence - Additional file 1) or by multidimensional analysis based on continuous intensity values (Figure 1 and Figure 2). To define the cutoff ratio for the presence/absence of a gene, we analyzed the results for the sequenced H. pylori J99 DNA hybridized with H. pylori 26695; the threshold for the presence of a gene was defined as a ratio >0.25. The multidimensional analyses (Genesis software) for the hierarchical clustering as well as for the Principal Component Analysis were performed using the 254 continuous values from the 120 heterologous hybridization experiments, each corresponding to (log10 of the normalized intensity value of strain 26695) minus (log10 of the normalized intensity value of the heterologous strain), i.e., log10(26695) - log10(heterologous strain).
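The normalization, presence/absence scoring, and log-difference steps described above can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the function names are hypothetical, the normalization ratio is assumed to be the ratio of the mean ubiquitous-gene intensities of the reference and heterologous arrays, the presence ratio is assumed to be heterologous over reference intensity, and the `floor` parameter is an added safeguard against taking the logarithm of zero or negative background-subtracted values.

```python
import math
from typing import Dict, Iterable


def normalize(test: Dict[str, float], reference: Dict[str, float],
              ubiquitous: Iterable[str]) -> Dict[str, float]:
    """Scale the heterologous array so its ubiquitous-gene mean matches the reference."""
    genes = list(ubiquitous)
    ref_mean = sum(reference[g] for g in genes) / len(genes)
    test_mean = sum(test[g] for g in genes) / len(genes)
    factor = ref_mean / test_mean  # assumed form of the normalization ratio
    return {g: v * factor for g, v in test.items()}


def presence_calls(test_norm: Dict[str, float], reference: Dict[str, float],
                   cutoff: float = 0.25) -> Dict[str, bool]:
    """Binary score: a gene is called present when its intensity ratio exceeds the cutoff."""
    return {g: test_norm[g] / reference[g] > cutoff for g in reference}


def log_differences(test_norm: Dict[str, float], reference: Dict[str, float],
                    floor: float = 1.0) -> Dict[str, float]:
    """Continuous value per gene: log10(reference) - log10(heterologous strain)."""
    return {
        g: math.log10(max(reference[g], floor)) - math.log10(max(test_norm[g], floor))
        for g in reference
    }
```

With this layout, one dictionary of 254 log differences per heterologous hybridization would form one row of the matrix passed to the clustering and principal component analyses.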
|
sec
|
|
title
|
Sequencing and annotation of the B38 genome
|
p
|
Genomic DNA was randomly sheared by nebulization (HydroShear, GeneMachines) and the ends were enzymatically repaired. SmaI fragments (1.5-4 kb) were inserted into plasmid vector pBAM3/SmaI (derived from pBluescript KS and constructed by R. Heilig). Large (35-45 kb) DNA fragments generated from partial BamHI-restriction were inserted into the cosmid vector pHC79/BamHI.
|
p
|
Plasmid DNA was prepared with the TempliPhi DNA sequencing template amplification kit (GE Healthcare Bio-Sciences). Cosmid DNA was purified with the Montage BAC Miniprep96 Kit (Millipore). Sequencing reactions were performed from both ends of the DNA templates using ABI PRISM BigDye Terminator cycle sequencing ready reaction kits and were run on a 3700 or a 3730xl Genetic Analyzer (Applied Biosystems).
|
p
|
Base calling of the sequence data was carried out using Phred [40]. Sequences not meeting our production quality criteria (at least 100 bases called with a quality over 20) were discarded. Sequences were screened against plasmid vector and E. coli sequences. The traces were assembled using Phrap and Consed [41]. Whole genome shotgun sequencing was performed to ensure approximately 11-fold coverage. Autofinish [42] was used to design primers for improving regions of low quality sequence and for primer walking along templates spanning the gaps between contigs. Several strategies were used to orientate contigs and to enable directed PCR-based approaches to span the gaps between contigs. These strategies included linking isolates and a BLAST-based approach, which identified contigs with hits to the H. pylori strain 26695 genome. Various combined PCR techniques were used to amplify genomic or cosmid DNA in order to close the gaps between the final contigs. Outward-directed primers were designed for each of the contig ends; the primer sequences were subsequently checked and confirmed to be unique in the genome. This combined PCR process required approximately 200 PCR reactions pairing each of the primers. In addition, two cosmid isolates, each containing one rDNA operon copy, were completely sequenced by sub-cloning into a pSMART-LC vector (Lucigen Corp.). The error rate was less than 1 error per 10,000 bp in the final assembly. The complete genome sequence was obtained from 40,153 sequences, resulting in 14-fold coverage.
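The read-quality criterion and the coverage figures quoted above can be made concrete with a short sketch. The function names are hypothetical and the code is not the production pipeline; "quality over 20" is read here as a Phred score strictly greater than 20, and the genome size of 1,576,758 bp is the value reported for B38 in the abstract.

```python
from typing import Sequence


def passes_quality(quals: Sequence[int], min_bases: int = 100, min_q: int = 20) -> bool:
    """Keep a read only if at least `min_bases` bases have a Phred score above `min_q`."""
    return sum(1 for q in quals if q > min_q) >= min_bases


def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def implied_mean_read_length(n_reads: int, coverage: float, genome_size: int) -> float:
    """Mean usable read length implied by a read count and a reported coverage."""
    return coverage * genome_size / n_reads


# Figures from the text: 40,153 reads, 14-fold coverage, 1,576,758 bp genome.
mean_len = implied_mean_read_length(40153, 14, 1576758)
```

Under these assumptions, the reported read count and coverage imply a mean usable read length of roughly 550 bases, which is a derived estimate rather than a figure stated by the authors.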
|
p
|
AMIGene software was used to predict which CDSs were likely to encode proteins [43]. The set of predicted genes underwent automatic functional annotation using the set of tools listed in Vallenet et al. [28]. All these data (syntactic and functional annotations, results of comparative analysis) are stored in a relational database, called PyloriScope. Manual validation of the automatic annotation was performed using the MaGe (Magnifying Genomes, http://www.genoscope.cns.fr) web-based interface, which allows graphic visualization of the annotations enhanced by the synchronized representation of synteny groups in other genomes chosen for comparison.
|
sec
|
Accession Numbers
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) accession number for the H. pylori strain B38 chromosome is [EMBL:FM991728].
All data and comparative genomics results concerning the H. pylori B38 genome are stored in PyloriScope (http://www.genoscope.cns.fr/agc/mage), a relational database that is publicly available.
|
title
|
Accession Numbers
|
p
|
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) accession number for the H. pylori strain B38 chromosome is [EMBL:FM991728].
|
p
|
All data and comparative genomics results concerning the H. pylori B38 genome are stored in PyloriScope (http://www.genoscope.cns.fr/agc/mage), a relational database that is publicly available.
|
sec
|
Authors' contributions
JMT carried out the macroarrays and the molecular genetic studies, and participated in the genome assembly. CB-E carried out the major part of the manual annotation of the genome together with PL, HDR, and IB. CB and LM carried out the genome sequencing and assembly. CM, ZR and AL were involved in the automatic annotation, comparative genomics, and administration of the MaGe system. JYC, MAD and SC participated in the preparation of the home-made DNA arrays and in the statistical analyses. CB, AR-F, AC-M, DL, FM and JCD collected the clinical isolates. AL designed the study, analysed the results, and drafted the manuscript. JR analysed the results and drafted the manuscript. All authors read and approved the final manuscript.
|
title
|
Authors' contributions
|
p
|
JMT carried out the macroarrays and the molecular genetic studies, and participated in the genome assembly. CB-E carried out the major part of the manual annotation of the genome together with PL, HDR, and IB. CB and LM carried out the genome sequencing and assembly. CM, ZR and AL were involved in the automatic annotation, comparative genomics, and administration of the MaGe system. JYC, MAD and SC participated in the preparation of the home-made DNA arrays and in the statistical analyses. CB, AR-F, AC-M, DL, FM and JCD collected the clinical isolates. AL designed the study, analysed the results, and drafted the manuscript. JR analysed the results and drafted the manuscript. All authors read and approved the final manuscript.
|
sec
|
Supplementary Material
Additional file 1
List of the 254 genes of Helicobacter pylori strain 26695 used for gene amplification and preparation of the home-made macroarray membranes. Distribution of each gene in the 120 French isolates of this study associated with gastritis (G), duodenal ulcer (DU), gastric MALT lymphoma (MALT) or metaplasia (META). The percentages were based on the binary analysis (presence/absence) according to the normalization process and the cutoff ratio described in Material and Methods. Genes marked "HPXXXX+" were designated as ubiquitous based on a previous comparative analysis [25]; genes marked "HPXXXX" are non-ubiquitous; the 48 most discriminatory genes, identified as key combinations of variables (genes/axes) from the Principal Component Analysis and used for the clustering analysis, are in bold (Figure 2).
Additional file 2
CDSs of the B38 strain involved in restriction/modification systems, classified according to gene status.
Additional file 3
Distribution of the genes encoding outer membrane proteins (OMPs) in the 7 Helicobacter pylori genome sequences (B38, J99, 26695, HPAG1, Shi470, G27, P12). The genes are classified according to the hop, hor, hof, and hom gene families. The numbers refer to the name of the CDS in each genome (for example: 0009 in 26695 refers to HP0009, 0007 in B38 refers to HELPY0007). "x" indicates a complete absence of the gene. Two or three names separated by a "/" indicate the presence of a pseudogene.
Additional file 4
Number of CDSs in the B38 strain that are absent in the J99, 26695, HPAG1 or Shi470 Helicobacter pylori strains, classified by protein function.
Additional file 5
Number of CDSs (listed by protein function) in each of the Helicobacter pylori J99, 26695, HPAG1, Shi470, G27 and P12 strains that are absent in strain B38. * All strains: J99, 26695, HPAG1, Shi470, G27, and P12. ** The number depends on the strain chosen for reference.
|
title
|
Supplementary Material
|
title
|
Additional file 1
|
p
|
List of the 254 genes of Helicobacter pylori strain 26695 used for gene amplification and preparation of the home-made macroarray membranes. Distribution of each gene in the 120 French isolates of this study associated with gastritis (G), duodenal ulcer (DU), gastric MALT lymphoma (MALT) or metaplasia (META). The percentages were based on the binary analysis (presence/absence) according to the normalization process and the cutoff ratio described in Material and Methods. Genes marked "HPXXXX+" were designated as ubiquitous based on a previous comparative analysis [25]; genes marked "HPXXXX" are non-ubiquitous; the 48 most discriminatory genes, identified as key combinations of variables (genes/axes) from the Principal Component Analysis and used for the clustering analysis, are in bold (Figure 2).
|
title
|
Additional file 2
|
p
|
CDSs of the B38 strain involved in restriction/modification systems, classified according to gene status.
|
title
|
Additional file 3
|
p
|
Distribution of the genes encoding outer membrane proteins (OMPs) in the 7 Helicobacter pylori genome sequences (B38, J99, 26695, HPAG1, Shi470, G27, P12). The genes are classified according to the hop, hor, hof, and hom gene families. The numbers refer to the name of the CDS in each genome (for example: 0009 in 26695 refers to HP0009, 0007 in B38 refers to HELPY0007). "x" indicates a complete absence of the gene. Two or three names separated by a "/" indicate the presence of a pseudogene.
|
title
|
Additional file 4
|
p
|
Number of CDSs in the B38 strain that are absent in the J99, 26695, HPAG1 or Shi470 Helicobacter pylori strains, classified by protein function.
|
title
|
Additional file 5
|
p
|
Number of CDSs (listed by protein function) in each of the Helicobacter pylori J99, 26695, HPAG1, Shi470, G27 and P12 strains that are absent in strain B38. * All strains: J99, 26695, HPAG1, Shi470, G27, and P12. ** The number depends on the strain chosen for reference.
|