Single Nucleotide Polymorphisms of a 16 kb Region on Human Chromosome 11p15.5 that Includes the H19 Gene
The H19 gene, located at human chromosome 11p15.5, is imprinted in most normal human tissues. However, imprinting is often lost in tumors, suggesting that H19 is a putative tumor suppressor. We analyzed the single nucleotide polymorphisms (SNPs) of a 16 kb region that includes the H19 gene and its imprinting control region (ICR) in the Korean population. To identify SNPs, we directly sequenced this region in 18 Korean subjects. We identified 64 SNPs, of which 7 were in the exons of H19, 2 were in the introns, 14 were in the 3’ intergenic region and 41 were in the 5’ intergenic region. Of the 64 SNPs, 21 had not previously been reported and thus appear to be unique to the Korean population. The SNPs of H19 identified in the Korean population may eventually be useful as genetic markers associated with various diseases. In this study, 7 of the 64 identified SNPs were at CTCF binding sites in the ICR and may affect regulation of H19 gene imprinting. Thus, several genetic variations of the H19 gene may be important markers in human diseases that involve genomic imprinting, including cancer.
The human H19 gene consists of five exons separated by four introns. Early studies mapped it to 11p15.5, a chromosomal region with known tumor suppressor activity, and subsequent analysis revealed that H19 expression is often altered in several different types of tumors relative to that in adjacent, non-transformed cells (Chung et al., 1996; Cui et al., 1997). The human sequence has a putative ORF that appears to encode a 26 kDa protein, but the gene product appears to be RNA rather than a protein (Hao et al., 1993; Joubel et al., 1996; Pfeifer et al., 1996). The H19 gene is a known imprinted gene. It is expressed exclusively from the maternal chromosome and is linked and co-regulated with the insulin-like growth factor 2 gene, IGF2, which is also imprinted but expressed primarily from the paternal chromosome (Bell et al., 2000).
A 2 kb region of differential methylation located from -2 to -4 kb relative to the H19 transcriptional start site has been proposed to act as the imprinting mark (Davis et al., 1999). A differentially methylated region (DMR) upstream of H19 has been proposed to participate in the imprinting control of human IGF2 and H19 (Thorvaldsen et al., 1998). The DMR, or imprinting control region (ICR), acts as an epigenetic modifier of allelic expression by recruiting proteins that specifically bind to methylated or unmethylated DNA (Ulaner et al., 2003a). In humans, the ICR contains seven specific binding sites for the zinc finger insulator protein CTCF (CCCTC-binding factor) (Hark et al., 2000). CTCF binds to several sites in the unmethylated ICR that are essential for enhancer blocking (Hark et al., 2000). The methylation status of the CTCF binding sites in the H19 promoter has been suggested to be involved in regulating imprinting of the H19/IGF2 locus (Takai et al., 2001). Only the sixth of the seven CTCF-binding sites has been demonstrated to have allele-specific differential methylation (Takai et al., 2001). This region contains well-characterized single nucleotide polymorphisms (SNPs), which make it possible to distinguish the paternal and maternal alleles (Takai et al., 2001). Several studies have used two or three SNPs within the ICR to discriminate between the two alleles and to investigate whether tumors have abnormal H19/IGF2 methylation (Nakagawa et al., 2001; Ulaner et al., 2003a; Ulaner et al., 2003b).
In this study, we analyzed SNPs in a 16 kb region that includes the H19 gene and its 2 kb ICR in the Korean population. Our results provide new information about several SNPs that may impact imprinting regulation.
Peripheral blood samples were obtained from 18 healthy adults (8 males and 10 females) aged 25-43 years at the National Genome Research Institute in Korea. Genomic DNA was extracted from blood with a QIAamp DNA Blood Kit (QIAGEN, Valencia, CA) according to the manufacturer’s protocol. The primer sets for the PCR walking analysis are shown in Table 1. The PCR reactions contained 20 ng genomic DNA, 0.2 units of AmpliTaq Gold (Perkin-Elmer, Weiterstadt, Germany) or LA Taq (Takara Bio, Otsu, Japan) polymerase, 1 mM dNTPs, 1× PCR buffer, 5 pmol sense primer and 5 pmol antisense primer in 20 μl, and were performed in a thermal cycler (Perkin-Elmer) under the following conditions: 95°C for 5 min; 30 cycles of 95°C for 30 sec, 60-68°C for 45 sec and 72°C for 3 min; followed by 72°C for 10 min. To check the quality of the PCR products before the sequencing reactions, one-tenth of the reaction mixture was separated by electrophoresis on a 1% agarose gel.
The primers used for sequencing were the same as those used for PCR (Table 1). Cycle sequencing reactions were performed using the PCR product and an Applied Biosystems (Foster City, CA) Big Dye Terminator (version 2.0) ready reaction kit. The amount of primer in the reaction was 1 pmol, and the total reaction volume was 10 μl. Cycling parameters were: 30 cycles of 30 sec at 95°C, 5 sec at 50°C and 4 min at 60°C, followed by refrigeration until use. Each reaction mixture was ethanol-precipitated to remove excess dye terminators. The pellets were dissolved in 11 μl template suppression reagent (TSR, Applied Biosystems), heated at 95°C for 4 min to denature, quenched on ice for 4 min, mixed, spun briefly, loaded into the autosampler tray of an ABI3100 automated DNA sequencer, and sequenced according to the ABI3100 operator’s manual. The sequencer was set up to run using POP6 polymer and a 36 cm capillary with a 30 sec injection time and a 120 min run time.
Polymorphisms were detected by multiple alignment of sequences using the Phred/Phrap/Consed package (Ewing et al., 1998; Gordon et al., 1998). Deviation from Hardy-Weinberg expectation was examined with the χ² test or Fisher’s exact test.
Haploview software (version 3.2) was used for haplotype analysis and tagging SNP detection (Barrett, 2005).
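The Hardy-Weinberg check mentioned above can be illustrated with a minimal Python sketch. It covers only the χ² variant (one degree of freedom, no continuity correction) for a single biallelic SNP; the function name and the genotype counts in the example are hypothetical and are not taken from this study. With expected genotype counts as small as those arising from 18 subjects, an exact test would usually be preferred.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for 18 subjects.
    chi2, p_value = hwe_chi_square(5, 8, 5)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value would indicate deviation from Hardy-Weinberg expectation at that site.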
We discovered 64 SNPs in the H19 region (16 kb; NT_009237) on chromosome 11p15.5 in the Korean population (n=18). This region contains the H19 gene (1.4 kb), the H19 promoter region (2 kb), an imprinting control region (ICR) including CTCF binding sites (2 kb), and an intergenic region (~10 kb). The region was amplified by PCR from samples from 18 normal Koreans using the primers listed in Table 1, and the PCR products were directly sequenced to analyze the SNPs. The dbSNP database at the National Center for Biotechnology Information (NCBI) lists 93 SNPs in the H19 region. Of the 64 SNPs that we identified in this region, 43 were identical to SNPs in the NCBI database. The remaining 21 were thus judged to be novel in the Korean population. Of the 64 SNPs, 7 were in exons and 2 were in introns (none of which were novel), 6 were in the promoter (1 of which was novel), 7 were in the ICR (1 novel), 28 were in the 5’ intergenic region (16 novel), and 14 were in the 3’ intergenic region (3 novel) (Table 2).
We investigated the haplotypes of a 5.4 kb region (Position: 780559-785146, Table 2) with 21 SNPs containing the H19 gene, promoter and ICR. The SNP at position 784496 (minor allele frequency < 5%) was omitted. The patterns of haplotype structure and frequencies in this 5.4 kb gene region are shown in Table 3. There were 8 common haplotypes (frequency ≥ 5%), and 11 tagging SNPs were found. We also selected 4 SNPs (rs2839698, rs2251375, rs2071095, rs4930103) that were present in both our data and the HapMap database (The International HapMap Consortium, 2004) and analyzed the haplotype diversity and frequency (Table 4). The frequencies of haplotypes 1-4 in the Korean population were similar to those in the Japanese and Chinese populations but different from those in the CEPH population (Utah residents with ancestry from northern and western Europe). Haplotype 5 was a common haplotype in the Korean population, but was not found in the Japanese, Chinese, and CEPH populations (Table 4). Thus we find that several SNPs and haplotypes in the Korean population differ from those in other ethnic groups.
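The notion of a common haplotype used above can be sketched in a few lines of Python. The sketch assumes that haplotype phase is already known, whereas in practice phase is inferred statistically by software such as Haploview. The function names, the 5% default threshold as an inclusive cutoff, and the four-SNP example haplotypes are all hypothetical illustrations, not data from Table 3 or Table 4.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return {haplotype: frequency} for a list of phased haplotype strings."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplotypes supplied")
    return {hap: count / total for hap, count in counts.most_common()}


def common_haplotypes(haplotypes, threshold=0.05):
    """Keep only haplotypes whose frequency is at least `threshold`."""
    freqs = haplotype_frequencies(haplotypes)
    return {hap: freq for hap, freq in freqs.items() if freq >= threshold}


if __name__ == "__main__":
    # Hypothetical phased haplotypes: 36 chromosomes from 18 subjects.
    example = ["ACGT"] * 20 + ["GCGT"] * 15 + ["ATAC"] * 1
    for hap, freq in common_haplotypes(example).items():
        print(hap, round(freq, 3))
```

Comparing such frequency tables between populations is the kind of comparison summarized in Table 4.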
We found only 64 SNPs in this region, compared with the 93 in the NCBI database. This may be due to our small sample size (n=18). Detection of 89% of polymorphic sites (minor allele frequency of 5%) would require n=16, and detection of 99% would require n=48 (Kruglyak et al., 2001). In our case, with n=18, we expect to detect over 99% of SNPs with a minor allele frequency of 20%. Thus we do not expect to detect low-frequency SNPs in our sample. In addition, this discrepancy in SNP detection raises the possibility of invalid or extremely low-frequency SNPs in the public databases and underlines the need to check the validity and frequency of any potential marker SNPs in an intended study population.
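The effect of sample size on SNP detection can be illustrated with a simple calculation. The Python sketch below assumes that the 2n chromosomes of n diploid subjects are sampled independently and that a SNP counts as detected when its minor allele is seen at least once; the function name is hypothetical. This is a single-site approximation, and the cited figures may rest on a different model, so they need not match the values it produces.

```python
def detection_probability(maf, n_individuals):
    """Probability that a minor allele of frequency `maf` appears at least
    once among the 2 * n_individuals chromosomes sampled (assumes
    independent sampling of chromosomes)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if n_individuals < 0:
        raise ValueError("number of individuals must be non-negative")
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


if __name__ == "__main__":
    # Compare a few minor allele frequencies at the sample size used here.
    for maf in (0.01, 0.05, 0.20):
        print(maf, round(detection_probability(maf, 18), 4))
```

Under these assumptions, a SNP with a minor allele frequency of 20% is almost certain to be seen in 18 subjects, while rarer alleles are missed appreciably more often.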
The ICR of H19 is methylated from -2 to -4 kb relative to the start of transcription. CTCF binding was recently shown to play a direct role in inhibiting methylation of the ICR in order to establish and maintain imprinting of the Igf2/H19 region, and CTCF binding sites were shown to be dispensable for initiating imprinting (Szabo et al., 2004). The methylation status of this region was ascertained by bisulfite conversion and methylation-specific PCR (Ulaner et al., 2003b). The paternal allele is methylated and the maternal allele is unmethylated. The SNPs allow distinction of the paternal and maternal alleles. Of the six SNPs in the ICR that were previously reported, two (785015 (C/T) and 785104 (G/T)) were previously analyzed for allelic expression based on the methylation status (Ulaner et al., 2003b). Bisulfite DNA sequencing revealed a C/T polymorphism at base 785015 (an A/G polymorphism at base 6325; accession no. AF087017). The allele containing the A polymorphism was methylated, whereas the allele containing the G polymorphism was unmethylated. In addition, using a C/T polymorphism at base 785015, Poon et al. (2002) reported that paternally inherited methylated H19 fetal alleles were different from the methylated alleles of the respective mothers. This raised the possibility of using epigenetic markers for the specific detection of fetal DNA in maternal plasma. Furthermore, only the sixth CTCF binding site showed allele-specific methylation (Takai et al., 2001). The 785146 (A/G) site is located at the sixth CTCF binding site. This SNP site may serve as a marker for the regulation of H19 expression. Nakayashiki et al. (2004) recently described three closely located SNPs (g/a g7523a, g/a g7547a, c/t c7591t; accession no. AF125183) and designated them as the H19FR haplotype. They were able to selectively discriminate the parental alleles by enzymatically digesting differentially methylated genomic DNA. This method could be useful for identifying the parental origin of alleles. However, these three SNPs were not present in the Korean population. In any case, SNPs located in the ICR should provide information about the methylation status determined by expression of H19 alleles. In addition, information on the methylation status of the H19 gene may help to understand how imprinting is disrupted in tumors.
Numerous studies have revealed abnormal imprinting of H19 in a wide range of tumors (Nakagawa et al., 2001; Ulaner et al., 2003a; Manoharan et al., 2003; Yin et al., 2004). Loss of imprinting (LOI) of IGF2 correlated strongly with biallelic hypermethylation in a core region of an H19-associated CTCF-binding site (Nakagawa et al., 2001). The presence of this methylation-dependent LOI in both tumors and normal colonic mucosa indicates that hypermethylation may create a field defect predisposing to cancer (Nakagawa et al., 2001). In addition, incomplete gain or loss of methylation at this CTCF-binding site during tumorigenesis can explain the complex and conflicting expression patterns of IGF2 and H19 in a tumor (Ulaner et al., 2003a). Manoharan et al. (2003) used a SNP marker in the H19 coding sequence to investigate H19 imprinting. They found monoallelic expression of the maternal gene in fetal liver, but biallelic expression in liver neoplasms, thus demonstrating the basis for the deregulation of imprinted gene expression during hepatocarcinogenesis. The imprinting of IGF2/H19 is similar to that of DLK1/GTL2, another reciprocally imprinted gene pair. The imprinting status of DLK1 in brain tumors and lymphomas has also been deduced by analysis of a SNP (Yin et al., 2004). We propose that the polymorphic sites of H19 are good genomic markers for imprinting studies of additional tumor types.
H19 was one of the first imprinted genes to be identified in mice and humans. It is an excellent model for studies of genomic imprinting because it is a representative imprinted gene in humans and other animals. In the mouse, monoallelic expression of H19 is regulated by an ICR located on chromosome 7 (Thorvaldsen et al., 1998), and CTCF binding has been shown to have a role at four sites in the IGF2/H19 ICR (Szabo et al., 2004; Schoenherr et al., 2003; Thorvaldsen et al., 2002). A mutation in the mouse CTCF site 4 was sufficient to cause robust activation of the maternal Igf2 allele and to disturb the methylation-free status of the maternal H19 ICR allele (Pant et al., 2004).
H19 is also imprinted in cattle, in which the maternal allele was found to be predominantly or exclusively expressed in all tissues examined (Zhang et al., 2004). Identification of a SNP in the bovine H19 gene made it possible to study its imprinting status by following the expression of parental alleles in heterozygous animals. The present results will help in analyzing sequences and SNPs of the H19 region in other mammals (e.g., pig and sheep). The SNPs that we identified can be used as markers for the study of developmental abnormalities. The SNPs of the H19 gene discovered in this study should be useful as genomic markers for imprinting studies. In particular, they should be useful as markers in types of tumors that involve genomic imprinting. We are planning a case-control study of cancer using the SNPs discovered in the Korean population.