
=============Title==========
Copy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies.
=============Cor Author==========
*Corresponding author: E-mail yejun@catholic.ac.kr, Tel +82-2-590-1214, Fax +82-2-596-8969
Accepted 11 March 2008
===========Author==========
Tae-Min Kim1, Seon-Hee Yim2 and Yeun-Jun Chung1,2*
1Department of Microbiology, 2Integrated Research Center for Genome Polymorphism, The Catholic University of Korea, Seoul 137-701, Korea
===========Keywords==========
Keywords: array-CGH, Copy number variation (CNV), Genome-wide association study (GWAS)
Keywords: chromosome, genome-wide linkage search, heritability, HDL cholesterol
===========Sub Heading==========
Abstract
Introduction
The definition of CNV
The identification of CNVs using different platforms
Clinical implications of CNVs and disease association study
Conclusion
Introduction
Methods
Results and Discussion
==========Minor Heading===========
Subjects, medical histories, genotyping, and measurement of HDL cholesterol
Statistical analyses, heritability estimation, and variance component linkage analysis
===========Main Text==========
Abstract. The widespread presence of large-scale genomic variations, termed copy number variations (CNVs), has recently been recognized in phenotypically normal individuals. Judging by the growing number of reports on CNVs, it is now evident that these variants contribute significantly to genetic diversity in the human genome. Like single nucleotide polymorphisms (SNPs), CNVs are expected to serve as potential biomarkers for disease susceptibility or drug response. However, technical and practical concerns remain to be addressed. In this review, we examine the current status of CNV databases and research, including ongoing efforts to screen the human genome for CNVs.
We also discuss the characteristics of currently available platforms and outline the potential of CNVs in clinical research and application.
Introduction. Traditionally, large-scale genomic variants visible by conventional karyotyping have been thought to be associated with early-onset, highly penetrant genetic disorders and to be incompatible with a normal, disease-free phenotype (Lupski, 1998; Stankiewicz and Lupski, 2002). The construction of the 'reference genome' by the human genome sequencing project was based on the assumption that human genome sequences are virtually identical across individuals, except for well-known single nucleotide polymorphisms (SNPs) and size variants of tandem repeats such as mini- and microsatellites (variable number of tandem repeats, or VNTRs) (Przeworski et al., 2000). This traditional view has recently been challenged by the discovery that large structural variations are more prevalent than previously presumed (Check, 2005). Using high-resolution whole-genome scanning technologies such as array-based comparative genomic hybridization (array-CGH), two pioneering groups identified widespread copy number variations (CNVs) in apparently healthy individuals (Iafrate et al., 2004; Sebat et al., 2004). These findings suggest that our genome is more diverse than previously recognized, and subsequent studies have identified up to 11,000 CNVs across the whole genome (Tuzun et al., 2005; Hinds et al., 2006; Mills et al., 2006; McCarroll et al., 2006; Conrad et al., 2006; Sharp et al., 2005; Wong et al., 2007; de Smith et al., 2007). Although current understanding of CNVs is still too limited for practical use and technical challenges remain, recent studies have already demonstrated potential associations of CNVs with various diseases, suggesting plausible functional significance and highlighting the promise of CNVs.
The current coverage of CNVs in the human genome (approximately 600 Mb, or about 12% of the genome) already exceeds that of SNPs and is still increasing (Cooper et al., 2007). Together with SNPs, these large-scale structural variants will be powerful resources for understanding human genetic variation and differences in susceptibility to various diseases. This paper reviews current knowledge of CNVs and future perspectives.
The definition of CNV. Structural variations involving large DNA segments can take various forms, such as duplication, deletion, insertion, inversion, and translocation. Among them, DNA copy number variations larger than 1 kb are collectively termed CNVs. Fig. 1 illustrates the concept of CNV. Although the term CNV can include large, microscopically visible genomic variations, it generally refers to submicroscopic structural variations that are difficult to detect by conventional karyotyping (resolution of about 3-5 Mb) (Freeman et al., 2006). Smaller variants such as insertion-deletion (indel) polymorphisms, which form another large collection of over 400,000 variants in the human genome (Mills et al., 2006), are not included in CNVs; neither are insertional polymorphisms of mobile elements such as Alu or L1 elements. In the early stages of CNV discovery, several terms were proposed for these variants, e.g., large-scale copy number variants (LCVs) (Iafrate et al., 2004), copy number polymorphisms (CNPs) (Sebat et al., 2004), and intermediate-sized variants (ISVs) (Tuzun et al., 2005). The current definition of CNV is operational and may be modified as scanning resolution and coverage improve and as allele frequencies in defined populations become available.
The identification of CNVs using different platforms. Various scanning platforms and quality-control methods have been used to identify CNVs.
Because the choice of platform strongly affects the results, it is worth reviewing the characteristics of each platform to better understand CNVs. The presence of CNVs in normal individuals was first reported in 2004, independently, by two groups led by C. Lee and M. Wigler (Iafrate et al., 2004; Sebat et al., 2004). Both studies used two-dye array-CGH, with bacterial artificial chromosome (BAC) clones or oligonucleotides (representational oligonucleotide microarray analysis, or ROMA), respectively. They reported about 250 and 80 loci with copy number changes in 39 and 20 normal individuals, respectively. Fig. 2 illustrates the general concept of CNV detection based on two-dye array-CGH. Although the average numbers of CNVs per individual genome were similar in the two studies (about 12 CNVs per genome), there was little overlap between their results. This discrepancy may reflect the use of different platforms and experimental conditions in different populations. However, it is also likely that large numbers of structural variants have yet to be discovered (Buckley et al., 2005; Eichler, 2006). A subsequent study that provided evidence for the widespread presence of large-scale structural variations in the human genome was based solely on in silico analysis (Tuzun et al., 2005). A sequence-level comparison of two independent genome sequences, one from the human genome reference assembly and the other from fosmid clones of a genomic library, revealed about 300 structural variations. This method can detect various types of structural variants, including inversions, which are not detectable by conventional array-CGH platforms. Indeed, the results of Tuzun et al. (2005) can serve as a validated control set for primary verification or for parameter tuning when developing CNV detection platforms or algorithms.
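The two-dye array-CGH principle referenced above (Fig. 2) can be sketched in a few lines. In this minimal illustration, which is not the pipeline used by Iafrate et al. or Sebat et al., per-probe intensities of test and reference DNA are converted to log2 ratios, and consecutive probes beyond a fixed threshold are grouped into gain or loss calls. The function names, the ±0.3 thresholds, and the three-probe minimum are hypothetical choices; published studies use dedicated segmentation algorithms and platform-specific cut-offs.

```python
import math

# Hypothetical cut-offs. In a pure diploid sample a one-copy gain gives
# log2(3/2) ~ 0.58 and a one-copy loss gives log2(1/2) = -1; lower cut-offs
# are used here because measured ratios are often compressed.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def log2_ratios(test, reference):
    """Per-probe log2(test / reference) from the two dye channels."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_segments(ratios, min_probes=3):
    """Group consecutive probes in the same state into calls.

    Returns (first_probe, last_probe_inclusive, state) tuples, keeping only
    'gain'/'loss' runs that span at least min_probes probes.
    """
    def state(x):
        if x >= GAIN_THRESHOLD:
            return "gain"
        if x <= LOSS_THRESHOLD:
            return "loss"
        return None  # within the normal range

    calls = []
    start = 0
    for i in range(1, len(ratios) + 1):
        # Close the current run at the end of the array or on a state change.
        if i == len(ratios) or state(ratios[i]) != state(ratios[start]):
            s = state(ratios[start])
            if s is not None and i - start >= min_probes:
                calls.append((start, i - 1, s))
            start = i
    return calls
```

For example, with a flat reference of 1000 per probe, test intensities near 1500 over four consecutive probes and near 500 over three consecutive probes yield one gain call and one loss call; isolated outlier probes are ignored by the `min_probes` rule.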
Although the use of this method is currently limited by the scarcity of individual genome sequence data, ongoing efforts to sequence individual human genomes and to develop cost-effective sequencing platforms (Bennett et al., 2005) should facilitate sequence-level genome comparisons and the identification of high-quality structural variants in the near future. Two studies, by McCarroll et al. and Conrad et al., focused on identifying deletion variants (McCarroll et al., 2006; Conrad et al., 2006) using genotype data for 1.2 million SNPs from the International HapMap Consortium (International HapMap Consortium, 2005). They assumed that an allelic deletion produces aberrant genotype calls at the SNP probes it covers. For example, runs of consecutive probes with null genotype calls, or runs of SNPs whose genotypes deviate from expected Hardy-Weinberg equilibrium ratios or Mendelian inheritance patterns, may indicate deleted loci. They independently reported about 600 potential deletions, including some smaller than 100 bp. The relatively small size of the identified variants, compared with those from array-CGH, reflects the high resolution of these platforms. An SNP-centric array platform can also be used to assess linkage disequilibrium (LD) between structural variants and nearby SNPs in a given population. However, discrepancies were also noted between the deletions identified in the two studies, despite the use of similar HapMap populations and identification methods (Eichler, 2006). Recently, a comprehensive CNV analysis was reported using two high-resolution array platforms: the Whole Genome TilePath (WGTP) array, with 26,000 large-insert clones, and the Affymetrix GeneChip Human Mapping 500K Early Access array, with 500,000 SNP oligonucleotides. The study identified about 1,500 copy number variable regions (CNVRs), each consisting of overlapping CNVs, in 269 HapMap individuals (Redon et al., 2006).
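The deletion signatures exploited by McCarroll et al. and Conrad et al. can be illustrated with three small checks on SNP genotype calls: runs of consecutive null calls, a Hardy-Weinberg chi-square test, and a parent-offspring Mendelian consistency test. This is a simplified sketch assuming genotypes coded as 'AA', 'AB', 'BB' and None for a failed call; the function names and the three-SNP run length are hypothetical, and the published studies used their own, more elaborate statistical criteria.

```python
from collections import Counter


def null_runs(calls, min_run=3):
    """Index ranges (start, end inclusive) of >= min_run consecutive failed
    genotype calls (None) along SNPs ordered by genomic position."""
    runs, start = [], None
    for i, g in enumerate(list(calls) + ["END"]):  # sentinel closes a final run
        if g is None:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


def hwe_chi_square(genotypes):
    """Pearson chi-square (1 df) for Hardy-Weinberg equilibrium.

    A hemizygous deletion can make heterozygotes look homozygous, producing
    a heterozygote deficit; values above 3.84 correspond to p < 0.05.
    Failed calls (None) are ignored; at least one called genotype is required.
    """
    counts = Counter(g for g in genotypes if g is not None)
    n_aa, n_ab, n_bb = counts["AA"], counts["AB"], counts["BB"]
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def mendel_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each
    parent. A deletion transmitted by a parent can break this check,
    e.g. father 'AA' x mother 'BB' giving an apparent child 'AA'."""
    return any(sorted(a + b) == sorted(child) for a in father for b in mother)
```

A locus flagged by more than one of these checks in the same region would be a stronger deletion candidate than one flagged by a single check, although genotyping errors can mimic each signature on its own.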
The results from the two platforms are worth comparing because these platforms provide the highest resolution currently achievable and are often selected as primary platforms in other studies. First, CNVs identified by BAC-based array-CGH are generally larger than those identified by oligonucleotide-based arrays (median sizes of 230 kb and 80 kb, respectively). This overestimation of CNV size by BAC-based array-CGH, which has been frequently reported (Iafrate et al., 2006), is due to the large insert clones used. Second, the actual boundaries of structural variants cannot be determined by BAC-based array-CGH, whereas SNP-centric oligonucleotide-based arrays, with their large numbers of oligonucleotides, allow more accurate determination of variant boundaries. The SNP-centric platform has the additional advantages of providing SNP genotype information as another source of variants, alongside large structural variants, and of detecting loss of heterozygosity (LOH) or segmental uniparental disomy (Bruce et al., 2005; Mei et al., 2000). However, the SNP-centric platform also has disadvantages. Despite its higher resolution, the relatively low signal-to-noise ratio of oligonucleotide hybridization intensities, compared with large-insert clone arrays, may result in higher false-positive rates. Because most CNVs produce subtle changes in signal, the results are prone to misclassification of signal intensities and, consequently, to statistical errors. It has also been pointed out that SNP-centric arrays were originally designed for allelic discrimination and are not ideal for CNV detection because of the biased genomic distribution and sequence composition of their probes (McCarroll and Altshuler, 2007).
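The CNVRs of Redon et al. (2006) mentioned above are regions built from overlapping CNV calls. A minimal sketch of such merging is shown below, assuming calls given as (chromosome, start, end) tuples with inclusive coordinates; the function name and the simple "any overlap" rule are assumptions and not necessarily the criteria used in that study. Because BAC-based calls tend to be larger, merging them in this way also tends to produce larger regions, which is one reason platform characteristics matter when CNVRs are compared.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), end inclusive


def merge_cnvrs(cnvs: List[Interval]) -> List[Tuple[str, int, int, int]]:
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    Returns (chromosome, start, end, number_of_calls) per region.
    Calls are sorted by chromosome and start, so overlapping calls on the
    same chromosome become neighbours and can be merged in one pass.
    """
    merged: List[Tuple[str, int, int, int]] = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the current region: extend it and count the call.
            c, s, e, k = merged[-1]
            merged[-1] = (c, s, max(e, end), k + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged
```

For example, calls at chr1:1000-5000, chr1:4000-9000 and chr1:8500-12000 collapse into a single region chr1:1000-12000 supported by three calls, while a non-overlapping call on the same chromosome remains a separate region.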
Recently proposed oligonucleotide-based array platforms have been designed specifically for CNV detection without sacrificing high resolution, and they may offer a promising solution for CNV detection in the near future (Barrett et al., 2004). In identifying CNVs in normal populations, one fundamental problem is the lack of a reference genome from which the diploid state of sample DNA can be inferred. Unlike array-CGH-based tumor studies, in which normal DNA from the same individual can serve as the reference, no single DNA source can represent a standardized, universal genome for variant analysis. Pooled DNA from several individuals has often been used to represent an average genome, but the heterogeneity of the pooled population can affect copy number inference, as illustrated by the X chromosome, whose copy number differs between males and females. Notably, Redon et al. and Komura et al. adopted pairwise comparison for accurate inference of copy number states at individual loci (Redon et al., 2006; Komura et al., 2006). In pairwise comparison, the hybridization intensities of one sample are compared with those of each of the remaining samples, which together act as one large reference, and the diploid state of each locus can be inferred more accurately from the multiple comparison results.
Clinical implications of CNVs and disease association study. Despite recent technological developments in genetic polymorphism-oriented disease association studies, little is still known about the effects of genetic polymorphisms on common complex diseases. One of the ultimate goals of exploring CNVs is to systematically assess associations between these variants and disease. Although it is unlikely that all CNVs in the human genome are associated with disease, evidence for associations between CNVs and a wide spectrum of human diseases has rapidly accumulated. Table 1 summarizes CNVs that have been reported to be associated with diseases.
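The pairwise-comparison strategy described in the preceding section can be sketched as follows. Each sample's intensity at a locus is compared with that of every other sample, and the median log2 ratio is taken so that a few samples carrying a variant do not distort the reference. This is a simplified illustration under stated assumptions (the function names are hypothetical, intensities are already normalized, and most samples are assumed to be diploid at each locus); it is not the statistical model used by Redon et al. (2006) or Komura et al. (2006).

```python
import math
import statistics


def pairwise_log2_medians(intensities):
    """For each sample and locus, the median over all other samples of
    log2(sample / other).

    intensities: list of per-sample lists (samples x loci), all positive
    and assumed to be normalized already. At least two samples are needed.
    """
    n_samples = len(intensities)
    n_loci = len(intensities[0])
    result = []
    for i in range(n_samples):
        row = []
        for locus in range(n_loci):
            ratios = [
                math.log2(intensities[i][locus] / intensities[j][locus])
                for j in range(n_samples)
                if j != i
            ]
            # The median is robust to the few samples that carry a CNV.
            row.append(statistics.median(ratios))
        result.append(row)
    return result


def estimate_copy_number(median_log2, reference_copies=2):
    """Convert a median log2 ratio to an integer copy number, assuming
    the other samples are mostly diploid at this locus."""
    return round(reference_copies * 2 ** median_log2)
```

For example, with five samples in which one shows half the intensity of the others at a locus, that sample's median ratio is -1 (one copy), while every other sample's median stays at 0 (two copies) because the single deviant comparison does not move the median.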
CNVs can affect disease susceptibility or individual differences in drug response through altered gene expression. Reports by Stranger et al. and Heidenblad et al. consistently showed positive correlations between DNA copy number and gene expression level (Stranger et al., 2007; Heidenblad et al., 2005). Even when a CNV region contains transcriptional regulatory elements rather than protein-coding genes, it can still affect gene expression by altering transcriptional regulation or the spread of heterochromatin (Reymond et al., 2007).
Conclusion. The genomic fraction occupied by CNVs is now estimated to be about 600 Mb, already exceeding that of single-base variants. The number of CNVs and the genomic fraction affected by structural variants will likely continue to grow, and many of them will be used for practical purposes, including disease association and population studies. However, current CNV entries contain substantial numbers of false positives and false negatives, and only a small portion have been validated by independent methods. Overcoming this requires improved scanning platforms, including optimized experimental conditions and more reliable CNV-calling algorithms. In the meantime, individual researchers need to understand the characteristics of the available platforms and analytical techniques in order to use them, and to interpret published results, properly.
We found peak evidence of linkage (LOD score = 1.88) for HDL cholesterol level on chromosome 6 (nearest marker D6S1660) and potential evidence for linkage on chromosomes 1, 12 and 19, with LOD scores of 1.32, 1.44 and 1.14, respectively. These results should pave the way for the discovery of the relevant genes by fine mapping and association analysis.
Introduction. Cholesterol is a major component of cell membranes.
Cholesterol is carried in the blood by chylomicrons, very low density lipoproteins (VLDL), high density lipoproteins (HDL) and low density lipoproteins (LDL) (Dastani et al. 2006). HDL cholesterol is inversely associated with cardiovascular disease and is more tightly controlled by genetic factors than other lipoproteins such as LDL, VLDL and chylomicrons. Environmental factors, including chronic alcoholism, estrogen replacement therapy, and exercise, influence HDL cholesterol levels. Several families with strikingly elevated HDL cholesterol levels have been identified. HDL cholesterol levels are higher in blacks than in whites, and higher in females than in males (Barcat et al. 2006; Brousseau et al. 2004; Yamashita et al. 2000; Imperatore et al. 2000). Candidate gene analyses using population-based case-control designs have been used to test associations between SNPs and HDL cholesterol levels. Among the candidate genes, selected mainly from lipid metabolism pathways, the ApoA-I gene has been the most intensively studied (Inazu et al. 1994; Kuivenhoven et al. 1997). Genome-wide linkage analysis can identify susceptibility genes even when they are not obvious candidates from lipid metabolism. Genome-wide linkage scans use microsatellite markers to identify genetic determinants of traits (Wang and Paigen 2005). Treating HDL cholesterol level as either a discrete or a quantitative trait, several linkage studies of the genetic determinants of HDL cholesterol have been reported (Yancey et al. 2003). Thus far, genetic effects on variation in HDL cholesterol have been studied mainly in Caucasians and Africans, and little attention has been paid to Asian populations.
In this report, we present suggestive evidence of linkage for HDL cholesterol on chromosomes 6, 1, 12 and 19 from the GENDISCAN study, a large epidemiological study of complex traits in geographically, culturally and genetically isolated large Mongolian families in Dornod, Mongolia.
Methods. We analyzed data from 1002 Mongolian individuals from 95 large extended families. Informed consent was obtained from all subjects before participation, and the protocol was approved by the Institutional Review Board at Seoul National University. Potentially confounding variables were assessed for each participant along with overall medical history. Information on age, gender and anthropometry (height, weight, waist circumference, hip circumference and body fat content) was obtained for each individual. Height in centimeters (cm) and weight in kilograms (kg) were measured using an automatic measuring instrument (IMI 1000, Immanuel Elec., Korea). Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Waist circumference was measured to the nearest centimeter at the level of the umbilicus, and hip circumference was measured at the level of the maximal circumference of the gluteus. All other variables were collected through interviews performed by trained interviewers. Information on alcohol consumption and smoking was also obtained from all participants. All subjects were asked to fast for 12 hours before their visit. Blood samples were collected from an antecubital vein into vacutainer tubes containing EDTA, centrifuged at 3000 rpm for 10 minutes, and stored at -70°C. A 10 ml blood sample was collected from each participant for genomic DNA extraction. DNA for polymerase chain reaction (PCR) and automated genotyping was extracted from peripheral lymphocytes using the PUREGENE DNA Purification Kit for whole blood (Gentra Systems Inc, USA).
For genotyping, a set of 1000 microsatellite markers from the deCODE mapping set (deCODE genetics, USA) was used, covering the genome at an average density of 3 centimorgans (cM). HDL cholesterol was measured by the enzymatic method using the Cholestest-N-HDL kit (Daiichi, Japan) on Hitachi 7600-210 and Hitachi 7180 analyzers. Extensive quality control procedures ensured the validity and reproducibility of the measurements. Multiple linear regression analysis was performed with PC SAS version 8.2 and PC SPSS version 12 to account for the effects of confounding variables. Pedigree data were managed with PedSys (Southwest Foundation for Biomedical Research, San Antonio, Texas, USA). Nonpaternity was examined using PEDCHECK (McPeek and Sun 2000), and relationships other than paternity were checked with the average IBD-based method in PREST. After pedigree and Mendelian errors were corrected, non-Mendelian errors were examined and corrected using SimWalk. Identity-by-descent (IBD) matrices for all relative pairs within each family, and single-marker IBD matrices, were calculated with SOLAR (Sequential Oligogenic Linkage Analysis Routines, version 2.1.4). Multipoint IBD matrices were computed at 1-cM intervals using the Markov chain Monte Carlo method implemented in LOKI (Heath 1997). The genetic components of selected phenotypes were estimated in terms of heritability. Narrow-sense heritability, defined as the proportion of total phenotypic variation due to additive genetic effects, was calculated. Heritability of HDL cholesterol was estimated after adjustment for age, gender, age squared, the age × gender product, the age-squared × gender product, systolic blood pressure, smoking and alcohol, and a variance component linkage analysis was carried out with SOLAR, which uses maximum likelihood methods to estimate variance components for the polygenic genetic effect and random individual environmental effects.
Results and Discussion. The mean age of the 1002 individuals was 31 years, and 54.5% of them were female.
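The heritability and variance-component linkage analysis described in the Methods above can be summarized in the standard form used by SOLAR-type programs. This is a sketch under assumptions: the covariates are those listed in the text, but the exact parameterization used in this study is not stated, and the symbols below are introduced here for illustration (y_i, trait; x_ik, covariates; q_i, g_i, e_i, the locus-specific, residual polygenic, and environmental effects).

```latex
\begin{align*}
% trait model for individual i, with covariate effects \beta_k
y_i &= \mu + \sum_{k} \beta_k x_{ik} + q_i + g_i + e_i \\
% covariance of trait values among relatives in a pedigree
\Omega &= \hat{\Pi}\,\sigma^2_q + 2\Phi\,\sigma^2_g + I\,\sigma^2_e \\
% narrow-sense heritability under the polygenic model (\sigma^2_q = 0)
h^2 &= \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e} \\
% LOD at a position: L_1 maximized with \sigma^2_q free, L_0 with \sigma^2_q = 0
\mathrm{LOD} &= \log_{10}\frac{L_1}{L_0}
\end{align*}
```

Here \hat{\Pi} is the matrix of estimated IBD proportions at the tested position (the multipoint IBD matrices), Φ is the kinship matrix, and I is the identity matrix. For the relative classes counted in Table 2, the expected values of 2Φ_ij are 1/2 for parent-offspring and full-sibling pairs, 1/4 for half-sibling, grandparent-grandchild, and avuncular pairs, and 1/8 for first cousins. The LOD score converts to the likelihood-ratio statistic as 2 ln(10) × LOD, i.e. about 4.6 × LOD.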
Demographic and pedigree characteristics of the study sample are shown in Table 1. The mean family size was 16. Table 2 includes information on 2546 pairs of first-degree relatives (1812 parent-offspring pairs and 734 full-sibling pairs), 2485 pairs of second-degree relatives (395 half-sibling pairs, 1202 grandparent-grandchild pairs, and 888 avuncular pairs), and 598 first-cousin pairs. Mean total cholesterol, HDL cholesterol, LDL cholesterol, and triglyceride levels were 159.82 mg/dl, 55.19 mg/dl, 90.51 mg/dl, and 63.30 mg/dl, respectively. Table 3 shows correlations between HDL cholesterol and covariates such as age, gender, systolic blood pressure, alcohol consumption status, and smoking status. These parameters were used as covariates in the variance component analysis, which yielded a multivariable-adjusted heritability estimate of 0.45 for HDL cholesterol (Table 4). The peak multipoint LOD score was 1.88 on 6p21 (nearest marker D6S1660), and a secondary peak (LOD score 1.44) was found on 12q23 (nearest marker D12S354). We also identified potential evidence for linkage, with a LOD score of 1.32 on 1q24 (nearest marker D1S412) and 1.14 on 19p13 (nearest marker D19S884) (Fig. 1, 2). Table 5 presents all LOD scores ≥ 1.0 for HDL cholesterol. We identified potential evidence of linkage on several chromosomes. In another genome scan, a weak linkage signal for HDL cholesterol was observed in regions that overlapped slightly with those identified here: Klos et al. reported a peak on chromosome 12q in a European American population (Klos et al. 2001) (Table 6). We found evidence of link-
