@ewha-bio:231
Annotations
Genomics_Informatics
{"project":"Genomics_Informatics","denotations":[{"id":"LappsGridBioNER_protein1","span":{"begin":331,"end":335},"obj":"Protein"},{"id":"LappsGridBioNER_protein2","span":{"begin":346,"end":350},"obj":"Protein"},{"id":"LappsGridBioNER_protein3","span":{"begin":355,"end":370},"obj":"Protein"},{"id":"LappsGridBioNER_protein4","span":{"begin":690,"end":713},"obj":"Protein"},{"id":"LappsGridBioNER_protein5","span":{"begin":828,"end":846},"obj":"Protein"},{"id":"LappsGridBioNER_protein6","span":{"begin":988,"end":992},"obj":"Protein"},{"id":"LappsGridBioNER_protein7","span":{"begin":1934,"end":1938},"obj":"Protein"},{"id":"LappsGridBioNER_protein8","span":{"begin":2160,"end":2167},"obj":"Protein"},{"id":"LappsGridBioNER_protein9","span":{"begin":3949,"end":3953},"obj":"Protein"},{"id":"LappsGridBioNER_protein10","span":{"begin":4974,"end":4978},"obj":"Protein"},{"id":"LappsGridBioNER_protein11","span":{"begin":5041,"end":5044},"obj":"Protein"},{"id":"LappsGridBioNER_protein12","span":{"begin":5554,"end":5565},"obj":"Protein"},{"id":"LappsGridBioNER_protein13","span":{"begin":5887,"end":5890},"obj":"Protein"},{"id":"LappsGridBioNER_protein14","span":{"begin":6382,"end":6386},"obj":"Protein"},{"id":"LappsGridBioNER_protein15","span":{"begin":6484,"end":6490},"obj":"Protein"},{"id":"LappsGridBioNER_protein16","span":{"begin":10469,"end":10473},"obj":"Protein"},{"id":"LappsGridBioNER_protein17","span":{"begin":10509,"end":10518},"obj":"Protein"},{"id":"LappsGridBioNER_protein18","span":{"begin":13450,"end":13454},"obj":"Protein"},{"id":"LappsGridBioNER_protein19","span":{"begin":13943,"end":13947},"obj":"Protein"},{"id":"LappsGridBioNER_protein20","span":{"begin":14715,"end":14719},"obj":"Protein"},{"id":"LappsGridBioNER_protein21","span":{"begin":16126,"end":16150},"obj":"Protein"},{"id":"LappsGridBioNER_protein22","span":{"begin":16159,"end":16184},"obj":"Protein"},{"id":"LappsGridBioNER_protein23","span":{"begin":16186,"end":16189},"obj":"Protein"},{"id":"LappsGridBioNER_p
rotein24","span":{"begin":16195,"end":16219},"obj":"Protein"},{"id":"LappsGridBioNER_protein25","span":{"begin":17025,"end":17029},"obj":"Protein"},{"id":"LappsGridBioNER_protein26","span":{"begin":17135,"end":17146},"obj":"Protein"},{"id":"LappsGridBioNER_protein27","span":{"begin":19601,"end":19611},"obj":"Protein"},{"id":"LappsGridBioNER_protein28","span":{"begin":20367,"end":20385},"obj":"Protein"},{"id":"LappsGridBioNER_protein29","span":{"begin":27463,"end":27466},"obj":"Protein"},{"id":"LappsGridBioNER_protein30","span":{"begin":27681,"end":27703},"obj":"Protein"},{"id":"LappsGridBioNER_protein31","span":{"begin":27935,"end":27938},"obj":"Protein"},{"id":"LappsGridBioNER_protein32","span":{"begin":30829,"end":30832},"obj":"Protein"},{"id":"LappsGridBioNER_protein33","span":{"begin":33001,"end":33004},"obj":"Protein"},{"id":"LappsGridBioNER_protein34","span":{"begin":33045,"end":33048},"obj":"Protein"},{"id":"LappsGridBioNER_protein35","span":{"begin":33190,"end":33193},"obj":"Protein"},{"id":"LappsGridBioNER_protein36","span":{"begin":33324,"end":33327},"obj":"Protein"},{"id":"LappsGridBioNER_protein37","span":{"begin":33329,"end":33332},"obj":"Protein"},{"id":"LappsGridBioNER_protein38","span":{"begin":35658,"end":35686},"obj":"Protein"},{"id":"LappsGridBioNER_protein39","span":{"begin":38814,"end":38818},"obj":"Protein"},{"id":"LappsGridBioNER_protein40","span":{"begin":38976,"end":38994},"obj":"Protein"},{"id":"LappsGridBioNER_protein41","span":{"begin":39259,"end":39267},"obj":"Protein"},{"id":"LappsGridBioNER_protein42","span":{"begin":39523,"end":39541},"obj":"Protein"},{"id":"LappsGridBioNER_protein43","span":{"begin":39937,"end":39945},"obj":"Protein"},{"id":"LappsGridBioNER_protein44","span":{"begin":40046,"end":40056},"obj":"Protein"}],"text":"\n=============Title==========\nCopy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies.\n=============Cor Author==========\n*Corresponding 
author: E-mail yejun@catholic.ac.krTel +82-2-590-1214, Fax +82-2-596-8969 Accepted 11 March 2008\n===========Author==========\nTae-Min Kim1, Seon-Hee Yim2 and Yeun-Jun Chung1,2*1Department of Microbiology, 2Integrated Research Center for Genome Polymorphism, The Catholic University of Korea, Seoul 137-701, Korea\n===========Keywords==========\nKeywords: array-CGH, Copy number variation (CNV), Genome-wide association study (GWAS)Keywords: chromosome, genome-wide linkage search, heritability, HDL cholesterolKeywords: inbreeding coefficient, Mengolian population, STR, HWE, PICKeywords: haplotype, HapMap, Korean, LD, populations, SNP\n===========Sub Heading==========\nAbstract\tIntroduction\tThe definition of CNV\tThe identification of CNVs using differ-ent platforms\tClinical implications of CNVs and dis-ease association study\tConclusion\tIntroduction\tMethods\tResults and Discussion\tIntroduction\tMethods\tResults\tDiscussion\tIntroduction\tMethods\t\n==========Minor Heading===========\nASubjects, medical histories, genotyping, and measurement of HDL cholesterol\tStatistical analyses, heritability estimation, and variance component linkage analysis\tParticipants\tGenotyping\tEstimating Hardy-Weinberg Equilibrium (HWE), Information Contents and Inbreeding Coefficients\tASNP Selection\tDNA Samples\tGenotyping\tStatistical Analysis \t\n===========Main Text==========\n\n\nAbstract.\n\n\n\nThe widespread presence of large-scale genomic variations, termed copy number variation (CNVs), has been recently recognized in phenotypically normal individuals.\n\n\n\nJudging by the growing number of reports on CNVs, it is now evident that these variants contribute significantly to genetic diversity in the human genome.\n\n\n\nLike single nucleotide polymorphisms (SNPs), CNVs are expected to serve as potential biomarkers for disease susceptibility or drug responses.\n\n\n\nHowever, the technical and practical concerns still remain to be tackled.\n\n\n\nIn this review, we examine the 
current status of CNV DBs and research, including the ongoing efforts of CNV screening in the human genome.\n\n\n\nWe also discuss the characteristics of platforms that are available at the moment and suggest the potential of CNVs in clinical research and application.\n\n\n\nIIntroduction.\n\n\n\nTraditionally, large-scale genomic variants that are visible in conventional karyotyping have been thought to be associated with early-onset, highly penetrant genetic disorders, while they are incompatible in normal, disease-free individuals (Lupski, 1998; Stankiewicz and Lupski, 2002).\n\n\n\nThe construction of the 'reference genome' by the human genome sequencing project is based on the belief that human genome sequences are virtually identical, even in different individuals, except for well-known single nucleotide polymorphisms (SNP) or size-variants of tandem repeats such as mini- or microsatellites (variable number of tandem repeats or VNTR) (Przeworski et al., 2000).\n\n\n\nThis traditional concept has been recently challenged by the discovery that large structural variations are more prevalent than previously presumed (Check, 2005).\n\n\n\nUsing high-resolution whole- genome scanning technologies such as array-based comparative genomic hybridization (array-CGH), two groups of pioneering scientists have identified widespread copy number variations (CNVs) in apparently healthy, normal individuals (Iafrate et al., 2004; Sebat et al., 2004).\n\n\n\nIt proposes that our genome is more diverse than has ever been recognized, and subsequent studies have identified up to 11,000 CNVs across the whole genome (Tuzun et al., 2005; Hinds et al., 2006; Mills et al., 2006; McCarroll et al., 2006; Conrad et al., 2006; Sharp et al., 2005; Wong et al., 2007; de Smith et al., 2007).\n\n\n\nAlthough the current understanding of CNVs is still limited for practical use and technical challenges still remain to be tackled, recent studies already have demonstrated the potential association 
of CNVs with various diseases, suggesting plausible functional significances and highlighting the promising utility of CNVs.\n\n\n\nThe current coverage of CNVs in the human genome already has exceeded that of SNPs (approximately 600 Mb comprising 12% of human genome) and is still increasing (Cooper et al., 2007).\n\n\n\nThese large-scale structural variants, in addition to SNPs, will serve as powerful sources to help our understanding of human genetic variation and of differences in disease susceptibility for various diseases.\n\n\n\nThis paper reviews the current knowledge and future perspectives of CNVs.\n\n\n\nThe definition of CNV.\n\n\n\nStructural variations that involve large DNA segments can take various forms, such as duplication, deletion, insertion, inversion, and translocation.\n\n\n\nAmong them, DNA copy number variations larger than 1 kb are collectively termed CNVs.\n\n\n\nFig.\n\n\n\n1 illustrates the concept of CNV.\n\n\n\nAlthough the CNV can include large, microscopically visible genomic variations, it generally indicates a submicroscopic structural variation that is hardly detectable by conventional karyotyping (35 Mb) (Freeman et al., 2006).\n\n\n\nSmaller variations such as small insertional- deletion (indel) polymorphisms are not included in CNVs, while they comprise another large collection of over 400,000 variants in the human genome (Mills et al., 2006), and neither is the insertional polymorphism of mobile elements such as Alus or L1 elements considered a CNV.\n\n\n\nAt the beginning stages of CNV discovery, a number of terms were proposed to define them e.g., large-scale copy number variants (LCV) (Iafrate et al., 2004), copy number polymorphism (CNP) (Sebat et al., 2004), and intermediate-sized variants (ISV) (Tuzun et al., 2005).\n\n\n\nThe current definition of CNV is also operational and can be modified with the advance of scanning resolution and coverage, and availability of allele frequency in a determined population.The 
identification of CNVs using differ-ent platforms.\n\n\n\nVarious scanning platforms and quality control methods have been used to identify CNV calls.\n\n\n\nBecause the choice of platforms has a great effect on the results, it is worth reviewing the characteristics of platforms to improve the understanding of CNVs.\n\n\n\nThe presence of CNVs in normal individuals was reported for the first time in 2004 independently by two groups led by Lee C. and Wigler M. (Iafrate et al., 2004; Sebat et al., 2004).\n\n\n\nBoth studies used two-dye array-CGH techniques that used clones of bacterial artificial chromosomes (BAC) or oligonucleotides (representational oligonucleotide microarray analysis, or ROMA).\n\n\n\nTheyindependently reported about 250 and 80 loci as changes in copy number from 39 and 20 normal individuals, respectively.\n\n\n\nFig.\n\n\n\n2 illustrates the general concept of CNV detection based on two-dye array-CGH.\n\n\n\nAlthough the average numbers of CNVs per individual genome were similar in two studies (about 12 CNVs per genome), it should be noted that there was little overlap between the results.\n\n\n\nThis discrepancy between studies was possibly due to the use of different platforms and experimental conditions in different populations.\n\n\n\nHowever, it is also probable that there are still large numbers of structural variants that have yet to be discovered (Buckley et al., 2005; Eichler, 2006).\n\n\n\nOne following study that provided evidence on the widespread presence of large-scale structural variations in the human genome was based solely on in silico analysis (Tuzun et al., 2005).\n\n\n\nThe sequence-level comparison of two independent genome sequences, i.e., one derived from a human genome reference assembly and the other from fosmid clones of a genomic library, revealed about 300 structural variations, including inversions.\n\n\n\nThis method can detect various types of structural variants, including inversion, which is not detectable by 
conventional array-CGH platforms.\n\n\n\nIndeed, the results by Tuzun et al.\n\n\n\n(2005) can be used as validated control for primary verification or for parameter tuning for the development of CNV-detection platforms or algorithms.\n\n\n\nAlthough the use of this method is currently limited by the unavailability of sequence data, ongoing efforts to sequence the individual human genome and to develop cost-effective sequencing platforms (Bennett et al., 2005) will be able to facilitate sequence-level genome comparisons and the identification of highly qualified structural variants in the near future.\n\n\n\nTwo studies by McCarroll et al.\n\n\n\nand Conrad et al., which focused on the identification of deletion variants (McCarroll et al., 2006; Conrad et al., 2006), used 1.2 million SNP genotyping data from The International HapMap Consortium (International HapMap Consortium.\n\n\n\n2005).\n\n\n\nThey assumed that allelic deletion causes the discard of probes in SNP genotyping.\n\n\n\nFor example, the runs of consecutive probes with null genotype calls or runs of SNP genotypes whose allelic frequencies deviate from expected Hardy-Weinberg equilibrium ratios or expected Mendelian inheritance patterns might represent the presence of deleted loci.\n\n\n\nThey independently reported about 600 potential deletions as small as less than 100 bp.\n\n\n\nThe relatively small size of the identified variants, compared with the array-CGH method, is due to the high resolution of the platforms.\n\n\n\nThe use of an SNP-centric array platform can be used to identify linkage disequilibrium (LD) of structural variants with nearby SNPs in a given population.\n\n\n\nBut, the discrepancy in deletions that were identified in the two studies was also noted in spite of using similar HapMap populations and identification methods (Eichler 2006).\n\n\n\nRecently, a comprehensive CNV analysis was reported based on high-resolution array platforms, Whole Genome TilePath (WGTP), which used 
26,000 large insert clones, and Affymetric GeneChip Human Mapping 500K early access, which used 500,000 SNP oligonucleotides.\n\n\n\nThey identified about 1500 genomic segments as copy number variations or CNVRs (copy number variable regions) consisting of overlapping CNVs from 269 HapMap individuals (Redon et al., 2006).\n\n\n\nThe results from the two platforms are worth comparing becasuse they provide the highest currently achievable resolution and are often selected as primary platforms in many other studies.\n\n\n\nFirstly, the CNVs that are identified from BAC-based array-CGH are generally larger than those from oligonucleotide-based arrays (230 kb and 80 kb of median size, respectively).\n\n\n\nThis overestimation of CNVs by BAC-based array-CGH is due to the large insert clones that are used, which has been frequently reported (Iafrate et al., 2006).\n\n\n\nSecondly, the actual boundaries of structural variants can not be determined through BAC-based array-CGH.\n\n\n\nOn the other hand, a more accurate determination of variant boundaries can be achieved through SNP-centric oligonucleotide-based arrays that have an extensive number of oligonucleotides.\n\n\n\nThe SNP-centric platform has additional advantages of accompanying SNP genotype information as a potential variant source, combined with large structural variants and its ability to detect the presence of loss of heterozygosity (LOH) or segmental uniparental disomy (Bruce et al., 2005; Mei et al., 2000).\n\n\n\nBut, the SNP-centric platform also has its disadvantages.\n\n\n\nIn spite of the advanced resolution, the relatively low signal-to-noise ratio of oligonucleotide-based hybridization intensity, compared with large insert clone array, might result in higher false-positive rates.\n\n\n\nBecause most CNVs are subtle changes, this makes the results prone to misclassification of signal intensities and, consequently, to statistical errors.\n\n\n\nSometimes, it is pointed out that the SNP-centric array 
was originally designed for allelic discrimination and is not appropriate for CNV detection because of biased genomic distribution and sequence composition of spotted probes (McCarroll and Altshuler 2007d).\n\n\n\nRecently proposed oligonucleotide-based array platforms have been designed for CNV detection specifically without sacrificing the advantage of high resolution, which can be a promising solution for CNV detection in the near future (Barrett et al., 2004).\n\n\n\nIn identifying CNVs in normal populations, one of the fundamental problems is the lack of a reference genome from which diploid states of sample DNA can be inferred.\n\n\n\nUnlike the array-CGH-based tumor study in which the normal DNA of the same individual can be used as a reference genome, no single DNA source can present the standardized and universal genome in variant analysis.\n\n\n\nOften, the pooled genome of several individuals has been used to represent the average genome, while the heterogeneity of the used population might affect the copy number inference step, as shown for examples of X chromosomes.\n\n\n\nRedon et al.\n\n\n\nand Komura et al.\n\n\n\nadopted the pairwise comparison for ac-curate inference of copy number states in individual loci, which is noteworthy (Redon et al., 2006; Komura et al., 2006).\n\n\n\nIn pairwise comparison, the hybridization intensities of one sample is compared with those of all other remaining samples as one large reference, and the diploid states of loci can be more accurately inferred from the multiple comparison results.Clinical implications of CNVs and dis-ease association study.\n\n\n\nIn spite of recent technological developments of genetic polymorphism-oriented disease association studies, still little is known about the effects of genetic polymorphisms on common complex diseases.\n\n\n\nOne of the ultimate goals in exploring CNVs is to systematically assess the association between such variants and the disease.\n\n\n\nAlthough it is unlikely 
that all CNVs in the human genome are associated with diseases, evidence of the association of CNVs and a wide spectrum of human diseases has rapidly accumulated.\n\n\n\nTable 1 summarizes the CNVs that have been reported to be associated with diseases.\n\n\n\nCNVs can affect disease susceptibility or individual differences in responses to drugs through alteration of gene expression.\n\n\n\nStranger et al.\n\n\n\n's and Heidenblad et al's reports coherently showed positive correlations between DNA copy number dosage and gene expression level (Stranger et al., 2007; Heidenblad et al., 2005).\n\n\n\nIf a CNV region contains transcriptional regulatory elements rather than protein coding genes, it still can affect gene expression levels by changing transcriptional regulation or heterochromatin spread (Reymond et al., 2007).Conclusion.\n\n\n\nThe genomic fraction that is occupied by CNVs is now estimated to be about 600 Mb, already exceeding that of single base-level variants.\n\n\n\nIt is likely that the number of CNVs and the genomic fraction that is affected by structural variants will continue to expand, and many of them will be used for more practical purposes, including disease association or population studies.\n\n\n\nHowever, it should be remembered that the current CNV entries are plagued by substantial amounts of false-positive and false-negative results.\n\n\n\nOnly a small portion of them have been validated by independent methods.\n\n\n\nTo overcome this, it is necessary to improve scanning platforms, including optimizing experimental conditions and developing more reliable CNV calling algorithms.\n\n\n\nIn the meantime, it is required for individual researchers to know the characteristics of the available platforms and analytical techniques to use them or to interpret the published results properly.e found peak evidence of linkage (LOD score=1.88) for HDL cholesterol level on chromosome 6 (nearest marker D6S1660) and potential evidences for linkage on 
chromosomes 1, 12 and 19 with the LOD scores of 1.32, 1.44 and 1.14, respectively.\n\n\n\nThese results should pave the way for the discovery of the relevant genes by fine mapping and association analysis.IIntroduction.\n\n\n\nCholesterol is a major part of cell membranes.\n\n\n\nCholesterol is carried in the blood by chylomicrons, very low density lipoproteins (VLDL), high density lipoproteins (HDL) and low density lipoproteins (LDL) (Dastani et al.\n\n\n\n2006).\n\n\n\nHDL cholesterol is reversely associated with cardiovascular disease, and is more tightly controlled by genetic factors than the other lipoproteins such as LDL, VLDL and chylomicrons.\n\n\n\nEnvironmental factors including chronic alcoholism, estrogen replacement therapy, and exercise influence the levels of HDL cholesterol.\n\n\n\nSeveral families with strikingly elevated HDL cholesterol levels have been identified.\n\n\n\nHDL cholesterol levels are higher in blacks compared with whites and HDL cholesterol levels of females are higher than those of males (Barcat et al.\n\n\n\n2006; Brousseau et al.\n\n\n\n2004; Yamashita et al.\n\n\n\n2000; Imperatore et al.\n\n\n\n2000).\n\n\n\nCandidate gene analysis using population-based case-control studies has been used to test the association between SNPs and HDL cholesterol levels.\n\n\n\nAmong the candidate genes selected mainly from lipid metabolism pathways, ApoA-I gene is the one most intensively studied (Inazu et al.\n\n\n\n1994; Kuivenhoven et al.\n\n\n\n1997).\n\n\n\nBy genome-wide linkage analysis, susceptibility genes can be identified although the genes are not candidates based on lipid metabolism.\n\n\n\nGenome-wide linkage scans are conducted by use of microsatellite markers to identify genetic determinants affecting the traits (Wang and Paigen 2005).\n\n\n\nUsing HDL cholesterol levels as either discrete or quantitative trait, several linkage studies on genetic determinants of HDL cholesterol have been reported (Yancey et 
al.\n\n\n\n2003).\n\n\n\nGenetic effects on the variations in HDL cholesterol were studied mainly in Caucasians and Africans thus far, and little attention has been focused in this regard on Asian populations.\n\n\n\nWe found suggestive evidence for linkage for HDL cholesterol on chromosome 6, 1, 12 and 19, in studies conducted as part of GENDISCAN study, a large epidemiological study of Complex traits in geographically, culturally and genetically isolated large Mongolian families l in Dornod, Mongolia report.\n\n\n\nMethods.\n\n\n\nWe analyzed data from 1002 Mongolian individuals from 95 large extended families.\n\n\n\nInformed consent was obtained from all subjects prior to participation and the protocol was approved by the Institutional Review Board at Seoul National University.\n\n\n\nPotentially confounding variables were assessed for each participant along with overall medical history.\n\n\n\nInformation on age, gender and anthropometry (height, weight, waist circumference, hip circumference and body fat content) were obtained for each individual.\n\n\n\nHeight in centimeter (cm) and weight in kilograms (kg) were measured using an automatic measuring instrument (IMI 1000, Immanuel Elec., Korea).\n\n\n\nBody mass index (BMI) was calculated in kg/m.\n\n\n\nWaist circumference was measured to the nearest centimeter at the level of the umbilicus, and hip circumference was measured at the level of the maximal circumference of the gluteus.\n\n\n\nAll other variables were collected through interviews performed by trained interviewers.\n\n\n\nInformation about amount of alcohol and smoking was also obtained from all the participants.\n\n\n\nAll the subjects were asked to fast for 12 hours before their visit.\n\n\n\nBlood samples were collected from an antecubital vein into vacutainer tubes containing EDTA.\n\n\n\nBlood samples were centrifuged at 3000rpm for 10 minutes and then stored at 70C.\n\n\n\nDNA was isolated from lymphocytes for polymerase chain reaction 
(PCR) and automated genotyping.\n\n\n\nA 10 ml blood sample was collected from each participating individual for genomic DNA extraction.\n\n\n\nDNA was extracted from peripheral lymphocytes using the PUREGENE DNA Purification Kit for whole blood (Gentra Systems Inc, USA).\n\n\n\nFor genotyping, a set of 1000 microsatellite markers deCODE mapping sets (deCODE genetics, USA) was used covering the genome at an average density of 3 centimorgans (cM).\n\n\n\nHDL cholesterol was measured by the enzymatic method using Cholestest-N-HDL kit (DAICHI, JAPAN) and HITACHI 7600-210 \u0026 HITACHI 7180 instruments.\n\n\n\nExtensive quality control procedures ensured the validity and reproducibility of the measurements.\n\n\n\nMultiple linear regression analysis was used by PC SAS version 8.2 and PC SPSS version 12 to account for effect of confounding variables.\n\n\n\nPedigree data was managed by PedSys (Southwest Foundation for Biomedical Research, San Antonio, Texas, USA).\n\n\n\nNonpaternity was examined using PEDCHECK (Mcpeek and Sun 2000) and relationships other than paternity were checked using average IBD-based method by PREST.\n\n\n\nAfter correcting pedigree error and Mendelian errors, non-mendelian errors were examined and corrected using SimWalk.\n\n\n\nIdentity by descent (IBD) matrix between every relationship pairs in family was calculated and IBD matrix for single marker was calculated by SOLAR (Sequential Oligogenic Linkage Analysis Routines software version 2.1.4).\n\n\n\nMultipoint IBD matrices were computed on every 1 cM distance using Markov chain Monte Carlo method by LOKI (Heath 1997).\n\n\n\nGenetic components of selected phenotypes were estimated in terms of heritability.\n\n\n\nNarrow sense heritability, defined as the proportion of total phenotypic variation due to additive genetic effects, was calculated.\n\n\n\nHeritability of HDL cholesterol adjusted for age, gender, age- square, product of age and gender, product of age- square and gender, systolic 
BP, smoking and alcohol was estimated and a variance component linkage analysis was carried out by SOLAR which uses maximum likelihood methods to estimate variance components for the polygenic genetic effect and random individual environmental effects.\n\n\n\nResults and Discussion.\n\n\n\nThe mean age of the 1002 individuals was 31 years and 54.5% of them were female.\n\n\n\nDemographic and pedigree characteristics of the study sample are shown in Table 1.\n\n\n\nThe family size had a mean of 16.\n\n\n\nTable 2 included information on 2546 pairs of first degree relatives (1812 parent-offspring pairs and 734 full-sib pairs), 2485 pairs of their second degree relatives (395 half-sibling pairs, 1202 grandparent-grandchild pairs, and 888 avuncular pairs), and 598 first-cousin pairs.\n\n\n\nMeans of their total cholesterol, HDL cholesterol, LDL cholesterol, and triglyceride were 159.82 mg/dl, 55.19 mg/dl, 90.51 mg/dl, and 63.30 mg/dl, respectively.\n\n\n\nTable 3 shows correlation between HDL cholesterol and covariates such as age, gender, systolic blood pressure, alcohol consumption status, and smoking status.\n\n\n\nThese parameters were used as covariates in the variance component analysis which provided multivariable adjusted heritability estimates for HDL cholesterol of 0.45 (Table 4).\n\n\n\nThe peak multipoint LOD score was 1.88 on 6p21 (nearest marker D6S1660) and a secondary peak (LOD score of 1.44) was found on 12q23 (nearest marker D12S354).\n\n\n\nWe identified other potential evidence for linkage in the LOD score of 1.32 on 1q24 (nearest marker D1S412) and a LOD score of 1.14 at 19p13 (nearest marker D19S884) (Fig.\n\n\n\n1, 2).\n\n\n\nTable 5 presents all LOD scores 1.0 for HDL cholesterol.\n\n\n\nWe identified potential evidence of linkage on several chromosomes.\n\n\n\nIn other genome scan, a weak linkage signal for HDL cholesterol was observed for regions that overlapped slightly with the regions identified herein.\n\n\n\nKlos et al.\n\n\n\nreported 
the appearance of peak position in the chromosome 12q in European American population (Klos et al.\n\n\n\n2001) (Table 6).\n\n\n\nWe found evidence of link- the population isolates used in GENDISCAN study would not present significant inflation of type I errors from inbreeding effects in its gene discovery analysis.\n\n\n\nIIntroduction.\n\n\n\nThe GENDISCAN (Gene Discovery for Complex traits in Asian population of Northeast area) study was launched in 2002 in order to elucidate genetic causes of complex diseases.\n\n\n\nThis study attempted to incorporate designs that detect genetic signals with increased efficiency.\n\n\n\nThese included using genetically homogeneous population, recruiting large families, and considering quantitative phenotypes as well as disease outcome (Peltonen et al., 2001; Merikangas et al., 2003).\n\n\n\nLarge extended families still remaining in the Northeast Asia, enabled the project to adopt these designs.\n\n\n\nAlthough there is no doubt that gene discovery of common complex diseases is one of the research priorities, the successful results have been very limited (Grant et al., 2006).\n\n\n\nThe difficulty of replication across studies, mandates the use of internally valid study designs and proper methodologies.\n\n\n\nUsing population isolates generally confers the advantage of increasing genetic homogeneity.\n\n\n\nHowever population isolates might have inbreeding structures, which deviates the basic assumptions of HWE.\n\n\n\nThe presence of significant inbreeding necessitates modifications in genetic estimations using the population.\n\n\n\nTherefore, we attempted to estimate the status of HWE, and inbreeding coefficients in two ethnic groups of Mongolia using genome-wide short tandem repeat (STR) genetic markers.\n\n\n\nCompatibility with basic assumptions of population genetics can support the methodological validity of the overall GENDISCAN study,Methods.\n\n\n\nThe GENDISCAN study included non-selected families in 
Mongolia.\n\n\n\nThe People's Republic of Mongolia (not including the Chinese territory) has 2.6 million people which comprise of more than 20 ethnic groups.\n\n\n\nThe Orkhontuul are in Selenge Imag (Imag is an administrative district unit in Mongolia corresponding to a state in the United States) and the Dashbalbar area in Dornod Imag were selected.\n\n\n\nThe Orkhontuul area has a population of 3,760 people, mainly consisting of Khalkha tribe, and maintains semi-urban life style.\n\n\n\nThe Dashbalbar area is mainly habituated by about 4,000 people of Buryat ethnicity and has more traditional nomadic life style.\n\n\n\nMany large extended families, which fit the study purposes of the GENDISCAN study still remain in both areas.Genomic DNA was extracted from peripheral leukocytes.\n\n\n\nThe Orkhontuul samples (2004, n=1,080) were genotyped using the Applied Biosystems Inc. platform (ABI Prism Linkage Mapping Set version 2.5 medium density, 400 markers) with average 10 cM resolution, and Dashbalbar samples (2006, n=1,020) were genotyped using the deCODE 1,000 STR marker platform with average of 3 cM resolution.\n\n\n\nFor the Orkhontuul participants markers on the chromosome 14 were analyzed.\n\n\n\nFor Orkhontuul data, markers with low call-rate (49 markers), and with more than 1% of genotype error rates (16 markers) and markers on X chromosome (18 markers) were excluded.\n\n\n\nFor Dashbalbar genotype data, the 1,000 STR marker platform provided 1097 markers originally, however we excluded markers on X chromosome (49 markers) and markers with low call-rate and more than 1% of genotype error rates (4 markers).\n\n\n\nAll participants provided informed consent.HWE and degree of inbreeding were assessed using the founders of each pedigree.\n\n\n\nNon-founders were excluded because their genotypes are dependent on those of the founders.\n\n\n\nHWE was estimated by comparing the expected and observed genotype frequencies.\n\n\n\nExpected genotype frequency was 
calculated from allele frequency.\n\n\n\nChi-square goodness of fit test was used to determine whether HWE assumption was met.\n\n\n\nThe Chi-square statistics () of multi-allelic loci is defined as equation as Equation 1, with k (k-1) degree of freedom, where k is the total number of alleles.\n\n\n\n(Equation 1)where, nuu and nuv denote homozygotic and heterozygotic genotypes, while pu and pv denote allele frequency of each allele.\n\n\n\nInformation contents of the genetic markers were estimated as polymorphism information content (PIC), heterozygosity and allelic diversity.\n\n\n\nPIC is an index of the amount of information, which modifies the simple heterozygosity index by adjusting for the chance of mating between the same heterozygotic genotypes.\n\n\n\nPIC was calculated from Equation 2.\n\n\n\n(Equation 2)where p and p denote allele frequency of each allele (Czika, 2005).\n\n\n\nInbreeding was estimated by the deviation from the assumption that each founder shares no Identity by descent (IBD).\n\n\n\nGenerally genotype frequency of bi-allelic locus having p and q allele frequencies are predicted as p, 2pq, q respectively under HWE.\n\n\n\nHowever, if there are IBD sharing of FI between founders, above prediction can be re-written respectively as Equation 3.\n\n\n\n(Equation 3)where, Fdenotes inbreeding coefficient (Gillespie et al., 2004).\n\n\n\nIn brief, inbreeding is characterized by the excess of homozygote over expected level.\n\n\n\nThe inbreeding coefficient can be estimated as Equation 4 by solving Equation 3 (Equation 4)where, H denotes observed heterozygotic, and 2pq denotes estimated heterozygotic proportions from allele frequency (Hart et al., 2000).\n\n\n\nHWE and estimations of expected and observed heterozygosity frequencies were obtained using SAS/Genetics program.Results.\n\n\n\nThe demographic characteristics of the subjects geno-typed are shown in Table 1.\n\n\n\nThere were 280 (99 men and 181 women) and 142 (90 men and 52 women) 
founders in the Orkhontuul and Dashbalbar populations, respectively.\n\n\n\nNon-founders' genotypes were excluded, since they do not independently contribute to the gene pool.\n\n\n\nThe information content of single markers, in terms of PIC, ranged between 0.2 and 0.9, as shown in Fig. 1.\n\n\n\nThe average PIC was 0.72 and 0.71 for the Orkhontuul and Dashbalbar populations, respectively, which is relatively high for single-marker information content.\n\n\n\nThere was no significant difference in PIC across the chromosomes or populations.\n\n\n\nThe high PIC level enabled accurate estimation of other population genetic parameters.\n\n\n\nHWE was satisfied for 88.6% and 94.2% of all markers in the Orkhontuul and Dashbalbar populations, respectively (p-value > 0.05).\n\n\n\nWith the criterion of p-value > 0.01, 90.5% and 95.3% of all markers were in HWE.\n\n\n\nAll the markers, including those not in HWE, were used for estimating the inbreeding coefficients.\n\n\n\nThe inbreeding coefficient was estimated to be 0.0023 and 0.0021 in the Orkhontuul and Dashbalbar populations, respectively.\n\n\n\nDiscussion.\n\n\n\nPopulation isolates are generally considered to be among the most suitable populations for genetic study (Pajukanta et al., 2003; Arcos-Burgos et al., 2002; Escamilla et al., 2001).\n\n\n\nHowever, possible inbreeding can cause deviation from the general assumptions on which most analyses depend.\n\n\n\nThe presence of inbreeding can be problematic because, if it exists, the genetic relationships between unrelated as well as related persons could be underestimated.\n\n\n\nThis underestimation of IBD can result in inflation of type I errors for linkage analysis (Hossjer et al., 2006; Nomura et al., 2005), linkage disequilibrium estimations and haplotype reconstructions (Zhang et al., 2004).\n\n\n\nThe inbreeding coefficient found in this study (about 0.2% in each population) does not necessitate any adjustment for genetic analyses such as IBD calculation, classic or non-parametric linkage 
analysis, and variance component-based linkage analysis.\n\n\n\nIn terms of the estimated last common ancestor, an inbreeding coefficient of 0.2% corresponds to 10 or 11 generations (Jensen-Seaman et al., 2001; Santos-Lopes et al., 2007).\n\n\n\nIn this study, both ABI and deCODE STR markers were genotyped with standardized procedures, and any markers with more than 1% genotype errors were discarded.\n\n\n\nThe genotype errors were confirmed within the pedigree structure.\n\n\n\nGenotypes showing any Mendelian inconsistency were deleted, and markers with possible double recombination were also deleted.\n\n\n\nGenerally, genotyping in family-based studies is more accurate than in studies using individuals only.\n\n\n\nThus, it is unlikely that genotype errors biased our findings.\n\n\n\nIn conclusion, we have estimated inbreeding coefficients in two population isolates in Mongolia.\n\n\n\nWe found that they fall in a negligible range, allowing related genetic studies to be performed without any modification or adjustment for possible inbreeding effects.\n\n\n\nThis finding validates the ability of the GENDISCAN study to add to the growing body of evidence which associates specific genetic variations with complex disorders.\n\n\n\n% (6.4 of 34.5 Mb) of chromosome 22 with 757 tagSNPs and 815 haplotypes (frequency 5.0%).\n\n\n\nOf 3430 common SNPs genotyped in all five populations, 514 were monomorphic in Koreans.\n\n\n\nThe CHB + JPT samples have more than a 72% overlap with the monomorphic SNPs in Koreans, while the CEU + YRI samples have less than a 38% overlap.\n\n\n\nThe patterns of hot spots and LD blocks were dispersed throughout chromosome 22, with some common blocks among populations, and were highly concordant among the three Asian samples.\n\n\n\nAnalysis of the distribution of chimpanzee-derived allele frequency (DAF), a measure of genetic differentiation, Fst levels, and allele frequency difference (AFD) among Koreans and the HapMap samples showed a strong correlation between the 
Asians, while the CEU and YRI samples showed a very weak correlation with Korean samples.\n\n\n\nRelative distance as a quantitative measurement based upon DAF, Fst, and AFD indicated that all three Asian samples are very proximate, while CEU and YRI are significantly remote from the Asian samples.\n\n\n\nComparative genome-wide LD studies provide useful information for association studies of complex diseases.\n\n\n\nIntroduction.\n\n\n\nVast amounts of information on single nucleotide polymorphisms (SNPs) and progress in high-throughput genotyping technology have generated a great deal of interest in establishing genome-wide linkage disequilibrium (LD) maps for genetic studies of complex traits (Chakravarti 2001; The International HapMap Consortium 2003; Myers and Bottolo 2005).\n\n\n\nLD is known to occur in a block-like structure across the genome, with conserved haplotype blocks of tens to hundreds of kilobases punctuated by \"hot spots\" of recombination (Daly et al. 2001).\n\n\n\nSince the concept of whole genome association studies using SNPs was introduced (Risch and Merikangas 1996), the optimal number of SNPs required for association studies has been the center of extensive debate (Kruglyak 1999).\n\n\n\nInitial studies have focused on average LD levels and the variability in processes that generate LD (Cardon and Abecasis 2003).\n\n\n\nAlthough a single chromosome could carry many haplotypes in LD blocks, recent studies suggest that haplotypic variation may be much lower than previously imagined (Jeffreys et al. 2001; Patil et al. 2001; Gabriel et al. 2002).\n\n\n\nPatil's group identified haplotype blocks on chromosome 21 for which over 80% of chromosomes were represented by a few common haplotypes (Patil et al. 2001).\n\n\n\nIn the analysis of human chromosome 22 with a marker density of one SNP per 15 kb, Dawson's group reported a highly variable pattern of LD along the chromosome, in which extensive regions of 
complete LD of up to 804 kb in length were interspersed with regions of no detectable LD (Dawson et al. 2002).\n\n\n\nAlthough differences of LD patterns between populations have been reported (Abecasis et al. 2002; Reich et al. 2001; Zavattari et al. 2002), little information is available on the haplotype structure in different populations other than the recent study by S.B. Gabriel et al. (Gabriel et al. 2002).\n\n\n\nOn the other hand, haplotype analysis has been widely employed in linkage studies for narrowing down the location of disease susceptibility genes (Zhang et al. 2004; Park 2007).\n\n\n\nThe International HapMap Project was launched to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation among four population samples: 30 trios from Yoruba in Ibadan, Nigeria (YRI), 45 unrelated Japanese in Tokyo, Japan (JPT), 45 unrelated Han Chinese in Beijing, China (CHB), and 30 trios in a Utah, US population with Northern and Western European ancestry (CEU) from the CEPH collection (The International HapMap Consortium 2003; 2004; 2007).\n\n\n\nAs the International HapMap Project releases a validated SNP map of 1 marker per kb for the HapMap samples, the general applicability of the HapMap data needs to be confirmed in samples from related populations.\n\n\n\nRecent comparative studies of LD patterns have shown a high degree of concordance among various populations (Gabriel et al. 2002; Shifman et al. 2003; Stenzel et al. 2004; Mueller et al. 2005).\n\n\n\nAs the HapMap samples include Japanese and Chinese, it was our interest to test whether significant differences in LD exist between Koreans and the two other Asian samples.\n\n\n\nIn this paper, we measured the LD pattern along chromosome 22 in Korean samples and compared the Korean data with those of the four HapMap samples.\n\n\n\nWe were interested in 
exploring how the HapMap data could be used to estimate the genomic structure of Koreans.\n\n\n\nWe expect that this study will contribute to the development of proper strategies for association studies of common complex diseases in Koreans using the HapMap data.\n\n\n\nMethods.\n\n\n\nA total of 111,448 reference SNPs from chromosome 22 in the dbSNP (http://www.ncbi.nlm.nih.gov/SNP, build 116) were collected.\n\n\n\nTo maximize cost effectiveness of genotyping, SNPs were selected based on the following criteria: 1) markers with even spacing, 2) verified SNPs, 3) coding SNPs.\n\n\n\nThe SNPs were scored for the selection of the study using the following strategies.\n\n\n\nFirst, it was most important in mapping chromosomal LD blocks to have relatively equal spaces between SNP markers.\n\n\n\nSecond, verified SNP markers (validation status was scored as 0 to 4 in the dbSNP) that had higher scores were chosen to prevent or reduce genotyping failure.\n\n\n\nAlso, repeated sequence regions were excluded by repeat masking with Primer3 software (Rozen and Skaletsky 2000).\n\n\n\nThird, to be useful for a further study, protein coding SNPs had higher scores.\n\n\n\nA total of 12,674 genotyping experiments were conducted by four Genotyping Centers, and a final set of 4681 markers passed the stringent quality control procedure (The International HapMap Consortium 2003).\n\n\n\nGenomic DNA from 90 unrelated Korean individuals without family histories of major diseases was obtained from the Genomic Research Center in the Korean National Institute of Health (KNIH).\n\n\n\nThe KNIH samples were collected as part of an epidemiological project and represent urban and rural regions in the south of Seoul.\n\n\n\nThe sex ratio was 0.5 and the mean age was 50.\n\n\n\nInformed consent from all participating subjects was obtained through KNIH, and research approval came from the relevant ethical committees.\n\n\n\nDNA was isolated from peripheral blood leukocytes according to standard 
procedures with proteinase K-RNase digestion, followed by phenol-chloroform extraction.\n\n\n\nFor each SNP, we chose a set of three primers: two PCR primers to amplify a product of 100-200 bp under standard conditions and an optimized extension primer complementary to the sequence immediately adjacent to the SNP site.\n\n\n\nFor genotyping, we employed three platforms: 6063 SNP genotypings were done using the Orchid Bioscience SNP-IT assay (Princeton, NJ), 984 SNP genotypings using the PerkinElmer Life Sciences FP-TDI assay (Boston, MA), and 5627 SNP genotypings using the Sequenom MassARRAY (San Diego, CA).\n\n\n\nThe genotype frequencies for each SNP were checked for consistency between the observed values and those expected under Hardy-Weinberg equilibrium in each assay.\n\n\n\nHaploview version 3.2 (Barrett et al. 2004), based on the expectation-maximization (EM) method (Excoffier and Slatkin 1995), was used to infer haplotype phase and population frequency and to estimate Lewontin's coefficient D' (Lewontin 1998), LOD, and the correlation coefficient r (Hill and Robertson 1968).\n\n\n\nPHASE v2.1 was used to estimate the recombination parameters (Li and Stephens 2003; Crawford et al. 2004) and assess the statistical significance of haplotype profile differences and individual haplotype fre-\n\n"}