=============Title==========
Copy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies.
=============Cor Author==========
*Corresponding author: E-mail yejun@catholic.ac.kr, Tel +82-2-590-1214, Fax +82-2-596-8969. Accepted 11 March 2008
===========Author==========
Tae-Min Kim1, Seon-Hee Yim2 and Yeun-Jun Chung1,2*
1Department of Microbiology, 2Integrated Research Center for Genome Polymorphism, The Catholic University of Korea, Seoul 137-701, Korea
−5); the remaining 12 (ARHGEF12, CSMD1, ARSG, SGSM1, GJA8, ALX4, 2q31.1, 15q22.1, LIPC, CSK, 7q21.13, 7q31.1, 10q25.2) were suggestive loci (10^−5 < p-value < 10^−4). To enhance the original association study, a total of 1,827,004 imputed SNPs were reanalyzed for their association with blood pressure in this study.
===========Keywords==========
Keywords: array-CGH, Copy number variation (CNV), Genome-wide association study (GWAS)
Keywords: chromosome, genome-wide linkage search, heritability, HDL cholesterol
Keywords: inbreeding coefficient, Mongolian population, STR, HWE, PIC
Keywords: haplotype, HapMap, Korean, LD, populations, SNP
===========Sub Heading==========
Abstract Introduction The definition of CNV The identification of CNVs using different platforms Clinical implications of CNVs and disease association study Conclusion Introduction Methods Results and Discussion Introduction Methods Results Discussion Introduction Methods Methods Construction of Adenoviral Vector for hTERT-specific Group I Intron Major Functionalities of CGHscape Methods Features and Results Personal Genomics Polymorphism and Mutation Databases Methods Features and Results Methods Features and Results Methodology J2.5dPathway Browser Architecture Methods Data and processing Methods Methods Methods Methods Methods Methods Methods Self-Reported Personality Measures Methods Particle Swarm Optimization PSO-based classification system Methods
==========Minor Heading===========
Subjects, medical histories, genotyping, and measurement of HDL cholesterol Statistical analyses, heritability estimation, and variance component linkage analysis Participants Genotyping Estimating Hardy-Weinberg Equilibrium (HWE), Information Contents and Inbreeding Coefficients SNP Selection DNA Samples Genotyping Statistical Analysis Datasets The dataset Subjects Method Dataset Study samples Animals Cataloging syntenic regions Data acquisition Study samples Study subjects Subjects Molecular Modeling Model design Trio phasing of SNP genotypes based on Mendelian inheritance patterns
===========Main Text==========
Abstract.
The widespread presence of large-scale genomic variations, termed copy number variations (CNVs), has recently been recognized in phenotypically normal individuals.
Judging by the growing number of reports on CNVs, it is now evident that these variants contribute significantly to genetic diversity in the human genome.
Like single nucleotide polymorphisms (SNPs), CNVs are expected to serve as potential biomarkers for disease susceptibility or drug responses.
However, technical and practical concerns remain to be addressed.
In this review, we examine the current status of CNV databases and research, including the ongoing efforts of CNV screening in the human genome.
We also discuss the characteristics of platforms that are available at the moment and suggest the potential of CNVs in clinical research and application.
Introduction.
Traditionally, large-scale genomic variants that are visible by conventional karyotyping have been thought to be associated with early-onset, highly penetrant genetic disorders and to be incompatible with a normal, disease-free state (Lupski, 1998; Stankiewicz and Lupski, 2002).
The construction of the 'reference genome' by the human genome sequencing project was based on the belief that human genome sequences are virtually identical across individuals, except for well-known single nucleotide polymorphisms (SNPs) and size variants of tandem repeats such as mini- or microsatellites (variable number of tandem repeats, or VNTRs) (Przeworski et al., 2000).
This traditional concept has been recently challenged by the discovery that large structural variations are more prevalent than previously presumed (Check, 2005).
Using high-resolution whole-genome scanning technologies such as array-based comparative genomic hybridization (array-CGH), two pioneering groups identified widespread copy number variations (CNVs) in apparently healthy, normal individuals (Iafrate et al., 2004; Sebat et al., 2004).
These findings suggest that our genome is more diverse than previously recognized, and subsequent studies have identified up to 11,000 CNVs across the whole genome (Tuzun et al., 2005; Hinds et al., 2006; Mills et al., 2006; McCarroll et al., 2006; Conrad et al., 2006; Sharp et al., 2005; Wong et al., 2007; de Smith et al., 2007).
Although the current understanding of CNVs is still too limited for practical use and technical challenges remain, recent studies have already demonstrated potential associations of CNVs with various diseases, suggesting plausible functional significance and highlighting the promising utility of CNVs.
The current coverage of CNVs in the human genome (approximately 600 Mb, comprising 12% of the human genome) has already exceeded that of SNPs and is still increasing (Cooper et al., 2007).
These large-scale structural variants, in addition to SNPs, will serve as powerful sources to help our understanding of human genetic variation and of differences in disease susceptibility for various diseases.
This paper reviews the current knowledge and future perspectives of CNVs.
The definition of CNV.
Structural variations that involve large DNA segments can take various forms, such as duplication, deletion, insertion, inversion, and translocation.
Among them, DNA copy number variations larger than 1 kb are collectively termed CNVs.
Fig. 1 illustrates the concept of CNV.
Although CNVs can include large, microscopically visible genomic variations, the term generally indicates a submicroscopic structural variation that is hardly detectable by conventional karyotyping (<3-5 Mb) (Freeman et al., 2006).
Smaller variations such as small insertion-deletion (indel) polymorphisms are not included in CNVs, although they comprise another large collection of over 400,000 variants in the human genome (Mills et al., 2006); nor are insertional polymorphisms of mobile elements such as Alu or L1 elements considered CNVs.
At the beginning stages of CNV discovery, a number of terms were proposed to define them, e.g., large-scale copy number variants (LCV) (Iafrate et al., 2004), copy number polymorphism (CNP) (Sebat et al., 2004), and intermediate-sized variants (ISV) (Tuzun et al., 2005).
The current definition of CNV is also operational and may be modified as scanning resolution and coverage advance and as allele frequencies in defined populations become available.
The identification of CNVs using different platforms.
Various scanning platforms and quality control methods have been used to identify CNV calls.
Because the choice of platforms has a great effect on the results, it is worth reviewing the characteristics of platforms to improve the understanding of CNVs.
The presence of CNVs in normal individuals was reported for the first time in 2004, independently, by two groups led by C. Lee and M. Wigler (Iafrate et al., 2004; Sebat et al., 2004).
Both studies used two-dye array-CGH techniques, based on either bacterial artificial chromosome (BAC) clones or oligonucleotides (representational oligonucleotide microarray analysis, or ROMA).
They independently reported about 250 and 80 loci with copy number changes from 39 and 20 normal individuals, respectively.
Fig. 2 illustrates the general concept of CNV detection based on two-dye array-CGH.
Although the average numbers of CNVs per individual genome were similar in the two studies (about 12 CNVs per genome), it should be noted that there was little overlap between the results.
This discrepancy between studies was possibly due to the use of different platforms and experimental conditions in different populations.
However, it is also probable that there are still large numbers of structural variants that have yet to be discovered (Buckley et al., 2005; Eichler, 2006).
A subsequent study that provided evidence of the widespread presence of large-scale structural variations in the human genome was based solely on in silico analysis (Tuzun et al., 2005).
The sequence-level comparison of two independent genome sequences, i.e., one derived from a human genome reference assembly and the other from fosmid clones of a genomic library, revealed about 300 structural variations, including inversions.
This method can detect various types of structural variants, including inversions, which are not detectable by conventional array-CGH platforms.
Indeed, the results of Tuzun et al. (2005) can be used as a validated control for primary verification or for parameter tuning in the development of CNV-detection platforms or algorithms.
Although the use of this method is currently limited by the unavailability of sequence data, ongoing efforts to sequence individual human genomes and to develop cost-effective sequencing platforms (Bennett et al., 2005) should facilitate sequence-level genome comparisons and the identification of high-confidence structural variants in the near future.
Two studies by McCarroll et al. and Conrad et al., which focused on the identification of deletion variants (McCarroll et al., 2006; Conrad et al., 2006), used genotype data for 1.2 million SNPs from the International HapMap Consortium (International HapMap Consortium, 2005).
They assumed that an allelic deletion causes the affected probes to be discarded during SNP genotyping.
For example, runs of consecutive probes with null genotype calls, or runs of SNP genotypes whose frequencies deviate from expected Hardy-Weinberg equilibrium ratios or from expected Mendelian inheritance patterns, might indicate deleted loci.
The two studies independently reported about 600 potential deletions, some smaller than 100 bp.
The relatively small size of the identified variants, compared with those from the array-CGH method, is due to the high resolution of the platforms.
An SNP-centric array platform can also be used to identify linkage disequilibrium (LD) between structural variants and nearby SNPs in a given population.
However, the deletions identified in the two studies also differed, in spite of the use of similar HapMap populations and identification methods (Eichler, 2006).
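The deletion signatures described above (runs of consecutive null genotype calls and Mendelian inconsistencies in parent-offspring trios) can be illustrated with a minimal Python sketch. The function names, the genotype encoding (two-letter strings, with `None` for a failed call), and the `min_run` threshold are assumptions made for illustration only; they are not the procedures used by McCarroll et al. or Conrad et al.

```python
from typing import List, Optional, Sequence, Tuple


def find_null_runs(calls: Sequence[Optional[str]],
                   positions: Sequence[int],
                   min_run: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, n_snps) for each run of at least
    `min_run` consecutive SNPs whose genotype call is missing (None).

    Hypothetical helper: `min_run` is an illustrative threshold.
    """
    runs = []
    run_start = None  # index of the first SNP in the current null run
    for i, call in enumerate(calls):
        if call is None:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                runs.append((positions[run_start], positions[i - 1], i - run_start))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and len(calls) - run_start >= min_run:
        runs.append((positions[run_start], positions[-1], len(calls) - run_start))
    return runs


def is_mendelian_inconsistent(father: str, mother: str, child: str) -> bool:
    """True if the child's genotype cannot be formed by taking one allele
    from each parent (genotypes are two-letter strings such as "AG")."""
    possible = {tuple(sorted((a, b))) for a in father for b in mother}
    return tuple(sorted(child)) not in possible


if __name__ == "__main__":
    calls = ["AA", None, None, None, "AG", None, "GG"]
    positions = [100, 200, 300, 400, 500, 600, 700]
    print(find_null_runs(calls, positions))
    # A hemizygous deletion can make a parent look homozygous ("AA") while
    # the child, who inherited the deleted allele, looks "GG".
    print(is_mendelian_inconsistent("AA", "GG", "GG"))
```

A run of failed calls or of Mendelian errors is only a candidate deletion; as the text notes, such calls still need independent validation.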
Recently, a comprehensive CNV analysis was reported based on two high-resolution array platforms: the Whole Genome TilePath (WGTP) array, which used 26,000 large-insert clones, and the Affymetrix GeneChip Human Mapping 500K early access array, which used 500,000 SNP oligonucleotides.
This study identified about 1500 copy number variable regions (CNVRs), genomic segments consisting of overlapping CNVs, from 269 HapMap individuals (Redon et al., 2006).
The results from the two platforms are worth comparing because they provide the highest currently achievable resolution and are often selected as primary platforms in many other studies.
Firstly, the CNVs that are identified from BAC-based array-CGH are generally larger than those from oligonucleotide-based arrays (230 kb and 80 kb of median size, respectively).
This overestimation of CNV size by BAC-based array-CGH, which has been frequently reported, is due to the large insert clones that are used (Iafrate et al., 2006).
Secondly, the actual boundaries of structural variants cannot be determined through BAC-based array-CGH.
On the other hand, a more accurate determination of variant boundaries can be achieved through SNP-centric oligonucleotide-based arrays that have an extensive number of oligonucleotides.
The SNP-centric platform has the additional advantages of providing SNP genotype information, a further source of variation that can be combined with large structural variants, and of detecting loss of heterozygosity (LOH) or segmental uniparental disomy (Bruce et al., 2005; Mei et al., 2000).
However, the SNP-centric platform also has disadvantages.
In spite of the advanced resolution, the relatively low signal-to-noise ratio of oligonucleotide-based hybridization intensity, compared with large insert clone array, might result in higher false-positive rates.
Because most CNVs are subtle changes, this makes the results prone to misclassification of signal intensities and, consequently, to statistical errors.
It has also been pointed out that the SNP-centric array was originally designed for allelic discrimination and is not well suited to CNV detection because of the biased genomic distribution and sequence composition of its probes (McCarroll and Altshuler 2007d).
Recently proposed oligonucleotide-based array platforms have been designed specifically for CNV detection without sacrificing the advantage of high resolution, which can be a promising solution for CNV detection in the near future (Barrett et al., 2004).
In identifying CNVs in normal populations, one of the fundamental problems is the lack of a reference genome from which diploid states of sample DNA can be inferred.
Unlike array-CGH-based tumor studies, in which normal DNA from the same individual can be used as a reference, no single DNA source can represent a standardized and universal genome in variant analysis.
Often, the pooled genome of several individuals has been used to represent the average genome, although the heterogeneity of the pooled population might affect the copy number inference step, as shown, for example, for the X chromosome.
Notably, Redon et al. and Komura et al. adopted pairwise comparison for accurate inference of copy number states at individual loci (Redon et al., 2006; Komura et al., 2006).
In pairwise comparison, the hybridization intensities of one sample are compared with those of all other remaining samples as one large reference, and the diploid states of loci can be more accurately inferred from the multiple comparison results.
Clinical implications of CNVs and disease association study.
In spite of recent technological developments in disease association studies based on genetic polymorphisms, little is still known about the effects of genetic polymorphisms on common complex diseases.
One of the ultimate goals in exploring CNVs is to systematically assess the association between such variants and disease.
Although it is unlikely that all CNVs in the human genome are associated with diseases, evidence of associations between CNVs and a wide spectrum of human diseases has rapidly accumulated.
Table 1 summarizes the CNVs that have been reported to be associated with diseases.
CNVs can affect disease susceptibility or individual differences in responses to drugs through alteration of gene expression.
The reports of Stranger et al. and Heidenblad et al. consistently showed positive correlations between DNA copy number dosage and gene expression level (Stranger et al., 2007; Heidenblad et al., 2005).
If a CNV region contains transcriptional regulatory elements rather than protein-coding genes, it can still affect gene expression levels by changing transcriptional regulation or heterochromatin spread (Reymond et al., 2007).
Conclusion.
The genomic fraction that is occupied by CNVs is now estimated to be about 600 Mb, already exceeding that of single base-level variants.
It is likely that the number of CNVs and the genomic fraction that is affected by structural variants will continue to expand, and many of them will be used for more practical purposes, including disease association or population studies.
However, it should be remembered that the current CNV entries are plagued by substantial amounts of false-positive and false-negative results.
Only a small portion of them have been validated by independent methods.
To overcome this, it is necessary to improve scanning platforms, including optimizing experimental conditions and developing more reliable CNV calling algorithms.
In the meantime, individual researchers need to understand the characteristics of the available platforms and analytical techniques in order to use them and to interpret published results properly.
We found peak evidence of linkage (LOD score=1.88) for HDL cholesterol level on chromosome 6 (nearest marker D6S1660) and potential evidence for linkage on chromosomes 1, 12 and 19 with LOD scores of 1.32, 1.44 and 1.14, respectively.
These results should pave the way for the discovery of the relevant genes by fine mapping and association analysis.
Introduction.
Cholesterol is a major part of cell membranes.
Cholesterol is carried in the blood by chylomicrons, very low density lipoproteins (VLDL), high density lipoproteins (HDL) and low density lipoproteins (LDL) (Dastani et al., 2006).
HDL cholesterol is inversely associated with cardiovascular disease, and is more tightly controlled by genetic factors than the other lipoproteins such as LDL, VLDL and chylomicrons.
Environmental factors including chronic alcoholism, estrogen replacement therapy, and exercise influence the levels of HDL cholesterol.
Several families with strikingly elevated HDL cholesterol levels have been identified.
HDL cholesterol levels are higher in blacks than in whites, and HDL cholesterol levels of females are higher than those of males (Barcat et al., 2006; Brousseau et al., 2004; Yamashita et al., 2000; Imperatore et al., 2000).
Candidate gene analysis using population-based case-control studies has been used to test the association between SNPs and HDL cholesterol levels.
Among the candidate genes selected mainly from lipid metabolism pathways, the ApoA-I gene is the one most intensively studied (Inazu et al., 1994; Kuivenhoven et al., 1997).
Genome-wide linkage analysis can identify susceptibility genes even when they are not candidates based on lipid metabolism.
Genome-wide linkage scans are conducted by use of microsatellite markers to identify genetic determinants affecting the traits (Wang and Paigen 2005).
Using HDL cholesterol levels as either a discrete or a quantitative trait, several linkage studies on genetic determinants of HDL cholesterol have been reported (Yancey et al., 2003).
Genetic effects on the variations in HDL cholesterol have been studied mainly in Caucasians and Africans thus far, and little attention has been focused in this regard on Asian populations.
Here we report suggestive evidence for linkage of HDL cholesterol to chromosomes 6, 1, 12 and 19, from analyses conducted as part of the GENDISCAN study, a large epidemiological study of complex traits in geographically, culturally and genetically isolated large Mongolian families in Dornod, Mongolia.
Methods.
We analyzed data from 1002 Mongolian individuals from 95 large extended families.
Informed consent was obtained from all subjects prior to participation and the protocol was approved by the Institutional Review Board at Seoul National University.
Potentially confounding variables were assessed for each participant along with overall medical history.
Information on age, gender and anthropometry (height, weight, waist circumference, hip circumference and body fat content) was obtained for each individual.
Height in centimeters (cm) and weight in kilograms (kg) were measured using an automatic measuring instrument (IMI 1000, Immanuel Elec., Korea).
Body mass index (BMI) was calculated in kg/m².
Waist circumference was measured to the nearest centimeter at the level of the umbilicus, and hip circumference was measured at the level of the maximal circumference of the gluteus.
All other variables were collected through interviews performed by trained interviewers.
Information about alcohol consumption and smoking was also obtained from all the participants.
All the subjects were asked to fast for 12 hours before their visit.
Blood samples were collected from an antecubital vein into vacutainer tubes containing EDTA.
Blood samples were centrifuged at 3000 rpm for 10 minutes and then stored at −70°C.
DNA was isolated from lymphocytes for polymerase chain reaction (PCR) and automated genotyping.
A 10 ml blood sample was collected from each participating individual for genomic DNA extraction.
DNA was extracted from peripheral lymphocytes using the PUREGENE DNA Purification Kit for whole blood (Gentra Systems Inc, USA).
For genotyping, the deCODE mapping set of 1000 microsatellite markers (deCODE genetics, USA) was used, covering the genome at an average density of 3 centimorgans (cM).
HDL cholesterol was measured by the enzymatic method using the Cholestest-N-HDL kit (DAICHI, JAPAN) and HITACHI 7600-210 and HITACHI 7180 instruments.
Extensive quality control procedures ensured the validity and reproducibility of the measurements.
Multiple linear regression analysis was performed with PC SAS version 8.2 and PC SPSS version 12 to account for the effects of confounding variables.
Pedigree data were managed with PedSys (Southwest Foundation for Biomedical Research, San Antonio, Texas, USA).
Nonpaternity was examined using PEDCHECK (Mcpeek and Sun 2000), and relationships other than paternity were checked with the average IBD-based method in PREST.
After correcting pedigree errors and Mendelian errors, non-Mendelian errors were examined and corrected using SimWalk.
Identity by descent (IBD) matrices for every relative pair within each family, including single-marker IBD matrices, were calculated with SOLAR (Sequential Oligogenic Linkage Analysis Routines, version 2.1.4).
Multipoint IBD matrices were computed at 1 cM intervals using the Markov chain Monte Carlo method implemented in LOKI (Heath 1997).
Genetic components of selected phenotypes were estimated in terms of heritability.
Narrow sense heritability, defined as the proportion of total phenotypic variation due to additive genetic effects, was calculated.
Heritability of HDL cholesterol, adjusted for age, gender, age squared, the product of age and gender, the product of age squared and gender, systolic BP, smoking and alcohol, was estimated, and a variance component linkage analysis was carried out with SOLAR, which uses maximum likelihood methods to estimate variance components for the polygenic genetic effect and random individual environmental effects.
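For orientation, the quantities named above can be written in the standard variance component form that is commonly associated with SOLAR-type analyses. This is a sketch under assumed notation, not a statement of the exact parameterization used in this study: the symbols σ_a² (additive polygenic variance), σ_q² (variance due to a linked quantitative trait locus), σ_e² (individual environmental variance), Φ (kinship matrix), Π̂ (estimated IBD-sharing matrix at the tested position) and I (identity matrix) are introduced here for illustration.

```latex
\begin{align}
  h^2 &= \frac{\sigma_a^2}{\sigma_P^2},
  \qquad \sigma_P^2 = \sigma_a^2 + \sigma_e^2
  && \text{(narrow-sense heritability, covariate-adjusted trait)} \\
  \Omega &= \hat{\Pi}\,\sigma_q^2 + 2\Phi\,\sigma_a^2 + I\,\sigma_e^2
  && \text{(phenotypic covariance within a pedigree)} \\
  \mathrm{LOD} &= \log_{10}
     \frac{L(\hat{\sigma}_q^2,\hat{\sigma}_a^2,\hat{\sigma}_e^2)}
          {L(\sigma_q^2 = 0,\hat{\sigma}_a^2,\hat{\sigma}_e^2)}
  && \text{(linkage test at one map position)}
\end{align}
```

In this form, the LOD score at a given position compares the maximized likelihood of the model that includes a locus-specific variance with that of the purely polygenic model.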
Results and Discussion.
The mean age of the 1002 individuals was 31 years and 54.5% of them were female.
Demographic and pedigree characteristics of the study sample are shown in Table 1.
The family size had a mean of 16.
Table 2 included information on 2546 pairs of first degree relatives (1812 parent-offspring pairs and 734 full-sib pairs), 2485 pairs of their second degree relatives (395 half-sibling pairs, 1202 grandparent-grandchild pairs, and 888 avuncular pairs), and 598 first-cousin pairs.
Means of their total cholesterol, HDL cholesterol, LDL cholesterol, and triglyceride were 159.82 mg/dl, 55.19 mg/dl, 90.51 mg/dl, and 63.30 mg/dl, respectively.
Table 3 shows correlation between HDL cholesterol and covariates such as age, gender, systolic blood pressure, alcohol consumption status, and smoking status.
These parameters were used as covariates in the variance component analysis which provided multivariable adjusted heritability estimates for HDL cholesterol of 0.45 (Table 4).
The peak multipoint LOD score was 1.88 on 6p21 (nearest marker D6S1660) and a secondary peak (LOD score of 1.44) was found on 12q23 (nearest marker D12S354).
We identified other potential evidence for linkage, with a LOD score of 1.32 on 1q24 (nearest marker D1S412) and a LOD score of 1.14 at 19p13 (nearest marker D19S884) (Figs. 1, 2).
Table 5 presents all LOD scores ≥1.0 for HDL cholesterol.
We identified potential evidence of linkage on several chromosomes.
In another genome scan, a weak linkage signal for HDL cholesterol was observed in regions that overlapped slightly with the regions identified herein.
Klos et al. reported a peak on chromosome 12q in a European American population (Klos et al., 2001) (Table 6).
We found evidence of link-
the population isolates used in the GENDISCAN study would not present significant inflation of type I errors from inbreeding effects in its gene discovery analysis.
Introduction.
The GENDISCAN (Gene Discovery for Complex traits in Asian population of Northeast area) study was launched in 2002 in order to elucidate genetic causes of complex diseases.
This study attempted to incorporate designs that detect genetic signals with increased efficiency.
These included using a genetically homogeneous population, recruiting large families, and considering quantitative phenotypes as well as disease outcomes (Peltonen et al., 2001; Merikangas et al., 2003).
Large extended families that still remain in Northeast Asia enabled the project to adopt these designs.
Although there is no doubt that gene discovery for common complex diseases is one of the research priorities, successful results have been very limited (Grant et al., 2006).
The difficulty of replication across studies mandates the use of internally valid study designs and proper methodologies.
Using population isolates generally confers the advantage of increasing genetic homogeneity.
However, population isolates might have inbreeding structures that violate the basic assumptions of Hardy-Weinberg equilibrium (HWE).
The presence of significant inbreeding necessitates modifications in genetic estimations using the population.
Therefore, we attempted to estimate the status of HWE and the inbreeding coefficients in two ethnic groups of Mongolia using genome-wide short tandem repeat (STR) genetic markers.
Compatibility with basic assumptions of population genetics can support the methodological validity of the overall GENDISCAN study.
Methods.
The GENDISCAN study included non-selected families in Mongolia.
The People's Republic of Mongolia (not including the Chinese territory) has 2.6 million people, comprising more than 20 ethnic groups.
The Orkhontuul area in Selenge Imag (Imag is an administrative district unit in Mongolia corresponding to a state in the United States) and the Dashbalbar area in Dornod Imag were selected.
The Orkhontuul area has a population of 3,760 people, mainly of the Khalkha tribe, and maintains a semi-urban lifestyle.
The Dashbalbar area is mainly inhabited by about 4,000 people of Buryat ethnicity and has a more traditional nomadic lifestyle.
Many large extended families, which fit the purposes of the GENDISCAN study, still remain in both areas.
Genomic DNA was extracted from peripheral leukocytes.
The Orkhontuul samples (2004, n=1,080) were genotyped using the Applied Biosystems Inc. platform (ABI Prism Linkage Mapping Set version 2.5 medium density, 400 markers) with an average resolution of 10 cM, and the Dashbalbar samples (2006, n=1,020) were genotyped using the deCODE 1,000 STR marker platform with an average resolution of 3 cM.
For the Orkhontuul participants, markers on chromosome 14 were analyzed.
For the Orkhontuul data, markers with low call rates (49 markers), markers with genotype error rates above 1% (16 markers) and markers on the X chromosome (18 markers) were excluded.
For the Dashbalbar genotype data, the 1,000 STR marker platform originally provided 1097 markers; however, we excluded markers on the X chromosome (49 markers) and markers with low call rates and genotype error rates above 1% (4 markers).
All participants provided informed consent.
HWE and degree of inbreeding were assessed using the founders of each pedigree.
Non-founders were excluded because their genotypes are dependent on those of the founders.
HWE was estimated by comparing the expected and observed genotype frequencies.
Expected genotype frequency was calculated from allele frequency.
Chi-square goodness of fit test was used to determine whether HWE assumption was met.
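The chi-square statistic ($\chi^2$) for a multi-allelic locus is defined in Equation 1, with $k(k-1)/2$ degrees of freedom, where $k$ is the total number of alleles.
$$\chi^2 = \sum_{u} \frac{(n_{uu} - n p_u^2)^2}{n p_u^2} + \sum_{u} \sum_{v>u} \frac{(n_{uv} - 2 n p_u p_v)^2}{2 n p_u p_v} \qquad \text{(Equation 1)}$$
where $n_{uu}$ and $n_{uv}$ denote the observed counts of homozygous and heterozygous genotypes, $n$ is the total number of genotyped individuals, and $p_u$ and $p_v$ denote the frequencies of alleles $u$ and $v$.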
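The goodness-of-fit test described above can be sketched in a few lines of Python. This is an illustrative sketch, not the SAS/Genetics procedure used in the study: the function name `hwe_chi_square` and the genotype encoding (a pair of allele labels per founder) are assumptions, and the degrees of freedom are taken as k(k-1)/2 for k alleles, the usual value for this test.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium
    at a multi-allelic locus.

    genotypes: list of (allele, allele) tuples, one per unrelated founder.
    Returns (chi2, degrees_of_freedom).
    """
    n = len(genotypes)
    # Allele frequencies estimated from the 2n observed alleles.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed genotype counts, ignoring allele order within a genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    alleles = sorted(freqs)
    chi2 = 0.0
    for u, v in combinations_with_replacement(alleles, 2):
        if u == v:
            expected = n * freqs[u] ** 2            # homozygote: n * p_u^2
        else:
            expected = 2 * n * freqs[u] * freqs[v]  # heterozygote: 2n * p_u * p_v
        chi2 += (observed.get((u, v), 0) - expected) ** 2 / expected

    k = len(alleles)
    return chi2, k * (k - 1) // 2


if __name__ == "__main__":
    in_hwe = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
    no_hets = [("A", "A")] * 50 + [("B", "B")] * 50
    print(hwe_chi_square(in_hwe))
    print(hwe_chi_square(no_hets))
```

The p-value would then be obtained from the chi-square distribution with the returned degrees of freedom; rare alleles give small expected counts, for which this approximation is known to be unreliable.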
Information contents of the genetic markers were estimated as polymorphism information content (PIC), heterozygosity and allelic diversity.
PIC is an index of the amount of information, which modifies the simple heterozygosity index by adjusting for the chance of mating between the same heterozygotic genotypes.
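PIC was calculated from Equation 2.
$$\mathrm{PIC} = 1 - \sum_{u=1}^{k} p_u^2 - \sum_{u=1}^{k-1} \sum_{v=u+1}^{k} 2 p_u^2 p_v^2 \qquad \text{(Equation 2)}$$
where $p_u$ and $p_v$ denote the frequencies of alleles $u$ and $v$, and $k$ is the total number of alleles (Czika, 2005).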
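As a small illustration of how PIC relates to simple heterozygosity, the following Python sketch computes both from a list of allele frequencies. It assumes the standard multi-allelic PIC form (heterozygosity minus a correction term summed over allele pairs); the function names are hypothetical and the code is not the SAS/Genetics implementation used in the study.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content, assuming the standard form:
    heterozygosity minus 2 * p_u^2 * p_v^2 summed over all allele pairs."""
    correction = sum(2 * (pu ** 2) * (pv ** 2) for pu, pv in combinations(freqs, 2))
    return heterozygosity(freqs) - correction


if __name__ == "__main__":
    # Two equally frequent alleles versus four equally frequent alleles.
    print(heterozygosity([0.5, 0.5]), pic([0.5, 0.5]))
    print(heterozygosity([0.25] * 4), pic([0.25] * 4))
```

PIC is always at most equal to heterozygosity, and the gap narrows as the number of alleles grows, which is one reason multi-allelic STR markers tend to be highly informative.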
Inbreeding was estimated by the deviation from the assumption that each founder shares no Identity by descent (IBD).
Generally, the genotype frequencies of a bi-allelic locus with allele frequencies $p$ and $q$ are predicted to be $p^2$, $2pq$ and $q^2$, respectively, under HWE.
However, if there is IBD sharing of $F$ between founders, the above predictions can be rewritten, respectively, as Equation 3.
$$p^2 + Fpq, \qquad 2pq(1 - F), \qquad q^2 + Fpq \qquad \text{(Equation 3)}$$
where $F$ denotes the inbreeding coefficient (Gillespie et al., 2004).
In brief, inbreeding is characterized by an excess of homozygotes over the expected level.
The inbreeding coefficient can be estimated as Equation 4 by solving Equation 3.
$$F = 1 - \frac{H}{2pq} \qquad \text{(Equation 4)}$$
where $H$ denotes the observed proportion of heterozygotes, and $2pq$ denotes the proportion of heterozygotes expected from the allele frequencies (Hart et al., 2000).
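The relation between heterozygote deficit and the inbreeding coefficient described above can be checked numerically with a short Python sketch for a bi-allelic locus. The function names and the example genotype counts are illustrative assumptions, not data or code from the study.

```python
def expected_genotype_freqs(p, f):
    """Genotype frequencies (AA, AB, BB) at a bi-allelic locus with allele
    frequency p for A and inbreeding coefficient f; f = 0 gives HWE."""
    q = 1.0 - p
    return (p * p + f * p * q, 2 * p * q * (1 - f), q * q + f * p * q)


def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """Estimate F as 1 - H_obs / H_exp from genotype counts, where H_exp = 2pq."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    return 1.0 - (n_ab / n) / h_exp


if __name__ == "__main__":
    # Hypothetical counts: 30 AA, 40 AB, 30 BB (fewer heterozygotes than HWE predicts).
    f_hat = inbreeding_coefficient(30, 40, 30)
    print(f_hat)
    print(expected_genotype_freqs(0.5, f_hat))
```

Feeding the estimate back into `expected_genotype_freqs` reproduces the observed genotype proportions, which is a quick consistency check on the two formulas.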
HWE and estimations of expected and observed heterozygosity frequencies were obtained using the SAS/Genetics program.
Results.
The demographic characteristics of the genotyped subjects are shown in Table 1.
There were 280 (99 men and 181 women) and 142 (90 men and 52 women) founders in the Orkhontuul and Dashbalbar populations, respectively.
Non-founders' genotypes were excluded, since they do not independently contribute to the gene pool.
The information content of single markers, in terms of PIC, ranged between 0.2 and 0.9, as shown in Fig. 1.
The average PIC was 0.72 and 0.71 for the Orkhontuul and Dashbalbar populations, respectively, which is relatively high for single-marker information content.
There was no significant difference in PIC across the chromosomes or populations.
The high PIC level enabled accurate estimation of other population genetic parameters.
HWE was satisfied for 88.6% and 94.2% of all markers in the Orkhontuul and Dashbalbar populations, respectively (p-value > 0.05).
With the criterion of p-value > 0.01, 90.5% and 95.3% of all markers were in HWE.
All markers, including those not in HWE, were used for estimating the inbreeding coefficients.
The inbreeding coefficient was estimated to be 0.0023 and 0.0021 in the Orkhontuul and Dashbalbar populations, respectively.
Discussion.
Population isolates are generally considered to be among the most suitable populations for genetic study (Pajukanta et al., 2003; Arcos-Burgos et al., 2002; Escamilla et al., 2001).
However, possible inbreeding can cause deviation from general assumptions on which most analyses depend.
The presence of inbreeding can be problematic because, if it exists, the genetic relationships between unrelated as well as related persons could be underestimated.
This underestimation of IBD can result in inflation of type I errors for linkage analysis (Hossjer et al., 2006; Nomura et al., 2005), linkage disequilibrium estimations and haplotype reconstructions (Zhang et al., 2004).
The inbreeding coefficient found in this study (about 0.2% in each population) does not necessitate any adjustment for genetic analyses such as IBD calculation, classic or non-parametric linkage analysis, and variance component-based linkage analysis.
In terms of the estimated last common ancestor, an inbreeding coefficient of 0.2% corresponds to 10 or 11 generations (Jensen-Seaman et al., 2001; Santos-Lopes et al., 2007).
In this study, both ABI and deCODE STR markers were genotyped with standardized procedures, and any markers with genotype error rates above 1% were discarded.
The genotype errors were confirmed within the pedigree structure.
Any Mendelian inconsistency was deleted and markers with possible double-recombination were also deleted.
Generally, genotyping in family-based study is more accurate than in studies using individuals only.
Thus, It is not likely that any genotype error could have been biased our findings.
In conclusion, we have estimated inbreeding coefficients in two population isolates in Mongolia.,.
We found that they fall in negligible range, allowing related genetic studies to be performed without any modification or adjustment for possible inbreeding effects.
This finding validates the ability of The GENDISCAN study to add to the growing body of evidence which associates specific genetic variations with complex disorders.% (6.4 of 34.5 Mb) of chromosome 22 with 757 tagSNPs and 815 haplotypes (frequency 5.0%).
Of 3430 common SNPs genotyped in all five populations, 514 were monomorphic in Koreans.
The CHB + JPT samples have more than a 72% overlap with the monomorphic SNPs in Koreans, while the CEU + YRI samples have less than a 38% overlap.
The patterns of hot spots and LD blocks were dispersed throughout chromosome 22, with some common blocks among populations, highly concordant between the three Asian samples.
Analysis of the distribution of chimpanzee-derived allele frequency (DAF), a measure of genetic differentiation, Fst levels, and allele frequency difference (AFD) among Koreans and the HapMap samples showed a strong correlation between the Asians, while the CEU and YRI samples showed a very weak correlation with Korean samples.
Relative distance as a quantitative measurement based upon DAF, Fst, and AFD indicated that all three Asian samples are very proximate, while CEU and YRI are significantly remote from the Asian samples.
Comparative genome-wide LD studies provide useful information on the association studies of complex diseases.
Introduction.
Vast amounts of information on single nucleotide polymorphisms (SNPs) and progress in high-throughput genotyping technology have generated a great deal of interest in establishing genome-wide linkage disequilibrium (LD) maps for genetic studies of complex traits (Chakravarti 2001; The International HapMap Consortium 2003; Myers and Bottolo 2005).
LD is known to occur in a block-like structure across the genome, with conserved haplotype blocks of tens to hundreds of kilobases punctuated by "hot spots" of recombination (Daly et al. 2001).
Since the concept of whole genome association studies using SNPs was introduced (Risch and Merikangas 1996), the optimal number of SNPs required for association studies has been the center of extensive debate (Kruglyak 1999).
Initial studies have focused on average LD levels and the variability in processes that generate LD (Cardon and Abecasis 2003).
Although a single chromosome could carry many haplotypes in LD blocks, recent studies suggest that haplotypic variation may be much lower than previously imagined (Jeffreys et al. 2001; Patil et al. 2001; Gabriel et al. 2002).
Patil's group identified haplotype blocks on chromosome 21 for which over 80% of chromosomes were represented by a few common haplotypes (Patil et al. 2001).
In the analysis of human chromosome 22 with a marker density of one SNP per 15 kb, Dawson's group reported a highly variable pattern of LD along the chromosome, in which extensive regions of complete LD of up to 804 kb in length were interspersed with regions of no detectable LD (Dawson et al. 2002).
Although differences of LD patterns between populations have been reported (Abecasis et al. 2002; Reich et al. 2001; Zavattari et al. 2002), little information is available on the haplotype structure in different populations other than the recent study by Gabriel et al. (2002).
On the other hand, haplotype analysis has been widely employed in linkage studies for narrowing down the location of disease susceptibility genes (Zhang et al. 2004; Park 2007).
The International HapMap Project was launched to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation among four population samples: 30 trios from Yoruba in Ibadan, Nigeria (YRI), 45 unrelated Japanese in Tokyo, Japan (JPT), 45 unrelated Han Chinese in Beijing, China (CHB), and 30 trios in a Utah, US population with Northern and Western European ancestry (CEU) from the CEPH collection (The International HapMap Consortium 2003; 2004; 2007).
As the International HapMap Project releases a validated SNP map of 1 marker per kb for the HapMap samples, the general applicability of the HapMap data needs to be confirmed in samples from related populations.
Recent comparative studies of LD patterns have shown a high degree of concordance among various populations (Gabriel et al. 2002; Shifman et al. 2003; Stenzel et al. 2004; Mueller et al. 2005).
As the HapMap samples include Japanese and Chinese, it was our interest to test whether significant differences in LD exist between Koreans and the two other Asian samples.
In this paper, we measured the LD pattern along chromosome 22 in Korean samples and compared the Korean data with those of the four HapMap samples.
We were interested in exploring how the HapMap data could be used to estimate the genomic structure of Koreans.
We expect that this study will contribute to the development of proper strategies for association studies of common complex diseases in Koreans using the HapMap data.
Methods.
A total of 111,448 reference SNPs from chromosome 22 in the dbSNP (http://www.ncbi.nlm.nih.gov/SNP, build 116) were collected.
To maximize cost effectiveness of genotyping, SNPs were selected based on the following criteria: 1) markers with even spacing, 2) verified SNPs, 3) coding SNPs.
SNPs were scored for inclusion in the study as follows.
First, relatively even spacing between SNP markers was considered most important for mapping chromosomal LD blocks.
Second, verified SNP markers with higher validation scores (validation status is scored from 0 to 4 in dbSNP) were chosen to prevent or reduce genotyping failure.
Also, repeated sequence regions were excluded by repeat masking with Primer3 software (Rozen and Skaletsky 2000).
Third, protein-coding SNPs were given higher scores so that the data would be useful for further studies.
A total of 12,674 genotyping experiments were conducted by four Genotyping Centers, and a final set of 4681 markers passed the stringent quality control procedure (The International HapMap Consortium 2003).
Genomic DNA from 90 unrelated Korean individuals without family histories of major diseases was obtained from the Genomic Research Center in the Korean National Institute of Health (KNIH).
The KNIH samples were collected as part of an epidemiological project and represent urban and rural regions in the south of Seoul.
The sex ratio was 0.5 and the mean age was 50.
Informed consent from all participating subjects was obtained through KNIH, and research approval came from the relevant ethical committees.
DNA was isolated from peripheral blood leukocytes according to standard procedures with proteinase K-RNase digestion, followed by phenol-chloroform extraction.
For each SNP, we chose a set of three primers: two PCR primers to amplify a product of 100-200 bp under standard conditions and an optimized extension primer complementary to the sequence immediately adjacent to the SNP site.
For genotyping, we employed three platforms: 6063 SNP genotypings were done using the Orchid Bioscience SNP-IT assay (Princeton, NJ), 984 using the PerkinElmer Life Sciences FP-TDI assay (Boston, MA), and 5627 using the Sequenom MassARRAY (San Diego, CA).
For each SNP, the observed genotype frequencies were checked in each assay for consistency with those expected under Hardy-Weinberg equilibrium.
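A Hardy-Weinberg consistency check for a biallelic SNP can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. This is a minimal illustration under that assumption; the text does not specify the exact test that was applied, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic SNP.

    Takes the observed counts of the three genotypes and returns
    (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for 90 individuals that match HWE proportions closely
chi2, p_value = hwe_chi_square(40, 40, 10)
print(chi2, p_value)
```

A marker would be flagged when the p-value falls below the chosen threshold. For small samples or rare alleles an exact test is often preferred over the chi-square approximation.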
Haploview version 3.2 (Barrett et al. 2004), based on the expectation-maximization (EM) method (Excoffier and Slatkin 1995), was used to infer haplotype phase and population frequency and to estimate Lewontin's coefficient D' (Lewontin 1998), LOD, and the correlation coefficient r (Hill and Robertson 1968).
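The two LD measures named above can be illustrated for a single SNP pair. The sketch below assumes that phased haplotype counts are already available (Haploview infers phase by EM, which is not reproduced here) and returns |D'| and r squared using their standard definitions; the function name and the counts are hypothetical.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (|D'|, r^2) for two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP and
    allele B at the second SNP, and so on for the other three classes.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    # D is scaled by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d_prime, r2


# Hypothetical counts: partial association between the two SNPs
print(pairwise_ld(40, 10, 10, 40))
```

D' reaches 1 whenever at least one of the four haplotypes is absent, whereas r squared reaches 1 only when the two SNPs are perfectly correlated, so the two measures are usually reported together.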
PHASE v2.1 was used to estimate the recombination parameters (Li and Stephens 2003; Crawford et al. 2004) and assess the statistical significance of haplotype profile differences and individual haplotype fre-
2006).
Because it has been suggested that the functional significance of IL-1B-3737 might depend on a broader haplotype, we used the three SNPs for haplotype analysis.
Haplotypes were reconstructed by PHASE version 2.1, using previously produced genotype data (Lee et al., 2004).
Of the possible eight haplotypes, three common ones accounted for 98% of the estimated haplotypes in the Korean population.
Table 1 shows the haplotype frequency estimation in each population.
The potentially more inflammatory IL-1B-511T/-31C haplotype represented 53.5% of the Korean haplotypes, compared with 33.7% of the Caucasian haplotypes.
So far, in many previous association studies, the individual SNP approach, most frequently using IL-1B-511 and IL-1B-31, has been adopted.
To our knowledge, we were the first to report that the IL-1B-1464 polymorphism shows allele-specific differences in nuclear protein binding and is associated with a clinical disease (Lee et al., 2004).
The biological implication of this polymorphism was supported by in vivo studies showing that the IL-1B-1464 polymorphism has substantial allele-specific effects when IL-1B-511 and IL-1B-31 carry alleles T and C, respectively (Chen et al., 2006).
The more informative haplotype 1 (GTC), containing the IL-1B-1464 polymorphism, which shows the highest transcriptional activity, represents 9.3% and 6.0% of Korean and Caucasian haplotypes, respectively, whereas haplotype 3 (GCT), with the lowest activity, had a higher frequency in Caucasians (64.8%) when compared with Koreans (44.2%) (Table 1).
The difference in IL-1B promoter haplotype frequency between the Korean and Caucasian populations was statistically significant (χ2=20.6, p<0.001), and the allele frequencies of the IL-1B-1464 polymorphism (rs#1143623) were also significantly different between the two populations (IL-1B-1464 G allele frequencies for Korean and HapMap European=0.548 and 0.672, respectively) (χ2=6.38, p=0.01).
It has been suggested that genes that are involved in immune function may be under selective pressure in direct interaction with the environment (Sawyer et al., 2004; Kim et al., 2005).
The genes that influence a phenotypic variation between populations are expected to show high Fst values.
Compared with the Fst value for the Caucasian-vs-Asian comparison, the Fst values for the African-vs-Asian or African-vs-Caucasian comparisons were remarkably high (Fig. 1).
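As a rough illustration of how a per-SNP Fst between two populations can be obtained from allele frequencies, the sketch below uses the basic heterozygosity-based definition Fst = (HT - HS) / HT with equal weighting of the two populations. This is an assumption made for illustration, since the estimator actually used is not stated in the text, and the function name and example frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # monomorphic in both populations: no differentiation
    # Mean expected heterozygosity within populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations
print(round(fst_two_populations(0.2, 0.8), 2))
```

Values near 0 indicate similar allele frequencies in the two populations, and values approaching 1 indicate nearly fixed differences. Sample-size-corrected estimators are generally preferred for real data.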
Previously, we reported that the IL-1B-1464 polymorphism contributes to the development of intestinal-type gastric cancer among Koreans (Lee et al., 2004).
As a curious finding in our report, the editor pointed out that carriers of IL-1B-1464 G tend to have a decreased risk of diffuse-type gastric cancer, which is the opposite of the finding for intestinal-type gastric cancer, although both intestinal and diffuse types of gastric cancer are related to Helicobacter pylori-induced gastritis (Furuta et al., 2004).
Our results showed that most IL-1B-1464 C alleles are linked to the IL-1B-511T/-31C haplotype (Table 1).
Considering the level of promoter activity of haplotype 2 (CTC), we cannot exclude the possible association between this haplotype and the risk of diffuse-type gastric cancer, especially depending on interactions with other regulatory factors (Lee et al., 2007).
Association studies that use individual SNPs appear to be insufficient, and the understanding of functional haplotype structure of populations could provide potential explanations for IL-1B-related controversies and ethnic-specific associations.
Therefore, we believe that these Korean haplotype data will be useful for future association studies between IL-1B SNPs and disease risk.
nted domains, including the human imprinted gene cluster that contains IGF2, H19, KCNQ1, ASCL2, and CDKN1C (Rapkins et al., 2006).
If, as has been suggested, imprinted genes are intimately connected with the acquisition of parental resources, we would not anticipate the existence of such genes in chicken, which leave their offspring to their own heritance after conception.
Phylogenetic analyses show that the relationship between human and mouse is closer than that of either species with chicken.
Similarly, the relationship between zebrafish and chicken is quite distant (Shah et al., 2004).
Nonetheless, we assumed that chicken have imprinted genes due to the existence of common ancestral genomic regions that have evolved on a similar basis in each of the aforementioned species.
The purpose of this study was to identify candidate imprinted genes in chicken based on an analysis of orthologous genes in human, mouse, zebrafish, and chicken using the HomoloGene database.
ols for the clinical oncology to determine the prognosis of patients (Lossos et al., 2004; Pomeroy et al., 2002), the molecular diagnosis (Golub et al., 1999) as well as the responsiveness to therapeutics (Snyder and Morgan, 2004).
There have been many reports on the molecular pattern analysis using microarray to understand the chemo- and radio-resistance in cervical cancer (Achary et al., 2000; Tewari et al., 2005; Wong et al., 2006), rectal cancer (Kim et al., 2007) and esophageal cancer (Fukuda et al., 2004).
Most of these studies aim to identify genes that are differentially expressed in patients with different clinical outcomes, which can be applied to evaluate prognosis more accurately.
Although the conventional parameters like tumor stage and grade can be used to decide optimal cancer therapy, molecular markers would provide valuable information to make clinical decisions (Klopp and Eifel, 2006).
Genome-wide analysis on gene expression can predict the clinical consequences more accurately.
In addition, the information from gene expression profiling can facilitate the development of biological target for therapeutics by identifying pathways and determining steps contributing to the phenotype.
In this study, we examined the expression profiles of two lung cancer cell lines, which showed differential re-
1995).
In the inactive form, the pseudosubstrate domain is bound to the catalytic domain of PKC (Orr et al., 1994).
Upon stimulation, PKC translocates to the plasma membrane where the C1 and C2 domains interact with DAG and phosphatidylserine, respectively.
This interaction causes the pseudosubstrate domain to dissociate from the catalytic domain, which results in activation of PKC.
Inactive PKC is not freely distributed throughout the cytoplasm but appears to be localized to specific sites within the cell.
Association of PKC with scaffolding proteins such as AKAP79 (A Kinase-Anchoring Protein 79) (Klauck et al., 1996) and Gravin (Nauert et al., 1996) facilitates localization.
Streptomycetes are ubiquitous soil bacteria, and they play a key role in the global carbon cycle by degrading the insoluble remains of other organisms.
More clues to the development of the PKC super family come from the study of the bacterium Streptomyces coelicolor.
S. coelicolor has a large collection of enzymes and can metabolize many diverse nutrients.
This extremely simple organism has a genome of approximately 8,667,507 bp, yet it has a complex life cycle exhibiting mycelial growth and spore formation (Bentley, 2002) and is notable for the production of pharmaceutically useful anti-tumor compounds.
Of the predicted genes, an unprecedented proportion carries out regulatory functions in the cell (Winstead, 2002).
More than twelve percent of the genome is involved in facilitating biological processes, such as the bacterium's
s reduce implementation time and increase the likelihood of eliminating bugs and localizing code modifications when a change in implementation is required.
In the initial version of the interface, all of the classes were tangled with one another, which violated the principles of object-oriented programming.
However, they have been completely redesigned, as shown in Table 1.
This table summarizes the recent modifications of our system, and the interfaces for each class are documented, similar to Fig. 2.
The refactored version now comprises 3851 lines of code, compared with 2765 lines in the initial version.
By importing the five packages, an exemplary software system called J3dPSV 1.0, shown in Fig. 3, has been developed for viewing 3D structures of proteins from the Protein Data Bank for demonstrational purposes.
J3dPSV supports visualization of proteins for educational purposes by simulating simple molecular graphics.
In addition, J3dPSV interactively displays a molecule on the screen in a variety of color schemes, molecular representations, and animation features.
The molecular model can be changed by selecting the list (cartoon tubes, backbone, protein, cylinder, or line) in
useful suggestions for genotype information.
Compared to the current genotyping tools, GTVseq has several unique and useful features:
* GTVseq uses two different scoring schemes and the results are reported separately. One of the scoring schemes is similar to that of NCBI, while the other is particularly useful for viral sequences with new or complicated genotypes (vide infra).
* GTVseq offers an easy and interactive web-based user interface, with intuitive reports for genotyping results.
* GTVseq can be used for genotyping many important viruses such as HIV-1, HIV-2, HBV, HCV, HTLV-1, HTLV-2, poliovirus, enterovirus, flavivirus, Hantavirus, and rotavirus, thus permitting the most comprehensive genotyping of viral genomes to date.
Methods.
For genotyping of viral genome sequences, we need to establish 'reference sequences' for each genotype.
We have downloaded the reference sequence database collections from NCBI (http://www.ncbi.nlm.nih.gov/projects/genotyping), for HIV-1, HIV-2, HBV, HCV, HTLV-1, HTLV-2, and poliovirus.
For HIV-1 reference sequences, GTVseq also provides several different collections of reference databases such as HIV-1 (2004) & CRF, HIV-1 (2005), HIV-1 (2005) & CRF.
For enterovirus, flavivirus, Hantavirus, and rotavirus, the reference sequences were
combination of databases and interactive web pages for manipulating and displaying annotations on genomes.
In other words, GBrowse is a web-based application tool that is developed for navigating and visualizing the genomic features and annotations interactively for users.
Through it, users can view a certain region of the desired genomes and search for genetic biomarkers.
They may conduct a full-text search for most features of the genomes.
They also can download SNP assay, genotype, and allele frequency information and generate customized sets of tag-SNPs for their association studies (Thorisson et al., 2005).
GBrowse utilizes a web-based display that can be used to show arbitrary features of a nucleotide or protein sequence and can accommodate genome-scale sequences that are megabases in length.
The GBrowse system consists of various kinds of software modules and systems, such as web servers, database systems, and Perl libraries.
At present, many biological websites that provide genomic variants or portal services have been developed using GBrowse, including the following: the UCSC Genome Browser (Kuhn et al., 2007), the International HapMap Project (Thorisson et al., 2005), PlasmoDB (The
is database is free for non-commercial purposes.
The KRDD is visualized using a web-based graphical view, and anonymous users can query and browse the data using the search function.
The KRDD homepage is shown in Fig. 1.
It has four major menus of web pages: (i) a Blast Search of a mutant line; Blast from rice Ds-tagging mutant lines; (ii) a primer design tool to identify genotypes of Ds insertion lines; (iii) a Phenotype menu for Ds lines, searching by gene name and phenotype characteristics among specific Ds lines; and (iv) a Management menu for Ds lines.
The Blast Search is searchable by selecting specific databases, consisting of DS Sequence, Indica Core, Japonica Core, Indica EST, Japonica EST, Indica Genome, Japonica Genome, Indica GSS, and Japonica GSS in Oryza sativa.
The KRDD uses several reference databases to facilitate a comprehensive analysis of the genome sequence.
These include the Entrez nucleotide database of the National Center for Biotechnology
entative biological pathway database, now provides the KEGG Metabolism Atlas (Okuda et al., 2008) by manually combining about 120 existing metabolic pathway maps, as shown in Fig. 1.
However, the static approach to representing metabolic pathway diagrams offers no flexibility.
On the other hand, our initial attempts to visualize all information automatically in a single atlas map resulted in a confusing diagram that was difficult to interpret, as shown in Fig. 2.
It should be noted that Fig. 2 differs in many aspects from Fig. 1 or conventional drawings in biochemistry textbooks.
For this reason, we designed a new metabolic atlas viewing tool called J2dpathway, which has node-abstracting features.
When J2dpathway is initially executed, a window frame appears, as shown in Fig. 3.
The screen consists of views and editors.
The tool-bar menu at the top lists various tool icons, including zoom-in, zoom-out, cliques, highly connected nodes, and obtaining cycles.
The Map Repository View on the left side lists a preinstalled data source that has many example pathways to explore, which are arranged in a tree view of the components of
of the HIF1ODD domain with multiple partner proteins, such as ARD1 (Jeong et al., 2002), prolyl hydroxylase (PHD) (Schofield and Ratcliffe, 2004), and p53 (Fels and Koumenis, 2005; Sanchez-Puig et al., 2005), also have been reported.
However, the molecular basis for the multiple binding specificity of the HIF1ODD domain has not been understood yet.
The detailed characterization of the correlation between the binding sequence motifs in the ODD domain and its binding to multiple target proteins is necessary for understanding the versatile function of the HIF1ODD domain.
Two functionally independent sequence motifs, the N-terminal and C-terminal ODD (NODD and CODD), in the HIF1ODD domain were shown to bind to the DNA-binding domain (DBD) of p53 (Hansson et al., 2002).
The crystal structure of the CODD motif in complex with pVHL was determined to
ncluding 1p36, 5q31, and 21q22, by whole-genome linkage analysis (genome-wide association studies), and many polymorphisms also have been identified at these loci (Suzuki et al., 2003; Tokuhiro et al., 2003).
Human leucine-rich alpha-2-glycoprotein 1 (LRG1) was first identified as a trace protein in human serum (Haupt & Baudner, 1977).
The LRG1 gene is located on chromosome 19p13.3, and the primary sequence of LRG includes repeated leucine residues and also has putative membrane-binding domains.
Serum LRG1 is the first extracellular ligand for cytochrome c (Cyt c).
Cyt c is a ubiquitous, heme-containing protein that normally resides in the space between the inner and outer mitochondrial membranes (Newmeyer et al., 2003).
Extracellular Cyt c may play a role in inflammation, as it has been reported to cause arthritis when it is injected into mice.
Its levels in RA patients' sera are significantly lower than those of healthy controls (Pullerits et al., 2005).
At least eight repeating 24-amino acid segments that have a notable consensus sequence were identified in a large family of LRG proteins.
The function of LRG has not been elucidated, although the functions of many of the other members of the LRR (leucine-rich repeat)-containing superfamily are known (Kobe & Deisenhofer, 1994; Buchanan & Gay, 1996).
Plasma LRG expression levels are lower in liver cancer patients who are treated with radiofrequency ablation (Kawakami
eutic agent for cancer is the in vivo specificity of cancer cell regression.
For such a specificity, target RNA-independent and nonspecific transgene induction by the group I intron should be avoided.
In other words, mis-spliced products should not be generated by the group I intron.
In this study, in order to evaluate the therapeutic feasibility of the hTERT-specific group I intron, we assessed the target RNA specificity of the trans-splicing phenomenon by the intron in mice that have been intraperitoneally xenografted with human cancer cells.
Construction of Adenoviral Vector for hTERT-specific Group I Intron.
The expression vector that encodes for the hTERT-specific trans-splicing group I intron was constructed as previously described (Kwon et al., 2005; Song et al., 2006).
In brief, the Rib21AS group I intron, which recognizes uridine at position 21 (U21) of hTERT RNA, was generated to harbor an extended internal guide sequence, which includes an internal guide sequence (IGS, 5'-GGCAGG-3'), an extension of the P1 helix, an additional 6-nt-long P10 helix, and a 325-nt-long antisense sequence that is complementary to the downstream region (30 to 354 residues) of the targeted U21 of hTERT RNA.
In addition, cDNA, as a 3' exon that encodes for the lacZ gene, was inserted downstream of the modified group I intron expression construct (Fig. 1A).
using PHRAP (http://www.phrap.org/), it does not ensure correct assembly because the quality scores that are generated from 454 data are not compatible with those from Sanger reads.
Further, PHRAP has problems with handling massive reads (usually hundreds of thousands from an SFF file).
A recent report has demonstrated that GS assembler programs (gsAssembler for de novo assembly and gsMapping for reference-guided assembly; http://www.454.com/enabling-technology/the-software.asp) that are supplied by Roche Applied Science are ideal for correct assembly of 454 data that are short and inherently error-rich (Chaisson and Pevzner, 2008).
Recent versions (1.1.02.15 and later) of GS assembler programs support mixed assembly with Sanger-type reads, but their performance is not well known at present.
Moreover, because pre-existing assembly software such as PHRAP and CelAsm (Huson et al., 2001) do not directly support data that are produced by 454 machines, 454-derived contigs (GS contigs) should be used as if they were individual reads or be shredded to generate many overlapping 'pseudoreads' (Goldberg et al., 2006).
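The shredding of a contig into overlapping pseudoreads can be sketched as a sliding window. This is a minimal illustration, not the procedure of any particular tool: the function name is hypothetical, the 600-bp read length follows the typical Sanger read size, and the 300-bp step is an assumed value chosen only for the example.

```python
def shred_contig(sequence, read_len=600, step=300):
    """Split a contig into overlapping fixed-length pseudoreads.

    Contigs no longer than read_len are returned whole. A final window is
    added when needed so that the end of the contig is always covered.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(sequence) <= read_len:
        return [sequence]
    starts = list(range(0, len(sequence) - read_len + 1, step))
    if starts[-1] + read_len < len(sequence):
        starts.append(len(sequence) - read_len)
    return [sequence[s:s + read_len] for s in starts]


# Hypothetical 1000-bp contig
contig = "ACGT" * 250
reads = shred_contig(contig)
print(len(reads), [len(r) for r in reads])
```

A step smaller than the read length makes consecutive pseudoreads overlap, which is what allows an overlap-based assembler to rejoin them with Sanger reads.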
Pseudoreads, made from GS contigs to emulate the read size of standard Sanger data (ca. 600 bp), are virtual reads whose stepping between consecutive
dertaken as a collaboration between Korean funding agencies (Ministry of Education, Science and Technology and Korean National Institute of Health), experimental academia (Ulsan Medical Institute, SungKyunKwan Medical Institute, and Korea Advanced Institute of Science and Technology), and corporations (DNA Link, SNP-Genetics, and Samsung Advanced Institute of Technology) (Yoo et al., 2006; Lee et al., 2008).
Resulting from the project, a Korean SNP and haplotype database system was developed to help those researchers who study high-frequency, complex Korean diseases and changes in ethnic global migratory variants.
In the project, we tried to accomplish a number of goals.
First, the system should be able to provide essential information that is needed for gene discovery of complex Korean diseases.
Second, the system should contain basic and advanced tools that may apply to applications such as diagnostics, treatment, and prevention of diseases.
Third, the database system should provide Korean-specific SNPs and haplotype information that are common in the Korean population.
We have developed a series of software programs for association studies as well as for the comparison and analysis of Korean HapMap data with four other populations (Yorubans in Ibadan, Nigeria; Centre d'Etude du
are involved in lipogenesis, such as SREBF1, suggesting that PTP1B may play a role in the enlargement of adipocyte energy storage (Rondinone et al., 2002).
The human PTPN1 gene maps to chromosome 20q13.13, a syntenic region of the distal arm of mouse chromosome 2 that harbors quantitative trait loci for body fat and body weight (Lembertas et al., 1997).
The PTPN1 gene consists of 10 exons, spanning 74 kb, and the first intron is longer than 50 kb.
In humans, several linkage signals with type 2 diabetes mellitus (T2DM) (Bowden et al., 1997), BMI (Hunt et al., 2001), fat mass, and energetic intake (Collaku et al., 2004; Dong et al., 2003; Lembertas et al., 1997) were reported at this locus in different populations, further supporting the candidacy of PTPN1 involvement in T2DM and obesity.
In Poland, a family-based linkage study of T2DM showed the highest logarithm of the odds score (Ji et al., 1997; Klupa et al., 2000).
This locus also showed evidence of linkage with early onset T2DM (onset=45 years) in a subset of 55 French families (Zouali et al., 1997).
prostaglandin and is associated with biologic events such as injury, inflammation, and proliferation (Hla and Neilson, 1992; Tazawa et al., 1994).
PTGS2-mediated prostanoids play an important role in maintaining blood pressure (Anderson et al., 1976; Daniels et al., 1967).
Specifically, the cortical PTGS2-derived prostaglandin I2 participates in the pathogenesis of renal vascular hypertension by stimulating renal renin synthesis and release (Hao and Breyer, 2008).
Clinical studies as well as animal studies also demonstrate important roles for PTGS2 in maintaining cardiovascular homeostasis (Zewde and Mattson, 2004; Zhang et al., 2006).
PTGS2 is upregulated in animal models of cardiac failure (Abassi et al., 2001; Adderley and Fitzgerald, 1999), and its expression has been detected in heart failure in humans (Wong et al., 1998).
The PTGS2 gene is located on chromosome 1q25.2-q25.3 (Hla and Neilson, 1992), and its cDNA encodes a 604 amino acid protein.
Recently a large-scale association study in Japanese population revealed the association of PTGS2 poly-
c blood pressures and heart rate (Eric Colman, 2005).
Phendimetrazine has also been widely prescribed as an anorectic for the treatment of obesity, and has been reported to have properties similar to methamphetamine, which is known to suppress appetite by activating catecholaminergic neurotransmission (Seiden et al., 1993; Chen et al., 2001).
Methamphetamine is known to primarily block dopamine transporter, which inhibits dopamine reuptake, indicating that dopamine up-regulation has an anorectic effect (Mackler et al., 1993).
Because phendimetrazine and methamphetamine stimulate the central nervous system to produce euphoria, probably via the activation of dopaminergic systems in the brain (Noailles et al., 2003), these drugs are restricted to short-term use (a few weeks) and prominently labeled to warn against the risk of addiction.
However, although many anorectics are available, evidence is still lacking concerning their efficacies, safeties, and molecular mechanisms.
Recently, cDNA microarray studies on gene expression profile changes by amphetamine have been reported (Noailles et al., 2003; Yamamoto et al., 2005), but no such report has been issued on other anorectics.
In this study, we employed
gned for the identification and visual representation of CNAs using genome-wide array-CGH profiles.
CNAs can be directly identified from log2 ratio profiles that can be obtained from array-CGH datasets with minimal modifications.
A data smoothing option is also provided to cope with the noise level of the data for reliable detection of CNAs.
The identification of CNAs is based on the SW-ARRAY algorithm, which ensures fast and robust detection of chromosomal alterations.
The identified CNAs are exported into Excel-compatible outputs or graphically illustrated with a graphical user interface.
Relatively easy operability as well as fast processing of the overall procedure is the major advantage of our software over conventional ones.
CGHscape software package is freely available and provides the comprehensive environments for investigation of tumor genome and genomic variants.Major Functionalities of CGHscape.
(1) CGHscape was designed as a standalone program compatible with Microsoft Windows environments.
Compiled codes of CGHscape can be easily installed.
The interpreter- or web-based methods have the advantage al., 2006).
Genealogical relationships among haplotypes in a chromosome 2 8.4 kb region without obligate recombination events were demonstrated using the CEU samples only (The International HapMap Consortium, 2005).
If other population samples such as YRI, CHB, and JPT had been included, the haplotype blocks would have been fragmented due to a number of historical recombination events, and phylogenetic studies with such a small block would not have been informative.
In this study, instead of conventional tree-based phylogeny, principal coordinate analysis (PCoA) (Higgins, 1992) was employed using the haplotype data on a region encompassing multiple blocks.
As PCoA, albeit distance-based, is useful for grasping the major trend among the sequences, it is worth testing how PCoA performs with such a dataset.
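To make the idea of PCoA on haplotype data concrete, the following sketch computes the first principal coordinate for a handful of haplotypes. It is only an illustration under stated assumptions: haplotypes are coded as strings of 0 and 1, the squared distance between two haplotypes is taken to be the number of mismatching sites, and the leading eigenvector is found by simple power iteration. None of these choices, nor the function name, is taken from the original study.

```python
import math
from typing import List


def first_principal_coordinate(haplotypes: List[str],
                               iterations: int = 200) -> List[float]:
    """First PCoA axis for equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    # Assumed squared distance: number of mismatching sites.
    d2 = [[sum(a != b for a, b in zip(h1, h2)) for h2 in haplotypes]
          for h1 in haplotypes]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D2 * J.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))
    # Coordinates are the eigenvector scaled by sqrt(eigenvalue).
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


# Four hypothetical haplotypes forming two groups.
coords = first_principal_coordinate(["0000", "0001", "1110", "1111"])
```

The sign of the axis is arbitrary; only the relative positions of the haplotypes along it are meaningful.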
For illustrative purposes, a region of 200 kb in chromosome Xq28, which is about 1 Mb away from the pseudoautosomal region (PAR2) at the tip of the X chromosome long arm, was chosen, and the haplotype structures of three ethnic groups that showed apparent recombination events were compared.
This region of the human genome harbors several important disease genes such as glucose-6-phosphate dehydrogenase (G6PD), cancer/testis antigens (CTAG1B, CTAG2), and Gab3 protein (GAB3).
oaches should help identify biomarkers to classify specific diseases based on high-throughput data.
However, when a patient’s sample is evaluated to determine his/her disease status using more than one experimental condition relative to a determined biomarker set, correct prediction becomes impossible.
Furthermore, methods to predict the disease status of a patient using biomarkers that initially are identified under different conditions than those that are used for the patient analysis have not been developed.
This study suggests a method that can accurately predict the disease status of a patient using a predetermined biomarker that is developed on a different platform.
Specifically, we performed a two-step discretization of gene expression values by their rank, which was applied in both the biomarker selection and prediction stages.
Methods.
To evaluate our proposed method, we used two different datasets: the NCI dataset (Lee et al., 2003) and the colon cancer dataset (Kim et al., 2007; Notterman et al., 2001).
Both of these datasets include gene expression information that was determined experimentally using two different microarray platforms (oligonucleotide-based and cDNA-based).
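The rank-based discretization mentioned above can be sketched as follows. The text does not give the exact procedure, so this is only one plausible reading: genes are first ranked within a sample, and the ranks are then binned into a small number of levels. The function name, the choice of three levels, and the example values are all assumptions.

```python
from typing import Dict


def rank_discretize(expression: Dict[str, float],
                    levels: int = 3) -> Dict[str, int]:
    """Map each gene to a level in 0..levels-1 according to its rank."""
    # Step 1: rank genes within the sample (lowest expression first).
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)
    # Step 2: bin the ranks into equally sized levels.
    return {gene: (rank * levels) // n for rank, gene in enumerate(ordered)}


# Hypothetical values for the same sample on two platforms with different scales.
oligo = {"g1": 120.0, "g2": 3500.0, "g3": 40.0,
         "g4": 900.0, "g5": 15000.0, "g6": 310.0}
cdna = {"g1": -0.8, "g2": 1.1, "g3": -1.9,
        "g4": 0.4, "g5": 2.3, "g6": -0.1}
same_profile = rank_discretize(oligo) == rank_discretize(cdna)
```

Because only the ordering of genes within a sample is used, intensity values and log ratios that preserve the same ordering yield the same discretized profile, which is the property that makes cross-platform comparison possible.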
There are a large number a loss of function of the oxidase (Tosha et al., 2004).
Recently, the I47A/I54V protease mutant in complex with Lopinavir showed that mutation affects the strain of the bound inhibitor in the protease-binding cleft (Grantz Saskova et al., 2008).
In previous studies, the mutation of specific sites has been shown to have an effect on the function and structure of proteins that cause disease.
It is well known that there is a correlation between mutated proteins and disease.
Also, there are bioinformatic tools to predict the correlation between mutation and disease, such as SIFT (Henikoff et al., 2003) and PolyPhen (Ramensky et al., 2002).
However, these tools are based only on sequence homology.
In this study, we conducted a large-scale structural and sequence mutational analysis of amino acids that could have a direct effect on protein function.
Because we collected the largest number of 3D structural changes in proteins, such as pockets, we named the dataset the structural mutatome.
The number of such structural mutations will increase continuously, and mapping the mutations to function and to disease will play a critical role in understanding the precise disease mechanisms that are caused by 3D mutations.
We classified mutated proteins by their structural properties (distance of pocket residue and mutation, pocket size, surface size, and stability) and physico-chemical properties (weight, instability, isoelectric point, and GRAVY cations that are related to the comparison and translation of various XML languages and parsers (Funahashi et al., 2004; Strömbäck et al., 2005; Choi et al., 2008).
However, because they were mostly aimed at producing relatively small-scale drawings that unify only several pathways, systematic analyses of shared and duplicated compounds between pathway maps were not necessary.
Thus, to the best of our knowledge, KGML analysis tools for drawing a large-scale pathway, such as the KEGG Atlas, have rarely been addressed in the literature from a graph-theoretical perspective.
As a preliminary step in providing automatic graph layout techniques to the genome-scale flow of metabolism, analyzing KEGG XML files is crucial for software developers.
Thus, in this paper, we provide shared and duplicate compound information, using our XML analysis tool, to support automatic layout research in the area of systems biology.
These kinds of analyses that are based on graph-theoretical perspectives can be extremely useful when drawing a global pathway map in which edge crossing arises as a crucial issue.
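A minimal sketch of this kind of KGML analysis is given below. It assumes that a "shared" compound is one that appears in more than one pathway map and that a "duplicated" compound is one drawn more than once within a single map; these definitions, the function names, and the two toy KGML fragments are our assumptions and are not taken from the tool described in the text. Only the `pathway` and `entry` elements and their `name` and `type` attributes follow the KGML format.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def compound_usage(kgml_documents):
    """Map each pathway name to a Counter of its compound entries."""
    usage = {}
    for text in kgml_documents:
        root = ET.fromstring(text)
        usage[root.get("name")] = Counter(
            entry.get("name")
            for entry in root.iter("entry")
            if entry.get("type") == "compound"
        )
    return usage


def shared_and_duplicated(usage):
    """Return (shared across maps, duplicated within a map)."""
    maps_per_compound = defaultdict(set)
    duplicated = {}
    for pathway, counts in usage.items():
        for compound, n in counts.items():
            maps_per_compound[compound].add(pathway)
            if n > 1:
                duplicated.setdefault(pathway, set()).add(compound)
    shared = {c: sorted(p) for c, p in maps_per_compound.items() if len(p) > 1}
    return shared, duplicated


# Toy KGML fragments (hypothetical, heavily abbreviated).
MAP_A = ('<pathway name="path:map00010">'
         '<entry id="1" name="cpd:C00022" type="compound"/>'
         '<entry id="2" name="cpd:C00022" type="compound"/>'
         '<entry id="3" name="cpd:C00031" type="compound"/>'
         '<entry id="4" name="ec:2.7.1.1" type="enzyme"/>'
         '</pathway>')
MAP_B = ('<pathway name="path:map00020">'
         '<entry id="1" name="cpd:C00022" type="compound"/>'
         '<entry id="2" name="cpd:C00036" type="compound"/>'
         '</pathway>')

shared, duplicated = shared_and_duplicated(compound_usage([MAP_A, MAP_B]))
```

In a global map, shared compounds are candidates for merging into a single node, whereas duplicated compounds indicate where a layout has already split a node, possibly to reduce edge crossings.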
ulting in a vast amount of genetic and pathway information with regard to the etiology of cerebrovascular disease.
These genes were annotated to access information on transcription, translation, structural function, and relatedness to the disease.
In addition to in silico data mining, 320 250K Affymetrix SNP chips (GeneChip Human Mapping 250K Nsp Array, Affymetrix, Inc., CA) were utilized for a case/control association study to generate experimentally associated markers of cerebrovascular disease.
The associated genes from the SNP chips and the genes that were retrieved from in silico data mining systems were compared and analyzed.
A protein-protein network diagram that showed the integrated markers and their relationships was constructed in order to analyze the network characteristics and produce hub genes.
It was found that the PPI network that was associated with cerebrovascular disease follows a power-law degree distribution, as other biological networks do (Peri et al., 2003).
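One simple way to inspect whether a network is roughly consistent with a power-law degree distribution is to tabulate node degrees and fit a straight line to the distribution on a log-log scale. The sketch below does this with plain least squares; it is a crude illustration rather than the analysis used in the study, and the function names and the star-shaped example network are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return a Counter mapping degree k to the number of nodes with degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k."""
    if len(distribution) < 2:
        raise ValueError("need at least two distinct degrees")
    total = sum(distribution.values())
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n / total) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical interaction list: one hub protein with three partners.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
dist = degree_distribution(star)
```

A markedly negative slope with an approximately linear log-log plot is suggestive of a power law, although a straight-line fit alone is not a rigorous test.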
The PathwayStudio 5.0 program (Ariadne, Inc., MD, USA) was utilized to process the natural text mining of PubMed abstracts; the use of PathwayStudio resulted in a gene-disease association network.
The etiology of the disease and its related genes, which were extracted from in silico data mining and network analy-Transitional DTD standard and does not use technologies that are dependent on specific web browsers.
This is one way to make a web alignment tool more compatible with many web browsers.
We have developed a user-friendly web-based alignment tool based on the ClustalW-MPI program.
It is standards-compliant and easy to maintain.
This web tool will help researchers carry out multiple sequence alignment with a large number of input sequences, accompanied by a viewer and an editing function.
It also enables users to download the results and perform basic analyses such as building trees and sequence clustering.
Features and Results.
In order to use alignment tools, most advanced users use UNIX or Linux commands and options directly in a console window.
This is inconvenient and can cause frequent mistakes.
A web alignment tool can be executed through a GUI environment on the web page by selecting commands and options.
Our web alignment tool has the following features: input, downloadable output, and visualization.
Users input multiple sequences in the web alignment ty.
The speed of sequencing is increasing many-fold per year, much faster than the development cycle of semiconductor chips in the computer industry.
Also, genome sequencing is becoming an everyday technology, as universally used as computer CPUs.
Experts predict that within five years everyone in developed nations will be able to have his or her own genome information.
Due to its far-reaching consequences in medicine, health, biology, nanotechnology, and information technology, DNA sequencing will become the most important industrial technology developed during the next decades.
Personal Genomics.
In 2009, genome sequencing technologies will achieve one person's whole genome per day in terms of DNA fragments sequenced.
Personal genomics is a new term for the field that utilizes such fast sequencers.
In 2008, the cost of one personal genome is less than USD 350,000.
If the cost goes down below $1,000 USD, the impact of personal genomics is predicted to be the largest ever in biology in common people's lives.
Reflecting this technological advancement to society is the PGP (Personal Genome Project), a project to sequence as many people as possible with lowest possible cost (Church, 2005).
At ited human diseases.
In addition, many computational programs have been created to predict the functional effects of unknown CVs (Ng et al., 2006; Care et al., 2007).
Database searches and bioinformatic predictions can be useful in prioritizing novel CVs for further analysis.
In this review, we summarize the databases that are most helpful in interpreting the functional effects of CVs.
We perform an extensive survey of existing in silico prediction methods and compare their performance.
Finally, we introduce a combination method as a promising approach to improve prediction performance.
Polymorphism and Mutation Databases.
Several databases that are helpful in assessing the functional effects of CVs or their relevance to disease phenotype are listed in Table 1.
Each of two broad-category mutation databases, general mutation databases (GMDBs) and locus-specific mutation databases (LSDBs), has unique strengths and weaknesses (Porter et al., 2000).
Because polymorphism and mutation databases have been developed for different uses, they complement each other.
d to the successful identification of specific positively selected genes, including human olfactory genes and human leukocyte antigen (HLA) loci (Salamon et al., 1999; Gilad et al., 2000).
Therefore, the NS/S ratio test is a recognized tool for the effective detection of types of natural selection in protein-coding genes.
Under conditions of no selection, we would expect an NS/S ratio of 1.
In the case of negative selection, NS/S is <1, and with positive selection, NS/S would be >1 (Biswas & Akey, 2006).
Furthermore, the availability of large SNP datasets allowed us to determine where natural selection (either negative or positive) has affected variation in humans (Nielsen et al., 2007).
In this study, we investigated natural selection on the human genes by comparing the simple ratios of nonsynonymous and synonymous coding SNPs (cSNPs) in individual protein-coding genes.
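The per-gene comparison can be illustrated with a small helper that computes the simple NS/S ratio from cSNP counts and labels it following the usual reading of the ratio (below 1 suggesting negative selection, above 1 suggesting positive selection). This is a sketch only: the function name, the handling of genes without synonymous cSNPs, and the example counts are assumptions, and the ratio is used here without any correction for the number of synonymous and nonsynonymous sites.

```python
from typing import Optional, Tuple


def classify_selection(nonsynonymous: int,
                       synonymous: int) -> Tuple[Optional[float], str]:
    """Return the simple NS/S ratio of cSNP counts and a coarse label."""
    if synonymous == 0:
        # Assumed convention: the ratio is left undefined without synonymous cSNPs.
        return None, "undefined"
    ratio = nonsynonymous / synonymous
    if ratio < 1:
        return ratio, "negative"
    if ratio > 1:
        return ratio, "positive"
    return ratio, "neutral"


# Hypothetical counts for three genes.
examples = [classify_selection(3, 12),
            classify_selection(10, 4),
            classify_selection(5, 5)]
```

In practice a ratio close to 1 would need a statistical test before being called neutral; the exact comparison used here only keeps the sketch short.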
Methods.
We downloaded and analyzed all coding SNPs (cSNPs) with a validation code greater than 2 from the public dbSNP (build 125, http://www.ncbi.nlm.nih.gov/SNP/).
Where necessary, we additionally used genotype data generated from the International HapMap Project with udied the progression of chronic liver disease with regard to HCC.
Genetic variations are thought to influence the risk of developing HCC (Edmondson, Henderson et al., 1976; Cha and Dematteo, 2005), particularly those that involve the activation of cellular oncogenes or the inactivation of tumor suppressor genes in various signaling pathways (e.g., mutation of beta-catenin-related Wnt/beta-catenin signals (Pang, Yuen et al., 2004) and overexpression of Ras signaling (Mitin, Rossman et al., 2005)).
Also, single nucleotide polymorphisms (SNPs) of many well-known genes, such as p53 (Kirk, Lesi et al., 2005), HDAC10 (Park, Kim et al., 2007) and MMP2 (Wu, Zhang et al., 2008), have been significantly associated with HCC.
The significant SNPs of these genes may represent genetic markers that are in linkage disequilibrium (LD) with other causative variations.
Also recently, many SNP studies that are related to various human diseases have been reported (Lee, Kim et al., 2007).
Ras signaling transduction pathways influence cell proliferation, survival, differentiation, vesicular trafficking, and gene expression in B cells (Mitin, Rossman et al., 2005).
In previous studies, various growth factors were found to enhance HCC cell proliferation, as well as tu-C), and hepatitis B virus (HBV)-infected HCC.
HCC is one of the most common malignant tumors worldwide and causes about 1 million deaths each year (Parkin et al., 2001; Marrero, 2006).
The etiology of HCC seems to be multifactorial, and several events appear to be necessary for malignant transformation to occur.
Hepatitis C virus (HCV) and HBV infections are important risk factors for chronic liver diseases (Collier & Sherman, 1998).
CHB infection is the most common etiology of HCC in Asian countries.
In particular, cirrhosis is present in about 70% to 80% of HCC cases (Velazquez et al., 2003; Sy et al., 2005).
Moreover, HCC is a highly hypervascular tumor that is associated with a high propensity for vascular invasion (Sun & Tang, 2004).
Because tumor angiogenesis plays a critical role in the development and progression of cancers, including HCC (Sun & Tang, 2004; Pang & Poon, 2007), angiogenic factors have been used not only for diagnosis and prognosis but also as predictors in cancer patients.
n data or infer functional protein complexes from protein interaction data (Friedman et al., 2000).
Others combine various genomic data to infer biological networks without using prior knowledge about biological interaction.
In fact, there are a few recent works trying to reconstruct biological relations based on prior knowledge (Yamanishi et al., 2004; Kharchenko et al., 2004).
Yamanishi et al. use a kernel method to predict new gene-to-gene interactions within a metabolic pathway, adopting a supervised approach based on known pathway knowledge.
The work of Kharchenko et al. compares an established metabolic network with expression profiles to find genes that can complete a metabolic pathway in which some participants are missing.
While these methods are good at finding missing genes, they do not suggest possible new members (or genes) for extending a given biological pathway.
We first observe that a biological pathway contains highly verified information but covers only a small fraction of genes, while microarray data provide noisy experimental data but cover the whole genome.
The essence of the PathPlus approach is to determine candidate genes that are highly likely to be related to a given pathway by combining microarray gene expression data n detail at any given moment.
Moreover, most proteins are known to mediate their functions within regulated complex networks or pathways of interconnected macromolecules by forming dynamic topological interactomes.
Additionally, genes that are not significantly altered may play a critical role with other significantly dysregulated components in their biological pathways.
Therefore, a systems biology approach that can identify pathways with these proteins would significantly improve the ability to find disease-associated genes from microarray datasets.
This also would be useful in understanding the relationship between pathways and various phenotypes.
There has been a tremendous increase in information for constructing large-scale protein-protein interaction networks from public interactome databases, such as HPRD (Peri et al., 2004).
A number of approaches have been demonstrated for identifying subnetworks of protein-protein interactions, based on coherent expression patterns of their genes (Chen and Yuan, 2006; Chuang et al., 2007).
There also is a study that has identified candidate genes that are related to certain diseases based only on the topological features of the network of disease-related protein-protein interactions (Hwang et al., 2008).
Recently, several methods for integrating microarray data with metabolic pathways have been pre-sugar moieties, often deoxysugars, which add important features to the shape and the stereo-electronic properties of a molecule and often play an essential role in the biological activity of many natural product drugs (Rix et al., 2002).
Thus, glycosyltransferases are, and will increasingly become, important tools for combinatorial biosynthetic approaches.
In this respect, our study focuses on developing computational tools for the substrate prediction of a given glycosyltransferase (GT) and the prediction of the deoxysugar biosynthesis unit pathway.
The deoxysugar is synthesized by diverse biosynthetic enzymes and functions as a substrate of GTs.
Features and Results.
The computational system developed in this work has a 3-tier architecture, which is composed of a client, an application server, and a back-end database server (Fig. 1).
The application server consists of three major modules: the pathway analysis module, the GT analysis module, and the back-office module.
The pathway analysis module is involved in SBPD (Sugar Biosynthesis Pathway Database) search and sugar biosynthesis unit pathway prediction and drawing.
The main procedure of pathway ition of important functional groups to polyketide skeletons and to the structural diversity and biological activity of this class of natural products (Rix et al., 2002; Kwan et al., 2008).
The most frequently found post-PKS modifications are catalyzed by oxidoreductases, a very broad group of enzymes consisting of oxygenases, oxidases, peroxidases, reductases (e.g., ketoreductases), and dehydrogenases.
In general, these enzymes introduce oxygen-containing functionalities, such as hydroxyl groups (hydroxylases), aldehyde or keto groups, and epoxides (epoxidases) or modify such functionalities by addition or removal of hydrogen atoms, e.g., transforming a ketone into a secondary alcohol or an aldehyde into a carboxylic acid.
Although oxidoreductases provide or modify relatively small functional groups, they can have a tremendous impact on the binding properties of a molecule with respect to a biological ligand molecule (receptor protein, enzyme, DNA etc.).
The term group transferase refers to enzymes that possess transferase activity introducing novel functional groups and altered profiles on the product relative to the substrate.
This enzyme group contains important enzymes such as amino transferases, alkyl (usually methyl) transferases, acyl (usually acetyl) transferases, glycosyltransferases (GTs) rsity and background of complex diseases.
For defining CNV accurately, resolution is one of the important issues.
When CNV was first uncovered, approximately 12 CNVs per genome were identified through both BAC array and oligoarray (Iafrate et al., 2004; Sebat et al., 2004).
In 2006, Affymetrix GeneChip Human Mapping 500K early access version was applied to define the CNVs from 269 HapMap individuals (Redon et al., 2006).
In that study, 1500 CNVs were identified, and their median size (80 Kb) was smaller than that of CNVs defined by tiling BAC array (230 Kb).
In addition to SNP-based CNV analysis, recent higher resolution oligoarray platforms were introduced and revealed that the human genome may contain more CNVs than previously thought and that the average size of CNVs might be smaller than previously reported (de Smith et al., 2007; Perry et al., 2008).
In spite of the advance of new technologies, SNP markers have been used frequently to detect CNVs because of several advantages.
First, owing to the large number of known SNP resources, extremely high-resolution SNP genotyping chips (1 Million) can be designed and are currently available.
Secondly, the accompanying SNP genotype information is useful for disease association studies, and combined CNV-SNP interpretation can achieve new breakthroughs in understanding genetic contribution to oblems with incorporating recombination, the coalescent approach deals with individual sequences, which are different from the genotype data that we usually have.
Therefore, in the coalescent approach, the stochastic inferences using genotype data involve additional assumptions or constraints (Zollner and Pritchard, 2005).
This problem continues when using sequenced data because the positions of mutation sites are given.
In order to find a possible future remedy for the current approaches and obtain more descriptive analyses from actual ancestral graphs instead of probability distributions of graphs, an alternative way of constructing ancestral graphs that avoids the multiple MRCA, as well as unnatural constraints, is proposed in this study.
Instead of constructing the genealogies by coalescence of each individual sequence in a backward direction, the focus is on the emerging order of variants in the ancestral history of genetic data.
By concentrating on the variants themselves and constructing the graph in a forward direction, multiple MRCAs are avoidable, naturally lelic variation in the VDR gene explains 75% of the genetic variability in BMD used as a proxy measure (Eisman, 1999; Liu et al., 2003; Morrison et al., 1994).
Since the first association by Morrison et al., allelic variations in genetic regulation of BMD have been subsequently studied in candidate genes related to important elements of bone mineral homeostasis, bone remodeling and bone matrix composition.
These approaches were typically carried out using restriction fragment length polymorphisms (RFLPs) in various populations: Caucasians (Deng et al., 1999; Langdahl et al., 2000; Quesada et al., 2004), African-Americans (Zmuda et al., 1999; Harris et al., 1997), Mexican-Americans (Kammerer et al., 2004; McClure et al., 1997), and Asians (Mitra et al., 2006; Morita et al., 2004; Yamada et al., 2003; Zhang et al., 2004).
Ethnicity was shown to be one of the important factors affecting BMD (Liel et al., 1988; Wang et al., 1997).
However, controversial results of interethnic differences in allele or genotype distributions for BMD variation have been evidently presented, and a con-tir et al., 2000; Wolfe, 2000), which is an indicator of RA severity (Cabral et al., 2005).
Considering that the occurrence of RA reflects the interaction between genetic and environmental factors, an approach to estimating such interactions would be useful in the detection of RA susceptibility genes.
A recent study reported a potential gene-environment interaction between the shared epitope of HLA-DR and smoking.
In RF-seropositive RA, HLA-DRB1 genotypes are associated with smoking, which means that a gene that is defined as a risk factor for RA is strongly affected by this environmental factor (Padyukov et al., 2004).
We hypothesized that smoking has a significant effect on the elevation of RA and that this relationship is mediated by the function of a specific gene.
The purpose of this study was to investigate the gene-environment interaction between smoking and the severity of RA.
Methods.
We used the resources of the Genetic Analysis Workshop 15 (GAW 15) database and the North American Rheumatoid Arthritis Consortium (NARAC) family collec-arge heterooligomeric aggregates that protect other proteins against aggregation and denaturation (Mulder et al., 2007).
Hsp20 was first discovered in human skeletal muscle and has been found to be expressed in cardiac muscle, stomach, intestine and bladder (Rembold et al., 2000; Meeks et al., 2005; Mulder et al., 2007).
The phylum Proteobacteria is no exception in the sense that it too possesses heat shock proteins.
It is one of the largest phyla of the Bacteria domain and is further subdivided into five categories: alpha, beta, delta, epsilon and gamma.
The alpha, beta and gamma subclasses were highly supported by morphological analyses; however, the delta and epsilon subclasses were added separately and are considered to have separated earlier than the other subclasses on the phylogenetic tree (Ludwig and Klenk, 2001).
All Proteobacteria are Gram negative and possess a membrane composed of lipopolysaccharides.
Betaproteobacteria play a pivotal role in plant nitrogen fixation and are commonly found in environmental samples (Dedysh et al., 2004).
Epsilonproteobacteria are known to inhabit the digestive tracts of animals and humans (Miroschnichenko et al., 2004).
Gammaproteobacteria encompass several species of medically and scientifically important groups of bacteria that include Salmonella, Vibrio, Pseudomonas and Escherichia (Lee et al., 2005).
The major aim of our present work is to study the sequence evolution of Hsp20 among all five subclasses of le way (Troyer, 2008).
Many biobanks are currently in an establishment phase, aiming at collecting and archiving biomaterials and their relevant biomedical information for future use.
Owing to this, several working protocols of sample collection, delivery, processing, storage, and retrieval are available for reference purposes (International Society for Biological Environmental Repositories, 2008; Elliott and Peakman, 2008).
However, the ultimate goal of a biobank is to give researchers access to high-quality biomaterials with minimal effort.
This requires an optimized procedure for sample distribution and proper measures of QA for archived biomaterials.
For instance, when genome-wide association studies which usually require tens of thousands of high quality DNA samples in a short period of time are underway, providing them with a proper level of QA in the required time frame can be a difficult task.
QA of biomaterials in biobanks raises two issues: to authenticate that the specimen in a container matches exactly the participant that can be identified by the label on it and to avoid cross-contamination of biomaterials from different donors.
Any of these QA breaches can mislead or nullify the research results in crucial ways.
Although efforts to automate all or parts of biobanking projects in order to provide extensive genome data to the scientists.
We have also developed a user-friendly web site to facilitate access to sequence information for novel organisms.
Here, we demonstrate a newly developed website, EKIS, which stands for EST Knowledge Integration System.
This website facilitates the annotation of new sequences with EC numbers and KEGG pathways based on BLASTX homology searches against the UniProt database using PESTAS (http://pestas.kribb.re.kr/; unpublished).
The EKIS is utilized not only as a generic website for obtaining genome project statistics, such as samples, libraries, analyzing ESTs, and annotated ESTs in current ongoing projects, but also as a mining tool for the KEGG pathway based on annotated EC numbers, expression profiling, homology searches and retrieval information systems.
Features and Results.
The EKIS, based on information derived from 65 EST sequencing projects (May 2008), is an integrated knowledge database developed using cutting-edge internet technology (Fig. 1a).
To construct the data integrated ollection (WFCC), the World Data Center for Microorganism (WDCM), and the Asian Consortium for the Conservation and Sustainable Use of Microbial Resources (ACM).
In 2008, KACC developed a web-based system to provide an integrated database, which contains updated information on microbial resources.
Methodology.
The database information was collected from the microorganism project (http://kacc.rda.go.kr/), the National Agrobiodiversity Center (NAC, http://genebank.rda.go.kr/), the National Academy of Agricultural Science (NAAS, http://www.niast.go.kr/), the Bio-Green 21 project (http://biogreen21.rda.go.kr/), relevant microorganism divisions of universities, and various institutes in Korea.
In most instances, the strains were isolated from Korean agricultural environments, but some strains were shared with other culture collections.
In addition, the database information was accumulated and maintained through several collaborating international institutes, such as the Centraalbureau voor Schimmelcultures (CBS, http://www.
hotgun fragment insert lengths.
Still, these problems of the common assembly methods place a heavy burden on the bioinformaticians who work on the finishing stage of sequencing projects (Edwards et al., 1990).
Besides these traditional difficulties, the recent technological divergence of high-throughput sequencing platforms, including 454 Life Sciences (Roche) (http://www.454.com/), Solexa of the Illumina Genome Analysis System (http://www.illumina.com), and Applied Biosystems SOLiD Sequencing (http://ww.appliedbiosystems.
Upon phosphorylation of Cbl, the Cbl/CAP complex provides docking sites for the recruitment of the adaptor protein CRKII and the guanyl nucleotide exchange factor C3G (Baumann et al., 2000; Ribon et al., 1996).
Recruitment of C3G results in the activation of the small GTP-binding protein TC10 (the product of the RHOQ gene).
Activation of TC10 results in cytoskeletal rearrangements that are needed to facilitate Glut4 translocation upon insulin signaling as well as insulin-stimulated glucose uptake (Chiang et al., 2001).
A recent study of streptozotocin-induced diabetic animals showed that Cbl and SORBS1 gene expression was significantly reduced and that the activation of TC10 was also abridged (Gupte and Mora, 2006).
In spite of the pivotal role of the CAP/TC10 pathway in skeletal muscle and adipose tissue, the association of effector genes with T2DM has not been elucidated well.
In the present report, we investigated the genetic var-kines such as TNF-α and interleukin-6 (IL-6) induce acute-phase proteins by the liver (Spranger et al., 2003).
Elevated levels of acute-phase proteins have been detected in T2DM subjects (McMillan, 1989; Jonsson and Wales, 1976).
These findings imply the relation of acute-phase proteins to T2DM through the action of proinflammatory agents.
A variety of risk factors for T2DM development, including age, inactivity, obesity, racial group, smoking, psychological stress, and low birth weight, also have been known to be associated with augmented acute-phase proteins (Pickup, 2006).
Fibrinogen, one of the acute phase proteins, has been considered as a marker of cardiovascular disease (CVD) since it is the principal protein of blood clotting (Ernst & Resch, 1993).
Some reports have linked plasma fibrinogen concentration not only to CVD but also to the metabolic syndrome, namely T2DM, hypertriglyceridemia, hypertension, and hyperinsulinemia (Ganda & Arkin, 1992; Imperatore et al., 1998).
FGA is A fast and cost-efficient approach to identifying a multitude of novel SNPs by mining sequence data from public repositories needed to be developed.
The improvement of sequence trace data in public repositories at the DNA chromatogram level was required (Panitz et al., 2007).
An alternative and cheaper method of SNP identification exploits the redundancy of gene sequences generated by expressed sequence tags (ESTs; Gu et al., 1998).
In this scenario, each SNP would also be associated with an expressed gene.
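The redundancy-based idea can be sketched as a column-wise scan over ESTs that have already been aligned to one another: a position is reported as a candidate SNP when a second allele is seen in at least a minimum number of reads. This is a simplified illustration, not the pipeline of any of the cited studies; the function name, the minimum minor-allele count of 2, and the toy alignment are assumptions, and real pipelines typically also take base quality into account.

```python
from collections import Counter
from typing import List, Tuple


def candidate_snps(aligned_ests: List[str],
                   min_minor_count: int = 2) -> List[Tuple[int, str, str]]:
    """Return (position, major allele, minor allele) for candidate SNPs."""
    hits = []
    for pos, column in enumerate(zip(*aligned_ests)):
        # Ignore gaps and ambiguous bases.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        # Require the minor allele in several reads to reduce sequencing errors.
        if minor_n >= min_minor_count:
            hits.append((pos, major, minor))
    return hits


# Toy alignment of five hypothetical ESTs from the same gene.
ests = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC", "ACGTAA"]
snps = candidate_snps(ests)
```

In the toy alignment, the single mismatch at the last position is treated as a likely sequencing error because it is supported by only one read.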
EST information has been used for the detection of SNPs in mammalian genomes by many groups (Buetow et al., 1999; Fitzsimmons et al., 2004; Garg et al., 1999; Guryev et al., 2004; Hawken et al., 2004; Kim et al., 2003; Lee et al., 2006a; Lee et al., 2006b; Picoult-Newberg, et al., 1999).
Many conventional cDNA libraries have been constructed with porcine material.
As of May 2009, 1,532,429 ESTs were available in the GenBank porcine EST database and 167,740 SNPs were available in the GenBank pig SNP database (Table 1).
Several researchers have reported mining cSNPs from porcine EST sequences (Dirisala et al., 2005, 2007; Fahrenkrug et al., 2002; Grapes et al., 2006; Kollers et al., 2005; Panitz et al., 2007; Park et al., 2009; Uenishi et al., e precursor cells exit the proliferation cycle and express young neuronal and glial characteristics.
The postmitotic cell bodies and differentiating cells are confined within the mantle layer and give rise to a distinct grey appearance.
Within this layer, some neurons develop local interconnections, while others elaborate axons to distal targets and form the marginal layer during the process.
Glial cells gradually encapsulate the axons producing a marginal layer, which demonstrates a distinctive white appearance in the adult spinal cord.
In this way, neuronal and glial cells follow a developmental program that depends on temporal and regional patterning (Poh et al., 2002).
Recent progress in the field has improved our understanding of the genetic control involved in neurogenesis and gliogenesis.
For example, the identification of the basic helix-loop-helix (bHLH) transcription factor has allowed us to further understand the molecular regulation of vertebrate neurogenesis and gliogenesis (Bertrand et al., 2002; Kageyama et al., 2005; Lu et al., 2000; Zhou and Anderson, 2002).
Despite the identification of intrinsic cellular factors and extracellular cues, the molecular bases of general programs, including sequential generation, are not yet fully understood.
In this study, we employed microarray analysis to monitor the tempo-odels, EGCG causes a high mortality rate in adult female Swiss Webster mice and hepatotoxicity in immature C57BL/6 female mice (Galati et al., 2006; Goodin and Rosengren, 2003).
Additionally, recent studies have shown that several phenolic antioxidant food additives can accelerate oxidative damage to DNA, proteins, and carbohydrates, despite their antioxidative action.
The molecular mechanisms of the prooxidative action and hepatotoxicity induced by EGCG remain poorly understood.
Galati et al. (2006) reported that cell death induced by EGCG was associated with increased production of reactive oxygen species (ROS) and depletion of reduced glutathione (GSH) and catechol-O-methyltransferase (COMT).
The mechanism of EGCG-induced cell death was suggested in previous reports, which implicated the apoptosis pathway (Schmidt et al., 2005) and the endoplasmic reticulum (ER) stress-related pathway (Dodo et al., 2008).
To gain insight into the molecular mechanism underlying EGCG-induced cytotoxicity, we used a microarray approach, which allowed us to observe the global effects of EGCG on hepatic gene expression.
Time course in this field of traditional Korean medicine is that both drug response and disease susceptibility are related to a given person's particular psychological and physiological characteristics.
Based on such characteristics, Sasang constitutional medicine classifies people into four constitutional types.
In the Sasang constitution type system, one is treated differently in medical procedures based on one’s type.
It is widely accepted in Korea that individualized medicine based on constitutional energy traits has a favorable influence on treatment (Leem and Park, 2007) and constitution-specific responses to herbs are well described (Jeong and Song, 1995a; Jeong and Song, 1995b).
Furthermore, inconsistent drug responses and disease susceptibility can be explained to some extent by characterizing the Sasang constitution and taking advantage of it in the course of treatment.
Studies seeking genetic evidence for the Sasang constitution have been rare, with only a few reports showing an association between Sasang constitution type and specific genetic polymorphisms (Song et al., 2008; Um et al., 2003).
To unravel the genetic patterns of the Sasang constitution, obtaining genetic evidence will be essential.
Advances in genetic analysis and statistical techni-d are essential for the organism's interaction with its environment.
Proteins such as channel proteins and transporters drive the transmembrane movement of ions, solutes, and small molecules.
Other proteins such as receptors receive signals by interacting with a large variety of ligands, including hormones, neurotransmitters, autacoids, chemokines, odorants, even light, and transduce them to various kinds of information.
Membrane proteins with seven transmembrane segments (7-TMSs) play major roles in signal transformation and information transport in cellular organisms.
A wide variety of genomic and proteomic studies have been carried out since tens of whole genome sequences were determined.
In 2003, a database of experimentally characterized transmembrane topologies was constructed and reported as TMPDB; it contained a total of 302 TM protein sequences (Ikeda et al., 2003).
Recently, the distributions of membrane protein topologies in both E. coli and S. cerevisiae membrane proteomes were reported (Daley et al., 2005; Kim et al., 2006).
There were a very small number of 7-TMS proteins compared with the numbers of proteins with other TMS numbers in E. coli, whereas a larger number of 7-TMS proteins were found in S. cerevisiae.
Instead, E. approach that can identify perturbed molecular functions with differentially expressed genes would accelerate the understanding of the basic molecular mechanisms of certain phenotypes or diseases.
The Gene Ontology (The Gene Ontology Consortium, 2000) is a database of structured, controlled ontologies that describe gene products in terms of their biological processes, cellular components, and molecular functions.
GO is hierarchically structured, forming a directed acyclic graph with top-down direction, which allows efficient navigation of the ontology.
However, the same GO term may occur in different lines of the ontology structure.
This means that GO cannot be represented as a directed tree, since a GO term may have more than one parent, providing multiple paths from the root.
To construct an ordered GO tree for the purpose of global visualization, GO terms need to be distinguished from one another if they occur in different locations in the hierarchical classification of the gene ontology.
From a biological point of view, Lee et al. (2004) argued that what is more important is not a GO term itself but the path that the GO term takes from the root of the gene ontology.
This means that each location of a GO term can be considered distinct if a distinct path leads to it from the root of the gene ontology.
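The idea that a GO term reached by distinct paths counts as distinct locations can be illustrated with a minimal sketch. The toy ontology and the function name `paths_from_root` below are hypothetical and are not taken from the Gene Ontology itself or from Lee et al. (2004); the code only enumerates every root-to-term path in a small directed acyclic graph.

```python
from typing import Dict, List

# Hypothetical toy ontology (not real GO terms): child term -> list of parent terms.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # C has two parents, so two distinct paths reach it
    "D": ["C"],
}


def paths_from_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from the root to `term` in a directed acyclic graph."""
    if not parents[term]:
        # A term without parents is the root: the only path is the term itself.
        return [[term]]
    paths = []
    for parent in parents[term]:
        for path in paths_from_root(parent, parents):
            paths.append(path + [term])
    return paths


# Each path can serve as a distinct location of the term in an ordered tree.
for path in paths_from_root("D", PARENTS):
    print(" > ".join(path))
```

In this sketch, the term D would appear twice in an ordered tree, once under each path, which is the sense in which a term's location rather than the term itself is treated as the unit.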
Therefore, we de-shuffling suggests that the two flanking introns of a domain-encoding exon should have symmetric phase combinations (Patthy, 1999).
Intron phases are determined by the examination of the translational reading frame relative to the intron, and introns are thus described as phase 0, 1, or 2 introns.
Of the nine possible combinations of flanking introns, three are symmetric (0-0, 1-1, and 2-2) and six are asymmetric.
The length of a symmetric exon is always a multiple of three nucleotides.
Only symmetric exons can be duplicated in tandem or deleted without affecting the reading frame, whereas the duplication and deletion of asymmetric exons can disrupt the downstream reading frame (Patthy, 1996).
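The counting argument above can be checked with a short sketch. It assumes the usual convention that an intron's phase is the number of nucleotides of the interrupted codon that lie upstream of the intron; the function names are hypothetical and chosen for illustration only.

```python
from itertools import product

PHASES = (0, 1, 2)


def is_symmetric(upstream_phase: int, downstream_phase: int) -> bool:
    """An exon is symmetric when both flanking introns have the same phase."""
    return upstream_phase == downstream_phase


def exon_length_mod3(upstream_phase: int, downstream_phase: int) -> int:
    """Exon length modulo 3 implied by the flanking intron phases.

    Assumption: phase = number of nucleotides of the split codon located
    upstream of the intron, so the exon starts with (3 - upstream_phase) % 3
    nucleotides completing a codon and ends with downstream_phase nucleotides
    opening the next one.
    """
    return (downstream_phase - upstream_phase) % 3


combos = list(product(PHASES, repeat=2))
symmetric = [c for c in combos if is_symmetric(*c)]
asymmetric = [c for c in combos if not is_symmetric(*c)]

print(len(combos), len(symmetric), len(asymmetric))
# Only symmetric exons have a length divisible by three, so only they can be
# duplicated or deleted without shifting the downstream reading frame.
print(all(exon_length_mod3(*c) == 0 for c in symmetric))
```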
Comparisons between human exon-boundary domains and bacterial domains have indicated that 1-1 domains are associated with the origin of animal multicellularity, whereas 0-0 domains are shared between eukaryotes and prokaryotes (Kaessmann et al., 2002).
This comparison also indicates that 0-0 domains date back to before the prokaryote/eukaryote divergence and can thus be defined as 'old domains', whereas 1-1 domains were created recently and can thus be termed 'young domains'.
In this study, we analyzed eight metazoan genomes rphism (SNP), because it implies that a smaller number of markers (tagging SNPs) are necessary to uniquely distinguish different haplotypes.
On the other hand, if a marker shows a strong association with a trait, then the other markers within the same haplotype block should show the same association.
This idea has been implemented in GWA analysis software programs, including PLINK (Purcell et al., 2007), in various ways: direct association of each haplotype, proxy association, or LD-based clumping.
These approaches require an accurate haplotype model.
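As a rough illustration of why phased haplotypes matter for LD-based methods, the sketch below computes the pairwise r² statistic between two biallelic SNPs from phased haplotypes. This is a generic textbook calculation, not the procedure used by PLINK or by the authors; the function name and the 0/1 allele coding are assumptions made for the example.

```python
from typing import List, Tuple


def ld_r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """Pairwise LD (r^2) between two biallelic SNPs.

    `haplotypes` holds one (allele_at_snp1, allele_at_snp2) pair per phased
    chromosome, with alleles coded 0 or 1 (an assumed coding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # 1-1 haplotype frequency
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Two SNPs that always travel together versus two that are independent.
perfect = [(0, 0), (0, 0), (1, 1), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r_squared(perfect), ld_r_squared(independent))
```

A pair of SNPs with r² close to 1 carries nearly the same information, which is why one of them can act as a tagging SNP or proxy for the other.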
The quality of the haplotype models in a particular GWA dataset can be improved by phasing the haplotypes, based on the background genotypes of the ultrahigh-density International HapMap data.
The International HapMap Project aims to produce such valuable haplotype information of the human genome (The International HapMap Consortium, 2003).
A total of 270 samples, 90 from each of three major ethnic groups (African, Asian, and European), were genotyped, phased, and released freely to the public (Thorisson et al., 2005).
The second phase of the project released the genotypes for over 2 million SNPs.
The genotypes of the markers that were not included in the study dataset but were included in the reference dataset, the International HapMap dataset, can be inferred if has been developed for eukaryotes by the group who developed COG (Koonin et al., 2004).
Like COG, Inparanoid selects orthologs by reciprocal best hits.
The method also considers the existence of paralogs.
Therefore, Inparanoid defines ortholog relations that are sometimes one-to-many or many-to-many (Remm et al., 2001).
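The reciprocal best hit criterion itself can be sketched in a few lines. The code below is a simplified illustration and not the Inparanoid or COG implementation; the protein identifiers and similarity scores are invented, and the handling of paralogs that Inparanoid adds on top of this criterion is not shown.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented similarity scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120, ("a2", "b2"): 100}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 120, ("b2", "a1"): 50, ("b2", "a2"): 100}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy case a2 and b2 are left unpaired because a2's best hit is b1, which shows why a strict one-to-one criterion can miss relationships that a paralog-aware method would report.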
OrthoMCL (Li et al., 2003) is another kind of ortholog-finding system.
This method can find orthologs among several species at the same time, which was impossible for other methods.
OrthoMCL uses an all-against-all BLAST search and a clustering algorithm based on the Markov model to select an ortholog group.
However, all of the approaches above rely on BLAST to check sequence similarity, and the proteins that have the best BLAST scores are selected as the main orthologs.
It is known that the proteins with the best similarity scores in a BLAST search are often not the closest relatives phylogenetically (Koski and Golding, 2001).
This implies that genuine orthologs may not be found using sequence similarity alone.
Structure and interaction data, and not only sequence similarity, can also be used for ortholog finding.
Even though the sequence similarity is not discovered, the including cases is high.
In such instances, we increase the sample size for controls only and then see whether the effect on statistical power is the same as that obtained when the sample sizes for both cases and controls increase.
Specifically, we examine whether an increase in the ratio of controls to cases increases power.
We simulate SNP data as below and assess the effect of the ratio of controls to cases on statistical power.
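The question posed here can be previewed with a closed-form calculation. Note that this is not the simulation the text describes: the sketch below swaps in a normal approximation to the power of a two-sided allelic test comparing minor allele frequencies, and the function name, the allele frequencies, and the sample sizes are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases: int, ratio: float, maf_control: float,
                       maf_case: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided allelic test (normal approximation).

    `ratio` is the number of controls per case. Each individual contributes
    two alleles, hence the factor 2 in the variance terms.
    """
    n_controls = n_cases * ratio
    se = sqrt(maf_case * (1 - maf_case) / (2 * n_cases)
              + maf_control * (1 - maf_control) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(maf_case - maf_control) / se - z_crit)


# Hypothetical setting: 500 cases, control MAF 0.20, case MAF 0.25.
for ratio in (1, 2, 4, 8):
    print(ratio, round(allelic_test_power(500, ratio, 0.20, 0.25), 3))
```

Under this approximation, adding controls raises power but with diminishing returns, because the variance contributed by the fixed number of cases remains; doubling both groups helps more than doubling the controls alone.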
that the risk of osteoporotic fracture is elevated in the first-degree relatives of probands with osteoporosis (Barthe et al., 1998).
Genetic linkage analysis and candidate gene association studies have implicated several chromosomal loci and multiple candidate genes (Jin and Ralston, 2005).
To date, more than 10 studies have performed genome-wide linkage scans to localize quantitative trait loci (QTLs) that modulate BMD, and these studies have identified several QTLs.
However, only a limited number of loci have been replicated across studies.
The strongest support for linkage has been observed for chromosome 1p36, which influences hip BMD (Devoto et al., 2001).
In addition, 1q21, 4q32, 12q24, 13q14, and 16p13 have been implicated in BMD.
However, only a few studies have performed genome-wide linkage analysis in Asian populations to localize regions that may harbor susceptibility genes for BMD (Hsu et al., 2007).
In the present study, we examined randomly selected extended families from a region of Mongolia to understand the epidemiologic characteristics of bone health indices (BHI) as surrogates for bone mineral density; to assess the magnitude of heritability of BHI; and to perform variance component linkage analysis for chromosome 13. lism-related genes.
In addition, there is no information on the relationships between particular SNPs and drug metabolism, which can be used to identify candidate genes affecting disease progression or drug reactions.
The Korean Pharmacogenomic database (KPD) was developed to allow easier searches for Korean pharmacogenomic information (Kang et al., 2008), and is located at the National Institute of Food and Drug Safety Evaluation (NIFDS) homepage.
When the KPD was first released in 2007, it offered selected and specialized data (mainly SNP and haplotype information on liver metabolizing enzymes, transporters, and receptors).
For the convenience of the public, researchers, and scientific reviewers, link functions to major international databases (NCBI, dbSNP database, and PharmGKB) were established in the KPD.
Since its first release, the KPD has been used in a wide range of applications including drug evaluations, selection of SNPs in Koreans and general pharmacogenomic education.
The KPD is being updated con-The status bar, when the cursor is over an image, displays pixel coordinates and values.
The progress bar shows the progress of time-consuming operations.
PPE provides several common visual notations but also allows users to define their own notations.
The shape menu in the left window, in 3D) presentations, where two-and-a-half dimension is an informal term used to describe visuals that are actually 2D but rendered with 3D-looking graphics (Baur et al., 2004; Song et al., 2009).
Such visualization allows the user to see a detailed view of an object in focus and its place in the context of the whole system.
J2.5dPathway Browser Architecture
The J2.5DPathway Browser Tool is a tool that hierarchically lists the pathways and super nodes.
The system provides a visual interface to biomedical pathways.
It is basically a graph visualization tool.
XML-format input data may come from web services (e.g., KEGG: http://www.genome.jp/kegg/soap/) after the user chooses to view the selected pathway from the J2.5DPathway, or from locally saved files in other formats.
J2.5DPathway consists of four major modules: (1) graphical user interface (GUI), (2) two-dimensional representation, (3) two-and-a-half dimensional representation, and (4) layout optimizer.
The layout optimizer module permits the user to optimize the appearance of a graph.
imination of nucleus-trapped toxic mDMPK transcripts have been proposed as specific therapies for reversal of the disease phenotype.
They include approaches with antisense, small catalytic RNA, or siRNA, which can reduce the level of mDMPK RNA by directly targeting the expanded CUG repeats of the mDMPK transcript (Furling et al., 2003; Krol et al., 2007; Langlois et al., 2003; Mahadevan et al., 2006).
However, high specificity for degradation of the mutant alleles, nontoxicity, and lack of off-target effects will be required to minimize nonspecific reduction of the normal allele and cellular toxicity.
RNA repair or RNA replacement has been described as a new approach to human gene therapy against genetic disorders, which relies on the Tetrahymena group I-based trans-splicing ribozyme (Sullenger and Gilboa, 2002).
This ribozyme could cleave a target 5' exon RNA and trans-ligate an exon that is tagged at its 3' end onto the cleaved target RNA both in vitro and in bacteria and mammalian cells (Been and Cech, 1986; Sullenger and Cech, 1994; Jones et al., 2006).
Therefore, the trans-splicing ribozyme can reduce the level of a specific target RNA and simultaneously induce transgene activity specifically in the target RNA-expressing cells.
Indeed, trans-splicing ribozymes have been successfully developed to replace mutant transcripts that are associated with several human genetically inherited diseases, including DM1 (Lan et al., ning to emerge (Seitz & Stickel 2007; Stickel et al., 2002).
According to the International Agency for Research on Cancer, ethanol is classified as a human carcinogen because it induces HCC in animals and increases the risk for developing HCC in humans (Baan et al., 2007; Seitz and Stickel 2007).
Cytochrome P450 2E1 (CYP2E1), a member of the cytochrome P450 superfamily, is important for the metabolic activation of many low-molecular-weight toxicants such as N-nitrosamines, aniline, vinyl chloride, urethane, and alcohol (Guengerich et al., 1991).
The participation of CYP2E1 in ethanol metabolism is less important than that of alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH).
However, continuous ethanol consumption is known to increase up to 20-fold the activity of CYP2E1, a major constituent of the microsomal ethanol-oxidizing system in the liver (Stickel & Osterreicher 2006; Takahashi et al., 1993).
Therefore, the capacity to metabolize alcohol is increased in heavy drinkers.
Moreover, CYP2E1 often catalyzes the metabolic activation of various procarcinogens to eventual carcinogens (Guengerich et al., 1991; Koop, 1992).
Genetic variation appears to contribute to interindividual variation in CYP2E1 expression levels and activities (McCarver et al., 1998).
Specifically, RsaI polymorphism (CYP2E1*5B) (rs2031920) has been associated with decreased CYP2E1 activity or inducibility (Hayashi et al., 1991; Lucas et al., 1995; Marchand et al., 1999; g 5), also known as survivin, is a protein encoded by the BIRC5 (Altieri 1994a; Altieri 1994b).
It belongs to the inhibitor of apoptosis (IAP) gene family.
The BIRC5 protein functions to inhibit caspase activation, thus leading to a decrease in apoptosis or programmed cell death.
This has been shown by the increase in apoptosis and the decrease in tumor growth that follow disruption of BIRC5 induction pathways.
In addition, the BIRC5 protein is highly expressed in most tumor cells (Sah et al., 2006).
Because of this, BIRC5 is considered a potent target for cancer therapy (Altieri, 2003).
A previous study demonstrated that BIRC5 is associated with microtubules of the mitotic spindle at the start of mitosis, as shown by the disruption of microtubule formation after the BIRC5 protein was knocked out in cancer cells.
This disruption leads to polyploidy as well as massive apoptosis (Castedo et al., 2004).
Another study showed that BIRC5-depleted cells exit mitosis without achieving proper chromosome alignment and then re-form single tetraploid nuclei.
This suggests that the BIRC5 protein is needed to sustain mitotic arrest when problems in mitosis are encountered.
The previous studies im-ith other signaling pathways.
According to the central dogma of molecular biology suggested about 50 years ago (Crick, 1958), most of the genetic information in DNA is finally transferred to proteins.
Proteins, the final products in the central dogma, are not only fundamental components of living cells but also mediators of most of the cellular processes.
Thus, to broaden our understanding of cellular functions and processes, it is essential to know the mechanisms that regulate the activity or abundance of each protein.
These days, rapidly developing technology in the life sciences makes it possible to study proteins at a large-scale or proteomic level.
Proteomics has become well established as a term for studying proteins at a large scale.
Therefore, proteomic research might be very helpful in understanding the precise cellular functions and processes.
In the present study, to gain novel insight into the TOR signaling pathway in Saccharomyces cerevisiae, we sought to analyze proteomic expression changes in yeast cells treated with or without rapamycin.
To determine proteomic expression changes in yeast, we took advantage of a collection of 4159 green fluorescent protein (GFP)-tagged yeast strains (Huh et al., 2003).
Previously, several studies have used the yeast GFP-tagged collection for rapid and precise high-throughput nstitute (EBI).
It can be accessed via web services and can be adapted for identifier mapping.
BridgeDb, though not released publicly yet, is another attempt at an ID mapping framework for bioinformatics applications.
BridgeDb lets one add the following capabilities quickly and easily: translate identifiers from one system to another, search references by id or symbol, and link out to online information for an identifier.
Applications, such as the PathVisio pathway analysis tool, WikiPathways, CyThesaurus Cytoscape plug-in, and the NetworkMerge Cytoscape plug-in, are utilizing functionalities provided by BridgeDb.
Methods
Instead of developing the system from scratch, we assembled the application by re-using available services, frameworks, and libraries as much as possible.
For ID mapping itself, we used ID mapping services provided by other research groups that allow programmatic access.
We picked the services provided by PICR, Synergizer, and Biomart, as they are recently developed and/or well maintained.
Currently, the BridgeDb library implements an ID mapping service for them, and we used BridgeDb's API for the access of ID mapping to those who perform docking experiments or build focused chemical databases.
Data and processing
The data were collected from and cross-referenced to public chemical data resources, such as PubChem (Wang et al., 2009), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al., 2000), the Chemical Entities of Biological Interest (ChEBI) (Degtyarenko et al., 2008), NMRShiftDB (Steinbeck et al., 2004), Distributed Structure-Searchable Toxicity (DSSTox) (Richard et al., 2002), and DrugBank (Wishart et al., 2008).
The public chemical data are available in various data formats (e.g., MOL, SDF, SMILES, etc.).
Therefore, our parsing tool was used to parse the public chemical data and extract data according to a field-name mapping table; the extracted data were then dynamically uploaded to the database.
The database schema consisted of chemical identification information and associated data, ranging from molecular descriptors to pathway data, spectroscopic data, and toxicological data.
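A field-name mapping step of this kind might look like the sketch below for SDF input. It is not the authors' parsing tool: the mapping table, the target column names, and the toy record are hypothetical, and the parser only handles the SDF data-item layout (a `> <FIELD>` header line, value lines, a blank line, and a `$$$$` record terminator).

```python
import re
from typing import Dict, List

# Hypothetical mapping table: source field name -> database column name.
FIELD_MAP = {
    "PUBCHEM_COMPOUND_CID": "source_id",
    "PUBCHEM_MOLECULAR_WEIGHT": "mol_weight",
}

HEADER = re.compile(r"^>\s*<([^>]+)>")


def parse_sdf_fields(text: str, field_map: Dict[str, str]) -> List[Dict[str, str]]:
    """Extract mapped data items from SDF text, one dict per record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    name = None
    values: List[str] = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # Start of a new data item; collect its value lines.
            name, values = match.group(1), []
        elif line.strip() in ("", "$$$$"):
            # A blank line or record terminator closes the current data item.
            if name in field_map:
                current[field_map[name]] = "\n".join(values)
            name = None
            if line.strip() == "$$$$":
                records.append(current)
                current = {}
        elif name is not None:
            values.append(line)
    return records


# Toy record with an empty structure block, for illustration only.
sample = "\n".join([
    "toy molecule", "  hypothetical", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <PUBCHEM_COMPOUND_CID>", "2244", "",
    "> <PUBCHEM_MOLECULAR_WEIGHT>", "180.16", "",
    "> <UNMAPPED_FIELD>", "ignored", "",
    "$$$$",
])
print(parse_sdf_fields(sample, FIELD_MAP))
```

Keeping the mapping in a table rather than in code means a new data source can be supported by adding rows, which fits the description of data being uploaded dynamically.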
The molecular descriptors (3D coordinates, hydrogen-bond donors, hydrogen-bond acceptors, octanol/water partition coefficient log P, etc.) that were not available in the imported data were calcu-aryotes where the majority of genes are composed of introns and exons, and further analysis is required to detect the intron/exon boundaries and assemble the exons into a contiguous coding sequence.
The terms Coding Sequence (CDS) and ORF are often used interchangeably, but they differ slightly.
The CDS is the actual region of DNA that is translated to form proteins.
In eukaryotes, while the ORF may contain introns as well, the CDS refers to the contiguous coding sequence or concatenated exons that can be divided into codons.
In prokaryotes, the ORF and the CDS can be considered the same entity.
As mentioned above, the ORF can be a good indication of the presence of a CDS or a gene, and the determination of ORFs is the first step toward gene prediction, especially for prokaryotes.
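A minimal forward-strand ORF scan can make the idea concrete. The sketch below is not the ORF Miner algorithm; it assumes one common definition of an ORF (an ATG start codon followed in frame by the first stop codon), ignores the reverse strand, and uses hypothetical names and a toy sequence.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    `start` is 0-based, `end` is exclusive and includes the stop codon, and
    `min_codons` counts the codons before the stop (start codon included).
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close the ORF and keep scanning
    return orfs


toy = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])
```

For a prokaryotic sequence, each reported span is also a candidate CDS; for a eukaryotic genomic sequence, introns would have to be removed before a scan like this reflects the actual coding sequence.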
Furthermore, ample ORFs of specific organisms are required to build a general probabilistic model of the gene structural and compositional properties of genomic DNA sequences of the corresponding organisms, which is introduced and applied to the problem of identifying genes in unannotated genomic sequences.
In terms of sequence annotation, it may be reasonable to report all possible ORFs, without insisting on high specificity, and to let the curator decide which ORFs are actually coding regions.
In these respects, we have developed a web-based ORF search tool, the ORF Miner, which fo-en a mutation occurs at cellular machinery which results in faster replication of the cell and/or nullification of apoptosis.
Chronic infection with hepatitis B or C virus aids the development of HCC because the viruses cause the immune system to attack the body's own liver cells.
This cycle of constant damage and repair can lead to mistakes, which may then result in carcinogenesis.
Recently, it has been suggested that genetic polymorphisms may be associated with the risk of HCC (Chun et al., 2009; Kim et al., 2006; Oh et al., 2008; Park et al., 2006; Park et al., 2007; Shin et al., 2003; Shin et al., 2007).
SERPINB5, also commonly known as maspin, is a serpin and a tumor suppressor gene (Sager et al., 1997).
Serpins are a family of proteins that are able to inhibit proteases and affect proteolytic cascades.
However, SERPINB5 is a special case, which acts as a tumor suppressor gene instead of affecting protein pathways.
SERPINB5 has previously been shown to suppress tumor cells in tissues such as bone, pancreas, and esophagus (Cai et al., 2009; Hall et al., 2008; Hong et al., 2009).
From these findings, it can be inferred that SERPINB5 has a potent ability to suppress tumor cells.
It has also been reported that SERPINB12, a member of the SERPIN family and a gene located right next to SERPINB5, is expressed in the liver (Askew et al., 2001).
SERPINB12 also affects many tissues of the body, including s to be linked to separate genetic and epigenetic aberrations.
These changes are associated with alterations in the expression or formation of an oncogene or a tumor suppressor gene (Patil et al., 2009).
In addition, several previous studies have reported associations between genetic polymorphisms and the risk of HCC and/or HBV clearance; for example, histone deacetylase-10 (HDAC10) and secreted phosphoprotein-1 (SPP1) polymorphisms and interleukin-10 (IL10) haplotypes were shown to be associated with HBV clearance and/or HCC development (Lin et al., 2005; Park et al., 2007; Shin et al., 2003; Shin et al., 2007).
Members of the integrin family, the beta1 (β1) integrins (ITGB1; MIM# 135630) are heterodimeric structures consisting of a common β1 subunit that is non-covalently associated with one of nine different α subunits.
These molecules are widely distributed in various cells, and they mediate cell-cell and cell-ECM interactions that are related to many biological functions in the development of cells or tissues, hemostasis, and immune response (Garrido et al., 2001).
It has been shown that the deregulation of cell adhesion to the ECM and the abnormal expression of ITGB1s, particularly α5β1 down-regulation, are closely associated with tumor development and metastasis.
Especially in the liver, the disease susceptibility (Freeman et al., 2006; Iafrate et al., 2004; Sebat et al., 2004).
Indeed, several lines of evidence demonstrating an association between CNV and ASD have been reported (Cho et al., 2009; Cook and Scherer, 2008; Glessner et al., 2009; Marshall et al., 2008; Sebat et al., 2007).
In our previous study, we identified 38 copy number variable regions in ASD patients, two of which (CNVs on 8p23.1 and 17p11.2) were found to be significantly associated with ASD (Cho et al., 2009).
Using 27 CNV loci previously reported to be associated with ASD, including ours, we developed an 8.5K ASD-specific BAC array and tested whether the associations of those CNVs would be replicated in patient cell lines from four different ethnic groups.
Methods
We used DNA extracted from B-lymphocyte cell lines of 8 ASD patients of four different ethnicities: Caucasian, Chinese, Asiatic Indian, and Hispanic (Coriell Institute for Medical Research, Camden, NJ, USA).
General characteristics of the 8 ASD patients are shown in Table 1.
As et al., 2008).
To date, some microarray analyses have been performed to elucidate the effects of RF radiation on biological systems (Chauhan et al., 2007; Huang et al., 2008; Lee et al., 2006; Lee et al., 2007; Paparini et al., 2009; Qutob et al., 2006; Sanchez et al., 2007; Whitehead et al., 2006; Zeng et al., 2006; Zhao et al., 2007).
For example, non-thermal RF radiation did not affect gene expression in U87MG glioblastoma cells exposed to 1.9 GHz RF at SARs from 0.1 to 10 W/kg for 4 h and 24 h, respectively (Qutob et al., 2006; Chauhan et al., 2007).
It has been reported that analysis of gene expression identified a handful of consistently changed genes in MCF-7 cells after exposure to RF radiation at a low SAR (up to 3.5 W/kg) for 24 h. However, these differentially expressed genes were not confirmed by quantitative RT-PCR, implying that the observed effects might have occurred by chance (Zeng et al., 2006).
Similarly, RF exposure to C3H 10T 1/2 cells at a 5-W/kg SAR induced no greater changes in gene expression compared to the sham group (Whitehead et al., 2006).
In addition, we previously reported no change in the phosphorylation of MAPKs, such as ERK, JNK, and p38, after exposure of auditory hair cells to 1,763 MHz at a 20-W/kg SAR for up to 2 h. Moreover, neither cell cycle changes nor DNA damage was detected under these conditions.
no evidence that non-thermal RF radiation affects gene expression in U87MG glioblastoma cells exposed to 1.9 GHz RF at SARs from 0.1 to 10 W/kg for 4 h and 24 h, respectively (Chauhan et al., 2007; Qutob et al., 2006).
Takashima et al. (2006) also failed to find any changes in cell proliferation with continuous exposure at SARs up to 100 W/kg; changes appeared only at an SAR of 200 W/kg, suggesting that RF radiation at 200 W/kg affected the cells in combination with heating of the medium.
Similarly, Sanchez et al. reported that 1,800 MHz RF radiation at a 2 W/kg SAR had no effects on HSP expression and apoptosis in human skin cells compared with ultraviolet (UV) and heat shock (Sanchez et al., 2007).
In our previous studies, neither cell cycle changes nor DNA damage was detected after 1,763 MHz RF exposure at an SAR of 10 W/kg in Jurkat-T cells or 20 W/kg in auditory hair cells (Huang et al., 2008).
Although most of the studies on the biological effect of RF exposure could not detect any molecular changes, some reports could demonstrate RF-specific gene expression.
Exposure of glioma cells to a 2.45 GHz high-frequency electromagnetic field at SAR levels above 20 W/kg led to an increased level of HSP70, even when the effect of raised temperature is taken into account (Tian et al., 2002).
Lee et al. found that 2.45 GHz RF ses of lovastatin did not cause significant liver injury, when given in very high doses they caused hepatocellular necrosis in rabbits (Chalasani et al., 2005).
Similarly, high doses of simvastatin caused hepatocellular necrosis in guinea pigs (Chalasani et al., 2005).
The liver injury in these animals could be prevented or reversed by supplementing the animals with mevalonate, suggesting that depletion of mevalonate or its downstream metabolites might be responsible for the liver injury (Chalasani et al., 2005).
According to adverse effect reports, none of these hepatic adverse effects appeared to increase significantly with statin therapy (Abu et al., 2005).
However, as the use of lipid lowering treatment including statins was extended to a larger number of high-risk patients, the potential of statin-induced liver toxicity became a major concern for the safety profile of statins.
Moreover, a large-scale clinical study encouraged the use of statins in a wide range of patients at varying levels of risk for cardiovascular events and independently of baseline cholesterol levels (Heart Protection Study, 2002; Wilmshurst et al., 2002).
Taken together, it seemed that the most concerning issue for statin safety resides in their potential to cause acute liver failure during wide use of cholesterol-lowering therapy.
To address this concern, it would be helpful to understand underlying mechanism of statin-in-t also therapeutic targets.
Gene expression in acute renal failure has been analyzed with microarrays in several studies (Amin et al., 2004; Huang et al., 2001; Safirstein, 2004).
However, the results vary because of many potential confounding factors, such as differences in species, injury, microarray platform, dose, and time.
Many studies identified KIM1 as a potential renal marker through gene expression analyses after kidney injury.
KIM1 is a type I transmembrane protein that is not detectable in the normal kidney but is expressed at high levels in proximal tubule epithelial cells after toxic injury (Ferguson et al., 2008; Ichimura et al., 2004).
Recently, the United States Food and Drug Administration and European Medicines Agency have adopted KIM1 as a standard biomarker for the preclinical safety evaluation of novel drug candidates.
While KIM1 was adopted as a biomarker of renal toxicity, several studies were undertaken in order to evaluate the utility of other putative renal toxicity biomarkers.
To circumvent the inconsistency caused by confounding factors in in vivo studies, we set up an in vitro model of nephrotoxicity using a rat kidney epithelium-derived cell line, NRK-52.
To address biological effect of mercuric chloride on the cell line, Rat Genome Survey ey are usually an important signature of the 5' region of many mammalian genes, often overlapping with, or within, a thousand bases downstream of the promoter.
The identification of promoters by CpG islands with a resolution of 2 KB may be most useful for large-scale sequence annotation.
Visual inspection of CpG islands is often used for gene identification by many molecular biologists.
In short, searching for CpG islands is important in several respects.
In this respect, I developed the CpG islands search tool, CpG Islands Detector, which can be used for CpG islands determination.
This tool is a window-based Java application implemented with JBuilder 9.0 which is a Java IDE (Integrated Development Environment).
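As a rough illustration of the kind of parameters a CpG island search exposes, the sketch below scans a sequence with a sliding window and keeps windows that pass a GC-content threshold and an observed/expected CpG ratio threshold. The function names and the default thresholds (a 200-base window, GC fraction of at least 0.5, observed/expected ratio of at least 0.6) are assumptions chosen for illustration; this is not the implementation of CpG Islands Detector.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (#C * #G) / length
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def find_cpg_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Return merged (start, end) intervals, 0-based and end-exclusive,
    covering every window that passes both thresholds (assumed defaults)."""
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlapping or adjacent passing windows are merged
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands
```

Changing `window`, `min_gc`, or `min_obs_exp` and re-running the scan is one simple way to explore how sensitive the reported islands are to the chosen parameters.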
There are several computer programs, including CpG Island Searcher (Takai and Jones, 2002) and CpGIF (Ye et al., 2008), which search genomes for CpG islands and are available on the Web.
Their simple and limited interfaces mean that users are unable to capitalize on the programs by using them to find out the best parameters - parameters which would allow users to locate all of the CpG islands and none of the junk.inked by the transpeptidase enzyme on the C terminal end of the PBP enzyme or by a monofunctional transpeptidase protein such as DD-transpeptidase, DD-carboxypeptidase, and DD-endopeptidase (Ghuysen et al., 1998; Yuan et al., 2007).
The NAM peptide chain containing D-alanine, D-glutamic acid, and meso-diaminopimelic acid is present in the thinner-walled Gram-negative bacteria, such as Haemophilus influenzae.
Gram-positive bacteria such as Staphylococcus aureus have a thicker cell wall, and their peptide sequence is L-alanine, D-glutamine, L-lysine, and D-alanine.
MBPT links the nucleophilic hydroxyl group on the fourth carbon of the NAG moiety to the electrophilic carbon with the protruding diphospholipid chain of the NAM moiety (Yuan et al., 2007).
Overexpression of the MBPT gene, mgt, increases the formation of peptidoglycan subunits in vitro; further work could analyze the precise lengths of the glycan chains created by monofunctional and bifunctional transpeptidases, which may reveal further differences between infectious and non-infectious bacteria (Di Berardino et al., 1996).
The generation of MBPT from the mgt gene also gives MBPT its own identity separate from the bi-KDa, a product of the GJB2 gene.
Connexin exists in a dynamic cycle of synthesis and replacement because of its short life span of a few hours (Musil et al., 1993).
Connexin 26 is found in cells throughout the body, particularly in the inner ear and the skin, and permits the transport of potassium ions and small molecules between neighboring cells (Castillo et al., 2005).
Its presence in the cochlea is associated with hearing.
GJB2 mutations are the cause of recessive and dominant forms of deafness (Laer et al., 2001), affecting 1 in 1000 children (Bitner-Glindzicz, 2002).
About 20 mutations of gap junction β-2 have been described as being related to diseases (Abe et al., 2000) and various syndromes, such as Bart-Pumphrey Syndrome, Keratitis-Ichthyosis-Deafness Syndrome, and Vohwinkel Syndrome (Richard et al., 2004; Snoeckx et al., 2005; Yotsumoto et al., 2003).
The mutations alter the amino acid sequence of connexin 26 and impair the function of the protein, especially its role in ion channels.
In fact, the range of phenotypic presentations of autosomal dominant and recessive gap junction β-2 gene mutations is intriguing.
Such mutations result in an unstable or malformed protein, unable to form gap junctions, or they result in a modified protein that builds dysfunctional gap junctions.
So, the falsified or modified connexin 26 curbs the normal arrangement of gap junc-th genetic factors, such as single gene disorders and genetic polymorphisms, including single nucleotide polymorphisms (SNPs) and copy number variation (CNV), and that multiple genetic factors may work together (McCauley et al., 2005; Santangelo and Tsatsanis, 2005).
Significant associations have been reported between ASDs and CNVs of various genes, such as NLGN3 (Xq13.1), NLGN4 (Xp22.23), SHANK3 (22q13.3), and NRXN1 (2p16.3) (Christian et al., 2008; Cook et al., 2008).
However, most of these candidate genes have been neither successfully replicated in independent study populations nor functionally validated as susceptible genes or loci.
Maintaining neuronal cellular connectivity as well as synaptogenesis has been suggested to play a role in the pathogenesis of ASD by trafficking of immune cells and mediators through astroglial and microglial cells (Gupta, 2000).
T cell receptor beta affects TCR signaling and T cell polarity (McKinney et al., 2010), which consequently affect neuronal activity, development, and plasticity (Syken and Shatz, 2003).
Several association e associated with changes in pH and tenderness (Ernst et al., 1998; Ciobanu et al., 2001).
Therefore, in this study, genotyping and association analyses of 5 SNPs in the aforementioned genes were investigated in pig reference families generated from Korean native pig (KNP)-crossed Yorkshire (YS) breeds, with the objective to evaluate KNP x YS families as a reference population for exploring and identifying genetic markers to select breeding animals with superior pork quality and to understand genomic mechanisms underlying pork quality variation between KNP and YS.
Methods.
A three-generation resource population was developed from reciprocal crosses between the Korean native pig (KNP) and Yorkshire (YS) breeds at Chungbuk National University.
The F1 crossbreeds were produced from two purebred KNP boars crossed with five purebred YS sows (F1: KY) as well as three purebred YS boars crossed with 14 purebred KNP sows (F1: YK).
Randomly selected F1 crossbreds mated to produce F2 animals uman genome, and how they are distributed among the chromosomes and regionally within a given chromosome.
Methods.
Our definition of synteny between a pair of genomes was based on the working principle used by Ensembl Genome Browser (http://www.ensembl.org/info/docs/compara/analyses.html).
Ensembl filters the result of blastZ (Schwartz, 2003) runs between a pair of genomes by chaining the aligned blocks into bigger nets using blastZ-net, followed by grouping neighboring regions closer than 200kb.
The final syntenic regions are defined by merging those grouped regions if they are not interrupted by non-syntenic blocks.
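The grouping step described above can be sketched as a greedy merge of aligned blocks that lie closer than 200 kb to each other in both genomes. The tuple layout, the function names, and the simplification of ignoring strand and intervening non-syntenic blocks are assumptions made for illustration; this is not Ensembl's actual pipeline code.

```python
def interval_gap(s1, e1, s2, e2):
    """Distance between two intervals; 0 if they overlap or touch."""
    return max(0, max(s1, s2) - min(e1, e2))


def group_blocks(blocks, max_gap=200_000):
    """Greedily merge aligned blocks on the same chromosome pair.

    blocks: iterable of (chrom_a, start_a, end_a, chrom_b, start_b, end_b).
    Two blocks are grouped when their gap is below max_gap in both genomes.
    """
    groups = []
    for block in sorted(blocks, key=lambda x: (x[0], x[3], x[1])):
        ca, sa, ea, cb, sb, eb = block
        if groups:
            gca, gsa, gea, gcb, gsb, geb = groups[-1]
            same_pair = (gca, gcb) == (ca, cb)
            if (same_pair
                    and interval_gap(gsa, gea, sa, ea) < max_gap
                    and interval_gap(gsb, geb, sb, eb) < max_gap):
                # Extend the current group to cover the new block
                groups[-1] = (ca, min(gsa, sa), max(gea, ea),
                              cb, min(gsb, sb), max(geb, eb))
                continue
        groups.append(tuple(block))
    return groups
```

A fuller version would also refuse to merge two groups when a block mapping to a different chromosome lies between them, which is the "not interrupted by non-syntenic blocks" condition in the text.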
The resulting synteny data files (dnafrag.txt, dnafrag_region.txt, genome_db.txt, method_line.txt, method_link_species_set.txt, and synteny_region.txt) are downloadable from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/).
We built a synteny database from these six files using the MySQL database management system.
This database includes 10 mammalian species (Rattus norvegicus, Macaca mulatta, Pan troglodytes, Canis familiaris, Monodelphis domestica, Mus musculus, Pongo pygmaeus, Equus caballus, Bos taurus, and c purpose.
However, there is no AKAP-related database, despite the important cellular functions of AKAPs and the difficulty of finding appropriate AKAPs.
Therefore, we developed AKAPDB as a secondary database, which focuses on a specific topic using data from primary databases and other literature sources (Bishop, 1999).
AKAPDB provides sequence information and diverse function of AKAPs in all reported eukaryotic species.
It contains putative phosphorylation sites, functionally important domains and cellular localizations derived from prediction programs and other databases.
In addition, it provides predicted zebrafish AKAP partial sequences which can be used for morpholino-induced zebrafish reverse genetics.
Morpholinos, chemically modified antisense oligonucleotides (25 bases) that bind and block their specific target mRNA, are an easy and efficient technology for target-selected functional studies in zebrafish (Nasevicius et al., 2000).
Thus, AKAPDB provides biologists with information that can be used for the efficient setup of experiments.
AKAPDB can be useful to researchers by providing information on AKAPs and an integrated understanding of these diverse proteins.
Methods.
Raw data and information for AKAPDB were collected from a primary database such as NCBI.
And the pub-et al., 2005).
Both BioPAX (Level 3) and SBML (Level 1) can encode signaling pathways, metabolic pathways, and regulatory pathways, although SBML can represent finer details (Fig. 1).
However, they consortium.
Hong et al. (2010) investigated the feasibility of using imputed SNPs for a GWAS with blood pressure traits to determine whether they could improve the previous results that were analyzed without imputation (Cho et al., 2009).
Park et al. (2010) performed a GWAS for blood pressure with 3 different phenotype models: young hypertensive cases versus elderly normotensive controls, the upper 25% versus the lower 25% of SBP distribution, and SBP and DBP as continuous traits.
Kim et al. (2010c) conducted a GWAS for hematological parameters using copy number variations.
Kwon and Kim (2010) analyzed a GWAS for 3 erythrocyte traits, such as hemoglobin, hematocrit, and red blood cells.
Doo and Kim (2010) conducted an association study between an ADIPOQ gene SNP (rs182052) and obesity in Korean women.
Kim et al. (2010b) compared the Affymetrix SNP array 5.0 and oligoarray platforms for defining copy number variation.
Oh et al. (2010) applied a joint identification approach to large-scale GWA data in order to identify genetic variants of obesity for the Korean population.
Kim et al. (2010a) applied a structural equation model to a GWAS for the following two purposes: to model a complex relationship between a genetic network and traits as risk factors and to achieve de association study for large cohorts in Korea was conducted by the Korea National Institute of Health (KNIH) and reported associations with 8 quantitative traits, including blood pressure (Cho et al., 2009).
The Korean GWAS suggested a total of 14 loci for blood pressure, 2 (ATP2B1 and 10p15.1) of which were significant association loci (p-value10
Methods.
Subjects and their genotypes were reported in a previous genomewide association study (Cho et al., 2009).
Briefly, subjects came from 2 community-based cohorts, the rural community Ansung and the urban community Ansan, in KyungGi-Do province, near Seoul, Korea.
Most DNA samples were isolated from the peripheral blood of participants and genotyped using the Affymetrix Genomewide Human SNP array 5.0 (Affymetrix, Inc., Santa Clara, CA, USA).
After the quality control steps, we finally used estyle factors that are known to cause hypertension include sedentary lifestyle, high stress, high salt intake, and alcohol consumption (Kyrou et al., 2006).
Both blood pressure and hypertension are traditional examples of complex traits controlled by the complex interplay of genes and environmental factors (Pickering et al., 1959).
The heritability of blood pressure ranges from 31% to 68%, depending on studies based on different measurements of SBP and DBP (Pilia et al., 2006; Tobin et al., 2005).
Ehret (2010) summarized the 12 genomewide association studies (GWASs) on blood pressure and hyper-to 59 years who participated in the Korean Genome Epidemiology Study (KoGES).
The KoGES were performed as cohorts for chronic disease (diabetes, hypertension, osteoporosis, obesity and metabolic syndrome) in adults aged between 40-69, who were recruited from two community-based epidemiology studies in the rural Ansung and urban Ansan commun-lation.
We carried out GWAS of CNV for the chosen traits and identified several loci associated with these traits.
The loci identified in our study are different from those reported in a previous GWAS of SNPs.
Methods.
We collected data from 10,004 individuals in two populations from rural Ansung and urban Ansan cohorts as me Analyzer (GA), Roche/454 FLX system, and ABI SOLiD system, recently have significantly improved throughput and reduced the cost remarkably as compared to capillary-based electrophoresis systems (Shendure et al., 2004).
For example, in a single experiment using the Roche/454 FLX, the sequences of approximately 100 million reads of up to 350 bases in length can be determined (Rothberg et al., r detecting CNVs (Baumbusch, 2008; Curtis and Lynch, 2010; Hester et al., 2009), but there has been no report that validated the accuracy and reproducibility of CNVs identified by Affymetrix SNP array 5.0.
In this study, we compared the characteristics of CNVs detected from the same set of genomic DNAs by three different array platforms: the Affymetrix SNP array 5.0, the Agilent 2X244K CNV array, and the NimbleGen 2.1M CNV array.
h substantial changes in three obesity-related phenotypes (Scuteri et al., 2007).
Feitosa et al. used BMI and performed a linkage study from the National Heart, Lung, and Blood Institute Family Heart Study.
They found significant signals near the leptin gene (chromosome 7q32.3) and chromosome 13q14 (Feitosa et al., 2002).
Liu et al. identified the CTNNBL1 gene for obesity from US Caucasian and French replication case-control samples.
They also confirmed the previously identified associations between obesity and INSIG2 and PFKP (Liu et al., 2008).
Loos et al. performed association studies in chromosomal region 18q21 and reported that common variants near the MC4R gene influence fat mass, weight, and obesity risk (Loos et al., 2008).
Thorleifsson et al.
con-om complex interactions among multiple genetic factors and environmental factors.
In this study, we applied Structural Equation Models (SEMs) in order to model complex relationships between genetic networks and traits as risk factors (Bollen, 1989).
The SEM was originally developed in the field of social science to fit a model with unobserved variables.
It is well known that the main advantage of the SEM approach is that it allows us to compare several candidate models.
Our application of the SEM to a GWAS enables us to investigate how each risk factor affects a targeted trait directly or through other variables, and SEM is used to represent the relationship among multiple phenotypes.
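The idea of a risk factor acting on a trait directly or through another variable can be illustrated with a small path-analysis sketch. This is a deliberately simplified stand-in for a full SEM fit: it uses ordinary least squares on simulated data for one genotype `g`, one intermediate variable `m`, and one trait `y`. All names, effect sizes, and the simulated data are assumptions made for illustration and do not come from the study.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def path_effects(g, m, y):
    """Direct (g -> y), indirect (g -> m -> y), and total effects by least squares."""
    vg, vm, cgm = cov(g, g), cov(m, m), cov(g, m)
    cgy, cmy = cov(g, y), cov(m, y)
    a = cgm / vg                           # slope of m on g
    det = vg * vm - cgm ** 2               # determinant of the 2x2 normal equations
    direct = (cgy * vm - cmy * cgm) / det  # slope of y on g, adjusting for m
    b = (cmy * vg - cgy * cgm) / det       # slope of y on m, adjusting for g
    return {"direct": direct, "indirect": a * b, "total": direct + a * b}


# Simulated example (assumed effect sizes: g -> m 0.5, m -> y 0.8, g -> y 0.2)
rng = random.Random(0)
n = 2000
g = [float(rng.choice([0, 1, 2])) for _ in range(n)]  # genotype coded 0/1/2
m = [0.5 * gi + rng.gauss(0, 1) for gi in g]
y = [0.2 * gi + 0.8 * mi + rng.gauss(0, 1) for gi, mi in zip(g, m)]
effects = path_effects(g, m, y)
```

In this linear setting the total effect equals the slope from regressing `y` on `g` alone, so the decomposition into direct and indirect parts can be checked against a single-predictor regression.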
In order to choose the components scriptome data, allowing them to get information such as how different alleles of a gene are expressed, detect post-transcriptional mutations or identify gene fusions.
te specificity is determined by allosteric effect due to binding of nucleoside triphosphate to a particular site which is different from the catalytic site.
This specific site binding is a property unique to RNRs (Jordan and Reichard, 1998).
The class Ia RNRs also have an activity site, which controls the overall activity of the enzyme using ATP regardless of the specificity.
Considering the different metallocofactors required by different classes of RNRs it seems that the three classes of RNRs evolved independently.
Despite these differences the similar catalytic mechanism of all the three classes of RNR and presence of d identified total 464 genes whose deletions confer the alteration of sensitivity to rapamycin (Chan et al., 2000; Xie et al., 2005).
Yeast extract-peptone-dextrose (YPD) and synthetic complete (SC) media are commonly used for yeast cultivation, but their compositions are quite different.
Recently, it was reported that chronological lifespan in yeast is differentially regulated between YPD and SC media (Lastauskiene and Citavicius, 2008; Weinberger et al., 2010).
Taking into account the fact that the activity of TOR signaling pathway is dependent on nutrient availability, it is reasonable to assume that media composition would affect rapamycin sensitivity of cells.
While nome may overcome some of this resistance.
Second, a multitarget hit strategy seems to benefit cancer therapy.
As we dissect critical cell cycle, proliferation, apoptosis, angiogenesis, and signaling pathways, we can develop molecules to target them (Bishton et al., 2007).
It is clear that in addition to the effect on histone acetylation, non-histone targets are critical for the effects of these drugs.
Such targets for hyperacetylation include p53, α-tubulin, HIF-1α, and HSP90 (Bali et al., 2005; Huang et al., 2002; Kawaguchi et al., 2003; Qian et al., 2006).
Finally, the good toxicity profile of these agents makes them suitable for combina-s, the association between polymorphism and the risk of lung cancer has not yet been fully clarified.
This study was undertaken to examine the polymorphism of PIK3CA and its relation to the risk of lung cancer in the Korean population.
To the best of our knowledge, this is the first report on the polymorphism of PIK3CA in lung cancer patients in the Korean population.
Methods.
Between August 2001 and November 2007, blood sam-ge of Medicine, CHA University from January 2000 to August 2003, (Seoul, Korea).
Infertile males were further sub-divided following semen analysis performed strictly according to the World Health Organization (WHO) guidelines (WHO, 2001).
The sub-division n increased risk of IS in the Han Chinese population (Zhang et al., 2010).
Kim et al. (2010) suggested that the 399C/T neuropeptide Y (NPY) promoter polymorphism should be considered a genetic risk factor for IS in older adult and female Koreans.
Munshi et al. (2010) also suggested that the 344T allele of aldosterone synthase is an important risk factor for hypertension and IS.
Shi et al. (2009) showed that the 863C/A SNP of tumor necrosis factor (TNF) is associated with an increased risk of idiopathic childhood IS.
Reuter et al. (2009) reported the association between the 261G/A of TIMP metallopeptidase inhibitor 2 (Kattan et al., 1999; Stephenson et al., 2005a), vulvar cancer (Rouzier et al., 2006), osteosarcoma (Kim et al., 2009), renal cancer (Karakiewicz et al., 2007), breast cancer (Rouzier et al., 2005), and advanced Non-Small-Cell Lung cancer (Hoang et al., 2005).
Furthermore, nomograms have been shown to be superior to the traditional staging systems in predicting the features of various cancers (Hoang et al., 2005; Karakiewicz et al., 2007; Kattan et al., 1999; Kim et al., 2009; Rouzier et al., 2006; Rouzier et al., 2005; Stephenson et al., 2005a).
ependent statistical tests.
In order to minimize false positives due to multiple testing, one typically imposes a very stringent cutoff for the statistical significance level.
However, it is still imaginable that some false positives slip through the threshold.
For example, there are a few orders of magnitude more markers than samples, and some peculiar sampling may cause bias in the population structure, causing spurious associations.
In order to overcome such problems, the results from a GWAS are typically reexamined by a replication study employing a dataset from independent cohorts.
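The stringent cutoff for multiple testing mentioned above is often set with a Bonferroni-style correction, in which the nominal significance level is divided by the number of tests. The sketch below assumes that approach purely for illustration; the text does not say which correction was used, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Indices of markers whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]
```

With one million markers and a nominal level of 0.05, the per-marker cutoff is 0.05 divided by 1,000,000, which is why genome-wide studies work with p-value thresholds many orders of magnitude below 0.05.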
In general strongly associated signals are well replicated in other rder to model higher-order structures of biological sequences, we need more powerful grammatical systems based on formal language theory, as a biological sequence can be thought of as a richly-expressive language for specifying the structures and processes of life.
Searls (1988) initiated pioneering works to view biological sequences cytes, and T lymphocytes (Daugherty et al., 1996; Sallusto et al., 1997; Uguccioni et al., 1997).
In the human genome, the gene encoding eotaxin is located on the long arm of chromosome 17 (17q21.1-q21.2), where most of the chemokine genes are clustered, and comprises three exons and two introns.
These CC chemokines are known to act as chemoattractants and activate inflammatory leukocytes, especially lymphocytes, monocytes, eosinophils, and basophils, as well as some stromal cells (Alam, 1997; Corrigan, 1999).
The eotaxin gene family can be divided s et al., 1998).
Yoon et al. (2006) constructed an application tool to provide the present large-scale approach for the analysis of GEO microarray data.
Though GEO has abundant practical possibilities, it does not support a standard format or model, only its own self-structured format.
Moreover, it stores the microarray data without distinguishing processed data and raw data.
These factors hinder providing a comprehensive biological analysis environment and future integration of summary expression value.
In practice, with our data we have found that they also yield somewhat unsatisfactory results.
Many genes on the resulting lists have relatively flat, uninteresting profiles, and yet, many genes filtered from the list have relatively structured, interesting profiles.
This has motivated us to consider using the PM value in addition to the summary expression value to evaluate the significance of genes.
Table 1 and Table 2 show ANOVA results for two sample genes, 265670_s_at and 249093_at.
The p-val-iRNAs studied so far are usually not exactly complementary to their mRNA targets, and seem to inhibit protein synthesis while retaining the stability of the mRNA target (Ambros, 2004).
It has been suggested that transcripts may be regulated by multiple miRNAs and an individual miRNA may target numerous transcripts.
MiRNA identification is an essential requirement for understanding the mechanisms of post-transcriptional regulation.
Consequently the prediction of the potential targets also becomes an essential parameter in the un-ted measures that define personality traits.
Then, we summarize the characteristics of the candidate genes for personality traits and investigate gene variants that have been suggested to be associated with personality traits for individuals with behavioral outcomes and other psychiatric disorders.
The purpose of this review is to improve our understanding of the genetic basis of personality.
Self-Reported Personality Measures.
Modern personality research focuses primarily on personality trait dimensions of variation between individuals that are relatively stable over time and predict behavior in various domains (Verweij et al., 2010).
According to the personality theory of Eysenck, there are three broad personality factors, named Neuroticism, Extraversion- teoporotic fractures (e.g., JAG1) (Guo et al., 2011; Kou et al., 2011; Kung et al., 2010).
According to data from a large international consortium involving prominent Eurocentric research groups, previously known common genetic variants have shown conflicting results, with differential genetic effects and limited replication between ethnic populations (Cooper et al., 2008; Ioannidis et al., 2004; Ralston et al., 2006).
The European-specific variants remain to be validated in East Asian populations to overcome ethnic differences ip between BMI and body fat are weak in older adults.
Methods.
For GWAS, subjects were recruited from Ansung (n=5,018) and Ansan (n=5,020) population-based cohorts, established as part of the Korean Genome Epidemiology Study (KoGES) in 2001.
Both were designed to allow longitudinal prospective studies and 2007).
Thus, the genetic cause of MA is still unknown.
However, the predominant occurrence of MA in males suggests that gene(s) in the X chromosome may play a role in MA (Misra et al., 2005).
Thus, in this study, we tested whether genetic variants found in two X-linked neurological candidate genes, androgen receptor (AR) and ubiquitin-like modifier activating enzyme 1 (UBA1), are involved in the development of MA in Korean patients.
genomic selection practices in dairy cattle has been reported (Cole et al., 2009; Hayes et al., 2009).
Genomic selection requires a careful understanding of the level of SNPs in each cattle breed.
The SNPs in public databases are not validated, and the level of polymorphisms is unknown for many cattle breeds.
Therefore, in this study, we evaluated SNP frequency spectra and the rate of polymorphisms in the Hanwoo and Holstein breeds using the Bovine SNP50 BeadChip genotyping array to facilitate genomic evaluation and selection of the Hanwoo breed.
to enable a more detailed analysis.
Chaperones are known to prevent the aggregation of misfolded proteins by binding them and aiding the recycling of the folding process, especially in the endoplasmic reticulum (ER) during protein synthesis.
The delineation of correct- and misfolded states by chaperones suggests a conundrum because many more proteins exist than chaperones and related molecules.
Complete recognition of a correctly folded structure by a structural protein-protein binding site interaction is almost impossible because one protein might possess numerous structural characters.
Accordingly, a possibility exists that the globular character might be the checkpoint of the correct folding in biological organisms.
This assumption is supported by the aggregation of misfolded proteins because non-globular proteins tend to bind more tightly sequencer.
Sequence analysis was carried out with the software program DNASTAR (DNASTAR, Inc., USA).
Multiple sequence alignment was carried out using the MEGALIGN program of the DNASTAR package with the Clustal method.
The PHD secondary structure prediction and prediction-based threading methods (Rost, B. and titutes in Korea.
In addition, genomic information was accumulated and collected through several collaborative institutes and public international institutes.
The integrated biotechnology database is designed to provide information on the genomes of agricultural crops.
This database (http://nabic.naas.go.kr/) has six major the high complexity, large in amount, and changeable property, we propose a clustering algorithm which uses an algorithm that imitates the ecosystem.
Particle Swarm Optimization.
The Particle Swarm Optimization (PSO) algorithm (Parsopoulos et al., 2002) that our system uses is a stochastic, population-based computer problem-solving ve both pro- and anti-inflammatory cytokines, it has been proposed that anti-inflammatory cytokines, such as insulin-like growth factor-1 (IGF-1), IL10, IL13 and IL4, play a role in the pathogenesis of insulin resistance and T2DM.
However, the underlying mechanism of their disrupted functions in chronic inflammation in relation to T2DM remains to be defined.
Interestingly, there has been a report showing disruption of anti-inflammatory responses in type 2 diabetic model animals, suggesting a role of anti-inflammatory processes in T2DM (O'Connor et al., 2007).
rom the Korean Genome and Epidemiology Study (KoGES) which is an ongoing prospective community-based epidemiological study in the communities of Ansung (rural) and Ansan (urban) (Table 1).
Details of the KoGES and the methods are described in our previous report (Cho et al., 2006; Cho et al., 2009; Kim et al., 2005; Lim et al., 2006).
In brief, eligible subjects (age 40-69 years) were examined in 2001-2002 for demography and epidemiology and were then followed up biannually.
A total of 3,588 men and that the comparison of target gene expression needs particular internal controls that are suitable in different tissues or different physiological conditions in human and other organisms (Selvey et al., 2001; Vandesompele et al., 2002; Lossos et al., 2003; Filby and Tyler, 2007; Theis et al., 2007; Rho et al., 2010).
Among housekeeping genes, GAPDH has been demonstrated to be a good internal control in tumor cell lines (Janssens et al., 2004), while HPRT has been identified as a good reference gene for cancer research when comparing solid tumor tissue samples with normal tissue samples (de Kok et al., 2005; Ohl et al., 2005).
Thus, it is recommended that suitable internal controls should be determined for accu-n position and velocity based on these good positions.
Globally best positions that affect all particles as well as locally best positions which affect a subset of neighboring particles are remembered.
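The position and velocity updates described here can be sketched as a minimal global-best PSO. The inertia weight `w`, the acceleration constants `c1` and `c2`, the swarm size, and the sphere test function are assumed values chosen for illustration, and the neighborhood (locally best) variant is omitted for brevity; this is not the implementation used in the system described.

```python
import random


def pso_minimize(f, dim, bounds=(-5.0, 5.0), n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2)
```

Replacing `sphere` with a classification error measured on training data is one way such an optimizer could be attached to a classifier, although the text does not specify the objective that was used.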
As several iterations continue, the globally best position is updated and the final result gives the best particle position.
PSO-based classification system.
The PSO algorithm was used to develop the DNA chip page.
The gene view and the condition view show the methylation level of CpG sites for only a selected gene and a selected condition, respectively.
And the full view shows the methylation level of CpG sites for all genes and conditions in a single page.
Coloring of each cell on an image, which corresponds to a CpG site, is determined by the degree of two properties, coverage and methylation level, which are described by two bars on the left top parts of images in the HTML files.
The color intensity represents the level of coverage and the height of the dark represents the de- more individuals who might have a CNV.
Methods.
An overview of the program processes is shown in Fig. 1.
Phasing amounts to deducing which parental allele has been transmitted.
If a parent has a homozygous genotype and thus the same allele, it is immaterial which one has been transmitted.
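The Mendelian reasoning for a single SNP in a trio can be sketched as follows. The genotype encoding (two-character strings such as "AG") and the function name are assumptions made for illustration; this is not the program's actual code.

```python
def phase_child(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP.

    Genotypes are two-character strings such as "AG". Returns None when the
    phase is ambiguous (for example, all three are heterozygous) or when the
    trio is inconsistent with Mendelian inheritance.
    """
    first, second = child[0], child[1]
    options = set()
    # Try both assignments of the child's alleles to the two parents
    for pat, mat in ((first, second), (second, first)):
        if pat in father and mat in mother:
            options.add((pat, mat))
    if len(options) == 1:
        return options.pop()
    return None
```

For example, a heterozygous "AG" child with an "AA" father and an "AG" mother must have received A from the father and G from the mother, whereas a trio in which all three genotypes are "AG" cannot be phased at that SNP without additional information.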
As we know which allele of a child has been transmitted from the homozygous parent, gulator approaches (de Jong, 2002; Hasty et al., 2001).
Gene set approaches, such as network modeling, can be utilized when information on the genetic pathway is lacking.
Various studies have contributed to determining gene-gene interactions by constructing gene network models using several statistical approaches (Markou and Singh, 2003a).
Some studies have applied computational methods, such as neural network models (Markou and Singh, 2003b; Vohradsky, 2001).
Gene network models that are based on statistical and computational approaches have been helpful in detecting the "lines" of gene-gene interactions.
However, exploring the biological or functional directions of network lines still remains a challenge, be-Toward these goals, we performed a three-stage analysis.
In the first stage, we conducted a genome-wide survey of CNVs in 3,274 healthy Korean participants as part of the Korean Genome Epidemiology Study (KoGES) using the Affymetrix Genome-Wide Human SNP Array 6.0 platform.
Using stringent quality control, we ascertained 2,206 CNVRs.
In the second stage, we carried out a logistic regression analysis of CNVs with two hypertension-related traits, BP and BMI.