@ewha-bio:122
|
D2GSNP: a web server for the selection of Single Nucleotide Polymorphisms within human disease genes
D2GSNP is a web-based server for the selection of single nucleotide polymorphisms (SNPs) within genes related to human diseases. D2GSNP is based on a relational database created by downloading and parsing OMIM, GAD, and dbSNP, and merging them with positional information from UCSC Golden Path. In total, our server provides 5,142 and 1,932 non-redundant disease genes from OMIM and GAD, respectively. With the D2GSNP web interface, users can select SNPs within genes corresponding to certain diseases and obtain their flanking sequences for further genotyping experiments such as association studies.
The importance of single nucleotide polymorphism (SNP) analysis has increased continuously with the growing demands of numerous applications in complex genetic disease, pharmacogenomics, population genetics, and evolutionary studies (Gray et al., 2000; Marnellos, 2003; Mooser et al., 2003; Hacia et al., 1999). Currently, over 10 million human SNPs have been deposited in the dbSNP database (Sherry et al., 2001), and many companies have developed whole-genome platforms for genotyping SNPs, such as the Affymetrix GeneChip and Illumina BeadArray systems. Although genotyping costs are rapidly decreasing, choosing effective SNPs within candidate genes is still critical to complex disease studies, which need numerous SNPs and large sample sets to maximize statistical power (Day, 2005; Hirschhorn, 2005).
We have developed a web server, D2GSNP, to support the selection of SNPs within human genes related to diseases (Fig. 1). Through the web interface, researchers can intuitively retrieve genes for queried diseases and the SNPs within those genes. For example, a researcher with limited resources who is interested in several diseases may use D2GSNP to pick a gene-dense chromosomal region with a few clicks and select effective SNPs for a disease association study. To automate the selection step, we constructed a local relational database that integrates four public databases: OMIM (Online Mendelian Inheritance in Man), GAD (Genetic Association Database), dbSNP, and UCSC GoldenPath. The web-based interface was implemented using JavaServer Faces (JSF) technology, which yields a clearly defined architecture by separating application logic from presentation. This helps the rapid construction of web services and lowers the cost of maintenance.
Many alternative methods have been proposed to identify human genes related to monogenic or complex diseases (Perez-Iratxeta et al., 2002; Lopez-Bigas et al., 2004; Tiffin et al., 2005). We constructed highly accurate gene maps of diseases based on two manually curated databases, OMIM and GAD. OMIM is a catalog of human genes and genetic disorders (Hamosh et al., 2002), and GAD is an archive of human genetic association studies of complex diseases and disorders (Becker et al., 2004). Specifically, we used the OMIM Morbid Map, which gives the cytogenetic map locations of disease genes described in OMIM. Among GAD genes, we used those that showed at least one positive association to a disease. As a result, the number of disease-gene records and non-redundant gene counts in OMIM were 4,058 and 5,142, respectively. There were 8,179 disease-gene records and 1,932 non-redundant genes in GAD.
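The GAD selection rule described above (keep genes with at least one positive association, then count non-redundant genes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `AssociationRecord` fields, the `positive` flag, and the toy records are hypothetical and are not taken from GAD.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class AssociationRecord(NamedTuple):
    gene_symbol: str
    disease: str
    positive: bool  # hypothetical flag: True if a positive association was reported


def positive_disease_genes(
    records: Iterable[AssociationRecord],
) -> Tuple[List[AssociationRecord], Set[str]]:
    """Keep positive-association records and collect the unique gene symbols."""
    kept = [r for r in records if r.positive]
    genes = {r.gene_symbol for r in kept}  # non-redundant genes
    return kept, genes


# Toy data for illustration only.
records = [
    AssociationRecord("GENE_A", "disease 1", True),
    AssociationRecord("GENE_A", "disease 2", True),
    AssociationRecord("GENE_B", "disease 1", False),
]
kept, genes = positive_disease_genes(records)
print(len(kept), len(genes))  # disease-gene records, non-redundant genes
```

Counting records and unique symbols separately mirrors the two figures reported per database: one gene can appear in several disease-gene records.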
In order to map SNPs to disease genes, the chromosomal positions of gene boundaries were retrieved from knownGene, which provides information on protein-coding genes based on proteins from UniProt and their corresponding mRNAs from GenBank. To link knownGene entries with gene symbols from OMIM and GAD, UCSC’s table kgXref (known genes to external reference) was used. However, a few gene symbols from OMIM and GAD did not match those in kgXref. Some of these orphan gene symbols were linked through a gene alias table based on Entrez Gene, which covers more gene symbols. Our system is based on hg17 in UCSC GoldenPath and Build 125 of dbSNP. Note that the number of SNPs in our database is lower than the total number of SNPs in dbSNP because some of them have not yet been mapped to a unique position on the human genome.
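The two mapping steps described above, resolving a gene symbol with an alias fallback and then collecting the SNPs that fall inside the gene boundaries, might look like the sketch below. It is an illustration only: the dictionaries stand in for the kgXref and alias tables, all identifiers and coordinates are invented placeholders, and it assumes 0-based, half-open intervals as used in UCSC tables.

```python
from typing import Dict, List, NamedTuple, Optional


class Gene(NamedTuple):
    kg_id: str
    chrom: str
    tx_start: int  # 0-based, inclusive (assumed UCSC-style coordinates)
    tx_end: int    # exclusive


class Snp(NamedTuple):
    rs_id: str
    chrom: str
    pos: int  # 0-based position


def resolve_symbol(
    symbol: str,
    symbol_to_kg: Dict[str, str],     # stand-in for kgXref
    alias_to_symbol: Dict[str, str],  # stand-in for the Entrez Gene alias table
) -> Optional[str]:
    """Look up a symbol directly, then fall back to the alias table."""
    kg_id = symbol_to_kg.get(symbol)
    if kg_id is not None:
        return kg_id
    official = alias_to_symbol.get(symbol)
    if official is None:
        return None  # symbol stays an orphan
    return symbol_to_kg.get(official)


def snps_within(gene: Gene, snps: List[Snp]) -> List[Snp]:
    """Return SNPs on the gene's chromosome that lie inside its boundaries."""
    return [
        s for s in snps
        if s.chrom == gene.chrom and gene.tx_start <= s.pos < gene.tx_end
    ]


# Placeholder data for illustration only.
symbol_to_kg = {"GENE_A": "kg0001"}
alias_to_symbol = {"OLD_A": "GENE_A"}
gene = Gene("kg0001", "chr1", 1000, 2000)
snps = [Snp("rs1", "chr1", 1500), Snp("rs2", "chr1", 2000), Snp("rs3", "chr2", 1500)]

print(resolve_symbol("OLD_A", symbol_to_kg, alias_to_symbol))
print([s.rs_id for s in snps_within(gene, snps)])
```

A linear scan is enough for a sketch; a database with millions of SNPs would more plausibly use an indexed range query on chromosome and position.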
The general purpose of D2GSNP is to select SNPs within human genes related to a user-queried disease. SNPs can be filtered by variation type, validation status, minor allele frequency, and functional class. D2GSNP also provides flanking sequences of the selected SNPs for genotyping experiments. Flanking sequences of a user-defined length are provided as a FASTA file. Fig. 1D shows an example of filtered SNPs. Usage details of D2GSNP are available on the online help page.
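As a rough illustration of the filtering and flanking-sequence steps described above, the sketch below filters SNPs on a subset of the listed criteria and writes a FASTA record with a user-defined flank length. The record fields, the FASTA header layout, and the toy reference sequence are assumptions made for this example and do not reflect the server's actual output format.

```python
from typing import Iterable, List, NamedTuple, Set


class SnpRecord(NamedTuple):
    rs_id: str
    pos: int         # 0-based position in the reference sequence
    validated: bool
    maf: float       # minor allele frequency
    func_class: str  # hypothetical functional class label


def filter_snps(
    snps: Iterable[SnpRecord],
    min_maf: float,
    func_classes: Set[str],
    require_validated: bool = True,
) -> List[SnpRecord]:
    """Keep SNPs that pass the MAF, functional class, and validation filters."""
    return [
        s for s in snps
        if s.maf >= min_maf
        and s.func_class in func_classes
        and (s.validated or not require_validated)
    ]


def flanking_fasta(snp: SnpRecord, reference: str, flank: int) -> str:
    """Format the SNP base plus `flank` bases on each side as a FASTA record."""
    start = max(0, snp.pos - flank)                    # clip at sequence start
    end = min(len(reference), snp.pos + flank + 1)     # clip at sequence end
    return f">{snp.rs_id} flank={flank}\n{reference[start:end]}\n"


# Toy data for illustration only.
reference = "ACGTACGTAC"
snps = [
    SnpRecord("rs1", 4, True, 0.20, "coding"),
    SnpRecord("rs2", 7, False, 0.30, "coding"),
    SnpRecord("rs3", 1, True, 0.01, "intron"),
]
selected = filter_snps(snps, min_maf=0.05, func_classes={"coding"})
for snp in selected:
    print(flanking_fasta(snp, reference, flank=2), end="")
```

Clipping the window at the sequence ends keeps the function safe for SNPs near a contig boundary, at the cost of returning a shorter flank there.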
|