Results and Discussion
Discovery of ethnicity-specific SNPs
We identified ethnicity-specific SNPs by eliminating common SNPs from the HapMap samples and mapped the SNP positions to the UCSC RefGene lists. In total, 22, 25, and 332 genes were identified in the CEU, JPT, and YRI individuals, respectively (Fig. 1). Comparison of the three sets showed that YRI individuals had a disproportionately large number of SNP-based genes. This result is consistent with previous evolutionary findings. CEU and JPT belong to the same cluster, together with Amerindians and Australopapuans, while YRI belongs to a separate cluster reflecting the first split between Africans and non-Africans [16, 17]. African populations subdivided from other sub-Saharan African populations, and a small subset of this population migrated out of Africa within the past 100,000 years. African and non-African populations diverged within the past 40,000 years. Phylogenetic analyses of Y-chromosome haplotypes, mtDNA, and autosomes indicate that Africa has the longest history of population subdivision. Africans are the most ancestral human population and have fewer sites in linkage disequilibrium than non-African populations [18].
To explore the biological meaning of these variations, we performed gene set enrichment analysis (GSEA) of the SNP-based genes using the Gene Ontology categories biological process (BP), cellular component (CC), and molecular function (MF) in the DAVID tool. The functional categories significantly enriched (p < 0.01) among the YRI SNP-based genes are shown as pie charts in Fig. 2; no category was significantly enriched for CEU or JPT. Six BP groups and four MF groups showed significant enrichment, with scores ranging from 1.67 to 4.85 and from 7.16E-04 to 0.002, respectively. The BP pie chart (Fig. 2A) shows G-protein-coupled receptor protein signaling pathway, chemotaxis, and defense response to bacterium. Chemotaxis (GO:0006935) accounted for 8% of the enriched BP terms, with an enrichment score of 3.88.
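The filtering and mapping step at the start of this section can be sketched in a few lines of Python 3. This is a minimal sketch, not the authors' pipeline: it assumes that "common SNPs" means SNPs present in more than one HapMap panel, and that genes are given as UCSC refGene-style rows in 0-based, half-open coordinates. All SNP IDs, positions, and gene symbols (GENE_A and so on) are hypothetical placeholders, not HapMap or RefGene records.

```python
from collections import defaultdict

# Hypothetical toy panels: SNP ID -> (chromosome, 1-based position).
# Real input would come from the HapMap genotype/frequency files.
HAPMAP_SNPS = {
    "CEU": {"rs1": ("chr1", 1500), "rs2": ("chr2", 300), "rs3": ("chr1", 9000)},
    "JPT": {"rs1": ("chr1", 1500), "rs4": ("chr2", 5200)},
    "YRI": {"rs1": ("chr1", 1500), "rs2": ("chr2", 300),
            "rs5": ("chr1", 2100), "rs6": ("chr3", 700)},
}

# Hypothetical refGene-style rows: (chrom, txStart, txEnd, gene symbol),
# using UCSC's 0-based, half-open coordinates.
REFGENE = [
    ("chr1", 1000, 2500, "GENE_A"),
    ("chr1", 2000, 3000, "GENE_B"),
    ("chr2", 5000, 6000, "GENE_C"),
    ("chr3", 100, 200, "GENE_D"),
]


def ethnicity_specific(snps_by_pop):
    """Keep, for each population, only SNPs absent from every other panel."""
    n_panels = defaultdict(int)
    for snps in snps_by_pop.values():
        for rsid in snps:
            n_panels[rsid] += 1
    return {
        pop: {rsid: loc for rsid, loc in snps.items() if n_panels[rsid] == 1}
        for pop, snps in snps_by_pop.items()
    }


def map_to_genes(snps, genes):
    """Return {gene symbol: set of SNP IDs} for SNPs inside a gene interval.

    A 1-based position p falls in the 0-based half-open [txStart, txEnd)
    exactly when txStart < p <= txEnd. Overlapping genes each get the SNP.
    """
    hits = defaultdict(set)
    for rsid, (chrom, pos) in snps.items():
        for g_chrom, tx_start, tx_end, symbol in genes:
            if g_chrom == chrom and tx_start < pos <= tx_end:
                hits[symbol].add(rsid)
    return dict(hits)


specific = ethnicity_specific(HAPMAP_SNPS)
genes_by_pop = {pop: map_to_genes(s, REFGENE) for pop, s in specific.items()}
for pop in sorted(genes_by_pop):
    print(pop, sorted(genes_by_pop[pop]))
```

With this toy data the script prints `CEU []`, `JPT ['GENE_C']`, and `YRI ['GENE_A', 'GENE_B']`: the CEU-specific SNP falls outside every gene, and one YRI-specific SNP lies in two overlapping genes. A genome-scale version would replace the nested loop with a sorted interval index.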
Chemotaxis has been reported to contribute to greater disease aggressiveness in African-Americans [19]. The significantly enriched MF terms were G-protein-coupled receptor activity and binding, olfactory receptor activity, and transmembrane receptor activity (Fig. 2B). The enriched CC term was keratin filament (GO:0045095), with an enrichment score of 5.86; it contained KRTAP12-3, KRTAP4-11, KRT14, KRTAP4-4, KRTAP9-8, KRTAP10-7, and KRTAP10-8, mostly members of the keratin-associated protein (KRTAP) gene family. A microarray analysis showed that KRTAP genes are up-regulated in white hair relative to black hair, and immunoreactivity for KRTAPs was increased in white hair follicles compared with black hair follicles. On this basis, Choi et al. [20] suggested that greying hair, a sign of aging, is associated with hair growth rate.
Semantic modeling for ethnicity-specific SNPs
Semantic modeling is an emerging method for comprehensively understanding complex biological processes and large networks [7]. The continuous production of ever larger biological datasets calls for better visualization of complex biological data. We constructed a semantic network model to analyze the biological functional implications of ethnicity-specific SNPs. The network comprised seven entity types: "Gene" (46,354 records), "Pathway" (362), "Disease" (9,647), "Chemical" (153,021), "Drug" (6,712), "ClinicalTrials" (1,273), and "SNP" (379). Pairwise entity-entity relationships were curated as "Gene-Pathway" (46,354 records), "Gene-Disease" (18,391,755), "Gene-Chemical" (308,405), "Disease-Chemical" (401,145), "Disease-Pathway" (43,139), "Chemical-Pathway" (196,073), "Chemical-Drug" (1,702), "SNP-Gene" (379), "ClinicalTrials-Drug" (1,419), and "ClinicalTrials-Disease" (1,210).
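The entity and relationship types listed above can be written down as a small typed schema. The Python sketch below records the counts reported in the text and checks whether a chain of entity types, such as the "SNP-Gene-Disease-Chemical-Drug" chain used in the disease and drug analysis, uses only curated relationship types. Treating each relationship as traversable in either direction, and the function name `is_valid_path`, are assumptions for illustration, not details of the authors' implementation.

```python
# Entity types and record counts as reported in the text.
ENTITY_COUNTS = {
    "Gene": 46_354, "Pathway": 362, "Disease": 9_647, "Chemical": 153_021,
    "Drug": 6_712, "ClinicalTrials": 1_273, "SNP": 379,
}

# Curated relationship types and record counts as reported in the text.
RELATION_COUNTS = {
    ("Gene", "Pathway"): 46_354,
    ("Gene", "Disease"): 18_391_755,
    ("Gene", "Chemical"): 308_405,
    ("Disease", "Chemical"): 401_145,
    ("Disease", "Pathway"): 43_139,
    ("Chemical", "Pathway"): 196_073,
    ("Chemical", "Drug"): 1_702,
    ("SNP", "Gene"): 379,
    ("ClinicalTrials", "Drug"): 1_419,
    ("ClinicalTrials", "Disease"): 1_210,
}

# Assumption: a relationship can be followed in either direction.
ALLOWED_PAIRS = {frozenset(pair) for pair in RELATION_COUNTS}


def is_valid_path(entity_types):
    """Return True if each consecutive pair of types is a curated relationship."""
    unknown = [t for t in entity_types if t not in ENTITY_COUNTS]
    if unknown:
        raise ValueError(f"unknown entity types: {unknown}")
    return all(
        frozenset((a, b)) in ALLOWED_PAIRS
        for a, b in zip(entity_types, entity_types[1:])
    )


print(is_valid_path(["SNP", "Gene", "Disease", "Chemical", "Drug"]))  # True
print(is_valid_path(["SNP", "Disease"]))  # False: no direct SNP-Disease relationship
```

The check makes explicit that SNPs reach diseases, chemicals, and drugs only through genes, which is why the curated "SNP-Gene" links are the entry point for every downstream query.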
Entities including "Pathway," "Chemical," and "Disease" were collected from the Comparative Toxicogenomics Database (CTD) [14, 21], a public database that promotes understanding of the interactions among genes, chemicals, and diseases in human health. Drugs were mapped from DrugBank [15, 22], which provides detailed information on drug actions. We created new "Chemical-Drug" and "SNP-Gene" relationships by curating entity relationships with Python version 2.6; the remaining relationships were taken from the CTD. Fig. 3 illustrates the dynamic and flexible structure of the semantic model for ethnicity-specific SNPs. In a hierarchical (tree) structure, each child has only one parent, whereas in directed acyclic graph (DAG) networks, such as BioXM, a child can have more than one parent. For example, Gene A can be associated with both Chemical B and Pathway C. Gene A can also be linked indirectly to Drug D, because Gene A has a curated interaction with Disease E, and Disease E has a curated association with Drug D.
Ethnicity-specific SNPs reveal associations with 3 diseases and 1 drug
Diseases and drugs are clinically important for understanding ethnic disparities. Ethnic disparities have been reported in disease susceptibility, drug response, and drug disposition [23, 24, 25]. We curated "SNP-Gene-Disease-Chemical-Drug" interactions in the semantic networks for ethnicity-specific SNPs and used these semantic "Gene-Disease" networks to analyze the functional implications of ethnic variants. Of the diseases associated with ethnicity-specific SNPs, 123 were common to all populations, 3 were CEU-specific, and 46 were YRI-specific; no JPT-specific diseases were found (Supplementary Fig. 1A).
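To make the multi-hop inference concrete, here is a minimal Python sketch that follows the "SNP-Gene-Disease-Chemical-Drug" chain from each population's SNPs and then keeps the diseases reached by only one population. All identifiers (rs IDs, GENE_A, DISEASE_1, CHEM_1, DRUG_X) and edges are hypothetical toy data, not CTD or DrugBank records, and the type-by-type traversal is an assumed strategy rather than the authors' curation procedure. DISEASE_1 has two parent genes, the multi-parent case that a DAG allows and a strict hierarchy does not.

```python
from collections import defaultdict

# Hypothetical toy edges for each hop of the chain, as (source, target) pairs.
EDGES = {
    ("SNP", "Gene"): [("rs3", "GENE_C"), ("rs5", "GENE_A"), ("rs6", "GENE_B")],
    ("Gene", "Disease"): [
        ("GENE_A", "DISEASE_1"),
        ("GENE_B", "DISEASE_1"),  # DISEASE_1 has two parent genes (DAG, not tree)
        ("GENE_B", "DISEASE_2"),
        ("GENE_C", "DISEASE_3"),
    ],
    ("Disease", "Chemical"): [("DISEASE_1", "CHEM_1"), ("DISEASE_3", "CHEM_2")],
    ("Chemical", "Drug"): [("CHEM_1", "DRUG_X")],
}
CHAIN = ["SNP", "Gene", "Disease", "Chemical", "Drug"]


def reachable(snps, chain=CHAIN, edges=EDGES):
    """Walk the chain one entity type at a time; return entities reached per type."""
    frontier = set(snps)
    reached = {chain[0]: frontier}
    for src_type, dst_type in zip(chain, chain[1:]):
        adj = defaultdict(set)
        for src, dst in edges.get((src_type, dst_type), []):
            adj[src].add(dst)
        frontier = set().union(*(adj[node] for node in frontier))
        reached[dst_type] = frontier
    return reached


# Hypothetical ethnicity-specific SNP sets per population.
POP_SNPS = {"CEU": {"rs3"}, "JPT": set(), "YRI": {"rs5", "rs6"}}
diseases = {pop: reachable(s)["Disease"] for pop, s in POP_SNPS.items()}
drugs = {pop: reachable(s)["Drug"] for pop, s in POP_SNPS.items()}

# A disease is population-specific if no other population reaches it.
specific_diseases = {
    pop: found - set().union(*(d for other, d in diseases.items() if other != pop))
    for pop, found in diseases.items()
}

for pop in sorted(POP_SNPS):
    print(pop, sorted(specific_diseases[pop]), sorted(drugs[pop]))
```

With the toy edges, the CEU SNP reaches DISEASE_3 but no drug, because CHEM_2 has no curated drug link, while the YRI SNPs reach DISEASE_1, DISEASE_2, and DRUG_X. This mirrors how an empty result (here for JPT) can arise simply from missing links further down the chain.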
The three diseases associated with CEU-specific SNPs were phantom limb (MESH:D010591), trochlear nerve diseases (MESH:D020432), and vulvitis (MESH:D014847), whereas the diseases associated with YRI-specific SNPs included acquired immune deficiency syndrome (AIDS)-associated nephropathy (AIDSAN), hypertension, primary amyloidosis, and pelvic infection. AIDSAN (MESH:D016263) rates are higher in African-Americans than in whites. Although mortality and morbidity from AIDS have been reduced, AIDSAN remains a major complication of AIDS (http://statgen.ncsu.edu/). Hypertension (MESH:C537095) is a major threat to public health in sub-Saharan Africa. In some areas, blacks exhibit higher rates of hypertension than whites. Increased salt intake and obesity are leading contributors to the high prevalence of hypertension in Africa [26]. Pelvic infection (MESH:D034161) is an inflammatory disease to which blacks are more prone than other ethnic groups [27]. Applying the "SNP-Gene-Disease-Chemical-Drug" model revealed 2 CEU-specific and 14 YRI-specific drugs but no JPT-specific drugs (Fig. 4, Supplementary Fig. 1B). One of these drugs, methylphenidate (DB00422), has been reported to show ethnic disparities in previous drug studies. The mean dose of methylphenidate is about 1.5 times higher in African-Americans than in whites [28], and its use is steadily increasing in South Africa [29].
Ethnicity-specific associations with 5 pathways
Analysis using the semantic model for ethnicity-specific SNPs identified 5, 7, and 100 CEU-specific, JPT-specific, and YRI-specific biochemical pathways, respectively. For hemostasis (REACT:604), a pathway associated with cardiovascular diseases, plasminogen activator inhibitor-1 activity levels are lower in Africans than in Caucasians. These negative effects are already apparent at a young age and, if addressed early in life, may be modifiable through behavioral and dietary changes [30].
For systemic lupus erythematosus (KEGG:05322), Systemic Lupus Activity Measure (SLAM) scores were higher in African-Americans (mean = 12.6) and Hispanics (11.0) than in Caucasians (8.5). Higher scores were associated with lack of health insurance, abrupt disease onset, presence of anti-Ro (SSA) antibody, absence of HLA-DRB, high levels of helplessness, and abnormal illness behaviors. Caucasians lived under less crowded conditions, showed fewer abnormal illness behaviors, and had more education. Regression analyses showed significant associations between higher SLAM scores and greater helplessness, absence of HLA-DRB1*0301, and presence of HLA-DRB*0201 (p < 0.01) [31]. Prostate cancer (KEGG:05215) is a commonly diagnosed cancer of the male reproductive system. Its incidence is 1.6 times higher in African-American men than in European men. Amundadottir et al. [32] identified the chromosomal 8q24 region as the region most frequently gained in prostate cancers, and this gain has been correlated with aggressive tumors [33]. The estimated population attributable risk is greater in African than in European populations. Hepatitis C virus (HCV; KEGG:05160) is a major cause of chronic liver disease in humans. Within sub-Saharan Africa, HCV prevalence is highest in central Africa (3.0%), compared with a median of 2.2%. Conjeevaram et al. [34] showed that African-Americans with chronic HCV respond less well to interferon-based antiviral therapy than Caucasian Americans [35]. Rheumatoid arthritis (RA; KEGG:05323) is an autoimmune disease that may affect many organs. The prevalence of RA in urban South Africans is similar to that in Caucasians [36]. In the current study, the pathways shared by all populations were signal transduction (REACT:111102), olfactory transduction (KEGG:04740), and metabolic pathways (KEGG:01100). These pathways have commonly appeared in disease-pathway interactions in previous research.
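The pathway-level comparison reported above, pathways specific to one population versus pathways shared by all, amounts to set operations once each population's genes are mapped to pathways. The following Python sketch assumes hypothetical genes (GENE_1 to GENE_5) and gene-to-pathway annotations (PW_SHARED, PW_X, and so on) that are not CTD records; they are chosen so that disjoint ethnicity-specific gene sets still converge on one shared pathway.

```python
# Hypothetical gene -> pathway annotations (placeholders, not CTD records).
GENE_TO_PATHWAYS = {
    "GENE_1": {"PW_SHARED", "PW_X"},
    "GENE_2": {"PW_SHARED"},
    "GENE_3": {"PW_SHARED", "PW_Y"},
    "GENE_4": {"PW_Z"},
    "GENE_5": {"PW_Z", "PW_W"},
}

# Hypothetical ethnicity-specific genes: no gene appears in two populations.
POP_GENES = {"CEU": {"GENE_1"}, "JPT": {"GENE_2", "GENE_4"}, "YRI": {"GENE_3", "GENE_5"}}


def pathways_for(genes):
    """Collapse a gene set to the union of its annotated pathways."""
    return set().union(*(GENE_TO_PATHWAYS.get(g, set()) for g in genes))


pop_pathways = {pop: pathways_for(genes) for pop, genes in POP_GENES.items()}

# Pathways reached by every population, despite disjoint gene sets.
shared = set.intersection(*pop_pathways.values())

# Pathways reached by exactly one population.
specific_pathways = {
    pop: paths - set().union(*(p for other, p in pop_pathways.items() if other != pop))
    for pop, paths in pop_pathways.items()
}

print(sorted(shared))
for pop in sorted(specific_pathways):
    print(pop, sorted(specific_pathways[pop]))
```

Here PW_SHARED is reached through a different gene in each population, so no single gene is shared, yet the pathway is. Treating the pathway rather than the gene as the unit of comparison is what lets such convergence show up.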
Although different ethnicity-specific genes are identified in each population, genes associated with the same trait or disease are generally observed to converge on the same pathway [37]. These genes are therefore expected to converge on common pathways shared by all populations. A pathway-based approach thus allows us to systematically evaluate multiple polymorphic genes from different populations, treating pathways as the biological unit [38]. Moreover, a pathway-based approach is better able to detect rare genetic variants with small effects that do not individually reach stringent significance levels [39].
We identified ethnicity-specific SNPs from HapMap data and constructed a semantic network model for the HapMap SNP dataset. Functional analyses were performed on gene-based ethnicity-specific SNPs. Our semantic network model showed robust interactions between ethnicity-specific SNPs and public data. However, this model is still at an early stage, and more extensive data connections and more flexible algorithms are required. We expect that our semantic network model will be useful for studying ethnicity-specific SNPs and that our findings will help prioritize ethnicity-specific gene-based SNP candidates.