Methods

Clinical characteristics and study genotypes

This study analyzed cohort data from the Anseong and Ansan population studies in the KARE project. Anseong is a rural area and Ansan is an urban area; both are located in Gyeonggi-do, South Korea. Residents of the two areas have different lifestyles and are exposed to different environments. Details of the KARE data have been reported previously [10]. The KARE data include 8,842 individuals, 352,228 single-nucleotide polymorphisms (SNPs), and 277 phenotypes.

We classified the 8,842 individuals into cases and controls according to whether they had a positive diagnosis of gastritis: 1,885 were patients and 6,957 were normal subjects. To define the case group, we first excluded 104 patients who were diagnosed before age 20 or whose age at diagnosis was unknown. This left 1,781 patients (804 men and 977 women). None of the 6,957 normal subjects was younger than 20 years; this group comprised 3,335 men and 3,622 women.

For quality control, we excluded 3,044 of the 352,228 SNPs based on the Hardy-Weinberg equilibrium test. After frequency and genotyping pruning, 349,184 SNPs remained. From the 277 phenotypes, we removed those with missing values or low genotyping rates, which left 120 clinical characteristics. We then removed the gastritis phenotype variables and the variables with unknown drug information, which left 101 clinical characteristics.

Statistical analysis

For data filtering and identification of significant SNPs, we used PLINK version 1.07, a toolset for whole-genome association analysis [11]. We used the default PLINK options and applied logistic regression to the phenotypes to classify patients versus normal subjects and to estimate the associated factors. To identify gastritis-associated factors, we then used Student's t-test in R version 3.0.2 to assess whether the factors from the logistic regression differed significantly between patients and normal subjects.
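The patient-versus-control comparison for a single clinical characteristic can be sketched with the Python function below. It is a minimal illustration, not the study's R code. It implements the classical two-sample Student's t-test with pooled variance, which assumes equal variances. In R this form corresponds to `t.test(..., var.equal = TRUE)`, because `t.test` defaults to Welch's test. The source does not state which R call or options were used. The function name `student_t_test` is hypothetical. The p-value uses a normal approximation to the t distribution, which is assumed to be adequate here only because the degrees of freedom are large (1,781 + 6,957 − 2 = 8,736).

```python
import math
from statistics import mean, variance


def student_t_test(cases, controls):
    """Hypothetical helper: two-sample Student's t-test with pooled variance.

    Returns (t statistic, degrees of freedom, approximate two-sided p-value).
    The p-value replaces the t distribution with a standard normal, which is
    an approximation that is only accurate for large degrees of freedom.
    """
    n1, n2 = len(cases), len(controls)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled variance: sample variances (n - 1 denominators) weighted by their df
    pooled = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / df
    # Standard error of the difference in means under the equal-variance assumption
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(cases) - mean(controls)) / se
    # Two-sided tail probability under N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx
```

In use, `cases` and `controls` would hold the values of one candidate factor for the 1,781 patients and the 6,957 normal subjects. For an exact p-value, a t-distribution CDF (for example, from SciPy or R) should replace the `erfc` approximation.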
Finally, we used receiver operating characteristic (ROC) curves and area under the curve (AUC) scores to evaluate the predictive ability of these factors.
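The ROC/AUC evaluation can be sketched in Python as follows. The sketch assumes that each subject has a numeric score and a label of 1 for patients and 0 for normal subjects. The score could be, for example, a fitted logistic-regression probability or the value of a single clinical characteristic; the source does not specify which. The functions `roc_curve`, `auc`, and `auc_mann_whitney` are hypothetical helpers written for illustration, not the routines used in the study. `roc_curve` sweeps the decision threshold from the highest score to the lowest and records the false-positive and true-positive rates. `auc` integrates those points with the trapezoidal rule. `auc_mann_whitney` gives the same value as a cross-check: the probability that a randomly chosen patient scores higher than a randomly chosen normal subject, with ties counted as one half.

```python
def roc_curve(scores, labels):
    """Hypothetical helper: ROC points (FPR, TPR), from (0, 0) to (1, 1).

    labels: 1 = patient (case), 0 = normal subject (control).
    Subjects with tied scores are added at the same threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one case and one control")
    # Highest scores first: lowering the threshold admits more subjects as "positive"
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every subject whose score equals the current threshold
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by non-decreasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_mann_whitney(scores, labels):
    """Cross-check: P(case score > control score), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the score separates patients from normal subjects no better than chance, and 1.0 means perfect separation. The pairwise cross-check is quadratic in sample size, which is acceptable but slow for roughly 1,781 × 6,957 pairs in pure Python.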