Statistical analysis

All analyses were conducted using PLINK version 1.06 (Free Software Foundation, Inc., Boston, MA, USA) and SAS statistical software version 9.0 (SAS Institute Inc., Cary, NC, USA). All statistical tests were two-sided, and statistical significance was defined as p < 0.05. To describe the general characteristics of the study population, means and standard deviations (SD) were calculated, and the frequencies of cigarette smoking, alcohol consumption, and physical activity were determined. Paired t-tests were performed to assess differences between case participants and control participants in both men and women. A χ² goodness-of-fit test was used to assess whether SNPs were in HWE and to determine differences in genotype frequencies between CRC cases and controls. The GRS was categorized into quartiles. The CRC risk associated with genotype was estimated as ORs and 95% confidence intervals (CIs), computed using logistic regression with an additive genetic model. We also used receiver operating characteristic (ROC) curve analysis and calculated the area under the curve (AUC; also known as the C statistic) to evaluate the discriminatory power of the model. In addition, the internal validity of each model was checked using bootstrapping [35], while 10-fold cross-validation was used to assess the external validity of each model (Supplementary Tables 3 and 4) [36].
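As an illustration of two of the quantities described above, the following minimal Python sketch computes a χ² goodness-of-fit test for HWE from genotype counts and a rank-based AUC (C statistic) from risk scores. It is not the PLINK or SAS procedure used in the study. The function names `hwe_chi_square` and `c_statistic` are hypothetical, the test is assumed to have 1 degree of freedom (one biallelic SNP), and any counts or scores passed in are the reader's own.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns the chi-square
    statistic and its p-value, assuming 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def c_statistic(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case has a higher
    score than a randomly chosen control (ties count as one half)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise AUC is quadratic in the number of participants, which keeps the sketch short and makes the definition explicit. A rank-based implementation would be preferable for large samples.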