Discussion
Both the case-control and the case-parent study designs are widely used in genetic epidemiology to study associations between genetic factors and the risk of disease. Over the past 2 decades, the number of genetic association studies has increased steeply, and these studies have reported a number of gene variants associated with complex human diseases [1, 4, 5, 28]. Recently, GWASs, a new frontier in genetic epidemiology, have identified thousands of new gene variants related to human diseases [29]. Population-based studies with large sample sizes yield estimates with smaller variance and therefore have greater statistical power. However, collecting a sufficient number of samples is costly and time-consuming, and such large-scale studies are more likely to be affected by systematic bias and noise [25, 30].
In the current study, we demonstrated the effective sample sizes required to achieve 80% statistical power for a case-control study and a case-parent study separately under various assumptions regarding effect size, MAF, disease prevalence, LD, case-to-control ratio, and number of SNPs. A smaller sample size is required under the dominant model under every assumption, whereas the recessive model requires a far larger sample to achieve adequate statistical power under the same assumptions. Further, we confirmed that a smaller sample size is required for testing more common SNPs with stronger effect sizes and increased LD between the marker allele and the disease allele. A smaller sample size is required to study a common disease than a rare disease. Statistical power increases with the number of controls per case; however, a case-to-control ratio exceeding 1:4 does not yield a significant increase in statistical power. Among the parameters tested, a high level of LD between the marker and the disease variant greatly reduces the sample size needed to detect evidence for association. Common variants are more informative than rare variants in LD-based indirect association studies of complex diseases. This means that researchers can reduce costs by choosing common variants to genotype at the design stage of an LD-based association study. In general, the case-control study design is more powerful than the case-parent study design [8, 10]. Because patients with a family history of the disease are more likely to have inherited disease-predisposing alleles than patients without one, researchers can improve the statistical power of case-control association studies by sampling patients with affected relatives and comparing them with controls who have no family history [13].
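The diminishing return from adding controls can be illustrated with a minimal sketch. It is not the calculation used in this study: it assumes a simple two-sided normal-approximation test comparing allele frequencies between cases and controls, an allelic odds ratio, and two independent alleles per person. The function names and the example values (control allele frequency 0.2, odds ratio 1.5) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def cases_needed(control_freq, odds_ratio, controls_per_case=1.0,
                 alpha=0.05, power=0.80):
    """Approximate number of cases for a two-sided allelic test.

    Assumptions (not taken from the study): normal approximation for the
    difference of two proportions, and two independent alleles per person.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = -NormalDist().inv_cdf(alpha / 2)   # critical value of the test
    z_beta = NormalDist().inv_cdf(power)         # quantile for the target power
    # Variance term: controls contribute 1/k of their variance when there
    # are k controls per case.
    variance = p1 * (1 - p1) + p0 * (1 - p0) / controls_per_case
    case_alleles = (z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2
    return ceil(case_alleles / 2)                # two alleles per case


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        n = cases_needed(control_freq=0.2, odds_ratio=1.5, controls_per_case=k)
        print(f"1:{k} case-to-control ratio -> about {n} cases")
```

Under these assumptions the control term in the variance shrinks as 1/k, so most of the saving in cases is already obtained by a ratio of 1:4, and further controls change the required number of cases only slightly.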
Genome-wide case-control studies have been used to identify genetic variants that predispose to human disease under model assumptions for parameters such as the inheritance model. Such studies are powerful for detecting common variants with a moderate effect on disease occurrence; however, a study with a large number of SNP markers, such as one using 500 K or 1 million chips, requires a large number of samples (e.g., thousands of cases and controls) to achieve adequate statistical power. Without collaboration, a researcher can rarely conduct a large-scale association study using a high-throughput microarray chip, in which most embedded SNPs show a small effect size between 1.3 and 1.6 [31]. Therefore, researchers planning a genetic association study must calculate the effective sample size and the statistical power in the design phase to perform a cost-effective study that reduces false-negative and false-positive results. Although we could not cover all plausible conditions in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful for determining the sample size when designing a population-based association study.
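A rough sense of why chips with 500 K or 1 million markers demand so many samples can be obtained from the following sketch. It assumes a Bonferroni correction of a family-wise 0.05 significance level and a two-sided normal-approximation test; neither assumption is stated in this study, and the function name is hypothetical. Under the normal approximation the required sample size is proportional to the squared sum of the two quantiles, so the ratio below does not depend on the effect size or MAF.

```python
from statistics import NormalDist


def sample_size_inflation(n_snps, family_alpha=0.05, power=0.80):
    """Factor by which the sample size grows when testing n_snps markers.

    Assumptions (not taken from the study): Bonferroni-corrected per-SNP
    significance level and a two-sided normal-approximation test.
    """
    std_normal = NormalDist()
    z_beta = std_normal.inv_cdf(power)
    # Critical value for a single test at the family-wise level.
    z_single = -std_normal.inv_cdf(family_alpha / 2)
    # Critical value for each of n_snps tests after Bonferroni correction.
    z_multi = -std_normal.inv_cdf(family_alpha / n_snps / 2)
    return ((z_multi + z_beta) / (z_single + z_beta)) ** 2


if __name__ == "__main__":
    for n_snps in (1, 100, 500_000, 1_000_000):
        factor = sample_size_inflation(n_snps)
        print(f"{n_snps:>9} SNPs -> sample size x {factor:.2f}")
```

The factor grows only logarithmically with the number of markers, which is why moving from 500 K to 1 million SNPs costs far less additional sample than moving from a single candidate SNP to a genome-wide panel.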