PMC:4572492 / 32842-49530

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4572492","sourcedb":"PMC","sourceid":"4572492","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4572492","text":"Appendix A\n\nImplementation: Steps for Applying the JLS Testing Framework\nNote that the following implementation is appropriate for testing the association of a phenotype with a genotyped genetic variant (e.g., a SNP) or with an imputed variant analyzed as a “hard call” (i.e., each individual is assigned the genotype with the highest posterior imputation probability; high genotype uncertainty does not affect type 1 error but will decrease power), using a sample of unrelated subjects.\n1. Check the phenotype of interest for fit to a normal distribution. If required, adjust the phenotype using a suitable transformation, e.g., an inverse normal transformation. If the researcher proceeds with a non-normal phenotype, only the permutation (resampling-based) p value analysis will be valid (see step 4b).\n2. Choose the individual location and scale tests based on the distribution of the phenotype (normal or non-normal) or on preference (for example, parametric or non-parametric versions of each test). In the present paper, our phenotypes were normally distributed (after transformation), and we chose linear regression and Levene’s test as the location and scale tests, respectively.\n3. Choose a JLS method for combining information from the individual location and scale tests, and calculate the JLS test statistic. We acknowledge that in practice no method is “most powerful” in all situations. Based on our experience, we recommend combining the association evidence with Fisher’s method (JLS-Fisher):\n$$W_F = -2\\left[\\log(p_L) + \\log(p_S)\\right],$$\nwhere $p_L$ and $p_S$ are the p values of the individual location and scale tests, respectively.\n4. 
Choose the p value estimation method for the JLS statistic.\n(a) Based on the approximate asymptotic distribution of the JLS test statistic: for the JLS-Fisher example, $W_F$ is distributed as a $\\chi^2_4$ random variable under the null hypothesis, provided that the chosen individual location and scale tests are independent of each other under the null. This assumption holds if the trait is normally distributed, the location-only test statistic is a function of the complete sufficient statistic (e.g., the linear regression t-statistic or the ANOVA F-statistic), and the distribution of the scale-only test statistic does not depend on the model parameters (e.g., Levene’s test or the F-test for equality of variances).\n(b) Based on resampling methods such as permutation:\n• Calculate the observed JLS test statistic, e.g., $W_F$.\n• Choose the number of permutation replicates, $K$, based on the desired p value accuracy.\n• Permute the phenotype independently $K$ times (not valid if subjects are correlated with each other), and for each replicate $k$ recalculate the JLS test statistic, $W_F^{(k)}$, $k = 1, \\ldots, K$.\n• Obtain the permutation p value as the proportion of replicates whose statistic exceeds the observed one, $\\#\\{k : W_F^{(k)} \u003e W_F\\}/K$.\n\nSimulation Models\nThe following three models, previously considered by Aschard et al.,16 were used to simulate the data:\nModel 1: $E[Y] = \\beta_G G + \\beta_{E_1} E_1 + \\beta_{GE_1} G \\cdot E_1$\nModel 2: $E[Y] = \\beta_{E_1} E_1 + \\beta_{E_2} E_2 + \\beta_{GE_1} G \\cdot E_1 + \\beta_{GE_2} G \\cdot E_2$\nModel 3: $E[Y] = \\beta_{GE_1} G \\cdot E_1$\nFor all three models, the observed genetic variant ($G$) was coded additively with a minor allele frequency (MAF) of 0.3. $Y$ was simulated from models with varying effects ($\\beta$s), with residual variation ($\\varepsilon$) following a standard normal distribution (mean = 0, standard deviation = 1).\nModel 1 is analogous to Equation 1, in which $Y$ depended on the main effects of both $G$ and $E_1$ and on an interaction effect between $G$ and $E_1$. The unobserved exposure variable $E_1$ was binary with frequency 0.3. The main genetic effect $\\beta_G$ took values of 0.01, 0.05, and 0.1, and the interaction effect $\\beta_{GE_1}$ was varied between −1 and 1 on a grid of 0.1. 
The main exposure effect $\\beta_{E_1}$ was fixed at 0.3 when $\\beta_{GE_1}$ was positive and at −0.3 when $\\beta_{GE_1}$ was negative.\nFor Model 2, $Y$ was a function of the main effects of two unobserved exposures ($E_1$ and $E_2$, both binary with frequency 0.3) and of the interaction effects between these exposures and $G$. The interaction effect $\\beta_{GE_1}$ was always positive and less than 1, whereas $\\beta_{GE_2}$ was varied between −1 and 1 on a grid of 0.1. The main effect $\\beta_{E_1}$ was fixed at 0.3, whereas $\\beta_{E_2}$ was fixed at 0.3 when $\\beta_{GE_2}$ was positive and at −0.3 when $\\beta_{GE_2}$ was negative.\nFor Model 3, $Y$ depended only on the interaction between $G$ and $E_1$. For this model, the interaction effect $\\beta_{GE_1}$ and the exposure frequency were chosen such that the observed marginal effect of $G$ was fixed at 10% of the trait standard deviation.\nIn all cases, the working association model corresponded to Equation 2, because information on $E_1$ and $E_2$ was assumed to be unavailable.\nTo assess the type 1 error of the joint location-scale methods at the 0.05, 0.005, and 0.0005 levels, we simulated 100,000 replicate samples of n = 2,000 subjects each from the null model of no genetic association (i.e., $\\beta_G = 0$ and $\\beta_{GE} = 0$). (Results for n = 1,000 and n = 4,000 were qualitatively similar.) To examine the behavior of the tests when genotype groups are small, we conducted additional simulations with varied MAF (0.3, 0.2, 0.1, 0.05, and 0.03), as well as with fixed genotype group sizes in which the rare homozygote group was small (2, 5, 7, 10, 15, or 20 subjects) and the other genotype group sizes were determined under Hardy-Weinberg equilibrium. For comparison, we also studied the empirical type 1 error rates of the individual location-only and scale-only tests, in addition to those of the JLS-Fisher and JLS-minP tests and the LRT of Cao et al.;17 the distribution test of Aschard et al.16 has the correct type 1 error by design. 
Type 1 error control at the genome-wide level was assessed by phenotype-permutation analysis of the 866,995 T1D GWAS SNPs and the 565,884 CF GWAS SNPs.\nFor the sensitivity analysis of genotype imputation uncertainty, simulated true genotypes were converted to probabilistic genotype data using a Dirichlet distribution with concentration parameter $a$ for the correct genotype category and $(1 - a)/2$ for each of the other two;39 $a$ was set to 1, 0.9, 0.8, 0.7, 0.6, and 0.5. Based on the simulated posterior probabilities, the most-likely genotype for each subject was the genotype with the highest posterior probability (i.e., the “hard call”); the average incorrect call rate under this Dirichlet model ranges from 0% to 50%. For each level of genotype imputation uncertainty ($a$), the most-likely genotypes were then used to assess type 1 error control at the 0.05, 0.005, and 0.0005 levels, using 100,000 simulated replicate samples of n = 2,000 under the null model of no genetic association (i.e., $\\beta_G = 0$ and $\\beta_{GE} = 0$) with MAF = 0.3.\nFor power evaluation, as in Aschard et al.,16 the results presented focus on MAF = 0.3, with n = 2,000 for Models 1 and 2 and n = 4,000 for Model 3. Power (at the $5 \\times 10^{-8}$ level) was estimated from 500 replicates, based on the asymptotic p values of the tests considered, with the exception of the distribution test. For the distribution test, p values required estimation by permutation, and the corresponding power results were taken from Aschard et al.,16 kindly provided by Drs. 
Aschard and Kraft.\n\nLemma 1: Independence of Location-only and Scale-only Test Statistics under the Null Hypothesis for Normally Distributed Traits\nLet $T_{\\text{Location}} = \\hat{\\beta}_1/(S/\\sqrt{S_{xx}})$ be our location-only test statistic, testing the linear effect of $x$ on $Y$ in a sample of size $n$, where $S^2 = \\frac{1}{n-2}\\sum_i (y_i - \\hat{\\beta}_0 - \\hat{\\beta}_1 x_i)^2$, $\\hat{\\beta}_0 = \\bar{y} - \\hat{\\beta}_1 \\bar{x}$, $\\hat{\\beta}_1 = S_{xy}/S_{xx}$, $S_{xy} = \\sum_i (x_i - \\bar{x})(y_i - \\bar{y})$, and $S_{xx} = \\sum_i (x_i - \\bar{x})^2$ ($x = G$ in Equation 2), and let $T_{\\text{Scale}}$ be our scale-only test statistic, here defined as Levene’s test statistic for equality of variances.14\nLemma 1: For the conditional normal model $Y_i \\sim N(\\beta_0 + \\beta_1 x_i, \\sigma^2_{x_i})$, where $x_i \\in \\{0, 1, 2\\}$, $T_{\\text{Location}}$ and $T_{\\text{Scale}}$ are independent if $\\sigma^2_0 = \\sigma^2_1 = \\sigma^2_2$.\nProof: For fixed $x$, $Y$ is normally distributed with constant variance $\\sigma^2$ and mean $E[Y \\mid x] = \\beta_0 + \\beta_1 x$. The density of $Y$ is\n$$(2\\pi\\sigma^2)^{-n/2}\\exp\\left[-\\frac{1}{2\\sigma^2}\\sum_i (y_i - \\beta_0 - \\beta_1 x_i)^2\\right].$$\nThis is an exponential family with three parameters, $\\theta = (\\theta_1, \\theta_2, \\theta_3) = (\\beta_1/\\sigma^2, -1/(2\\sigma^2), \\beta_0/\\sigma^2)$, for which the sufficient statistics $T = (T_1, T_2, T_3) = (\\sum_i x_i y_i, \\sum_i y_i^2, \\sum_i y_i)$ are complete. If $\\sigma^2_0 = \\sigma^2_1 = \\sigma^2_2$, $T_{\\text{Scale}}$ is approximately distributed as an $F_{3-1,\\,n-3}$ variable, and its distribution does not depend on $\\theta$ (i.e., $T_{\\text{Scale}}$ is ancillary for $\\theta$). Thus, by Basu’s theorem, $T_{\\text{Scale}}$ is independent of $T$ (see page 152 in Lehmann and Romano40). Because $T_{\\text{Location}}$ is a function of $T$, $T_{\\text{Location}}$ and $T_{\\text{Scale}}$ are therefore independent under the null.\nNote that the proof of independence holds regardless of the version of Levene’s test statistic chosen, provided that the approximation to the F distribution (or to some other distribution not depending on $\\theta$) is justifiable. 
Similar statements of independence, with analogous proofs, can be obtained for other choices of location test statistic, such as the analysis of variance (ANOVA) F-statistic.\n\nAdditional Considerations for the JLS Framework\nIn the most extreme simulation scenario (i.e., Model 3 with no main $G$ or $E$ effects, a large interaction effect of $\\beta_{GE_1} = 2$, and a rare unobserved exposure with $P(E_1 = 1) = 0.05$), the distribution test was more powerful than the JLS-Fisher test (0.916 versus 0.406; row 1 of Table S6). This is because the phenotype distributions differed in shape across genotypes, and these differences were not well captured by the mean and variance alone. (The JLS-Fisher test could, in theory, be extended to include skewness or higher moments of the distribution. The gain in power in this particular setting, however, would come with a loss of power for other, approximately normally distributed, traits.) In this scenario, in which the phenotype deviated from normality, the LRT method appeared to be the most powerful (power = 0.996). However, further analysis using permutation-based p values showed that the power of the LRT under asymptotic analysis was greatly inflated, whereas the proposed JLS methods were robust (Table S7).\nThe alternative distribution16 and LRT17 joint testing methods were proposed for the analysis of single SNPs. In principle, they can be extended to gene-set or pathway analysis, but several issues arise in implementation. The distribution test statistic depends on the sizes of the genotype groups for each variant, so it is unclear how best to combine the statistics across SNPs with different MAFs. 
The LRT method is sensitive to the normality assumption for the phenotype and to small genotype group sizes (Tables S1 and S2).\nThe proposed multivariate JLS testing method is also highly relevant for single-variant association analysis. In GWAS or next-generation sequencing (NGS) settings, where millions or tens of millions of SNPs are investigated, rapidly screening the whole genome to correctly prioritize SNPs for further examination demands methods that are powerful yet computationally efficient. The proposed JLS testing method is robust, easy to implement, and suitable for large-scale whole-genome scans, and it can reveal individual genetic variants with main and/or interaction effects without the need to specify the interacting genetic and/or environmental variables explicitly. Compared with the distribution test and LRT alternatives, our method combines simplicity of implementation with robustness to a small rare homozygous genotype group.\nOur simulation analyses also demonstrated that the location (regression) and scale (Levene’s) tests, and consequently our JLS-Fisher and JLS-minP tests, are robust to poorly captured genotype data. These findings agree with the results of Kutalik et al.,41 who found minimal to no bias in false positive rates and type 1 error for location-only association testing of quantitative phenotypes with uncertain genotypes, specifically when imputed genotype probabilities were converted to most-likely genotype categories and analyzed without explicitly accounting for the uncertainty.\nViolations of the normality assumption can inflate the type 1 error rate of the proposed JLS test, as they do for the LRT approach (for which the inflation is more severe). This is largely because the $\\chi^2_4$ approximation of the JLS-Fisher statistic requires independence between the individual location-only and scale-only tests, an assumption that Lemma 1 establishes for normally distributed traits. 
To circumvent this issue, investigators might choose to rely on a permutation distribution when estimating p values. However, permutation can be computationally demanding at the genome-wide level. An alternative approach is to model the dependency between the individual location and scale test statistics and obtain an adjusted distribution for the JLS statistic. Such adjusted distributions have been explored elsewhere under complete specification of the dependency structure between the test statistics being combined.42–44 However, how to model the dependency between the individual location and scale association test statistics at different loci across the genome remains to be explored.\nGiven the choices available for the individual location and scale tests and for the methods of combining information from these individual components for each variant, we recognize that no single method is most powerful in all circumstances.20,22 We showed that for normally distributed traits, the JLS-Fisher test statistic $W_F$ follows a $\\chi^2_4$ distribution under the null hypothesis, as long as the location test statistic is a function of the complete sufficient statistic (e.g., the linear regression t-statistic or the ANOVA F-statistic) and the distribution of the scale test statistic does not depend on the model parameters (e.g., Levene’s test or the F-test for equality of variances). In practice, when the normality assumption is violated or other JLS tests are preferred, permutation-based p value evaluation can be used, at increased computational cost.\nThe proposed framework can easily be extended to meta-analysis, in which the sample- or study-specific association test statistics or p values to be combined across samples are obtained from the JLS test instead of the typical location-only test. 
The sample-specific choices of the individual location and scale tests need not be identical across studies, as long as the p values of the JLS tests are valid within each study. However, the choice of optimal weights for individual samples requires further investigation. Analyzing imputed SNPs (explicitly incorporating the genotype probabilities) or correlated subjects (e.g., pedigree data) with the proposed JLS framework would require the development of appropriate scale-only testing methods; location-only methods for these more complex settings are already available.39,45–49\n\nWeb Resources\nThe URLs for data presented herein are as follows:\nGWAS data for the type 1 diabetes study, DCCT/EDIC, available through dbGaP, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v3.p1\nJLS analysis tool, http://www.utstat.toronto.edu/sun/\nOMIM, http://www.omim.org/\n\nSupplemental Data\nDocument S1. Figures S1–S5 and Tables S1–S8\nDocument S2. Article plus Supplemental Data\n\nAcknowledgments\nThe authors are grateful to the subjects in the CGS, French CF, and DCCT studies for their participation. The authors also thank Drs. Hughes Aschard and Peter Kraft for providing their simulation results; the CF Gene Modifier Consortium and Drs. Michael Knowles, Garry Cutting, and Mitchell Drumm for helpful discussions; and the Canadian and US CF Foundations for their generous support of the genotyping. This data resource was also supported in part by Genome Canada, through the Ontario Genomics Institute per research agreement 2004-OGI-3-05 (to P.R.D.), with the Ontario Research Fund, Research Excellence Program. This work was funded by the Canadian Institutes of Health Research (CIHR; 201309MOP-310732-G-CEAA-117978 to L.S. and MOP-258916 to L.J.S.); the Natural Sciences and Engineering Research Council of Canada (NSERC; 250053-2013 to L.S. 
and 371399-2009 to L.J.S.); Cystic Fibrosis Canada #2626 (to L.J.S.); Training grant GET-101831; Université Pierre et Marie Curie Paris; Agence Nationale de la Recherche (R09186DS to H.C.), Direction Générale de la Santé; Association Vaincre la Mucoviscidose, Chancellerie des Universités (Legs Poix); Association Agir Informer Contre la Mucoviscidose; and Groupement d’Intérêt Scientifique (GIS)–Institut des Maladies Rares. A.D.P. held a Canada Research Chair in the Genetics of Complex Diseases. D.S. is a trainee of CIHR STAGE (Strategic Training for Advanced Genetic Epidemiology) program at the University of Toronto. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).\nSupplemental Data include five figures and eight tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2015.05.015.","divisions":[{"label":"sec","span":{"begin":0,"end":14519}},{"label":"title","span":{"begin":0,"end":10}},{"label":"sec","span":{"begin":12,"end":2724}},{"label":"title","span":{"begin":12,"end":72}},{"label":"p","span":{"begin":73,"end":2724}},{"label":"label","span":{"begin":462,"end":464}},{"label":"p","span":{"begin":465,"end":772}},{"label":"label","span":{"begin":773,"end":775}},{"label":"p","span":{"begin":776,"end":1148}},{"label":"label","span":{"begin":1149,"end":1151}},{"label":"p","span":{"begin":1152,"end":1491}},{"label":"label","span":{"begin":1597,"end":1599}},{"label":"p","span":{"begin":1600,"end":2724}},{"label":"label","span":{"begin":1658,"end":1661}},{"label":"p","span":{"begin":1662,"end":2285}},{"label":"label","span":{"begin":2286,"end":2289}},{"label":"p","span":{"begin":2290,"end":2724}},{"label":"label","span":{"begin":2338,"end":2339}},{"label":"p","span":{"begin":2340,"end":2391}},{"label":"label","span":{"begin":2392,"end":2393}},{"label":"p","span":{"begin":2394,"end":2480}},{"label":"label","span":{"begin":2481,"end":2482}},{"label":"p","span":{"begin":2483,"end":
2662}},{"label":"label","span":{"begin":2663,"end":2664}},{"label":"p","span":{"begin":2665,"end":2724}},{"label":"sec","span":{"begin":2726,"end":6886}},{"label":"title","span":{"begin":2726,"end":2743}},{"label":"p","span":{"begin":2744,"end":2940}},{"label":"p","span":{"begin":2941,"end":3216}},{"label":"p","span":{"begin":3217,"end":3662}},{"label":"p","span":{"begin":3663,"end":4045}},{"label":"p","span":{"begin":4046,"end":4284}},{"label":"p","span":{"begin":4285,"end":4419}},{"label":"p","span":{"begin":4420,"end":5514}},{"label":"p","span":{"begin":5515,"end":6398}},{"label":"p","span":{"begin":6399,"end":6886}},{"label":"sec","span":{"begin":6888,"end":8540}},{"label":"title","span":{"begin":6888,"end":7015}},{"label":"p","span":{"begin":7016,"end":7380}},{"label":"p","span":{"begin":7381,"end":7516}},{"label":"p","span":{"begin":7517,"end":7667}},{"label":"p","span":{"begin":7668,"end":8146}},{"label":"p","span":{"begin":8147,"end":8540}},{"label":"sec","span":{"begin":8542,"end":14519}},{"label":"title","span":{"begin":8542,"end":8589}},{"label":"p","span":{"begin":8590,"end":9684}},{"label":"p","span":{"begin":9685,"end":10259}},{"label":"p","span":{"begin":10260,"end":11106}},{"label":"p","span":{"begin":11107,"end":11692}},{"label":"p","span":{"begin":11693,"end":12768}},{"label":"p","span":{"begin":12769,"end":13640}},{"label":"p","span":{"begin":13641,"end":14519}},{"label":"sec","span":{"begin":14521,"end":14826}},{"label":"title","span":{"begin":14521,"end":14534}},{"label":"p","span":{"begin":14535,"end":14826}},{"label":"p","span":{"begin":14585,"end":14745}},{"label":"p","span":{"begin":14746,"end":14799}},{"label":"p","span":{"begin":14800,"end":14826}},{"label":"sec","span":{"begin":14828,"end":14933}},{"label":"title","span":{"begin":14828,"end":14845}},{"label":"p","span":{"begin":14846,"end":14933}},{"label":"caption","span":{"begin":14846,"end":14889}},{"label":"title","span":{"begin":14846,"end":14889}},{"label":"caption","span":{"begin":
14890,"end":14933}},{"label":"title","span":{"begin":14890,"end":14933}},{"label":"ack","span":{"begin":14935,"end":16427}},{"label":"title","span":{"begin":14935,"end":14950}},{"label":"p","span":{"begin":14951,"end":16427}},{"label":"footnote","span":{"begin":16428,"end":16541}},{"label":"p","span":{"begin":16428,"end":16541}}],"tracks":[]}
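Step 1 of the implementation in Appendix A suggests an inverse normal transformation for a non-normal phenotype without fixing its exact form. The Python sketch below assumes the common rank-based version with Blom's offset (c = 3/8) and average ranks for ties; the function name inverse_normal_transform is hypothetical and is not taken from the authors' JLS analysis tool.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation (Blom offset c = 3/8 by default).

    Each value is replaced by Phi^{-1}((r - c) / (n - 2c + 1)), where r is its
    1-based rank; tied values share the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with the one at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

The transformed values preserve the ordering of the original phenotype, and with this offset they are symmetric around zero. Other offsets (e.g., c = 0.5) are also in use; the choice mainly affects the extreme tails.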
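For Step 2, the paper pairs linear regression (location) with Levene's test (scale). The sketch below computes both with the Python standard library only. The regression t-statistic follows the T_Location definition given before Lemma 1. The scale test is assumed to be the original mean-centred Levene statistic; the median-centred Brown-Forsythe variant is a common alternative. The t and F tail probabilities come from a textbook continued-fraction evaluation of the regularized incomplete beta function. All function names are hypothetical, and a vetted statistics library would normally be preferable in production.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def location_test(x, y):
    """Regression t-test of y on additively coded genotype x; returns (t, two-sided p)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    t = b1 / math.sqrt(s2 / s_xx)  # beta1_hat / (S / sqrt(S_xx))
    df = n - 2
    # P(|T_df| > |t|) = I_{df/(df + t^2)}(df/2, 1/2)
    return t, regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def levene_test(x, y):
    """Mean-centred Levene test across genotype groups; returns (W, p)."""
    groups = {}
    for g, yi in zip(x, y):
        groups.setdefault(g, []).append(yi)
    k, n = len(groups), len(y)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Absolute deviations from each genotype group's mean.
    z = {g: [abs(v - means[g]) for v in vals] for g, vals in groups.items()}
    z_bar_g = {g: sum(zs) / len(zs) for g, zs in z.items()}
    z_bar = sum(sum(zs) for zs in z.values()) / n
    between = sum(len(z[g]) * (z_bar_g[g] - z_bar) ** 2 for g in z)
    within = sum((zij - z_bar_g[g]) ** 2 for g in z for zij in z[g])
    w = (n - k) / (k - 1) * between / within
    d1, d2 = k - 1, n - k
    # P(F_{d1,d2} > w) = I_{d2/(d2 + d1 w)}(d2/2, d1/2)
    return w, regularized_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * w))
```

With genotypes coded 0, 1, 2, `location_test(g, y)[1]` and `levene_test(g, y)[1]` give the p values p_L and p_S that Step 3 combines. With three genotype groups the Levene statistic is referred to F with (2, n − 3) degrees of freedom, matching the F_{3−1, n−3} reference in the proof of Lemma 1.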
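Steps 3 and 4(a) combine the two p values with Fisher's method and refer W_F to a chi-square distribution with 4 degrees of freedom. For even degrees of freedom the upper tail has a closed form. With 4 degrees of freedom it is exp(−w/2)(1 + w/2), so no special-function library is needed. The helper null_rejection_rate is a hypothetical Monte Carlo check of the independence argument: when p_L and p_S are independent Uniform(0, 1) variables, the rejection rate should match the nominal level.

```python
import math
import random


def jls_fisher_statistic(p_location, p_scale):
    """W_F = -2 [log(p_L) + log(p_S)] for two p values in (0, 1]."""
    return -2.0 * (math.log(p_location) + math.log(p_scale))


def chi2_4_sf(w):
    """Upper tail P(X > w) for a chi-square variable with 4 degrees of freedom.

    For 2m degrees of freedom the survival function is
    exp(-w/2) * sum_{j<m} (w/2)^j / j!; with m = 2 this is exp(-w/2) * (1 + w/2).
    """
    half = w / 2.0
    return math.exp(-half) * (1.0 + half)


def jls_fisher_p_value(p_location, p_scale):
    """Asymptotic JLS-Fisher p value, valid when the two tests are independent under H0."""
    return chi2_4_sf(jls_fisher_statistic(p_location, p_scale))


def null_rejection_rate(alpha, n_rep=200_000, seed=1):
    """Fraction of null replicates with JLS-Fisher p <= alpha (should be close to alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # 1 - random() lies in (0, 1], so log() never receives zero.
        p_l, p_s = 1.0 - rng.random(), 1.0 - rng.random()
        if jls_fisher_p_value(p_l, p_s) <= alpha:
            hits += 1
    return hits / n_rep
```

For example, p_L = p_S = 0.05 gives W_F ≈ 11.98 and a JLS-Fisher p value of about 0.0175. Equivalently, the tail probability equals q(1 − ln q) with q = p_L · p_S.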
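Step 4(b) can be sketched generically. The routine below accepts any statistic that grows with the evidence for association (for example, W_F computed from a location and a scale test). It returns the proportion of phenotype permutations whose statistic strictly exceeds the observed one, following the wording of the step. The name permutation_p_value and its arguments are illustrative assumptions. As the step notes, permutation assumes unrelated subjects.

```python
import random


def permutation_p_value(genotypes, phenotype, statistic, n_perm=1000, seed=None):
    """Permutation p value: #{k : W^(k) > W_obs} / K, as in Step 4(b).

    statistic(genotypes, phenotype) must return a number that is larger for
    stronger evidence of association (e.g., the JLS-Fisher statistic W_F).
    The input phenotype list is not modified.
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, phenotype)
    shuffled = list(phenotype)  # work on a copy
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotypes to break any genotype link
        if statistic(genotypes, shuffled) > observed:
            exceed += 1
    return exceed / n_perm
```

On choosing K: the estimate is a binomial proportion, so its Monte Carlo standard error is roughly sqrt(p(1 − p)/K). Near p = 0.05, K = 1,900 replicates give a standard error of about 0.005. The step as written can return exactly 0; a common variant reports (count + 1)/(K + 1) instead.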
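The three simulation models can be reproduced in outline with the sketch below. It assumes Hardy-Weinberg genotypes with the stated MAF, independent binary exposures sharing one frequency, and standard normal residuals. The dictionary keys for the β coefficients and the function name simulate_sample are hypothetical. Only G and Y are returned, because E1 and E2 are treated as unobserved, matching the working model.

```python
import random


def simulate_sample(model, n, beta, maf=0.3, exposure_freq=0.3, seed=None):
    """Simulate (genotypes, phenotype) under Model 1, 2, or 3 with N(0, 1) residuals.

    beta: dict with any of the keys 'G', 'E1', 'E2', 'GE1', 'GE2' (missing keys = 0).
    E1 and E2 are generated but not returned, since they are unobserved.
    """
    rng = random.Random(seed)
    b = {key: beta.get(key, 0.0) for key in ('G', 'E1', 'E2', 'GE1', 'GE2')}
    genotypes, phenotype = [], []
    for _ in range(n):
        # Additive coding: count of minor alleles from two independent draws (HWE).
        g = (rng.random() < maf) + (rng.random() < maf)
        e1 = 1 if rng.random() < exposure_freq else 0
        e2 = 1 if rng.random() < exposure_freq else 0
        if model == 1:
            mean = b['G'] * g + b['E1'] * e1 + b['GE1'] * g * e1
        elif model == 2:
            mean = (b['E1'] * e1 + b['E2'] * e2
                    + b['GE1'] * g * e1 + b['GE2'] * g * e2)
        elif model == 3:
            mean = b['GE1'] * g * e1
        else:
            raise ValueError('model must be 1, 2, or 3')
        genotypes.append(g)
        phenotype.append(mean + rng.gauss(0.0, 1.0))
    return genotypes, phenotype
```

As a check on why a scale test helps, note that under Model 3 Var(Y | G = g) = 1 + β_GE1² g² f(1 − f), where f is the exposure frequency. With β_GE1 = 1 and f = 0.3, the phenotypic variance therefore rises from 1 at G = 0 to 1.21 at G = 1 and 1.84 at G = 2. This variance difference across genotype groups is what the scale component of the JLS test picks up.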
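The imputation-uncertainty sensitivity analysis can be sketched in three steps. First, draw posterior-like genotype probabilities from the Dirichlet model described above, with concentration a for the true genotype and (1 − a)/2 for each of the other two. Second, take the hard call as the most probable genotype. Third, estimate the incorrect call rate. Dirichlet draws are generated from normalized Gamma variates with Python's random.gammavariate. Treating a zero concentration (the a = 1 case) as a point mass at zero is our assumption, and all function names are hypothetical.

```python
import random


def dirichlet_genotype_probs(true_genotype, a, rng):
    """Genotype probabilities for categories 0, 1, 2 drawn from a Dirichlet model.

    Concentrations: a for the true genotype, (1 - a)/2 for each other category.
    Dirichlet draws are normalized Gamma(alpha_j, 1) variates; a zero
    concentration is treated as a point mass at 0 (assumption for a = 1).
    """
    alphas = [(1.0 - a) / 2.0] * 3
    alphas[true_genotype] = a
    draws = [rng.gammavariate(al, 1.0) if al > 0 else 0.0 for al in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def hard_call(probs):
    """Most-likely genotype: the category with the highest probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


def incorrect_call_rate(a, n=10_000, maf=0.3, seed=None):
    """Fraction of simulated subjects whose hard call differs from the true genotype."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        g = (rng.random() < maf) + (rng.random() < maf)  # true genotype under HWE
        if hard_call(dirichlet_genotype_probs(g, a, rng)) != g:
            wrong += 1
    return wrong / n
```

Evaluating incorrect_call_rate over the grid a = 1, 0.9, ..., 0.5 shows the hard-call error growing as a decreases. At a = 1 every call is correct.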
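The proof of Lemma 1 states two facts without derivation. The first is that the normal likelihood is a three-parameter exponential family with the listed natural parameters θ and sufficient statistics T = (Σ x_i y_i, Σ y_i², Σ y_i). The second is that T_Location depends on the data only through T. Expanding the squared residuals gives both. The x_i are treated as fixed and assumed not all equal, so the family is of full rank.

```latex
\begin{align*}
\sum_i (y_i - \beta_0 - \beta_1 x_i)^2
  &= \sum_i y_i^2 - 2\beta_0 \sum_i y_i - 2\beta_1 \sum_i x_i y_i
     + \sum_i (\beta_0 + \beta_1 x_i)^2, \\
\log f(y)
  &= \underbrace{\frac{\beta_1}{\sigma^2}}_{\theta_1} T_1
     + \underbrace{\Big(-\frac{1}{2\sigma^2}\Big)}_{\theta_2} T_2
     + \underbrace{\frac{\beta_0}{\sigma^2}}_{\theta_3} T_3 - A(\theta), \\
A(\theta)
  &= \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_i (\beta_0 + \beta_1 x_i)^2, \\
S_{xy} &= T_1 - \bar{x}\,T_3, \qquad S_{yy} = T_2 - T_3^2/n, \\
(n-2)\,S^2 &= S_{yy} - S_{xy}^2/S_{xx}, \qquad
T_{\text{Location}} = \frac{S_{xy}/S_{xx}}{\sqrt{S^2/S_{xx}}}.
\end{align*}
```

The natural parameter space (θ_1 and θ_3 real, θ_2 < 0) contains an open subset of R³, so T is complete. Levene's statistic is built from absolute deviations around the genotype-group means and scaled as an F ratio. It is therefore unchanged by shifts of the form β_0 + β_1 x and by rescaling σ, which is the ancillarity that Basu's theorem requires.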