Introduction
Identifying the genetic architecture of complex traits requires analytic strategies that move beyond single-variant association tests. Multivariate analyses such as gene-based, gene-by-gene interaction (GxG), gene-by-environment interaction (GxE), gene-set, and pathway analyses are now commonly implemented,1–5 yet, one rarely sees GxG or GxE explicitly accounted for within gene-set and pathway analyses.6 The specification of interacting variables that probably differ between genes in gene sets and pathways is not straightforward. Interacting exposure variables (termed E hereafter) could include contributing environmental factors or SNPs or haplotypes from the same region or at other susceptibility loci. Missing or incorrect information on interacting factors, as well as associated computational burden, might also limit more comprehensive surveys of the whole genome for disease association. Here we provide an easy-to-implement, straightforward solution to exploit potential GxG and GxE in gene-set and pathway analyses, and we illustrate the power of such an approach in a gene-modifier study of lung disease in cystic fibrosis (CF [MIM: 219700]).
CF is a life-limiting genetic disease, with the majority of mortality due to lung disease. CF is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR [MIM: 602421]). Individuals with the same loss-of-function CFTR mutations have variable disease severity across the lungs, intestine, and other CF-affected organs, suggesting that variation in other (modifier) genes might play a role in disease pathophysiology.7 A major source of pathophysiology in CF is impaired fluid and electrolyte flux in CF-affected organ epithelia. Sun et al.5 hypothesized that regulators of fluid, solute, and ion transport that physically co-localize with CFTR would play a role in determining CF disease severity. Using a composite tissue gene set that corresponds to 157 gene products localized to the apical plasma membrane of epithelia, a multivariate sum-test provided evidence that many of the constituent gene products contribute to early intestinal obstruction (called meconium ileus [MIM: 614665]) in CF. Meconium ileus is heritable (∼88% estimated heritability) and presents at birth in ∼15% of CF-affected individuals,8 with limited opportunity for environmental involvement.
Lung disease in CF is distinct, being progressive and notably affected by environmental exposures such as infection history, secondhand smoke, air pollution, and ambient air temperature,9–11 with a ∼50% estimated heritability.12 Similar to the intestinal epithelia, the apical membrane constituents of lung epithelia should likewise contribute to disease. Testing this, or any hypothesis in CF lung disease, however, has the added difficulty of being able to specify and collect accurate information on the contributing exposure variables and model the potential GxE. Therefore, an alternative association testing methodology is needed to account for the effects of interacting variables that are either unknown or unmeasured.
For a biallelic SNP, a genetic interaction (either GxG or GxE) on a quantitative phenotype will lead to differences in phenotypic distribution across the three genotypes, leading to differences in phenotypic variance (scale).13 In light of this, Levene’s scale test of equality of variance14 has been proposed as a method of prioritizing SNPs for subsequent GxG and GxE studies,13,15 in contrast to the standard mean (location) test (i.e., testing for phenotypic differences in mean across genotypes). The advantage of using variance testing to incorporate GxG or GxE is that exposures need not be specified, and the enormous multiple testing burden of formally examining all possible pair-wise interactions is removed. Variance-only testing, however, has limited power to detect SNPs displaying main effects only.
Focusing on single-SNP analysis, Aschard et al.16 proposed a test that compares the percentiles of phenotypic values between genotypes, capturing mean, variance, and other differences. Although this approach comprehensively evaluates the phenotypic distribution between genotypes, it sacrifices statistical power when a (approximately) normally distributed trait is sufficiently summarized by its mean and variance. Furthermore, the distribution test statistic requires computationally intensive permutation-based methods for p value estimation that are challenging to implement on a genome-wide scale.
Cao et al.17 considered a joint test of mean and variance differences by using a full likelihood approach based on linear regression models. The likelihood ratio test (LRT) statistic follows an asymptotic chi-square distribution under the null hypothesis. The LRT approach can increase power but is more sensitive to model assumptions such as normality. Therefore, Cao et al.17 proposed a parametric bootstrap method to calculate “honest” p values at the cost of computational efficiency. For both the LRT and distribution methods, implementation difficulties arise for multivariate (e.g., gene-based, gene-set, and pathway) analyses (see Appendix A).
Here we propose an easy-to-implement joint location-scale (JLS) testing framework that simultaneously tests the null hypothesis of equal mean and equal variance between genotypes, by aggregating association evidence from the individual location-only and scale-only tests, focusing on Fisher’s method of combining information (JLS-Fisher). The proposed method detects association in the presence of underlying genetic main and/or interaction effects, without specifying the interactors; it allows any type of individual location and scale tests to be combined, making it particularly useful for gene-based, gene-set, and pathway analyses.
Using two proof-of-principle examples in the single-variant testing setting, where GxE interactions have been shown to exist from previous studies, we demonstrate the power of the JLS method. The first example pertains to a locus that determines glycosylated hemoglobin (HbA1c) levels based on a genome-wide association study (GWAS) of type 1 diabetes (T1D [MIM: 222100]) complications.18 The second pertains to determinants of CF lung disease severity.19 For these examples, the known environmental interactors are not explicitly modeled, but we leverage their existence to identify genetic contributors to HbA1c and CF lung disease.
We then use the multivariate JLS method to identify and replicate an association between the apical gene set, and the SLC9A3 complex within the set, and lung disease severity in CF; this finding would have been missed by the traditional multivariate gene-set tests that do not account for (heterogeneous) GxG and GxE interactions. Through extensive simulation analyses, we demonstrate that the proposed JLS method has good type 1 error control with improved power compared to other testing options, and, in contrast to the recently proposed distribution16 and LRT joint testing methods,17 can be easily implemented in the context of gene-set and pathway analyses.

Material and Methods

Genetic Model and Notation
Let Y be a quantitative trait and G be the minor allele count for the SNP under investigation (G = 0, 1, or 2); the additive assumption is not critical to the method development. Let E be an exposure variable interacting with the genetic factor. This exposure E could reflect continuous or categorical measures of environmental or genetic background. The underlying true genetic model can include main effects of both G (βG) and E (βE) on Y, as well as the interaction effect (βGE):(Equation 1) Y∼β0+βGG+βEE+βGEGE+ε.We assume that the trait Y is normally distributed with unit variance conditional upon G and E, in other words, Var(Y |G = g, E = e) = 1 and ε∼N(0,1).
When considering only G, the working model would reduce to(Equation 2) Y∼β0+βGG+εG.Pare et al.13 showed that the conditional variance of Y conditional on G alone could be expressed as σG2 = Var(Y|G = g) = (βE+βGEg)2+1. Thus, if an interaction effect was present (i.e., βGE≠0), the trait variance would differ between genotypes.

Joint Location-Scale Testing Procedure for Single-SNP Analysis
Our proposed JLS testing framework, based on the working model of Equation 2, tests the following null hypothesis:H0joint:βG=0andσi=σjforalli≠j,i,j=0,1,2.The alternative hypothesis of interest isH1joint:βG≠0orσi≠σjforsomei≠j.For a SNP under study, different JLS test statistics can be considered. Let pL be the p value for the location test of choice (i.e., testing H0location:βG=0 using, for example, ordinary least-squares regression), and pS be the p value for the scale test of choice (i.e., testing H0scale:σi=σjforalli≠j using, for example, Levene’s test). We first consider Fisher’s method (JLS-Fisher) to combine the association evidence from the individual location and scale tests. The JLS-Fisher statistic is defined asWF=−2(log(pL)+log(pS)).
Large values of WF correspond to small values of pL and/or pS and provide evidence against the null H0joint. If pL and pS are independent under H0joint, WF is distributed as a χ42 random variable. Although Fisher’s method here is used to combine evidence from two tests applied to the same sample, the assumption of independence between pL and pS under H0joint holds theoretically for a normally distributed trait (Appendix A, Lemma 1), as well as empirically for approximately normally distributed traits in finite samples (Figures S1 and S2, Tables S1 and S2).
One can also consider the minimum p value (JLS-minP) approach, or various alternatives based on combining the individual test statistics themselves with or without weights.20–23 The JLS-minP statistic is defined asWM=min(pL,pS).If pL and pS are independent under H0joint, WM is distributed as a Beta random variable (with shape parameters 1 and 2) where small values of WM correspond to small values of pL and/or pS and evidence against the null.

Joint Location-Scale Testing Procedure for Gene-Set Analysis
The chosen JLS test statistic (e.g., WF) for single-SNP analysis can then be used for implementing gene-based, gene-set, or pathway analysis in a direct fashion.
Assume that J SNPs have been annotated to a gene or gene-set of interest. For each SNP j, the JLS-Fisher test statistic (e.g., WF,j) is first obtained and then the association evidence can be aggregated across the SNPs by considering, for example, the sum statistic,5 ∑jWF,j. To account for LD between SNPs, the overall association evidence can be evaluated by a phenotype-permutation approach where the empirical p value is the proportion of K permutation replicates with sum statistics more extreme than the observed value. Because this multivariate method analyzes all J SNPs simultaneously, the number of permutations need not be exceedingly large and K = 10,000 provides accurate estimates for p values in the range of 0.05. If multiple genes or gene-sets are of interest, more replicates would be required to adjust for the corresponding number of hypothesis tests.
To compare strength of association evidence between sets of variants within the same gene or across different genes, an extension of the gene-set approach can be implemented. Sum statistics are obtained as previously described for each group of variants, then calibrated by the respective number of variants. The difference between the two proportional sum statistics is the test statistic of interest,DF=1J∑jWF,j−1K∑kWF,k,where the j = 1,…,J and k = 1,…,K subscripts index the competing sets of variants, and the F subscript indicates that the variant-specific joint location-scale statistics are obtained by Fisher’s method (although other methods of combining such as minP could be used as well). The significance of DF can be evaluated with the phenotype-permutation approach as described above.
In all applications, genotypes were coded additively (G = 0, 1 or 2), and for X chromosome SNPs, female and male genotypes were analyzed together and coded as G = 0, 1, or 2 and G = 0 or 2, respectively.

Application Data: HbA1c Levels in Type 1 Diabetes and Cystic Fibrosis Lung Disease
We tested the proposed JLS approach with applications to genetic association studies of HbA1c levels in type 1 diabetes and lung disease in cystic fibrosis.
The T1D application used the Diabetes Control and Complications Trial (DCCT) sample in which 667 individuals were conventionally treated and 637 intensively treated.18 In this sample, an earlier GWAS of HbA1c levels in type 1 diabetes18 showed that rs1358030 near SORCS1 (10q25.1 [MIM: 606283]) interacts with treatment type (conventional versus intensive) on HbA1c levels (quarterly measured values spanning 6.5 years). To demonstrate that the JLS testing framework could leverage the interaction effect without knowledge of the interacting variable, we analyzed the association of rs1358030 with the average, inverse normal transformed HbA1c value, assuming the treatment variable was not available.
The CF application involves association studies of an averaged lung function measure, forced expiratory volume in 1 s, adjusted for sex, age, height, and mortality, and normalized (SaKnorm),24 using data from the Canadian Cystic Fibrosis Gene Modifier Study (CGS).25 To reduce the duration of heterogeneous environmental exposures that were not measured, and recognizing that age can serve as a surrogate for these exposures, we19 previously restricted our (location-only) analysis to lung function measures from pediatric ages (<18 years, n = 815 subjects from 753 unique families), analyzing eight SNPs in three genes, SLC9A3 (MIM: 182307), SLC6A14 (MIM: 300444), and SLC26A9 (MIM: 608481), previously identified as associated with meconium ileus in a hypothesis-driven GWAS (GWAS-HD).5 This approach involved a 46% reduction in sample size, and that the age restriction be fixed for all variants, despite the possibility that the optimal exposure (here, age) is gene specific. Here we re-analyzed the SNPs with the individual location-only and scale-only tests, as well as the joint JLS-Fisher, JLS-minP, LRT,17 and distribution16 tests, removing the age restriction and using the full CGS sample (n = 1,409 unrelated subjects). For comparison, the location-only analyses restricted to the pediatric population using different age cut-off points were also investigated.
With the full CGS sample, we further tested the hypothesis that multiple proteins present on the apical plasma membrane contribute to lung disease severity as measured by SakNorm; the hypothesis was considered previously for meconium ileus susceptibility in CF.5 In total, 3,814 GWAS SNPs (MAF > 0.02) were annotated to within ±10 kb of 155 apical genes obtained from the Gene Ontology project.5 The JLS-Fisher test was first applied to each SNP, and the SNP-specific test statistic was then aggregated across all SNPs to perform the multivariate apical gene-set analysis. We then used an independent French sample (n = 1,232) for replication. Imputation based on 1000 Genomes26,27 (as outlined in the Online Methods of Sun et al.5) was used to assess regional association within the SLC9A3R1, SLC9A3R2, and EZR binding sites of SLC9A3.
Institutional review committees at all participating institutions for the DCCT T1D study as well as all Canadian CF clinics approved this study. Consent was also obtained for participants from France with procedural approval (CPP 2004/15) and information collection approval by CNIL (04.404). Data collection, genotyping, and quality-control procedures are reported elsewhere for the T1D18 and CF (Canadian and French) studies.5,25

JLS Method Evaluation by Simulation
We conducted extensive simulation analyses to evaluate the performance of the proposed JLS-Fisher and JLS-minP tests for single-variant analysis and compared them with the individual location-only and scale-only tests, as well as the distribution test16 and the LRT.17 We also conducted a sensitivity analysis on the various tests, studying the impact of poorly captured genotypes that can be expected from imputed datasets. Simulation details are provided in Appendix A.

Results

Type 1 Diabetes Complications
Traditional analysis using the individual location-only test did not yield a genome-wide significant association of p < 5 × 10−8 (as previously proposed by Dudbridge and Gusnanto28) between rs1358030 and HbA1c (p = 2.3 × 10−7), whereas the JLS-Fisher method did (p = 4.9 × 10−8). Genome-wide significance was also not achieved by the scale-only test (p = 0.01), the distribution test (p = 1.7 × 10−7, estimated from 108 permutation replicates), the LRT (p = 2.0 × 10−7), or the JLS-minP test (p = 4.6 × 10−7).
Knowing the treatment information, Paterson et al.18 was able to explicitly model this interactor and identified rs1358030 as genome-wide significantly associated with HbA1c levels and interacting with treatment type (p = 3.8 × 10−10 and p = 0.013 for the main and interaction effect, respectively).

Cystic Fibrosis Lung Disease: Single-Variant Association Analysis
We first noted in Table 1 that all methods (the joint and individual tests using the full sample or various sub-samples) consistently highlighted that variants in both SLC9A3 and SLC6A14 are associated with the lung phenotype, but not those in SLC26A9. For the individual location-only test, although the age cut-off of <18 years appears to be ideal for rs17563161 in SLC9A3 resulting in the smallest p value (p = 3.3 × 10−5) even with the smaller sample size (n = 753), a different cut-off (<20 years, n = 830) yields the most significant location-only test result for rs3788766 in SLC6A14 (p = 0.0002). This illustrates the challenge of specifying and modeling interacting exposures in the context of multiple SNPs and genes of interest.
With the entire sample of CF-affected individuals (n = 1,409) encompassing a wider age range and thus greater variability of environmental exposures, we indeed observed evidence for scale differences in the lung phenotype (Levene’s test p = 0.01–0.08) for variants in SLC9A3 and SLC6A14, revealing the possibility of GxG or GxE interaction. (Table S3 provides evidence for SNP-by-age interaction effects from regression models directly incorporating age.) Compared with the distribution and LRT joint tests, the JLS-Fisher test consistently provides the smallest p values for variants in SLC9A3 and SLC6A14.

Cystic Fibrosis Lung Disease: Gene-Set Association Analysis
We observed evidence of an association between the apical gene set and SakNorm (JLS-Fisher permutation p = 0.0099; Figures 1A and 1B and Table 2). Comparison with the individual location-only and scale-only tests showed that this association does not reach statistical significance using the conventional gene-set location test (regression permutation p = 0.0876; Figures 1C and 1D) whereas the variance component contributed a significant result (Levene’s permutation p = 0.0222; Figures 1E and 1F).
Examination of the SNP- and gene-specific JLS-Fisher p values and rankings showed SLC9A3 to be the top ranked (Table S4), with three of the top four ranked SNPs being annotated to SLC9A3. This provided consistent support for the relationship between SLC9A3 and lung function as previously reported (Table 1).19,29 In the independent French sample (n = 1,232), we observed replication of the apical hypothesis via the JLS-Fisher test (permutation p = 0.0180; Figures 2A and 2B and Table 2). Again, the standard location testing approach, by itself, was not powerful enough to detect the association (regression permutation p = 0.2058; Figures 2C and 2D), and the added contribution from the scale-testing component was beneficial (Levene’s permutation p = 0.0077; Figures 2E and 2F). After excluding all ten genotyped SNPs annotated to SLC9A3, the apical gene-set test in both the CGS and French samples remained significant (Table 2), suggesting that the JLS-Fisher method identified multiple additional associations within the gene set, beyond the known SLC9A3 contribution.
Three of the five top-ranked genes in the apical gene set were SLC9A3, SLC9A3R2 (MIM: 606553), and EZR (MIM: 123900), with ten genes in total displaying JLS-Fisher p < 0.05 (Table S4). Of these top five, protein product interactions are known between (1) SLC9A3 (also known as NHE3) and SLC9A3R2 (“SLC9A3 regulator 2,” also known as E3KARP or NHERF2), (2) SLC9A3 and EZR, and (3) SLC9A3R2 and EZR. SLC9A3R1, or “SLC9A3 regulator 1” (also known as EBP50 or NHERF1), is recognized as a paralog of SLC9A3R2 with comparable binding sites to both SLC9A3 and EZR (reviewed in Donowitz and Li30); SLC9A3R1 (MIM: 604990) was the 21st ranked. Based on interaction investigations typically involving intestinal and kidney tissues, a current paradigm is that EZR provides anchorages for the SLC9A3 regulators to the actin cytoskeleton, and probably also facilitates early trafficking of SLC9A3 from the Golgi to the cell periphery, with eventual “hand-off” to the SLC9A3 regulators. The SLC9A3 regulators help to maintain SLC9A3 at the apical membrane and facilitate its dynamic shuffling with endosomes and internal vesicles.30
Notably, variants within the SLC9A3 complex component genes are significantly associated as a set in the CGS sample using the JLS-Fisher or regression tests (p < 0.0001 and p < 0.0001, respectively; Table 2). The French sample provided replicated support for the SLC9A3 complex using the JLS-Fisher test (p = 0.0415), although in this latter sample, SLC9A3R1 is more highly ranked than SLC9A3R2 and EZR. Removal of the four-gene set, SLC9A3, SLC9A3R1/2, and EZR, and the re-testing of the apical gene set with the remaining 151 genes suggests that association(s) beyond the SLC9A3 regulator complexes also exist (JLS-Fisher p = 0.0329 and 0.0201 in the CGS and French samples, respectively).
SLC9A3 contains multi-membrane spanning motifs in its amino portion to facilitate transport function, with a large cytosolic carboxyl portion that affords regulation with binding sites for multiple interactors. Although there is significant LD that extends throughout the gene (Figure 3), there was evidence suggesting greater association for variants as a group from the region corresponding to the regulatory portion, compared to the transporting portion (regression, Levene’s, and JLS-Fisher permutation p = 0.0391, 0.312, and 0.1254, respectively). Amino acids 586–660 of SLC9A3 bind the SLC9A3 regulators where an exonic nucleotide variant (rs2230437) is associated with lung function (JLS-Fisher p = 7.6 × 10−6). This synonymous change is in high LD (r2 > 0.8) with four other variants, all with similar association evidence (rs11743825, rs41282625, 5:475625:GC_G, and rs11745923 with JLS-Fisher p = 1.4 × 10−6, 2.2 × 10−6, 2.3 × 10−6, and 2.7 × 10−6, respectively), and all in noncoding positions. There are also no associated coding variants in the major EZR binding site from amino acids 519–595. Similarly, there are no coding variants in the respective PDZ domains of the SLC9A3 regulators or in the FERM domain of EZR that bind SLC9A3. The imputation analysis would have captured the major variation in this gene. Collectively, because the constituents or their direct physical interactions would not appear to be affected, disturbed expression with altered stoichiometry of components or dynamic positioning of the SLC9A3 complex might be contributing to lung disease severity.

Simulation
Results showed that, under the normal model and in most scenarios considered, the LRT and JLS-Fisher methods had similar power and were more powerful than the JLS-minP and distribution tests (Figures 4 and S3) and the individual location-only and scale-only tests (Figures S4 and S5).
Investigation of type 1 error (100,000 replicates) under the normal model showed that all individual and joint tests considered here maintained the nominal error rates at the 0.05, 0.005, and 0.0005 levels when MAF was at least 0.1 and there were ≥ 20 individuals within each genotype group (Tables S1 and S2). However, as MAF or the smallest genotype group size decreased, the LRT method demonstrated inflated type 1 error (Tables S1 and S2). Departure from normality also resulted in LRT having inflated type 1 error, as previously discussed in Cao et al.17
To study the type 1 error at the genome-wide level, we conducted association analysis of all 866,995 GWAS SNPs available in the T1D HbA1c example18 with permuted phenotype. Results showed that all joint testing methods provide correct type 1 error control in a finite sample (Figures S1A, S1D, and S1G, respectively, for the JLS-Fisher, JLS-minP, and LRT methods; the distribution test by design has the correct type 1 error control). However, when stratified by minor allele frequency (MAF), we observed that, whereas the JLS-Fisher and JLS-minP tests are slightly conservative for SNPs with MAF < 0.1 (Figures S1C and S1F), the LRT method has increased type 1 error rate (Figure S1I) and would probably not be appropriate when the size of the genotype group is limited (n < 20). Permutation analysis of all of the 565,884 GWAS SNPs available in the CF application example5 provided type 1 error results similar to those of the T1D HbA1c example (Figure S2).
For the sensitivity analysis, we observed that the individual location and scale tests and the JLS-Fisher and JLS-minP tests were not affected by poorly captured genotypes; they maintained correct type 1 error level irrespective of the proportion of incorrectly assigned genotypes (Table S5).

Discussion
Interactions between genes in gene-set and pathway analyses are assumed to exist, but they are, in general, not accounted for in current genetic analysis methods. Here we proposed a joint location-scale (mean-variance) testing method to account for these relationships as well as main effects, while avoiding explicit specification of the contributing interactions. We used this method to identify the apical membrane constituent gene set as contributing to CF lung disease and replicated this association in an independent sample of similarly ascertained CF-affected subjects. Further, we illustrated how the gene-set association notably benefits from consideration of phenotypic variance differences that might have been induced by GxG and/or GxE relationships. This apical gene-set association was not initially detected via the classical location-only testing as used in a typical GWAS and provides insight into interacting genes and exposure variables affecting lung function.
SNPs in SLC9A3 have been reported to associate with CF lung disease in two candidate gene studies.19,29 However, a broader role for the SLC9A3/SLC9A3R2/EZR complex was not recognized. The JLS-Fisher test ranked these players as three of the top five contributing apical plasma membrane genes. Variation at SLC9A3R2 or EZR loci and possibly also SLC9A3R1 given its paralogous function (and supporting evidence in the French sample) could be expected to influence CF lung disease severity, presumably by affecting changes in net SLC9A3 activity. For example, modest changes in levels of either SLC9A3R2 or EZR could alter functional apical membrane SLC9A3 levels, just as SLC9A3 locus variants might alter the amount of SLC9A3 transcripts. It should also be noted that the SLC9A3 regulators, with their proximal localization at the apical membrane, could contribute a tethering function for a number of other apical membrane proteins, including CFTR.31 Given loss of function in CF, SLC9A3 regulators might not be mediating their effects via CFTR; however, it is possible that some of the remaining significant apical membrane components, beyond SLC9A3, utilize the EZR and/or the SLC9A3 regulator scaffold.32,33
We demonstrated the potential advantage of using the JLS framework over the conventional location-only or scale-only approach in practice. As proof of principle, we used examples from an association study of a complex trait (glycemic control in type 1 diabetes) and a modifier study of a Mendelian disease (lung function in CF). Both of these datasets provided examples of a genetic effect on a quantitative phenotype that varied based on another variable (i.e., treatment regimen and age, respectively). Our results showed that if the interacting exposure(s) were not known or available, the JLS method could have improved power over conventional approaches in detecting the association.
Recent literature cautions the investigation and interpretation of significant differences in phenotypic variability across genotypes as well as observed GxE and GxG interaction effects.17,34–37 First, variance heterogeneity across genotypes can result from a mean-variance relationship induced by an inappropriate choice of measurement scale for the phenotype under investigation. Second, dependence between a causal locus and an exposure (G or E) can create synthetic interactions between the exposure and other tagged SNPs in incomplete LD with the causal locus.36,37 Consequently, we might observe phenotypic variance heterogeneity across genotypes of a tagged SNP.17 In these scenarios, however, because there is a true underlying genotype-phenotype relationship (main effect of G), it does not matter whether we detect this association through a location or a scale test, or both; the goal of our approach is to leverage complex genetic architecture to efficiently identify the associated G for follow-up investigation of causal interactors.
Challenges in understanding and being able to replicate original findings, particularly in the GxG, GxE, gene-set, and pathway analysis setting, is well documented.38 Given the potential heterogeneity between the Canadian discovery sample and the French replication sample (e.g., clinical treatment, climate, and air quality), interaction models could differ between the two samples. Therefore, even if the exposure variables were precisely measured, a significant multivariate interaction effect found in the discovery sample may not be observed in the replication sample. The proposed method allows different exposure variables between genes and between samples, as long as the corresponding underlying phenotype-genotype association mechanism results in phenotypic mean and/or variance differences between genotypes of the variants of interest. A step-by-step guide for application of the JLS framework, and a discussion of its advantages compared to alternative approaches, is included as additional considerations in Appendix A.
In conclusion, we have provided a robust joint location-scale (JLS) testing framework for the detection of single-variant, gene-set, or pathway associations involving either main or interaction effects, or both. This methodology applies to analysis of a quantitative trait, regardless of the biological interpretation of the chosen scale of phenotypic measurement. Application of the JLS approach identified the SLC9A3 regulatory complex as an important contributor to CF lung disease, which was completely missed in previous analyses of the same data. The JLS association testing method should be considered for future gene-set, pathway, and whole-genome association scans. Further, JLS should be employed to re-examine datasets previously analyzed via the conventional location-only or scale-only testing approaches, for complex traits or complex secondary phenotypes of Mendelian diseases. The method will help researchers pinpoint susceptibility loci for additional analysis toward the understanding of the genetic architecture of a complex trait.