Materials and Methods Animals and tissue collection. C57BL/6J ApoE−/− (B6.ApoE−/−) were purchased from Jackson Laboratory (Bar Harbor, Maine, United States). C3H/HeJ ApoE−/− (C3H.ApoE−/−) were generated by backcrossing B6.ApoE−/− to C3H for ten generations. F1 mice were generated from reciprocal intercrossing between B6.ApoE−/− and C3H.ApoE−/−, and F2 mice were subsequently bred by intercrossing F1 mice. A total of 334 mice (169 female, 165 male) were produced. All mice were fed Purina Chow containing 4% fat until 8 wk of age and then transferred to a “Western” diet containing 42% fat and 0.15% cholesterol for 16 wk. Mice were sacrificed at 24 wk of age. At death, livers were immediately collected and flash-frozen in liquid N2, and gonadal fat pads were extracted and weighed. RNA sample preparation, microarray hybridization, and expression analysis. RNA preparation and array hybridizations were performed at Rosetta Inpharmatics (Seattle, Washington, United States). The custom ink-jet microarrays used in this study (Agilent Technologies, previously described [21,34]) contain 2,186 control probes and 23,574 noncontrol oligonucleotides extracted from mouse UniGene clusters and combined with RefSeq sequences and RIKEN full-length clones. Mouse livers were homogenized and total RNA extracted using TRIzol reagent (Invitrogen, Carlsbad, California, United States) according to manufacturer's protocol. Three micrograms of total RNA was reverse transcribed and labeled with either Cy3 or Cy5 fluorochrome. Purified Cy3 or Cy5 complementary RNA was hybridized to at least two microarray slides with fluor reversal for 24 h in a hybridization chamber, washed, and scanned using a laser confocal scanner. Arrays were quantified on the basis of spot intensity relative to background, adjusted for experimental variation between arrays using average intensity over multiple channels, and fitted to an error model to determine significance (type I error). Gene expression is reported as the mlratio relative to the pool derived from 150 mice randomly selected from the F2 population. For subsequent analyses, mlratio data are assumed to be normally distributed, a valid assumption as previously demonstrated [21,30]. The error model used to assess whether a given gene is significantly differentially expressed in a single sample relative to a pool comprised of a randomly selected subset of 150 samples has been extensively described and tested in a number of publications [35,36]. Genotyping and linkage statistics. Genomic DNA was isolated from kidney by phenol-chloroform extraction. An examination of existing databases identified over 1,300 SNPs that showed variation between the B6 and C3H strains, and a complete linkage map for all 19 autosomes was constructed using 1,032 of these SNPs at an average density of 1.5 cM. Genotyping was conducted by ParAllele (South San Francisco, California, United States) using the molecular-inversion probe (MIB) multiplex technique [37]. Testing for linkage of both clinical traits and gene expression (using mlratio) was conducted using a linear model. Consider a phenotype denoted by y. The linear model that relates variation in y to QTLs and other covariates (e.g., sex) is given by the general form where μ is the trait mean, X′ a vector of covariates, β being the associated vector of regression coefficients, and e the residual error. Linkage was computed for over 20 clinical traits as well as 23,574 liver transcripts. Standard genome scans calculate linkage by comparing the linear model to the null model where β1 and β2 are the regression coefficients of the additive and dominant parameters, respectively. The LOD score represents the difference in the log10 of the likelihood of the above two equations, where the individual model likelihoods are maximized with respect to the model parameters, given the marker genotype and phenotype data. If a trait y differs on average between the two sexes but the QTL has the same effect in both males and females, we can model this interaction by including sex as an additive covariate in the above models, resulting in the new model which is then compared to the null model where β3 is the regression coefficient of the sex parameter. The effect of a QTL may be dependent on the state of a covariate; for instance, a QTL may have an effect specific to one sex, or may have opposite effects in the two sexes. This interaction can be modeled using a full model, which accounts for all additive covariates, as well as interactions between the covariates: which is compared to the above null model (Equation 5). The full model (Equation 6) allows us to model all heritable and sex-specific interactions in a single equation and maximally powers us to detect significant QTLs when the sex and sex-interaction terms are significant, given all 334 animals are included in the analysis. Furthermore, using the full model obviates the need for modeling QTLs in one sex only, a procedure that could decrease our power by halving the sample size, rendering it impossible to detect interactions between QTLs and sex. However, because the full model contains two more parameters than the model that treats sex as an additive covariate (Equation 4), the LOD score threshold for significant linkage is higher (using the convention of Lander and Kruglyak [20]), with QTL-specific significance levels of 2 × 10−3 and 5 × 10−5 equivalent to LOD scores of 4.0 (suggestive linkage) and 5.4 (significant linkage, equivalent to genome-wide p < 0.05), respectively. As a result, if there are no significant sex-additive or sex-dominant interactions, the full model will actually reduce power to detect linkage. To minimize the loss in power of fitting the full model when the sex-interaction terms are not significant, we employed a model selection procedure that introduces sex-interaction terms only if they add significantly to the overall QTL model. The model selection procedure makes use of forward stepwise regression techniques to determine whether it is beneficial to include the sex-interaction terms, conditional on realizing a significant additive effect (p < 0.001). That is, the data are fitted to Equation 4 for a given marker, and if the add term is significant at the 0.001 significance level, then we attempt to add the sex*add term into the model. The sex*add term is retained in the model if the Bayesian information criterion (BIC) for this model is smaller than the BIC for Equation 4. If sex*add is included in the model as a result of this procedure, then we again use BIC to consider including the sex*dom term in the model. To determine which of the four models (Equations 2, 4, 6, and the model selection) is optimal, we empirically estimated the FDR for each model over a set of 3,000 genes randomly selected from the set of all genes detected as expressed in the liver samples. For each gene and for each marker we fitted each of the four models to the data. We performed this same analysis on ten separate permutation sets in which each of the 3,000 gene expression trait vectors was permuted such that the correlation structure among the genes was preserved. The FDR for a given LOD score threshold was then computed as the ratio of the mean number of QTL detected in the permuted datasets (the mean taken over the ten permuted datasets) and the total number of QTL detected in the observed 3,000-gene dataset. We then generated ROC-like curves by varying the LOD score threshold, resulting in a range of FDRs (from 0% to more than 50%). To simplify this type of summary, we considered no more than one QTL per chromosome per gene expression trait by considering only the QTL with the max LOD score on each chromosome for each trait. The ROC curves for each of the four models are shown in Figure S1, with the black curve corresponding to Equation 2, the blue curve corresponding to Equation 4, the red curve corresponding to Equation 6, and the green curve corresponding to the model selection procedure. It is clear from the ROC curves that the sex and sex-interaction terms are significant in the gene expression dataset. For example, at an FDR of 5% we note that 800 QTLs were detected in the set of 3,000 genes for model 1, while 968 (21% increase) and 1,159 (45% increase) QTLs were detected for Equations 4 and 6, respectively. These results demonstrate significantly increased power to detect QTLs when sex is taken into account. We further note that the stepwise selection procedure captures more information than Equation 4, indicating a significant interaction signature in this dataset and also demonstrating that this simple statistical procedure is capable of identifying significant interaction events even at conservative FDR thresholds. Finally, it is of particular note that Equation 4 performed better than Equation 6, the model that incorporated the interaction terms at all times. That is, despite there being a significant interaction signature, the signature was not large enough to justify including interaction terms for every expression trait and at every marker tested. This fact motivated the need to employ the forward regression procedure, and these results further motivate the need to explore sex effects by employing even more sophisticated QTL detection methods, such as that recently described by Yi et al. [38]. Additional statistics. For each 23,574 oligonucleotides represented on the array, we computed a linear regression analysis to test for a correlation between the trait “gonadal fat mass” and each transcript. Similar to our method of calculating linkage, we employed a stepwise linear regression procedure using equations of the form or compared to the null model where β0 is the intercept, and β1, β2, and β3 represent the regression coefficients of their respective terms. As before, the parameter “sex*gene” is only retained if it significantly improves the fit of the model. The p-value threshold for significant correlation is calculated by an F test, which compares the appropriate model (Equation 7 or 8) to the null model (Equation 9). As before, multiple testing was addressed with use of the FDR, ranking the p-values obtained from the above F tests and setting α = 0.01. Genes correlated with the gonadal fat mass trait generated several significant eQTLs. In order to determine if eQTLs generated by these genes were enriched in any locus or if they were distributed randomly, we compared the distribution of these eQTLs against the distribution of all liver eQTLs in overlapping 6-cM bins across the genome using the Fisher exact test. This test is based on exact probabilities from a specific distribution (hypergeometric distribution). p-Values obtained from this test were corrected for multiple comparisons using a simple Bonferonni correction (given that we performed 772 tests across the genome, 0.05/772). Loci with p < 6.5 × 10−5 by Fisher exact test were considered significantly enriched for eQTLs correlated with gonadal fat mass.