American_Journal

PMC:5993513

Landscape of Conditional eQTL in Dorsolateral Prefrontal Cortex and Co-localization with Schizophrenia GWAS
Abstract
Causal genes and variants within genome-wide association study (GWAS) loci can be identified by integrating GWAS statistics with expression quantitative trait loci (eQTL) and determining which variants underlie both GWAS and eQTL signals. Most analyses, however, consider only the marginal eQTL signal, rather than dissect this signal into multiple conditionally independent signals for each gene. Here we show that analyzing conditional eQTL signatures, which could be important under specific cellular or temporal contexts, leads to improved fine mapping of GWAS associations. Using genotypes and gene expression levels from post-mortem human brain samples (n = 467) reported by the CommonMind Consortium (CMC), we find that conditional eQTL are widespread; 63% of genes with primary eQTL also have conditional eQTL. In addition, genomic features associated with conditional eQTL are consistent with context-specific (e.g., tissue-, cell type-, or developmental time point-specific) regulation of gene expression. Integrating the 2014 Psychiatric Genomics Consortium schizophrenia (SCZ) GWAS and CMC primary and conditional eQTL data reveals 40 loci with strong evidence for co-localization (posterior probability > 0.8), including six loci with co-localization of conditional eQTL. Our co-localization analyses support previously reported genes, identify novel genes associated with schizophrenia risk, and provide specific hypotheses for their functional follow-up.
Introduction
Significant advances in understanding the genetic architecture of schizophrenia (MIM: 181500) have occurred within the last 10 years. However, for common variants identified in genome-wide association studies (GWASs), the success in locus identification is not yet matched by an understanding of their underlying basic mechanism or effect on pathophysiology.
Expression quantitative trait loci (eQTL), which are responsible for a significant proportion of variation in gene expression, could serve as a link between the numerous non-coding genetic associations that have been identified in GWASs and susceptibility to common diseases directly through their association with gene expression regulation.1, 2, 3, 4 Accordingly, results from eQTL mapping studies have been successfully utilized to identify genes and causal variants from GWASs for various complex phenotypes, including asthma (MIM: 600807), body mass index (MIM: 601665), celiac disease (MIM: 212750), and Crohn disease (MIM: 266600).5, 6, 7, 8 Studies integrating eQTL and GWAS data have almost exclusively used marginal association statistics which typically represent the primary, or most significant, eQTL signal when assessing co-localization with GWASs, ignoring other SNPs that affect expression independently of the primary eQTL for a given gene. However, recent findings indicating that conditionally independent eQTL are widespread9, 10, 11, 12 motivate examination of the extent to which considering conditional eQTL may provide additional power to identify likely causal genes in a GWAS locus. Recent reports provide evidence that conditional eQTL are less frequently shared across tissues than primary eQTL10 and, like tissue- and cell type-specific eQTL, are often found more distally to the genes they regulate.10, 13, 14 These lines of evidence suggest that conditionally independent eQTL may contribute to tissue-specific or other context-specific gene regulation (e.g., specific to a particular cell type, developmental stage, or stimulation condition). One mechanism by which disease risk could potentially be mediated by a conditional eQTL is the disruption of a tissue-specific enhancer by a given variant, leading to the dysregulation of the relevant eGene in only the tissue for which the enhancer is specific. 
For example, an eQTL affecting Parkinson disease risk through expression of SNCA was recently shown to act through the disruption of an enhancer;15 if this enhancer is specific to a disease-relevant cell type, such as nerve cells of the substantia nigra, then it could manifest as a conditional eQTL since it would be only partially represented in brain homogenate. Here, we leveraged genotype and dorsolateral prefrontal cortex (DLPFC) expression data provided by the CommonMind Consortium (CMC) to elucidate the role of conditional eQTL in the etiology of schizophrenia (SCZ). Currently comprising the largest existing postmortem brain genomic resource at nearly 600 samples, the CMC is generating and making publicly available an unprecedented array of functional genomic data, including gene expression (RNA sequencing), histone modification (chromatin immunoprecipitation [ChIP-seq]), and SNP genotypes, from individuals with psychiatric disorders as well as unaffected controls.16 We utilized SNP dosage and RNA-sequencing (RNA-seq) data from the CMC to identify primary and conditionally independent eQTL. We then characterized the resulting eQTL on various genomic attributes including distance to transcription start site and their genes’ specificities across tissues, cell types, and developmental periods. In addition, we quantified enrichment of primary and conditional eQTL in promoter and enhancer functional genomic elements inferred from epigenomic data. Finally, we isolated each independent eQTL signal by conducting a series of “all-but-one” conditional analyses for genes with multiple independent eQTL and then assessed the overlap between all eQTL association signals and the schizophrenia GWAS signals. 
Material and Methods
CommonMind Consortium Data
We used pre-QC’ed genotype and expression data from the CommonMind Consortium, and detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al.16 Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10⁻⁵. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790,17 and variants were imputed in 5 Mb segments with Impute v2.3.118 using the 1000 Genomes Phase 1 integrated reference panel,19 excluding singleton variants. After phasing and imputation, variants with INFO < 0.8 or MAF < 0.05 were filtered out, leaving approximately 6.4 million markers for analysis. Gene expression was assayed via RNA-seq using 100 base pair paired-end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes was considered for analysis. The expression data was voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA).20 After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having fewer than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses.
These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects.
eQTL Identification
An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we conducted a forward stepwise conditional analysis implemented in MatrixEQTL21 using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criterion in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. Following eQTL identification, only autosomal eQTL were retained for downstream analyses.
Replication in Independent Datasets
Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP22 RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication.
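The forward stepwise procedure described under eQTL Identification can be illustrated with a minimal sketch. This is not the MatrixEQTL implementation used in the study: the function names, the simulated data, and the fixed threshold of 1e-4 are hypothetical, and the p value uses a normal approximation to the t distribution to avoid dependencies outside the Python standard library. Conditioning is done by residualizing expression and each SNP on an orthonormal basis of the terms already in the model, which gives the same test as including those terms as covariates in the linear regression.

```python
import math
import random


def residualize(v, basis):
    """Remove from v its projection onto each orthonormal basis vector."""
    out = list(v)
    for q in basis:
        dot = sum(a * b for a, b in zip(out, q))
        out = [a - dot * b for a, b in zip(out, q)]
    return out


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def forward_stepwise(expr, dosages, covariates, p_threshold):
    """Return SNP ids in the order they enter the model for one gene.

    expr: expression values, one per sample
    dosages: dict mapping SNP id -> list of dosages (cis-region SNPs)
    covariates: list of covariate vectors (e.g. diagnosis, ancestry)
    p_threshold: fixed inclusion threshold reused at every step
    """
    n = len(expr)
    # Orthonormal basis for intercept + covariates (Gram-Schmidt).
    basis = [normalize([1.0] * n)]
    for c in covariates:
        basis.append(normalize(residualize(c, basis)))
    selected = []
    while True:
        y = residualize(expr, basis)
        yy = sum(a * a for a in y)
        best = None
        for snp, g in dosages.items():
            if snp in selected:
                continue
            gr = residualize(g, basis)
            gg = sum(a * a for a in gr)
            if gg < 1e-9:  # collinear with terms already in the model
                continue
            r = sum(a * b for a, b in zip(y, gr)) / math.sqrt(yy * gg)
            df = n - len(basis) - 1
            t = r * math.sqrt(df / max(1.0 - r * r, 1e-12))
            p = math.erfc(abs(t) / math.sqrt(2.0))  # normal approximation
            if best is None or p < best[1]:
                best = (snp, p, gr)
        if best is None or best[1] >= p_threshold:
            return selected
        selected.append(best[0])          # condition on the new lead SNP
        basis.append(normalize(best[2]))


# Toy data: two independent causal SNPs, one null SNP, one perfect proxy.
rng = random.Random(0)
n = 400


def draw_dosage(maf=0.3):
    return [float((rng.random() < maf) + (rng.random() < maf)) for _ in range(n)]


dosages = {name: draw_dosage() for name in ("snp_a", "snp_b", "snp_null")}
dosages["snp_a_proxy"] = list(dosages["snp_a"])  # perfect LD with snp_a
diagnosis = [float(rng.random() < 0.5) for _ in range(n)]
expr = [1.5 * a + 0.8 * b + rng.gauss(0.0, 1.0)
        for a, b in zip(dosages["snp_a"], dosages["snp_b"])]
order = forward_stepwise(expr, dosages, [diagnosis], p_threshold=1e-4)
print(order)
```

In this toy setup the proxy SNP carries no information once its partner has been selected, which is the behavior the conditioning step is meant to produce: only signals independent of those already in the model can enter at later steps.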
Data from the HBCC cohort was QC’ed and normalized as described in Fromer et al.16 DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components. ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1)23 reference panel via the Michigan Imputation Server.24 Only markers with imputation quality score R² ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components.
Modeling Number of eQTL per Gene on Genomic Features
We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC).
A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA.25 SNP-heritability estimates were then added to the modeling procedure described above. Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project28 (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data29 generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). 
Eight developmental periods30 were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together.
Enrichment Analyses
We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap31 database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r² ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project32 ChromHMM33 core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and prefrontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC.
For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was then repeated with the matched SNPs restricted to those located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R²) was estimated using the variancePartition34 R package, and eQTL were stratified into three R² bins: bin 1, 1 × 10⁻² ≤ R² ≤ 1.75 × 10⁻²; bin 2, 1.75 × 10⁻² ≤ R² ≤ 2.25 × 10⁻²; and bin 3, 2.25 × 10⁻² ≤ R² ≤ 3 × 10⁻². Logistic regression of status as eQTL or matched SNP was then carried out separately for each R² bin, within each eQTL order.
Conditional eQTL Analyses
In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL,21 for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components.
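The covariate sets used in the all-but-one analyses can be enumerated with a short sketch. The function name and the covariate labels below are hypothetical placeholders; the sketch only builds the conditioning sets (one per focal eQTL) and leaves the regression itself to whatever eQTL software is used.

```python
def all_but_one_designs(independent_eqtl, base_covariates):
    """Map each focal eQTL to the covariates of its all-but-one analysis.

    independent_eqtl: lead SNPs from the forward stepwise selection for one
        gene, in order of selection (primary first)
    base_covariates: covariates included in every model
    """
    designs = {}
    for focal in independent_eqtl:
        others = [snp for snp in independent_eqtl if snp != focal]
        designs[focal] = list(base_covariates) + others
    return designs


# Hypothetical labels: diagnosis plus five ancestry components.
base = ["diagnosis"] + ["ancestry_%d" % i for i in range(1, 6)]
designs = all_but_one_designs(["primary", "secondary", "tertiary"], base)
for focal, covariates in designs.items():
    print(focal, "conditioned on", covariates[len(base):])
```

A gene with k independent eQTL therefore yields k conditional analyses, each conditioning on the other k − 1 lead SNPs.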
For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene.
Co-localization Analyses
For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS.35 We included 217 loci at a p value threshold of 1 × 10⁻⁶ (excluding the MHC locus), defined these loci by their LD r² ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom36 plots. Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc39 for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used,43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of coloc39 R functions, which we name coloc2 (see Web Resources), and incorporated several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region.
Second, the coloc2 model optionally incorporates changes implemented in gwas-pw.43 Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF)47 across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions, or estimating it from the data.
Results
Identification of eQTL
Primary and conditional eQTL were identified using genotype and RNA-seq data from the CommonMind Consortium post-mortem DLPFC samples (467 European-ancestry case and control subjects).16 We identified 12,813 primary and 16,082 conditional eQTL, totaling 28,895 independent eQTL. Of the genes tested, 81% (12,813 of 15,817 autosomal genes) had at least one eQTL and 63% of these (51% of all genes) also had at least one conditional eQTL, with an average of 1.83 independent eQTL per gene (2.26 among those with at least one eQTL) (Figure 1A). Conversely, when examining the distributions for the number of genes whose expression was affected by each eQTL (Table S1), the majority of eQTL were specific for a single gene, and only a small fraction of eQTL, 1.47%, affected more than one gene, with a maximum of six genes affected by a single eQTL.
Figure 1 Characterization of Conditional eQTL
(A) Counts of the numbers of genes (y axis) regulated by at least N (1 ≤ N ≤ 16) independent eQTL (x axis).
(B) Median Tau value (y axis) for genes with N independent eQTL (x axis), colored by Tau type (cell type, developmental time point, or tissue type Tau).
(C) Density plot representing the distance from eSNP to eGene transcription start site (TSS), colored by eQTL order. Dashed lines represent the median distance to TSS for each order of eQTL.
We tested for replication of conditional eQTL in two independent datasets, the National Institute of Mental Health’s Human Brain Collection Core (HBCC, n = 279, microarray expression data) and the Religious Orders Study/Memory and Aging Project22 (ROSMAP, n = 494, RNA-seq expression). For each gene the same models were evaluated that were identified in forward-stepwise conditional analysis in the CMC data. We observed significant evidence of replication for both primary and conditional eQTL in the HBCC and ROSMAP post-mortem brain cohorts (Table S2). The estimated proportion of true associations (π1) in ROSMAP was 0.57 and 0.26 for primary and conditional eQTL, respectively; in HBCC π1 was 0.46 and 0.20 for primary and conditional eQTL. Therefore, replication was stronger for primary than for conditional eQTL, as expected given their stronger effect sizes. Replication rates were somewhat higher in the RNA-seq ROSMAP data than in HBCC.
Genomic Characterization of Primary and Conditional eQTL
The features for which primary and conditional eQTL and their respective eGenes displayed identifiable differences included distance from eQTL to its gene’s transcription start site (TSS), gene length, LD blocks per genic cis-region, genic constraint score, and genic cis-SNP-heritability. According to prior results, eQTL that are shared across tissues and cell types tend to be located closer to transcription start sites than context-specific eQTL;13, 14 we therefore first examined the relationship between primary or conditional eQTL status and distance to genic TSS.
Primary eQTL fall closer to the TSS than conditional eQTL (Figure 1C): primary eQTL occur at a median distance of 70.4 kb from the TSS versus a median distance of 302 kb for conditional eQTL. This difference holds true even more proximally to the TSS (Figure S2); 8.1% and 2.5% of primary and conditional eQTL, respectively, fall within 3 kb of the TSS. We next characterized the relationship between the number of independent eQTL per gene and three different genomic features: gene length, number of LD blocks48 in the gene’s cis-region (±1 Mb), and Exome Aggregation Consortium (ExAC) genic constraint score,49 including possible interactions. The best multivariate model for eQTL number included gene length, number of LD blocks, and genic constraint as predictors, as well as a gene length-LD blocks interaction (Table 1). The number of independent eQTL was positively correlated with gene length and number of LD blocks and negatively correlated with genic constraint score (Figure S3). We then examined the variance of gene expression explained by cis-region SNPs, or cis-SNP-heritability, estimated by linear mixed model variance component analysis25 (Figure S4). We found a strong effect of estimated cis-heritability on number of independent eQTL (Table 1, Figure S5). In a joint model with cis-SNP-heritability, the main effects of gene length, number of LD blocks, and genic constraint on eQTL number remained at least nominally significant. 
Table 1 Number of eQTL per Gene Modeled on Genomic Features
(M1–M3, Models 1–3; Est., estimate; SE, robust SE; P, Pr(> |z|).)
Predictor                        M1 Est.  M1 SE  M1 P      M2 Est.  M2 SE  M2 P      M3 Est.  M3 SE  M3 P
log(Gene length)                 0.27     0.04   5.16E−12  0.16     0.03   2.20E−06  0.17     0.03   9.87E−07
LD blocks                        0.59     0.17   6.47E−04  0.33     0.15   2.92E−02  0.37     0.15   1.55E−02
log(Gene length): LD blocks      −0.03    0.02   7.77E−02  −0.01    0.01   5.65E−01  −0.01    0.01   4.11E−01
Constraint                       −0.61    0.03   5.93E−85  −0.20    0.03   2.93E−13  −0.15    0.03   5.41E−08
cis-heritability                 –        –      –         7.03     0.18   0.00      7.02     0.18   0.00
Tau (tissue)                     –        –      –         –        –      –         0.08     0.08   2.76E−01
Tau (DLPFC cell type)            –        –      –         –        –      –         0.20     0.09   3.69E−02
Tau (developmental time point)   –        –      –         –        –      –         0.17     0.09   5.99E−02
We then addressed whether genes with conditional eQTL exhibit greater context specificity as measured by the robust expression specificity metric Tau.26, 27 We calculated Tau across 53 tissues from the Genotype-Tissue Expression (GTEx) project, across 6 DLPFC cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte progenitor cells) from single-cell RNA-seq,29 and across 8 developmental periods30 (early prenatal, early mid-prenatal, late mid-prenatal, late prenatal, infant, child, adolescent, and adult) from the BrainSpan atlas DLPFC RNA-seq data. We confirmed that higher values of Tau reflect expression specificity by comparing the distributions of all three Tau measures for all genes with the distributions for a subset of housekeeping genes50 (Figure S6). We found positive correlations between eQTL number and tissue, cell type, and developmental time point specificities (Figure 1B, Table 1, Table S3, Figure S7). In a joint model, the strongest correlation was with DLPFC cell type Tau, which is consistent with previous data demonstrating tissue-specific, cell type-dependent expression in blood;12 however, we note that all three Tau sets were inter-correlated (Table S3).
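As an illustration of the specificity metric, the sketch below computes Tau in its standard form, Tau = Σ(1 − x_i / max(x)) / (n − 1), from per-context expression summaries. The function names and example values are hypothetical. The summary step follows the log2(median + 1) transformation described in the methods for tissues and developmental periods (the cell type analysis used the mean instead of the median).

```python
import math
from statistics import median


def summarize(expression_by_context):
    """Summarize each context as log2(median expression + 1)."""
    return [math.log2(median(values) + 1.0)
            for values in expression_by_context.values()]


def tau(summaries):
    """Tau specificity: 0 for uniform expression, 1 for a single context."""
    n = len(summaries)
    top = max(summaries)
    if n < 2 or top <= 0:
        return 0.0
    return sum(1.0 - x / top for x in summaries) / (n - 1)


# Hypothetical expression values (e.g. CPM) for one gene in three contexts.
gene = {
    "context_1": [0.0, 0.0, 0.0],
    "context_2": [1.0, 3.0, 5.0],
    "context_3": [0.0, 0.0, 0.0],
}
print(tau(summarize(gene)))
```

Values near 0 correspond to broadly expressed genes such as housekeeping genes, and values near 1 to genes expressed in essentially one tissue, cell type, or developmental period.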
Epigenetic Enrichment Analyses
One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers.33 We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary). Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B).
Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R²) and comparing enrichment across eQTL order, within each R² bin (Figures S9–S12, Table S6).
Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations
Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations.
(A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project.
(C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange).
(D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
eQTL Co-localization with SCZ GWAS
We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations.
Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10⁻⁶ from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS,35 we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL. To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc39 for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw,43 with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization. For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2. In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis).
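The posterior calculation over the five hypotheses can be sketched as follows. This is a simplified illustration in the style of the original coloc method, not the coloc2 code: the per-SNP priors p1, p2, and p12 are the standard coloc defaults rather than the genome-wide mixture proportions estimated by coloc2, the averaging over the three prior variances is assumed to be an arithmetic mean of the Bayes factors, and all function names and toy effect sizes are hypothetical.

```python
import math

PRIOR_VARIANCES = (0.01, 0.1, 0.5)  # WABF prior variance values from the text


def log_sum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    return a + math.log1p(-math.exp(b - a))


def wakefield_log_abf(beta, se):
    """Log of the Wakefield ABF, averaged over the prior variances."""
    z = beta / se
    v = se * se
    labfs = []
    for w in PRIOR_VARIANCES:
        r = w / (w + v)
        labfs.append(0.5 * (math.log(1.0 - r) + r * z * z))
    return log_sum(labfs) - math.log(len(labfs))


def coloc_posteriors(labf_gwas, labf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one locus (needs >= 2 SNPs)."""
    l1 = log_sum(labf_gwas)
    l2 = log_sum(labf_eqtl)
    l12 = log_sum([a + b for a, b in zip(labf_gwas, labf_eqtl)])
    log_h = [
        0.0,                                                   # H0
        math.log(p1) + l1,                                     # H1
        math.log(p2) + l2,                                     # H2
        math.log(p1) + math.log(p2) + log_diff(l1 + l2, l12),  # H3
        math.log(p12) + l12,                                   # H4
    ]
    total = log_sum(log_h)
    return [math.exp(x - total) for x in log_h]


# Toy locus with five SNPs and a common standard error (hypothetical values).
se = 0.05
gwas = [wakefield_log_abf(b, se) for b in (0.0, 0.0, 0.3, 0.0, 0.0)]
eqtl_same = [wakefield_log_abf(b, se) for b in (0.0, 0.0, 0.4, 0.0, 0.0)]
eqtl_other = [wakefield_log_abf(b, se) for b in (0.4, 0.0, 0.0, 0.0, 0.0)]
shared = coloc_posteriors(gwas, eqtl_same)
distinct = coloc_posteriors(gwas, eqtl_other)
print("PPH4 when signals peak at the same SNP:", shared[4])
print("PPH3 when signals peak at different SNPs:", distinct[3])
```

In an analysis with conditional eQTL, the eQTL Bayes factors would be computed from the all-but-one summary statistics, so that each independent eQTL signal is tested against the GWAS signal separately.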
We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B).
Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures
(A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL.
(B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures.
Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature
(A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom).
(B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top).
For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled). We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8). We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization. Table 2 GWAS-eQTL Co-localized Loci Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period 1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal 1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/– 1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal 1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal 1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/– rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/– 1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult 2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/– 2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal 
2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/– 2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult 2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/– rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal 2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/– 3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant 3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal 3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/– 3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/– 4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 primary 0.97 CLCN3 –/–/– 5 45291475 46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult 6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/– 6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal 7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal 8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent 8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult 8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal 8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent 9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/– 11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal 
12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent 14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/– 15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal 15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal 15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent 16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/– rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/– 16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/– rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/– rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent 16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/– 17 17722402 18030202 rs8082590 6.84E−09 rs4072739 4.74E−13 primary 0.92 DRG2 –/–/– 19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal 19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/– 19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/– 22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/– Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. 
Also notable are the occurrences of complex patterns of co-localization for some loci; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another. Comparison with Previous Co-localization Analyses In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2). Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10−7) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r2∼0.2–0.4). 
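The contrast between a high PPH3 and a low PPH4, as reported for this locus, can be illustrated with the standard coloc-style calculation that turns per-SNP approximate Bayes factors into posterior probabilities for the five hypotheses. The sketch below is an assumption-laden illustration rather than the coloc2 implementation: the function name `coloc_posteriors` is hypothetical, and the fixed prior values `p1`, `p2`, and `p12` are placeholders, whereas coloc2 estimates dataset-wide priors by likelihood.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a > b (needs at least two SNPs below)."""
    return a + np.log1p(-np.exp(b - a))

def coloc_posteriors(log_abf_gwas, log_abf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log Bayes factors.

    p1, p2, p12 are illustrative per-SNP priors for association with the
    GWAS trait only, expression only, and both (placeholder values).
    """
    l1 = np.asarray(log_abf_gwas, dtype=float)
    l2 = np.asarray(log_abf_eqtl, dtype=float)
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp(l1 + l2)  # same SNP causal for both traits
    log_h = np.array([
        0.0,                                               # H0: no association
        np.log(p1) + s1,                                   # H1: GWAS only
        np.log(p2) + s2,                                   # H2: eQTL only
        np.log(p1) + np.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
        np.log(p12) + s12,                                 # H4: shared variant
    ])
    return np.exp(log_h - logsumexp(log_h))

# Toy locus: strong GWAS and eQTL signals peaking at different SNPs,
# so most of the posterior mass falls on H3 rather than H4.
post = coloc_posteriors([0.0, 15.0, 0.0, 0.0], [0.0, 0.0, 12.0, 0.0])
```

With the same Bayes factors, a larger `p12` shifts posterior mass toward H4, which is why the priors matter when comparing PPH4 values across analyses.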
Additionally, our attempts to disentangle independent eQTL signal via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks. We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10). Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology. Discussion We utilized genotype and expression data from 467 human post-mortem brain samples from the DLPFC to conduct eQTL mapping analyses, to characterize both primary and conditional eQTL. We then identified co-localization between SCZ GWAS and eQTL association signals, comprising both primary and conditional eQTL. Our principal findings include four major observations. First, we detect that conditional eQTL are widespread in the brain tissue samples we investigated. In 63% of genes with at least one eQTL, we found multiple statistically independent eQTL (representing 8,136 genes). 
In addition, conditional eQTL make substantial contributions to regulatory genetic variation, as there is a strong association between eQTL number and gene expression cis-SNP-heritability. This demonstrates that genetic variation affecting RNA abundance is incompletely characterized by focusing on only one primary eQTL per gene, which is the case currently for most eQTL studies. Second, we find the genomics of conditional eQTL and their genes are consistent with complex, context-specific regulation of gene expression, which may be conferred through overlap with distal regulatory elements. Genes with more independent eQTL tend to be larger and span multiple recombination hotspot intervals, and tend to be less constrained at the protein level. While these associations may reflect in part greater power to detect independent eQTL that are not in linkage disequilibrium and explain more phenotypic variance, they are also consistent with more complex regulation and greater potential for regulatory genetic variation. Context-specific genetic regulation of expression could manifest as conditional eQTL signal in the analysis of expression from a heterogeneous source. For example, eQTL in naive and stimulated (LPS, IFN) monocytes55 may occur as either primary or conditional eQTL in our CMC data, due to related microglial cells being present in brain tissue homogenate. We found that 60 stimulation-specific eQTL (FDR < 0.01 in interferon or lipopolysaccharide stimulated monocytes, but FDR ≥ 0.05 in naive monocytes) were also conditional eQTL in DLPFC. Notably, rs7171787, a conditional (tertiary) eQTL in our DLPFC analysis, is a stimulation-specific monocyte eQTL for the neurodevelopmental56, 57, 58 gene CYFIP1. In our data, associations with specificity of expression across tissues, developmental periods, and cell types determined from single-cell RNA-sequencing data suggest that context specificity plays a role in the occurrence of multiple statistically independent eQTL. 
Cell type specificity is particularly strongly correlated with eQTL number, consistent with those cell types being present in the current tissue homogenate data. Since previous studies have shown the importance of developmental59, 60, 61, 62 or cell-specific contributions61, 63, 64, 65, 66 to schizophrenia, interrogation of independent eQTL effects may elucidate developmental or tissue-specific effects obscured in whole-tissue eQTL studies. This context specificity of expression regulation is potentially mediated through overlap of eSNPs with distal regulatory elements, such as enhancers. Conditional eQTL occur farther from transcription start sites than primary eQTL, consistent with effects on enhancers. In addition, while both primary and conditional eQTL are enriched in both active promoter and enhancer regions, their enrichment in active promoters diminishes with increasing conditional eQTL order. In other words, conditional eQTL show greater enrichment in enhancers relative to promoters than do primary eQTL. Third, we have identified a number of candidate genes for which genetic variation for expression co-localizes with genetic variation for schizophrenia risk (Table 2), including cases of co-localization with conditional eQTL. Genetic co-localization is expected if gene expression causally mediates disease risk, although we recognize that co-localization could also result from pleiotropy or linkage, particularly in regions of extensive linkage disequilibrium and haplotype structure.40, 67 We also note that several co-localization methods have recently been developed,37, 38, 40, 41, 42 and direct comparisons have found broad concordance among these methods and a high degree of specificity of positive results using coloc.42, 45, 46 However, some differences in results would likely be achieved using alternative co-localization methods. 
Our analyses prioritize 27 genes within 24 genome-wide significant (GWAS p < 5 × 10−8) SCZ loci and 19 genes in 17 suggestive (p < 1 × 10−6) loci. In addition to a number of previously implicated SCZ risk genes, our findings include several genes not previously considered as candidates,35 in some cases (e.g., SLC35E2, PTPRU [MIM: 602454], LINC01792, DCLK3, PPM1M, and LOC101929479) because the genes themselves do not overlap the GWAS locus regions but their eQTL do. In examining these genes for expression specificity in GTEx tissues, in brain cell types from single-cell RNA-seq,29 and in BrainSpan DLPFC developmental periods (Tables 2 and S8), we find that their expression contexts show a diversity of patterns and can provide clues to generate specific hypotheses for functional follow-up of their potential roles in SCZ. Interestingly, genes broadly expressed across cell types tend to show prenatal expression. Fourth, we highlight the importance of examining conditional eQTL for co-localization with GWASs. In at least 6 out of 40 loci showing GWAS-eQTL co-localization, a conditional eQTL signal co-localizes with SCZ risk. This is likely to be a conservative estimate, as the smaller effect sizes of conditional eQTL result in a bias against detection of conditional GWAS-eQTL co-localization. If we had considered only primary eQTL in the analyses, these instances of co-localization would not have been identified. Among our highlighted conditional eQTL-GWAS co-localized genes are IREB2, STAT6, and PROX1-AS1. 
IREB2 (iron responsive element binding protein 2) is a key regulator of iron homeostasis68, 69 that has been previously implicated in neurodegenerative disorders.70, 71 Knockouts of Irp2, the mouse homolog of IREB2, exhibit impairments in coordination and balance, exploration, and nociception.69 The immune-related transcription factor STAT6 induces interleukin 4 (IL-4)-mediated anti-apoptotic activity of T helper cells, and the locus is associated with migraine72, 73 and brain glioma74 as well as several immune/inflammatory diseases.75, 76, 77 STAT6 also activates neuronal progenitor/stem cells and neurogenesis,78 making it intriguing as an immune-related SCZ candidate given recent observations about the role of the complement factor 4 (C4) gene as an SCZ risk gene79 and prior work potentially implicating microglia.80 Consistent with a role in immune-mediated synaptic pruning, STAT6 expression is broadly postnatal and shows specificity for microglia (Table S8). PROX1-AS1 encodes a lncRNA that has been implicated as aberrantly expressed in several cancers, is upregulated in the cell cycle S-phase, and promotes G1/S transition in cell culture.81 As a potential regulator of the Prospero Homeobox 1 (PROX1) transcription factor, it could be involved in development and cell differentiation in several tissues and cell types, including oligodendrocytes82 and GABAergic interneurons83 in the brain. PROX1-AS1 expression is specific to neurons and mature oligodendrocytes and is expressed postnatally (Table S8). In conclusion, we find that conditional eQTL are widespread and are consistent with complex and context-specific regulation. Accounting for conditional eQTL leads to new findings of GWAS-eQTL co-localization and generates specific hypotheses for the role of gene expression regulation in disease etiology. 
The analytical strategy presented here could be implemented as a means of identification of putatively causal genes for any phenotype in which GWAS summary statistics and expression and genotype data from the GWAS phenotype-relevant tissue are available. Conditional eQTL that co-localize with disease risk may reflect regulatory mechanisms that are important in a key developmental period or individual cell type and may be missed when focusing on primary eQTL discovered in adult whole tissue. As further efforts are made to generate data across ranges of tissues or individual cell types, we may have a better ability to directly identify regulatory variants specific to these contexts. However, if a variant is primarily active in a very specific time point or stimulus condition, capturing data reflecting this condition will remain challenging. Conditional co-localization analysis in well-powered eQTL cohorts may best identify the genes driving these trait associations, though further validation work will be required to understand the mechanism by which the gene contributes to disease risk. Consortia CMC leadership: Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffmann-La Roche Ltd), Lara Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner, and Barbara Lipska (NIMH). Additional members of CMC: A. Ercument Cicek, Cong Lu, Kathryn Roeder, Lu Xie (Carnegie Mellon Univ.); Konrad Talbot (Cedars-Sinai Medical Center); Scott E. Hemby (High Point Univ.); Laurent Essioux (Hoffmann-La Roche); Andrew Browne, Andrew Chess, Aaron Topol, Alexander Charney, Amanda Dobbyn, Ben Readhead, Bin Zhang, Dalila Pinto, David A. Bennett, David H. Kavanagh, Douglas M. Ruderfer, Eli A. Stahl, Eric E. Schadt, Gabriel E. Hoffman, Hardik R. 
Shah, Jun Zhu, Jessica S. Johnson, John F. Fullard, Joel T. Dudley, Kiran Girdhar, Kristen J. Brennand, Laura G. Sloofman, Laura M. Huckins, Menachem Fromer, Milind C. Mahajan, Panos Roussos, Schahram Akbarian, Shaun M. Purcell, Tymor Hamamsy, Towfique Raj, Vahram Haroutunian, Ying-Chih Wang, Zeynep H. Gümüş (Mount Sinai School of Med.); Geetha Senthil, Robin Kramer (NIMH); Benjamin A. Logsdon, Jonathan M.J. Derry, Kristen K. Dang, Solveig K. Sieberts, Thanneer M. Perumal (Sage Bionetworks); Roberto Visintainer (Univ. Trento, Italy); Leslie A. Shinobu (Takeda); Patrick F. Sullivan (Univ. North Carolina); and Lambertus L. Klei (Univ. Pittsburgh School of Med.).

Web Resources
AMP-AD Knowledge Portal, https://www.synapse.org/ampad
BrainSpan – Atlas of the Developing Human Brain, http://www.brainspan.org/
CommonMind Consortium data, https://www.synapse.org/CMC
CommonMind Consortium ChIP-seq data, https://www.synapse.org/#!Synapse:syn8040458
coloc2, https://github.com/Stahl-Lab-MSSM/coloc2
dbGaP (accession number phs000979), http://ncbi.nlm.nih.gov/gap
ExAC Functional Gene Constraint, http://exac.broadinstitute.org/downloads
GCTA, http://cnsgenomics.com/software/gcta/
GemTools, http://wpicr.wpic.pitt.edu/WPICCompgen/GemTools/GemTools.htm
GEO (accession number GSE67835), https://www.ncbi.nlm.nih.gov/geo/
GTEx Portal, https://www.gtexportal.org/home/
HBCC microarray cohort, dbGaP (ID: phs000979.v1.p1), https://www.ncbi.nlm.nih.gov/gap
LDetect LD blocks, https://bitbucket.org/nygcresearch/ldetect-data/overview
NIH Roadmap Epigenomics Project chromatin state learning, http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state
OMIM, http://www.omim.org/
qvalue, http://bioconductor.org/packages/release/bioc/html/qvalue.html
R statistical software, https://www.r-project.org/
SNPsnap, https://data.broadinstitute.org/mpg/snpsnap/
SVA: Surrogate Variable Analysis, R package version 3.24.4, http://bioconductor.org/packages/release/bioc/html/sva.html
variancePartition, http://bioconductor.org/packages/release/bioc/html/variancePartition.html

Supplemental Data
Document S1. Figures S1–S18 Tables S1–S12. Additional Data Document S2. Article plus Supplemental Data

Acknowledgments
Dedicated to the memory of Pamela Sklar, MD, PhD. Data were generated as part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited, F. Hoffmann-La Roche Ltd and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881, R37MH057881S1, HHSN271201300031C, AG02219, AG05138, and MH06692. Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer Disease Core Center, and the University of Pittsburgh NeuroBioBank and Brain and Tissue Repositories. Data from the NIMH Human Brain Collection Core were generated as part of the NIMH Human Brain Collection Core (NIH NCT00001260, 999917073). ROSMAP study data were provided by the Rush Alzheimer Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 09/05/16. BrainSpan: Atlas of the Developing Human Brain. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01. Supplemental Data include 18 figures and 12 tables and can be found with this article online at https://doi.org/10.1016/j.ajhg.2018.04.011.

body	Introduction Significant advances in understanding the genetic architecture of schizophrenia (MIM: 181500) have occurred within the last 10 years. However, for common variants identified in genome-wide association studies (GWASs), the success in locus identification is not yet matched by an understanding of their underlying basic mechanism or effect on pathophysiology. Expression quantitative trait loci (eQTL), which are responsible for a significant proportion of variation in gene expression, could serve as a link between the numerous non-coding genetic associations that have been identified in GWASs and susceptibility to common diseases directly through their association with gene expression regulation.1, 2, 3, 4 Accordingly, results from eQTL mapping studies have been successfully utilized to identify genes and causal variants from GWASs for various complex phenotypes, including asthma (MIM: 600807), body mass index (MIM: 601665), celiac disease (MIM: 212750), and Crohn disease (MIM: 266600).5, 6, 7, 8 Studies integrating eQTL and GWAS data have almost exclusively used marginal association statistics which typically represent the primary, or most significant, eQTL signal when assessing co-localization with GWASs, ignoring other SNPs that affect expression independently of the primary eQTL for a given gene. However, recent findings indicating that conditionally independent eQTL are widespread9, 10, 11, 12 motivate examination of the extent to which considering conditional eQTL may provide additional power to identify likely causal genes in a GWAS locus. 
Recent reports provide evidence that conditional eQTL are less frequently shared across tissues than primary eQTL10 and, like tissue- and cell type-specific eQTL, are often found more distally to the genes they regulate.10, 13, 14 These lines of evidence suggest that conditionally independent eQTL may contribute to tissue-specific or other context-specific gene regulation (e.g., specific to a particular cell type, developmental stage, or stimulation condition). One mechanism by which disease risk could potentially be mediated by a conditional eQTL is the disruption of a tissue-specific enhancer by a given variant, leading to the dysregulation of the relevant eGene in only the tissue for which the enhancer is specific. For example, an eQTL affecting Parkinson disease risk through expression of SNCA was recently shown to act through the disruption of an enhancer;15 if this enhancer is specific to a disease-relevant cell type, such as nerve cells of the substantia nigra, then it could manifest as a conditional eQTL since it would be only partially represented in brain homogenate. Here, we leveraged genotype and dorsolateral prefrontal cortex (DLPFC) expression data provided by the CommonMind Consortium (CMC) to elucidate the role of conditional eQTL in the etiology of schizophrenia (SCZ). Currently comprising the largest existing postmortem brain genomic resource at nearly 600 samples, the CMC is generating and making publicly available an unprecedented array of functional genomic data, including gene expression (RNA sequencing), histone modification (chromatin immunoprecipitation [ChIP-seq]), and SNP genotypes, from individuals with psychiatric disorders as well as unaffected controls.16 We utilized SNP dosage and RNA-sequencing (RNA-seq) data from the CMC to identify primary and conditionally independent eQTL. 
We then characterized the resulting eQTL on various genomic attributes including distance to transcription start site and their genes’ specificities across tissues, cell types, and developmental periods. In addition, we quantified enrichment of primary and conditional eQTL in promoter and enhancer functional genomic elements inferred from epigenomic data. Finally, we isolated each independent eQTL signal by conducting a series of “all-but-one” conditional analyses for genes with multiple independent eQTL and then assessed the overlap between all eQTL association signals and the schizophrenia GWAS signals. Material and Methods CommonMind Consortium Data We used pre-QC’ed genotype and expression data from the CommonMind Consortium, and detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al.16 Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10−5. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790,17 and variants were imputed in 5 Mb segments with Impute v2.3.118 using the 1000 Genomes Phase 1 integrated reference panel,19 excluding singleton variants. After phasing and imputation, then filtering out variants with INFO < 0.8 or MAF < 0.05, the number of markers included in the analysis totaled approximately 6.4 million. Gene expression was assayed via RNA-seq using 100 base pair paired end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes was considered for analysis. 
The expression data was voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA).20 After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having less than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses. These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects. eQTL Identification An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we a conducted forward stepwise conditional analysis implemented in MatrixEQTL21 using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criteria in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold criteria. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. 
Following eQTL identification, only autosomal eQTL were retained for downstream analyses. Replication in Independent Datasets Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP22 RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication. Data from the HBCC cohort was QC’ed and normalized as described in Fromer et al.16 DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components. ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1)23 reference panel via the Michigan Imputation Server.24 Only markers with imputation quality score R2 ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. 
After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components. Modeling Number of eQTL per Gene on Genomic Features We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC). A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA.25 SNP-heritability estimates were then added to the modeling procedure described above. Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project28 (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. 
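As an illustration of the specificity calculation, the sketch below uses the standard definition of Tau, τ = Σ(1 − x_i / x_max) / (n − 1), applied to per-tissue summaries of log2(median expression + 1). We assume this is the form used here; the function names and toy values are hypothetical.

```python
import math
from statistics import median


def tau(profile):
    """Tau specificity index: 0 for uniform expression, 1 for a single context."""
    n = len(profile)
    top = max(profile)
    if n < 2 or top <= 0:
        return 0.0
    return sum(1.0 - x / top for x in profile) / (n - 1)


def tissue_tau(expr_by_tissue):
    """expr_by_tissue: {tissue: [expression per sample]} for one gene."""
    profile = [
        math.log2(median(samples) + 1.0)   # per-tissue summary described in the text
        for samples in expr_by_tissue.values()
    ]
    return tau(profile)


# Toy example: a gene expressed in one of three tissues versus a uniformly expressed gene.
print(tissue_tau({"cortex": [7, 7, 7], "liver": [0, 0, 0], "testis": [0, 0, 0]}))
print(tissue_tau({"cortex": [3, 3, 3], "liver": [3, 3, 3], "testis": [3, 3, 3]}))
```

The same `tau` function applies unchanged to cell type or developmental period profiles once the per-context summaries have been computed.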
Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data29 generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). Eight developmental periods30 were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together. Enrichment Analyses We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap31 database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r2 ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. 
Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project32 ChromHMM33 core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and pre-frontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC. For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was repeated but restricting to matched SNPs located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R2) was estimated using the variancePartition34 R package, and eQTL were stratified into three R2 bins: bin 1, 1 × 10−2 ≤ R2 ≤ 1.75 × 10−2; bin 2, 1.75 × 10−2 ≤ R2 ≤ 2.25 × 10−2; and bin 3, 2.25 × 10−2 ≤ R2 ≤ 3 × 10−2. Logistic regression of status as eQTL or matched SNP was then carried out separately for each R2 bin, within each eQTL order. 
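To make the enrichment estimate concrete, the following sketch computes the log-odds ratio for a single binary annotation from a 2 × 2 table. With one binary predictor and no covariates, the logistic regression coefficient equals this log-odds ratio; the four matching covariates used in the analysis above are deliberately omitted, and the counts and the function name are hypothetical.

```python
import math


def log_odds_enrichment(eqtl_in, eqtl_out, matched_in, matched_out):
    """Log-odds ratio and Wald 95% CI for eQTL status versus annotation overlap.

    eqtl_in / eqtl_out: eQTL inside / outside the annotation.
    matched_in / matched_out: matched SNPs inside / outside the annotation.
    """
    cells = (eqtl_in, eqtl_out, matched_in, matched_out)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((eqtl_in * matched_out) / (eqtl_out * matched_in))
    se = math.sqrt(sum(1.0 / c for c in cells))   # Wald standard error
    return log_or, (log_or - 1.96 * se, log_or + 1.96 * se)


# Hypothetical counts: 30 of 100 eQTL and 10 of 100 matched SNPs overlap a promoter state.
estimate, (lower, upper) = log_odds_enrichment(30, 70, 10, 90)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

A confidence interval that excludes zero corresponds to a significant enrichment (or depletion) in this simplified setting.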
Conditional eQTL Analyses In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL,21 for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components. For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene. Co-localization Analyses For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS.35 We included 217 loci at a p value threshold of 1 × 10−6 (excluding the MHC locus), defined these loci by their LD r2 ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom36 plots. 
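The all-but-one design described under Conditional eQTL Analyses can be enumerated with a short sketch. It only lists which covariates enter each conditional model; the function name and covariate labels are hypothetical, and the regression itself is not shown.

```python
def all_but_one_designs(independent_eqtl, base_covariates):
    """Map each focal eQTL to the covariates of its all-but-one model."""
    designs = {}
    for focal in independent_eqtl:
        others = [snp for snp in independent_eqtl if snp != focal]
        designs[focal] = others + list(base_covariates)
    return designs


# Base covariates as described in the text: diagnosis and five ancestry components.
base = ["diagnosis"] + [f"ancestry_{i}" for i in range(1, 6)]
designs = all_but_one_designs(["primary", "secondary", "tertiary"], base)
for focal, covariates in designs.items():
    print(focal, "is tested conditional on", covariates[:2])
```

For a gene with three independent eQTL this yields the three analyses given in the example above, each conditioning on the two non-focal eQTL.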
Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc39 for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used,43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of the coloc39 R functions, which we name coloc2 (see Web Resources) and which incorporates several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region. Second, the coloc2 model optionally incorporates changes implemented in gwas-pw.43 Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or, optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF)47 across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions, or estimating it from the data. Results Identification of eQTL Primary and conditional eQTL were identified using genotype and RNA-seq data from the CommonMind Consortium post-mortem DLPFC samples (467 European-ancestry case and control subjects).16 We identified 12,813 primary and 16,082 conditional eQTL, totaling 28,895 independent eQTL.
Of the genes tested, 81% (12,813 of 15,817 autosomal genes) had at least one eQTL and 63% of these (51% of all genes) also had at least one conditional eQTL, with an average of 1.83 independent eQTL per gene (2.26 among those with at least one eQTL) (Figure 1A). Conversely, when examining the distributions for the number of genes whose expression was affected by each eQTL (Table S1), the majority of eQTL were specific for a single gene, and only a small fraction of eQTL, 1.47%, affected more than one gene, with a maximum of six genes affected by a single eQTL. Figure 1 Characterization of Conditional eQTL (A) Counts of the numbers of genes (y axis) regulated by at least N (1 ≤ N ≤ 16) independent eQTL (x axis). (B) Median Tau value (y axis) for genes with N independent eQTL (x axis), colored by Tau type (cell type, developmental time point, or tissue type Tau). (C) Density plot representing the distance from eSNP to eGene transcription start site (TSS), colored by eQTL order. Dashed lines represent the median distance to TSS for each order of eQTL. We tested for replication of conditional eQTL in two independent datasets, the National Institute of Mental Health’s Human Brain Collection Core (HBCC, n = 279, microarray expression data) and the Religious Orders Study/Memory and Aging Project22 (ROSMAP, n = 494, RNA-seq expression). For each gene the same models were evaluated that were identified in forward-stepwise conditional analysis in the CMC data. We observed significant evidence of replication for both primary and conditional eQTL in the HBCC and ROSMAP post-mortem brain cohorts (Table S2). The estimated proportion of true associations (π1) in ROSMAP was 0.57 and 0.26 for primary and conditional eQTL, respectively; in HBCC π1 was 0.46 and 0.20 for primary and conditional eQTL. Therefore, replication was stronger for primary than for conditional eQTL, as expected given their stronger effect sizes. 
Replication rates were somewhat higher in the RNA-seq ROSMAP data than in HBCC. Genomic Characterization of Primary and Conditional eQTL The features for which primary and conditional eQTL and their respective eGenes displayed identifiable differences included distance from eQTL to its gene’s transcription start site (TSS), gene length, LD blocks per genic cis-region, genic constraint score, and genic cis-SNP-heritability. According to prior results, eQTL that are shared across tissues and cell types tend to be located closer to transcription start sites than context-specific eQTL;13, 14 we therefore first examined the relationship between primary or conditional eQTL status and distance to genic TSS. Primary eQTL fall closer to the TSS than conditional eQTL (Figure 1C): primary eQTL occur at a median distance of 70.4 kb from the TSS versus a median distance of 302 kb for conditional eQTL. This difference holds true even more proximally to the TSS (Figure S2); 8.1% and 2.5% of primary and conditional eQTL, respectively, fall within 3 kb of the TSS. We next characterized the relationship between the number of independent eQTL per gene and three different genomic features: gene length, number of LD blocks48 in the gene’s cis-region (±1 Mb), and Exome Aggregation Consortium (ExAC) genic constraint score,49 including possible interactions. The best multivariate model for eQTL number included gene length, number of LD blocks, and genic constraint as predictors, as well as a gene length-LD blocks interaction (Table 1). The number of independent eQTL was positively correlated with gene length and number of LD blocks and negatively correlated with genic constraint score (Figure S3). We then examined the variance of gene expression explained by cis-region SNPs, or cis-SNP-heritability, estimated by linear mixed model variance component analysis25 (Figure S4). We found a strong effect of estimated cis-heritability on number of independent eQTL (Table 1, Figure S5). 
In a joint model with cis-SNP-heritability, the main effects of gene length, number of LD blocks, and genic constraint on eQTL number remained at least nominally significant.

Table 1 Number of eQTL per Gene Modeled on Genomic Features

| Predictor | Model 1 Estimate | Model 1 Robust SE | Model 1 Pr(> \|z\|) | Model 2 Estimate | Model 2 Robust SE | Model 2 Pr(> \|z\|) | Model 3 Estimate | Model 3 Robust SE | Model 3 Pr(> \|z\|) |
|---|---|---|---|---|---|---|---|---|---|
| log(Gene length) | 0.27 | 0.04 | 5.16E−12 | 0.16 | 0.03 | 2.20E−06 | 0.17 | 0.03 | 9.87E−07 |
| LD blocks | 0.59 | 0.17 | 6.47E−04 | 0.33 | 0.15 | 2.92E−02 | 0.37 | 0.15 | 1.55E−02 |
| log(Gene length): LD blocks | −0.03 | 0.02 | 7.77E−02 | −0.01 | 0.01 | 5.65E−01 | −0.01 | 0.01 | 4.11E−01 |
| Constraint | −0.61 | 0.03 | 5.93E−85 | −0.20 | 0.03 | 2.93E−13 | −0.15 | 0.03 | 5.41E−08 |
| cis-heritability | – | – | – | 7.03 | 0.18 | 0.00 | 7.02 | 0.18 | 0.00 |
| Tau (tissue) | – | – | – | – | – | – | 0.08 | 0.08 | 2.76E−01 |
| Tau (DLPFC cell type) | – | – | – | – | – | – | 0.20 | 0.09 | 3.69E−02 |
| Tau (developmental time point) | – | – | – | – | – | – | 0.17 | 0.09 | 5.99E−02 |

We then addressed whether genes with conditional eQTL exhibit greater context specificity as measured by the robust expression specificity metric Tau.26, 27 We calculated Tau across 53 tissues from the Genotype-Tissue Expression (GTEx) project, across 6 DLPFC cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte progenitor cells) from single-cell RNA-seq,29 and across 8 developmental periods30 (early prenatal, early mid-prenatal, late mid-prenatal, late prenatal, infant, child, adolescent, and adult) from the BrainSpan atlas DLPFC RNA-seq data. We confirmed that higher values of Tau reflect expression specificity by comparing the distributions of all three Tau measures for all genes with the distributions for a subset of housekeeping genes50 (Figure S6). We found positive correlations between eQTL number and tissue, cell type, and developmental time point specificities (Figure 1B, Table 1, Table S3, Figure S7).
In a joint model, the strongest correlation was with DLPFC cell type Tau, which is consistent with previous data demonstrating tissue-specific, cell type-dependent expression in blood;12 however, we note that all three Tau sets were inter-correlated (Table S3). Epigenetic Enrichment Analyses One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers.33 We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary). Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). 
We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B). Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R2) and comparing enrichment across eQTL order, within each R2 bin (Figures S9–S12, Table S6). Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. (C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
eQTL Co-localization with SCZ GWAS We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations. Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10−6 from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS,35 we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL. To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc39 for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw,43 with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization. For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2. 
In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis). We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B). Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. (B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures. 
Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled). We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8). We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization. 
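To make the PPH4 quantity concrete, the sketch below follows the standard coloc-style calculation: per-SNP Wakefield approximate Bayes factors are combined into support for the five hypotheses (H0 to H4) and normalized into posterior probabilities. It is not the coloc2 code. The function names are hypothetical, the priors are the coloc defaults rather than the genome-wide estimates used in our analysis, and averaging the Bayes factors on the natural scale over prior variances 0.01, 0.1, and 0.5 is an assumption about how the averaging is done.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    m = max(values)
    if m == NEG_INF:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_wabf(beta, se, prior_variances=(0.01, 0.1, 0.5)):
    """Log Wakefield ABF for one SNP, averaged over the prior variances."""
    z2 = (beta / se) ** 2
    logs = []
    for w in prior_variances:
        r = w / (w + se * se)                      # shrinkage factor W / (V + W)
        logs.append(0.5 * (math.log(1.0 - r) + r * z2))
    return _logsumexp(logs) - math.log(len(logs))


def coloc_posteriors(gwas, eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    gwas, eqtl: lists of (beta, se) for the same SNPs in the same order.
    """
    l1 = [log_wabf(b, s) for b, s in gwas]
    l2 = [log_wabf(b, s) for b, s in eqtl]
    s1, s2 = _logsumexp(l1), _logsumexp(l2)
    s12 = _logsumexp([a + b for a, b in zip(l1, l2)])
    # H3 support: sum1 * sum2 - sum12 (two signals at different SNPs)
    ratio = math.exp(s12 - (s1 + s2))
    diff = s1 + s2 + math.log1p(-ratio) if ratio < 1.0 else NEG_INF
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + diff,
        math.log(p12) + s12,
    ]
    total = _logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy locus of five SNPs; z-scores are converted to (beta, se) pairs.
gwas = [(z * 0.02, 0.02) for z in (0.5, 1.0, 6.0, 1.0, 0.5)]
eqtl_shared = [(z * 0.05, 0.05) for z in (0.3, 0.8, 7.0, 1.2, 0.4)]
eqtl_distinct = [(z * 0.05, 0.05) for z in (7.0, 0.8, 0.3, 1.2, 0.4)]
print("PPH4, shared peak:", coloc_posteriors(gwas, eqtl_shared)[4])
print("PPH3, distinct peaks:", coloc_posteriors(gwas, eqtl_distinct)[3])
```

When both traits peak at the same SNP the posterior mass concentrates on H4, whereas peaks at different SNPs shift it to H3; the PPH4 ≥ 0.8 threshold discussed above is applied to the last element of the returned list.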
Table 2 GWAS-eQTL Co-localized Loci

| Chr | GWAS Locus Start | GWAS Locus End | GWAS Lead SNP | GWAS p Value | eSNP | eSNP p Value | Primary/Conditional | PPH4 | Gene | Relevant Tissue/Cell Type/Developmental Period |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2372401 | 2402501 | rs4648845 | 4.03E−09 | rs12037821 | 4.9E−04 | conditional | 0.87 | SLC35E2 | –/–/early mid-prenatal |
| 1 | 8355697 | 8638984 | rs301797 | 2.03E−09 | rs138050288 | 1.8E−04 | primary | 0.95 | RERE | –/–/– |
| 1 | 30412551 | 30443951 | rs1498232 | 1.28E−09 | rs2015244 | 1.8E−08 | primary | 0.99 | PTPRU | –/neurons/early mid-prenatal |
| 1 | 163582923 | 163766623 | rs7521492 | 5.64E−07 | rs10799961 | 3.18E−11 | primary | 0.91 | PBX1 | –/–/early prenatal |
| 1 | 205015255 | 205189455 | rs16937 | 8.69E−07 | rs12724651 | 7.31E−07 | primary | 0.89 | TMEM81 | –/neurons/– |
| | | | | | rs12031350 | 8.15E−06 | conditional | 0.87 | RBBP5 | –/–/– |
| 1 | 214137889 | 214163689 | rs7529073 | 9.69E−07 | rs1431983 | 1.67E−04 | conditional | 0.93 | PROX1-AS1 | cerebellar hemisphere/neurons/adult |
| 2 | 73194203 | 73900439 | rs56145559 | 8.42E−08 | rs11679809 | 1.85E−34 | primary | 0.86 | ALMS1P | testis/–/– |
| 2 | 110262036 | 110398236 | rs9330316 | 7.69E−08 | rs892464 | 2.35E−26 | primary | 0.92 | SEPT10 | –/–/late prenatal |
| 2 | 198148577 | 198835577 | rs6434928 | 1.48E−11 | rs12621129 | 6.06E−12 | primary | 0.94 | SF3B1 | –/–/– |
| 2 | 200715237 | 201247789 | rs281768 | 1.78E−14 | rs35220450 | 3.46E−14 | primary | 0.95 | FTCDNL1, AC073043.2 | –/–/adult |
| | | | | | rs186546506 | 8.77E−04 | conditional | 0.83 | LINC01792, AC007163.3 | putamen (basal ganglia)/–/adult |
| 2 | 208371631 | 208531731 | rs2709410 | 5.75E−07 | rs34171849 | 5.86E−17 | primary | 0.88 | METTL21A | –/–/– |
| | | | | | rs2551656 | 2.85E−09 | primary | 0.86 | CREB1 | –/–/early prenatal |
| 2 | 220033801 | 220071601 | rs6707588 | 9.51E−07 | rs13404754 | 1.08E−09 | primary | 0.92 | CNPPD1 | –/–/– |
| 3 | 36843183 | 36945783 | rs75968099 | 3.39E−12 | rs9834970 | 1.88E−05 | primary | 0.94 | DCLK3 | nerve - tibial/neurons/infant |
| 3 | 52281078 | 53539269 | rs2535627 | 3.96E−11 | rs6801235 | 2.81E−08 | conditional | 0.86 | PPM1M | –/neurons/late prenatal |
| 3 | 63792650 | 64004050 | rs832187 | 2.58E−08 | rs113386200 | 1.95E−12 | primary | 0.98 | THOC7 | –/–/– |
| 3 | 135807405 | 136615405 | rs7432375 | 5.27E−11 | rs10935184 | 7.71E−25 | primary | 0.93 | PCCB | –/–/– |
| 4 | 170357552 | 170646052 | rs10520163 | 1.02E−08 | rs7438 | 1.02E−09 | primary | 0.97 | CLCN3 | –/–/– |
| 5 | 45291475 | 46404116 | rs1501357 | 1.24E−08 | rs9292918 | 4.45E−05 | primary | 0.94 | BRCAT54, RP11-53O19.1 | –/–/adult |
| 6 | 83779798 | 84407274 | rs3798869 | 8.57E−10 | rs2016358 | 1.19E−09 | primary | 0.90 | SNAP91 | cerebellar hemisphere/–/– |
| 6 | 108875527 | 109019327 | rs9398171 | 3.37E−08 | rs111727905 | 3.84E−06 | primary | 0.97 | ZNF259P1 | –/–/early mid-prenatal |
| 7 | 21485312 | 21545712 | rs73060317 | 6.60E−07 | rs141984481 | 3.59E−05 | primary | 0.92 | SP4 | –/–/early prenatal |
| 8 | 8088038 | 10056127 | rs2945232 | 2.03E−08 | rs2980441 | 7.68E−69 | primary | 0.82 | FAM86B3P | –/–/adolescent |
| 8 | 26181524 | 26279124 | rs1042992 | 2.27E−07 | rs17055186 | 3.06E−24 | conditional | 0.91 | SDAD1P1 | testis/–/adult |
| 8 | 38020424 | 38310924 | rs57709857 | 2.32E−07 | rs201999919 | 1.70E−07 | primary | 0.88 | WHSC1L1 | –/–/early prenatal |
| 8 | 144822546 | 144871746 | rs11784536 | 1.83E−07 | rs12541792 | 6.45E−35 | primary | 0.90 | FAM83H | esophagus - mucosa/oligodendrocytes/adolescent |
| 9 | 26839508 | 26909408 | rs10967586 | 4.75E−07 | rs12345197 | 3.90E−06 | primary | 0.80 | IFT74 | –/–/– |
| 11 | 46340213 | 46751213 | rs7951870 | 1.97E−11 | rs16938506 | 5.08E−05 | primary | 0.88 | MDK | –/–/early mid-prenatal |
| 12 | 57428314 | 57497814 | rs324017 | 2.13E−07 | rs4559 | 2.02E−05 | conditional | 0.91 | STAT6 | –/microglia/adolescent |
| 14 | 35421614 | 35847614 | rs77477310 | 1.52E−07 | rs1028449 | 8.09E−04 | primary | 0.84 | RP11-85K15.2 | –/–/– |
| 15 | 78803032 | 78926732 | rs8042374 | 1.87E−12 | rs7171869 | 1.44E−04 | conditional | 0.94 | IREB2 | –/–/early prenatal |
| 15 | 84661161 | 85153461 | rs950169 | 7.62E−11 | rs35677834 | 1.54E−34 | primary | 0.80 | LOC101929479, RP11-561C5.3 | ovary/–/early mid-prenatal |
| 15 | 91416560 | 91436560 | rs4702 | 2.30E−12 | rs4702 | 4.49E−13 | primary | 1.00 | FURIN | –/endothelial cells/adolescent |
| 16 | 4447751 | 4596451 | rs6500602 | 2.79E−07 | rs3747580 | 4.75E−16 | primary | 0.90 | CORO7 | –/–/– |
| | | | | | rs8046295 | 2.68E−11 | primary | 0.89 | NMRAL1 | –/–/– |
| 16 | 29924377 | 30144877 | rs12691307 | 1.30E−10 | rs4788203 | 1.95E−05 | primary | 0.88 | TMEM219 | –/–/– |
| | | | | | rs3935873 | 7.46E−14 | primary | 0.87 | INO80E | –/neurons/– |
| | | | | | rs4787491 | 1.60E−04 | conditional | 0.82 | DOC2A | brain - cortex/neurons/adolescent |
| 16 | 58669293 | 58691393 | rs12325245 | 1.15E−08 | rs11647976 | 4.83E−04 | primary | 0.94 | CNOT1 | –/–/– |
| 17 | 17722402 | 18030202 | rs8082590 | 6.84E−09 | rs4072739 | 4.74E−13 | primary | 0.92 | DRG2 | –/–/– |
| 19 | 11839736 | 11859736 | rs72986630 | 4.64E−08 | rs72986630 | 2.20E−14 | primary | 1.00 | ZNF823 | –/endothelial cells/early prenatal |
| 19 | 19374022 | 19658022 | rs2905426 | 6.92E−09 | rs2965199 | 9.22E−36 | primary | 0.87 | GATAD2A | –/–/– |
| 19 | 50067499 | 50135399 | rs56873913 | 2.19E−07 | rs5023763 | 9.32E−05 | primary | 0.93 | SNRNP70 | –/–/– |
| 22 | 41408556 | 42689414 | rs9607782 | 6.76E−12 | rs200447424 | 1.87E−04 | primary | 0.96 | RANGAP1 | –/–/– |

Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. Also notable are the occurrences of complex patterns of co-localization for some loci; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another. Comparison with Previous Co-localization Analyses In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2).
Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10−7) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r2∼0.2–0.4). Additionally, our attempts to disentangle independent eQTL signal via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks. We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10). Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology. 
Discussion We utilized genotype and expression data from 467 human post-mortem brain samples from the DLPFC to conduct eQTL mapping analyses, to characterize both primary and conditional eQTL. We then identified co-localization between SCZ GWAS and eQTL association signals, comprising both primary and conditional eQTL. Our principal findings include four major observations. First, we detect that conditional eQTL are widespread in the brain tissue samples we investigated. In 63% of genes with at least one eQTL, we found multiple statistically independent eQTL (representing 8,136 genes). In addition, conditional eQTL make substantial contributions to regulatory genetic variation, as there is a strong association between eQTL number and gene expression cis-SNP-heritability. This demonstrates that genetic variation affecting RNA abundance is incompletely characterized by focusing on only one primary eQTL per gene, which is the case currently for most eQTL studies. Second, we find the genomics of conditional eQTL and their genes are consistent with complex, context-specific regulation of gene expression, which may be conferred through overlap with distal regulatory elements. Genes with more independent eQTL tend to be larger and span multiple recombination hotspot intervals, and tend to be less constrained at the protein level. While these associations may reflect in part greater power to detect independent eQTL that are not in linkage disequilibrium and explain more phenotypic variance, they are also consistent with more complex regulation and greater potential for regulatory genetic variation. Context-specific genetic regulation of expression could manifest as conditional eQTL signal in the analysis of expression from a heterogeneous source. For example, eQTL in naive and stimulated (LPS, IFN) monocytes55 may occur as either primary or conditional eQTL in our CMC data, due to related microglial cells being present in brain tissue homogenate. 
We found that 60 stimulation-specific eQTL (FDR < 0.01 in interferon or lipopolysaccharide stimulated monocytes, but FDR ≥ 0.05 in naive monocytes) were also conditional eQTL in DLPFC. Notably, rs7171787, a conditional (tertiary) eQTL in our DLPFC analysis, is a stimulation-specific monocyte eQTL for the neurodevelopmental56, 57, 58 gene CYFIP1. In our data, associations with specificity of expression across tissues, developmental periods, and cell types determined from single-cell RNA-sequencing data suggest that context specificity plays a role in the occurrence of multiple statistically independent eQTL. Cell type specificity is particularly strongly correlated with eQTL number, consistent with those cell types being present in the current tissue homogenate data. Since previous studies have shown the importance of developmental59, 60, 61, 62 or cell-specific contributions61, 63, 64, 65, 66 to schizophrenia, interrogation of independent eQTL effects may elucidate developmental or tissue-specific effects obscured in whole-tissue eQTL studies. This context specificity of expression regulation is potentially mediated through overlap of eSNPs with distal regulatory elements, such as enhancers. Conditional eQTL occur farther from transcription start sites than primary eQTL, consistent with effects on enhancers. In addition, while both primary and conditional eQTL are enriched in both active promoter and enhancer regions, their enrichment in active promoters diminishes with increasing conditional eQTL order. In other words, conditional eQTL show greater enrichment in enhancers relative to promoters than do primary eQTL. Third, we have identified a number of candidate genes for which genetic variation for expression co-localizes with genetic variation for schizophrenia risk (Table 2), including cases of co-localization with conditional eQTL. 
Genetic co-localization is expected if gene expression causally mediates disease risk, although we recognize that co-localization could also result from pleiotropy or linkage, particularly in regions of extensive linkage disequilibrium and haplotype structure.40, 67 We also note that several co-localization methods have recently been developed,37, 38, 40, 41, 42 and direct comparisons have found broad concordance among these methods and a high degree of specificity of positive results using coloc.42, 45, 46 However, some differences in results would likely be observed using alternative co-localization methods. Our analyses prioritize 27 genes within 24 genome-wide significant (GWAS p < 5 × 10−8) SCZ loci and 19 genes in 17 suggestive (p < 1 × 10−6) loci. In addition to a number of previously implicated SCZ risk genes, our findings include several genes not previously considered as candidates,35 in some cases (e.g., SLC35E2, PTPRU [MIM: 602454], LINC01792, DCLK3, PPM1M, LOC101929479) because the genes themselves do not overlap the GWAS locus regions but their eQTL do. In examining these genes for expression specificity in GTEx tissues, brain sample cell types from single-cell RNA-seq,29 and in BrainSpan DLPFC developmental periods (Tables 2 and S8), we find their expression contexts show a diversity of patterns and can provide clues to generate specific hypotheses for functional follow-up of their potential roles in SCZ. Interestingly, genes broadly expressed across cell types tend to show prenatal expression. Fourth, we highlight the importance of examining conditional eQTL for co-localization with GWASs. In at least 6 out of 40 loci showing GWAS-eQTL co-localization, a conditional eQTL signal co-localizes with SCZ risk. This is likely to be a conservative estimate, as the smaller effect sizes of conditional eQTL result in bias against detection of conditional GWAS-eQTL co-localization. 
If we had considered only primary eQTL in the analyses, these instances of co-localization would not have been identified. Among our highlighted conditional eQTL-GWAS co-localized genes are IREB2, STAT6, and PROX1-AS1. IREB2 (iron regulatory element binding protein 2) is a key regulator of iron homeostasis68, 69 that has been previously implicated in neurodegenerative disorders.70, 71 Mouse IREB2 homolog Irp2 knockouts exhibit impairments in coordination and balance, exploration, and nociception.69 The immune-related transcription factor STAT6 induces interleukin 4 (IL-4)-mediated anti-apoptotic activity of T helper cells, and the locus is associated with migraine72, 73 and brain glioma74 as well as several immune/inflammatory diseases.75, 76, 77 STAT6 also activates neuronal progenitor/stem cells and neurogenesis,78 making it intriguing as an immune-related SCZ candidate given recent observations about the role of the complement factor 4 (C4) gene as a SCZ risk gene79 and prior work potentially implicating microglia.80 Consistent with a role in immune-mediated synaptic pruning, STAT6 expression is broadly postnatal and shows specificity for microglia (Table S8). PROX1-AS1 encodes a lncRNA that has been implicated as aberrantly expressed in several cancers, is upregulated in the cell cycle S-phase, and promotes G1/S transition in cell culture.81 As a potential regulator of the Prospero Homeobox 1 (PROX1) transcription factor, it could be involved in development and cell differentiation in several tissues, including oligodendrocytes82 and GABAergic interneurons83 in the brain. PROX1-AS1 expression is specific to neurons and mature oligodendrocytes and is expressed postnatally (Table S8). In conclusion, we find that conditional eQTL are widespread and are consistent with complex and context-specific regulation. 
Accounting for conditional eQTL leads to new findings of GWAS-eQTL co-localization and generates specific hypotheses for the role of gene expression regulation in disease etiology. The analytical strategy presented here could be implemented as a means of identification of putatively causal genes for any phenotype in which GWAS summary statistics and expression and genotype data from the GWAS phenotype-relevant tissue are available. Conditional eQTL that co-localize with disease risk may reflect regulatory mechanisms that are important in a key developmental period or individual cell type and may be missed when focusing on primary eQTL discovered in adult whole tissue. As further efforts are made to generate data across ranges of tissues or individual cell types, we may have a better ability to directly identify regulatory variants specific to these contexts. However, if a variant is primarily active in a very specific time point or stimulus condition, capturing data reflecting this condition will remain challenging. Conditional co-localization analysis in well-powered eQTL cohorts may best identify the genes driving these trait associations, though further validation work will be required to understand the mechanism by which the gene contributes to disease risk. Consortia CMC leadership: Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffmann-La Roche Ltd), Lara Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner, and Barbara Lipska (NIMH). Additional members of CMC: A. Ercument Cicek, Cong Lu, Kathryn Roeder, Lu Xie (Carnegie Mellon Univ.); Konrad Talbot (Cedars-Sinai Medical Center); Scott E. 
Hemby (High Point Univ.); Laurent Essioux (Hoffmann-La Roche); Andrew Browne, Andrew Chess, Aaron Topol, Alexander Charney, Amanda Dobbyn, Ben Readhead, Bin Zhang, Dalila Pinto, David A. Bennett, David H. Kavanagh, Douglas M. Ruderfer, Eli A. Stahl, Eric E. Schadt, Gabriel E. Hoffman, Hardik R. Shah, Jun Zhu, Jessica S. Johnson, John F. Fullard, Joel T. Dudley, Kiran Girdhar, Kristen J. Brennand, Laura G. Sloofman, Laura M. Huckins, Menachem Fromer, Milind C. Mahajan, Panos Roussos, Schahram Akbarian, Shaun M. Purcell, Tymor Hamamsy, Towfique Raj, Vahram Haroutunian, Ying-Chih Wang, Zeynep H. Gümüş (Mount Sinai School of Med.); Geetha Senthil, Robin Kramer (NIMH); Benjamin A. Logsdon, Jonathan M.J. Derry, Kristen K. Dang, Solveig K. Sieberts, Thanneer M. Perumal (Sage Bionetworks); Roberto Visintainer (Univ. Trento, Italy); Leslie A. Shinobu (Takeda); Patrick F. Sullivan (Univ. North Carolina); and Lambertus L. Klei (Univ. Pittsburgh School of Med.).
sec	Introduction Significant advances in understanding the genetic architecture of schizophrenia (MIM: 181500) have occurred within the last 10 years. However, for common variants identified in genome-wide association studies (GWASs), the success in locus identification is not yet matched by an understanding of their underlying basic mechanism or effect on pathophysiology. Expression quantitative trait loci (eQTL), which are responsible for a significant proportion of variation in gene expression, could serve as a link between the numerous non-coding genetic associations that have been identified in GWASs and susceptibility to common diseases directly through their association with gene expression regulation.1, 2, 3, 4 Accordingly, results from eQTL mapping studies have been successfully utilized to identify genes and causal variants from GWASs for various complex phenotypes, including asthma (MIM: 600807), body mass index (MIM: 601665), celiac disease (MIM: 212750), and Crohn disease (MIM: 266600).5, 6, 7, 8 Studies integrating eQTL and GWAS data have almost exclusively used marginal association statistics which typically represent the primary, or most significant, eQTL signal when assessing co-localization with GWASs, ignoring other SNPs that affect expression independently of the primary eQTL for a given gene. However, recent findings indicating that conditionally independent eQTL are widespread9, 10, 11, 12 motivate examination of the extent to which considering conditional eQTL may provide additional power to identify likely causal genes in a GWAS locus. 
Recent reports provide evidence that conditional eQTL are less frequently shared across tissues than primary eQTL10 and, like tissue- and cell type-specific eQTL, are often found more distally to the genes they regulate.10, 13, 14 These lines of evidence suggest that conditionally independent eQTL may contribute to tissue-specific or other context-specific gene regulation (e.g., specific to a particular cell type, developmental stage, or stimulation condition). One mechanism by which disease risk could potentially be mediated by a conditional eQTL is the disruption of a tissue-specific enhancer by a given variant, leading to the dysregulation of the relevant eGene in only the tissue for which the enhancer is specific. For example, an eQTL affecting Parkinson disease risk through expression of SNCA was recently shown to act through the disruption of an enhancer;15 if this enhancer is specific to a disease-relevant cell type, such as nerve cells of the substantia nigra, then it could manifest as a conditional eQTL since it would be only partially represented in brain homogenate. Here, we leveraged genotype and dorsolateral prefrontal cortex (DLPFC) expression data provided by the CommonMind Consortium (CMC) to elucidate the role of conditional eQTL in the etiology of schizophrenia (SCZ). Currently comprising the largest existing postmortem brain genomic resource at nearly 600 samples, the CMC is generating and making publicly available an unprecedented array of functional genomic data, including gene expression (RNA sequencing), histone modification (chromatin immunoprecipitation [ChIP-seq]), and SNP genotypes, from individuals with psychiatric disorders as well as unaffected controls.16 We utilized SNP dosage and RNA-sequencing (RNA-seq) data from the CMC to identify primary and conditionally independent eQTL. 
We then characterized the resulting eQTL on various genomic attributes including distance to transcription start site and their genes’ specificities across tissues, cell types, and developmental periods. In addition, we quantified enrichment of primary and conditional eQTL in promoter and enhancer functional genomic elements inferred from epigenomic data. Finally, we isolated each independent eQTL signal by conducting a series of “all-but-one” conditional analyses for genes with multiple independent eQTL and then assessed the overlap between all eQTL association signals and the schizophrenia GWAS signals.
sec	Material and Methods CommonMind Consortium Data We used pre-QC’ed genotype and expression data from the CommonMind Consortium, and detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al.16 Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10−5. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790,17 and variants were imputed in 5 Mb segments with Impute v2.3.118 using the 1000 Genomes Phase 1 integrated reference panel,19 excluding singleton variants. After phasing and imputation, then filtering out variants with INFO < 0.8 or MAF < 0.05, the number of markers included in the analysis totaled approximately 6.4 million. Gene expression was assayed via RNA-seq using 100 base pair paired end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes was considered for analysis. The expression data was voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA).20 After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having less than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses. 
These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects. eQTL Identification An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we conducted a forward stepwise conditional analysis implemented in MatrixEQTL21 using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criterion in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold criterion. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. Following eQTL identification, only autosomal eQTL were retained for downstream analyses. Replication in Independent Datasets Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP22 RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication. 
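The omission rule for unavailable markers can be illustrated with a short sketch. This is not the authors' code: the function name `replicable_models` and the marker IDs are hypothetical, and the sketch assumes that each model of order k conditions on all eQTL of lower order for the same gene, so the first missing marker removes its own model and every higher-order model.

```python
def replicable_models(ordered_esnps, available_markers):
    """Return the eQTL orders (1 = primary) whose stepwise model can be refit.

    ordered_esnps: eSNP IDs for one gene, in stepwise (conditional) order.
    available_markers: set of marker IDs present in the replication cohort.
    """
    kept = []
    for order, snp in enumerate(ordered_esnps, start=1):
        if snp not in available_markers:
            # This model, and every higher-order model conditioning on
            # this marker, cannot be fit in the replication cohort.
            break
        kept.append(order)
    return kept


# Hypothetical gene with three independent eQTL; the secondary is missing.
print(replicable_models(["rsA", "rsB", "rsC"], {"rsA", "rsC"}))  # [1]
```

In this example only the primary model is carried forward, because the tertiary model conditions on the missing secondary marker.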
Data from the HBCC cohort was QC’ed and normalized as described in Fromer et al.16 DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components. ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1)23 reference panel via the Michigan Imputation Server.24 Only markers with imputation quality score R2 ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components. Modeling Number of eQTL per Gene on Genomic Features We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC). 
A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA.25 SNP-heritability estimates were then added to the modeling procedure described above. Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project28 (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data29 generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). 
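As a rough illustration of the specificity calculation, the sketch below summarizes expression per context as log2(median + 1) and then applies the commonly used definition of Tau, τ = Σ(1 − x_i / x_max) / (n − 1). We assume this is the form of Tau used here; the function names and expression values are hypothetical, and the cell type analysis would use the mean rather than the median.

```python
import math
import statistics


def summarize(samples_by_context):
    """Summarize each context (e.g. tissue) as log2(median expression + 1)."""
    return [math.log2(statistics.median(values) + 1)
            for values in samples_by_context.values()]


def tau(levels):
    """Specificity index: 0 = uniform expression, 1 = a single context."""
    levels = list(levels)
    top = max(levels)
    if len(levels) < 2 or top <= 0:
        return None  # undefined for one context or no expression at all
    return sum(1 - x / top for x in levels) / (len(levels) - 1)


# Hypothetical expression values for one gene in three tissues.
expression = {
    "tissue_a": [14, 15, 16],  # median 15 -> log2(16) = 4
    "tissue_b": [3, 3, 3],     # median 3  -> log2(4)  = 2
    "tissue_c": [0, 0, 0],     # median 0  -> log2(1)  = 0
}
print(tau(summarize(expression)))  # 0.75
```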
Eight developmental periods30 were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together. Enrichment Analyses We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap31 database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r2 ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project32 ChromHMM33 core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and pre-frontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC. 
For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was repeated but restricting to matched SNPs located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R2) was estimated using the variancePartition34 R package, and eQTL were stratified into three R2 bins: bin 1, 1 × 10−2 ≤ R2 ≤ 1.75 × 10−2; bin 2, 1.75 × 10−2 ≤ R2 ≤ 2.25 × 10−2; and bin 3, 2.25 × 10−2 ≤ R2 ≤ 3 × 10−2. Logistic regression of status as eQTL or matched SNP was then carried out separately for each R2 bin, within each eQTL order. Conditional eQTL Analyses In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL,21 for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components. 
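The all-but-one scheme amounts to enumerating, for each independent eQTL of a gene, a covariate set made up of the gene's other independent eQTL plus the fixed covariates. The sketch below only builds those covariate sets; it does not fit any regression (the analyses themselves were run in MatrixEQTL), and the function name and covariate labels are hypothetical.

```python
# Fixed covariates named in the text: diagnosis and five ancestry components.
BASE_COVARIATES = ["diagnosis"] + [f"ancestry{i}" for i in range(1, 6)]


def all_but_one_designs(independent_esnps, base_covariates=BASE_COVARIATES):
    """List one design per eQTL, conditioning on all of the gene's other eQTL."""
    designs = []
    for i, target in enumerate(independent_esnps):
        others = independent_esnps[:i] + independent_esnps[i + 1:]
        designs.append({
            "target": target,
            "covariates": list(others) + list(base_covariates),
        })
    return designs


for design in all_but_one_designs(["primary", "secondary", "tertiary"]):
    print(design["target"], "|", ", ".join(design["covariates"]))
```

Each printed row corresponds to one conditional analysis: the target eQTL is tested while every other independent eQTL for the gene enters the model as a covariate.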
For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene. Co-localization Analyses For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS.35 We included 217 loci at a p value threshold of 1 × 10−6 (excluding the MHC locus), defined these loci by their LD r2 ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom36 plots. Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc39 for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of the coloc39 R functions, which we name coloc2 (see Web Resources), and incorporated several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region. 
Second, the coloc2 model optionally incorporates changes implemented in gwas-pw.43 Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF)47 across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions or estimating it from the data.
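The posterior calculation described above can be sketched in a few lines. This is a minimal illustration of the standard coloc-style computation, not the coloc2 source: it assumes that the per-SNP Wakefield Bayes factors are averaged on the linear scale over the three prior variances, and it uses fixed priors (1e-4, 1e-4, 1e-5) in place of the priors that coloc2 estimates from genome-wide data. All function names and the toy z-scores are hypothetical.

```python
import math

PRIOR_VARIANCES = (0.01, 0.1, 0.5)  # WABF prior variance values named in the text


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_wabf(beta, se, prior_variances=PRIOR_VARIANCES):
    """Log Wakefield ABF for one SNP, averaged over the prior variances."""
    v = se * se
    z2 = (beta / se) ** 2
    logs = []
    for w in prior_variances:
        r = w / (v + w)
        logs.append(0.5 * (math.log(1 - r) + r * z2))
    return logsumexp(logs) - math.log(len(logs))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits."""
    if len(l1) != len(l2) or len(l1) < 2:
        raise ValueError("need the same set of at least two SNPs for both traits")
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                        # H0: no association
        math.log(p1) + s1,                          # H1: trait 1 only
        math.log(p2) + s2,                          # H2: trait 2 only
        math.log(p1) + math.log(p2) + s1 + s2
        + math.log1p(-math.exp(s12 - s1 - s2)),     # H3: two distinct variants
        math.log(p12) + s12,                        # H4: one shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


SE = 0.1
gwas = [log_wabf(z * SE, SE) for z in (0.5, 6.0, 0.3, -0.2, 0.1)]
eqtl_same = [log_wabf(z * SE, SE) for z in (0.2, 5.5, 0.1, 0.4, -0.3)]
eqtl_other = [log_wabf(z * SE, SE) for z in (5.5, 0.2, 0.1, 0.4, -0.3)]

pp_same = coloc_posteriors(gwas, eqtl_same)    # peaks at the same SNP
pp_other = coloc_posteriors(gwas, eqtl_other)  # peaks at different SNPs
print("PPH4 (shared peak):  ", round(pp_same[4], 3))
print("PPH3 (distinct peaks):", round(pp_other[3], 3))
```

In the toy data the posterior mass moves from H4 to H3 when the eQTL peak is shifted to a different SNP, which is the distinction the co-localization threshold relies on.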
title	Material and Methods
sec	CommonMind Consortium Data We used pre-QC’ed genotype and expression data from the CommonMind Consortium, and detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al.16 Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10−5. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790,17 and variants were imputed in 5 Mb segments with Impute v2.3.118 using the 1000 Genomes Phase 1 integrated reference panel,19 excluding singleton variants. After phasing and imputation, then filtering out variants with INFO < 0.8 or MAF < 0.05, the number of markers included in the analysis totaled approximately 6.4 million. Gene expression was assayed via RNA-seq using 100 base pair paired end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes was considered for analysis. The expression data was voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA).20 After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having less than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses. 
These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects.
title	CommonMind Consortium Data
p	We used pre-QC’ed genotype and expression data from the CommonMind Consortium, and detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al.16 Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10−5. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790,17 and variants were imputed in 5 Mb segments with Impute v2.3.118 using the 1000 Genomes Phase 1 integrated reference panel,19 excluding singleton variants. After phasing and imputation, then filtering out variants with INFO < 0.8 or MAF < 0.05, the number of markers included in the analysis totaled approximately 6.4 million. Gene expression was assayed via RNA-seq using 100 base pair paired end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes was considered for analysis. The expression data was voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA).20 After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having less than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses. 
These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects.
sec	eQTL Identification An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we conducted a forward stepwise conditional analysis implemented in MatrixEQTL21 using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criterion in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold criterion. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. Following eQTL identification, only autosomal eQTL were retained for downstream analyses.
title	eQTL Identification
p	An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we conducted a forward stepwise conditional analysis implemented in MatrixEQTL21 using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criterion in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold criterion. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. Following eQTL identification, only autosomal eQTL were retained for downstream analyses.
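The forward stepwise selection described above can be illustrated for a single gene with a minimal sketch in standard-library Python. It is not the MatrixEQTL implementation used in the study. The function names (`residualize`, `conditional_pvalue`, `forward_stepwise_eqtl`), the `max_steps` cap, the simulated genotypes, and the fixed threshold of 1e-6 are all assumptions made for illustration, and the p value uses a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import random


def residualize(vector, covariates):
    """Project an intercept and the covariates out of `vector` (Gram-Schmidt)."""
    basis = []
    for column in [[1.0] * len(vector)] + [list(c) for c in covariates]:
        for b in basis:
            dot = sum(x * y for x, y in zip(column, b))
            column = [x - dot * y for x, y in zip(column, b)]
        norm = math.sqrt(sum(x * x for x in column))
        if norm > 1e-9:  # skip columns that are collinear with earlier ones
            basis.append([x / norm for x in column])
    residual = list(vector)
    for b in basis:
        dot = sum(x * y for x, y in zip(residual, b))
        residual = [x - dot * y for x, y in zip(residual, b)]
    return residual, len(basis)


def conditional_pvalue(expression, genotype, covariates):
    """Two-sided p value for one SNP, adjusting for the covariates.

    Assumption: normal approximation to the t statistic (large n).
    """
    res_y, rank = residualize(expression, covariates)
    res_g, _ = residualize(genotype, covariates)
    sgg = sum(g * g for g in res_g)
    syy = sum(y * y for y in res_y)
    if sgg < 1e-9 or syy < 1e-9:
        return 1.0  # no residual variation left to test
    r = sum(g * y for g, y in zip(res_g, res_y)) / math.sqrt(sgg * syy)
    r = max(-0.999999999, min(0.999999999, r))
    df = len(expression) - rank - 1
    t = r * math.sqrt(df / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def forward_stepwise_eqtl(expression, genotypes, covariates, p_threshold,
                          max_steps=20):
    """Add the most significant SNP as a covariate until none passes."""
    selected = []
    for _ in range(max_steps):
        conditioning = list(covariates) + [genotypes[j] for j in selected]
        candidates = [
            (conditional_pvalue(expression, g, conditioning), j)
            for j, g in enumerate(genotypes) if j not in selected
        ]
        if not candidates:
            break
        best_p, best_j = min(candidates)
        if best_p >= p_threshold:
            break
        selected.append(best_j)
    return selected


# Simulated example: SNP 0 is a strong eQTL and SNP 3 a weaker independent one.
rng = random.Random(0)
n_samples, n_snps = 400, 6
genotypes = [
    [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_samples)]
    for _ in range(n_snps)
]
expression = [
    1.5 * genotypes[0][i] + 0.7 * genotypes[3][i] + rng.gauss(0, 1)
    for i in range(n_samples)
]
selected = forward_stepwise_eqtl(expression, genotypes, covariates=[],
                                 p_threshold=1e-6)
```

In the study the threshold is the p value corresponding to the initial 5% FDR and the covariates are diagnosis and five ancestry components; here both are replaced by placeholders.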
sec	Replication in Independent Datasets Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP22 RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication. Data from the HBCC cohort was QC’ed and normalized as described in Fromer et al.16 DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components. ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1)23 reference panel via the Michigan Imputation Server.24 Only markers with imputation quality score R2 ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components.
title	Replication in Independent Datasets
p	Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP22 RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication.
p	Data from the HBCC cohort was QC’ed and normalized as described in Fromer et al.16 DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components.
p	ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1)23 reference panel via the Michigan Imputation Server.24 Only markers with imputation quality score R2 ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components.
sec	Modeling Number of eQTL per Gene on Genomic Features We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC). A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA.25 SNP-heritability estimates were then added to the modeling procedure described above. Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project28 (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data29 generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. 
Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). Eight developmental periods30 were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together.
title	Modeling Number of eQTL per Gene on Genomic Features
p	We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC). A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA.25 SNP-heritability estimates were then added to the modeling procedure described above.
p	Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project28 (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data29 generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). Eight developmental periods30 were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together.
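As a hedged illustration of the specificity calculation, the sketch below uses the commonly cited definition of Tau, in which each context's expression is scaled by the maximum across contexts and the shortfalls from 1 are averaged over n − 1 contexts. The function names `summarize_contexts` and `tau` and the toy values are assumptions; the exact implementation used in the study is not shown in the text.

```python
import math
from statistics import median


def summarize_contexts(samples_by_context):
    """Summarize each context (tissue, period) as log2(median expression + 1)."""
    return [math.log2(median(values) + 1.0) for values in samples_by_context]


def tau(profile):
    """Tau specificity index: 0 for uniform expression, 1 for a single context.

    tau = sum_i (1 - x_i / max(x)) / (n - 1)
    """
    n = len(profile)
    peak = max(profile)
    if n < 2 or peak <= 0:
        raise ValueError("Tau is undefined for fewer than two contexts "
                         "or for a gene with no expression")
    return sum(1.0 - x / peak for x in profile) / (n - 1)


# Toy example with hypothetical expression values in four contexts.
profile = summarize_contexts([[1, 3, 7], [0, 0, 0], [3, 3, 3], [0, 1, 0]])
specificity = tau(profile)
```

For the cell type Tau described above, the per-context summary would use the mean rather than the median.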
sec	Enrichment Analyses We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap31 database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r2 ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project32 ChromHMM33 core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and pre-frontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC. For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was repeated but restricting to matched SNPs located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. 
In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R2) was estimated using the variancePartition34 R package, and eQTL were stratified into three R2 bins: bin 1, 1 × 10−2 ≤ R2 ≤ 1.75 × 10−2; bin 2, 1.75 × 10−2 ≤ R2 ≤ 2.25 × 10−2; and bin 3, 2.25 × 10−2 ≤ R2 ≤ 3 × 10−2. Logistic regression of status as eQTL or matched SNP was then carried out separately for each R2 bin, within each eQTL order.
title	Enrichment Analyses
p	We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap31 database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r2 ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project32 ChromHMM33 core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and pre-frontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC. For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was repeated but restricting to matched SNPs located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. 
In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R2) was estimated using the variancePartition34 R package, and eQTL were stratified into three R2 bins: bin 1, 1 × 10−2 ≤ R2 ≤ 1.75 × 10−2; bin 2, 1.75 × 10−2 ≤ R2 ≤ 2.25 × 10−2; and bin 3, 2.25 × 10−2 ≤ R2 ≤ 3 × 10−2. Logistic regression of status as eQTL or matched SNP was then carried out separately for each R2 bin, within each eQTL order.
sec	Conditional eQTL Analyses In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL,21 for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components. For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene.
title	Conditional eQTL Analyses
p	In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL,21 for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components. For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene.
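The covariate sets for the "all-but-one" analyses can be enumerated mechanically. The sketch below only builds the covariate list for each target eQTL and does not fit any model; the function name `all_but_one_designs` and the SNP and covariate labels are hypothetical.

```python
def all_but_one_designs(independent_eqtl, base_covariates):
    """Map each eQTL to the covariates used when it is analyzed in isolation.

    Every other independent eQTL for the gene is included as a covariate,
    followed by the base covariates (diagnosis and ancestry components).
    """
    return {
        target: [snp for snp in independent_eqtl if snp != target]
        + list(base_covariates)
        for target in independent_eqtl
    }


base = ["diagnosis"] + [f"ancestry_{i}" for i in range(1, 6)]
designs = all_but_one_designs(["primary", "secondary", "tertiary"], base)
```

For a gene with three independent eQTL this yields three designs, matching the example in the text.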
sec	Co-localization Analyses For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS.35 We included 217 loci at a p value threshold of 1 × 10−6 (excluding the MHC locus), defined these loci by their LD r2 ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom36 plots. Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc39 for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of coloc39 R functions, which we name coloc2 (see Web Resources), and incorporated several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region. Second, the coloc2 model optionally incorporates changes implemented in gwas-pw.43 Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). 
Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF)47 across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions or estimating it from the data.
title	Co-localization Analyses
p	For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS.35 We included 217 loci at a p value threshold of 1 × 10−6 (excluding the MHC locus), defined these loci by their LD r2 ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom36 plots.
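The final step of the locus definition, merging overlapping loci, can be sketched as a standard interval merge. The tuple layout `(chromosome, start, end)`, the function name `merge_loci`, and the choice to treat loci that share a boundary position as overlapping are assumptions for illustration; the coordinates are invented.

```python
def merge_loci(loci):
    """Merge overlapping (chromosome, start, end) intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous locus: extend its end if needed.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


example_loci = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 50, 60),
    ("chr1", 400, 500),
]
merged_loci = merge_loci(example_loci)
```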
p	Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc39 for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of coloc39 R functions, which we name coloc2 (see Web Resources), and incorporated several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region. Second, the coloc2 model optionally incorporates changes implemented in gwas-pw.43 Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF)47 across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions or estimating it from the data.
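To make the Bayes factor and posterior calculations concrete, the following is a minimal sketch of the standard coloc-style computation, not the coloc2 code itself. It assumes that the three per-SNP Bayes factors are averaged on the natural scale, and it uses fixed per-SNP priors (1e-4, 1e-4, 1e-5, the coloc defaults) in place of the genome-wide mixture proportions that coloc2 estimates. The function names and the toy effect sizes are hypothetical.

```python
import math

PRIOR_VARIANCES = (0.01, 0.1, 0.5)  # WABF prior variance terms from the text


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_wabf(beta, se, prior_variances=PRIOR_VARIANCES):
    """Log Wakefield approximate Bayes factor, averaged over prior variances.

    For prior variance W and V = se^2: r = W / (V + W),
    log ABF = 0.5 * (log(1 - r) + r * z^2) with z = beta / se.
    """
    z2 = (beta / se) ** 2
    v = se * se
    logs = []
    for w in prior_variances:
        r = w / (v + w)
        logs.append(0.5 * (math.log(1.0 - r) + r * z2))
    return logsumexp(logs) - math.log(len(logs))


def coloc_posteriors(labf_gwas, labf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one locus (standard coloc form)."""
    sum1 = logsumexp(labf_gwas)
    sum2 = logsumexp(labf_eqtl)
    sum12 = logsumexp([a + b for a, b in zip(labf_gwas, labf_eqtl)])
    both = sum1 + sum2
    # log of (sum_i BF1_i * sum_j BF2_j - sum_i BF1_i * BF2_i): distinct SNPs
    if both > sum12:
        distinct = both + math.log(-math.expm1(sum12 - both))
    else:
        distinct = float("-inf")
    log_h = [
        0.0,                                       # H0: no association
        math.log(p1) + sum1,                       # H1: GWAS only
        math.log(p2) + sum2,                       # H2: eQTL only
        math.log(p1) + math.log(p2) + distinct,    # H3: both, distinct SNPs
        math.log(p12) + sum12,                     # H4: both, shared SNP
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy locus with three SNPs given as (beta, standard error) pairs.
gwas = [(0.02, 0.01), (0.10, 0.01), (0.01, 0.01)]
eqtl_shared = [(0.05, 0.05), (0.60, 0.05), (0.02, 0.05)]   # peak at same SNP
eqtl_distinct = [(0.60, 0.05), (0.0, 0.05), (0.0, 0.05)]   # peak elsewhere

labf_gwas = [log_wabf(b, s) for b, s in gwas]
post_shared = coloc_posteriors(labf_gwas,
                               [log_wabf(b, s) for b, s in eqtl_shared])
post_distinct = coloc_posteriors(labf_gwas,
                                 [log_wabf(b, s) for b, s in eqtl_distinct])
```

In this toy setup the shared-peak locus should put nearly all posterior mass on H4 and the distinct-peak locus on H3, which is the distinction the co-localization threshold (posterior probability > 0.8) relies on.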
sec	Results Identification of eQTL Primary and conditional eQTL were identified using genotype and RNA-seq data from the CommonMind Consortium post-mortem DLPFC samples (467 European-ancestry case and control subjects).16 We identified 12,813 primary and 16,082 conditional eQTL, totaling 28,895 independent eQTL. Of the genes tested, 81% (12,813 of 15,817 autosomal genes) had at least one eQTL and 63% of these (51% of all genes) also had at least one conditional eQTL, with an average of 1.83 independent eQTL per gene (2.26 among those with at least one eQTL) (Figure 1A). Conversely, when examining the distributions for the number of genes whose expression was affected by each eQTL (Table S1), the majority of eQTL were specific for a single gene, and only a small fraction of eQTL, 1.47%, affected more than one gene, with a maximum of six genes affected by a single eQTL. Figure 1 Characterization of Conditional eQTL (A) Counts of the numbers of genes (y axis) regulated by at least N (1 ≤ N ≤ 16) independent eQTL (x axis). (B) Median Tau value (y axis) for genes with N independent eQTL (x axis), colored by Tau type (cell type, developmental time point, or tissue type Tau). (C) Density plot representing the distance from eSNP to eGene transcription start site (TSS), colored by eQTL order. Dashed lines represent the median distance to TSS for each order of eQTL. We tested for replication of conditional eQTL in two independent datasets, the National Institute of Mental Health’s Human Brain Collection Core (HBCC, n = 279, microarray expression data) and the Religious Orders Study/Memory and Aging Project22 (ROSMAP, n = 494, RNA-seq expression). For each gene the same models were evaluated that were identified in forward-stepwise conditional analysis in the CMC data. We observed significant evidence of replication for both primary and conditional eQTL in the HBCC and ROSMAP post-mortem brain cohorts (Table S2). 
The estimated proportion of true associations (π1) in ROSMAP was 0.57 and 0.26 for primary and conditional eQTL, respectively; in HBCC π1 was 0.46 and 0.20 for primary and conditional eQTL. Therefore, replication was stronger for primary than for conditional eQTL, as expected given their stronger effect sizes. Replication rates were somewhat higher in the RNA-seq ROSMAP data than in HBCC. Genomic Characterization of Primary and Conditional eQTL The features for which primary and conditional eQTL and their respective eGenes displayed identifiable differences included distance from eQTL to its gene’s transcription start site (TSS), gene length, LD blocks per genic cis-region, genic constraint score, and genic cis-SNP-heritability. According to prior results, eQTL that are shared across tissues and cell types tend to be located closer to transcription start sites than context-specific eQTL;13, 14 we therefore first examined the relationship between primary or conditional eQTL status and distance to genic TSS. Primary eQTL fall closer to the TSS than conditional eQTL (Figure 1C): primary eQTL occur at a median distance of 70.4 kb from the TSS versus a median distance of 302 kb for conditional eQTL. This difference holds true even more proximally to the TSS (Figure S2); 8.1% and 2.5% of primary and conditional eQTL, respectively, fall within 3 kb of the TSS. We next characterized the relationship between the number of independent eQTL per gene and three different genomic features: gene length, number of LD blocks48 in the gene’s cis-region (±1 Mb), and Exome Aggregation Consortium (ExAC) genic constraint score,49 including possible interactions. The best multivariate model for eQTL number included gene length, number of LD blocks, and genic constraint as predictors, as well as a gene length-LD blocks interaction (Table 1). 
The number of independent eQTL was positively correlated with gene length and number of LD blocks and negatively correlated with genic constraint score (Figure S3). We then examined the variance of gene expression explained by cis-region SNPs, or cis-SNP-heritability, estimated by linear mixed model variance component analysis25 (Figure S4). We found a strong effect of estimated cis-heritability on number of independent eQTL (Table 1, Figure S5). In a joint model with cis-SNP-heritability, the main effects of gene length, number of LD blocks, and genic constraint on eQTL number remained at least nominally significant.
Table 1 Number of eQTL per Gene Modeled on Genomic Features
                                 Model 1                        Model 2                        Model 3
Predictor                        Estimate  Robust SE  Pr(>|z|)  Estimate  Robust SE  Pr(>|z|)  Estimate  Robust SE  Pr(>|z|)
log(Gene length)                 0.27      0.04       5.16E−12  0.16      0.03       2.20E−06  0.17      0.03       9.87E−07
LD blocks                        0.59      0.17       6.47E−04  0.33      0.15       2.92E−02  0.37      0.15       1.55E−02
log(Gene length): LD blocks      −0.03     0.02       7.77E−02  −0.01     0.01       5.65E−01  −0.01     0.01       4.11E−01
Constraint                       −0.61     0.03       5.93E−85  −0.20     0.03       2.93E−13  −0.15     0.03       5.41E−08
cis-heritability                 –         –          –         7.03      0.18       0.00      7.02      0.18       0.00
Tau (tissue)                     –         –          –         –         –          –         0.08      0.08       2.76E−01
Tau (DLPFC cell type)            –         –          –         –         –          –         0.20      0.09       3.69E−02
Tau (developmental time point)   –         –          –         –         –          –         0.17      0.09       5.99E−02
We then addressed whether genes with conditional eQTL exhibit greater context specificity as measured by the robust expression specificity metric Tau.26, 27 We calculated Tau across 53 tissues from the Genotype-Tissue Expression (GTEx) project, across 6 DLPFC cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte progenitor cells) from single-cell RNA-seq,29 and across 8 developmental periods30 (early prenatal, early mid-prenatal, late mid-prenatal, late prenatal, infant, child, adolescent, and adult) from the BrainSpan atlas DLPFC RNA-seq data. 
We confirmed that higher values of Tau reflect expression specificity by comparing the distributions of all three Tau measures for all genes with the distributions for a subset of housekeeping genes50 (Figure S6). We found positive correlations between eQTL number and tissue, cell type, and developmental time point specificities (Figure 1B, Table 1, Table S3, Figure S7). In a joint model, the strongest correlation was with DLPFC cell type Tau, which is consistent with previous data demonstrating tissue-specific, cell type-dependent expression in blood;12 however, we note that all three Tau sets were inter-correlated (Table S3). Epigenetic Enrichment Analyses One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. 
The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers.33 We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary). Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B). Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R2) and comparing enrichment across eQTL order, within each R2 bin (Figures S9–S12, Table S6). 
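The enrichment coefficients described above can be illustrated with a small sketch. For a single binary annotation and no other covariates, the logistic regression coefficient of SNP status on annotation overlap equals the log odds ratio of the 2×2 overlap table, with a Wald standard error given by the square root of the summed reciprocal cell counts. The function name and the counts below are hypothetical, and the study's actual regression may have included additional terms.

```python
import math

def log_odds_enrichment(eqtl_in, eqtl_out, ctrl_in, ctrl_out, z=1.96):
    """Log odds ratio and Wald 95% CI for annotation overlap.

    eqtl_in / eqtl_out: eQTL SNPs inside / outside the annotation.
    ctrl_in / ctrl_out: matched random SNPs inside / outside it.
    """
    counts = (eqtl_in, eqtl_out, ctrl_in, ctrl_out)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")
    # Equivalent to the slope of a one-predictor logistic regression.
    beta = math.log((eqtl_in * ctrl_out) / (eqtl_out * ctrl_in))
    se = math.sqrt(sum(1.0 / c for c in counts))
    return beta, beta - z * se, beta + z * se

# Hypothetical counts: 200 of 1,000 eQTL and 100 of 1,000 matched SNPs
# overlap an active-promoter annotation.
beta, ci_low, ci_high = log_odds_enrichment(200, 800, 100, 900)
```

A positive coefficient whose interval excludes zero corresponds to enrichment; repeating the calculation per eQTL order gives the kind of trend plotted in Figure 2.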
Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. (C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−). eQTL Co-localization with SCZ GWAS We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations. Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10⁻⁶ from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS,35 we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL. To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc39 for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. 
Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw,43 with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization. For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2. In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis). We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). 
In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B). Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. (B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures. Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled). We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. 
Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8). We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization. Table 2 GWAS-eQTL Co-localized Loci Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period 1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal 1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/– 1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal 1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal 1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/– rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/– 1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult 2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/– 2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal 2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/– 2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult 2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/– 
rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal 2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/– 3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant 3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal 3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/– 3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/– 4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 primary 0.97 CLCN3 –/–/– 5 45291475 46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult 6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/– 6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal 7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal 8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent 8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult 8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal 8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent 9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/– 11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal 12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent 14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/– 15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal 15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 
LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal 15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent 16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/– rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/– 16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/– rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/– rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent 16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/– 17 17722402 18030202 rs8082590 6.84E−09 rs4072739 4.74E−13 primary 0.92 DRG2 –/–/– 19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal 19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/– 19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/– 22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/– Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. Also notable are the occurrences of complex patterns of co-localization for some loci; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another. Comparison with Previous Co-localization Analyses In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. 
Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2). Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10⁻⁷) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r² ≈ 0.2–0.4). Additionally, our attempts to disentangle independent eQTL signals via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks. We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10). Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. 
Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology.
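The posterior probabilities for H0 through H4 used throughout the co-localization analysis can be sketched as follows. This is a minimal illustration in the spirit of a gwas-pw-style regional model, not the coloc2 code: it assumes per-SNP approximate Bayes factors are already available for both traits, places a uniform prior over SNPs within the region, and takes the five dataset-wide hypothesis probabilities as given. The function name, the Bayes factors, and the prior values are all hypothetical.

```python
def colocalization_posteriors(abf_gwas, abf_eqtl, priors):
    """Posterior probabilities of H0..H4 for one region.

    abf_gwas, abf_eqtl: per-SNP approximate Bayes factors (same SNP order).
    priors: dict with keys "H0".."H4" giving dataset-wide probabilities.
    """
    k = len(abf_gwas)
    if k != len(abf_eqtl) or k < 2:
        raise ValueError("need two equal-length lists of at least 2 SNPs")
    sum_g = sum(abf_gwas)
    sum_e = sum(abf_eqtl)
    sum_ge = sum(g * e for g, e in zip(abf_gwas, abf_eqtl))
    regional_bf = {
        "H0": 1.0,                                     # no association
        "H1": sum_g / k,                               # GWAS only
        "H2": sum_e / k,                               # eQTL only
        "H3": (sum_g * sum_e - sum_ge) / (k * (k - 1)),  # two distinct SNPs
        "H4": sum_ge / k,                              # one shared SNP
    }
    weights = {h: priors[h] * bf for h, bf in regional_bf.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

flat_priors = {h: 0.2 for h in ("H0", "H1", "H2", "H3", "H4")}
# Both signals peak at the second SNP: evidence favors H4.
shared = colocalization_posteriors([1, 100, 1], [1, 200, 1], flat_priors)
# Signals peak at different SNPs: evidence favors H3.
distinct = colocalization_posteriors([1, 100, 1], [200, 1, 1], flat_priors)
```

In the all-but-one setting, the eQTL Bayes factors would come from each conditional analysis in turn, so a gene with two independent eQTL yields two such calculations against the same GWAS signal.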
title	Results
title	Identification of eQTL
p	Primary and conditional eQTL were identified using genotype and RNA-seq data from the CommonMind Consortium post-mortem DLPFC samples (467 European-ancestry case and control subjects).16 We identified 12,813 primary and 16,082 conditional eQTL, totaling 28,895 independent eQTL. Of the genes tested, 81% (12,813 of 15,817 autosomal genes) had at least one eQTL and 63% of these (51% of all genes) also had at least one conditional eQTL, with an average of 1.83 independent eQTL per gene (2.26 among those with at least one eQTL) (Figure 1A). Conversely, when examining the distributions for the number of genes whose expression was affected by each eQTL (Table S1), the majority of eQTL were specific for a single gene, and only a small fraction of eQTL, 1.47%, affected more than one gene, with a maximum of six genes affected by a single eQTL.
label	Figure 1
caption	Characterization of Conditional eQTL (A) Counts of the numbers of genes (y axis) regulated by at least N (1 ≤ N ≤ 16) independent eQTL (x axis). (B) Median Tau value (y axis) for genes with N independent eQTL (x axis), colored by Tau type (cell type, developmental time point, or tissue type Tau). (C) Density plot representing the distance from eSNP to eGene transcription start site (TSS), colored by eQTL order. Dashed lines represent the median distance to TSS for each order of eQTL.
p	We tested for replication of conditional eQTL in two independent datasets, the National Institute of Mental Health’s Human Brain Collection Core (HBCC, n = 279, microarray expression data) and the Religious Orders Study/Memory and Aging Project22 (ROSMAP, n = 494, RNA-seq expression). For each gene, we evaluated the same models that forward-stepwise conditional analysis had identified in the CMC data. We observed significant evidence of replication for both primary and conditional eQTL in the HBCC and ROSMAP post-mortem brain cohorts (Table S2). The estimated proportion of true associations (π1) in ROSMAP was 0.57 and 0.26 for primary and conditional eQTL, respectively; in HBCC, π1 was 0.46 and 0.20 for primary and conditional eQTL, respectively. Replication was therefore stronger for primary than for conditional eQTL, as expected given the stronger effect sizes of primary eQTL. Replication rates were somewhat higher in the RNA-seq ROSMAP data than in the microarray HBCC data.
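As a rough illustration of how a π1 statistic of this kind can be obtained from replication p values, the sketch below uses a Storey-type fixed-λ estimator: the fraction of p values above λ estimates π0(1 − λ), and π1 = 1 − π0. The function name, the choice λ = 0.5, and the toy p values are assumptions for illustration; this section does not specify which estimator the study used.

```python
def estimate_pi1(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true associations."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    # Null p values are uniform on [0, 1], so the share above lam
    # estimates pi0 * (1 - lam).
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0

# Hypothetical replication p values for ten eQTL.
toy_p = [1e-8, 3e-6, 2e-4, 0.001, 0.004, 0.01, 0.30, 0.55, 0.72, 0.91]
pi1 = estimate_pi1(toy_p)
```

Three of the ten toy p values exceed 0.5, so the estimated null proportion is 0.6 and π1 is 0.4. Applied separately to primary and conditional eQTL, the same calculation yields the kind of paired values reported above.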
title	Genomic Characterization of Primary and Conditional eQTL
p	The features for which primary and conditional eQTL and their respective eGenes displayed identifiable differences included distance from eQTL to its gene’s transcription start site (TSS), gene length, LD blocks per genic cis-region, genic constraint score, and genic cis-SNP-heritability. According to prior results, eQTL that are shared across tissues and cell types tend to be located closer to transcription start sites than context-specific eQTL;13, 14 we therefore first examined the relationship between primary or conditional eQTL status and distance to genic TSS. Primary eQTL fall closer to the TSS than conditional eQTL (Figure 1C): primary eQTL occur at a median distance of 70.4 kb from the TSS versus a median distance of 302 kb for conditional eQTL. This difference holds true even more proximally to the TSS (Figure S2); 8.1% and 2.5% of primary and conditional eQTL, respectively, fall within 3 kb of the TSS. We next characterized the relationship between the number of independent eQTL per gene and three different genomic features: gene length, number of LD blocks48 in the gene’s cis-region (±1 Mb), and Exome Aggregation Consortium (ExAC) genic constraint score,49 including possible interactions. The best multivariate model for eQTL number included gene length, number of LD blocks, and genic constraint as predictors, as well as a gene length-LD blocks interaction (Table 1). The number of independent eQTL was positively correlated with gene length and number of LD blocks and negatively correlated with genic constraint score (Figure S3). We then examined the variance of gene expression explained by cis-region SNPs, or cis-SNP-heritability, estimated by linear mixed model variance component analysis25 (Figure S4). We found a strong effect of estimated cis-heritability on number of independent eQTL (Table 1, Figure S5). 
In a joint model with cis-SNP-heritability, the main effects of gene length, number of LD blocks, and genic constraint on eQTL number remained at least nominally significant.
label	Table 1
caption	Number of eQTL per Gene Modeled on Genomic Features
tr	Predictor	Model 1 Estimate	Model 1 Robust SE	Model 1 Pr(>|z|)	Model 2 Estimate	Model 2 Robust SE	Model 2 Pr(>|z|)	Model 3 Estimate	Model 3 Robust SE	Model 3 Pr(>|z|)
tr	log(Gene length)	0.27	0.04	5.16E−12	0.16	0.03	2.20E−06	0.17	0.03	9.87E−07
tr	LD blocks	0.59	0.17	6.47E−04	0.33	0.15	2.92E−02	0.37	0.15	1.55E−02
tr	log(Gene length): LD blocks	−0.03	0.02	7.77E−02	−0.01	0.01	5.65E−01	−0.01	0.01	4.11E−01
tr	Constraint	−0.61	0.03	5.93E−85	−0.20	0.03	2.93E−13	−0.15	0.03	5.41E−08
tr	cis-heritability	–	–	–	7.03	0.18	0.00	7.02	0.18	0.00
tr	Tau (tissue)	–	–	–	–	–	–	0.08	0.08	2.76E−01
tr	Tau (DLPFC cell type)	–	–	–	–	–	–	0.20	0.09	3.69E−02
tr	Tau (developmental time point)	–	–	–	–	–	–	0.17	0.09	5.99E−02
p	We then addressed whether genes with conditional eQTL exhibit greater context specificity as measured by the robust expression specificity metric Tau.26, 27 We calculated Tau across 53 tissues from the Genotype-Tissue Expression (GTEx) project, across 6 DLPFC cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte progenitor cells) from single-cell RNA-seq,29 and across 8 developmental periods30 (early prenatal, early mid-prenatal, late mid-prenatal, late prenatal, infant, child, adolescent, and adult) from the BrainSpan atlas DLPFC RNA-seq data. We confirmed that higher values of Tau reflect expression specificity by comparing the distributions of all three Tau measures for all genes with the distributions for a subset of housekeeping genes50 (Figure S6). We found positive correlations between eQTL number and tissue, cell type, and developmental time point specificities (Figure 1B, Table 1, Table S3, Figure S7). In a joint model, the strongest correlation was with DLPFC cell type Tau, which is consistent with previous data demonstrating tissue-specific, cell type-dependent expression in blood;12 however, we note that all three Tau sets were inter-correlated (Table S3).
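p	As a minimal sketch of the specificity metric described above, the following Python function computes the standard Tau index for one gene across a set of contexts (tissues, cell types, or developmental periods). It assumes non-negative, already normalized expression values; the function name `tau` and the example values are hypothetical, and the exact normalization applied before computing Tau is not stated in this section.

```python
def tau(expression):
    """Tau specificity index for one gene across n contexts.

    Returns 0 for uniform expression (housekeeping-like) and 1 for
    expression confined to a single context.
    Assumption: values are non-negative and already normalized.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two contexts")
    x_max = max(expression)
    if x_max <= 0:
        # Gene not expressed anywhere: treat as non-specific.
        return 0.0
    # Sum of (1 - x_i / x_max) over contexts, scaled by n - 1.
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles (not data from the study).
uniform_gene = [5.0, 5.0, 5.0, 5.0]
specific_gene = [0.0, 0.0, 0.0, 10.0]
print(tau(uniform_gene), tau(specific_gene))
```

p	Under this definition, a housekeeping-like profile scores near 0 and a context-restricted profile scores near 1, which matches the direction of the comparison with housekeeping genes reported above.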
sec	Epigenetic Enrichment Analyses One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers.33 We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary). Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B). 
Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R²) and comparing enrichment across eQTL order, within each R² bin (Figures S9–S12, Table S6). Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. (C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
title	Epigenetic Enrichment Analyses
p	One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers.33 We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary).
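p	The enrichment test described above can be illustrated with a small sketch. When the only predictor in a logistic regression is a binary overlap indicator, the fitted coefficient equals the log odds ratio of the corresponding 2×2 table, so the estimate and a Wald 95% CI can be computed directly from counts. The function name and the counts below are hypothetical, and the sketch assumes no additional covariates, which may differ from the model actually fitted.

```python
import math


def enrichment_log_odds(eqtl_in, eqtl_out, matched_in, matched_out):
    """Log odds ratio of annotation overlap, eQTL versus matched SNPs.

    Equivalent to the slope of a logistic regression of SNP status
    (eQTL = 1, matched SNP = 0) on a single binary overlap indicator.
    Assumption: all four counts are positive.
    Returns (estimate, lower 95% bound, upper 95% bound).
    """
    counts = (eqtl_in, eqtl_out, matched_in, matched_out)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = math.log((eqtl_in * matched_out) / (eqtl_out * matched_in))
    # Wald standard error of the log odds ratio.
    se = math.sqrt(sum(1.0 / c for c in counts))
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical counts for one eQTL order and one annotation.
est, lower, upper = enrichment_log_odds(30, 70, 10, 90)
print(f"log OR = {est:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

p	Running the same calculation separately for primary, secondary, and higher-order eQTL would give one coefficient and interval per eQTL order, which is the quantity plotted per annotation in Figure 2.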
p	Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B). Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R²) and comparing enrichment across eQTL order, within each R² bin (Figures S9–S12, Table S6). Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. 
(C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
figure	Figure 2 Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. (C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
label	Figure 2
caption	Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations. (A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project. (C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange). (D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
p	Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations
p	Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations.
p	(A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project.
p	(C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange).
p	(D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
sec	eQTL Co-localization with SCZ GWAS We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations. Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10⁻⁶ from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS,35 we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL. To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc39 for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw,43 with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization. For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2. 
In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis). We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B). Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. (B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures. 
Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled). We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8). We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization. 
Table 2 GWAS-eQTL Co-localized Loci Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period 1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal 1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/– 1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal 1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal 1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/– rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/– 1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult 2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/– 2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal 2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/– 2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult 2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/– rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal 2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/– 3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant 3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal 3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/– 3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/– 4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 primary 
0.97 CLCN3 –/–/– 5 45291475 46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult 6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/– 6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal 7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal 8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent 8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult 8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal 8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent 9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/– 11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal 12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent 14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/– 15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal 15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal 15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent 16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/– rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/– 16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/– rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/– rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent 16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/– 17 17722402 
18030202 rs8082590 6.84E−09 rs4072739 4.74E−13 primary 0.92 DRG2 –/–/– 19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal 19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/– 19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/– 22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/– Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. Also notable are the occurrences of complex patterns of co-localization for some loci; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another.
title	eQTL Co-localization with SCZ GWAS
p	We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations. Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10⁻⁶ from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS,35 we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL.
p	To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc39 for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw,43 with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization.
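p	To make the five-hypothesis calculation concrete, the sketch below follows the general approach of the original coloc method: per-SNP approximate Bayes factors are combined with prior probabilities to give posterior probabilities for H0 through H4 at one locus. This is an illustration only and not the coloc2 implementation. In particular, the priors `p1`, `p2`, and `p12` are fixed here (set to coloc's customary defaults), whereas coloc2 as described above estimates dataset-wide probabilities by likelihood; the function names and the example Bayes factors are hypothetical.

```python
import math


def wakefield_abf(z, var_beta, prior_var):
    """Wakefield approximate Bayes factor in favor of association.

    z: association z score; var_beta: variance of the effect estimate;
    prior_var: assumed prior variance of the true effect size.
    """
    r = prior_var / (prior_var + var_beta)
    return math.sqrt(1.0 - r) * math.exp(0.5 * z * z * r)


def coloc_posteriors(abf_gwas, abf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PPH0, PPH1, PPH2, PPH3, PPH4] for a locus.

    abf_gwas and abf_eqtl hold per-SNP Bayes factors for the same SNPs
    in the same order. Assumes at most one causal variant per trait.
    """
    if len(abf_gwas) != len(abf_eqtl):
        raise ValueError("both traits must cover the same SNPs")
    s1 = sum(abf_gwas)
    s2 = sum(abf_eqtl)
    s12 = sum(a * b for a, b in zip(abf_gwas, abf_eqtl))
    weights = [
        1.0,                           # H0: no association
        p1 * s1,                       # H1: GWAS association only
        p2 * s2,                       # H2: eQTL association only
        p1 * p2 * (s1 * s2 - s12),     # H3: both, different variants
        p12 * s12,                     # H4: both, shared variant
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical Bayes factors for a three-SNP locus.
shared_peak = coloc_posteriors([1.0, 1000.0, 1.0], [1.0, 1000.0, 1.0])
distinct_peaks = coloc_posteriors([1000.0, 1.0, 1.0], [1.0, 1.0, 1000.0])
print(shared_peak[4], distinct_peaks[4])
```

p	When both traits peak at the same SNP, most of the posterior mass falls on H4; when the peaks sit on different SNPs, the H4 term shrinks because the product of Bayes factors is small at every SNP.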
p	For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2. In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis). We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B). Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. 
(B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures. Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled).
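p	The all-but-one strategy described above can be sketched as follows. For each focal independent eQTL, every other independent eQTL for the gene is included as a covariate, and each remaining SNP is tested in turn. This simplified illustration uses ordinary least squares in pure Python and returns effect estimates only; it omits the p values, expression covariates, and other adjustments a real eQTL pipeline would use. All function names, SNP labels, and data are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def conditional_effect(y, genotypes, test_snp, condition_on):
    """Effect of test_snp on y, adjusting for the SNPs in condition_on."""
    n = len(y)
    cols = [[1.0] * n]                              # intercept
    cols += [genotypes[s] for s in condition_on]    # conditioning SNPs
    cols.append(genotypes[test_snp])                # tested SNP, last
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)[-1]


def all_but_one(y, genotypes, independent_eqtl):
    """One conditional scan per focal eQTL, conditioning on all others."""
    results = {}
    for focal in independent_eqtl:
        others = [s for s in independent_eqtl if s != focal]
        results[focal] = {
            snp: conditional_effect(y, genotypes, snp, others)
            for snp in genotypes if snp not in others
        }
    return results


# Hypothetical genotype dosages and noise-free expression values.
genotypes = {"snpA": [0, 1, 2, 0, 1, 2, 0, 1],
             "snpB": [0, 0, 1, 1, 2, 2, 1, 0]}
expression = [2 * a + b for a, b in zip(genotypes["snpA"], genotypes["snpB"])]
print(all_but_one(expression, genotypes, ["snpA", "snpB"]))
```

p	Each scan in `results` plays the role of one set of isolated eQTL summary statistics, which would then be compared with the GWAS signal at the same locus.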
figure	Figure 3 All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. (B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures.
label	Figure 3
caption	All-but-One Conditional Analysis to Isolate Independent eQTL Signatures (A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL. (B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures.
p	All-but-One Conditional Analysis to Isolate Independent eQTL Signatures
p	(A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL.
p	(B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures.
figure	Figure 4 GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled).
label	Figure 4
caption	GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature (A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom). (B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled).
p	GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature
p	(A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom).
p	(B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled).
p	We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8). We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization.
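p	As a rough illustration of how a PPH4 value like those reported in Table 2 arises, the sketch below combines per-SNP log approximate Bayes factors for the GWAS and eQTL traits into posterior probabilities for the five coloc hypotheses (H0: no association; H1 and H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). The function names, the toy Bayes factors, and the fixed priors p1, p2, and p12 are assumptions made for illustration only; the coloc2 analysis described here estimates its priors from the data, so this is not the authors' pipeline.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf_gwas, labf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PPH0, ..., PPH4] from per-SNP log ABFs.

    p1, p2, p12 are illustrative fixed priors (an assumption of this sketch).
    """
    if not labf_gwas or len(labf_gwas) != len(labf_eqtl):
        raise ValueError("need two non-empty log-ABF lists of equal length")
    s1 = logsumexp(labf_gwas)
    s2 = logsumexp(labf_eqtl)
    # Evidence that the same SNP drives both traits.
    s12 = logsumexp([a + b for a, b in zip(labf_gwas, labf_eqtl)])
    log_h = [
        0.0,                                                     # H0
        math.log(p1) + s1,                                       # H1
        math.log(p2) + s2,                                       # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                     # H4
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy locus where both traits peak at the same SNP (index 1).
shared = coloc_posteriors([0.0, 12.0, 1.0, 0.0], [0.0, 10.0, 0.0, 1.0])
# Toy locus where the traits peak at different SNPs.
distinct = coloc_posteriors([0.0, 12.0, 0.0, 0.0], [0.0, 0.0, 10.0, 0.0])
print("shared peak   PPH4 = %.3f" % shared[4])
print("distinct peak PPH3 = %.3f, PPH4 = %.3f" % (distinct[3], distinct[4]))
```

p	In this toy setting the shared-peak locus would pass a PPH4 ≥ 0.8 cutoff, while the locus with separate peaks puts most of its posterior mass on H3 instead.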
table-wrap	Table 2 GWAS-eQTL Co-localized Loci Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period 1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal 1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/– 1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal 1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal 1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/– rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/– 1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult 2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/– 2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal 2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/– 2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult 2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/– rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal 2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/– 3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant 3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal 3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/– 3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/– 4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 
primary 0.97 CLCN3 –/–/– 5 45291475 46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult 6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/– 6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal 7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal 8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent 8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult 8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal 8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent 9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/– 11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal 12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent 14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/– 15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal 15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal 15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent 16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/– rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/– 16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/– rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/– rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent 16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/– 17 
17722402 18030202 rs8082590 6.84E−09 rs4072739 4.74E−13 primary 0.92 DRG2 –/–/– 19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal 19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/– 19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/– 22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/–
label	Table 2
caption	GWAS-eQTL Co-localized Loci
p	GWAS-eQTL Co-localized Loci
table	Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period 1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal 1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/– 1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal 1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal 1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/– rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/– 1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult 2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/– 2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal 2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/– 2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult 2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/– rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal 2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/– 3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant 3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal 3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/– 3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/– 4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 primary 0.97 CLCN3 –/–/– 5 45291475 
46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult 6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/– 6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal 7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal 8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent 8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult 8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal 8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent 9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/– 11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal 12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent 14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/– 15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal 15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal 15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent 16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/– rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/– 16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/– rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/– rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent 16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/– 17 17722402 18030202 rs8082590 6.84E−09 
rs4072739 4.74E−13 primary 0.92 DRG2 –/–/– 19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal 19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/– 19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/– 22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/–
tr	Chr GWAS Locus Start GWAS Locus End GWAS Lead SNP GWAS p Value eSNP eSNP p Value Primary/Conditional PPH4 Gene Relevant Tissue/Cell Type/Developmental Period
th	Chr
th	GWAS Locus Start
th	GWAS Locus End
th	GWAS Lead SNP
th	GWAS p Value
th	eSNP
th	eSNP p Value
th	Primary/Conditional
th	PPH4
th	Gene
th	Relevant Tissue/Cell Type/Developmental Period
tr	1 2372401 2402501 rs4648845 4.03E−09 rs12037821 4.9E−04 conditional 0.87 SLC35E2 –/–/early mid-prenatal
td	1
td	2372401
td	2402501
td	rs4648845
td	4.03E−09
td	rs12037821
td	4.9E−04
td	conditional
td	0.87
td	SLC35E2
td	–/–/early mid-prenatal
tr	1 8355697 8638984 rs301797 2.03E−09 rs138050288 1.8E−04 primary 0.95 RERE –/–/–
td	1
td	8355697
td	8638984
td	rs301797
td	2.03E−09
td	rs138050288
td	1.8E−04
td	primary
td	0.95
td	RERE
td	–/–/–
tr	1 30412551 30443951 rs1498232 1.28E−09 rs2015244 1.8E−08 primary 0.99 PTPRU –/neurons /early mid-prenatal
td	1
td	30412551
td	30443951
td	rs1498232
td	1.28E−09
td	rs2015244
td	1.8E−08
td	primary
td	0.99
td	PTPRU
td	–/neurons /early mid-prenatal
tr	1 163582923 163766623 rs7521492 5.64E−07 rs10799961 3.18E−11 primary 0.91 PBX1 –/–/early prenatal
td	1
td	163582923
td	163766623
td	rs7521492
td	5.64E−07
td	rs10799961
td	3.18E−11
td	primary
td	0.91
td	PBX1
td	–/–/early prenatal
tr	1 205015255 205189455 rs16937 8.69E−07 rs12724651 7.31E−07 primary 0.89 TMEM81 –/neurons/–
td	1
td	205015255
td	205189455
td	rs16937
td	8.69E−07
td	rs12724651
td	7.31E−07
td	primary
td	0.89
td	TMEM81
td	–/neurons/–
tr	rs12031350 8.15E−06 conditional 0.87 RBBP5 –/–/–
td	rs12031350
td	8.15E−06
td	conditional
td	0.87
td	RBBP5
td	–/–/–
tr	1 214137889 214163689 rs7529073 9.69E−07 rs1431983 1.67E−04 conditional 0.93 PROX1-AS1 cerebellar hemisphere/neurons/adult
td	1
td	214137889
td	214163689
td	rs7529073
td	9.69E−07
td	rs1431983
td	1.67E−04
td	conditional
td	0.93
td	PROX1-AS1
td	cerebellar hemisphere/neurons/adult
tr	2 73194203 73900439 rs56145559 8.42E−08 rs11679809 1.85E−34 primary 0.86 ALMS1P testis/–/–
td	2
td	73194203
td	73900439
td	rs56145559
td	8.42E−08
td	rs11679809
td	1.85E−34
td	primary
td	0.86
td	ALMS1P
td	testis/–/–
tr	2 110262036 110398236 rs9330316 7.69E−08 rs892464 2.35E−26 primary 0.92 SEPT10 –/–/late prenatal
td	2
td	110262036
td	110398236
td	rs9330316
td	7.69E−08
td	rs892464
td	2.35E−26
td	primary
td	0.92
td	SEPT10
td	–/–/late prenatal
tr	2 198148577 198835577 rs6434928 1.48E−11 rs12621129 6.06E−12 primary 0.94 SF3B1 –/–/–
td	2
td	198148577
td	198835577
td	rs6434928
td	1.48E−11
td	rs12621129
td	6.06E−12
td	primary
td	0.94
td	SF3B1
td	–/–/–
tr	2 200715237 201247789 rs281768 1.78E−14 rs35220450 3.46E−14 primary 0.95 FTCDNL1, AC073043.2 –/–/adult
td	2
td	200715237
td	201247789
td	rs281768
td	1.78E−14
td	rs35220450
td	3.46E−14
td	primary
td	0.95
td	FTCDNL1, AC073043.2
td	–/–/adult
tr	rs186546506 8.77E−04 conditional 0.83 LINC01792, AC007163.3 putamen (basal ganglia)/ –/adult
td	rs186546506
td	8.77E−04
td	conditional
td	0.83
td	LINC01792, AC007163.3
td	putamen (basal ganglia)/ –/adult
tr	2 208371631 208531731 rs2709410 5.75E−07 rs34171849 5.86E−17 primary 0.88 METTL21A –/–/–
td	2
td	208371631
td	208531731
td	rs2709410
td	5.75E−07
td	rs34171849
td	5.86E−17
td	primary
td	0.88
td	METTL21A
td	–/–/–
tr	rs2551656 2.85E−09 primary 0.86 CREB1 –/–/early prenatal
td	rs2551656
td	2.85E−09
td	primary
td	0.86
td	CREB1
td	–/–/early prenatal
tr	2 220033801 220071601 rs6707588 9.51E−07 rs13404754 1.08E−09 primary 0.92 CNPPD1 –/–/–
td	2
td	220033801
td	220071601
td	rs6707588
td	9.51E−07
td	rs13404754
td	1.08E−09
td	primary
td	0.92
td	CNPPD1
td	–/–/–
tr	3 36843183 36945783 rs75968099 3.39E−12 rs9834970 1.88E−05 primary 0.94 DCLK3 nerve - tibial /neurons/infant
td	3
td	36843183
td	36945783
td	rs75968099
td	3.39E−12
td	rs9834970
td	1.88E−05
td	primary
td	0.94
td	DCLK3
td	nerve - tibial /neurons/infant
tr	3 52281078 53539269 rs2535627 3.96E−11 rs6801235 2.81E−08 conditional 0.86 PPM1M –/neurons/late prenatal
td	3
td	52281078
td	53539269
td	rs2535627
td	3.96E−11
td	rs6801235
td	2.81E−08
td	conditional
td	0.86
td	PPM1M
td	–/neurons/late prenatal
tr	3 63792650 64004050 rs832187 2.58E−08 rs113386200 1.95E−12 primary 0.98 THOC7 –/–/–
td	3
td	63792650
td	64004050
td	rs832187
td	2.58E−08
td	rs113386200
td	1.95E−12
td	primary
td	0.98
td	THOC7
td	–/–/–
tr	3 135807405 136615405 rs7432375 5.27E−11 rs10935184 7.71E−25 primary 0.93 PCCB –/–/–
td	3
td	135807405
td	136615405
td	rs7432375
td	5.27E−11
td	rs10935184
td	7.71E−25
td	primary
td	0.93
td	PCCB
td	–/–/–
tr	4 170357552 170646052 rs10520163 1.02E−08 rs7438 1.02E−09 primary 0.97 CLCN3 –/–/–
td	4
td	170357552
td	170646052
td	rs10520163
td	1.02E−08
td	rs7438
td	1.02E−09
td	primary
td	0.97
td	CLCN3
td	–/–/–
tr	5 45291475 46404116 rs1501357 1.24E−08 rs9292918 4.45E−05 primary 0.94 BRCAT54, RP11-53O19.1 –/–/adult
td	5
td	45291475
td	46404116
td	rs1501357
td	1.24E−08
td	rs9292918
td	4.45E−05
td	primary
td	0.94
td	BRCAT54, RP11-53O19.1
td	–/–/adult
tr	6 83779798 84407274 rs3798869 8.57E−10 rs2016358 1.19E−09 primary 0.90 SNAP91 cerebellar hemisphere/–/–
td	6
td	83779798
td	84407274
td	rs3798869
td	8.57E−10
td	rs2016358
td	1.19E−09
td	primary
td	0.90
td	SNAP91
td	cerebellar hemisphere/–/–
tr	6 108875527 109019327 rs9398171 3.37E−08 rs111727905 3.84E−06 primary 0.97 ZNF259P1 –/–/early mid-prenatal
td	6
td	108875527
td	109019327
td	rs9398171
td	3.37E−08
td	rs111727905
td	3.84E−06
td	primary
td	0.97
td	ZNF259P1
td	–/–/early mid-prenatal
tr	7 21485312 21545712 rs73060317 6.60E−07 rs141984481 3.59E−05 primary 0.92 SP4 –/–/early prenatal
td	7
td	21485312
td	21545712
td	rs73060317
td	6.60E−07
td	rs141984481
td	3.59E−05
td	primary
td	0.92
td	SP4
td	–/–/early prenatal
tr	8 8088038 10056127 rs2945232 2.03E−08 rs2980441 7.68E−69 primary 0.82 FAM86B3P –/–/adolescent
td	8
td	8088038
td	10056127
td	rs2945232
td	2.03E−08
td	rs2980441
td	7.68E−69
td	primary
td	0.82
td	FAM86B3P
td	–/–/adolescent
tr	8 26181524 26279124 rs1042992 2.27E−07 rs17055186 3.06E−24 conditional 0.91 SDAD1P1 testis/–/adult
td	8
td	26181524
td	26279124
td	rs1042992
td	2.27E−07
td	rs17055186
td	3.06E−24
td	conditional
td	0.91
td	SDAD1P1
td	testis/–/adult
tr	8 38020424 38310924 rs57709857 2.32E−07 rs201999919 1.70E−07 primary 0.88 WHSC1L1 –/–/early prenatal
td	8
td	38020424
td	38310924
td	rs57709857
td	2.32E−07
td	rs201999919
td	1.70E−07
td	primary
td	0.88
td	WHSC1L1
td	–/–/early prenatal
tr	8 144822546 144871746 rs11784536 1.83E−07 rs12541792 6.45E−35 primary 0.90 FAM83H esophagus - mucosa/oligodendrocytes/adolescent
td	8
td	144822546
td	144871746
td	rs11784536
td	1.83E−07
td	rs12541792
td	6.45E−35
td	primary
td	0.90
td	FAM83H
td	esophagus - mucosa/oligodendrocytes/adolescent
tr	9 26839508 26909408 rs10967586 4.75E−07 rs12345197 3.90E−06 primary 0.80 IFT74 –/–/–
td	9
td	26839508
td	26909408
td	rs10967586
td	4.75E−07
td	rs12345197
td	3.90E−06
td	primary
td	0.80
td	IFT74
td	–/–/–
tr	11 46340213 46751213 rs7951870 1.97E−11 rs16938506 5.08E−05 primary 0.88 MDK –/–/early mid-prenatal
td	11
td	46340213
td	46751213
td	rs7951870
td	1.97E−11
td	rs16938506
td	5.08E−05
td	primary
td	0.88
td	MDK
td	–/–/early mid-prenatal
tr	12 57428314 57497814 rs324017 2.13E−07 rs4559 2.02E−05 conditional 0.91 STAT6 –/microglia/adolescent
td	12
td	57428314
td	57497814
td	rs324017
td	2.13E−07
td	rs4559
td	2.02E−05
td	conditional
td	0.91
td	STAT6
td	–/microglia/adolescent
tr	14 35421614 35847614 rs77477310 1.52E−07 rs1028449 8.09E−04 primary 0.84 RP11-85K15.2 –/–/–
td	14
td	35421614
td	35847614
td	rs77477310
td	1.52E−07
td	rs1028449
td	8.09E−04
td	primary
td	0.84
td	RP11-85K15.2
td	–/–/–
tr	15 78803032 78926732 rs8042374 1.87E−12 rs7171869 1.44E−04 conditional 0.94 IREB2 –/–/early prenatal
td	15
td	78803032
td	78926732
td	rs8042374
td	1.87E−12
td	rs7171869
td	1.44E−04
td	conditional
td	0.94
td	IREB2
td	–/–/early prenatal
tr	15 84661161 85153461 rs950169 7.62E−11 rs35677834 1.54E−34 primary 0.80 LOC101929479, RP11-561C5.3 ovary/–/early mid-prenatal
td	15
td	84661161
td	85153461
td	rs950169
td	7.62E−11
td	rs35677834
td	1.54E−34
td	primary
td	0.80
td	LOC101929479, RP11-561C5.3
td	ovary/–/early mid-prenatal
tr	15 91416560 91436560 rs4702 2.30E−12 rs4702 4.49E−13 primary 1.00 FURIN –/endothelial cells/adolescent
td	15
td	91416560
td	91436560
td	rs4702
td	2.30E−12
td	rs4702
td	4.49E−13
td	primary
td	1.00
td	FURIN
td	–/endothelial cells/adolescent
tr	16 4447751 4596451 rs6500602 2.79E−07 rs3747580 4.75E−16 primary 0.90 CORO7 –/–/–
td	16
td	4447751
td	4596451
td	rs6500602
td	2.79E−07
td	rs3747580
td	4.75E−16
td	primary
td	0.90
td	CORO7
td	–/–/–
tr	rs8046295 2.68E−11 primary 0.89 NMRAL1 –/–/–
td	rs8046295
td	2.68E−11
td	primary
td	0.89
td	NMRAL1
td	–/–/–
tr	16 29924377 30144877 rs12691307 1.30E−10 rs4788203 1.95E−05 primary 0.88 TMEM219 –/–/–
td	16
td	29924377
td	30144877
td	rs12691307
td	1.30E−10
td	rs4788203
td	1.95E−05
td	primary
td	0.88
td	TMEM219
td	–/–/–
tr	rs3935873 7.46E−14 primary 0.87 INO80E –/neurons/–
td	rs3935873
td	7.46E−14
td	primary
td	0.87
td	INO80E
td	–/neurons/–
tr	rs4787491 1.60E−04 conditional 0.82 DOC2A brain - cortex/neurons/adolescent
td	rs4787491
td	1.60E−04
td	conditional
td	0.82
td	DOC2A
td	brain - cortex/neurons/adolescent
tr	16 58669293 58691393 rs12325245 1.15E−08 rs11647976 4.83E−04 primary 0.94 CNOT1 –/–/–
td	16
td	58669293
td	58691393
td	rs12325245
td	1.15E−08
td	rs11647976
td	4.83E−04
td	primary
td	0.94
td	CNOT1
td	–/–/–
tr	17 17722402 18030202 rs8082590 6.84E−09 rs4072739 4.74E−13 primary 0.92 DRG2 –/–/–
td	17
td	17722402
td	18030202
td	rs8082590
td	6.84E−09
td	rs4072739
td	4.74E−13
td	primary
td	0.92
td	DRG2
td	–/–/–
tr	19 11839736 11859736 rs72986630 4.64E−08 rs72986630 2.20E−14 primary 1.00 ZNF823 –/endothelial cells/early prenatal
td	19
td	11839736
td	11859736
td	rs72986630
td	4.64E−08
td	rs72986630
td	2.20E−14
td	primary
td	1.00
td	ZNF823
td	–/endothelial cells/early prenatal
tr	19 19374022 19658022 rs2905426 6.92E−09 rs2965199 9.22E−36 primary 0.87 GATAD2A –/–/–
td	19
td	19374022
td	19658022
td	rs2905426
td	6.92E−09
td	rs2965199
td	9.22E−36
td	primary
td	0.87
td	GATAD2A
td	–/–/–
tr	19 50067499 50135399 rs56873913 2.19E−07 rs5023763 9.32E−05 primary 0.93 SNRNP70 –/–/–
td	19
td	50067499
td	50135399
td	rs56873913
td	2.19E−07
td	rs5023763
td	9.32E−05
td	primary
td	0.93
td	SNRNP70
td	–/–/–
tr	22 41408556 42689414 rs9607782 6.76E−12 rs200447424 1.87E−04 primary 0.96 RANGAP1 –/–/–
td	22
td	41408556
td	42689414
td	rs9607782
td	6.76E−12
td	rs200447424
td	1.87E−04
td	primary
td	0.96
td	RANGAP1
td	–/–/–
p	Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS signal, with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. Some loci also showed complex patterns of co-localization; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another.
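p	To make the distinction between a marginal and a conditional eQTL signal concrete, the following minimal sketch simulates a gene with a strong primary eSNP and a weaker secondary eSNP in partial LD with it, then estimates the secondary effect with and without adjusting for the primary eSNP. Everything here (the simulated effect sizes, allele frequencies, LD level, and the function names) is hypothetical, and the adjustment is done by simple residualization rather than by the eQTL software used in the study.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residualize(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    b = ols_slope(x, y)
    mx, my = _mean(x), _mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def conditional_slope(expr, snp, primary):
    """Effect of snp on expr after adjusting both for the primary eSNP.

    By the Frisch-Waugh-Lovell theorem this equals the coefficient of snp
    in the joint regression expr ~ snp + primary.
    """
    return ols_slope(residualize(snp, primary), residualize(expr, primary))


def simulate(n=2000, seed=0):
    """Hypothetical data: primary effect 1.0, secondary effect 0.3."""
    rng = random.Random(seed)
    primary, secondary, expr = [], [], []
    for _ in range(n):
        g1 = g2 = 0
        for _hap in range(2):
            a1 = 1 if rng.random() < 0.3 else 0
            # Copy the primary allele half of the time to induce LD.
            if rng.random() < 0.5:
                a2 = a1
            else:
                a2 = 1 if rng.random() < 0.3 else 0
            g1 += a1
            g2 += a2
        primary.append(g1)
        secondary.append(g2)
        expr.append(1.0 * g1 + 0.3 * g2 + rng.gauss(0.0, 1.0))
    return primary, secondary, expr


primary, secondary, expr = simulate()
marginal = ols_slope(secondary, expr)
conditional = conditional_slope(expr, secondary, primary)
print("marginal effect of secondary eSNP:    %.2f" % marginal)
print("conditional effect of secondary eSNP: %.2f" % conditional)
```

p	In this simulation the marginal estimate for the secondary eSNP is inflated by its LD with the primary eSNP, whereas the conditional estimate isolates the independent signal, which is the quantity compared against the GWAS signal in a conditional co-localization test.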
sec	Comparison with Previous Co-localization Analyses In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2). Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10⁻⁷) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r² ≈ 0.2–0.4). Additionally, our attempts to disentangle independent eQTL signal via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks. We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10). 
Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology.
title	Comparison with Previous Co-localization Analyses
p	In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2). Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10⁻⁷) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r² ≈ 0.2–0.4). Additionally, our attempts to disentangle independent eQTL signal via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks.
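p	The LD argument above rests on the squared correlation between the two index SNPs. As a minimal sketch, r² can be approximated from unphased genotype dosages (0, 1, or 2 copies of an allele) as the squared Pearson correlation; the function name and the toy genotype vectors below are hypothetical, and a haplotype-based estimate from a reference panel may differ somewhat from this genotype-based one.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    if len(geno_a) != len(geno_b) or len(geno_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    saa = sum((a - mean_a) ** 2 for a in geno_a)
    sbb = sum((b - mean_b) ** 2 for b in geno_b)
    if saa == 0 or sbb == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    return (sab * sab) / (saa * sbb)


# Toy dosages for six individuals (hypothetical data).
gwas_index = [0, 0, 1, 1, 2, 2]
eqtl_index = [0, 1, 1, 1, 2, 1]
print("r2 = %.2f" % ld_r2(gwas_index, eqtl_index))
```

p	Two index SNPs that tag the same causal variant would be expected to show r² close to 1, so values in the 0.2 to 0.4 range are more consistent with distinct signals that merely lie close together.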
p	We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10). Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology.
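p	The WABFs mentioned above are Wakefield approximate Bayes factors, which summarize per-SNP evidence of association from an effect estimate and its standard error. The sketch below shows the standard form of that calculation; the function name and the prior standard deviation of 0.15 are assumptions chosen for illustration, not values confirmed for this analysis. Because the Bayes factor itself does not involve the co-localization prior, the same WABFs can yield a somewhat different PPH4 when a different prior probability of co-localization is estimated.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor in favour of association.

    beta: estimated effect size; se: its standard error.
    prior_sd: assumed standard deviation of the effect-size prior
    (illustrative default, an assumption of this sketch).
    """
    if se <= 0 or prior_sd <= 0:
        raise ValueError("se and prior_sd must be positive")
    v = se ** 2            # sampling variance of the estimate
    w = prior_sd ** 2      # prior variance of the true effect
    r = w / (v + w)        # shrinkage factor
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


# Hypothetical SNPs: one with no effect and one with a strong effect.
print("null SNP   log ABF = %.2f" % wakefield_log_abf(0.0, 0.05))
print("strong SNP log ABF = %.2f" % wakefield_log_abf(0.5, 0.05))
```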
title	Discussion
p	We used genotype and expression data from 467 human post-mortem DLPFC samples to map eQTL and to characterize both primary and conditional eQTL. We then identified co-localization between SCZ GWAS and eQTL association signals, comprising both primary and conditional eQTL. Our principal findings comprise four major observations. First, conditional eQTL are widespread in the brain tissue samples we investigated. In 63% of genes with at least one eQTL, we found multiple statistically independent eQTL (representing 8,136 genes). In addition, conditional eQTL make substantial contributions to regulatory genetic variation, as there is a strong association between eQTL number and gene expression cis-SNP-heritability. This demonstrates that genetic variation affecting RNA abundance is incompletely characterized when only one primary eQTL per gene is considered, as is currently the case in most eQTL studies.
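p	The idea of dissecting a gene's eQTL signal into a primary signal and successive conditional signals can be illustrated with a generic forward stepwise regression. The sketch below is an illustration under stated assumptions, not the pipeline used in this study: the function names, the fixed t-statistic cutoff `t_threshold`, and the cap `max_signals` are hypothetical, and covariate adjustment and multiple-testing correction are omitted.

```python
import math


def center(values):
    """Subtract the mean so that regressions need no intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(values, on):
    """Remove from `values` its projection onto the centered vector `on`."""
    denom = dot(on, on)
    if denom == 0.0:
        return list(values)
    beta = dot(values, on) / denom
    return [v - beta * z for v, z in zip(values, on)]


def stepwise_eqtl(expr, genotypes, t_threshold=5.0, max_signals=5):
    """Forward stepwise selection of conditionally independent eSNPs for one gene.

    expr:      expression values, one per sample.
    genotypes: dict mapping SNP id -> allele dosages, one per sample.
    Returns SNP ids in discovery order (primary, secondary, tertiary, ...).
    The cutoff `t_threshold` is an illustrative assumption, not a study value.
    """
    n = len(expr)
    y = center(expr)
    g = {snp: center(dosage) for snp, dosage in genotypes.items()}
    selected = []
    while len(selected) < max_signals and g:
        # Degrees of freedom shrink by one for each SNP already conditioned on.
        df = n - 2 - len(selected)
        best_snp, best_t = None, 0.0
        syy = dot(y, y)
        for snp, x in g.items():
            sxx = dot(x, x)
            if sxx < 1e-9 or syy < 1e-9:
                continue  # SNP fully explained by earlier signals (e.g. perfect LD)
            r = dot(x, y) / math.sqrt(sxx * syy)
            r = max(-1.0, min(1.0, r))
            t = abs(r) * math.sqrt(df / max(1e-12, 1.0 - r * r))
            if t > best_t:
                best_snp, best_t = snp, t
        if best_snp is None or best_t < t_threshold:
            break
        selected.append(best_snp)
        lead = g.pop(best_snp)
        # Condition on the new lead SNP: residualize expression and remaining SNPs.
        y = residualize(y, lead)
        g = {snp: residualize(x, lead) for snp, x in g.items()}
    return selected
```

p	Residualizing both the expression vector and the remaining genotypes on each selected SNP is equivalent to adding that SNP as a covariate in the next round, so a SNP in perfect linkage disequilibrium with an earlier signal drops out rather than being reported as a separate eQTL.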
p	Second, we find that the genomic features of conditional eQTL and their genes are consistent with complex, context-specific regulation of gene expression, which may be conferred through overlap with distal regulatory elements. Genes with more independent eQTL tend to be larger, to span multiple recombination hotspot intervals, and to be less constrained at the protein level. While these associations may reflect in part greater power to detect independent eQTL that are not in linkage disequilibrium and explain more phenotypic variance, they are also consistent with more complex regulation and greater potential for regulatory genetic variation. Context-specific genetic regulation of expression could manifest as conditional eQTL signal in the analysis of expression from a heterogeneous source. For example, eQTL in naive and stimulated (LPS, IFN) monocytes55 may occur as either primary or conditional eQTL in our CMC data, because related microglial cells are present in brain tissue homogenate. We found that 60 stimulation-specific eQTL (FDR < 0.01 in interferon- or lipopolysaccharide-stimulated monocytes, but FDR ≥ 0.05 in naive monocytes) were also conditional eQTL in DLPFC. Notably, rs7171787, a conditional (tertiary) eQTL in our DLPFC analysis, is a stimulation-specific monocyte eQTL for the neurodevelopmental56, 57, 58 gene CYFIP1. In our data, associations with specificity of expression across tissues, developmental periods, and cell types determined from single-cell RNA-sequencing data suggest that context specificity plays a role in the occurrence of multiple statistically independent eQTL. Cell type specificity is particularly strongly correlated with eQTL number, consistent with those cell types being present in the current tissue homogenate data. Since previous studies have shown the importance of developmental59, 60, 61, 62 or cell-specific contributions61, 63, 64, 65, 66 to schizophrenia, interrogation of independent eQTL effects may elucidate developmental or tissue-specific effects obscured in whole-tissue eQTL studies.
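p	The stimulation-specificity criterion quoted above (FDR < 0.01 in a stimulated condition, FDR ≥ 0.05 in naive monocytes) amounts to a simple filter followed by an intersection with the set of conditional eQTL. The sketch below only illustrates that logic; the record layout, field names, and function names are hypothetical, and the identifiers in the usage example are placeholders rather than study data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class MonocyteEqtl:
    """One SNP-gene pair with an FDR per monocyte condition (hypothetical layout)."""
    snp: str
    gene: str
    fdr_naive: float
    fdr_lps: float
    fdr_ifn: float


def is_stimulation_specific(e: MonocyteEqtl,
                            stim_fdr: float = 0.01,
                            naive_fdr: float = 0.05) -> bool:
    """Significant in at least one stimulated condition, not significant when naive."""
    return min(e.fdr_lps, e.fdr_ifn) < stim_fdr and e.fdr_naive >= naive_fdr


def stimulation_specific_conditional(
        monocyte_eqtl: Iterable[MonocyteEqtl],
        conditional_pairs: Set[Tuple[str, str]]) -> List[MonocyteEqtl]:
    """Keep stimulation-specific monocyte eQTL that are also conditional eQTL."""
    return [e for e in monocyte_eqtl
            if is_stimulation_specific(e) and (e.snp, e.gene) in conditional_pairs]
```

p	Matching on the (SNP, gene) pair rather than on the SNP alone avoids counting a variant that is a conditional eQTL for a different gene.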
p	This context specificity of expression regulation is potentially mediated through overlap of eSNPs with distal regulatory elements, such as enhancers. Conditional eQTL occur farther from transcription start sites than primary eQTL, consistent with effects on enhancers. In addition, while both primary and conditional eQTL are enriched in both active promoter and enhancer regions, their enrichment in active promoters diminishes with increasing conditional eQTL order. In other words, conditional eQTL show greater enrichment in enhancers relative to promoters than do primary eQTL.
p	Third, we have identified a number of candidate genes for which genetic variation for expression co-localizes with genetic variation for schizophrenia risk (Table 2), including cases of co-localization with conditional eQTL. Genetic co-localization is expected if gene expression causally mediates disease risk, although we recognize that co-localization could also result from pleiotropy or linkage, particularly in regions of extensive linkage disequilibrium and haplotype structure.40, 67 We also note that several co-localization methods have recently been developed,37, 38, 40, 41, 42 and direct comparisons have found broad concordance among these methods and a high degree of specificity of positive results using coloc.42, 45, 46 However, results would likely differ somewhat with alternative co-localization methods.
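p	For readers unfamiliar with how a co-localization posterior probability is obtained, the sketch below follows the general approximate-Bayes-factor scheme used by coloc-style methods: a per-SNP Wakefield approximate Bayes factor for each trait, combined into posteriors for five hypotheses (H0: no association; H1: GWAS only; H2: eQTL only; H3: two distinct causal variants; H4: one shared causal variant). It is a simplified illustration that assumes at most one causal variant per trait in the region, and it is not the coloc2 implementation used in this study. The function names are hypothetical, and the prior probabilities and prior effect-size standard deviations are assumed values set to commonly used coloc defaults.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_sum(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    d = -math.expm1(b - a)  # equals 1 - exp(b - a)
    if d <= 0.0:
        return float("-inf")
    return a + math.log(d)


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield log approximate Bayes factor for a single SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(gwas: List[Tuple[float, float]],
                     eqtl: List[Tuple[float, float]],
                     p1: float = 1e-4, p2: float = 1e-4, p12: float = 1e-5,
                     sd_gwas: float = 0.2, sd_eqtl: float = 0.15) -> Dict[str, float]:
    """Posterior probabilities of H0..H4 from per-SNP (beta, se) pairs.

    `gwas` and `eqtl` must list the same SNPs in the same order.
    All prior values are assumptions, not parameters reported in the text.
    """
    if len(gwas) != len(eqtl):
        raise ValueError("GWAS and eQTL inputs must cover the same SNPs")
    l1 = [log_abf(b, s, sd_gwas) for b, s in gwas]
    l2 = [log_abf(b, s, sd_eqtl) for b, s in eqtl]
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = log_sum(l1), log_sum(l2), log_sum(l12)
    log_h = [
        0.0,                                                   # H0
        math.log(p1) + s1,                                     # H1
        math.log(p2) + s2,                                     # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                   # H4
    ]
    total = log_sum(log_h)
    return {f"H{i}": math.exp(l - total) for i, l in enumerate(log_h)}
```

p	Under this scheme, a locus would be called co-localized when the posterior for H4 exceeds a chosen cutoff, such as the 0.8 threshold used in this study. To evaluate a conditional eQTL signal, the eQTL input would be the association statistics obtained after conditioning on the gene's other independent eQTL.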
p	Our analyses prioritize 27 genes within 24 genome-wide significant (GWAS p < 5 × 10⁻⁸) SCZ loci and 19 genes in 17 suggestive (p < 1 × 10⁻⁶) loci. In addition to a number of previously implicated SCZ risk genes, our findings include several genes not previously considered as candidates,35 in some cases (e.g., SLC35E2, PTPRU [MIM: 602454], LINC01792, DCLK3, PPM1M, LOC101929479) because the genes themselves do not overlap the GWAS locus regions but their eQTL do. In examining these genes for expression specificity in GTEx tissues, brain sample cell types from single-cell RNA-seq,29 and BrainSpan DLPFC developmental periods (Tables 2 and S8), we find that their expression contexts show a diversity of patterns and can provide clues for generating specific hypotheses for functional follow-up of their potential roles in SCZ. Interestingly, genes broadly expressed across cell types tend to show prenatal expression.
p	Fourth, we highlight the importance of examining conditional eQTL for co-localization with GWASs. In at least 6 of the 40 loci showing GWAS-eQTL co-localization, a conditional eQTL signal co-localizes with SCZ risk. This is likely to be a conservative estimate, as the smaller effect sizes of conditional eQTL result in bias against detection of conditional GWAS-eQTL co-localization. If we had considered only primary eQTL in the analyses, these instances of co-localization would not have been identified. Among our highlighted conditional eQTL-GWAS co-localized genes are IREB2, STAT6, and PROX1-AS1. IREB2 (iron regulatory element binding protein 2) is a key regulator of iron homeostasis68, 69 that has been previously implicated in neurodegenerative disorders.70, 71 Mouse IREB2 homolog Irp2 knockouts exhibit impairments in coordination and balance, exploration, and nociception.69 The immune-related transcription factor STAT6 induces interleukin 4 (IL-4)-mediated anti-apoptotic activity of T helper cells, and the locus is associated with migraine72, 73 and brain glioma74 as well as several immune/inflammatory diseases.75, 76, 77 STAT6 also activates neuronal progenitor/stem cells and neurogenesis,78 making it intriguing as an immune-related SCZ candidate given recent observations about the role of the complement factor 4 (C4) gene as a SCZ risk gene79 and prior work potentially implicating microglia.80 Consistent with a role in immune-mediated synaptic pruning, STAT6 expression is broadly postnatal and shows specificity for microglia (Table S8). PROX1-AS1 encodes a lncRNA that has been implicated as aberrantly expressed in several cancers, is upregulated in the cell cycle S-phase, and promotes G1/S transition in cell culture.81 As a potential regulator of the Prospero Homeobox 1 (PROX1) transcription factor, it could be involved in development and cell differentiation in several tissues, including oligodendrocytes82 and GABAergic interneurons83 in the brain. PROX1-AS1 expression is specific to neurons and mature oligodendrocytes and occurs postnatally (Table S8).
p	In conclusion, we find that conditional eQTL are widespread and are consistent with complex and context-specific regulation. Accounting for conditional eQTL leads to new findings of GWAS-eQTL co-localization and generates specific hypotheses for the role of gene expression regulation in disease etiology. The analytical strategy presented here could be used to identify putatively causal genes for any phenotype for which GWAS summary statistics, together with expression and genotype data from the phenotype-relevant tissue, are available. Conditional eQTL that co-localize with disease risk may reflect regulatory mechanisms that are important in a key developmental period or individual cell type and may be missed when focusing on primary eQTL discovered in adult whole tissue. As further efforts are made to generate data across ranges of tissues or individual cell types, we may be better able to directly identify regulatory variants specific to these contexts. However, if a variant is primarily active at a very specific time point or under a specific stimulus condition, capturing data reflecting this condition will remain challenging. Conditional co-localization analysis in well-powered eQTL cohorts may best identify the genes driving these trait associations, though further validation work will be required to understand the mechanism by which each gene contributes to disease risk.
title	Consortia
p	CMC leadership: Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffmann-La Roche Ltd), Lara Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner, and Barbara Lipska (NIMH). Additional members of CMC: A. Ercument Cicek, Cong Lu, Kathryn Roeder, Lu Xie (Carnegie Mellon Univ.); Konrad Talbot (Cedars-Sinai Medical Center); Scott E. Hemby (High Point Univ.); Laurent Essioux (Hoffmann-La Roche); Andrew Browne, Andrew Chess, Aaron Topol, Alexander Charney, Amanda Dobbyn, Ben Readhead, Bin Zhang, Dalila Pinto, David A. Bennett, David H. Kavanagh, Douglas M. Ruderfer, Eli A. Stahl, Eric E. Schadt, Gabriel E. Hoffman, Hardik R. Shah, Jun Zhu, Jessica S. Johnson, John F. Fullard, Joel T. Dudley, Kiran Girdhar, Kristen J. Brennand, Laura G. Sloofman, Laura M. Huckins, Menachem Fromer, Milind C. Mahajan, Panos Roussos, Schahram Akbarian, Shaun M. Purcell, Tymor Hamamsy, Towfique Raj, Vahram Haroutunian, Ying-Chih Wang, Zeynep H. Gümüş (Mount Sinai School of Med.); Geetha Senthil, Robin Kramer (NIMH); Benjamin A. Logsdon, Jonathan M.J. Derry, Kristen K. Dang, Solveig K. Sieberts, Thanneer M. Perumal (Sage Bionetworks); Roberto Visintainer (Univ. Trento, Italy); Leslie A. Shinobu (Takeda); Patrick F. Sullivan (Univ. North Carolina); and Lambertus L. Klei (Univ. Pittsburgh School of Med.).
title	Web Resources
p	AMP-AD Knowledge Portal, https://www.synapse.org/ampad
p	BrainSpan – Atlas of the Developing Human Brain, http://www.brainspan.org/
p	CommonMind Consortium data, https://www.synapse.org/CMC
p	CommonMind Consortium ChIP-seq data, https://www.synapse.org/#!Synapse:syn8040458
p	coloc2, https://github.com/Stahl-Lab-MSSM/coloc2
p	dbGaP (accession number phs000979), http://ncbi.nlm.nih.gov/gap
p	ExAC Functional Gene Constraint, http://exac.broadinstitute.org/downloads
p	GCTA, http://cnsgenomics.com/software/gcta/
p	GemTools, http://wpicr.wpic.pitt.edu/WPICCompgen/GemTools/GemTools.htm
p	GEO (accession number GSE67835), https://www.ncbi.nlm.nih.gov/geo/
p	GTEx Portal, https://www.gtexportal.org/home/
p	HBCC microarray cohort, dbGaP (ID: phs000979.v1.p1), https://www.ncbi.nlm.nih.gov/gap
p	LDetect LD blocks, https://bitbucket.org/nygcresearch/ldetect-data/overview
p	NIH Roadmap Epigenomics Project chromatin state learning, http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state
p	OMIM, http://www.omim.org/
p	qvalue, http://bioconductor.org/packages/release/bioc/html/qvalue.html
p	R statistical software, https://www.r-project.org/
p	SNPsnap, https://data.broadinstitute.org/mpg/snpsnap/
p	SVA: Surrogate Variable Analysis, R package version 3.24.4, http://bioconductor.org/packages/release/bioc/html/sva.html
p	variancePartition, http://bioconductor.org/packages/release/bioc/html/variancePartition.html
title	Supplemental Data
caption	Document S1. Figures S1–S18
title	Document S1. Figures S1–S18
caption	Tables S1–S12. Additional Data
title	Tables S1–S12. Additional Data
caption	Document S2. Article plus Supplemental Data
title	Document S2. Article plus Supplemental Data
title	Acknowledgments
p	Dedicated to the memory of Pamela Sklar, MD, PhD.
p	Data were generated as part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited, F. Hoffmann-La Roche Ltd and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881, R37MH057881S1, HHSN271201300031C, AG02219, AG05138, and MH06692. Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer Disease Core Center, and the University of Pittsburgh NeuroBioBank and Brain and Tissue Repositories. Data from the NIMH Human Brain Collection Core were generated as part of the NIMH Human Brain Collection Core (NIH NCT00001260, 999917073).
p	ROSMAP study data were provided by the Rush Alzheimer Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 09/05/16. BrainSpan: Atlas of the Developing Human Brain. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.
footnote	Supplemental Data include 18 figures and 12 tables and can be found with this article online at https://doi.org/10.1016/j.ajhg.2018.04.011.
