Subjects and Methods Sample Description The British Household Panel Survey (BHPS) began in 1991, and in 2010 it was incorporated into the larger UKHLS29 (also known as Understanding Society), which is a longitudinal panel survey of 40,000 UK households from England, Scotland, Wales, and Northern Ireland. Since 1991, annual interviews have collected sociodemographic information, and in 2011–2012, biomedical measures and blood samples for BHPS participants were collected during a nurse visit in the participant’s home. Respondents were eligible to give a blood sample if they had taken part in the previous main interview in English; were 16 or older; lived in England, Wales, or Scotland; were not pregnant; and met other conditions detailed in the user guide.30 For each participant, non-fasting blood samples were collected through venipuncture; these were subsequently centrifuged so that plasma and serum were separated, and samples were aliquoted and frozen at −80°C. DNA has been extracted and stored for genetic and epigenetic analyses. Genome-wide Quantification of DNAm DNAm was profiled in DNA extracted from whole blood for 1,193 individuals who were aged from 28 to 98; who were eligible for and consented to both blood sampling and genetic analysis; who had been present at all annual interviews between 1999 and 2011; and whose time between blood sample collection and processing did not exceed 3 days. Eligibility requirements for genetic analyses meant that the epigenetic sample was restricted to participants of white ethnicity. The EZ-96 DNA Methylation-Gold kit (Zymo Research) was used for treating 500 ng of DNA from each sample with sodium bisulfite. DNAm was quantified with the Illumina Infinium HumanMethylationEPIC BeadChip run on an Illumina iScan System according to the manufacturer’s standard protocol. Samples were randomly assigned to chips and plates so that batch effects would be minimized. In addition, the inclusion of a fully methylated control (CpG Methylated HeLa Genomic DNA; New England BioLabs) in a random position on each plate facilitated sample tracking and helped to resolve experimental inconsistencies and confirm data quality. DNAm Data Preprocessing Raw signal intensities were imported from.idat files into the R statistical environment31 and converted into beta values (the proportion of DNA methylation at individual sites was measured) with the bigmelon package.32 These data were processed via a standard pipeline including the following steps: (1) detection of outlier samples via principal-component analysis and Mahalanobis distance equivalents, (2) confirmation of complete bisulphite conversion via control probes, (3) comparison of estimated age from the data via the Horvath Epigenetic Clock algorithm33 and reported age at sampling, and (4) visualization of principal components. Data were normalized with the dasen function within the wateRmelon package,34 which performs background adjustment and between-sample quantile normalization of methylated (M) and unmethylated (U) intensities separately for type I and type II probes. Samples that were dramatically altered as a result of normalization were excluded on the basis of the difference between the normalized and raw data; those with a root mean square and standard deviation > 0.05 were removed. Samples were then filtered so that those with >1% of sites with a detection p value > 0.05 were excluded. Finally, DNA-methylation sites with a bead count <3 were excluded along with those in which >1% of the sample had a detection p value > 0.05. The raw DNA methylation data from the final sample set was then re-normalzsed with the dasen function. The final dataset included 857,071 DNA-methylation sites and 1,175 individuals for subsequent analysis. These DNAm data are available upon request through the European Genome-Phenome Archive under accession code EGAS00001001232. Annotation of DNAm Sites The genomic location of DNAm sites along with genic, DNase hypersensitivy sites and open chromatin annotation were taken from the manifest files provided by Illumina and downloaded from the product support pages (see Web Resources). Genotyping and Imputation UKHLS samples were genotyped with the Illumina Infinium HumanCoreExome BeadChip Kit as previously described (12v1-0).35 This array contains a set of >250,000 highly informative genome-wide tagging single-nucleotide polymorphisms (SNPs) as well as a panel of functional (protein-altering) exonic markers, including a large proportion of low-frequency (MAF 1%–5%) and rare (MAF < 1%) variants. Genotype calling was performed with the gencall algorithm within GenomeStudio (Illumina). After only the samples with matched DNAm data were selected, variants were refiltered prior to imputation. PLINK36 was used for removing samples with >5% missing data. We also excluded SNPs characterized by >5% missing values, a Hardy-Weinberg equilibrium p value < 0.001, and a minor-allele frequency of <5%. For identification of related samples, SNPs underwent LD pruning, and the –genome command in PLINK was used for calculating the proportion of identity-by-descent for all pairs of samples; 58 pairs of related samples (PI_HAT > 0.2) were identified, and randomly excluding one individual from each pair ensured that the samples were independent. These data were then imputed with the 1000 Genomes phase 3 version5 reference panels SHAPEIT and minimac3.37 Best-guess genotypes were called, and variants were filtered to those with a minor-allele frequency >0.01 and an INFO score >0.8. Because variants were named using their locations (“chr:pos”) and variant type (SNP/INDEL), duplicate variants were also excluded. Principal components were calculated from the imputed genotype data via GCTA (a tool for genome-wide complex-trait analysis).38 16 samples were identified as being outliers (defined as more than 2 standard deviations from the mean) in a scatterplot of the first two principal components and were excluded from subsequent genetic analyses. Principal components were then recalculated for inclusion as covariates in QTL analyses. The imputed genetic variants were then filtered so that variants characterized by >5% missing values, a Hardy-Weinberg equilibrium p value <0.001, a minor-allele frequency of <5%, and a minimum of five observations in each genotype group were excluded. These genotype data are available on application through the European Genome-phenome Archive under accession code EGAS00001001232. DNAm Quantitative-Trait Loci Cross-hybridizing probes, probes with a common SNP (European population minor-allele frequency > 0.01) within 10 bp of the CpG site or a single base extension39, 40 and probes on the sex chromosomes were excluded from the QTL analysis. In addition, 977 substandard probes identified by Illumina were also excluded. We performed a genome-wide mQTL analysis; in total, we tested 766,714 DNAm sites against 5,210,475 genetic variants by using the R package MatrixEQTL.41 This package enables fast computation of QTLs by only saving those more significant than a pre-defined threshold (set to p = 1 × 10−8 for this analysis). We fitted an additive linear model to test whether the number of alleles (coded 0,1,2) predicted DNAm at each site; we included covariates for age, sex, six estimated cellular composition variables (B cells, CD8 T cells, CD4 T cells, monocytes, granulocytes, natural killer T cells),42, 43 two binary batch variables, and the first ten principal components from the genotype data to control for ethnicity differences. We used a Bonferroni-corrected multiple-testing threshold, set to genome-wide significance for GWAS and divided by the number of DNAm sites tested (i.e., 5 × 10−8/766714 = 6.52 × 10−14). We used the clump command in PLINK36 to identify the number of independent associations for each DNAm site with more than 1 significant mQT by using the following parameters: –clump-p1 1e-8–clump-p2 1e-8–clump-r2 0.1–clump-kb 250. Bayesian Co-localization Out of all DNAm sites with at least 1 significant mQTL (p < 1 × 10−10), all pairs of DNAm sites located on the same chromosome and within 250 kb of each other were tested for co-localization. Because data for all SNPs (regardless of significance) are required for this analysis, first, the mQTL analysis was rerun for these DNAm sites so that all association statistics (p value, regression coefficient, and t-statistic, so that the standard error could be inferred) could be recorded for all SNPs within 500 kb of the DNAm site. Co-localization analysis was performed as previously described44 with the R coloc package (see Web Resources). From our mQTL results we input the regression coefficients, their variances, and SNP minor-allele frequencies, and we left the prior probabilities as their default values. This methodology allowed us to quantify the support across the results of each GWAS for five hypotheses by calculating the posterior probabilities, denoted as PPi for hypothesis Hi.H0: there exist no causal variants for either CpG site; H1: there exists a causal variant for CpG1 only; H2: there exists a causal variant for CpG2 only; H3: there exist two distinct causal variants, one for each CpG; or H4: there exists a single causal variant common to both CpGs. Summarized Mendelian Randomization Analysis 1: Identifying Putative Pleiotropic Relationships between DNAm and Complex Traits SMR analysis between DNAm and complex traits was performed with publicly available software (see Web Resources) as previously described.27, 28 Publicly available genome-wide association study (GWAS) results were downloaded from a range of sources and converted to the appropriate format for the SMR analysis. We renamed SNPs in the 1000 Genomes format (chr:bp) to align them with the mQTL output by using dbSNP version 141 (where SNP locations for hg19 were not provided in the results file). Where allele frequency was not provided, it was taken from the European subset of 1000 Genomes (phase 3, version 5). Details for how each set of results was processed can be found in Table S4. We used significant mQTLs (p < 1 × 10−10) calculated in the UKHLS sample to identify genetic instruments for 126,457 DNAm sites that were included in the SMR analysis. The SMR test comprises of two steps. First, we performed a two-sample Mendelian randomization with the two-step least-squares (2SLS) approach by using the effect size of the top cis-QTL SNP and its corresponding effect in the GWAS. The significance threshold for this part of test was set at 3.95 × 10−7, calculated by the Bonferroni correction method and adjusted for the number of DNAm sites tested (0.05/126,457). Second, we tested for heterogeneity of effects by using alternative SNPs as the instrumental variable, on the basis of the theory that if both DNAm and the GWAS trait were associated with the same causal variant, the choice of SNP would be irrelevant, whereas if they were associated with different causal variants, the differing linkage disequilibrium relationships between the instruments and each causal variant would lead to variation in the estimated effect between the trait and DNAm. Non-significant heterogeneity (heterogeneity in dependent instruments [HEIDI] p > 0.05) indicates that there is a pleiotropic effect on a GWAS trait and DNAm. This approach was repeated with publicly available eQTL data from Westra et al.;45 in this analysis, significant pleotropic associations between gene expression and complex traits were selected as those with SMR p < 8.38 × 10−6 (corrected for 5,966 gene expression probes tested) and HEIDI p > 0.05. Summarized Mendelian Randomization Analysis 2: Identifying Putative Pleiotropic Relationships between DNAm and Gene Expression We used a second application of the SMR analysis to identify pleiotropic relationships between DNAm and gene expression. Gene eQTL results from the Westra eQTL study45 were downloaded along with the SMR software. SNP IDs were converted to the 1000 Genomes format (so they would match the mQTL output), and SNP frequencies were taken from the European subset of 1000 Genomes (phase 3, version 5). These data included eQTLs at 5,966 probes. All pairs of CpG and genes where tested as long as (1) the CpG had a significant mQTL (p < 1 × 10−10), (2) the gene had a significant eQTL (p < 5 × 10−8), and (3) there was a common genetic variant tested within 500 kilobases of the gene expression probe and DNAm site. In total, 488,342 pairs of DNAm sites and gene expression transcripts were tested; therefore, the significance threshold for the first stage of the SMR test was set to p < 1.02 × 10−7 after a Bonferroni correction for the number of tests was applied. Consistent with all other SMR analyses in this manuscript, a non-significant heterogeneity test (HEIDI p > 0.05) in step 2 of the SMR analysis was used for classifying pleiotropic relationships from artifacts of linkage disequilibrium. Enrichment Analyses DNAm sites were annotated to genes and CpG islands with the information provided in the Illumina manifest file, which is based on the UCSC RefGene and CpG island databases. Sites are annotated to genes if they are located within the gene body or up to 1,500 base pairs form the transcription start site. Sites are annotated to CpG islands if they are located within the boundaries of a CpG island, to a shore if they are located up to 2,000 base pairs from an island, or to a shelf if they are between 2,000 and 4,000 base pairs from an island. Frequency tables were used for recording the number of sites annotated to each feature category, and Chi-square tests were used for identifying different distributions across these annotation categories between all tested DNAm sites and the subset of sites considered for enrichment analysis (e.g., all DNAm sites with at least one significant mQTL). Data Availability Summary statistics for all Bonferroni-significant DNA-methylation quantitative-trait loci are available for download from the Complex Disease Epigenomics Group website, where readers can also explore many of the results included in this manuscript through our interactive web application. Analysis scripts used in this manuscript are available on GitHub, and data on phenotypes linked to DNA methylation are available on METADAC. See the Web Resources and the Accession Numbers sections.