Material and Methods

The study design is summarized in Figure 1.

Figure 1. Study Design
Abbreviations are as follows: SV, structural variant; SNV, single-nucleotide variant; SO, Sequence Ontology; HGMD, Human Gene Mutation Database; ACMG, American College of Medical Genetics; and IGV, Integrative Genomics Viewer.

Participants

460 participants from 440 families were recruited through clinical genetics services in the UK (442 individuals), Greece (nine individuals), Hong Kong (three individuals), the US (three individuals), Israel (two individuals), and Ireland (one individual). In each family, there was a clinical suspicion of a cancer-predisposition syndrome, but routine genetic assessment and testing had not identified a germline molecular genetic diagnosis at the time of recruitment. 435 individuals had developed MPTs (defined here as at least two primaries by age 60 years or at least three by 70 years), and 25 had developed a single primary tumor and had a first-degree relative with MPTs. Tumors in the same tissue type and organ were considered separate primary tumors if, in the case of paired organs, they occurred bilaterally or if the medical record clearly denoted them as distinct. International Agency for Research on Cancer criteria for defining separate primaries were also used.8 Tumor diagnoses in the series were labeled according to site and cell of origin (Table S1).

All participants gave written informed consent to participate in the NIHR BioResource Rare Diseases, Molecular Pathology of Human Genetic Disease (HumGenDis), and/or Investigating Hereditary Cancer Predisposition (IHCAP) studies. The NIHR BioResource projects were approved by research ethics committees in the UK and appropriate ethics authorities in non-UK enrollment centers. Ethical approval for HumGenDis and IHCAP was given by the South Birmingham and East of England Cambridgeshire and Hertfordshire research ethics committees, respectively.
WGS and Panel Sequencing

WGS was performed on samples from study participants as part of the NIHR BioResource Rare Diseases study.5 Blood DNA samples were fragmented (mean size 450 bp) with the Covaris LE220 kit and further processed with an Illumina TruSeq DNA PCR-Free Library Prep Kit. Libraries were sequenced with an Illumina HiSeq 2500 sequencer with one library per two lanes. FASTQ files were generated by HiSeq Analysis Software v.2.0 (Illumina). Alignment (GRCh37) and variant calling (including structural variants [SVs])9, 10 were performed with Isaac (Illumina).

For 411 samples, the Illumina TruSight Cancer Panel (TCP) was also used (gene list in Table S2), and libraries were sequenced with an Illumina MiSeq. BCL files resulting from the sequencing were converted to FASTQ files with Illumina’s bcl2fastq. FASTQ files were checked for coverage and other quality-control parameters with FastQC. Reads were aligned to the hg19 reference genome (UCSC Genome Browser) with the Burrows-Wheeler Aligner (BWA-MEM) with default parameters, and SAMtools was used to generate binary compressed sequence alignment map (BAM) files.11, 12 Variants were called from BAM files with the Genome Analysis Toolkit Unified Genotyper algorithm.13, 14 All data were annotated with Variant Effect Predictor (VEP) v.87 on the basis of canonical transcripts.15

SNV and Indel Identification and Assessment

Variants were extracted from VCF files if they were within a gene specified in a comprehensive list of 83 CPGs (Table 1) and had a predicted Sequence Ontology (SO) consequence indicating a deleterious effect on protein function. The gene list used for analysis was initially composed of all genes listed in a 2014 review of CPGs1 (n = 114; gene list in Table S3) and/or those sequenced by the Illumina TCP (n = 94). Two additional, more recently described CPGs, namely NTHL1 (MIM: 602656)16 and CDKN2B (MIM: 600431),17 were also included (Table S3).
We subsequently reviewed and filtered the genes to produce a list that would be applicable to referrals to clinical cancer genetic services. Genes were included if deleterious variants affecting them were associated with adult-onset tumors and if neoplastic lesions were likely to be a primary presenting feature. For example, SOS1 was not included because although Noonan syndrome is associated with increased neoplasia risk, other features of the condition are likely to prompt initial referral.

Table 1. Gene List Used for Analysis (n = 83)

AIP (MIM: 605555)       EGFR (MIM: 131550)a     NF1 (MIM: 613113)       SDHB (MIM: 185470)
ALK (MIM: 105590)a      EPCAM (MIM: 185535)     NF2 (MIM: 607379)       SDHC (MIM: 602413)
APC (MIM: 611731)       ERCC2 (MIM: 126340)b    NTHL1 (MIM: 602656)b    SDHD (MIM: 602690)
ATM (MIM: 607585)       ERCC3 (MIM: 133510)b    PALB2 (MIM: 610355)     SERPINA1 (MIM: 107400)b
AXIN2 (MIM: 604025)     ERCC4 (MIM: 133520)b    PDGFRA (MIM: 173490)a   SMAD4 (MIM: 600993)
BAP1 (MIM: 603089)      ERCC5 (MIM: 133530)b    PHOX2B (MIM: 603851)    SMARCA4 (MIM: 603254)
BMPR1A (MIM: 601299)    EXT1 (MIM: 608177)      PMS2 (MIM: 600259)      SMARCB1 (MIM: 601607)
BRCA1 (MIM: 113705)     EXT2 (MIM: 608210)      POLD1 (MIM: 174761)     SMARCE1 (MIM: 603111)
BRCA2 (MIM: 600185)     FH (MIM: 136850)        POLE (MIM: 174762)      SRY (MIM: 480000)
BRIP1 (MIM: 605882)     FLCN (MIM: 607273)      POLH (MIM: 603968)b     STK11 (MIM: 602216)
CDC73 (MIM: 607393)     GATA2 (MIM: 137295)     PRKAR1A (MIM: 188830)   SUFU (MIM: 607035)
CDH1 (MIM: 192090)      HFE (MIM: 613609)b      PTCH1 (MIM: 601309)     TGFBR1 (MIM: 190181)
CDK4 (MIM: 123829)a     HNF1A (MIM: 142410)     PTEN (MIM: 601728)      TMEM127 (MIM: 613403)
CDKN1B (MIM: 600778)    KIT (MIM: 164920)a      RAD51C (MIM: 602774)    TP53 (MIM: 191170)
CDKN2A (MIM: 600160)    MAX (MIM: 154950)       RAD51D (MIM: 602954)    TSC1 (MIM: 605284)
CDKN2B (MIM: 600431)    MEN1 (MIM: 613733)      RB1 (MIM: 614041)       TSC2 (MIM: 191092)
CEBPA (MIM: 116897)     MET (MIM: 164860)a      RET (MIM: 164761)a      VHL (MIM: 608537)
CHEK2 (MIM: 604373)     MLH1 (MIM: 120436)      RHBDF2 (MIM: 614404)a   WT1 (MIM: 607102)
CYLD (MIM: 605018)      MSH2 (MIM: 609309)      RUNX1 (MIM: 151385)     XPA (MIM: 611153)b
DDB2 (MIM: 600811)      MSH6 (MIM: 600678)      SDHA (MIM: 600857)      XPC (MIM: 613208)b
DICER1 (MIM: 606241)    MUTYH (MIM: 604933)b    SDHAF2 (MIM: 613019)

a Considered to be proto-oncogenes.
b Considered to be associated with tumor predisposition in the homozygous or compound-heterozygous state only.

In order to identify clinically relevant variants, we subjected the resulting data to a range of filters (Figure S1). First, variants were removed if they failed to satisfy the quality criteria of a genotype quality (GQ) ≥ 30 (a Phred-scaled probability that the called genotype is incorrect), read depth (DP) ≥ 10 (at least ten reads covering the variant base[s]), variant allele fraction (VAF) ≥ 33%, and filter PASS (quality criteria applied by the Isaac variant caller in the NIHR BioResource Rare Disease Project). Second, variants were excluded if they had an allele frequency above 0.01 in either the Exome Aggregation Consortium (ExAC) Browser18 (all populations) or the 1000 Genomes Project19 (all populations). Third, variants were retained for further review if the predicted consequence was among a list of SO terms indicating protein truncation, if there was evidence of pathogenicity in ClinVar20 (at least two-star evidence of pathogenic or likely pathogenic [P/LP] effect corresponding to multiple submissions with no conflicts as to the assertion of clinical significance), or if the variant was assigned a disease mutation (DM) status in the Human Gene Mutation Database (HGMD).21

In order to consider a subset of non-truncating variants that are predicted to be pathogenic by in silico tools but do not appear in public databases, we also retained variants exceeding a Phred-scaled CADD22 score threshold of 34 for further review. CADD was selected for this purpose given that it incorporates a range of tools and consequently a number of lines of evidence.
The threshold was chosen as the median of scores assigned to other variants (affecting any gene) deemed pathogenic according to the criteria described below. Because the threshold depended on those assessments, variants were retained solely on the basis of CADD score in a second filtering pass, after variants retained for other reasons had been assessed.

In the strategy described above, significant variants that are located in non-coding regions, such as introns, and affect genes in the gene list would not be extracted from the original VCF files because their SO consequence would not be in the list of retained terms. Therefore, we used ClinVar to compile a list of known pathogenic variants falling outside of exons or splice sites and filtered VCFs on the basis of their genomic positions in a separate interrogation. Variants were incorporated in the list if they occurred in or near a gene in the gene list, were classified as near gene, non-coding RNA, or untranslated region, and had at least two-star evidence of a P/LP effect. This process produced only three known pathogenic variants to search for in the WGS data. Distant non-coding variants affecting gene function (e.g., those in enhancers) were not considered in the current study.
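The quality, frequency, and retention filters described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the `Variant` record, its field names, and the helper functions are hypothetical, and the sketch assumes that annotation (SO consequence, ClinVar, HGMD, CADD) has already been reduced to simple flags and scores per variant. Only the numeric thresholds are taken from the text.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence

# Thresholds quoted in the text.
MIN_GQ = 30            # genotype quality
MIN_DP = 10            # read depth
MIN_VAF = 0.33         # variant allele fraction
MAX_POP_AF = 0.01      # ExAC / 1000 Genomes allele frequency ceiling
CADD_THRESHOLD = 34.0  # Phred-scaled CADD score


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    gq: float
    dp: int
    vaf: float
    filter_status: str
    exac_af: Optional[float]      # None when absent from the database
    kg_af: Optional[float]        # 1000 Genomes frequency, None when absent
    truncating: bool              # SO term indicates protein truncation
    clinvar_plp_two_star: bool    # >= two-star P/LP evidence in ClinVar
    hgmd_dm: bool                 # HGMD disease mutation (DM) status
    cadd_phred: Optional[float]


def passes_quality(v: Variant) -> bool:
    """First filter: GQ, DP, VAF, and caller PASS status."""
    return (v.gq >= MIN_GQ and v.dp >= MIN_DP
            and v.vaf >= MIN_VAF and v.filter_status == "PASS")


def is_rare(v: Variant) -> bool:
    """Second filter: exclude variants above 0.01 in either database."""
    return all(af is None or af <= MAX_POP_AF for af in (v.exac_af, v.kg_af))


def retention_reason(v: Variant) -> Optional[str]:
    """Third filter: return why a variant is kept, or None if it is dropped."""
    if not (passes_quality(v) and is_rare(v)):
        return None
    if v.truncating or v.clinvar_plp_two_star or v.hgmd_dm:
        return "primary"
    # Second pass: variants kept on CADD score alone.
    if v.cadd_phred is not None and v.cadd_phred > CADD_THRESHOLD:
        return "cadd"
    return None


def cadd_threshold_from_pathogenic(scores: Sequence[float]) -> float:
    """Median CADD score of variants already assessed as pathogenic."""
    return median(scores)
```

Returning a reason rather than a boolean keeps the two passes distinguishable, which matters because variants retained on CADD score alone were assessed only after the others.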
Retained variants were subsequently excluded if their putative pathogenicity could be refuted by one of the following criteria:
(1) a predicted protein-truncating variant for which there was at least two-star evidence of a benign or uncertain effect in ClinVar;
(2) a predicted protein-truncating variant in a proto-oncogene in a list compiled on the basis of a literature review1 (constitutional cancer-predisposing variants in proto-oncogenes are associated with gain-of-function variants, so truncation of the protein product is unlikely to increase tumor risk);
(3) a predicted protein-truncating variant affecting <5% of the canonical transcript (according to the LOFTEE VEP plugin);
(4) a variant affecting a gene associated with only recessive tumor predisposition (as defined by a literature review1, 16, 23) unless an individual appeared to harbor two filtered variants in the same gene; and
(5) a variant that had HGMD DM status or exceeded the CADD score threshold but had at least two-star ClinVar evidence of a benign or uncertain clinical effect (or one-star evidence if there were multiple submissions without a P/LP assertion).

We used the Integrative Genomics Viewer (IGV)24 to review variants that had passed filters to check for issues such as adjacent variants affecting the predicted consequence or variants being located at the end of sequencing reads. Pathogenicity was then assessed according to the American College of Medical Genetics (ACMG) criteria (Table S4),25 which provide a framework for compiling multiple weighted lines of evidence. Additionally, for each variant, it was noted whether the corresponding individual had previously been diagnosed with a tumor typically associated with pathogenic variants in that gene (according to Rahman,1 the Familial Cancer Database,23 or the original paper reporting the gene as a CPG). P/LP variants were validated with data from the TCP or, if TCP data were unavailable, by Sanger sequencing according to standard protocols.
Primer sequences are available on request.

SV Identification and Assessment

Structural variant (SV) calls that were predicted to affect a gene on the gene list (n = 83) were filtered and assessed according to the quality of the call, rarity of the variant, and biological plausibility of tumor predisposition caused by the variant (Figure S2). We initially filtered SVs called by Canvas and/or Manta to retain those that were predicted to affect at least one exon, occurred at a frequency of less than 1% across all NIHR BioResource Rare Diseases samples (n = 9,110), and fulfilled minimum quality criteria (GQ ≥ 30 for Manta, QUAL ≥ 30 for Canvas). Remaining variants were regarded as potentially pathogenic if they affected a gene associated with tumor predisposition in the heterozygous state (unless there was evidence of homozygosity or compound heterozygosity) and fell into one of the following categories: (1) copy-number loss of coding regions of a tumor-suppressor gene, (2) copy-number gain of coding regions of a proto-oncogene, or (3) any SV type with a predicted breakpoint disrupting the gene. Subsequently, these SV calls were reviewed with IGV and excluded if they occurred in a copy-number variation map of the human genome26 (hg19 stringent). The occurrence of tumors associated with disruption of particular genes in individuals harboring suspected SVs was noted in the same manner as for single-nucleotide variants (SNVs) and indels. BAM files corresponding to all suspected deleterious calls were reviewed in IGV.

All SVs were confirmed with Sanger sequencing according to standard protocols. Inversions, translocations, and tandem duplications were confirmed by sequencing across breakpoints, whereas deletions were confirmed by fragment size resulting from long-range PCR if sequencing across the breakpoint was not possible. Primer sequences are available on request.
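The initial SV filter and the three plausibility categories described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the `SVCall` record and its fields are hypothetical, cohort frequency is assumed to be the number of carrier samples divided by the cohort size (the text does not specify how frequency was computed), and the handling of recessive genes, the copy-number variation map, and IGV review are left out.

```python
from dataclasses import dataclass
from typing import Optional

COHORT_SIZE = 9110      # NIHR BioResource samples, from the text
MAX_COHORT_FREQ = 0.01  # "less than 1%"
MIN_QUALITY = 30        # GQ for Manta calls, QUAL for Canvas calls


@dataclass
class SVCall:
    """Hypothetical SV record; field names are illustrative."""
    caller: str               # "manta" or "canvas"
    quality: float            # GQ if caller is Manta, QUAL if Canvas
    affects_exon: bool        # overlaps at least one exon of a listed gene
    cohort_carriers: int      # samples in the cohort carrying the call
    sv_type: str              # e.g. "loss", "gain", "inversion"
    gene_role: str            # "tsg" or "proto_oncogene"
    breakpoint_in_gene: bool  # predicted breakpoint disrupts the gene


def passes_initial_filter(sv: SVCall) -> bool:
    """Exon overlap, cohort rarity, and minimum call quality."""
    cohort_freq = sv.cohort_carriers / COHORT_SIZE  # assumed definition
    return (sv.affects_exon
            and cohort_freq < MAX_COHORT_FREQ
            and sv.quality >= MIN_QUALITY)


def plausibility_category(sv: SVCall) -> Optional[int]:
    """Return the category (1, 2, or 3) from the text, or None."""
    if sv.sv_type == "loss" and sv.gene_role == "tsg":
        return 1  # copy-number loss in a tumor-suppressor gene
    if sv.sv_type == "gain" and sv.gene_role == "proto_oncogene":
        return 2  # copy-number gain in a proto-oncogene
    if sv.breakpoint_in_gene:
        return 3  # any SV type with a breakpoint disrupting the gene
    return None


def is_candidate(sv: SVCall) -> bool:
    """A call is a candidate if it passes both stages."""
    return passes_initial_filter(sv) and plausibility_category(sv) is not None
```

Under the assumed frequency definition, a call seen in 91 or fewer of the 9,110 samples passes the rarity filter, and one seen in 92 or more does not.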
Comparison of MPT Series with Other Datasets

To consider how the tumor combinations in our series differed from those in the general population, we compared combination frequencies in our MPT data with a previously analyzed dataset from the East Anglia Cancer Registry (2009–2014; population size ∼5.5 million). Registry data recorded individuals with two cancer (or central nervous system [CNS] tumor) diagnoses before the age of 60 years and only included tumors occurring before that age. Consequently, only combinations in MPT data of two malignant (or CNS) tumors occurring before 60 years of age were considered for this comparison.

To compare detection rates of loss-of-function variants in our cohort with a large-scale WGS dataset unselected for neoplastic phenotypes, we interrogated gnomAD18 (data downloaded in February 2018) for variants occurring in the same set of 83 genes. Only truncating or splice-site variants were considered for comparison purposes because these are less likely to be false positives and made up 52/63 (82.5%) (see Results) of the P/LP variants in our cohort. Variants extracted from gnomAD were filtered and assessed in the same manner as those occurring in the MPT cohort. The frequency of variants assessed as P/LP was also calculated for males and females, and the sex distribution of individuals in the gnomAD dataset (55.3% male and 44.6% female) was estimated from the mean allele count across all positions in the gnomAD VCF file of chromosomes 1–22. In order to estimate gnomAD P/LP variant frequency as though the sex distribution were equivalent to that in the MPT series (23% male and 77% female), we applied the sex-specific frequency to the estimated total number of gnomAD females (n = 6,929) and a reduced number of males (n = 2,064) that would achieve the desired proportion. We then summed the respective allele-frequency estimates to provide a figure for comparison with the MPT series.
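The sex-distribution adjustment can be illustrated with a short calculation. This sketch reflects one reading of the description above, namely that the sex-specific frequencies are applied to the stated numbers of females and males and the resulting expected counts are pooled; the function names are hypothetical, and the sex-specific frequencies are inputs that are not given in this section.

```python
# Sample sizes quoted in the text.
N_FEMALES = 6929        # estimated total number of gnomAD females
N_MALES_REDUCED = 2064  # reduced number of males


def male_fraction(n_males: int, n_females: int) -> float:
    """Proportion of males in a sample of the given composition."""
    return n_males / (n_males + n_females)


def sex_adjusted_frequency(freq_female: float, freq_male: float,
                           n_females: int = N_FEMALES,
                           n_males: int = N_MALES_REDUCED) -> float:
    """Pooled frequency after reweighting to the target sex distribution.

    Assumed interpretation: expected carrier counts per sex are summed
    and divided by the reweighted sample size.
    """
    expected_carriers = freq_female * n_females + freq_male * n_males
    return expected_carriers / (n_females + n_males)


# The reduced male count reproduces the MPT series proportion (about 23%).
print(round(male_fraction(N_MALES_REDUCED, N_FEMALES), 2))
```

With these counts the male fraction is 2,064 / 8,993, which rounds to 0.23, matching the sex distribution reported for the MPT series.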
Calculation of Coverage

For BAM files from WGS and TCP data, coverage statistics for regions of interest were generated with SAMtools depth.12 A BED file compiled with Ensembl BioMart27 to represent translated exonic regions and splice sites of genes in the gene list was utilized for this purpose.

Statistical Analysis

All statistical tests were performed with R v.3.4.3.28 Pearson’s χ2 tests and Student’s t tests were performed with the chisq.test and t.test functions, respectively.
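As an illustration of the coverage calculation, the per-base output of SAMtools depth (tab-separated chromosome, position, and depth for a single BAM file) can be summarized as below. This is a minimal sketch rather than the procedure used in the study: the function name and the reported statistics are assumptions, and the default minimum depth of 10 simply mirrors the DP filter applied to variants, since no coverage threshold is stated here.

```python
from typing import Dict, Iterable


def summarize_depth(lines: Iterable[str], min_depth: int = 10) -> Dict[str, float]:
    """Summarize `samtools depth` output for a single BAM file.

    Each input line is expected to hold chromosome, position, and depth
    separated by tabs. Note that samtools depth omits zero-depth positions
    unless it is run with `-a`, so the summary covers reported positions only.
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))

    if not depths:
        return {"positions": 0, "mean_depth": 0.0, "fraction_at_min_depth": 0.0}

    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


# Toy input in samtools depth format (hypothetical positions and depths).
example = ["17\t100\t12", "17\t101\t8", "17\t102\t10"]
print(summarize_depth(example))
```

In practice the input would come from restricting the depth calculation to the BED file of translated exons and splice sites, so that the summary describes only the regions of interest.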