Discussion

Genetic variant classification is crucial to accurate genetic diagnoses and represents a major challenge in the post-genome era, particularly for a disorder with genetic and phenotypic heterogeneity like deafness. The DVD was designed as a deafness-specific, comprehensive, open-access database that collates and summarizes all available data in addition to providing expert curation of genetic variants implicated in deafness (Figure 1). We integrate its use into a weekly multidisciplinary conference where a person’s genotypic data are reviewed in the context of available phenotypic data to provide expert contextual interpretation of the genetic results. As a first step, DVD annotations are used to prioritize a person’s variant list, automatically flagging variants reported as pathogenic in the DVD, as well as retaining DVD-classified LP/P variants that may have been filtered out by our NGS processing pipeline due to poor quality or ambiguous mapping. This type of curation reduces false negative rates and highlights the importance of disease-specific knowledge and disease-specific databases. CNVs are not integrated in the current release of the DVD. We have shown previously that they are a major contributor to hearing loss and are implicated in ∼18% of all positive diagnoses.25 The challenge to their incorporation in the DVD resides in the lack of data regarding their exact breakpoint junctions. As more data become available, integration of CNVs should be an integral part of any variant database. While it is difficult to provide “universal” MAF thresholds, the ACMG does recommend using MAF data as a key filter in their guidelines for variant interpretation. 
We deem a MAF ≥ 0.5% to be incompatible with a classification of P/LP for hereditary hearing loss aside from specific cases such as variants in GJB2 and SLC26A4, and we use this threshold to automatically classify any variant that meets or exceeds it as B (Table S2).16 It is important to note that with the availability of new datasets from large-population sequencing projects such as gnomAD, the MAF for some variants will change, which in turn may affect their clinical significance. While using a universal MAF cutoff is beneficial, for a common disease such as deafness this filter aided in classifying only 4% of coding variants as benign, illustrating that MAF cutoffs and rarity alone are not sufficient to determine deleteriousness. As an additional aid, the DVD integrates predictions from six algorithms, two assessing conservation (PhyloP and GERP++) and four evaluating deleteriousness (SIFT, PolyPhen-2, MutationTaster, and LRT), from which a composite PS is calculated. As more than 95% of known P variants have a pathogenicity score > 40%, the PPV of this approach reaches 0.995 (Figure S1). Using this threshold, we classify variants as either VUS (PS ≥ 60%) or LB (PS ≤ 40%). ACMG guidelines also endorse predictions from in silico algorithms as one of the eight evidence criteria recommended for variant clinical interpretation, and although outcomes and results from several studies vary depending on the algorithms used, these studies all agree on the utility of such tools for improving accuracy and reducing VUS burden in clinical diagnosis.3, 26 Questions remain regarding the strength and amount of evidence needed to sway a classification from a VUS to P/LP or B/LB. Since in clinical settings substantial evidence is needed to reach a P/LP classification, we have opted to use in silico algorithms exclusively to shift a VUS classification to LB.27 We require additional evidence (genotype, phenotype, family history, segregation, and functional studies) to upgrade a VUS to a P or LP variant. 
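The automated part of this triage (MAF filter first, then in silico downgrading only) can be illustrated with a minimal Python sketch. It is not the DVD implementation. Several details are assumptions made for illustration: the composite PS is taken to be the percentage of available tools that flag the variant, the gene-level exemption set and the "REVIEW" label are hypothetical stand-ins for the specific GJB2 and SLC26A4 cases mentioned above, and scores falling between the two stated PS cutoffs are conservatively left as VUS.

```python
from typing import Mapping, Optional

MAF_BENIGN_CUTOFF = 0.005  # MAF >= 0.5%, the threshold stated in the text

# Hypothetical exemption list: the text only says "specific cases such as
# variants in GJB2 and SLC26A4"; real exemptions would be variant-specific.
MAF_EXEMPT_GENES = frozenset({"GJB2", "SLC26A4"})

TOOLS = ("PhyloP", "GERP++", "SIFT", "PolyPhen-2", "MutationTaster", "LRT")


def pathogenicity_score(calls: Mapping[str, Optional[bool]]) -> Optional[float]:
    """Assumed PS: percent of available tools that flag the variant.

    `calls` maps a tool name to True (conserved/deleterious), False, or None
    (no prediction). Returns None when no tool produced a prediction.
    """
    available = [calls[tool] for tool in TOOLS if calls.get(tool) is not None]
    if not available:
        return None
    return 100.0 * sum(available) / len(available)


def triage(gene: str, maf: float, calls: Mapping[str, Optional[bool]]) -> str:
    """Return 'B', 'LB', 'VUS', or 'REVIEW' (hypothetical manual-review label)."""
    if maf >= MAF_BENIGN_CUTOFF:
        # Common variants are benign unless the gene is on the exemption list.
        return "REVIEW" if gene in MAF_EXEMPT_GENES else "B"
    ps = pathogenicity_score(calls)
    if ps is not None and ps <= 40.0:
        return "LB"  # in silico evidence is used only to downgrade
    # PS >= 60%, the unspecified 40-60% gap, or no predictions: stay VUS.
    return "VUS"
```

Note that the function never returns P or LP: consistent with the approach described above, upgrading a VUS requires genotype, phenotype, family history, segregation, or functional evidence that a score-based rule cannot supply.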
Discrepancies in variant classification between the DVD and ClinVar and between the DVD and HGMD were observed at 14.5% and 5%, respectively (Figure 2). Differences were due in part to the misclassification of B and LB variants as P or LP and have been reported in other studies highlighting the limitations of ClinVar and HGMD.28, 29, 30 ClinVar is based on submissions from researchers and clinical diagnostic laboratories. It is an invaluable resource that creates an open platform for sharing genetic data and variant interpretation, but it has some disadvantages. Most obvious are the differences in the methods used to detect, validate, curate, and derive variant interpretation, which understandably vary between groups and thus can lead to conflicting classifications.27, 31, 32, 33 Unlike ClinVar, HGMD relies on published literature and is primarily a database focused on disease-causing variants. Although the variants reported in HGMD have been published and therefore have undergone peer review, the HGMD curation process is error-prone due to the potential for subjective misinterpretation of the literature and a lack of disease-specific experts reviewing the material. We implemented major categorical reclassifications that led to medically significant changes in 52 genes. In 33 genes, the change affected three or fewer variants (20 genes, 1 variant change; 11 genes, 2 variant changes; 2 genes, 3 variant changes); however, of the top 20% of genes carrying the greatest number of reclassifications, six cause Usher syndrome (ADGRV1 [USH2C], CDH23 [USH1D and DFNB12], MYO7A [USH1B, DFNB2, and DFNA11], PCDH15 [USH1F and DFNB23], USH2A [USH2A], and USH1C [USH1C and DFNB18A]) (Figure 2G, Table S4). Differentiating USH1 from NSHL is possible if a directed developmental history is obtained, because sitting and walking milestones are significantly delayed in USH1 due to the associated vestibular dysfunction, emphasizing the need to correlate clinical history with the interpretation of genetic data. 
We also consider audioprofiles, noting any progression of hearing loss, age at diagnosis, and symmetry; imaging studies if available; and family history, as it is often possible to refine a diagnosis when more clinical information is provided. For example, a genetic diagnosis consistent with either USH1C or DFNB18A would be changed to USH1C if the child had delayed developmental milestones. Recognizing the importance of more stringent filtering strategies to improve variant classification prompted us to use the DVD to define the molecular landscape of deafness-associated genes. When normalized to genomic size, some genes show remarkably high variation rates, such as ACTG1, although for the majority of genes the variation rate is below 10% (Figures 4B and S2B). This trend changes dramatically when only clinically relevant regions (coding and splice regions) are considered, implying that most variation is intronic. The coding/splice-site variation rate is highest for GJB2 (∼69%) and ranges from 8.5% to 53% for all other genes (Figures 4B and S2B, Table S5). Other studies, notably by Petrovski et al.,34 Lek et al.,12 and Samocha et al.,35 have used population-scale databases of variant numbers and allele frequencies to infer gene constraint or tolerance to genetic variation. Their assumption is that genes carrying more variants than expected have low constraint, while those with fewer variants have higher constraint and are intolerant to genetic variation. Our data showed that GJB2 does not fit into this model. Although it has the highest variation rate, it also carries the highest fraction of pathogenic variants (Figure 4C). This observation contrasts with its Z score of −1.07 (ExAC), which implies tolerance to variation and decreased constraint (Table S5). Similar findings are seen for SLC26A4, where every other variant is disease causing although its Z score is −3.23. 
These findings highlight the need to integrate real variant clinical interpretation data for each gene-phenotype association, as large-scale population data can be misleading. Several studies have also emphasized the importance of moving from gene-wide constraint calculations to protein domain-specific constraints as a method of identifying regions of functional importance.35, 36, 37, 38 This refinement is particularly important for proteins involved in hearing loss, as most have several structurally different domains with distinct functions. Furthermore, some show extraordinary pleiotropy and cause both autosomal-dominant NSHL (ADNSHL) and autosomal-recessive NSHL (ARNSHL) (TECTA [MIM: 602574] and TMC1 [MIM: 606706]) or both syndromic hearing loss and NSHL (Usher type 1-associated genes, WFS1, TBC1D24 [MIM: 613577], and COL11A1 [MIM: 120290]).39, 40, 41, 42, 43, 44 Classifying variants by domain or regional constraint can minimize both false-positive and false-negative pathogenicity predictions and facilitate proper diagnosis, especially for genes associated with NSHL mimics. Our assessment of variant distribution by mutation type, classification, and gene-specific MAFs across all 152 genes and microRNAs uncovered gene-specific variant architecture (Figures 5, 6, S4, and S5). For example, some genes (GJB2, SLC26A4, and COL4A5) are relatively depleted of synonymous changes when compared to other genes (Figure 5B). Interestingly, these same genes possess the highest intolerance to variation with 70%, 55%, and 47% of all coding variants being P/LP for GJB2, COL4A5, and SLC26A4, respectively (Figure 4C, Table S5). 
The involvement of synonymous variants in disease is secondary to splice alteration by changing exonic splice enhancers or silencers, or through codon usage bias that impacts gene expression by affecting mRNA folding and stability, messenger ribonucleoprotein (mRNP) complex formation, translation rate, and protein folding and function.45, 46 Synonymous variants may be under selective pressure in GJB2, COL4A5, and SLC26A4, implying a potential unrecognized disease mechanism that would affect their proper expression in the inner ear. This highlights the need to carefully review these variants when interpreting sequence data from persons with hearing loss. Conversely, for other genes like ACTG1, synonymous variants predominate, while the only reported deafness-associated pathogenic variants are missense, suggesting intolerance to changes at the protein level, which is in line with the reported gain-of-function mechanism for these variants. Overall, there was great diversity in the contribution of LoF and missense variants to the mutational load across genes (Figures 5C–5E and S4). Of all the LoF variants, the fraction contributing to the P/LP group was highest for genes for which haploinsufficiency is the mechanism of action (autosomal-dominant and X-linked genes) such as EYA1, TCOF1, SOX10, and GATA3 (Figure 5C). This trend was further accentuated when assessing the contribution of LoF variants to the mutational spectrum of these genes (Figure 5E). For example, LoF mutations in TCOF1 and EYA1 represent ∼90% and 80% of all reported pathogenic variants, respectively. The variability in the fractional contribution of nonsense, splice-site, and frameshift indels to the mutational load across genes is intriguing. Although for autosomal-recessive genes this variability may not affect the outcome at the protein level (as most of these variants are expected to result in null alleles), the story is quite different for autosomal-dominant genes. 
The latter exert their effect via haploinsufficiency or a gain-of-function/dominant-negative mode of action, and the specific type of mutation might be crucial. LoF variants in genes known to have a dominant-negative/gain-of-function mechanism of action are not traditionally predicted to be pathogenic for a dominant disease. However, this caveat ignores the position-dependent effect of these variants on nonsense-mediated mRNA decay (NMD).47 For example, truncating pathogenic variants in DIAPH1 are linked to two different phenotypes: (1) autosomal-recessive seizures, cortical blindness, and microcephaly syndrome (MIM: 616632) due to null alleles through NMD and (2) autosomal-dominant DFNA1 hearing loss with thrombocytopenia due to gain-of-function truncating variants (that escape NMD) in the C-terminal DAD domain, which disrupt the autoinhibitory activity of the DAD and render the protein constitutively active.48, 49 SOX10 and PTPRQ are other examples where the impact of a LoF variant is position dependent.50, 51 COL11A1 has the largest proportion of splice-site pathogenic variants when compared to all other deafness-associated genes. The majority of these variants are located in the triple-helical domain and cause in-frame exon skipping rather than frameshifts. The mutant proteins exert their effect through a dominant-negative mechanism to cause Marshall syndrome (MRSHS) and Stickler syndrome type II (STL2).52, 53 However, biallelic null alleles cause fibrochondrogenesis (FBCG1), a severe, recessive, often neonatally lethal disease.54 This genotype-phenotype correlation explains the enrichment of splice-altering disease variants in COL11A1. For genes where most missense variants are classified as B/LB, we estimate that a majority of the variants that are currently classified as VUS will be subsequently downgraded to B/LB. 
Similar to the diversity of variant distribution across classifications, we exposed clear distinctions in the maximum MAFs of P/LP variants depending on the gene and variant type (LoF versus missense) (Figures 6 and S5). The emerging global picture from our findings is an intricate and complex portrait of the genomic landscape and mutational signature of deafness-associated genes. Although this work lays the foundation for improved variant interpretation, which greatly enhances clinical decision making, significant challenges remain. For example, of coding variants with a MAF < 0.5%, missense variants predominate. They constitute 70% of all VUSs and their accurate reclassification will require better computational tools (Figure 7). The non-coding pathogenic landscape also must be defined, warranting coordinated studies to integrate expression and genomic data.

Figure 7. The Challenge of VUSs. Variant architecture correlating variant type (inner ring) and clinical significance (outer ring) for variants with MAF less than 0.5% and located within the clinically relevant regions. Of all coding variants with MAF < 0.5%, missense variants represent the majority at 61.5%; of these missense variants, 70% are classified as VUSs. Abbreviations: Indel-In, in-frame indel; Indel-Fs, frameshift indel; Mit-Mir, mitochondrial and microRNA.

In summary, using decision support tools and human expert curation, we have developed an integrated approach to facilitate the application of comprehensive genetic testing to the clinical care of persons with hearing loss. We believe that detailed disease-specific knowledge of the genomic landscape is requisite to establish a framework for variant interpretation and show that there are gene-specific mutational signatures, the knowledge of which will refine guidelines for variant interpretation for deafness and advance our understanding of disease biology. 
This resource is freely available to the public and configurable to allow its implementation for any Mendelian genetic disorder.