Disparity in Gene Variation Rates

As expected, the number of variants per gene correlated with gene size, with larger genes carrying higher numbers of variants (Figures 4A and S2A, Table S5). The greatest variant load was found in PCDH15 (MIM: 605514), USH2A, ADGRV1, and CDH23, but when the number of variants was normalized for gene size, different trends emerged (Figures S2B and S2C, Table S5). ACTG1 (MIM: 102560) had the highest variant rate at 41% (4 of 10 bases carry reported variants), with most genes (85%) having a variation rate below 10%.

If we restricted the analysis to coding and splice-site regions, again there was a correlation between the number of variants and the size of the coding regions, with USH2A, ADGRV1, and CDH23 carrying the highest number of variants (Table S5). Normalizing to the size of the coding region, however, gave strikingly different results: GJB2 carried the greatest variation at ∼69% (nearly 7 of 10 bases carry reported variants) and six other genes had variation rates higher than 30%: WFS1 (53%), KCNQ1 (MIM: 607542) (44%), ACTG1 (39%), SLC26A4 (37%), and KCNE1 (MIM: 176261) (36%). The average variation rate was ∼22% (Figures 4B and S3A).

Figure 4. Variation Rate for Deafness-Associated Genes
(A) Total number of variants per gene.
(B) Normalized number of coding variants based on the size of the coding and splice regions.
(C) Normalized number of deafness-associated variants (P+LP) based on the total number of coding variants.
Only genes with ≥14 reported deafness-associated variants are included in this figure; the remaining genes are shown in Figures S2 and S3.

To determine whether gene-specific variation rates correlated with tolerance or intolerance to variation, we focused on the 6,490 variants classified as P and LP for deafness and normalized to the total number of coding variants.
We found that ∼69% of coding variants in GJB2 are disease-causing (P and LP variants), meaning that any new variant identified in the coding sequence of GJB2 has a roughly 69% (about 7 in 10) chance of being pathogenic. Both COL4A5 (MIM: 303630) (55.3%) and SLC26A4 (47.2%) also had high (P+LP)/(Coding Variant) ratios (Figures 4C and S3B, Table S5).
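
The two normalizations used in this section can be illustrated with a minimal Python sketch. It assumes that the variation rate is the number of reported coding variants divided by the size of the coding and splice-site region in bases, and that the (P+LP)/(Coding Variant) ratio is the number of P and LP variants divided by the number of coding variants. The `GeneCounts` class, the function names, and the genes `GENE_A` and `GENE_B` with their counts are hypothetical and are not taken from Table S5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCounts:
    """Per-gene tallies. All values used below are hypothetical."""
    name: str
    coding_size_bp: int   # size of coding + splice-site regions, in bases
    coding_variants: int  # reported variants within that region
    p_lp_variants: int    # subset of coding variants classified P or LP


def variation_rate(gene: GeneCounts) -> float:
    """Reported coding variants per base of coding/splice sequence."""
    if gene.coding_size_bp <= 0:
        raise ValueError("coding_size_bp must be positive")
    return gene.coding_variants / gene.coding_size_bp


def p_lp_ratio(gene: GeneCounts) -> float:
    """Fraction of coding variants classified as P or LP."""
    if gene.coding_variants == 0:
        return 0.0  # no coding variants reported, so no ratio to compute
    return gene.p_lp_variants / gene.coding_variants


# Hypothetical genes: GENE_B carries more variants in absolute terms,
# but GENE_A has the higher rate once counts are normalized for size.
genes = [
    GeneCounts("GENE_A", coding_size_bp=1_000, coding_variants=500, p_lp_variants=100),
    GeneCounts("GENE_B", coding_size_bp=10_000, coding_variants=1_000, p_lp_variants=50),
]

ranked = sorted(genes, key=variation_rate, reverse=True)
for gene in ranked:
    print(
        f"{gene.name}: variation rate {variation_rate(gene):.1%}, "
        f"(P+LP)/coding {p_lp_ratio(gene):.1%}"
    )
```

With these made-up counts, the ranking by raw variant number and the ranking by normalized rate disagree, which is the same kind of reversal reported above when variant counts are normalized for the size of the coding region.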