Introduction
Autism spectrum disorders (ASDs) affect ∼1% of the population and are characterized by impairments in social interaction and communication, as well as by repetitive and restricted behaviors. ASDs include mild to severe levels of impairment (cognitive function ranges from above average to intellectual disability [ID]) and are often accompanied by seizures and other medical problems. There is a ∼4:1 male-to-female ratio in ASD. ASDs are highly heritable,1 and genomic studies have revealed that a substantial proportion of ASD risk resides in high-impact rare variation, ranging from chromosome abnormalities and copy-number variation (CNV)2–6 to single-nucleotide variation (SNV).7–11 These studies have highlighted a striking degree of genetic heterogeneity, implicating both de novo germline mutation and rare inherited ASD variation distributed across numerous genes. De novo CNVs are observed in 5%–10% of screened ASD-affected individuals, and after further follow-up studies, some of them have proven to alter high-risk genes (e.g., NRXN1 [MIM 600565]).12 De novo or transmitted CNVs, such as 15q11.2–q13 duplications of the affected region in Prader-Willi syndrome (PWS [MIM 176270]) and Angelman syndrome (AS [MIM 105830]), 16p11.2 deletion (MIM 611913), 16p11.2 duplication (MIM 614671), and X-linked deletions including the PTCHD1-PTCHD1AS locus (MIM 300828), have also been found to contribute to risk.6,13,14 Exome and whole-genome sequencing studies have estimated at least another ∼6% contribution to ASD7–10,15 and an additional 5% conferred by rare inherited recessive or X-linked loss-of-function (LoF) SNVs.11,16 A genetic overlap between ASD and other neuropsychiatric conditions has also been increasingly recognized.
Interestingly, CNV testing and exome sequencing have so far yielded mostly nonoverlapping genes, which might reflect different mutational mechanisms, although the affected genes might still belong to connected biological pathways.17 Although numerous ASD-associated loci have been recognized to date,18 they account for only a small fraction of the overall estimated heritability, consistent with predictions that there might be ∼1,000 loci underlying ASD19 and that many associated genes and risk variants remain to be identified. Here, we have assessed the impact of de novo and inherited rare CNV in 2,446 ASD-affected individuals and their parents from the Autism Genome Project (AGP), along with 2,640 unrelated controls, by applying a series of approaches to characterize candidate ASD-associated genes disrupted by CNVs and to identify the biological relationships and common pathways they share. Using evidence from multiple sources, we were able to directly implicate numerous dosage-sensitive genes as risk factors and provide insights into different but related mechanisms underlying ASD.
Subjects and Methods
ASD Samples
The samples were collected as part of the AGP, an international consortium with over 50 sites in North America and Europe. The first phase of the AGP involved examining genetic linkage and chromosomal rearrangements in 1,168 families with at least two ASD-affected individuals.5 In the second phase, we genotyped simplex and multiplex families by using high-resolution microarrays to examine the contribution of rare CNVs and common SNPs to ASD.
The second phase was divided into two stages; the results of stage 1, involving the first half of the families, were published in 2010.6,20 In stage 2, we genotyped the remaining families (n = 1,604) for a total of over 2,845 families and performed genome-wide CNV (this study) and association studies.21 Informed consent was obtained from all participants, and all procedures followed were in accordance with the ethical standards on human experimentation of the participating sites. The AGP sample set is a collection of families comprising an affected proband and two parents, as previously described in Pinto et al.6 and Anney et al.20,21 Many of the subjects at the recruiting sites were tested for fragile X syndrome (FXS [MIM 300624]) and assessed for chromosomal rearrangements with karyotype, fluorescence in situ hybridization, or multiplex-ligation-dependent probe amplification (MLPA); subjects with known karyotypic abnormalities, FXS, or other genetic disorders were typically excluded. The main analyses presented here were restricted to subjects of European ancestry.21 All diagnostic, clinical, and cognitive assessments were carried out at each contributing site. All data were gathered at a central coordination site for standardization of data formatting and data quality assurance.
Autism Classification
Affected AGP participants were classified according to the Autism Diagnostic Observation Schedule (ADOS)22 and the Autism Diagnostic Interview, Revised (ADI-R).23 The ADOS is a semistructured, clinically administered instrument for assessing and diagnosing ASD. The ADI-R is a structured clinical interview conducted with the parents or caregivers; spectrum classification on the ADI-R was based on Risi et al.24 The AGP strict and spectrum classifications are based on both instruments (Table S1A, available online).
To meet criteria for strict autism, affected individuals must have an autism classification on both measures, whereas for the spectrum classification, individuals must meet the autism spectrum criteria on both measures or meet criteria for autism on one measure if the other measure was not available or not administered. The mean age of ADI-R assessment was 8 years.
Simplex and Multiplex Classification
Family type was classified as simplex, multiplex, or unknown. Simplex families had one known affected individual among the first- to third-degree relatives (cousins only) and included affected monozygotic twins. Multiplex families had at least two first- to third-degree relatives (cousins only) with a validated, clinical ASD diagnosis. All other situations, including instances where a family history of autism was not assessed explicitly, were coded as unknown.
Developmental Impairment
Cognitive functioning and adaptive function were measured with an appropriate standardized cognitive-testing instrument and the Vineland Adaptive Behavior Scale (VABS),25 respectively. To maximize the available data, we created a developmental-impairment variable by using a hierarchical combination of scores on full-scale, performance, and verbal IQ measures and the VABS composite score. A cutoff of 70 was applied on all measures; subjects who could not complete an IQ assessment because of low functioning or behavior were assigned to the “low” category. In the hierarchy, full-scale IQ (followed by performance IQ, verbal IQ, and finally the VABS composite score) was the preferred measure. For example, a subject with a full-scale IQ < 70 but a performance IQ ≥ 70 was considered positive for developmental impairment. Additionally, subjects missing all IQ information with a “low” VABS composite score were also assigned to the developmental-impairment category.
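The hierarchical developmental-impairment rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the parameter names, and the `untestable` flag are hypothetical, and treating an untestable subject as "low" before looking at any other score is one reading of the text.

```python
from typing import Optional

CUTOFF = 70  # scores below 70 count as "low" on every measure


def developmental_impairment(
    full_scale_iq: Optional[float] = None,
    performance_iq: Optional[float] = None,
    verbal_iq: Optional[float] = None,
    vabs_composite: Optional[float] = None,
    untestable: bool = False,
) -> Optional[bool]:
    """Return True/False for developmental impairment, or None if no data.

    The first available measure in the hierarchy decides the outcome:
    full-scale IQ, then performance IQ, then verbal IQ, then VABS composite.
    """
    # Assumption: subjects who could not complete an IQ assessment because of
    # low functioning or behavior are assigned to the "low" category outright.
    if untestable:
        return True
    for score in (full_scale_iq, performance_iq, verbal_iq, vabs_composite):
        if score is not None:
            return score < CUTOFF
    return None  # no usable measure at all


# The example from the text: full-scale IQ < 70 wins over performance IQ >= 70.
print(developmental_impairment(full_scale_iq=65, performance_iq=80))
```

Because the loop stops at the first non-missing score, a lower-priority measure is consulted only when every higher-priority one is absent, which is what lets a "low" VABS composite classify a subject who has no IQ data.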
Control Subjects
Unrelated control subjects were assembled from three studies in which individuals had no obvious psychiatric history: the Study of Addiction Genetics and Environment (SAGE),26 the Ontario Colorectal Cancer Case-Control Study,27,28 and Health, Aging, and Body Composition (HABC) (Table S1B).29 Samples were genotyped on the same array platforms (Illumina 1M single or duo arrays) as those of ASD subjects and parents and were analyzed with the same quality-control (QC) procedures and CNV analysis pipeline. The control data set used in the primary CNV analysis was composed of 2,640 control individuals of European ancestry (1,241 males and 1,399 females) who passed QC (Table S1B). Secondary analyses included 2,128 subjects from other ancestries (SAGE and HABC non-European control individuals), giving a total of 4,768 control subjects of all ancestries.
Data Analysis
We performed genotyping and data cleaning, including SNP and intensity QC for CNV detection, as described previously6 to ensure that CNV ascertainment was consistent among affected subjects, parents, and control subjects (see Table S1B for detailed QC steps). Samples not meeting our quality thresholds were excluded.
CNV Analysis
CNVs were detected with our analytical pipeline of Illumina 1M arrays (v.1 and v.3)6,30 and analyzed for case-control differences in burden with PLINK v.1.07,30 R stats, and custom scripts. The p values associated with odds ratios (ORs) were calculated with Fisher’s exact test. Rare de novo CNVs, clinically relevant CNVs, and other selected rare CNVs were validated by at least one method (quantitative PCR, MLPA, and/or long-range PCR). Table S4 shows all validated de novo CNVs. A list of CNV calls passing QC in affected subjects, including all experimentally validated CNVs, is available in Tables S17A, S17B, and S17C.
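As a worked illustration of the OR and Fisher's exact test computation, the sketch below uses the pathogenic-CNV carrier count reported in the Results (60 of 2,147 European affected subjects) against the 2,640 European controls. The control carrier count of 6 is an assumption: it is not stated in the text and was chosen here only because it is consistent with the reported OR. The Woolf (log-normal) confidence interval and the two-sided form of the test are also assumptions, since the text does not say which variants were used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-normal) CI for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value: sum of all tables no more likely than observed."""
    n = a + b + c + d
    row1 = a + b  # affected subjects
    col1 = a + c  # carriers
    denom = math.comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k carriers among affected subjects.
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    return sum(p for p in map(prob, range(k_min, k_max + 1)) if p <= p_obs * (1 + 1e-7))


# Carriers / non-carriers among affected subjects, then among controls.
case_carriers, case_total = 60, 2147        # from the Results
control_carriers, control_total = 6, 2640   # 6 is an ASSUMED count, not reported

table = (case_carriers, case_total - case_carriers,
         control_carriers, control_total - control_carriers)
odds_ratio, ci_low, ci_high = odds_ratio_ci(*table)
p_value = fisher_two_sided(*table)
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}, p = {p_value:.2e}")
```

Under these assumptions the OR and interval should come out close to the values reported for pathogenic CNVs (OR = 12.62, 95% CI = 5.44–29.27). The p value depends on the sidedness convention and need not match the published figure exactly.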
Secondary analyses included comparisons of CNV number, length, and intersected gene number between our 102 de novo CNVs identified in affected subjects and the 76 de novo CNVs in control subjects of two published data sets: (1) 17 de novo CNVs identified in 15 unaffected siblings from 872 families with a single ASD-affected offspring and an unaffected sibling from the Simons Simplex Collection4 and (2) 59 de novo CNVs detected in 57 out of 2,623 Icelandic control trios.31 The clinical relevance of CNVs was interpreted according to the American College of Medical Genetics guidelines32 irrespective of the subjects’ affected status, and CNVs were classified as pathogenic, uncertain, or benign. Pathogenic CNVs are documented as clinically significant in multiple peer-reviewed publications and databases (e.g., OMIM and GeneReviews), even if the penetrance and the expressivity might be variable.
Gene Lists
In order to perform burden analyses, we compiled a series of lists:
(1) Genes and loci implicated causally in ASD (updated from Betancur),18 all of which have also been implicated in ID, as well as genes and loci implicated in ID, but not yet in ASD (Tables S6A–S6D). Note that the list of genes and loci involved in ASD was updated independently of the data from AGP stage 1;6 thus, genes and loci were included only if there was independent evidence from other studies.
(2) Highly-brain-expressed genes defined by a log(RPKM [reads per kb per million reads]) > 4.5 by the BrainSpan resource (n = 5,610 genes).
(3) Functionally characterized control genes not expressed in the brain (log(RPKM) < 1; n = 5,410 genes).
(4) Postsynaptic density (PSD) genes.33
(5) Genes found to interact with fragile X mental retardation protein (FMRP).34
(6) Genes associated with neurological phenotypes compiled from the Human Phenotype Ontology (HPO) and Mammalian Phenotype Ontology (MPO).
(7) Genes grouped by their probability of haploinsufficiency (pHI)35 into three subgroups: pHI > 0.15 (n = 8,862 genes), pHI > 0.35 (n = 4,136 genes), and pHI > 0.55 (n = 2,214 genes).
One-Gene- and Multiple-Gene-Hit Burden Analysis
One-gene-hit burden analyses were performed with Fisher’s exact test. To consider the possibility that multiple genes within a CNV event or across events in the same subject act in concert to increase risk (i.e., multiple-gene-hit burden), we fit a series of logit models to the data. In the logit model, which is a special case of the generalized linear model, the log odds of case status (logit) were modeled as a function of predictor variables, namely the number of brain-expressed genes (BrainSpan) covered by the CNV and the level of gene expression. To analyze the expression data, we transformed the normalized RPKM value of each gene in the neocortex to log(1+RPKM). All analyses were performed in the statistical package R with the function “glm” and the logit link.
Functional Enrichment and Network Analyses
Functional-enrichment association tests and pathway and network analyses were performed with custom scripts,6 Bioconductor, NETBAG,36 and DAPPLE.37
Results
Excess Genome-wide Burden of Rare and De Novo Genic CNVs
To explore the contribution of CNV to ASD, we expanded our previous study (stage 1)6 with an additional 1,604 families (stage 2), bringing the total to 9,050 individuals from 2,845 ASD-affected families. We used an analytical pipeline of Illumina 1M arrays6,30 to detect rare CNVs in families and applied a series of QC filters, including validation of all de novo events by at least one method (Tables S1A–S1C). In total, 1,359 stage 2 families passed QC, and 2,446 families were used in the combined analyses of both stages (Tables S2A and S2B).
Of these, 2,147 families were European, and 299 were of other ancestries.21 We used the same pipeline to analyze 2,640 control individuals of European ancestry26,27,29 who were genotyped with the same array platforms. Ancestry was inferred by analysis of SNP genotype data (Table S1B). The rate, size, and number of genes affected by rare (<1% frequency) CNVs were assessed. Consistent with our previous data, we observed that compared to control subjects, affected subjects had an increased burden in the number of genes affected by rare CNVs (1.41-fold increase, empirical p = 1 × 10−5; Table 1). This enrichment was apparent for both deletions and duplications and remained after we controlled for potential case-control differences (Table 1). Similar findings were obtained when each stage was considered separately (Tables S3A–S3C). Array- and exome-based studies have revealed a substantial contribution of de novo variation to ASD risk,19 prompting us to assess this further. After screening 2,096 trios (of all ancestries), we found 102 rare de novo CNVs in 99 affected subjects (three of whom had two events; Table S4). Overall, 4.7% of trios had at least one de novo CNV, whereas control subjects had a frequency of 1%–2%.4,31,38 The average length of de novo events in our affected subjects (1.17 Mb) was larger than that of de novo CNVs in unaffected siblings from the Simons Simplex Collection (0.67 Mb, p = 0.01)4 and in control trios (0.55 Mb, p = 0.01).31 The average size of de novo CNVs was also larger than the size of all rare CNVs in our affected (188 kb) and control (159 kb) subjects. De novo CNVs affected 3.8-fold more genes in affected subjects than in control subjects4,31 (2.6-fold for deletions and 6.1-fold for duplications). Even after controlling for the difference in CNV size by proportionally scaling the number of intersected genes in each group, we observed a 1.77-fold difference (1.2-fold for deletions and 2.8-fold for duplications, p = 0.02). 
Furthermore, de novo CNVs in simplex families intersected 4.0-fold more genes than did CNVs in controls4,31 (1.8-fold after size correction, p = 0.01). There were no significant differences between subjects from simplex families and those from multiplex families in the frequency (5% and 4.2%, respectively) or gene content (n = 18.7 and 18.8, respectively) of de novo CNVs. Similarly, no significant difference was found between males and females in the size (1.17 and 1.2 Mb, respectively) or gene content (n = 18 and 17.3, respectively) of de novo CNVs. For 85 of 102 de novo events, it was possible to determine the parent of origin, and roughly equal numbers of events originated on the paternal allele (n = 45) and the maternal allele (n = 40) (Tables S5A–S5H). Taken together, our data indicate that there is an increased burden of de novo events in ASD-affected subjects. The clinical relevance of de novo CNVs in ASD is confirmed by the fact that among 102 such events identified, nearly half (n = 46) are considered etiologically relevant, including 40 loci known to be involved in ASD and ID (see below). We replicated previous observations, such as a de novo deletion intersecting PTCHD1AS in a male (adding to the evidence that both PTCHD1 and PTCHD1AS contribute to ASD risk14) and de novo events involving the miRNA miR137 (MIM 614304) in 1p21.2–p21.3 in two subjects. Microdeletions of miR137 have been reported in ASD,39 ID,40 and schizophrenia.41 Examples of ASD candidate genes identified by small de novo CNVs include SETD5, DTNA (MIM 601239), and LSAMP (MIM 603241) (Supplemental Data section “Highlighted Genes,” Figures S9, S10, and S14).
CNV Burden in Autosomal-Dominant or X-Linked Genes and Loci Implicated in ASD and/or ID
At least 124 genes and 55 genomic loci have been implicated in ASD to date (Tables S6A and S6B; updated from Betancur18), all of which have also been implicated in ID.
In addition, we compiled a list of genes and loci that have been implicated in ID, but not yet in ASD (Tables S6C–S6D). When we analyzed samples of inferred European ancestry, we found that 4% (87/2,147) of ASD-affected subjects had CNVs overlapping autosomal-dominant or X-linked genes and loci implicated in ASD and/or ID; this percentage was significantly higher than that in controls (OR = 4.09, 95% confidence interval [CI] = 2.64–6.32, p = 5.7 × 10−12; Figure 1A). We further classified these events into pathogenic, uncertain, or benign according to the American College of Medical Genetics guidelines.32 Pathogenic (or clinically significant) CNVs were identified in 2.8% (60/2,147) of affected subjects (OR = 12.62, 95% CI = 5.44–29.27, p = 2.74 × 10−15), and pathogenic deletions showed a striking estimated OR of 23.13 (95% CI = 5.57–96.08, p = 2.6 × 10−11; Figure 1B). Furthermore, the enrichment of pathogenic CNVs overlapping genes involved in ASD and/or ID was independently observed when the data were broken down by stages: 2.6% (25/979) of affected subjects in stage 1 (OR = 7.61, p = 1.22 × 10−5) carried pathogenic CNVs, whereas 3.0% (35/1,168) in stage 2 (OR = 6.47, p = 2.89 × 10−7) carried pathogenic CNVs. Some of these CNVs (e.g., NRXN1 deletion, 1q21 duplications [MIM 612475], and 16p11.2 duplications) were seen in a small fraction of control subjects, consistent with their variable expressivity and/or incomplete penetrance. Among the affected subjects with pathogenic CNVs, 63% (38/60) carried de novo events (Figure 1C), including two subjects with two pathogenic events each. When we further considered affected subjects of all ancestries (n = 2,446) and included chromosome abnormalities (>7.5 Mb), select large rare de novo events, and select experimentally validated smaller CNVs (<30 kb), we identified pathogenic CNVs in ∼3.3% of individuals with unexplained ASD (84 pathogenic events in 82/2,446 subjects; Figures 2A and S1A–S1C; Tables S7A and S7B). 
This most likely represents an underestimate of the true etiologic yield, given that many of the subjects were assessed with clinical diagnostic methods and excluded if positive; similarly, those individuals with known congenital malformations or dysmorphic features were not enrolled. Interestingly, 83% (64/77 [5 without information]) of carriers of pathogenic CNVs were nonsyndromic (i.e., ASD without reported accompanying physical or neurological abnormalities), and 57% (44/77 [5 without information]) had no ID (Figure 2B). The fraction of subjects with ID among carriers of pathogenic CNVs (42%) was not significantly different from the fraction of ID among all affected subjects (46%). Inheritance data showed that 64% (54/84) of pathogenic CNVs were de novo events (59% were deletions, and 41% were duplications) and that the remaining (36%) were inherited, including seven X-linked CNVs maternally transmitted to males and 23 (13 maternal and 10 paternal [27%]) on autosomes (Figure 2C). Pathogenic deletions tended to be smaller than duplications (Figure 2D). As expected, pathogenic de novo events were on average significantly larger than inherited ones (3.14 Mb—excluding three affected subjects with whole-chromosome aneuploidy—versus 1.44 Mb, respectively). We also observed that the proportion of females was significantly increased among carriers of highly penetrant pathogenic CNVs (male-to-female ratio of 2:1 versus 6:1 among all affected subjects; two-tailed Fisher’s exact test p = 0.017; Figure 2E). In contrast, the male-to-female ratio among individuals with CNVs associated with variable expressivity was 6:1. 
Pathogenic CNVs included well-characterized highly penetrant disorders associated with de novo CNVs, such as Phelan-McDermid syndrome (MIM 606232, 22q13.3 deletion including SHANK3 [MIM 606230]), Smith-Magenis syndrome (MIM 182290, 17p11.2 deletion including RAI1 [MIM 607642]), Kleefstra syndrome (MIM 610253, 9q34.3 deletion including EHMT1 [MIM 607001]), Williams syndrome (MIM 194050, 7q11.23 deletion), and large chromosomal abnormalities (Figure 2A; Table S7B). Recurrent deleterious CNVs mediated by segmental duplications affecting 12 distinct regions were identified in 44 individuals. For example, two unrelated males were found to harbor Xq28 duplications (MIM 300815), one de novo and one maternal, corresponding to a ∼0.3 Mb segmental-duplication-mediated gain (153.2–153.5 Mb), which was previously reported in X-linked ID.42 GDI1 (MIM 300104), mutations of which are linked to ID, is the most likely gene involved (Figure S8). Thus, our findings implicate abnormal GDI1 dosage in ASD. Interestingly, one AGP proband with the duplication had autism and a normal IQ, whereas the second had a borderline IQ (72) (see Table S8 for phenotype information of all affected subjects with pathogenic CNVs). Some other findings include a 1.7 Mb de novo deletion encompassing ARID1B (MIM 614556), recently implicated in ID and Coffin-Siris syndrome (MIM 135900), and a small maternally inherited intragenic deletion of HDAC4 (MIM 605314), involved in brachydactyly-mental-retardation syndrome (MIM 600430; Figure S7). Although many 2q37 deletions have been described in ASD, the deletion found in our proband directly implicates HDAC4 haploinsufficiency in autism. In Table S9, we analyzed data across three ASD cohorts, including a total of 5,106 nonoverlapping affected subjects and 3,512 control subjects from the AGP, the Simons Simplex Collection, and the Autism Genetic Resource Exchange (AGRE), for 17 loci and genes commonly reported as implicated in ASD. 
The most frequent deletions involved 16p11.2 and NRXN1, accounting for 0.31% and 0.32% of affected subjects, respectively. Typical 15q11–q13 duplications of the imprinted PWS-AS critical region were found in 0.25% (13/5,106) of affected subjects, reaffirming this region’s importance in ASD. The majority of these duplications were of maternal origin, but two were paternally derived (one without information; Table S9). Although paternally derived duplications appear to have incomplete penetrance in comparison to maternal ones, there have been several cases reported in subjects with ASD.43
FMRP Targets, PSD Genes, and Other Neuronal Genes Are Implicated in ASD
We expanded our analysis to lists of genes important for neurological function, such as highly-brain-expressed genes, PSD genes,33 genes implicated in neurological diseases,44,45 genes with a high pHI,35 and FMRP targets,34 the latter of which have been reported to be enriched in de novo LoF SNVs.7 Our analysis focused on exonic events, and deletions and duplications were analyzed separately (Figures 3A and 3B). FMRP targets (n = 842) and PSD genes (n = 1,453) carried a significant excess of both deletions and duplications in affected subjects (Figures 3A and 3B). Five percent (73/1,486) of affected subjects with exonic CNVs, including 52 subjects with genes not previously implicated in ASD and/or ID, carried deletions overlapping one or more FMRP targets, yielding 43 ASD candidate genes (Figure 3A; Table S10). Given that the lists of FMRP targets and PSD genes shared 279 genes, we performed conditional analyses showing that the excess of affected subjects carrying deletions overlapping PSD genes was independent of the signal in FMRP targets (OR = 2.62, 95% CI = 1.62–4.32, p = 2.24 × 10−5) and represented 4% of subjects with exonic events (59/1,486) or 3% after exclusion of pathogenic events (p = 0.007).
Notably, females were overrepresented among affected subjects carrying exonic deletions overlapping FMRP targets (17 females in 73 affected subjects, 1.98-fold more than males, p = 0.022, 95% CI = 1.06–3.52). Brain-expressed genes showed significant excess in affected versus control subjects for deletions only (OR = 1.89, 95% CI = 1.51–2.37, Fisher’s exact test p = 2.6 × 10−8; Figures 3A and 3B). Similarly, deletions (and not duplications) overlapping genes implicated in dominant neurological diseases and orthologous genes associated with abnormal phenotypes in heterozygous knockout mice conferred a significant increase in ASD risk (OR = 2.94, 95% CI = 1.76–4.93, p = 2.5 × 10−5). Many of the genes implicated in dominant diseases have been related to loss of function or haploinsufficiency, previously suggested to be more frequent and penetrant when deletions rather than duplications are involved.46 Accordingly, we detected an excess of affected subjects carrying deletions overlapping genes with a high pHI (>0.35) (OR = 1.41, 95% CI = 1.13–1.76, p = 0.002).
Increased Multigene Burden in ASD-Affected Subjects
We tested whether multiple genes within a CNV or across unlinked genetic lesions in the same individual might act in concert to increase risk of ASD. In logit modeling, the number of genes overlapped by CNVs, the average brain-expression value for those genes, and the deletion or duplication status were used as predictors with the case-control status as the outcome (Figures 3C and 3D). We found that ASD risk increased (as measured by the predicted OR) as the numbers of deleted or duplicated brain-expressed genes increased (generalized linear model chi-square goodness of fit, p = 3.2 × 10−5 and p = 4.7 × 10−7, respectively; Figures 3C and 3D). These results were consistent across the various models tested (Tables S11A–S11E).
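The logit model described above (fit in R with "glm" and the logit link) can be illustrated with a small dependency-free sketch. Everything in it is an assumption made for illustration: the toy counts are invented, the helper names are hypothetical, the deletion/duplication predictor is omitted for brevity, and a hand-written Newton-Raphson fit stands in for R's glm. Only the two predictors (number of brain-expressed genes hit, and mean log(1+RPKM) of those genes) and the case-control outcome follow the text.

```python
import math


def solve(a, b):
    """Solve the small linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(rows, labels, iterations=25):
    """Fit log-odds(case) = b0 + b1*x1 + b2*x2 + ... by Newton-Raphson."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(design, labels):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Invented toy data: (genes hit, mean neocortex RPKM, n cases, n controls).
TOY = [(1, 2.0, 2, 6), (3, 8.0, 4, 4), (6, 5.0, 5, 3), (10, 30.0, 7, 1), (2, 25.0, 4, 4)]

rows, labels = [], []
for genes, rpkm, n_case, n_ctrl in TOY:
    predictors = (genes, math.log1p(rpkm))  # log(1 + RPKM) transform from the text
    rows += [predictors] * (n_case + n_ctrl)
    labels += [1] * n_case + [0] * n_ctrl

b0, b_genes, b_expr = fit_logit(rows, labels)
or_per_gene = math.exp(b_genes)  # predicted OR for each additional gene hit
print(f"OR per additional gene = {or_per_gene:.2f}")
```

Exponentiating a fitted coefficient gives the predicted OR per unit increase of that predictor, which is the quantity behind a statement such as "risk increased as the number of affected genes increased."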
There was a decrease in signal after removal of affected subjects with at least one de novo event, suggesting that most of the risk can be attributed to de novo CNVs. Notably, the signal further decreased 2-fold when the remaining pathogenic CNVs were removed, confirming that pathogenic inherited CNVs alone also carry risk. Moreover, we found that gene density contributed significantly to increased risk only when de novo CNVs and inherited pathogenic CNVs were considered, whereas a higher-than-average level of gene expression in deletions (but not duplications) was a contributor irrespective of CNV status (i.e., even after removal of both de novo and inherited pathogenic CNVs) (Tables S11A–S11E). Thus, it is likely that de novo and pathogenic CNVs contribute to risk by altering the expression of more than one gene, suggesting that genetic interactions between these genes can underlie ASD risk.
Network Analysis Links Exonic Deletions to Neurodevelopmental Processes
We performed a gene-set enrichment analysis6 on our expanded sample set after refining our criteria to consider only exonic events (see Supplemental Data for details) and found only deletions to be significantly enriched in gene sets in affected versus control subjects (Figure 4A). We found 86 significantly enriched gene sets, including MAPK signaling components and neuronal synaptic functions and processes, in 42.5% (335/789) of affected subjects with exonic deletions (Figure 4A; Tables S12A–S12D).
Enrichment of synaptic functioning has also been reported among inherited events in the AGRE families47 and among de novo events in the Simons Simplex Collection.36 Enriched sets delineate candidate genes disrupted by deletions not found in control subjects; these genes notably include those in the KEGG glutamatergic pathway (e.g., GRIK2 [MIM 138244], GRM5 [MIM 604102], SHANK1 [MIM 604999], SHANK2 [MIM 603290], and SHANK3 [MIM 606230]; Figure S2A and Tables S13A–S13D), the KEGG cholinergic pathway (e.g., KCNJ12 [MIM 602323], CHAT [MIM 118490], and SLC18A3 [MIM 600336], the latter two of which are within a recurrent 10q11.21–q11.23 deletion, recently reported in individuals with ID and ASD;48 Figure S2B and Tables S13A–S13D), or in both pathways (e.g., GNG13 [MIM 607298], PRKACB [MIM 176892], PLCB1 [MIM 607120], CAMK2G [MIM 602123], and PPP3CB [MIM 114106]). When analyzing human homologs of mouse genes, we also found enrichment of phenotypes mostly related to the brain, including abnormal telencephalon morphology, neuron morphology, behavior, and nervous system physiology (Table S12D).
Genes within De Novo CNVs Cluster in a Gene Network
De novo deletions (52 in European affected subjects) were found to be significantly enriched within each gene set or pathway cluster (for 85 of the 86 gene sets or pathways), as well as across clusters (chi-square test p = 9.8 × 10−9) (Table S12B), prompting us to search for enriched biological functions within de novo events separately. Taking into account our observations of a significant multigene burden in ASD subjects (Figures 3C and 3D), we analyzed de novo events by using NETBAG36 to identify up to two ASD candidate genes per CNV among 102 de novo events in 99 subjects. NETBAG identifies networks of genes under the premise that if genomic regions are perturbed by genetic variants associated with the same phenotype, they will contain genes forming connected clusters.
The NETBAG analysis resulted in a network of 113 genes (global cluster p value = 0.02; Figure 4B). Ten genes have been previously implicated in autosomal-dominant or X-linked forms of ASD and ID (UBE3A [MIM 601623], NRXN1, SHANK2, EHMT1, SYNGAP1 [MIM 603384], and SMARCA2 [MIM 600014]) or ID only (ZEB2 [MIM 605802], FLNA [MIM 300017], SKI [MIM 164780], and IKBKG [MIM 300248]). On the basis of cumulative evidence from various sources, an additional 68% (67/98) are likely to affect ASD risk (Tables S14A–S14E); 27/67 of these are either FMRP targets or PSD genes. Compared to all other genes within de novo CNVs (or deletion CNVs only), genes in the network exhibited a significantly higher pHI (Wilcoxon rank sum p = 7.07 × 10−8), and 55% (59/107 [6 without information]) had a pHI > 0.35. A similar NETBAG analysis of de novo CNVs in control subjects did not yield significant results.36 We further characterized the biological processes related to the NETBAG cluster (Tables S14B–S14E; Figures 4B and S3) and found a significant enrichment (false-discovery rate [FDR] < 10%) of genes involved in chromatin and transcription regulation, MAPK signaling, and synaptic signaling and components (Figure 4B). We recapitulated many of the results of our gene-set analysis (Figure 4A), notably for synapse functions and processes, and also identified genes involved in chromatin and transcription regulation. The latter category included a high-risk gene associated with ASD, the chromatin gene CHD2 (MIM 602119), which is affected by a de novo 83 kb deletion in a male with ASD, mild ID, and dysmorphic features including micrognathia and protruding ears. His ASD-affected brother has mild ID, similar dysmorphic features, and epilepsy with onset at age 9 years and carries the same deletion, which removes the first six exons of CHD2. Neither parent carries the deletion, suggesting germline mosaicism (the deletion arose on the paternal chromosome). 
De novo SNVs in CHD2 have been reported in an ASD subject and in several individuals with a broad spectrum of neurodevelopmental disorders, including ID and epileptic encephalopathy8,49,50 (Figure S6). Two other genes in the chromodomain family have been linked to neurodevelopmental disorders: CHD7 (MIM 608892) in CHARGE syndrome (MIM 214800) and CHD8 (MIM 610528) in ASD. Another example is TRIP12 (MIM 604506), encoding an E3 ubiquitin ligase that can regulate chromatin function to maintain genome integrity (Figure S12). The chromatin and transcription module showed a predominance of genes with a prenatally biased expression profile (Figures 4B and S4).
De Novo CNVs and LoF SNVs Converge on Functional Gene Networks
We expanded our analysis to genes altered by both our de novo CNVs (Figure 5A) and de novo LoF SNVs compiled from four exome sequencing studies in ASD.7–10 Eleven genes affected by de novo CNVs (NRXN1, SHANK2, ARID1B, RIMS1 [MIM 606629], TRIP12, SMARCC2 [MIM 601734], DLL1 [MIM 606582], TM4SF19, MLL3 [MIM 606833], PHF2 [MIM 604351], and CSTF2T [MIM 611968]) were found to be altered by de novo LoF SNVs among autism cohorts, and three of them (NRXN1, SHANK2, and RIMS1) were selected by NETBAG (Figure 4B). Of the 11 genes, only TM4SF19 (3q29) belongs to a known locus associated with a recurrent genomic disorder. Despite the limited overlap observed among genes altered by de novo CNVs and LoF SNVs, there is reinforcing evidence of the role of NRXN1 and SHANK2 in ASD (Figure 5A). In addition, RIMS1, altered by both a de novo duplication and a LoF SNV (Figure 5A) and encoding a brain-specific synaptic Rab3a-binding protein, emerges as an ASD candidate gene.
RIMS1 has a regulatory role in synaptic-vesicle exocytosis, modulating synaptic transmission and plasticity.51 Three other genes affected by de novo CNVs and also picked by NETBAG (CHD2, SYNGAP1, and SYNCRIP) are also affected by de novo LoF SNVs in ID,50,52 and two (SYNGAP1 and DPYD [MIM 612779]) are altered in schizophrenia.53 CHD2 (discussed above) and SYNGAP1 (encoding a Ras/Rap GTPase-activating protein) are both involved in autosomal-dominant ID, ASD, and epilepsy.54 Under the assumption that different genes harboring suspected causative mutations for the same disorder often physically interact, we sought to evaluate protein-protein interactions (PPIs) among the products of genes known to be implicated in ASD and genes affected by rare CNVs or SNVs (data drawn from our de novo CNVs and published de novo ASD LoF SNVs; Figures 5A, 5B, and S5). Analysis of the union set of 336 unique genes (Table S15) with DAPPLE37 yielded a network of direct PPIs involving the products of 151 genes (Figure 5A) drawn from all three main lists: 54/92 (58.7%) genes involved in ASD, 64/113 (56.6%) de novo CNV genes, and 41/122 (33.6%) de novo LoF SNV genes. The number of direct PPIs was 1.5-fold higher than expected (p < 0.001, Figure 5B), suggesting that many of the de novo CNV or SNV genes and ASD-implicated genes function cooperatively. Overrepresentation analysis identified convergent functional themes related to neuronal development and axon guidance, signaling pathways, and chromatin and transcription regulation. Overall, these findings are consistent among the three different types of analyses shown in Figures 4A, 4B, and 5. Although 54 genes were previously implicated in ASD, the DAPPLE analysis singled out an additional 97 CNV or SNV high-confidence candidate genes (Figure 5B; Table S16). We found that compared to the 54 ASD-implicated genes, the newly selected 97 CNV or SNV genes had a comparably high pHI (median pHI = 0.58, Figure 6A).
This is consistent with the observation that ASD subjects have more deletions with haploinsufficient genes than do controls (Figure 6B). Furthermore, similar to genes with known disease-causing mutations (Figure 6C), those genes have high functional-indispensability scores, a comparably high degree of centrality (i.e., a high number of direct neighbors), and a comparably high number of networks in which they are involved (Figures 6D and 6E). Compared to the genome average, they are also among the top 75% of more-conserved genes (on the basis of Genomic Evolutionary Rate Profiling scores and PhyloP) and are highly expressed in the brain. Interestingly, 39 of the 97 genes are either FMRP-related or PSD genes (of the initial 151 genes identified by DAPPLE, 51 are FMRP interactors and 24 are PSD genes). Thus, despite little overlap in genes, the strong interconnectedness between the resulting networks identifies pathways through which the effects of distinct mutations might converge.

Discussion

We used multiple approaches to prioritize key candidate ASD-associated genes disrupted by CNVs and further identified biological relationships and common pathways shared among those genes.
Our data (1) confirm an excess burden of genome-wide rare genic CNVs in an independent set of ASD subjects versus control subjects; (2) further reveal an extreme degree of etiological heterogeneity (36 different genetic loci were found among 82 individuals with pathogenic CNVs); (3) confirm the contribution of de novo CNVs to the etiology of autism and highlight the contribution of inherited pathogenic imbalances (36%); (4) show an increased proportion of females among carriers of highly penetrant pathogenic CNVs, as well as among carriers of deletions affecting FMRP targets; (5) show no significant difference in the frequency of de novo CNVs between simplex and multiplex families; (6) show that both deletions and duplications involving FMRP targets and PSD genes increase ASD risk; (7) show evidence of multigene contributions to ASD; (8) show that ASD-associated deletions impair synapse function and neurodevelopmental processes; (9) implicate chromatin and transcription regulation genes in ASD in a network analysis of de novo CNVs; and (10) show that genes affected by de novo CNVs and de novo LoF SNVs converge on functional gene networks. Importantly, among carriers of highly penetrant pathogenic CNVs, we found a 2:1 male-to-female ratio (deviating from the overall ratio of 6:1 in the study sample). In contrast, the ratio was unchanged among carriers of CNVs characterized by variable expressivity and/or incomplete penetrance. Moreover, among affected subjects, females were twice as likely as males to have exonic deletions involving FMRP targets. Given the sex bias of ASD toward males, it has been suggested that females require a higher genetic load to express ASD,56 and our data support this general hypothesis.
This same phenomenon has recently been shown for SHANK157 and the 16p13.11 CNV.58 We observed significant enrichment of both deletion and duplication events overlapping FMRP targets and PSD genes, indicating that altered dosage of these genes can underlie ASD susceptibility. This is consistent with evidence that FMRP targets belong to multiple interconnected signaling pathways, such as PI3K-Akt-TSC-PTEN-mTOR and PI3K-RAS-MAPK,59,60 which have been linked to ASD through both underexpression and overexpression of genes in these pathways. Although both deletions and duplications can contribute to risk, we found that deletions can have a stronger impact when highly brain-expressed genes, genes conferring dominant phenotypes in humans and mice, or genes with a high pHI are considered. We also found that ASD risk increases as a function of the number of brain-expressed genes affected by rare de novo and pathogenic CNVs, consistent with an additive model of risk underlying ASD etiology. We developed an expanded and extensively interconnected network of high-confidence ASD candidate genes by integrating protein products from CNV and SNV genes and ASD-implicated genes. Overall, these results demonstrate that genes involved in ASD participate in a wide array of processes, from neuronal development and axon guidance to MAPK and other kinase signaling cascades (including the PI3K-Akt-mTOR and PI3K-RAS-MAPK pathways) to chromatin modification and transcription regulation. An increasing number of genes involved in chromatin structure and epigenetic regulation have been implicated in a variety of developmental disorders.61 Other chromatin regulator genes, such as MBD5 (MIM 611472) and KMT2D (MIM 602113), have been implicated in ID and ASD, highlighting the need to further study this category of genes as ASD risk factors. In addition to underlining important pathways, our results highlight specific genes in ASD risk.
Whereas the majority of the 97 genes in the CNV or SNV network (not including the genes already known to be involved in ASD) most likely act via haploinsufficiency, a few are affected by duplications. One example is the duplication of PIK3CB (MIM 602925), which is likely to increase its expression and thus lead to excessive phosphatidylinositol 3-kinase (PI3K) activity. PI3K, which is regulated by FMRP,59 is elevated in FXS mouse knockouts,62,63 and downregulation of this pathway has been shown to have a therapeutic effect in ASD and FXS mouse models. RAC3 (MIM 602050), another example of a gene affected by duplication, encodes a Rho family GTPase that enhances neuritogenesis and neurite branching when overexpressed.64 Our findings implicate many ASD candidate genes altered by de novo, inherited, or X-linked CNVs (e.g., SETD5, miR137, and HDAC9 [MIM 606543]; Supplemental Data section “Highlighted Genes”) or altered by both de novo CNVs and LoF SNVs (e.g., RIMS1, TRIP12, and DLL1; Figures 4B and 5). Taken together, our results suggest that rare variants affecting ASD risk in the population collectively encompass hundreds of genes. Despite this heterogeneity, many genes converge in interconnected functional modules, providing diagnostic and therapeutic targets.
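The excess of direct PPIs reported in the network analysis (1.5-fold over expectation, p < 0.001) is the kind of result a permutation test produces. The sketch below illustrates the general idea in Python under simplifying assumptions: it counts direct edges among a gene set and compares that count with uniformly resampled gene sets of the same size. This is not DAPPLE's procedure, which permutes the network itself while controlling for how many interactions each protein has. The function names, the toy network, and the gene labels `G1` to `G12` are all hypothetical.

```python
import random


def count_direct_edges(edges, genes):
    """Count edges whose two endpoints both belong to `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def permutation_test(edges, genes, n_perm=1000, seed=0):
    """Compare the observed direct-edge count with random gene sets.

    Returns (observed count, fold over the null mean, empirical p value).
    Random sets are drawn uniformly from all genes in the network, which
    ignores degree and is therefore a simplification.
    """
    rng = random.Random(seed)
    universe = sorted({g for edge in edges for g in edge})
    observed = count_direct_edges(edges, genes)

    null_counts = [
        count_direct_edges(edges, rng.sample(universe, len(genes)))
        for _ in range(n_perm)
    ]
    mean_null = sum(null_counts) / n_perm
    fold = observed / mean_null if mean_null > 0 else float("inf")

    # Add-one correction so the empirical p value is never exactly zero.
    hits = sum(1 for c in null_counts if c >= observed)
    p_value = (hits + 1) / (n_perm + 1)
    return observed, fold, p_value


# Toy network (hypothetical): a dense four-gene module plus a sparse chain.
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G2", "G4"), ("G3", "G4"),
    ("G5", "G6"), ("G6", "G7"), ("G7", "G8"), ("G8", "G9"),
    ("G9", "G10"), ("G10", "G11"), ("G11", "G12"),
]
candidate_genes = ["G1", "G2", "G3", "G4"]

observed, fold, p_value = permutation_test(toy_edges, candidate_genes)
print(f"observed edges = {observed}, fold = {fold:.2f}, p = {p_value:.4f}")
```

Because highly connected proteins inflate edge counts for any set that contains them, a degree-aware null model is the more conservative choice for real data; the uniform resampling here is kept only for brevity.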