PMC:10915378 / 151-1941 JSONTXT

Network‐based analysis predicts interacting genetic modifiers from a meta‐mapping study of spike–wave discharge in mice Lara et al. Abstract Abstract Absence seizures are characterized by brief lapses in awareness accompanied by a hallmark spike‐and‐wave discharge (SWD) electroencephalographic pattern and are common to genetic generalized epilepsies (GGEs). While numerous genes have been associated with increased risk, including some Mendelian forms with a single causal allele, most cases of GGE are idiopathic and there are many unknown genetic modifiers of GGE influencing risk and severity. In a previous meta‐mapping study, crosses between transgenic C57BL/6 and C3HeB/FeJ strains, each carrying one of three SWD‐causing mutations (Gabrg2 tm1Spet(R43Q) , Scn8a 8j or Gria4 spkw1 ), demonstrated an antagonistic epistatic interaction between loci on mouse chromosomes 2 and 7 influencing SWD. These results implicate universal modifiers in the B6 background that mitigate SWD severity through a common pathway, independent of the causal mutation. In this study, we prioritized candidate modifiers in these interacting loci. Our approach integrated human genome‐wide association results with gene interaction networks and mouse brain gene expression to prioritize candidate genes and pathways driving variation in SWD outcomes. We considered candidate genes that are functionally associated with human GGE risk genes and genes with evidence for coding or non‐coding allele effects between the B6 and C3H backgrounds. Our analyses output a summary ranking of gene pairs, one gene from each locus, as candidates for explaining the epistatic interaction. Our top‐ranking gene pairs implicate microtubule function, cytoskeletal stability and cell cycle regulation as novel hypotheses about the source of SWD variation across strain backgrounds, which could clarify underlying mechanisms driving differences in GGE severity in humans. Absence seizures are characterized by brief lapses in awareness accompanied by a hallmark spike‐and‐wave discharge (SWD). In a previous mapping study, crosses between transgenic C57BL/6 and C3HeB/FeJ strains, each carrying one of three SWD‐causing mutations demonstrated an antagonistic epistatic interaction between loci on mouse chromosomes 2 and 7 influencing SWD. In this study, we computationally prioritized candidate modifiers in these interacting loci, implicating microtubule function, cytoskeletal stability and cell cycle regulation as novel hypotheses about the source of SWD variation in mice. 1 INTRODUCTION Absence seizures are brief lapses in behavioral responsiveness accompanied by spike‐and‐wave discharges (SWDs) seen on an electroencephalogram (EEG) and are a feature of several generalized genetic epilepsies (GGEs), including Childhood Absence Epilepsy, Juvenile Absence Epilepsy and Juvenile Myoclonic Epilepsy. 1 , 2 Patients with any of these syndromes have SWDs caused by abnormal synchronous neuronal firing in the cortico‐thalamic loop, resulting in a characteristic 3 Hz waveform in humans. 3 , 4 Mouse models with absence seizures also have SWDs with a higher frequency of ~4–9 Hz. 5 , 6 , 7 From genome‐wide association studies (GWAS), it is now clear that GGEs are complex traits. In a recent meta‐GWAS, novel common risk variants in STAT4, GABRA2, KCNN2, ATXN1, GRIK1, STX1B, FANCL, BCL11A and ZEB2 were associated with GGE at the genome‐wide significance level. 8 However, these genome‐wide significant loci only account for a small fraction of the heritability of GGE and the mechanisms by which they alter risk is unknown. In contrast, a proportion of GGE cases are caused by large‐effect but rare variants in single genes, including GABRA2, GABRG2 and SCN1A, which have been detected in rare‐variant screens, large family studies and linkage studies. 9 , 10 In total, the genetic evidence demonstrates that there are some mendelian forms of GGE, but most cases are sporadic and highly polygenic. In the sporadic cases, genetic risk factors represent modifiers of disease pathology that alone are insufficient to cause disease but can aggregate into an GGE susceptible state. Identifying these modifiers and their mechanistic interconnections is necessary to fully characterize the mechanisms of GGE that could eventually become therapeutic targets. Mouse models are a critical system for identifying genetic modifiers of GGE because we can stringently control the background genetics and the environment. One of the authors 11 published a modifier screen of SWD in mice using three different Mendelian SWD‐causing mutations that were fixed onto two distinct genetic backgrounds: C57BL/6J (B6) and C3HeB/FeJ (C3H). The three mutant alleles—Gabrg2tm1Spet(R43Q), Scn8a8j and Gria4spkw1—cause SWD through distinct mechanisms, respectively encoding a GABA receptor, 12 a voltage gated sodium channel, 13 and a glutamate receptor. 14 Tyler et al found that independent of SWD‐causing mutations, the C3H strain had more frequent and longer SWDs, demonstrating that the C3H strain has variants relative to B6 that are potentially universal modifiers of SWD severity. Using a combination of backcross and intercross breeding, they mapped several quantitative trait loci (QTL) influencing SWD frequency and duration, including large‐effect loci on chromosomes 2 and 7 (Figure 1A). Moreover, the loci on chromosomes 2 and 7 had an epistatic interaction across the entire combined mapping population, suggesting that the modifier variants in these loci act in the same pathways. FIGURE 1 Identifying genetic modifiers within loci causing an epistatic interaction modifying spike‐and‐wave discharge (SWD) in mice using network‐based functional prediction (NBFP) integrating human and mouse data. (A) SWD meta‐mapping population performed by Tyler et al. 11 Three SWD‐causing mutations were fixed onto a B6‐C3H background, and through a series of intercrosses and a backcross, N2 and F2 animals were generated and phenotyped for SWD severity. (B) Adapted from Tyler et al, antagonistic epistasis between loci on chromosomes 2 and 7, where a C3H allele at either of the loci alone worsens SWD frequency and duration relative to B6, but C3H alleles at both loci combined does not further exacerbate SWD severity. (C) NBFP integration of genetic generalized epilepsies risk genes from genome‐wide association studies 8 and thalamocortical gene–gene networks 17 , 25 relevant to SWD to produce functional scores for the entire mouse and human genomes. (D) Filtering for loci on chromosomes 2 and 7 and identifying gene pairs with high joint ranking of individual genes and strong function connection to each other. Created with Biorender.comThe pattern of interaction between the loci on chromosomes 2 and 7 implicates antagonistic epistasis, that is, the effect of having C3H alleles at both loci is less extreme than would be predicted by adding the single‐allele effects (Figure 1B). Antagonistic epistasis strongly suggests that the causal variants within these alleles have effects in series in some shared pathway. 15 , 16 Thus, the interaction between the loci on chromosomes 2 and 7 implicates causal genes within a universal SWD‐modifier pathway that is responsible for worse outcomes in C3H mice compared with B6 mice. However, these mapping studies result in QTLs containing hundreds of positional candidate genes and variants. Thus, we require auxiliary data to prioritize among these positional candidates for mechanistic follow up. The problem of prioritizing among possible causal genes within a QTL is typically underdetermined, meaning the data at hand are not sufficient to strongly identify a small number of candidates. Here, we use the following assumptions to guide the integration of auxiliary data: (1) The majority of putative GGE risk genes from human GWAS are modifiers of GGE pathology; (2) The causal modifiers of SWD within the mouse QTLs on chromosomes 2 and 7 should be functionally associated with these human GGE GWAS genes; (3) The thalamus and cortex represent biologically important brain regions contributing to SWD whose corresponding functional gene–gene networks capture relevant tissue‐specific interactions; and (4) The antagonistic epistasis between loci on chromosomes 2 and 7 implies that the causal genes are strongly connected to each other within GGE‐associated gene networks. Together, these assumptions underlie our approach to seek gene pairs, with one gene from each locus, that are strongly functionally associated with human risk genes within SWD thalamic and cortical tissue networks as well as strongly functionally connected to each other (Figure 1). In this study, we address the limitation of low mapping resolution using a combination of bioinformatic gene interaction networks and brain gene expression data, which we integrate to identify gene pairs meeting the above four criteria. The major component of our prioritization pipeline uses network‐based functional prediction (NBFP), wherein we fit machine learning models with tissue‐specific gene interaction networks to rank all genes in the genome for functional association to GGE GWAS genes (Figure 1C). Here, we use gene networks, where individual genes are connected to each other by weighted edges, to represent interactions between genes within a specific tissue. These functional gene–gene networks capture relationships between genes by integrating multiple types of functional genomic data including gene expression profiles, molecular interactions and prior knowledge of gene functions. NFBP is an unbiased way to rank all genes in the genome for their functional association to GGE risk genes. With these rankings, we could identify mouse genes that are critically involved in GGE risk pathways, independent of whether there are risk alleles for those genes in the human population. We then filtered candidates to the loci on chromosomes 2 and 7 and ranked gene pairs for jointly associating with GGE as well as interacting with each other (Figure 1D). Finally, we merged these prioritizations with differential gene expression data and variant‐effect predictions to arrive at a plausible list of strong candidates for potential follow up. 2 METHODS Network‐based functional prediction (NBFP) has previously been used to identify leading candidate genes in multiple contexts, including Alzheimer's disease, autism spectrum disorder, inflammatory bowel disease and histamine hypersensitivity. 17 , 18 , 19 , 20 , 21 , 22 Here, we used NBFP to narrow the field of candidate genes within two loci (chromosome 2: 116.97–136.97 Mb and chromosome 7: 85.46–105.46 Mb) involved in an antagonistic epistatic interaction driving differences seen in SWD in a meta‐mapping population in mice with SWD‐causing mutations 11 ; The term meta‐mapping, in this case, refers to the fact that multiple distinct mapping populations (two backcrosses and one intercross) were combined to boost statistical power to detect modifier genes that are independent of SWD etiology. We integrated known human GGE risk genes from GWAS and tissue‐specific gene networks to rank all genes in the human and mouse genomes by the strength of their functional association to GGE GWAS genes within SWD networks. 2.1 GGE GWAS The International League Against Epilepsy Consortium on Complex Epilepsies conducted a genome‐wide mega‐analysis of common epilepsies. 8 The heritability of GGE suggests that common variants have a large aggregate role in modifying risk. While other large scale whole exome sequencing studies have also identified ultra‐rare variants, 23 like those conducted by the Epi25 Collaborative, these data address a different component of genetic risk, namely a causal mutation for GGE, and not the modifier effects from the genetic background. Therefore, we used statistical information about common variants as input for our pipeline. GWAS summary statistics for GGE sub‐phenotypes of childhood absence epilepsy (CAE), juvenile absence epilepsy (JAE) and juvenile myoclonic epilepsy (JME) were downloaded from http://www.epigad.org/gwas_ilae2018_16loci.html. We used Multi‐marker Analysis of GenoMic Annotation (MAGMA) to map SNP‐level p‐values to gene level p‐values using the Genome Reference Consortium Human Build 37 (hg37) genetic reference and a 10 kb window on either side of each gene. 24 The CAE, JAE and JME GWAS had 313, 279 and 468 genes, respectively, that reached a nominal level of significance (p < 0.01, File S1). These genes, termed GGE GWAS genes, were used to train the machine learning classifier for NBFP. 2.2 Network‐based functional prediction We performed NBFP to functionally score and rank all genes in the human and mouse genomes as previously described. 19 , 21 , 22 In the context of functional gene networks, the term “functional” refers to the biological activities or processes that genes are involved in beyond the physical interactions between genes. Functional gene networks capture relationships between genes based on biological pathways and cellular processes. Genes that are strongly connected to each other by their edge weights are predicted to be strongly functionally related. Briefly, we trained an ensemble of 100 linear SVM classifiers to discriminate GGE GWAS genes from the rest of the genes in the genome using the network connectivity as input features (Figure 2). Formally, this is a positive‐unlabeled learning problem, where we have GGE GWAS genes from the three disorders (CAE, JAE, JME) as positively labeled disease‐associated genes, while the functional association of the remainder of the genes is unknown. We used human and mouse cerebral cortex and thalamic functional networks downloaded from HumanBase (top edges for ‘cerebral cortex’ and ‘thalamus’) and Functional Networks of Tissues in Mice (‘cerebralcortex_top’ and ‘thalamus_top’). 17 , 25 These four functional gene–gene networks are composed of genes connected by edge weights encoding predicted probabilities for the genes' functional interaction within the specific tissue based on multiple streams of genomic data including expression profiles and prior knowledge of gene function. Human GGE GWAS genes were converted to mouse orthologs using gProfiler 26 to be used in the mouse networks. FIGURE 2 Network‐based functional prediction (NBFP). (A) Identification of nominally significant genome‐wide association studies (GWAS) genes from previous studies to serve as gold standard disease‐associated genes. (B) Tissue‐specific gene–gene interaction networks selected for their relevance to disease. Nodes represent genes and edges are weights denoting the interaction among those genes. (C) The network represented as an adjacency matrix where each gene occupies a space along the rows and columns, and the connections between them are specified as the network edge weights, which are values between zero and one. (D) The extracted feature matrix where GWAS genes represent positively labeled genes (P) associated with disease, and all other genes are unlabeled (U). (E) We trained an ensemble of linear support vector machines (SVM) to identify positively labeled genes based on their weighted connections within the relevant tissue networks. Highly connected genes that were previously unlabeled were classified as candidate disease genes if they were strongly annotated to the disease network, as defined by their functional score (FS).For every gene, g, the output of each SVM model is an arbitrarily scaled score,mg, where mg>0 indicates that the gene is functionally similar to GGE GWAS genes, while mg<0 indicates that the gene is functionally dissimilar to GGE GWAS genes. We scaled the raw SVM scores by computing the unlabeled‐predicted‐positive rate (UPPR), which is the positive‐unlabeled learning version of false discovery rate. For a gene g, UPPR is computed as the ratioUPPRg=#g′unlabeled:mg′≥mg#g′:mg′≥mg,that is, the fraction of unlabeled genes with a score higher than that of gene g. Intuitively, the lower the UPPR, the more confident that gene g is functionally related to the positively labeled genes. We converted the UPPR to a final functional score, FS, using the negative of the logarithmFSg=−log10UPPRg. High functional scores for all human and mouse genes correspond to how well connected each gene is to the GGE GWAS genes within the cortical and thalamic SWD sub‐networks. 2.3 Pathway analysis NBFP outputted rankings for all genes in human and mouse genomes for 12 models, which were derived from the three GGE GWAS gold standard training sets (CAE, JAE, JME) within the four functional tissue networks (human cerebral cortex and thalamus, and mouse cerebral cortex and thalamus). For each of these 12 models, all genes in the genome are ranked based on their functional score within that model. To build sub‐networks of the top‐ranking genes from these 12 models that contained the strongest functional connection, we aggregated the top 100 genes ranked by functional score into four representative networks: human cerebral cortex, human thalamus, mouse cerebral cortex, mouse thalamus (File S2). These four networks contained ~300 top‐ranked genes each, and represent the underlying SWD functional network of genes corresponding to that consolidated model. This process of aggregating the models and taking the top‐ranking genes within them allowed for these four representative networks to emerge, with which we could perform pathway analysis to better describe the contents of these networks. These four gene networks were then clustered into modules using a fast greedy modularity optimization algorithm. 27 Gene set enrichment analysis was performed on each of the modules using g:GOSt and Cytoscape. 26 , 28 The four networks were plotted as an adjacency matrix sorted by functional scores with top significant Gene Ontology (GO) terms. To visualize common enrichments across the SWD networks, we plotted p‐values of significant shared GO enrichment terms across the human and mouse cortical and thalamic networks. 2.4 Ranking gene pairs from loci on mouse chromosomes 2 and 7 We subset each of the 12 ranked lists to genes within the chromosome 2 and 7 epistatic interaction loci. For the human networks, genes were converted to mouse orthologs to score positional candidates using gProfiler. 26 To rank possible causal genes for the epistatic interaction between chr 2 and chr 7, we sought gene pairs for which both genes had high functional scores, indicating association to GGE etiology, and a strong direct connection within a network, indicating a functional interaction between those two genes specifically (File S3). For each species‐tissue combination (e.g., mouse‐thalamus), we computed a combined score, CS, that integrates these two features as follows. For a gene pair (g 1, g 2), with g 1 in the chr 2 locus and g 2 in the chr 7 locus, we computed the maximum functional score of g i , MFS(g i ), for the three GGE models (CAE, JAE, JME)MFSgi=maxFSCAEgiFSJAEgiFSJMEgi. The maximum functional score encodes whether a gene is highly ranked for at least one of the GGE subtype. From the maximum functional scores, we computed a paired functional score, PFS(g 1, g 2), asPFSg1g2=minMFSg1MFSg2. The paired functional score encodes whether both genes in a pair are highly ranked for at least one GGE subtype. For each tissue and every pair of genes (g 1, g 2), the tissue network has an edge weight, Wg1,g2, encoding the functional similarity of those genes. 17 , 25 We integrated the paired functional score with the edge weight to yield a combined score, CS (g 1, g 2), computed as follows. Letting N denote the number of gene pairs spanning the chr 2 and 7 loci,CSg1g2=#g~1,g~2:PFSg~1,g~2≤PFSg1g2andWg~1,g~2≤Wg1,g2N. Thus, the combined score encodes the 2D empirical cumulative distribution function of pair functional score and the edge weights. Intuitively, the combined score counts the fraction of gene pairs that are worse than g1g2 for both scores. Values near one indicate that a pair is better than nearly all other pairs on both metrics simultaneously. Because the distribution of edge weights differed across the four networks (human and mouse, cerebral cortex and thalamus), we considered each network independently. Thus, we generated four separate combined scores (File S3), and considered all four rankings in the final analysis (Table 1). TABLE 1 Plausible candidate gene pairs in spike‐and‐wave discharge (SWD) pathways. Ch2 gene Ch7 gene Comb score Network EW FS (Ch2 gene) Additional evidence FS (Ch7 gene) Additional evidence TUBGCP4 PPME1 0.947 0.174 2.140 KOMP phenotype Splice variants 2.348 DOWN DEG pv 0.004 KOMP phenotype Splice variant DUT DDIAS 0.925 0.180 1.279 Del missense variant 2.053 Del missense variants SNAP23 RAB6A 0.976 0.397 1.549 UP DEG pv 0.027 1.972 UP DEG pv 0.004 CKAP2L DDIAS 0.958 0.235 1.986 1.533 Del missense variants Chp1 Rab6a 0.952 0.299 1.222 KOMP phenotype 1.607 UP DEG pv 0.004 Gatm Pak1 0.813 0.173 1.284 UP DEG pv 0.011 1.210 KOMP phenotype Note: Integrative prioritization of top candidate gene pairs from the four SWD networks with the combined (comb) score and network edge weight (EW), as well as the functional score (FS) for each individual gene. Bolded genes are among the highest‐ranking genes in the whole‐genome analysis (see Section 2, these are the labeled genes in Figure 3), as well as the chromosome 2 and 7 loci. Additional evidence to support their role in this epistatic interaction include: differential expression of that gene in the cortex from a mouse model (DEG), 29 and the existence of missense or splice variants of that gene with predicted deleterious (del) effects, 64 as well as relevant Knockout Mouse Project (KOMP) phenotypes. 49 The genes included in this table were identified for their high rank among all the included criteria, and their function in neurodevelopmental pathways is highlighted in Figure 6. 2.5 Gene expression analysis To augment gene pair prioritization, we performed gene expression analysis using an RNAseq dataset from adult cortical tissue for the parent strains of the meta‐mapping population (three C57BL/6J samples and three C3H/HeJ samples, File S4). 29 Fastq files of paired‐end sequencing reads were downloaded from https://www.ebi.ac.uk/ena/browser/view/PRJNA707067 and extracted using SRA toolkit (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software). Quality assessment of reads was performed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapter trimming was carried out using Cutadapt, 30 and reads were aligned to the reference genome GRCm38 using Spliced Transcripts Alignment to a Reference (STAR). 31 Read counts were aggregated with MultiQC. 32 Differential gene expression analysis was carried out in R using DESeq2. 33 2.6 Data and code availability statement All GWAS p‐values, functional and combined scores, gene pair scores and network edge weights, and gene expression p‐values (between C57BL/6J and C3H/HeJ strains) are provided in the supplementals. To ensure rigor and reproducibility, all NBFP code used in this study is available at https://github.com/MahoneyLabGroup. 3 RESULTS 3.1 Network‐based functional prediction identifies distinct thalamocortical functional networks associated with GGE risk genes To identify the underlying pathways involved in SWD, we performed NBFP to rank every gene in the genome by how well they connect to GGE GWAS genes. Briefly, NBFP is a “guilt by association” strategy that works as follows. Starting with a gene interaction network and a list of trait‐related genes, we train a classifier to discriminate between trait‐related genes and the rest of the genome based on the connections within the gene interaction network. By assumption, trait‐related genes are expected to be more tightly connected among themselves than to other genes in the genome. The output of the classifier is a functional score (FS) that ranks each gene according to how strongly connected it is to the trait‐associated input list. In this study, we trained an ensemble of SVM classifiers to identify genes related to human GGE risk using connection weights to GGE GWAS genes in the cortical and thalamic tissue networks as features. We used childhood absence epilepsy, juvenile absence epilepsy and juvenile myoclonic epilepsy GWAS 8 genes (collectively referred to as GGE GWAS genes) as putative GGE risk genes. We used both human and mouse cortical‐ and thalamus‐networks. 17 , 25 For mouse networks, we converted human GGE GWAS genes to mouse orthologs. In total, we trained 12 models, one for each combination of GGE subtype, species and tissue network. We used the resulting FSs from these 12 analyses (see Methods) to rank all genes in the human and mouse genomes. To interpret the functional role of highly ranked genes from NBFP and find plausible candidate pathways related to SWD that are enriched for modifiers, we performed a graph clustering analysis on the human and mouse AE‐associated networks containing the top 100 genes from each model. We then used maximum modularity clustering 27 to identify significant modules and performed a pathway analysis on each of the modules. To visualize the resulting sub‐networks of genes from the modularity analysis, we grouped each by human or mouse and thalamus or cortex and made weighted adjacency matrices (Figure 3). Within each module, genes were sorted by functional score. Higher edge weights between genes within the network corresponds to a stronger functional connection within the network and is represented by brighter colors (Figure 3). The human networks were denser than the mouse networks. Nevertheless, the functional relationship among genes within each module underscored distinct pathways. Across the four GGE networks, modules were commonly enriched for brain‐related pathways including nervous system development, neurogenesis, cell cycle, cell signaling, synaptic signaling, morphogenesis and neuron development, all of which have plausible association to SWD severity. FIGURE 3 Modularity and pathway enrichments of cortical and thalamic spike‐and‐wave discharge (SWD) functional networks of highly ranked genes across the entire human and mouse genome. The four adjacency matrices represent the SWD network of human and mouse cortical and thalamic‐specific functional interactions. Each of the four networks represented have the top ~300 genes most strongly associated with genetic generalized epilepsies genome‐wide association studies genes across the whole mouse or human genome. For each of the four networks, these top ~300 genes were clustered into modules, denoted with “M.” Modules are sorted by the strength of a gene's connection to other genes within the module. Distinct pathway enrichments for each module were performed using Gene Ontology (GO) terms, KEGG pathways and Human Phenotype (HP) ontology from gProfiler and Cytoscape. 26 , 28 Pathway enrichments are above their respective module. The genes that have been labeled are part of the ~300 top genes in addition to being in the loci on chromosomes 2 and 7, denoting their high ranking individually within the subnetworks for the epistatic loci. The colors represent the strength of the connection to other genes within the network, with yellow, orange and red denoting strong connections, and blue denoting weaker connections. Varying the number of highly ranked genes for visualizing the networks did not appreciably alter the module structure or enriched pathways (data not shown).The modularity and pathway analysis resulted in modules with common enrichments across the cortical and thalamic SWD networks, even though the gene composition of modules varied. To determine if the enriched pathways from these GGE‐associated networks were also shared across species, we compared enrichment p‐values of significant shared GO enrichment terms across human and mouse networks (Figure 4). The comparison revealed multiple common top GO terms including nervous system development, synapse, synaptic and cell signaling, neurogenesis, morphogenesis and structural development, and regulation of cellular and biological processes. The multi‐species concordance of shared enrichments demonstrates that there are common biological pathways among GGE risk genes in both humans and mice. FIGURE 4 Shared pathway enrichments in mouse and human spike‐and‐wave discharge networks. Scatter plot of −log10 p‐values of significant shared Gene Ontology enrichment terms between mouse and human cortical and thalamic networks. Note that the individual shared pathways in each tissue correspond to broader common themes, particularly “synaptic function” in the cerebral cortex (left) and “neurodevelopment” in the thalamus (right). 3.2 Functional ranking of genes in modifier loci on chromosomes 2 and 7 To prioritize among positional candidates within the modifier loci on chromosomes 2 and 7, we ranked every gene by its functional score. To visualize high ranking genes within these intervals, we plotted the positional candidates by functional score (from all 12 models) for both loci. For the human network models, genes were converted to mouse orthologs. There were multiple highly ranked genes shared across the 12 models, as well as highly ranked genes that were also GGE GWAS genes, including Emsy, Chrdl2, Kcnip3, Stard10, Sppl2a, Prom2 and Ap4e1. Furthermore, several of the top ranked genes were also highly ranked genes from the genome‐wide human and mouse NBFP analyses, including Ddias, Capn3, Ppme1, Tubgcp4, Vps16, Rab30, Ckap2l and Cdc25b (genes labeled in Figure 2). The existence of highly ranked hits from the genome‐wide GGE network analysis within the loci on chromosomes 2 and 7 is consistent with the effect of these QTLs on SWD. Beyond individual highly ranked genes within these loci, we wanted to find gene pairs that explain the antagonistic epistasis between these loci. Top candidates should have both strong functional association to GGE and a plausible functional interaction with each other as a mechanism to drive epistasis. To rank all possible gene pairs, we integrated each pair's functional scores from the species‐ and tissue‐specific NBFP models with the connection strength of a gene pair to each other. Specifically, for each gene pair, we computed a species‐ and tissue‐specific summary functional score (see Section 2). Plotting this summary functional score against the species‐ and tissue‐specific network edge weight, we see that some gene pairs have simultaneously high functional score and a strong network connection (Figure 5; upper right, yellow dots). To quantify this to produce an unbiased final pair ranking, we computed a combined score using the empirical two‐dimensional cumulative distribution function that, for each pair, counts the fraction of all pairs with both lower functional scores and edge weights (see Methods). Thus, a score of 0.9 indicates that 90% of all gene pairs have worse scores on both axes (Figure 5, color bar). We labeled the top 10 gene pairs from the human and mouse cortical and thalamic networks from the loci on chromosomes 2 and 7. FIGURE 5 Functional gene pair candidates for causal epistatic interaction. Joint ranking of candidate gene pairs aggregated over all models. Gen pairs are plotted by minimum function score of the pair and interaction strength of the pair within the tissue‐specific network. Gene pairs are ranked by their combined score. 3.3 Differential brain gene expression in B6 versus C3H brains We augmented our network‐based ranking with genetic support for gene pairs with transcriptomic evidence from the B6 and C3H/HeJ parent strains. (C3H/HeJ is closely related to the C3H/FeJ strain used in this study.) We reanalyzed an RNAseq dataset of cortical mouse brain tissue from Lee et al 29 to identify differentially expressed genes (DEGs) between the B6 and C3H strains from their study. We found that several gene pairs that were highly ranked by combined score also had variation in gene expression between the B6 and C3H strains (Table 1). 4 DISCUSSION 4.1 Genome‐wide network analysis implicates brain‐related pathways In our genome‐wide network analysis, the top‐ranked genes implicated brain‐related pathways that are consistent across species and tissues. Shared pathways included nervous system development, cell cycle, neuronal development, neurogenesis, cellular organization and synaptic signaling, which all have relevance in GGE. 3 , 34 , 35 , 36 , 37 , 38 Variations in the thalamocortical loop, such as changes in neuronal connectivity and excitatory/inhibitory balance, can promote aberrant thalamocortical oscillations leading to hypersynchronous firing states and SWDs (Figure 6A). 3 , 4 , 39 , 40 , 41 Critically, thalamocortical circuit development depends on proper axon growth, thalamic input for cortical structure and intrinsic properties of thalamic projections. 42 FIGURE 6 Neurodevelopmental and thalamocortical processes involved in epistasis driving spike‐and‐wave discharge (SWD) phenotypes in mice. Gene pairs identified through network‐based functional prediction and filtered for loci on chromosomes 2 and 7 are involved in multiple brain‐related pathways that could contribute to SWD phenotypes. Top candidates are indicated in red. (A) Critical thalamocortical connections are made during neurodevelopment. (B) Microtubule development and stabilization with the gene pair Tubgcp4 and Ppme1, which involves microtubule nucleation with gamma‐tubulin ring complex (gamma‐TuRC), microtubule stabilization with tau proteins, activation of protein phosphatase 2A (PP2A) by microtubule depolymerization and inactive, Ppme1‐bound PP2A. (C) Microtubule dynamics and transportation and anchoring with the golgi complex in the growth cone of the neuron, where beta‐catenin is also present. (D) Cell cycle regulation and mitotic spindle apparatus with microtubules during neurodevelopment. (E) Apoptotic signaling involving death‐inducing signaling complex (DISC) and caspase‐8 as well as DNA damage pathways involving the pair Ddias and Dut. (F) Wnt/beta‐catenin signaling pathway turned on (pink) where beta‐catenin is possibly stabilized by Ddias and can enter the nucleus, and turned off (gray) where beta‐catenin is degraded by the beta‐catenin destruction complex, which also includes active PP2A. Created with Biorender.comWith this genome‐wide assessment of pathways enriched for GGE risk variants, we prioritized interacting gene pairs within the epistatic loci affecting SWD in mice. Indeed, even some of the most highly ranked genes across the whole genome were found in the loci on chromosomes 2 and 7, including Ddias/DDIAS, Capn3/CAPN3, Ppme1/PPME1, Tubgcp4/TUBGCP4, Vps16/VPS16, Rab30/RAB30, Ckap2l/CKAP2L and Cdc25b. (Note that any genes found in the human networks were converted to mouse orthologs located in the loci on chromosomes 2 and 7.) Not only does this affirm the candidacy of these individual genes, but it also corroborates our assumptions that the epistatic interaction acts through common pathways and that these genes are connected through underlying sub‐networks related to human GGE. Of the top ranked genes in the entire genome, several appear across both species and tissue‐specific networks and have multiple interacting pairs in the epistatic loci. Additional evidence for their candidacy also includes differential expression in cortical mouse tissue, splice or deleterious variants found in mice, as well as neurologically relevant KOMP phenotypes. While it would have been Ideal to have cortical and thalamic gene expression data from the mapping population, these data do not exist. As an alternative, we used publicly available gene expression data for the parent strains of the mapping population for cerebral cortex tissue. To our knowledge, there are no publicly available gene expression data for these strains for the thalamus, which is a limitation of this study. Nevertheless, our integrative prioritization considers multiple streams of evidence that support the top‐ranking genes. In the remainder of this section, we discuss three top‐ranking candidate pairs in detail for which either both genes in the pair are highly ranked at the genome scale (defined as being in the top 100 genes in at least one of the prediction models) or the gene pair was highly ranked in both humans and mice. 4.2 Microtubule assembly and stabilization Included among the top‐ranked genes across the entire human genome, the gene pair Tubgcp4/TUBGCP4—Ppme1/PPME1 was highly ranked in both cerebral cortex and thalamus SWD‐networks. TUBGCP4 encodes gamma‐tubulin complex associated protein 4 (GCP4), which is an essential component of the gamma‐tubulin ring complex (gamma‐TuRC). 43 Extensive characterization of gamma‐TuRC shows that it is critical to microtubule nucleation, providing the template upon which alpha‐ and beta‐tubulin bind to grow microtubules at the centrosome and other microtubule organizing centers (Figure 6B–D). 44 , 45 , 46 Gene mutations of TUBGCP4 in humans cause a range of clinical manifestations, including congenital microcephaly, chorioretinopathy, learning difficulties, additional ophthalmic defects, as well as other neurodevelopmental abnormalities. 47 , 48 Functional assays in fibroblasts from an individual with a TUBCGP4 mutation revealed reduced levels of GCP4 and gamma‐TuRC, abnormal microtubule organization and cell morphology, and large nuclei and binucleated cells. 47 Homozygous Tubgcp4 knockout in mice results in embryonic lethality. In vitro evidence suggests this is caused by cell division defects, specifically abnormal mitotic spindle formation from defective gamma‐TuRC assembly (Figure 6D). 49 , 50 Heterozygous Tubgcp4 knockout mice are viable, but exhibit microcephaly and retinopathy, disorganization of the retina and photoreceptor degeneration, and increased autophagy in the retina compared with wildtype controls. 50 Additionally, C3H mice have three splice variants in Tubgcp4 (rs33279936, rs237335729, rs13465372). 51 , 52 The top interaction partner of Tubgcp4/TUBGCP4 was Ppme1/PPME1, which encodes protein phosphatase methylesterase 1 (PME1). PPME1 has high transcript expression in the brain and binds to protein phosphatase 2A (PP2A). 53 PME1 binding has demethylating and inactivating effects on PP2A, whose activity, varied conformations and isoforms, and regulation is critically involved in many neurodevelopmental processes, including microtubule stability in neurons, cell cycle progression, preventing neuronal degeneration and neuronal tau dephosphorylation (Figure 6B). 54 , 55 , 56 Downregulated PP2A activity, including PME1‐induced inactive forms of PP2A, decreases microtubule‐associated tau dephosphorylation, which results in tau hyperphosphorylation and neuropathological, toxic aggregation of tau, commonly associated with neurodegenerative tauopathies. 57 There are also multiple neurodevelopmental disorders and phenotypes associated with de novo variants of genes encoding specific subunits of PP2A, including epilepsy and intellectual disability. 58 In vivo studies have also shown that knockouts of subunit‐encoding genes for PP2A in the nervous system resulted in abnormal cortical development and atrophy, microcephaly, increased apoptosis, hyperphosphorylated tau and tau‐related neural defects and embryonic lethality. 59 , 60 Additionally, homozygous knockout of Ppme1 in mice results in postnatal lethality around the first day of birth, suggesting its critical role in development. 49 , 61 Interestingly, while Ppme1 knockout results in near complete loss of demethylated PP2A as expected, PP2A activity evidently decreases as well, indicating a dynamic role for demethylated PP2A in downstream PP2A activity and regulation. 61 Importantly, Ppme1 has downregulated expression in the cerebral cortex of the C3H strain compared with the B6 strain (p = 0.004) 29 implicating lower levels of PP2A in this background. The above results are consistent with decreased microtubule production and stability in the C3H strain. Intriguingly, it has been shown that wild‐type C3H mice have smaller brains than many other laboratory strains, demonstrating a connection between C3H variants and brain size. 62 Given the importance of cell division and structure to proper brain development, C3H variants in Ppme1 and Tubgcp4 are strong candidates to alter thalamocortical circuit organization to exacerbate SWD and support an epistatic interaction between the chr 2 and chr 7 loci. 4.3 Cell cycle and thalamocortical development The gene Ddias/DDIAS was included in three high‐ranking gene pairs, including CKAP2L‐DDIAS, both of which are high‐ranking at the genome level, and DUT‐DDIAS, whose mouse ortholog pair Dut/Ddias was also highly ranked (Table 1). Ddias/DDIAS encodes the protein DNA damage induced apoptosis suppressor and is involved in the regulation of apoptotic signaling and cell survival via multiple distinct pathways. In vivo and in vitro models of the mouse ortholog, noxin, support its role in controlling cell cycle progression (i.e., inducing cell cycle arrest, Figure 6D) and apoptosis in response to cellular stressors (Figure 6E). 63 Downregulation and inactivation of noxin results in increased cell death, revealing its normal anti‐apoptotic activity. Importantly, C3H mice have three Ddias missense variants that are potentially deleterious (rs50007671, SIFT 0.04, rs50007671, SIFT 0.01, rs265170948, SIFT 0.04). 51 , 52 , 64 The role of a dysfunctional version of Ddias in thalamocortical development and increased apoptotic signaling could contribute to increased SWD severity. Ckap2l/CKAP2L, a highly ranked gene partner of Ddias/DDIAS, is also involved in cytoskeletal function. CKAP2L encodes cytoskeleton associated protein 2‐like (CKAP2L) and the mouse homolog radmis protein, so named for radial fiber and mitotic spindle, which is a microtubule‐associated protein enriched in neural stem and progenitor cells (NSPCs) during embryonic and postnatal neurodevelopment (Figure 6D). 65 Radmis overexpression in vitro results in hyper‐stabilized microtubules during mitosis and abnormal mitotic spindles, and in vivo overexpression reduced NSPC proliferation, whereas radmis knockdown destabilized spindle microtubules, pointing to its critical role in NSPC cell division during development. 65 Familial sequencing studies have found mutations in CKAP2L that cause Filippi syndrome, an autosomal recessive genetic disorder with multiple severe brain‐related manifestations, including microcephaly, intellectual disability and seizures, among many others. 66 Loss of function CKAP2L in patient cells caused mitotic microtubule spindle organization and chromosome segregation defects, supporting its function in neurodevelopment and cell division. 66 However, while C3H have multiple missense variants in Ckap2l, so far they are not predicted to be deleterious by their SIFT scores. 51 , 52 Nevertheless, SIFT scores only measure evolutionary conservation as a surrogate for deleterious effects. Given the high rank of CKAP2L to GGE gene networks, we speculate that C3H variants may have a subtle functional effect during neurodevelopment that may influence SWD. Another highly ranked interaction partner of Ddias/DDIAS was Dut/DUT, which formed high‐ranking pairs in both human and mouse networks. Dut encodes deoxyuridine triphosphotase (DUT) and plays a role in apoptotic signaling. Through its enzymatic activity related to nucleotide metabolism, DUT produces dUMP, a necessary precursor for thymine nucleotide synthesis for DNA replication, and maintains dUTP levels, which can accumulate and activate DNA repair processes, resulting in DNA fragmentation and cell death (Figure 6E). 67 C3H mice also have a potentially deleterious missense variant in Dut (rs240193814, SIFT 0.0). 51 , 52 , 64 While Ddias and Dut normally have similar anti‐apoptotic functions (Figure 6E), deleterious variants in both may not exacerbate SWDs more than their individual contributions because of compensatory cell preservation and proliferation mechanisms, for example, possible Dut‐independent effects of Ddias in the Wnt/beta‐canenin pathway (Figure 6F). 68 , 69 , 70 , 71 , 72 5 CONCLUSIONS Our bioinformatic prioritization of gene pairs resulted in a highly nontrivial collection of new hypotheses about modifiers of SWD in multiple, etiologically distinct models with the potential for these to translate to human GGE. Candidate gene and variant prioritization is always an underdetermined problem; we never have enough evidence to support anything but plausible inferences. Nevertheless, our ranking criteria produced candidate genes that have strong mechanistic support in the literature. We also note that our criteria were stringent. Several candidate genes, including Mfap1, Rab6a (Figure 6C) and Pak1 (Figure 6C, D), all appeared in multiple highly ranked pairs. These candidates may deserve higher scrutiny, although we argue that the strong pairwise interactions of DDIAS‐DUT, DDIAS‐CKAP2L and TUBGCP4‐PPME1 gives the mouse orthologs of these pairs highest priority as candidates for the epistatic interaction. Our candidate predictions still require mechanistic follow‐up experiments. However, by systematically dissecting the networks containing these candidates, we have been able to nominate key cellular processes that can potentially be tested in vitro in advance of expensive and time‐consuming experiments in mice. Thus, our integrative approach above represents a new paradigm for ranking candidate interactions using information from functional interaction networks. Supporting information Data S1. Supporting Information. Data S2. Supporting Information. Data S3. Supporting Information. Data S4. Supporting Information. ACKNOWLEDGMENTS Jeffrey L. Brabec and J. Matthew Mahoney were supported by NIH grant 1P20 GM130454. Anna L. Tyler and J. Matthew Mahoney were supported by NIH grant R21LM012615.

Document structure show

Annnotations TAB TSV DIC JSON TextAE Lectin_function IAV-Glycan

  • Denotations: 0
  • Blocks: 0
  • Relations: 0