|
article-title
|
Comprehensive dissection of meiotic DNA double-strand breaks and crossovers in cucumber
|
|
abstract
|
Abstract
Meiotic recombination drives genetic diversity and crop genome optimization. In plant breeding, parents with favorable traits are crossed to create elite varieties. Different hybridizations produce diverse types of segment reshuffling between homologous chromosomes. However, little is known about the factors that cause hybrid-specific changes in crossovers (COs). Here, we constructed 2 F2 populations from crosses between a semiwild and 2 domesticated cucumber (Cucumis sativus) accessions and examined CO events. COs mainly occurred around genes and differed unevenly along chromosomes between the 2 hybrids. Fine-scale CO distributions were suppressed in regions of heterozygous structural variations (SVs) and were accelerated by high sequence polymorphism. C. sativus RADiation sensitive 51A (CsRAD51A) binding, histone H3 lysine 4 trimethylation (H3K4me3) modification, chromatin accessibility, and hypomethylation were positively associated with global CO landscapes and in local DNA double-strand break (DSB) hotspots and genes. The frequency and suppression of COs could be roughly predicted based on multiomic information. Differences in CO events between hybrids could be partially traced to distinct genetic and epigenetic features and were significantly associated with specific DSB hotspots and heterozygous SVs. Our findings identify the genomic and epigenetic features that contribute to CO formation and hybrid-specific divergence in cucumber and provide theoretical support for selecting parental combinations and manipulating recombination events at target genomic regions during plant breeding.
|
|
title
|
Abstract
|
|
p
|
Meiotic recombination drives genetic diversity and crop genome optimization. In plant breeding, parents with favorable traits are crossed to create elite varieties. Different hybridizations produce diverse types of segment reshuffling between homologous chromosomes. However, little is known about the factors that cause hybrid-specific changes in crossovers (COs). Here, we constructed 2 F2 populations from crosses between a semiwild and 2 domesticated cucumber (Cucumis sativus) accessions and examined CO events. COs mainly occurred around genes and differed unevenly along chromosomes between the 2 hybrids. Fine-scale CO distributions were suppressed in regions of heterozygous structural variations (SVs) and were accelerated by high sequence polymorphism. C. sativus RADiation sensitive 51A (CsRAD51A) binding, histone H3 lysine 4 trimethylation (H3K4me3) modification, chromatin accessibility, and hypomethylation were positively associated with global CO landscapes and in local DNA double-strand break (DSB) hotspots and genes. The frequency and suppression of COs could be roughly predicted based on multiomic information. Differences in CO events between hybrids could be partially traced to distinct genetic and epigenetic features and were significantly associated with specific DSB hotspots and heterozygous SVs. Our findings identify the genomic and epigenetic features that contribute to CO formation and hybrid-specific divergence in cucumber and provide theoretical support for selecting parental combinations and manipulating recombination events at target genomic regions during plant breeding.
|
|
abstract
|
A multiomic map of meiotic recombination in cucumber revealed the impacts of genomic and epigenetic features on crossover formation and hybrid-specific crossover alterations.
|
|
p
|
A multiomic map of meiotic recombination in cucumber revealed the impacts of genomic and epigenetic features on crossover formation and hybrid-specific crossover alterations.
|
|
body
|
Introduction
Modern crop cultivars exhibit enhanced morphological and agronomic characteristics due to trait selection during breeding. Selecting the appropriate parental combination is the first step in breeding, and maximizing genetic variability and integrating superior genotypes are important for parental selection. However, concerns about introducing undesirable linkage drag and breaking beneficial gene combinations may limit the application of elite alleles in plant breeding. Therefore, a deeper understanding of the factors influencing meiotic recombination and hybrid-specific divergence will help plant breeders and geneticists select the appropriate parental combinations and manipulate recombination events at target genomic regions.
Meiotic recombination occurs following the formation of DNA double-strand breaks (DSBs), a process catalyzed by the topoisomerase-like protein SPORULATION 11 (SPO11) (Grelon et al. 2001; Lange et al. 2016). These DSBs are repaired by the recombinases Disrupted Meiotic cDNA 1 (DMC1) and RADiation sensitive 51 (RAD51) (Kurzbauer et al. 2012). The processing of DSB ends results in reciprocal crossover (CO) formation or nonreciprocal gene conversion.
The regulation of meiotic CO formation is well studied in plants (Wang and Copenhaver 2018). The genome-wide phenomenon of CO assurance ensures the occurrence of at least 1 CO per homologous chromosome pair, and the regulation of CO homeostasis produces a stable number of COs among individuals (Sidhu et al. 2015; Li et al. 2021). Moreover, the presence of a CO suppresses the occurrence of another CO nearby, a process known as CO interference (Lambing et al. 2020). In the fine scale, several factors promote COs, including DSB frequency, the presence of active chromatin, hypomethylation, the presence of the histone H3 lysine 4 trimethylation (H3K4me3) modification, and the absence of the histone H3 lysine 9 dimethylation (H3K9me2) modification (Marand et al. 2017; Tock et al. 2021; Lian et al. 2022). The relationship between CO formation and sequence polymorphism is complicated. Structural variations (SVs) between homologous chromosomes can suppress CO formation (Lawrence et al. 2017; Rowan et al. 2019), whereas the local distribution of COs is associated with elevated polymorphism levels (Blackwell et al. 2020). Different hybrids exhibit diverse meiotic recombination landscapes, but these differences are not exclusively due to distinct polymorphism levels (Ziolkowski et al. 2015; Dreissig et al. 2020), which may be partially explained by differences in heritable epigenetic states between germplasms.
Cucumber (Cucumis sativus) is an economically important vegetable crop (FAOSTAT, http://www.fao.org/faostat/en/). The haploid cucumber genome has an estimated size of 211 Mb comprising 7 chromosomes, with 24,317 annotated protein-coding genes and 77,092 repetitive elements covering 42.6% and 36.4% of the genome, respectively (Li et al. 2019). Here, we addressed sites of COs and DSB hotspots in cucumber and characterized factors that influence their occupancy. We revealed that CO formation and divergence are associated with C. sativus RADiation sensitive 51A (CsRAD51A) binding, euchromatin marks, and heterozygous SVs. Our findings point to the multiomic regulation of CO formation in cucumber and highlight the influence of epigenetic marks inherited from the parents on CO distribution and alterations in the progeny.
Results
Genome-wide identification of meiotic COs
To map the CO landscape and capture epigenetic and transcriptional information during CO formation in cucumber, we constructed 2 F2 populations (Fig. 1A) and generated novel multiomic data sets from meiotic tissues of the corresponding male parents (Fig. 1B). In a previous study, we divided core cucumber accessions into 4 subgroups: the Indian, Xishuangbanna, Eurasian, and East Asian groups (Qi et al. 2013). To increase sequence polymorphism in the progeny, we crossed the semiwild Xishuangbanna cucumber accession Cuc80 as the shared female parent with cultivated accession 9930 or Cuc37. We self-pollinated the F1 hybrids and generated 2 F2 populations containing 200 (Cuc80 × 9930) and 193 individuals (Cuc80 × Cuc37).
Figure 1. Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.Using 822,844 and 1,203,797 sequence variants between the parental lines (Supplemental Table S1), we identified a total of 2,585 and 2,663 COs in the Cuc80 × 9930 and Cuc80 × Cuc37 F2 populations (Supplemental Table S2), respectively. The total number of COs per F2 individual ranged from 4 to 22. The average number of COs per individual in Cuc80 × Cuc37 was 13.8, which is significantly higher than that (12.9) in Cuc80 × 9930 (P = 0.0024, Mann–Whitney U test, Supplemental Fig. S1A). We observed robust CO assurance in the F2 populations, as the average number of COs per chromosome per meiosis was 0.92 in Cuc80 × 9930 and 0.99 in Cuc80 × Cuc37. The CO frequency did not show a Poisson distribution (Supplemental Fig. S1, B and C), suggesting the existence of CO interference. There were significant differences between CO numbers on different chromosomes (Supplemental Fig. S1, D and E). The average CO number per chromosome increased in proportion to the chromosome length in both populations (Supplemental Fig. S1F).
To assess the fine-scale distribution of COs, we looked at COs at higher resolution (CO intervals <12 kb) to enrichment analysis, including 1,814 COs from Cuc80 × 9930 and 1,958 COs from Cuc80 × Cuc37. Nearly 30% of these COs were located in gene body regions, and more than 93% were in genes or their 10-kb flanking intergenic regions in both populations (Fig. 2A). As shown in Fig. 2, B to E, CO enrichment patterns differed in various genomic components but were conserved between hybrids. Specifically, most COs occurred in distal intergenic regions (accounting for 42.6% to 46.1% of total COs), followed by promoter regions (24.0% to 26.5%) located within 2 kb upstream of transcription start sites (TSSs), introns (13.3% to 13.5%), coding sequences (CDSs, ∼10%), and untranslated regions (UTRs, ∼7%). We performed permutation tests to assess the significance of CO enrichment within these regions. When normalized to the expected CO enrichment, COs were significantly enriched in 5ʹUTRs and intergenic regions but exhibited less-than-expected overlap with introns and CDSs (P < 0.01 for 10,000 times permutation test, Supplemental Table S3). These findings reflect a preference for CO formation away from CDSs and introns in cucumber.
Figure 2. Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.To explore the functions of CO-related genes, we performed gene ontology (GO) enrichment analysis of genes within the high-resolution CO intervals (<12 kb). CO-overlapping genes were significantly enriched for GO terms involved in DNA/RNA biosynthesis and transcriptional regulation in both populations (Supplemental Fig. S2 and Table S4). The regions at least 10 kb away from COs were considered “CO cold regions”. No significantly enriched GO terms were shared between CO intervals and CO cold regions. These results indicate that CO events tend to occur around genes with certain functions.
Comparisons of CO landscapes between hybrids
To estimate the similarity of global CO distribution among hybrids, we calculated the correlation coefficient of CO rates in nonoverlapping 500-kb bins between the 2 populations. The genome-wide CO landscapes were conserved among hybrids (bin size = 500 kb, Spearman correlation coefficient, R = 0.41, Supplemental Fig. S3). The total lengths of CO cold regions in Cuc80 × 9930 and Cuc80 × Cuc37 were 140.26 and 147.29 Mb, respectively. The overlapping CO cold regions between the 2 populations covered 103.91 Mb, encompassing identified regulatory genes in cucumber. A well-studied cucurbitacin C (CuC) biosynthetic gene cluster comprises 1 terpene oxidosqualene cyclase gene (Bi), 3 cytochrome P450 genes, and 1 acetylase gene; this cluster regulates the biosynthesis of bitterness in Cucurbitaceae (Shang et al. 2014; Zhou et al. 2016). The CsBi cluster overlapped with a CO cold region and was located far from CO intervals in both populations (P < 0.05 for 10,000 Monte Carlo simulations, Supplemental Fig. S4). This might benefit the horizontal transfer and maintenance of the CsBi cluster during hybridization and facilitate the coexpression and coregulation of the clustered genes during CuC biosynthesis.
However, the correlation coefficient of CO landscapes fell to 0.18 when we focused on a smaller scale (bin size = 50 kb, Spearman correlation coefficient), implying that fine-scale CO divergence exists between the 2 populations. We then employed a permutation test to scan the differential recombination regions (see Materials and methods). We identified 136 significant differential recombination regions between the 2 populations, which were spread over the cucumber genome (bin size = 50 kb, P < 0.05 for 10,000 times permutation test, Fig. 3). Here, we regarded 50-kb bins with a 5-fold average recombination rate as CO hotspots. As a result, we identified 72 CO hotspots in both populations, 3 of which overlapped between the 2 populations. We defined a unique hotspot as one that was detected in only 1 population and had a lower-than-average recombination rate in the other population. We identified 19 and 25 unique hotspots in Cuc80 × 9930 and Cuc80 × Cuc37, respectively, and the remaining were unbiased hotspots between the 2 populations. The differential recombination regions and unique CO hotspots were unevenly distributed along the cucumber chromosomes.
Figure 3. Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.Notably, some verified functional genes existed in these regions of differential recombination between hybrids. Among these, the fruit spine development gene Spine Base Size1 (CsSBS1) (Yang et al. 2022) and the plant architecture gene De-Etiolated-2 (CsDET2) (Hou et al. 2017) overlapped with regions containing significantly higher CO rates in Cuc80 × Cuc37 and CO cold regions in Cuc80 × 9930. Moreover, we observed no sequence variations of CsDET2 between inbred lines 9930 and Cuc37. These results suggest that selecting parents carrying the same genotype can result in great differences in fine-scale recombination between progenies. In summary, the fine-scale CO distributions and corresponding gene functions show diverse characteristics between different hybrids.
Genomic and epigenetic features associated with CO distribution
SVs play important roles in driving plant genome evolution and trait adaptation in crops (Li et al. 2023). To assess CO patterns within SVs, we examined the enrichment of COs relative to SVs. Biparental SVs (Li et al. 2022) included inversions, deletions, insertions, and translocations. We determined that 227 out of 2,585 COs from Cuc80 × 9930 and 324 out of 2,663 COs from Cuc80 × Cuc37 overlapped with SVs, 24 of which spanned predicted centromeres. Further analysis revealed that COs with higher resolution (CO intervals <100 kb) were significantly suppressed in heterozygous SVs, regardless of SV type (P < 0.001 for 10,000 times permutation test, Fig. 4, A and B; Supplemental Table S5). In addition, the intervals of COs overlapping with SVs were much longer than those of total COs (P < 0.001 for Mann–Whitney U test, Fig. 4, C and D; Supplemental Table S6), suggesting that the actual proportion of overlapping COs with SVs is much lower than observed. The presence of 2 SVs upstream of FLOWERING LOCUS T (CsFT) increases CsFT transcription and shortens flowering time in cucumber (Wang et al. 2020; Li et al. 2022). The 3 parental varieties used in the current study have different CsFT genotypes: Cuc80 with the long-2 type genotype is a late-flowering germplasm, whereas 9930 (short-1 type) and Cuc37 (short-2 type) are early flowering. We detected no COs spanning the CsFT locus in either population, but we did detect 1 CO at the left border of the heterozygous 12.1-kb deletion in the short-2 type CsFT locus in Cuc80 × Cuc37. This result matches the previous observation that CO formation is suppressed within SVs and elevated in the surrounding region (Rowan et al. 2019).
Figure 4. COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.To investigate the correlation between single nucleotide polymorphism (SNP) density and CO formation at a fine scale, we counted SNPs at continuously increasing distances away from CO centers (midpoints of CO intervals). In Cuc80 × 9930, SNP numbers increased close to CO centers but were lower than the random and average SNP densities (Fig. 4E), perhaps due to the limited CO resolution in Cuc80 × 9930. However, when only COs with higher resolutions were considered, SNP densities around CO centers visibly increased in Cuc80 × 9930. SNP densities in Cuc80 × Cuc37 were extremely elevated toward CO centers and much higher than random (Fig. 4F). To further investigate their global relationships, we calculated the correlation coefficients of polymorphism and recombination rates. As shown in Supplemental Fig. S5, A and B, we observed weakly negative correlations in both populations (Spearman correlation coefficient, R = −0.12 in Cuc80 × 9930, R = −0.02 in Cuc80×Cuc37). Together, these results indicate that polymorphism shapes CO distribution in cucumber on a fine scale but not genome-wide.
To pinpoint the localization of DSBs, we first isolated a cucumber homolog of RAD51A. A BLASTP search revealed 7 putative homologs of Arabidopsis (Arabidopsis thaliana) RAD51 (AT5G20850) in cucumber. Among these, 1 protein (CsaV3_1G41430.1) shares high sequence similarity with RAD51A proteins from Plantae, Fungi, and Animalia (Supplemental Fig. S6); we named this protein CsRAD51A. We then generated a polyclonal antibody against CsRAD51A and tested its specificity by immunoblot analysis of csrad51a mutant hairy root lines generated by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9)–mediated genome editing (Supplemental Fig. S7).
To gain insights into the molecular mechanisms of CO formation in cucumber, we performed in-depth multiomic analysis. Since the 2 F2 hybrids were derived from different male parents, we conducted the subsequent multiomic assays using male meiotic tissues from inbred lines 9930 and Cuc37. Due to the lack of published information on DSB sites and epigenetic states in cucumber during meiosis, we constructed libraries for CsRAD51A chromatin immunoprecipitation sequencing (ChIP-seq), H3K4me3 ChIP-seq, formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), whole-genome bisulfite sequencing (WGBS), and RNA sequencing (RNA-seq).
To examine the relationship between CO frequency and multiomic data in cucumber, we visualized their landscapes and calculated the pairwise correlation coefficients (Fig. 5). DNA methylation was negatively correlated with the global CO frequency. We also observed positive relationships between the CO rate and CsRAD51A binding, H3K4me3 modification, open chromatin, transcriptional activity, or gene density. However, there was no obvious correlation between the CO landscape and SNPs or repeat density. These findings provide an overview of the correlations between COs and multiomic information in cucumber, including epigenetic marks, transcriptional activity, and genomic features.
Figure 5. Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
Characterization of meiotic DSB hotspots
Meiotic recombination begins with the formation of DSBs. Previous studies measured genome-wide DSB sites in plants by identifying binding sites for SPO11-1 (Choi et al. 2018), DMC1 (Tock et al. 2021), and RAD51 (He et al. 2017). To investigate meiotic DSBs in cucumber, we identified differential peaks of CsRAD51A in ChIP-seq data between meiotic and nonmeiotic tissues. As 2 biological replicates of CsRAD51A ChIP-seq data exhibited high similarities (R = 0.91 to 0.99, Pearson correlation coefficient), we combined the 2 data sets before peak calling for further analysis. We identified 49,261 and 57,551 CsRAD51A peaks in 9930 and Cuc37 meiotic buds, respectively. After carefully comparing different peaks with those of nonmeiotic tissues, 3,480 and 5,605 specific peaks remained in 9930 and Cuc37, respectively (Supplemental Table S7). Since ChIP-seq peak calling is based on signal-to-noise models, referring to the previous research (Brick et al. 2018), we considered these peaks to be meiotic DSB hotspots in cucumber.
The average number of DSB hotspots per chromosome was strongly proportional to chromosome length (Supplemental Fig. S8). In total, 2,287 out of 3,480 DSB hotspots in 9930 and 3,642 out of 5,605 DSB hotspots in Cuc37 occurred within CO intervals in the corresponding progeny populations and significantly overlapped with COs (P = 0.0069 in Cuc80 × 9930 and 0.018 in Cuc80 × Cuc37 for 10,000 times permutation test). Of the cucumber DSB hotspots, 93.2% to 95.3% occurred in gene bodies and surrounding regions (10 kb upstream to 10 kb downstream of genes). In both inbred lines, most DSB hotspots (accounting for 83% in 9930 and 89.1% in Cuc37) were present in intergenic regions and introns (Fig. 6, A and B). DSB hotspots were significantly enriched in intergenic regions and depleted in the 5′UTR and CDS regions (P < 0.05 for 10,000 times permutation test, Fig. 6, C and D; Supplemental Table S8).
Figure 6. Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.To clarify the sequence characteristics of cucumber DSB hotspots, we looked for enriched known and de novo motifs. The motifs with the highest proportions of DSBs in both lines were polyC repeats and TA repeats (Fig. 6E). We also discovered several de novo motifs (Supplemental Fig. S9). As shown in Supplemental Fig. S10, similar types of motifs were present in different cucumber ecotypes, including polyC repeats and TA repeats, which share similar features with meiotic recombination-related motifs in maize (Zea mays) (He et al. 2017; Kianian et al. 2018), Arabidopsis (Choi et al. 2013), and mammals (Grey et al. 2018) (Supplemental Table S9). These results reveal the sequence bias in positioning DSB sites is conserved in plants including cucumber.
To investigate genes associated with DSBs in cucumber, we performed GO enrichment analysis of genes whose promoters or 5′UTRs overlapped with meiotic DSB hotspots. In total, 935 and 1,306 genes overlapped with meiotic DSB hotspots in 9930 and Cuc37, respectively. In 9930, the significant GO terms of DSB-overlapping genes were related to mitosis/meiosis, protein import-/localization-/modification-related processes, and the response to oxidative stress (Supplemental Table S10). The DSB-overlapping genes in Cuc37 were involved in vitamin biosynthetic and metabolic processes, exocytosis, and the salicylic acid–mediated signaling pathway (Supplemental Table S10).
Transcriptional silencing at DSB sites is required for DNA repair in mammals and plants (Machour and Ayoub 2020; Wang et al. 2022). To identify genes potentially associated with efficient DSB repair, we analyzed GO terms of transcriptionally repressed DSB-associated genes in cucumber. First, we identified the significantly downregulated differentially expressed genes (DEGs) in 9930 and Cuc37 meiotic stamens compared to transcriptome data for 9930 and Cuc37 leaf tissues from a previous study (Li et al. 2022). As shown in Supplemental Fig. S11A, 5,105 genes were significantly downregulated in 9930 meiotic stamens compared to leaves, and the promoters or 5′UTRs of 204 of these genes overlapped with DSB hotspots. Enriched GO terms of the 204 genes included response to biotic/abiotic stimulus, response to osmotic stress, photosynthesis-related processes, and amino acid catabolic and metabolic processes (Supplemental Fig. S11B and Table S11). In Cuc37 meiotic stamens, 264 of the 5,319 downregulated DEGs overlapped with Cuc37 DSB hotspots (Supplemental Fig. S11C). These genes were significantly associated with responses to defense and stress, systemic resistance, and photosynthesis (Supplemental Fig. S11D and Table S11). Genes with high AtSPO11-1-oligo levels were previously shown to be strongly enriched in GO terms related to biotic defense responses in Arabidopsis (Choi et al. 2018). A recent study showed that the meiotic DSB-associated transcriptional repression of genes was related to responses to abiotic/biotic stimuli in Arabidopsis meiocytes (Wang et al. 2022). Our results suggest that meiotic DSBs that occur around genes related to defense and responses to stimuli would tend to be repaired during meiosis and that transcriptional suppression at meiotic DSBs might be conserved in cucumber.
Epigenetic profiles of meiotic DSB hotspots and genes
To evaluate the epigenetic environment of cucumber DSB sites, we calculated the enrichment of epigenetic markers in DSB hotspots. We identified 23,250 H3K4me3 peaks in 9930 and 22,434 in Cuc37. In addition, we detected 13,855 and 14,005 active chromatin regions by FAIRE-seq peak calling in 9930 and Cuc37 meiotic tissues, respectively. We detected 2,510,378 mCG sites, 1,368,475 mCHG sites, and 355,206 mCHH sites in 9930 and 4,313,454 mCG sites, 1,392,300 mCHG sites, and 351,535 mCHH sites in Cuc37. In 9930, DSB hotspots were significantly depleted of H3K4me3 markers and open chromatin but contained abundant mCHH sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). In Cuc37, DSB hotspots were also significantly depleted of H3K4me3 but were enriched in mCG and mCHG sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). By contrast, CO intervals significantly overlapped with H3K4me3 markers, open chromatin, and mCHH sites in both populations and showed significantly less enrichment of mCG sites in Cuc80 × 9930 (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). These results imply that the epigenetic environment differs in DSB hotspots and CO sites during meiosis.
We visualized the average profiles of the epigenetic status around DSB hotspots. To show the specificity of epigenetic enrichment around DSB hotspots, we used random genomic regions with similar features as DSBs for comparison. Then we further divided the DSB hotspots into 2 groups based on decreasing recombination rate to explore the roles of epigenetic modifications in local CO formation of DSBs. As shown in Fig. 7, the CO rates of DSB sites were positively correlated with the strength of the CsRAD51A ChIP-seq signal. The H3K4me3 modification was specifically absent in meiotic DSBs and their surrounding regions, but there was no obvious association with local CO formation. Compared to random genomic loci, the flanking regions of DSB hotspots displayed more chromatin accessibility but not DSB peaks. Moreover, active chromatin was positively correlated with the recombination rates of DSBs. By contrast, DNA methylation was specifically enriched within DSBs and was negatively correlated with CO rates, especially mCG and mCHG. Our data indicate that more CsRAD51A binding, more chromatin accessibility, and lower mCG and mCHG levels are tightly associated with higher CO frequencies in cucumber DSB hotspots.
Figure 7. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.We also examined the profiles of epigenetic information around genes. Compared to random genomic loci that shared similar features with gene regions, the signals of CsRAD51A ChIP-seq, FAIRE-seq, mCHG, and mCHH decreased at gene body regions but slightly increased at the flanking regions. Moreover, CsRAD51A binding and chromatin accessibility were enhanced at TSSs compared to transcription termination sites (TTSs) (Fig. 8). By contrast, there was more H3K4me3 binding on gene body regions than their upstream and downstream regions, and stronger signals were shown near TSSs. We also detected lower mCG levels around TSS and TTS regions. To further explore the properties associated with CO rates in gene regions, we defined 2 groups of cucumber genes based on decreasing recombination rates. We observed significantly higher levels of CsRAD51A binding and H3K4me3 in CO-active gene regions, whereas genes with lower mCG and mCHG levels exhibited higher recombination rates. In summary, CsRAD51A binding, chromatin accessibility, and hypomethylation were positively correlated with local CO rates in gene regions.
Figure 8. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
Global CO landscape and suppression, as determined by integrating multiomic data
To further verify the relationships between multiomic features and CO landscapes, we predicted megabase-scale CO rates using the random forest algorithm with the following formula: CO rate ∼ CsRAD51A + H3K4me3 + FAIRE + mCG + mCHG + mCHH + gene density + repeat density + SNP density. The nonlinear coefficients between observed and predicted CO frequencies were 0.5342 in Cuc80 × 9930 and 0.3998 in Cuc80 × Cuc37 (Fig. 9; Supplemental Fig. S12, A and B). The ranking of importance for predicting CO frequencies in Cuc80 × 9930 (from high to low) was gene density, mCHG, chromatin accessibility, H3K4me3 modification, CsRAD51A deposition, mCG, repeat density, mCHH, and SNP density (Supplemental Fig. S12C). In predicting the CO landscape of Cuc80 × Cuc37, gene density, H3K4me3 modification, and DNA methylation appeared to be more important than the other features (Supplemental Fig. S12D). When only genomic features were considered, the correlations decreased to 0.4514 in Cuc80 × 9930 and 0.3327 in Cuc80 × Cuc37, highlighting the contributions of epigenetic information to CO formation.
Figure 9. Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.We also performed random forest classification to predict megabase-scale CO deserts (CO cold regions) and plotted the receiver operating characteristic (ROC) curve to evaluate the efficiency of each classifier. The accuracies of predictions were 96.0% in Cuc80 × 9930 and 97.4% in Cuc80 × Cuc37 individually. The random forest classifiers with epigenetic and genetic features outperformed with only genetic features in both populations (Supplemental Fig. S13). The good performance of predictions of the CO landscape and CO suppression region supports the notion that the epigenetic and genetic features discussed here are closely related to CO formation in cucumber.
CO alterations determined by distinct epigenetics and SVs
Although the distribution profiles of multiomic data along chromosomes were similar between the 2 germplasms, many local differences still existed. For the CsRAD51A ChIP-seq data, only 599 DSB hotspots from 9930 overlapped with 596 peaks from Cuc37, accounting for 17.2% and 10.6% of total peaks in 9930 and Cuc37, respectively. The vast majority of H3K4me3 peaks in both inbred lines overlapped, including 92.8% of 23,250 peaks from 9930 and 99.3% of 22,434 peaks from Cuc37. Whereas 13,855 and 14,005 active chromatin regions were identified in 9930 and Cuc37, respectively, only 6,371 peaks overlapped between the 2 lines. We also identified 1,338 mCG, 431 mCHG, and 986 mCHH hypo-differentially methylated regions (hypo-DMRs) in 9930 compared to Cuc37, as well as 2,429 mCG, 1,285 mCHG, and 2,284 mCHH hypo-DMRs in Cuc37.
To explore the potential impacts of genetic and epigenetic factors on differences in CO events between hybrids, we considered nonoverlapping COs between the 2 populations to be “unique COs” and measured the overlap of unique COs with distinct epigenetic modifications and SVs. In total, 33% to 40% of unique COs overlapped with specific DSB sites, epigenetic modifications, and heterozygous SVs from the corresponding populations. The unique COs of Cuc80 × 9930 were significantly enriched within 9930-specific DSB hotspots, SV escape (i.e. the absence of a heterozygous SV in the region), and FAIRE + H3K4me3 + hypo-mC interactions (P < 0.05 for 10,000 times permutation test, Fig. 10A; Supplemental Table S13). We detected significant correlations between Cuc37-specific DSB hotspot, SV escape, hypo-mC, SV escape + hypo-mC, DSB hotspot + SV escape, DSB hotspot + SV escape + hypo-mC interactions, and unique COs of Cuc80 × Cuc37 (P < 0.05 for 10,000 times permutation test, Fig. 10B; Supplemental Table S13). Several unique COs overlapped with these features and interactions at the local scale (Supplemental Fig. S14). These data suggest that CO alterations among hybrids could be controlled by specific DSBs, SVs, epigenetic modifications, or their interactions.
Figure 10. Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
Discussion
In this study, we draw the high-resolution landscapes of meiotic DSBs and COs and constructed a multiomic map of meiotic recombination in cucumber. We systematically analyzed the distribution of meiotic COs in different cucumber varieties and compared them. Combined with the genetic information inherited from parental lines, we investigated the impacts of genomic and epigenetic features on CO formation and alterations in the corresponding progeny.
The average number of COs per chromosome per meiosis in the cucumber hybrids was 0.92 to 0.99. The CO number per bivalent in flowering plants spans a limited range (0.76 to 1.9), with genome sizes ranging from 120 Mb to 3 Gb (Li et al. 2015; Zhang et al. 2021; Lian et al. 2022), suggesting that CO assurance is independent of genome size. Most COs occurred within or near gene bodies (Fig. 2). In addition, gene density was a major factor in predicting the CO landscape in cucumber at the megabase scale (Supplemental Fig. S12, C and D). This finding is consistent with the positive relationship between global CO frequency and gene density identified in previous studies (Marand et al. 2017; Kianian et al. 2018; Cai et al. 2023). Perhaps the preferential distribution of COs in gene-rich regions is a by-product of chromatin accessibility around genes.
Meiotic DSB formation is required for CO formation. The number and distribution of COs are tightly associated with those of DSB sites in various plants. In Arabidopsis and wheat (Triticum aestivum), mutations of SPO11-1 decreased the total number of COs (Xue et al. 2018; Hyde et al. 2023). Site-specific CO formation was induced in tomato (Solanum lycopersicum) and maize somatic cells by creating targeted DSB sites (Filler Hayut et al. 2017; Kouranov et al. 2022). Here, we determined that cucumber COs were significantly enriched in meiotic DSB hotspots. At the chromosome scale, we observed a significant positive correlation between the average number of DSB hotspots and chromosome length (Supplemental Fig. S8), which explains the varying CO numbers among different chromosomes in each population (Supplemental Fig. S1, D to F). In addition, DSB hotspots and COs mainly occurred around genes but were significantly depleted in CDS regions due to the presence of abundant nucleosomes.
Plant CO sites always harbor active chromatin and low DNA methylation levels. Here, we revealed preferential H3K4me3 marks and chromatin accessibility within CO intervals in cucumber (Supplemental Table S12). By contrast, H3K4m3 peaks were strongly depleted in 9930 and Cuc37 DSB hotspots, and open chromatin exhibited significantly fewer overlaps with 9930 DSBs than expected (Supplemental Table S12). Further analysis of epigenetic meta-profiles in DSB hotspots showed that a relaxed chromatin environment, coupled with CsRAD51A signals and hypomethylation, was positively correlated with the CO rates of cucumber DSBs (Fig. 7). Our data suggest that these euchromatin marks (H3K4me3 modifications and open chromatin) might participate in DSB repair toward CO formation. Further studies based on higher-resolution CO data are still needed. Although we did not detect significant depletion of DNA methylation in cucumber COs, we observed negative correlations between DNA methylation levels and CO frequency genome-wide (Fig. 5) and at a fine scale (such as DSBs and genes) (Figs. 7 and 8), which aligns with previous findings in Arabidopsis, potato (Solanum tuberosum), and wheat (Marand et al. 2017; Choi et al. 2018; Tock et al. 2021). A deficiency in global DNA methylation is thought to promote COs on chromosome arms but to simultaneously inhibit COs in the pericentromeric region (Melamed-Bessudo and Levy 2012; Mirouze et al. 2012; Yelina et al. 2012). It is reasonable to speculate that abolishing DNA methylation indirectly affects CO formation, perhaps in cooperation with other factors. Although some small inconsistencies existed among varieties within assigned regions, like DSBs and genes (Figs. 7 and 8), which might be partly attributed to unexplored maternal epigenetic imprinting, our data generally support the conserved roles of multiple epigenetic modifications in regulating meiotic CO formation in plants. Further studies are needed to elucidate the precise regulatory mechanisms and their interactions.
We reported that the positioning of meiotic COs is sensitive to polymorphism in cucumber. In Arabidopsis and Brassica oleracea, a parabolic relationship was observed between the SNP density and CO rate genome-wide, and fine-scale CO frequency was both positively and negatively associated with sequence polymorphism (Blackwell et al. 2020; Cai et al. 2023). The strongly increased SNP density around COs corresponds well with MutS Homolog2 (MSH2)–dependent CO remodeling, in which CO frequency in heterozygosity increased when juxtaposed with homozygosity (Blackwell et al. 2020). By contrast, CO depletion in regions with higher densities of polymorphisms might be the production of a loop structure or when homolog pairing is disturbed by heterozygous SVs. Taking advantage of this feature, COs were successfully abolished and restored by introducing and reversing SVs, respectively, between homologs in Arabidopsis (Schmidt et al. 2020; Rönspies et al. 2022). However, there were certain limitations: Szymanska-Lejman et al. (2023) found that SVs do not alter the CO frequency in Arabidopsis CO hotspots unless they are directly located within hotspots and that the stimulation of CO formation in hotspots in response to local polymorphism is dependent on MSH2.
We revealed that the CO landscapes are generally conserved between the 2 cucumber hybrids but differ in the total CO number and local distribution. The high similarity of global CO frequency might be related to the similar gene densities and H3K4me3 modifications between varieties. To date, several CO modifier loci driving allelic differences in CO formation have been identified by genetic mapping of quantitative trait loci in natural recombinant populations. Human Enhancer of Invasion 10 (HEI10) functions in the Class I CO pathway. Natural variations in AtHEI10 explained 23.3% of recombination divergence in Arabidopsis Col × Ler F2 populations, and increased AtHEI10 dosage elevates euchromatic CO (Ziolkowski et al. 2017). A premature stop codon within TBP-ASSOCIATED FACTOR 4b (TAF4b) leads to a genome-wide decrease in CO events in Arabidopsis by mediating meiocyte-specific expression and germline transcription (Lawrence et al. 2019). SUPPRESSOR OF NPR1-1 INDUCIBLE (SNI1) encodes a component of the structural maintenance of chromosome complexes (SMC5/6 complex). Mutations of SNI1 result in reduced CO interference and CO remodeling in Arabidopsis (Zhu et al. 2021). REC8, encoding a cohesin subunit required for meiosis, is a candidate gene underlying the intraspecific divergence of CO within hybrid populations of barley (Hordeumspontaneum and H. vulgare) (Dreissig et al. 2020). In addition, the local recombination rates were both significantly higher and lower in Arabidopsis F1 heterozygotes than in homozygotes (Ziolkowski et al. 2015). The poor associations between CO frequency and polymorphism level suggest that polymorphism does not directly define CO variations between hybrids.
Here, we carried out a pairwise comparison of parental epigenetic information and CO landscapes in the progenies of different cucumber varieties, allowing us to estimate the epigenetic contributions of parental inheritance in CO divergence. Nearly one-third of hybrid-specific COs overlapped with distinct genetic and epigenetic features, and the positioning of CO alterations was tightly linked to specific DSB events, heterozygous SVs, and their interactions with epigenetic modifications (Fig. 10). This observation provides a clue about how parental selection drives the formation of hybrid-specific COs at the epigenetic level. Although DNA methylation imprints are faithfully replicated during cell division, mCHH in male meiocytes undergoes reprogramming during the development of the male sexual lineage (Walker et al. 2018). Together with the germline-specific DNA methylation modifications in meiocytes compared to somatic cells, the actual proportions of CO alterations among hybrids caused by DMRs might be overestimated in this study (Fig. 10). The lack of a more refined meiocyte sampling method in cucumber and additional data prevents us from making a solid conclusion.
Plant breeders are willing to maintain beneficial gene combinations and break unfavorable linkage drag in crop improvement. To meet these requirements, precise and controllable recombination is needed for next-generation plant breeding. Further research on meiotic recombination will advance our understanding of factors that enhance or inhibit CO formation and facilitate the cooperative utilization of these factors to manipulate recombination at target sites during breeding.
Materials and methods
Plant materials and growth conditions
Cucumber (C. sativus L.) inbred lines Cuc37 (Eurasian type), Cuc80 (Xishuangbanna type), and 9930 (East Asian type) were used for the construction of F2 populations. Cuc80 was chosen as a shared female parent to cross with 9930 and Cuc37, respectively. The obtained F1 plants were self-pollinated to produce F2 offspring. Male buds undergoing meiosis stage (Bai et al. 2004) from Cuc37 and 9930 were harvested for CsRAD51 ChIP-seq, and meiotic stamens were collected for H3K4me3 ChIP-seq, FAIRE-seq, WGBS, and RNA-seq analysis. Seedlings at the 2-leaf stage were planted in soil with standard water and pest control management under greenhouse conditions.
Phylogenetic analysis of recA-like proteins
Sequences of recombinase A (recA)-like proteins representing 12 species were collected from NCBI, TAIR, and UniProtKB/Swiss-Prot protein database. Phylogenetic relationship was constructed by ClustalW using maximum likelihood method with 1,000 bootstraps in MEGA X (Kumar et al. 2018) and visualized by Evolview (http://evolgenius.info/#/). Accession numbers of proteins are listed in Supplemental Table S14.
Plasmid construction and hairy root transformation
The full-length CsRAD51A cDNA were cloned with specific primers. The target sgRNA site against CsRAD51A CDS was designed and inserted into a circular GFP CRISPR/Cas9 vector pKSE402, which was modified from pKSE401 (Xing et al. 2014).
Soaked seeds were germinated on moist filter papers in the dark at 28 °C, and cotyledons from 7-d-old seedlings were cut into 2 pieces. Hairy roots derived from cotyledon explants were infected with Agrobacterium rhizogenes strain ArQual harboring pKSE402-CsRAD51A or empty pKSE402 vector and cocultivated for 2 d as previously described (Pak et al. 2009). Inducted hairy roots with GFP grown on antibiotic MS media were firstly in vivo screened with a fluorescence stereo microscope and transformed to fresh media for inducing root extension. The absence of Cas9 and mutation of targeting sites in genomic DNA were detected with specific PCR primers (Supplemental Table S15).
Antibodies
The anti-CsRAD51A (customized from Shanghai Youke) was raised in rabbits with synthetic peptides corresponding to amino acids 168 to 340 of the CsRAD51A protein sequence. Anti-β-actin (Sigma) and anti-CsRAD51A were used as primary antibodies in western blotting and goat anti-rabbit IgG-HRP (Beijing Bersee) as a secondary antibody. After specificity assay, anti-CsRAD51A and a commercial antibody anti-H3K4me3 (Abcam) were further employed in ChIP-seq assays.
Western blotting
Total proteins were extracted using the buffer containing 50 mM Tris–HCl (pH7.4), 150 mM NaCl, 1 mM EDTA, 0.1% (v/v) Triton X-100, protease inhibitor cocktail (Sigma), and phenylmethylsulfonyl fluoride and then quantified using BCA assay kit (Pierce). The protein extracts were separated by 12% Sodium dodecyl-sulfate polyacrylamide gel electrophoresis gels and then electotransferred to Polyvinylidene Difluorid membranes and incubated with anti-β-actin antibody (dilution 1:3,000) and anti-CsRAD51A (dilution 1:500). After washing, the immunoblotting membranes were analyzed using SuperSignal West Femto Kit (Thermo Fisher).
ChIP-seq and FAIRE-seq
Tissues were fixed by vacuum filtration 1% (w/v) formaldehyde and quenched in 0.25 M glycine. ChIP-seq methodology was adapted from previous report (Zhu et al. 2012). Chromatin–protein complexes were isolated with extraction buffer and sonicated with a Bioruptor (Diagenode) for 11 cycles. About 10 µL of chromatin lysate before IP was reserved as input control, and the remains were incubated overnight with the antibody on a rotator at 4 °C. Then ChIP-enriched and input chromatin was released from complexes with reverse crosslinking and purified with phenol and chloroform.
FAIRE-seq experiment was performed following the previous protocol with some modifications (Simon et al. 2012). After fixation and sonication, the chromatin lysate separated nucleosome-depleted DNA from proteins by phenol–chloroform extraction without specific antibodies. Then the purified FAIRE and input DNA products were eluted into 20 µL 10 mM Tris–HCl (pH 7.4).
DNA concentration and size distribution were assessed using Qubit fluorometer (Invitrogen) and Agilent 2100 Bioanalyzer (Agilent). ChIP and FAIRE libraries were constructed using NEXTFLEX ChIP-Seq Kit (PerkinElmer), sequenced on Illumina HiSeq with PE150 mode, and about 6 Gb output data were generated per library (Supplemental Table S16).
RNA-seq, WGBS, and resequencing
Total RNA was extracted using TRIzol reagent following the manufacturer's protocols (Invitrogen). Genomic DNAs were extracted with an established cetyl trimethylammonium bromide protocol (Murray and Thompson 1980). Nucleic acid products were quantified using 1% agarose gels and Agilent 2100 (Agilent). WGBS libraries were constructed as previously described (Zhong et al. 2013). Libraries for RNA-seq and DNA resequencing were prepared with NEBNext Ultra RNA library prep kit (NEB) and Annoroad Universal DNA Fragmentase kit (Annoroad), respectively. Sequencing was performed on Illumina NovaSeq PE150 platform. Each library for WGBS analysis produced 12 Gb of data (Supplemental Table S17). And output data from RNA-seq and resequencing library were about 5 Gb per sample (Supplemental Tables S18 and S19).
ChIP-seq and FAIRE-seq data alignment and processing
For paired-end ChIP-seq and FAIRE-seq, the removal of adaptor sequences and quality filter of reads with quality <Q30 and shorter than 30 bases was performed with FASTQ v0.20.1 (Chen et al. 2018). The trimmed reads were mapped to the cucumber reference genome ‘9930' v3 (Li et al. 2019) using Bowtie2 v2.3.4.3 (Langmead and Salzberg 2012), with the following parameters: --very-sensitive --no-mixed --no-discordant -k 4.
We employed SAMtools v1.9 (Li et al. 2009) to further filter aligned read pairs that contained more than 6 mismatches, only 1 read in pair, or its alignment score Mapping Quality (MAPQ) <2. For the read pairs aligned to multiple positions, only the alignment with the highest alignment score remained. Alignments with MAPQ ≥23 that aligned to only 1 genomic location remained. Then the remaining alignments were written into BAM files, further transformed to bigWig and bedGraph formats utilizing bamCoverage tool from deepTools v3.1.3 (Ramírez et al. 2014), and normalized to counts per million mapped reads (CPM) in 1-bp and 1-Mb adjacent genomic windows, respectively.
Bisulfite sequencing data alignment and processing
For bisulfite sequencing data, adaptor sequences, low-quality bases (quality <20), and reads shorter than 30 bases were filtered using FASTQ. The filtered reads were mapped to the genome sequence using Bismark v0.20.0 (Krueger and Andrews 2011). The best alignment was retained when read pairs mapped to multiple genomic locations. The proportions of DNA methylation contexts (CG, CHG, and CHH) were calculated using Bismark Methylation extractor. And Bismark2bedGraph module in Bismark and bedGraphToBigWig from USCS toolkit were used to generate bigWig files for checking in IGV browser (Thorvaldsdóttir et al. 2013).
RNA-seq data alignment and processing
The raw RNA-seq data were trimmed by FASTQ, and adaptors and reads with low-quality (quality <20) or lengths shorter than 30 bases were removed in the step. The clean reads were aligned to the genome sequence using HISAT2 v2.1.0 (Kim et al. 2015), with the following settings: --no-unal --no-discordant --no-mixed --rna-strandness RF -k 5. Unique read pairs with high-quality alignment (MAPQ = 60) were retained. For alignments with MAPQ <60, containing more than 6 mismatches or consisting of only 1 read in a pair was discarded. Then transcripts per million (TPM) values were calculated, and sorted bam files were transformed into bigWig formats. The read counts for each gene were obtained by FeatureCounts (v.2.0.1) from BAM files (Liao et al. 2014). DEG analysis was performed using DESeq2 (v.1.30.0 R package) with default parameters (Love et al. 2014). To identify DEGs in our study, we set a threshold for statistical significance, where genes showing a P < 0.05 and |fold change| ≥ 2 were deemed to be differentially expressed.
Genotyping and identification of CO events in F2 populations
Syntenic regions between biparents of 2 populations were described in cucumber pan-genome assay (Li et al. 2022). Sequencing reads were aligned against reference genome using Bowtie2, and SNPs and indels were called with GATK v4.0.12.0 (McKenna et al. 2010). To obtain credible markers, variants on syntenic regions were filtered by bcftools with the following parameters: N_ALT >1, QD <2, FS >60, SOR >3, MQ <40, MQRankSum <−12.5, and ReadPosRankSum <−8.0. Missing genotypes in F2 population were imputed with BEAGLE 4.1 (Browning and Browning 2007). Then, every chromosome was split into bins with 1,000 SNPs to roughly estimate COs, and each maker in adjacent bins with genotype changes was scored to find CO boundaries.
Differential recombination regions and CO hotspot identification
Permutation tests were employed to assess the differences in recombination rates between 2 populations. Considering the continuity of meiotic CO in adjacent windows, each chromosome was considered to be a whole unit. Firstly, the order of 7 chromosomes was rearranged, each chromosome was randomly inverted, and the origin of data from which population was also randomly chosen. Secondly, the differences in CO rates of each 50-kb bin between the 2 populations were calculated. Thirdly, a series of hypothetic differences in all the bins were computed by performing permutation 10,000 times. And the P-value represented the proportion of hypothetic difference, which was larger than the observation.
A unique CO hotspot was defined as the local recombination rate in 1 population that was higher than 5 times of genomic average, and the corresponding recombination in another population was lower than the average. The landscape of CO comparison was drawn with shinyChromosome (Yu et al. 2019).
GO term enrichment analysis
We performed enrichment analysis of genes overlapped within regions of interest with R package “GOstats.” Clustered heatmaps of significantly enriched GO terms (P < 0.05) were constructed with “ComplexHeatmap” package.
ChIP-seq and FAIRE-seq peak calling
CsRAD51A IP and input data sets generated from meiotic and nonmeiotic tissues were supplied to PeakRanger with ranger function (Feng et al. 2011). Then the meiosis-specific DSB peaks were obtained after differential peak comparison between meiotic and control peaks by MAnorm (Tu et al. 2021) with 1.8-fold change criteria. H3K4me3 peaks were analyzed with MACS2 (Zhang et al. 2008), and the overlapping peaks between biological replicates were filtered with BEDTools intersect function (Quinlan and Hall 2010). For FAIRE-seq analysis, peak callings were performed by fseq2 (Zhao and Boyle 2021). Before CsRAD51A ChIP-seq and FAIRE-seq peak calling, bam files of different replicates from the same tissues were merged.
Enrichment annotation and motif analysis
The enrichment significance within regions of interest was analyzed with 10,000 times permutation test by R package “regioneR.” DSB-enriched motifs were generated by HOMER using findMotifGenome.pl script (Heinz et al. 2010) with the P < 0.05 and enrichment >30%. A phylogenetic analysis of motif matrixes was built by R package “universalmotif” based on position weight matrix (PWM).
Genome-wide multiomic landscapes and correlation analysis
The log2([ChIP + 1]/[input + 1]) coverage ratios of ChIP-seq and FAIRE-seq, TPM values of RNA-seq, DNA methylation proportions, and densities of the genes, repeats, and SNPs were calculated in nonoverlapping 500-kb bins and plotted using Circos v0.67 (Krzywinski et al. 2009). The pairwise Spearman correlation coefficients between multiomics were calculated with “rcorr” function of the “Hmisc” package, and heatmaps were plotted using the “corrplot” package in R.
Chromatin status around DSB hotspots and genes
Multiomic signal profiles around DSB hotspots and genes were obtained from bigWig files by computeMatrix “scale-regions” mode in deepTools. The average coverage ratios of epigenetic features from 2-kb upstream to 2-kb downstream were calculated per 20-bp window. Each profile of epigenetic features was grouped into 2 groups by decreasing recombination rate or randomly. In addition, genomic regions with similar characteristics were randomly chosen as the control. Average fitted values of epigenetic features of each group were respectively visualized with 95% confidence intervals using R.
Mega-scale CO frequency and CO suppression prediction
Predictions of CO frequency and CO suppression were performed by random forest regression and classification with 7-fold crossvalidation. The whole genome was divided into nonoverlapping 500-kb bins. The coverage profiles of multiomics were taken as independent variables, recombination rate, and whether CO desert as the dependent variable. Every chromosome was split into 7 equals, a seventh of a chromosome was randomly chosen as testing data, and the remaining as training data. These random forest models were carried out by the R package “randomForest” (mtry = 3, ntree = 1,500). The importance of features was identified by the mean decrease in node impurity (%IncMSE) with the “importance” function. Model performance in regression analysis was assessed according to MSE and nonlinear correlation coefficient of predictions and observations (“nlcor” package). For classification analysis, P-value of the testing set was optimized with its training set, and the predictions were appointed as CO suppression when the probabilities were higher than the P-value. The overall performance of random forest classification was given by mean accuracy of predictions and ROC curve.
Accession numbers
All sequencing data generated in this study were deposited in the NCBI database under BioProject accession numbers PRJNA936557 and PRJNA936649.
Supplementary Material
kiad432_Supplementary_Data Click here for additional data file.
|
|
sec
|
Introduction
Modern crop cultivars exhibit enhanced morphological and agronomic characteristics due to trait selection during breeding. Selecting the appropriate parental combination is the first step in breeding, and maximizing genetic variability and integrating superior genotypes are important for parental selection. However, concerns about introducing undesirable linkage drag and breaking beneficial gene combinations may limit the application of elite alleles in plant breeding. Therefore, a deeper understanding of the factors influencing meiotic recombination and hybrid-specific divergence will help plant breeders and geneticists select the appropriate parental combinations and manipulate recombination events at target genomic regions.
Meiotic recombination occurs following the formation of DNA double-strand breaks (DSBs), a process catalyzed by the topoisomerase-like protein SPORULATION 11 (SPO11) (Grelon et al. 2001; Lange et al. 2016). These DSBs are repaired by the recombinases Disrupted Meiotic cDNA 1 (DMC1) and RADiation sensitive 51 (RAD51) (Kurzbauer et al. 2012). The processing of DSB ends results in reciprocal crossover (CO) formation or nonreciprocal gene conversion.
The regulation of meiotic CO formation is well studied in plants (Wang and Copenhaver 2018). The genome-wide phenomenon of CO assurance ensures the occurrence of at least 1 CO per homologous chromosome pair, and the regulation of CO homeostasis produces a stable number of COs among individuals (Sidhu et al. 2015; Li et al. 2021). Moreover, the presence of a CO suppresses the occurrence of another CO nearby, a process known as CO interference (Lambing et al. 2020). In the fine scale, several factors promote COs, including DSB frequency, the presence of active chromatin, hypomethylation, the presence of the histone H3 lysine 4 trimethylation (H3K4me3) modification, and the absence of the histone H3 lysine 9 dimethylation (H3K9me2) modification (Marand et al. 2017; Tock et al. 2021; Lian et al. 2022). The relationship between CO formation and sequence polymorphism is complicated. Structural variations (SVs) between homologous chromosomes can suppress CO formation (Lawrence et al. 2017; Rowan et al. 2019), whereas the local distribution of COs is associated with elevated polymorphism levels (Blackwell et al. 2020). Different hybrids exhibit diverse meiotic recombination landscapes, but these differences are not exclusively due to distinct polymorphism levels (Ziolkowski et al. 2015; Dreissig et al. 2020), which may be partially explained by differences in heritable epigenetic states between germplasms.
Cucumber (Cucumis sativus) is an economically important vegetable crop (FAOSTAT, http://www.fao.org/faostat/en/). The haploid cucumber genome has an estimated size of 211 Mb comprising 7 chromosomes, with 24,317 annotated protein-coding genes and 77,092 repetitive elements covering 42.6% and 36.4% of the genome, respectively (Li et al. 2019). Here, we addressed sites of COs and DSB hotspots in cucumber and characterized factors that influence their occupancy. We revealed that CO formation and divergence are associated with C. sativus RADiation sensitive 51A (CsRAD51A) binding, euchromatin marks, and heterozygous SVs. Our findings point to the multiomic regulation of CO formation in cucumber and highlight the influence of epigenetic marks inherited from the parents on CO distribution and alterations in the progeny.
|
|
title
|
Introduction
|
|
p
|
Modern crop cultivars exhibit enhanced morphological and agronomic characteristics due to trait selection during breeding. Selecting the appropriate parental combination is the first step in breeding, and maximizing genetic variability and integrating superior genotypes are important for parental selection. However, concerns about introducing undesirable linkage drag and breaking beneficial gene combinations may limit the application of elite alleles in plant breeding. Therefore, a deeper understanding of the factors influencing meiotic recombination and hybrid-specific divergence will help plant breeders and geneticists select the appropriate parental combinations and manipulate recombination events at target genomic regions.
|
|
p
|
Meiotic recombination occurs following the formation of DNA double-strand breaks (DSBs), a process catalyzed by the topoisomerase-like protein SPORULATION 11 (SPO11) (Grelon et al. 2001; Lange et al. 2016). These DSBs are repaired by the recombinases Disrupted Meiotic cDNA 1 (DMC1) and RADiation sensitive 51 (RAD51) (Kurzbauer et al. 2012). The processing of DSB ends results in reciprocal crossover (CO) formation or nonreciprocal gene conversion.
|
|
p
|
The regulation of meiotic CO formation is well studied in plants (Wang and Copenhaver 2018). The genome-wide phenomenon of CO assurance ensures the occurrence of at least 1 CO per homologous chromosome pair, and the regulation of CO homeostasis produces a stable number of COs among individuals (Sidhu et al. 2015; Li et al. 2021). Moreover, the presence of a CO suppresses the occurrence of another CO nearby, a process known as CO interference (Lambing et al. 2020). In the fine scale, several factors promote COs, including DSB frequency, the presence of active chromatin, hypomethylation, the presence of the histone H3 lysine 4 trimethylation (H3K4me3) modification, and the absence of the histone H3 lysine 9 dimethylation (H3K9me2) modification (Marand et al. 2017; Tock et al. 2021; Lian et al. 2022). The relationship between CO formation and sequence polymorphism is complicated. Structural variations (SVs) between homologous chromosomes can suppress CO formation (Lawrence et al. 2017; Rowan et al. 2019), whereas the local distribution of COs is associated with elevated polymorphism levels (Blackwell et al. 2020). Different hybrids exhibit diverse meiotic recombination landscapes, but these differences are not exclusively due to distinct polymorphism levels (Ziolkowski et al. 2015; Dreissig et al. 2020), which may be partially explained by differences in heritable epigenetic states between germplasms.
|
|
p
|
Cucumber (Cucumis sativus) is an economically important vegetable crop (FAOSTAT, http://www.fao.org/faostat/en/). The haploid cucumber genome has an estimated size of 211 Mb comprising 7 chromosomes, with 24,317 annotated protein-coding genes and 77,092 repetitive elements covering 42.6% and 36.4% of the genome, respectively (Li et al. 2019). Here, we addressed sites of COs and DSB hotspots in cucumber and characterized factors that influence their occupancy. We revealed that CO formation and divergence are associated with C. sativus RADiation sensitive 51A (CsRAD51A) binding, euchromatin marks, and heterozygous SVs. Our findings point to the multiomic regulation of CO formation in cucumber and highlight the influence of epigenetic marks inherited from the parents on CO distribution and alterations in the progeny.
|
|
sec
|
Results
Genome-wide identification of meiotic COs
To map the CO landscape and capture epigenetic and transcriptional information during CO formation in cucumber, we constructed 2 F2 populations (Fig. 1A) and generated novel multiomic data sets from meiotic tissues of the corresponding male parents (Fig. 1B). In a previous study, we divided core cucumber accessions into 4 subgroups: the Indian, Xishuangbanna, Eurasian, and East Asian groups (Qi et al. 2013). To increase sequence polymorphism in the progeny, we crossed the semiwild Xishuangbanna cucumber accession Cuc80 as the shared female parent with cultivated accession 9930 or Cuc37. We self-pollinated the F1 hybrids and generated 2 F2 populations containing 200 (Cuc80 × 9930) and 193 individuals (Cuc80 × Cuc37).
Figure 1. Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.Using 822,844 and 1,203,797 sequence variants between the parental lines (Supplemental Table S1), we identified a total of 2,585 and 2,663 COs in the Cuc80 × 9930 and Cuc80 × Cuc37 F2 populations (Supplemental Table S2), respectively. The total number of COs per F2 individual ranged from 4 to 22. The average number of COs per individual in Cuc80 × Cuc37 was 13.8, which is significantly higher than that (12.9) in Cuc80 × 9930 (P = 0.0024, Mann–Whitney U test, Supplemental Fig. S1A). We observed robust CO assurance in the F2 populations, as the average number of COs per chromosome per meiosis was 0.92 in Cuc80 × 9930 and 0.99 in Cuc80 × Cuc37. The CO frequency did not show a Poisson distribution (Supplemental Fig. S1, B and C), suggesting the existence of CO interference. There were significant differences between CO numbers on different chromosomes (Supplemental Fig. S1, D and E). The average CO number per chromosome increased in proportion to the chromosome length in both populations (Supplemental Fig. S1F).
To assess the fine-scale distribution of COs, we looked at COs at higher resolution (CO intervals <12 kb) to enrichment analysis, including 1,814 COs from Cuc80 × 9930 and 1,958 COs from Cuc80 × Cuc37. Nearly 30% of these COs were located in gene body regions, and more than 93% were in genes or their 10-kb flanking intergenic regions in both populations (Fig. 2A). As shown in Fig. 2, B to E, CO enrichment patterns differed in various genomic components but were conserved between hybrids. Specifically, most COs occurred in distal intergenic regions (accounting for 42.6% to 46.1% of total COs), followed by promoter regions (24.0% to 26.5%) located within 2 kb upstream of transcription start sites (TSSs), introns (13.3% to 13.5%), coding sequences (CDSs, ∼10%), and untranslated regions (UTRs, ∼7%). We performed permutation tests to assess the significance of CO enrichment within these regions. When normalized to the expected CO enrichment, COs were significantly enriched in 5ʹUTRs and intergenic regions but exhibited less-than-expected overlap with introns and CDSs (P < 0.01 for 10,000 times permutation test, Supplemental Table S3). These findings reflect a preference for CO formation away from CDSs and introns in cucumber.
Figure 2. Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.To explore the functions of CO-related genes, we performed gene ontology (GO) enrichment analysis of genes within the high-resolution CO intervals (<12 kb). CO-overlapping genes were significantly enriched for GO terms involved in DNA/RNA biosynthesis and transcriptional regulation in both populations (Supplemental Fig. S2 and Table S4). The regions at least 10 kb away from COs were considered “CO cold regions”. No significantly enriched GO terms were shared between CO intervals and CO cold regions. These results indicate that CO events tend to occur around genes with certain functions.
Comparisons of CO landscapes between hybrids
To estimate the similarity of global CO distribution among hybrids, we calculated the correlation coefficient of CO rates in nonoverlapping 500-kb bins between the 2 populations. The genome-wide CO landscapes were conserved among hybrids (bin size = 500 kb, Spearman correlation coefficient, R = 0.41, Supplemental Fig. S3). The total lengths of CO cold regions in Cuc80 × 9930 and Cuc80 × Cuc37 were 140.26 and 147.29 Mb, respectively. The overlapping CO cold regions between the 2 populations covered 103.91 Mb, encompassing identified regulatory genes in cucumber. A well-studied cucurbitacin C (CuC) biosynthetic gene cluster comprises 1 terpene oxidosqualene cyclase gene (Bi), 3 cytochrome P450 genes, and 1 acetylase gene; this cluster regulates the biosynthesis of bitterness in Cucurbitaceae (Shang et al. 2014; Zhou et al. 2016). The CsBi cluster overlapped with a CO cold region and was located far from CO intervals in both populations (P < 0.05 for 10,000 Monte Carlo simulations, Supplemental Fig. S4). This might benefit the horizontal transfer and maintenance of the CsBi cluster during hybridization and facilitate the coexpression and coregulation of the clustered genes during CuC biosynthesis.
However, the correlation coefficient of CO landscapes fell to 0.18 when we focused on a smaller scale (bin size = 50 kb, Spearman correlation coefficient), implying that fine-scale CO divergence exists between the 2 populations. We then employed a permutation test to scan the differential recombination regions (see Materials and methods). We identified 136 significant differential recombination regions between the 2 populations, which were spread over the cucumber genome (bin size = 50 kb, P < 0.05 for 10,000 times permutation test, Fig. 3). Here, we regarded 50-kb bins with a 5-fold average recombination rate as CO hotspots. As a result, we identified 72 CO hotspots in both populations, 3 of which overlapped between the 2 populations. We defined a unique hotspot as one that was detected in only 1 population and had a lower-than-average recombination rate in the other population. We identified 19 and 25 unique hotspots in Cuc80 × 9930 and Cuc80 × Cuc37, respectively, and the remaining were unbiased hotspots between the 2 populations. The differential recombination regions and unique CO hotspots were unevenly distributed along the cucumber chromosomes.
Figure 3. Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.Notably, some verified functional genes existed in these regions of differential recombination between hybrids. Among these, the fruit spine development gene Spine Base Size1 (CsSBS1) (Yang et al. 2022) and the plant architecture gene De-Etiolated-2 (CsDET2) (Hou et al. 2017) overlapped with regions containing significantly higher CO rates in Cuc80 × Cuc37 and CO cold regions in Cuc80 × 9930. Moreover, we observed no sequence variations of CsDET2 between inbred lines 9930 and Cuc37. These results suggest that selecting parents carrying the same genotype can result in great differences in fine-scale recombination between progenies. In summary, the fine-scale CO distributions and corresponding gene functions show diverse characteristics between different hybrids.
Genomic and epigenetic features associated with CO distribution
SVs play important roles in driving plant genome evolution and trait adaptation in crops (Li et al. 2023). To assess CO patterns within SVs, we examined the enrichment of COs relative to SVs. Biparental SVs (Li et al. 2022) included inversions, deletions, insertions, and translocations. We determined that 227 out of 2,585 COs from Cuc80 × 9930 and 324 out of 2,663 COs from Cuc80 × Cuc37 overlapped with SVs, 24 of which spanned predicted centromeres. Further analysis revealed that COs with higher resolution (CO intervals <100 kb) were significantly suppressed in heterozygous SVs, regardless of SV type (P < 0.001 for 10,000 times permutation test, Fig. 4, A and B; Supplemental Table S5). In addition, the intervals of COs overlapping with SVs were much longer than those of total COs (P < 0.001 for Mann–Whitney U test, Fig. 4, C and D; Supplemental Table S6), suggesting that the actual proportion of overlapping COs with SVs is much lower than observed. The presence of 2 SVs upstream of FLOWERING LOCUS T (CsFT) increases CsFT transcription and shortens flowering time in cucumber (Wang et al. 2020; Li et al. 2022). The 3 parental varieties used in the current study have different CsFT genotypes: Cuc80 with the long-2 type genotype is a late-flowering germplasm, whereas 9930 (short-1 type) and Cuc37 (short-2 type) are early flowering. We detected no COs spanning the CsFT locus in either population, but we did detect 1 CO at the left border of the heterozygous 12.1-kb deletion in the short-2 type CsFT locus in Cuc80 × Cuc37. This result matches the previous observation that CO formation is suppressed within SVs and elevated in the surrounding region (Rowan et al. 2019).
Figure 4. COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.To investigate the correlation between single nucleotide polymorphism (SNP) density and CO formation at a fine scale, we counted SNPs at continuously increasing distances away from CO centers (midpoints of CO intervals). In Cuc80 × 9930, SNP numbers increased close to CO centers but were lower than the random and average SNP densities (Fig. 4E), perhaps due to the limited CO resolution in Cuc80 × 9930. However, when only COs with higher resolutions were considered, SNP densities around CO centers visibly increased in Cuc80 × 9930. SNP densities in Cuc80 × Cuc37 were extremely elevated toward CO centers and much higher than random (Fig. 4F). To further investigate their global relationships, we calculated the correlation coefficients of polymorphism and recombination rates. As shown in Supplemental Fig. S5, A and B, we observed weakly negative correlations in both populations (Spearman correlation coefficient, R = −0.12 in Cuc80 × 9930, R = −0.02 in Cuc80×Cuc37). Together, these results indicate that polymorphism shapes CO distribution in cucumber on a fine scale but not genome-wide.
To pinpoint the localization of DSBs, we first isolated a cucumber homolog of RAD51A. A BLASTP search revealed 7 putative homologs of Arabidopsis (Arabidopsis thaliana) RAD51 (AT5G20850) in cucumber. Among these, 1 protein (CsaV3_1G41430.1) shares high sequence similarity with RAD51A proteins from Plantae, Fungi, and Animalia (Supplemental Fig. S6); we named this protein CsRAD51A. We then generated a polyclonal antibody against CsRAD51A and tested its specificity by immunoblot analysis of csrad51a mutant hairy root lines generated by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9)–mediated genome editing (Supplemental Fig. S7).
To gain insights into the molecular mechanisms of CO formation in cucumber, we performed in-depth multiomic analysis. Since the 2 F2 hybrids were derived from different male parents, we conducted the subsequent multiomic assays using male meiotic tissues from inbred lines 9930 and Cuc37. Due to the lack of published information on DSB sites and epigenetic states in cucumber during meiosis, we constructed libraries for CsRAD51A chromatin immunoprecipitation sequencing (ChIP-seq), H3K4me3 ChIP-seq, formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), whole-genome bisulfite sequencing (WGBS), and RNA sequencing (RNA-seq).
To examine the relationship between CO frequency and multiomic data in cucumber, we visualized their landscapes and calculated the pairwise correlation coefficients (Fig. 5). DNA methylation was negatively correlated with the global CO frequency. We also observed positive relationships between the CO rate and CsRAD51A binding, H3K4me3 modification, open chromatin, transcriptional activity, or gene density. However, there was no obvious correlation between the CO landscape and SNPs or repeat density. These findings provide an overview of the correlations between COs and multiomic information in cucumber, including epigenetic marks, transcriptional activity, and genomic features.
Figure 5. Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
Characterization of meiotic DSB hotspots
Meiotic recombination begins with the formation of DSBs. Previous studies measured genome-wide DSB sites in plants by identifying binding sites for SPO11-1 (Choi et al. 2018), DMC1 (Tock et al. 2021), and RAD51 (He et al. 2017). To investigate meiotic DSBs in cucumber, we identified differential peaks of CsRAD51A in ChIP-seq data between meiotic and nonmeiotic tissues. As 2 biological replicates of CsRAD51A ChIP-seq data exhibited high similarities (R = 0.91 to 0.99, Pearson correlation coefficient), we combined the 2 data sets before peak calling for further analysis. We identified 49,261 and 57,551 CsRAD51A peaks in 9930 and Cuc37 meiotic buds, respectively. After carefully comparing different peaks with those of nonmeiotic tissues, 3,480 and 5,605 specific peaks remained in 9930 and Cuc37, respectively (Supplemental Table S7). Since ChIP-seq peak calling is based on signal-to-noise models, referring to the previous research (Brick et al. 2018), we considered these peaks to be meiotic DSB hotspots in cucumber.
The average number of DSB hotspots per chromosome was strongly proportional to chromosome length (Supplemental Fig. S8). In total, 2,287 out of 3,480 DSB hotspots in 9930 and 3,642 out of 5,605 DSB hotspots in Cuc37 occurred within CO intervals in the corresponding progeny populations and significantly overlapped with COs (P = 0.0069 in Cuc80 × 9930 and 0.018 in Cuc80 × Cuc37 for 10,000 times permutation test). Of the cucumber DSB hotspots, 93.2% to 95.3% occurred in gene bodies and surrounding regions (10 kb upstream to 10 kb downstream of genes). In both inbred lines, most DSB hotspots (accounting for 83% in 9930 and 89.1% in Cuc37) were present in intergenic regions and introns (Fig. 6, A and B). DSB hotspots were significantly enriched in intergenic regions and depleted in the 5′UTR and CDS regions (P < 0.05 for 10,000 times permutation test, Fig. 6, C and D; Supplemental Table S8).
Figure 6. Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.To clarify the sequence characteristics of cucumber DSB hotspots, we looked for enriched known and de novo motifs. The motifs with the highest proportions of DSBs in both lines were polyC repeats and TA repeats (Fig. 6E). We also discovered several de novo motifs (Supplemental Fig. S9). As shown in Supplemental Fig. S10, similar types of motifs were present in different cucumber ecotypes, including polyC repeats and TA repeats, which share similar features with meiotic recombination-related motifs in maize (Zea mays) (He et al. 2017; Kianian et al. 2018), Arabidopsis (Choi et al. 2013), and mammals (Grey et al. 2018) (Supplemental Table S9). These results reveal the sequence bias in positioning DSB sites is conserved in plants including cucumber.
To investigate genes associated with DSBs in cucumber, we performed GO enrichment analysis of genes whose promoters or 5′UTRs overlapped with meiotic DSB hotspots. In total, 935 and 1,306 genes overlapped with meiotic DSB hotspots in 9930 and Cuc37, respectively. In 9930, the significant GO terms of DSB-overlapping genes were related to mitosis/meiosis, protein import-/localization-/modification-related processes, and the response to oxidative stress (Supplemental Table S10). The DSB-overlapping genes in Cuc37 were involved in vitamin biosynthetic and metabolic processes, exocytosis, and the salicylic acid–mediated signaling pathway (Supplemental Table S10).
Transcriptional silencing at DSB sites is required for DNA repair in mammals and plants (Machour and Ayoub 2020; Wang et al. 2022). To identify genes potentially associated with efficient DSB repair, we analyzed GO terms of transcriptionally repressed DSB-associated genes in cucumber. First, we identified the significantly downregulated differentially expressed genes (DEGs) in 9930 and Cuc37 meiotic stamens compared to transcriptome data for 9930 and Cuc37 leaf tissues from a previous study (Li et al. 2022). As shown in Supplemental Fig. S11A, 5,105 genes were significantly downregulated in 9930 meiotic stamens compared to leaves, and the promoters or 5′UTRs of 204 of these genes overlapped with DSB hotspots. Enriched GO terms of the 204 genes included response to biotic/abiotic stimulus, response to osmotic stress, photosynthesis-related processes, and amino acid catabolic and metabolic processes (Supplemental Fig. S11B and Table S11). In Cuc37 meiotic stamens, 264 of the 5,319 downregulated DEGs overlapped with Cuc37 DSB hotspots (Supplemental Fig. S11C). These genes were significantly associated with responses to defense and stress, systemic resistance, and photosynthesis (Supplemental Fig. S11D and Table S11). Genes with high AtSPO11-1-oligo levels were previously shown to be strongly enriched in GO terms related to biotic defense responses in Arabidopsis (Choi et al. 2018). A recent study showed that the meiotic DSB-associated transcriptional repression of genes was related to responses to abiotic/biotic stimuli in Arabidopsis meiocytes (Wang et al. 2022). Our results suggest that meiotic DSBs that occur around genes related to defense and responses to stimuli would tend to be repaired during meiosis and that transcriptional suppression at meiotic DSBs might be conserved in cucumber.
Epigenetic profiles of meiotic DSB hotspots and genes
To evaluate the epigenetic environment of cucumber DSB sites, we calculated the enrichment of epigenetic markers in DSB hotspots. We identified 23,250 H3K4me3 peaks in 9930 and 22,434 in Cuc37. In addition, we detected 13,855 and 14,005 active chromatin regions by FAIRE-seq peak calling in 9930 and Cuc37 meiotic tissues, respectively. We detected 2,510,378 mCG sites, 1,368,475 mCHG sites, and 355,206 mCHH sites in 9930 and 4,313,454 mCG sites, 1,392,300 mCHG sites, and 351,535 mCHH sites in Cuc37. In 9930, DSB hotspots were significantly depleted of H3K4me3 markers and open chromatin but contained abundant mCHH sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). In Cuc37, DSB hotspots were also significantly depleted of H3K4me3 but were enriched in mCG and mCHG sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). By contrast, CO intervals significantly overlapped with H3K4me3 markers, open chromatin, and mCHH sites in both populations and showed significantly less enrichment of mCG sites in Cuc80 × 9930 (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). These results imply that the epigenetic environment differs in DSB hotspots and CO sites during meiosis.
We visualized the average profiles of the epigenetic status around DSB hotspots. To show the specificity of epigenetic enrichment around DSB hotspots, we used random genomic regions with similar features as DSBs for comparison. Then we further divided the DSB hotspots into 2 groups based on decreasing recombination rate to explore the roles of epigenetic modifications in local CO formation of DSBs. As shown in Fig. 7, the CO rates of DSB sites were positively correlated with the strength of the CsRAD51A ChIP-seq signal. The H3K4me3 modification was specifically absent in meiotic DSBs and their surrounding regions, but there was no obvious association with local CO formation. Compared to random genomic loci, the flanking regions of DSB hotspots displayed more chromatin accessibility but not DSB peaks. Moreover, active chromatin was positively correlated with the recombination rates of DSBs. By contrast, DNA methylation was specifically enriched within DSBs and was negatively correlated with CO rates, especially mCG and mCHG. Our data indicate that more CsRAD51A binding, more chromatin accessibility, and lower mCG and mCHG levels are tightly associated with higher CO frequencies in cucumber DSB hotspots.
Figure 7. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.We also examined the profiles of epigenetic information around genes. Compared to random genomic loci that shared similar features with gene regions, the signals of CsRAD51A ChIP-seq, FAIRE-seq, mCHG, and mCHH decreased at gene body regions but slightly increased at the flanking regions. Moreover, CsRAD51A binding and chromatin accessibility were enhanced at TSSs compared to transcription termination sites (TTSs) (Fig. 8). By contrast, there was more H3K4me3 binding on gene body regions than their upstream and downstream regions, and stronger signals were shown near TSSs. We also detected lower mCG levels around TSS and TTS regions. To further explore the properties associated with CO rates in gene regions, we defined 2 groups of cucumber genes based on decreasing recombination rates. We observed significantly higher levels of CsRAD51A binding and H3K4me3 in CO-active gene regions, whereas genes with lower mCG and mCHG levels exhibited higher recombination rates. In summary, CsRAD51A binding, chromatin accessibility, and hypomethylation were positively correlated with local CO rates in gene regions.
Figure 8. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
Global CO landscape and suppression, as determined by integrating multiomic data
To further verify the relationships between multiomic features and CO landscapes, we predicted megabase-scale CO rates using the random forest algorithm with the following formula: CO rate ∼ CsRAD51A + H3K4me3 + FAIRE + mCG + mCHG + mCHH + gene density + repeat density + SNP density. The nonlinear coefficients between observed and predicted CO frequencies were 0.5342 in Cuc80 × 9930 and 0.3998 in Cuc80 × Cuc37 (Fig. 9; Supplemental Fig. S12, A and B). The ranking of importance for predicting CO frequencies in Cuc80 × 9930 (from high to low) was gene density, mCHG, chromatin accessibility, H3K4me3 modification, CsRAD51A deposition, mCG, repeat density, mCHH, and SNP density (Supplemental Fig. S12C). In predicting the CO landscape of Cuc80 × Cuc37, gene density, H3K4me3 modification, and DNA methylation appeared to be more important than the other features (Supplemental Fig. S12D). When only genomic features were considered, the correlations decreased to 0.4514 in Cuc80 × 9930 and 0.3327 in Cuc80 × Cuc37, highlighting the contributions of epigenetic information to CO formation.
Figure 9. Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.We also performed random forest classification to predict megabase-scale CO deserts (CO cold regions) and plotted the receiver operating characteristic (ROC) curve to evaluate the efficiency of each classifier. The accuracies of predictions were 96.0% in Cuc80 × 9930 and 97.4% in Cuc80 × Cuc37 individually. The random forest classifiers with epigenetic and genetic features outperformed with only genetic features in both populations (Supplemental Fig. S13). The good performance of predictions of the CO landscape and CO suppression region supports the notion that the epigenetic and genetic features discussed here are closely related to CO formation in cucumber.
CO alterations determined by distinct epigenetics and SVs
Although the distribution profiles of multiomic data along chromosomes were similar between the 2 germplasms, many local differences still existed. For the CsRAD51A ChIP-seq data, only 599 DSB hotspots from 9930 overlapped with 596 peaks from Cuc37, accounting for 17.2% and 10.6% of total peaks in 9930 and Cuc37, respectively. The vast majority of H3K4me3 peaks in both inbred lines overlapped, including 92.8% of 23,250 peaks from 9930 and 99.3% of 22,434 peaks from Cuc37. Whereas 13,855 and 14,005 active chromatin regions were identified in 9930 and Cuc37, respectively, only 6,371 peaks overlapped between the 2 lines. We also identified 1,338 mCG, 431 mCHG, and 986 mCHH hypo-differentially methylated regions (hypo-DMRs) in 9930 compared to Cuc37, as well as 2,429 mCG, 1,285 mCHG, and 2,284 mCHH hypo-DMRs in Cuc37.
To explore the potential impacts of genetic and epigenetic factors on differences in CO events between hybrids, we considered nonoverlapping COs between the 2 populations to be “unique COs” and measured the overlap of unique COs with distinct epigenetic modifications and SVs. In total, 33% to 40% of unique COs overlapped with specific DSB sites, epigenetic modifications, and heterozygous SVs from the corresponding populations. The unique COs of Cuc80 × 9930 were significantly enriched within 9930-specific DSB hotspots, SV escape (i.e. the absence of a heterozygous SV in the region), and FAIRE + H3K4me3 + hypo-mC interactions (P < 0.05 for 10,000 times permutation test, Fig. 10A; Supplemental Table S13). We detected significant correlations between Cuc37-specific DSB hotspot, SV escape, hypo-mC, SV escape + hypo-mC, DSB hotspot + SV escape, DSB hotspot + SV escape + hypo-mC interactions, and unique COs of Cuc80 × Cuc37 (P < 0.05 for 10,000 times permutation test, Fig. 10B; Supplemental Table S13). Several unique COs overlapped with these features and interactions at the local scale (Supplemental Fig. S14). These data suggest that CO alterations among hybrids could be controlled by specific DSBs, SVs, epigenetic modifications, or their interactions.
Figure 10. Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
|
|
title
|
Results
|
|
sec
|
Genome-wide identification of meiotic COs
To map the CO landscape and capture epigenetic and transcriptional information during CO formation in cucumber, we constructed 2 F2 populations (Fig. 1A) and generated novel multiomic data sets from meiotic tissues of the corresponding male parents (Fig. 1B). In a previous study, we divided core cucumber accessions into 4 subgroups: the Indian, Xishuangbanna, Eurasian, and East Asian groups (Qi et al. 2013). To increase sequence polymorphism in the progeny, we crossed the semiwild Xishuangbanna cucumber accession Cuc80 as the shared female parent with cultivated accession 9930 or Cuc37. We self-pollinated the F1 hybrids and generated 2 F2 populations containing 200 (Cuc80 × 9930) and 193 individuals (Cuc80 × Cuc37).
Figure 1. Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.Using 822,844 and 1,203,797 sequence variants between the parental lines (Supplemental Table S1), we identified a total of 2,585 and 2,663 COs in the Cuc80 × 9930 and Cuc80 × Cuc37 F2 populations (Supplemental Table S2), respectively. The total number of COs per F2 individual ranged from 4 to 22. The average number of COs per individual in Cuc80 × Cuc37 was 13.8, which is significantly higher than that (12.9) in Cuc80 × 9930 (P = 0.0024, Mann–Whitney U test, Supplemental Fig. S1A). We observed robust CO assurance in the F2 populations, as the average number of COs per chromosome per meiosis was 0.92 in Cuc80 × 9930 and 0.99 in Cuc80 × Cuc37. The CO frequency did not show a Poisson distribution (Supplemental Fig. S1, B and C), suggesting the existence of CO interference. There were significant differences between CO numbers on different chromosomes (Supplemental Fig. S1, D and E). The average CO number per chromosome increased in proportion to the chromosome length in both populations (Supplemental Fig. S1F).
To assess the fine-scale distribution of COs, we looked at COs at higher resolution (CO intervals <12 kb) to enrichment analysis, including 1,814 COs from Cuc80 × 9930 and 1,958 COs from Cuc80 × Cuc37. Nearly 30% of these COs were located in gene body regions, and more than 93% were in genes or their 10-kb flanking intergenic regions in both populations (Fig. 2A). As shown in Fig. 2, B to E, CO enrichment patterns differed in various genomic components but were conserved between hybrids. Specifically, most COs occurred in distal intergenic regions (accounting for 42.6% to 46.1% of total COs), followed by promoter regions (24.0% to 26.5%) located within 2 kb upstream of transcription start sites (TSSs), introns (13.3% to 13.5%), coding sequences (CDSs, ∼10%), and untranslated regions (UTRs, ∼7%). We performed permutation tests to assess the significance of CO enrichment within these regions. When normalized to the expected CO enrichment, COs were significantly enriched in 5ʹUTRs and intergenic regions but exhibited less-than-expected overlap with introns and CDSs (P < 0.01 for 10,000 times permutation test, Supplemental Table S3). These findings reflect a preference for CO formation away from CDSs and introns in cucumber.
Figure 2. Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.To explore the functions of CO-related genes, we performed gene ontology (GO) enrichment analysis of genes within the high-resolution CO intervals (<12 kb). CO-overlapping genes were significantly enriched for GO terms involved in DNA/RNA biosynthesis and transcriptional regulation in both populations (Supplemental Fig. S2 and Table S4). The regions at least 10 kb away from COs were considered “CO cold regions”. No significantly enriched GO terms were shared between CO intervals and CO cold regions. These results indicate that CO events tend to occur around genes with certain functions.
|
|
title
|
Genome-wide identification of meiotic COs
|
|
p
|
To map the CO landscape and capture epigenetic and transcriptional information during CO formation in cucumber, we constructed 2 F2 populations (Fig. 1A) and generated novel multiomic data sets from meiotic tissues of the corresponding male parents (Fig. 1B). In a previous study, we divided core cucumber accessions into 4 subgroups: the Indian, Xishuangbanna, Eurasian, and East Asian groups (Qi et al. 2013). To increase sequence polymorphism in the progeny, we crossed the semiwild Xishuangbanna cucumber accession Cuc80 as the shared female parent with cultivated accession 9930 or Cuc37. We self-pollinated the F1 hybrids and generated 2 F2 populations containing 200 (Cuc80 × 9930) and 193 individuals (Cuc80 × Cuc37).
|
|
figure
|
Figure 1. Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.
|
|
label
|
Figure 1.
|
|
caption
|
Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.
|
|
p
|
Experimental design for multiomic analysis of meiotic recombination formation in cucumber. A) Construction of cucumber F2 populations. The semiwild Xishuangbanna cucumber accession Cuc80 was chosen as a shared female parent to cross with cultivated accessions 9930 (East Asia type) and Cuc37 (Eurasian type), respectively. The obtained F1 hybrids were self-pollinated to produce F2 populations. B) Multiomic integration of cucumber recombination. DSB sites were measured by CsRAD51A ChIP-seq. Epigenetic modifications were carried out by H3K4me3 ChIP-seq, FAIRE-seq, and WGBS. Transcription levels were presented with RNA-seq. 5-Methylcytosine (5-meC) represents methylation of the fifth position of cytosine.
|
|
p
|
Using 822,844 and 1,203,797 sequence variants between the parental lines (Supplemental Table S1), we identified a total of 2,585 and 2,663 COs in the Cuc80 × 9930 and Cuc80 × Cuc37 F2 populations (Supplemental Table S2), respectively. The total number of COs per F2 individual ranged from 4 to 22. The average number of COs per individual in Cuc80 × Cuc37 was 13.8, which is significantly higher than that (12.9) in Cuc80 × 9930 (P = 0.0024, Mann–Whitney U test, Supplemental Fig. S1A). We observed robust CO assurance in the F2 populations, as the average number of COs per chromosome per meiosis was 0.92 in Cuc80 × 9930 and 0.99 in Cuc80 × Cuc37. The CO frequency did not show a Poisson distribution (Supplemental Fig. S1, B and C), suggesting the existence of CO interference. There were significant differences between CO numbers on different chromosomes (Supplemental Fig. S1, D and E). The average CO number per chromosome increased in proportion to the chromosome length in both populations (Supplemental Fig. S1F).
|
|
p
|
To assess the fine-scale distribution of COs, we looked at COs at higher resolution (CO intervals <12 kb) to enrichment analysis, including 1,814 COs from Cuc80 × 9930 and 1,958 COs from Cuc80 × Cuc37. Nearly 30% of these COs were located in gene body regions, and more than 93% were in genes or their 10-kb flanking intergenic regions in both populations (Fig. 2A). As shown in Fig. 2, B to E, CO enrichment patterns differed in various genomic components but were conserved between hybrids. Specifically, most COs occurred in distal intergenic regions (accounting for 42.6% to 46.1% of total COs), followed by promoter regions (24.0% to 26.5%) located within 2 kb upstream of transcription start sites (TSSs), introns (13.3% to 13.5%), coding sequences (CDSs, ∼10%), and untranslated regions (UTRs, ∼7%). We performed permutation tests to assess the significance of CO enrichment within these regions. When normalized to the expected CO enrichment, COs were significantly enriched in 5ʹUTRs and intergenic regions but exhibited less-than-expected overlap with introns and CDSs (P < 0.01 for 10,000 times permutation test, Supplemental Table S3). These findings reflect a preference for CO formation away from CDSs and introns in cucumber.
|
|
figure
|
Figure 2. Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.
|
|
label
|
Figure 2.
|
|
caption
|
Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.
|
|
p
|
Enrichment of COs within various genomic components in cucumber. A) Relative abundance of COs enriched within the genic region and flanking intergenic region in cucumber. Stack bar charts show frequencies of COs overlapping with gene body and intergenic region with increasing distances to the genes in Cuc80 × 9930 (left) and Cuc80 × Cuc37 (right) populations. B, C) Enrichment proportion of COs in different genomic components. Pie charts show the proportion of COs overlapping with the CDS, UTR, intron, promoter (2 kb upstream of TSS), and distal intergenic region in B) Cuc80 × 9930 and C) Cuc80 × Cuc37, respectively. D, E) Relative enrichment and significance of COs in different genomic components. Histogram charts represent the relative enrichments of COs in corresponding genomic components in D) Cuc80 × 9930 and E) Cuc80 × Cuc37 normalized by the expected CO numbers. Compared to the expected number, deposition and depletion are marked by positive and negative log values, respectively. Significances are assessed by 10,000 times permutation test. **P < 0.01, and ***P < 0.001.
|
|
p
|
To explore the functions of CO-related genes, we performed gene ontology (GO) enrichment analysis of genes within the high-resolution CO intervals (<12 kb). CO-overlapping genes were significantly enriched for GO terms involved in DNA/RNA biosynthesis and transcriptional regulation in both populations (Supplemental Fig. S2 and Table S4). The regions at least 10 kb away from COs were considered “CO cold regions”. No significantly enriched GO terms were shared between CO intervals and CO cold regions. These results indicate that CO events tend to occur around genes with certain functions.
|
|
sec
|
Comparisons of CO landscapes between hybrids
To estimate the similarity of global CO distribution among hybrids, we calculated the correlation coefficient of CO rates in nonoverlapping 500-kb bins between the 2 populations. The genome-wide CO landscapes were conserved among hybrids (bin size = 500 kb, Spearman correlation coefficient, R = 0.41, Supplemental Fig. S3). The total lengths of CO cold regions in Cuc80 × 9930 and Cuc80 × Cuc37 were 140.26 and 147.29 Mb, respectively. The overlapping CO cold regions between the 2 populations covered 103.91 Mb, encompassing identified regulatory genes in cucumber. A well-studied cucurbitacin C (CuC) biosynthetic gene cluster comprises 1 terpene oxidosqualene cyclase gene (Bi), 3 cytochrome P450 genes, and 1 acetylase gene; this cluster regulates the biosynthesis of bitterness in Cucurbitaceae (Shang et al. 2014; Zhou et al. 2016). The CsBi cluster overlapped with a CO cold region and was located far from CO intervals in both populations (P < 0.05 for 10,000 Monte Carlo simulations, Supplemental Fig. S4). This might benefit the horizontal transfer and maintenance of the CsBi cluster during hybridization and facilitate the coexpression and coregulation of the clustered genes during CuC biosynthesis.
However, the correlation coefficient of CO landscapes fell to 0.18 when we focused on a smaller scale (bin size = 50 kb, Spearman correlation coefficient), implying that fine-scale CO divergence exists between the 2 populations. We then employed a permutation test to scan the differential recombination regions (see Materials and methods). We identified 136 significant differential recombination regions between the 2 populations, which were spread over the cucumber genome (bin size = 50 kb, P < 0.05 for 10,000 times permutation test, Fig. 3). Here, we regarded 50-kb bins with a 5-fold average recombination rate as CO hotspots. As a result, we identified 72 CO hotspots in both populations, 3 of which overlapped between the 2 populations. We defined a unique hotspot as one that was detected in only 1 population and had a lower-than-average recombination rate in the other population. We identified 19 and 25 unique hotspots in Cuc80 × 9930 and Cuc80 × Cuc37, respectively, and the remaining were unbiased hotspots between the 2 populations. The differential recombination regions and unique CO hotspots were unevenly distributed along the cucumber chromosomes.
Figure 3. Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.Notably, some verified functional genes existed in these regions of differential recombination between hybrids. Among these, the fruit spine development gene Spine Base Size1 (CsSBS1) (Yang et al. 2022) and the plant architecture gene De-Etiolated-2 (CsDET2) (Hou et al. 2017) overlapped with regions containing significantly higher CO rates in Cuc80 × Cuc37 and CO cold regions in Cuc80 × 9930. Moreover, we observed no sequence variations of CsDET2 between inbred lines 9930 and Cuc37. These results suggest that selecting parents carrying the same genotype can result in great differences in fine-scale recombination between progenies. In summary, the fine-scale CO distributions and corresponding gene functions show diverse characteristics between different hybrids.
|
|
title
|
Comparisons of CO landscapes between hybrids
|
|
p
|
To estimate the similarity of global CO distribution among hybrids, we calculated the correlation coefficient of CO rates in nonoverlapping 500-kb bins between the 2 populations. The genome-wide CO landscapes were conserved among hybrids (bin size = 500 kb, Spearman correlation coefficient, R = 0.41, Supplemental Fig. S3). The total lengths of CO cold regions in Cuc80 × 9930 and Cuc80 × Cuc37 were 140.26 and 147.29 Mb, respectively. The overlapping CO cold regions between the 2 populations covered 103.91 Mb, encompassing identified regulatory genes in cucumber. A well-studied cucurbitacin C (CuC) biosynthetic gene cluster comprises 1 terpene oxidosqualene cyclase gene (Bi), 3 cytochrome P450 genes, and 1 acetylase gene; this cluster regulates the biosynthesis of bitterness in Cucurbitaceae (Shang et al. 2014; Zhou et al. 2016). The CsBi cluster overlapped with a CO cold region and was located far from CO intervals in both populations (P < 0.05 for 10,000 Monte Carlo simulations, Supplemental Fig. S4). This might benefit the horizontal transfer and maintenance of the CsBi cluster during hybridization and facilitate the coexpression and coregulation of the clustered genes during CuC biosynthesis.
|
|
p
|
However, the correlation coefficient of CO landscapes fell to 0.18 when we focused on a smaller scale (bin size = 50 kb, Spearman correlation coefficient), implying that fine-scale CO divergence exists between the 2 populations. We then employed a permutation test to scan the differential recombination regions (see Materials and methods). We identified 136 significant differential recombination regions between the 2 populations, which were spread over the cucumber genome (bin size = 50 kb, P < 0.05 for 10,000 times permutation test, Fig. 3). Here, we regarded 50-kb bins with a 5-fold average recombination rate as CO hotspots. As a result, we identified 72 CO hotspots in both populations, 3 of which overlapped between the 2 populations. We defined a unique hotspot as one that was detected in only 1 population and had a lower-than-average recombination rate in the other population. We identified 19 and 25 unique hotspots in Cuc80 × 9930 and Cuc80 × Cuc37, respectively, and the remaining were unbiased hotspots between the 2 populations. The differential recombination regions and unique CO hotspots were unevenly distributed along the cucumber chromosomes.
|
|
figure
|
Figure 3. Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.
|
|
label
|
Figure 3.
|
|
caption
|
Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.
|
|
p
|
Comparison of CO landscapes between cucumber populations. The green shadows represent the predicted positions of centromeres and pericentromeres (1 Mb flanking regions of centromeres). Blue and orange lines indicate chromosomal patterns in Cuc80 × 9930 and Cuc80 × Cuc37, respectively. The windows in Cuc80 × 9930 with a significantly higher CO rate than Cuc80 × Cuc37 is marked with blue hollow circles, and the opposite is orange. Unique CO hotspots in Cuc80 × 9930 and Cuc80 × Cuc37 are marked with blue and orange six-pointed stars, respectively. The bottom stacks represent gene density. Bin size = 50 kb. The significance of difference in recombination rates is calculated with 10,000 times permutation test, P < 0.05.
|
|
p
|
Notably, some verified functional genes existed in these regions of differential recombination between hybrids. Among these, the fruit spine development gene Spine Base Size1 (CsSBS1) (Yang et al. 2022) and the plant architecture gene De-Etiolated-2 (CsDET2) (Hou et al. 2017) overlapped with regions containing significantly higher CO rates in Cuc80 × Cuc37 and CO cold regions in Cuc80 × 9930. Moreover, we observed no sequence variations of CsDET2 between inbred lines 9930 and Cuc37. These results suggest that selecting parents carrying the same genotype can result in great differences in fine-scale recombination between progenies. In summary, the fine-scale CO distributions and corresponding gene functions show diverse characteristics between different hybrids.
|
|
sec
|
Genomic and epigenetic features associated with CO distribution
SVs play important roles in driving plant genome evolution and trait adaptation in crops (Li et al. 2023). To assess CO patterns within SVs, we examined the enrichment of COs relative to SVs. Biparental SVs (Li et al. 2022) included inversions, deletions, insertions, and translocations. We determined that 227 out of 2,585 COs from Cuc80 × 9930 and 324 out of 2,663 COs from Cuc80 × Cuc37 overlapped with SVs, 24 of which spanned predicted centromeres. Further analysis revealed that COs with higher resolution (CO intervals <100 kb) were significantly suppressed in heterozygous SVs, regardless of SV type (P < 0.001 for 10,000 times permutation test, Fig. 4, A and B; Supplemental Table S5). In addition, the intervals of COs overlapping with SVs were much longer than those of total COs (P < 0.001 for Mann–Whitney U test, Fig. 4, C and D; Supplemental Table S6), suggesting that the actual proportion of overlapping COs with SVs is much lower than observed. The presence of 2 SVs upstream of FLOWERING LOCUS T (CsFT) increases CsFT transcription and shortens flowering time in cucumber (Wang et al. 2020; Li et al. 2022). The 3 parental varieties used in the current study have different CsFT genotypes: Cuc80 with the long-2 type genotype is a late-flowering germplasm, whereas 9930 (short-1 type) and Cuc37 (short-2 type) are early flowering. We detected no COs spanning the CsFT locus in either population, but we did detect 1 CO at the left border of the heterozygous 12.1-kb deletion in the short-2 type CsFT locus in Cuc80 × Cuc37. This result matches the previous observation that CO formation is suppressed within SVs and elevated in the surrounding region (Rowan et al. 2019).
Figure 4. COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.To investigate the correlation between single nucleotide polymorphism (SNP) density and CO formation at a fine scale, we counted SNPs at continuously increasing distances away from CO centers (midpoints of CO intervals). In Cuc80 × 9930, SNP numbers increased close to CO centers but were lower than the random and average SNP densities (Fig. 4E), perhaps due to the limited CO resolution in Cuc80 × 9930. However, when only COs with higher resolutions were considered, SNP densities around CO centers visibly increased in Cuc80 × 9930. SNP densities in Cuc80 × Cuc37 were extremely elevated toward CO centers and much higher than random (Fig. 4F). To further investigate their global relationships, we calculated the correlation coefficients of polymorphism and recombination rates. As shown in Supplemental Fig. S5, A and B, we observed weakly negative correlations in both populations (Spearman correlation coefficient, R = −0.12 in Cuc80 × 9930, R = −0.02 in Cuc80×Cuc37). Together, these results indicate that polymorphism shapes CO distribution in cucumber on a fine scale but not genome-wide.
To pinpoint the localization of DSBs, we first isolated a cucumber homolog of RAD51A. A BLASTP search revealed 7 putative homologs of Arabidopsis (Arabidopsis thaliana) RAD51 (AT5G20850) in cucumber. Among these, 1 protein (CsaV3_1G41430.1) shares high sequence similarity with RAD51A proteins from Plantae, Fungi, and Animalia (Supplemental Fig. S6); we named this protein CsRAD51A. We then generated a polyclonal antibody against CsRAD51A and tested its specificity by immunoblot analysis of csrad51a mutant hairy root lines generated by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9)–mediated genome editing (Supplemental Fig. S7).
To gain insights into the molecular mechanisms of CO formation in cucumber, we performed in-depth multiomic analysis. Since the 2 F2 hybrids were derived from different male parents, we conducted the subsequent multiomic assays using male meiotic tissues from inbred lines 9930 and Cuc37. Due to the lack of published information on DSB sites and epigenetic states in cucumber during meiosis, we constructed libraries for CsRAD51A chromatin immunoprecipitation sequencing (ChIP-seq), H3K4me3 ChIP-seq, formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), whole-genome bisulfite sequencing (WGBS), and RNA sequencing (RNA-seq).
To examine the relationship between CO frequency and multiomic data in cucumber, we visualized their landscapes and calculated the pairwise correlation coefficients (Fig. 5). DNA methylation was negatively correlated with the global CO frequency. We also observed positive relationships between the CO rate and CsRAD51A binding, H3K4me3 modification, open chromatin, transcriptional activity, or gene density. However, there was no obvious correlation between the CO landscape and SNPs or repeat density. These findings provide an overview of the correlations between COs and multiomic information in cucumber, including epigenetic marks, transcriptional activity, and genomic features.
Figure 5. Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
|
|
title
|
Genomic and epigenetic features associated with CO distribution
|
|
p
|
SVs play important roles in driving plant genome evolution and trait adaptation in crops (Li et al. 2023). To assess CO patterns within SVs, we examined the enrichment of COs relative to SVs. Biparental SVs (Li et al. 2022) included inversions, deletions, insertions, and translocations. We determined that 227 out of 2,585 COs from Cuc80 × 9930 and 324 out of 2,663 COs from Cuc80 × Cuc37 overlapped with SVs, 24 of which spanned predicted centromeres. Further analysis revealed that COs with higher resolution (CO intervals <100 kb) were significantly suppressed in heterozygous SVs, regardless of SV type (P < 0.001 for 10,000 times permutation test, Fig. 4, A and B; Supplemental Table S5). In addition, the intervals of COs overlapping with SVs were much longer than those of total COs (P < 0.001 for Mann–Whitney U test, Fig. 4, C and D; Supplemental Table S6), suggesting that the actual proportion of overlapping COs with SVs is much lower than observed. The presence of 2 SVs upstream of FLOWERING LOCUS T (CsFT) increases CsFT transcription and shortens flowering time in cucumber (Wang et al. 2020; Li et al. 2022). The 3 parental varieties used in the current study have different CsFT genotypes: Cuc80 with the long-2 type genotype is a late-flowering germplasm, whereas 9930 (short-1 type) and Cuc37 (short-2 type) are early flowering. We detected no COs spanning the CsFT locus in either population, but we did detect 1 CO at the left border of the heterozygous 12.1-kb deletion in the short-2 type CsFT locus in Cuc80 × Cuc37. This result matches the previous observation that CO formation is suppressed within SVs and elevated in the surrounding region (Rowan et al. 2019).
|
|
figure
|
Figure 4. COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.
|
|
label
|
Figure 4.
|
|
caption
|
COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.
|
|
p
|
COs are suppressed around SV but positively associated with SNP density on the fine scale. A, B) Relative enrichment and significance of COs in different types of SVs in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Histogram charts represent for relative enrichments of COs in corresponding genomic components normalized by the expected CO number. Compared to expected number, deposition and depletion are shown in red and dark gray, respectively. C, D) Distances of nearest COs to SVs. DEL, deletion; INS, insertion; INV, inversion; TRANS, translocation (n = 2,540 for total COs with intervals <100 kb, 396 for SV-overlapping COs, 186 for DEL-overlapping COs, 196 for INS-overlapping COs, 1 for INV-overlapping COs, 13 for TRANS-overlapping COs in Cuc80 × 9930; n = 2,582 for total COs with intervals <100 kb, 632 for SV-overlapping COs, 265 for DEL-overlapping COs, 342 for INS-overlapping COs, 1 for INV-overlapping CO, 24 for INS-overlapping COs, INV-overlapping COs, TRANS-overlapping COs in Cuc80 × Cuc37). Boxplots show the median with center line, upper and lower quartiles with box limits, outliers with points, and the whiskers represent 1.5× interquartile range limits. E, F) SNP density around centers of CO intervals in cucumber. CO center indicates the midpoint of the CO interval. E) Hollow circles stand for SNP density (count/kb) in Cuc80 × 9930 within increasing distance away from CO centers of total COs, COs with intervals <5 kb, COs with intervals <10 kb, COs with intervals <15 kb, COs with intervals <20 kb, and random genomic region. The green dash line represents for average SNP density in genome-wide. F) Distributions in Cuc80 × Cuc37 (n = 1,413 for CO intervals <5 kb, 1,729 for CO intervals <10 kb, 1,912 for CO intervals <15 kb, 2,080 for CO intervals <20 kb, 2,585 for random and total COs in Cuc80 × 9930; n = 1,596 for CO intervals <5 kb, 1,879 for CO intervals <10 kb, 2,036 for CO intervals <15 kb, 2,152 for CO intervals <20 kb, 2,663 for random and total COs in Cuc80 × Cuc37). Significances are assessed by A, B) 10,000 times permutation test and C, D) Mann–Whitney U test. ***P < 0.001.
|
|
p
|
To investigate the correlation between single nucleotide polymorphism (SNP) density and CO formation at a fine scale, we counted SNPs at continuously increasing distances away from CO centers (midpoints of CO intervals). In Cuc80 × 9930, SNP numbers increased close to CO centers but were lower than the random and average SNP densities (Fig. 4E), perhaps due to the limited CO resolution in Cuc80 × 9930. However, when only COs with higher resolutions were considered, SNP densities around CO centers visibly increased in Cuc80 × 9930. SNP densities in Cuc80 × Cuc37 were extremely elevated toward CO centers and much higher than random (Fig. 4F). To further investigate their global relationships, we calculated the correlation coefficients of polymorphism and recombination rates. As shown in Supplemental Fig. S5, A and B, we observed weakly negative correlations in both populations (Spearman correlation coefficient, R = −0.12 in Cuc80 × 9930, R = −0.02 in Cuc80×Cuc37). Together, these results indicate that polymorphism shapes CO distribution in cucumber on a fine scale but not genome-wide.
|
|
p
|
To pinpoint the localization of DSBs, we first isolated a cucumber homolog of RAD51A. A BLASTP search revealed 7 putative homologs of Arabidopsis (Arabidopsis thaliana) RAD51 (AT5G20850) in cucumber. Among these, 1 protein (CsaV3_1G41430.1) shares high sequence similarity with RAD51A proteins from Plantae, Fungi, and Animalia (Supplemental Fig. S6); we named this protein CsRAD51A. We then generated a polyclonal antibody against CsRAD51A and tested its specificity by immunoblot analysis of csrad51a mutant hairy root lines generated by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9)–mediated genome editing (Supplemental Fig. S7).
|
|
p
|
To gain insights into the molecular mechanisms of CO formation in cucumber, we performed in-depth multiomic analysis. Since the 2 F2 hybrids were derived from different male parents, we conducted the subsequent multiomic assays using male meiotic tissues from inbred lines 9930 and Cuc37. Due to the lack of published information on DSB sites and epigenetic states in cucumber during meiosis, we constructed libraries for CsRAD51A chromatin immunoprecipitation sequencing (ChIP-seq), H3K4me3 ChIP-seq, formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), whole-genome bisulfite sequencing (WGBS), and RNA sequencing (RNA-seq).
|
|
p
|
To examine the relationship between CO frequency and multiomic data in cucumber, we visualized their landscapes and calculated the pairwise correlation coefficients (Fig. 5). DNA methylation was negatively correlated with the global CO frequency. We also observed positive relationships between the CO rate and CsRAD51A binding, H3K4me3 modification, open chromatin, transcriptional activity, or gene density. However, there was no obvious correlation between the CO landscape and SNPs or repeat density. These findings provide an overview of the correlations between COs and multiomic information in cucumber, including epigenetic marks, transcriptional activity, and genomic features.
|
|
figure
|
Figure 5. Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
|
|
label
|
Figure 5.
|
|
caption
|
Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
|
|
p
|
Genome-wide associations between meiotic CO and multiomic signals in cucumber meiosis. A) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. 9930. Circos plot showing coverage profiles along (a) physical coordinate (black contig for predicted centromere and pericentromere) of cucumber chromosome for (b) gene density, (c) repeat density, (d) DSB density, (e) recombination rate (centimorgan per megabase [cM/Mb]), (f) CsRAD51A binding (standardized log2[ChIP/Input] ratio), (g) H3K4 trimethylation level (standardized log2[ChIP/Input] ratio), (h) chromatin accessibility measured by FAIRE-seq (standardized log2[ChIP/Input] ratio), (i) transcription level (TPM), and (j) cytosine methylation level (mC/C ratio). B) Genomic landscapes of CO, genomic and epigenetic features during meiosis in C. sativus cv. Cuc37. C, D) The heatmaps represent pairwise Spearman's rank correlation matrix for CsRAD51A binding, chromatin accessibility, H3K4me3, transcription, gene density, repeat density, SNP density between homologs, and CO rate (cM/Mb). Color intensity indicates the strength of correlations.
|
|
sec
|
Characterization of meiotic DSB hotspots
Meiotic recombination begins with the formation of DSBs. Previous studies measured genome-wide DSB sites in plants by identifying binding sites for SPO11-1 (Choi et al. 2018), DMC1 (Tock et al. 2021), and RAD51 (He et al. 2017). To investigate meiotic DSBs in cucumber, we identified differential peaks of CsRAD51A in ChIP-seq data between meiotic and nonmeiotic tissues. As 2 biological replicates of CsRAD51A ChIP-seq data exhibited high similarities (R = 0.91 to 0.99, Pearson correlation coefficient), we combined the 2 data sets before peak calling for further analysis. We identified 49,261 and 57,551 CsRAD51A peaks in 9930 and Cuc37 meiotic buds, respectively. After carefully comparing different peaks with those of nonmeiotic tissues, 3,480 and 5,605 specific peaks remained in 9930 and Cuc37, respectively (Supplemental Table S7). Since ChIP-seq peak calling is based on signal-to-noise models, referring to the previous research (Brick et al. 2018), we considered these peaks to be meiotic DSB hotspots in cucumber.
The average number of DSB hotspots per chromosome was strongly proportional to chromosome length (Supplemental Fig. S8). In total, 2,287 out of 3,480 DSB hotspots in 9930 and 3,642 out of 5,605 DSB hotspots in Cuc37 occurred within CO intervals in the corresponding progeny populations and significantly overlapped with COs (P = 0.0069 in Cuc80 × 9930 and 0.018 in Cuc80 × Cuc37 for 10,000 times permutation test). Of the cucumber DSB hotspots, 93.2% to 95.3% occurred in gene bodies and surrounding regions (10 kb upstream to 10 kb downstream of genes). In both inbred lines, most DSB hotspots (accounting for 83% in 9930 and 89.1% in Cuc37) were present in intergenic regions and introns (Fig. 6, A and B). DSB hotspots were significantly enriched in intergenic regions and depleted in the 5′UTR and CDS regions (P < 0.05 for 10,000 times permutation test, Fig. 6, C and D; Supplemental Table S8).
Figure 6. Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.To clarify the sequence characteristics of cucumber DSB hotspots, we looked for enriched known and de novo motifs. The motifs with the highest proportions of DSBs in both lines were polyC repeats and TA repeats (Fig. 6E). We also discovered several de novo motifs (Supplemental Fig. S9). As shown in Supplemental Fig. S10, similar types of motifs were present in different cucumber ecotypes, including polyC repeats and TA repeats, which share similar features with meiotic recombination-related motifs in maize (Zea mays) (He et al. 2017; Kianian et al. 2018), Arabidopsis (Choi et al. 2013), and mammals (Grey et al. 2018) (Supplemental Table S9). These results reveal the sequence bias in positioning DSB sites is conserved in plants including cucumber.
To investigate genes associated with DSBs in cucumber, we performed GO enrichment analysis of genes whose promoters or 5′UTRs overlapped with meiotic DSB hotspots. In total, 935 and 1,306 genes overlapped with meiotic DSB hotspots in 9930 and Cuc37, respectively. In 9930, the significant GO terms of DSB-overlapping genes were related to mitosis/meiosis, protein import-/localization-/modification-related processes, and the response to oxidative stress (Supplemental Table S10). The DSB-overlapping genes in Cuc37 were involved in vitamin biosynthetic and metabolic processes, exocytosis, and the salicylic acid–mediated signaling pathway (Supplemental Table S10).
Transcriptional silencing at DSB sites is required for DNA repair in mammals and plants (Machour and Ayoub 2020; Wang et al. 2022). To identify genes potentially associated with efficient DSB repair, we analyzed GO terms of transcriptionally repressed DSB-associated genes in cucumber. First, we identified the significantly downregulated differentially expressed genes (DEGs) in 9930 and Cuc37 meiotic stamens compared to transcriptome data for 9930 and Cuc37 leaf tissues from a previous study (Li et al. 2022). As shown in Supplemental Fig. S11A, 5,105 genes were significantly downregulated in 9930 meiotic stamens compared to leaves, and the promoters or 5′UTRs of 204 of these genes overlapped with DSB hotspots. Enriched GO terms of the 204 genes included response to biotic/abiotic stimulus, response to osmotic stress, photosynthesis-related processes, and amino acid catabolic and metabolic processes (Supplemental Fig. S11B and Table S11). In Cuc37 meiotic stamens, 264 of the 5,319 downregulated DEGs overlapped with Cuc37 DSB hotspots (Supplemental Fig. S11C). These genes were significantly associated with responses to defense and stress, systemic resistance, and photosynthesis (Supplemental Fig. S11D and Table S11). Genes with high AtSPO11-1-oligo levels were previously shown to be strongly enriched in GO terms related to biotic defense responses in Arabidopsis (Choi et al. 2018). A recent study showed that the meiotic DSB-associated transcriptional repression of genes was related to responses to abiotic/biotic stimuli in Arabidopsis meiocytes (Wang et al. 2022). Our results suggest that meiotic DSBs that occur around genes related to defense and responses to stimuli would tend to be repaired during meiosis and that transcriptional suppression at meiotic DSBs might be conserved in cucumber.
|
|
title
|
Characterization of meiotic DSB hotspots
|
|
p
|
Meiotic recombination begins with the formation of DSBs. Previous studies measured genome-wide DSB sites in plants by identifying binding sites for SPO11-1 (Choi et al. 2018), DMC1 (Tock et al. 2021), and RAD51 (He et al. 2017). To investigate meiotic DSBs in cucumber, we identified differential peaks of CsRAD51A in ChIP-seq data between meiotic and nonmeiotic tissues. As 2 biological replicates of CsRAD51A ChIP-seq data exhibited high similarities (R = 0.91 to 0.99, Pearson correlation coefficient), we combined the 2 data sets before peak calling for further analysis. We identified 49,261 and 57,551 CsRAD51A peaks in 9930 and Cuc37 meiotic buds, respectively. After carefully comparing different peaks with those of nonmeiotic tissues, 3,480 and 5,605 specific peaks remained in 9930 and Cuc37, respectively (Supplemental Table S7). Since ChIP-seq peak calling is based on signal-to-noise models, referring to the previous research (Brick et al. 2018), we considered these peaks to be meiotic DSB hotspots in cucumber.
|
|
p
|
The average number of DSB hotspots per chromosome was strongly proportional to chromosome length (Supplemental Fig. S8). In total, 2,287 out of 3,480 DSB hotspots in 9930 and 3,642 out of 5,605 DSB hotspots in Cuc37 occurred within CO intervals in the corresponding progeny populations and significantly overlapped with COs (P = 0.0069 in Cuc80 × 9930 and 0.018 in Cuc80 × Cuc37 for 10,000 times permutation test). Of the cucumber DSB hotspots, 93.2% to 95.3% occurred in gene bodies and surrounding regions (10 kb upstream to 10 kb downstream of genes). In both inbred lines, most DSB hotspots (accounting for 83% in 9930 and 89.1% in Cuc37) were present in intergenic regions and introns (Fig. 6, A and B). DSB hotspots were significantly enriched in intergenic regions and depleted in the 5′UTR and CDS regions (P < 0.05 for 10,000 times permutation test, Fig. 6, C and D; Supplemental Table S8).
|
|
figure
|
Figure 6. Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.
|
|
label
|
Figure 6.
|
|
caption
|
Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.
|
|
p
|
Genomic location annotation and motif analysis of meiotic DSB hotspots in cucumber. A, B) Enrichment proportion of DSB hotspots in different genomic components. Pie charts show the proportion of DSB hotspots overlapping with the CDS, UTR, intron, promoter (2-kb upstream of TSS), and distal intergenic region in A) 9930 and B) Cuc37, respectively. C, D) Relative enrichment and significance of DSB hotspots in different genomic components. Histogram charts represent the relative enrichment of DSB in corresponding genomic components in C) 9930 and D) Cuc37 normalized by the expected DSB numbers. Compared to expected number, deposition and depletion are marked by postive and negative log values, respectively. E) Most abundant DNA sequence motifs identified in 9930 and Cuc37 DSB hotspots. The alignment quality (P-value) and enrichment proportions are displayed. PWM, position weight matrix. Significances are assessed by 10,000 times permutation test. *P < 0.05, **P < 0.01, and ***P < 0.001.
|
|
p
|
To clarify the sequence characteristics of cucumber DSB hotspots, we looked for enriched known and de novo motifs. The motifs with the highest proportions of DSBs in both lines were polyC repeats and TA repeats (Fig. 6E). We also discovered several de novo motifs (Supplemental Fig. S9). As shown in Supplemental Fig. S10, similar types of motifs were present in different cucumber ecotypes, including polyC repeats and TA repeats, which share similar features with meiotic recombination-related motifs in maize (Zea mays) (He et al. 2017; Kianian et al. 2018), Arabidopsis (Choi et al. 2013), and mammals (Grey et al. 2018) (Supplemental Table S9). These results reveal the sequence bias in positioning DSB sites is conserved in plants including cucumber.
|
|
p
|
To investigate genes associated with DSBs in cucumber, we performed GO enrichment analysis of genes whose promoters or 5′UTRs overlapped with meiotic DSB hotspots. In total, 935 and 1,306 genes overlapped with meiotic DSB hotspots in 9930 and Cuc37, respectively. In 9930, the significant GO terms of DSB-overlapping genes were related to mitosis/meiosis, protein import-/localization-/modification-related processes, and the response to oxidative stress (Supplemental Table S10). The DSB-overlapping genes in Cuc37 were involved in vitamin biosynthetic and metabolic processes, exocytosis, and the salicylic acid–mediated signaling pathway (Supplemental Table S10).
|
|
p
|
Transcriptional silencing at DSB sites is required for DNA repair in mammals and plants (Machour and Ayoub 2020; Wang et al. 2022). To identify genes potentially associated with efficient DSB repair, we analyzed GO terms of transcriptionally repressed DSB-associated genes in cucumber. First, we identified the significantly downregulated differentially expressed genes (DEGs) in 9930 and Cuc37 meiotic stamens compared to transcriptome data for 9930 and Cuc37 leaf tissues from a previous study (Li et al. 2022). As shown in Supplemental Fig. S11A, 5,105 genes were significantly downregulated in 9930 meiotic stamens compared to leaves, and the promoters or 5′UTRs of 204 of these genes overlapped with DSB hotspots. Enriched GO terms of the 204 genes included response to biotic/abiotic stimulus, response to osmotic stress, photosynthesis-related processes, and amino acid catabolic and metabolic processes (Supplemental Fig. S11B and Table S11). In Cuc37 meiotic stamens, 264 of the 5,319 downregulated DEGs overlapped with Cuc37 DSB hotspots (Supplemental Fig. S11C). These genes were significantly associated with responses to defense and stress, systemic resistance, and photosynthesis (Supplemental Fig. S11D and Table S11). Genes with high AtSPO11-1-oligo levels were previously shown to be strongly enriched in GO terms related to biotic defense responses in Arabidopsis (Choi et al. 2018). A recent study showed that the meiotic DSB-associated transcriptional repression of genes was related to responses to abiotic/biotic stimuli in Arabidopsis meiocytes (Wang et al. 2022). Our results suggest that meiotic DSBs that occur around genes related to defense and responses to stimuli would tend to be repaired during meiosis and that transcriptional suppression at meiotic DSBs might be conserved in cucumber.
|
|
sec
|
Epigenetic profiles of meiotic DSB hotspots and genes
To evaluate the epigenetic environment of cucumber DSB sites, we calculated the enrichment of epigenetic markers in DSB hotspots. We identified 23,250 H3K4me3 peaks in 9930 and 22,434 in Cuc37. In addition, we detected 13,855 and 14,005 active chromatin regions by FAIRE-seq peak calling in 9930 and Cuc37 meiotic tissues, respectively. We detected 2,510,378 mCG sites, 1,368,475 mCHG sites, and 355,206 mCHH sites in 9930 and 4,313,454 mCG sites, 1,392,300 mCHG sites, and 351,535 mCHH sites in Cuc37. In 9930, DSB hotspots were significantly depleted of H3K4me3 markers and open chromatin but contained abundant mCHH sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). In Cuc37, DSB hotspots were also significantly depleted of H3K4me3 but were enriched in mCG and mCHG sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). By contrast, CO intervals significantly overlapped with H3K4me3 markers, open chromatin, and mCHH sites in both populations and showed significantly less enrichment of mCG sites in Cuc80 × 9930 (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). These results imply that the epigenetic environment differs in DSB hotspots and CO sites during meiosis.
We visualized the average profiles of the epigenetic status around DSB hotspots. To show the specificity of epigenetic enrichment around DSB hotspots, we used random genomic regions with similar features as DSBs for comparison. Then we further divided the DSB hotspots into 2 groups based on decreasing recombination rate to explore the roles of epigenetic modifications in local CO formation of DSBs. As shown in Fig. 7, the CO rates of DSB sites were positively correlated with the strength of the CsRAD51A ChIP-seq signal. The H3K4me3 modification was specifically absent in meiotic DSBs and their surrounding regions, but there was no obvious association with local CO formation. Compared to random genomic loci, the flanking regions of DSB hotspots displayed more chromatin accessibility but not DSB peaks. Moreover, active chromatin was positively correlated with the recombination rates of DSBs. By contrast, DNA methylation was specifically enriched within DSBs and was negatively correlated with CO rates, especially mCG and mCHG. Our data indicate that more CsRAD51A binding, more chromatin accessibility, and lower mCG and mCHG levels are tightly associated with higher CO frequencies in cucumber DSB hotspots.
Figure 7. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.We also examined the profiles of epigenetic information around genes. Compared to random genomic loci that shared similar features with gene regions, the signals of CsRAD51A ChIP-seq, FAIRE-seq, mCHG, and mCHH decreased at gene body regions but slightly increased at the flanking regions. Moreover, CsRAD51A binding and chromatin accessibility were enhanced at TSSs compared to transcription termination sites (TTSs) (Fig. 8). By contrast, there was more H3K4me3 binding on gene body regions than their upstream and downstream regions, and stronger signals were shown near TSSs. We also detected lower mCG levels around TSS and TTS regions. To further explore the properties associated with CO rates in gene regions, we defined 2 groups of cucumber genes based on decreasing recombination rates. We observed significantly higher levels of CsRAD51A binding and H3K4me3 in CO-active gene regions, whereas genes with lower mCG and mCHG levels exhibited higher recombination rates. In summary, CsRAD51A binding, chromatin accessibility, and hypomethylation were positively correlated with local CO rates in gene regions.
Figure 8. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
|
|
title
|
Epigenetic profiles of meiotic DSB hotspots and genes
|
|
p
|
To evaluate the epigenetic environment of cucumber DSB sites, we calculated the enrichment of epigenetic markers in DSB hotspots. We identified 23,250 H3K4me3 peaks in 9930 and 22,434 in Cuc37. In addition, we detected 13,855 and 14,005 active chromatin regions by FAIRE-seq peak calling in 9930 and Cuc37 meiotic tissues, respectively. We detected 2,510,378 mCG sites, 1,368,475 mCHG sites, and 355,206 mCHH sites in 9930 and 4,313,454 mCG sites, 1,392,300 mCHG sites, and 351,535 mCHH sites in Cuc37. In 9930, DSB hotspots were significantly depleted of H3K4me3 markers and open chromatin but contained abundant mCHH sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). In Cuc37, DSB hotspots were also significantly depleted of H3K4me3 but were enriched in mCG and mCHG sites (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). By contrast, CO intervals significantly overlapped with H3K4me3 markers, open chromatin, and mCHH sites in both populations and showed significantly less enrichment of mCG sites in Cuc80 × 9930 (P < 0.05 for 10,000 times permutation test, Supplemental Table S12). These results imply that the epigenetic environment differs in DSB hotspots and CO sites during meiosis.
|
|
p
|
We visualized the average profiles of the epigenetic status around DSB hotspots. To show the specificity of epigenetic enrichment around DSB hotspots, we used random genomic regions with similar features as DSBs for comparison. Then we further divided the DSB hotspots into 2 groups based on decreasing recombination rate to explore the roles of epigenetic modifications in local CO formation of DSBs. As shown in Fig. 7, the CO rates of DSB sites were positively correlated with the strength of the CsRAD51A ChIP-seq signal. The H3K4me3 modification was specifically absent in meiotic DSBs and their surrounding regions, but there was no obvious association with local CO formation. Compared to random genomic loci, the flanking regions of DSB hotspots displayed more chromatin accessibility but not DSB peaks. Moreover, active chromatin was positively correlated with the recombination rates of DSBs. By contrast, DNA methylation was specifically enriched within DSBs and was negatively correlated with CO rates, especially mCG and mCHG. Our data indicate that more CsRAD51A binding, more chromatin accessibility, and lower mCG and mCHG levels are tightly associated with higher CO frequencies in cucumber DSB hotspots.
|
|
figure
|
Figure 7. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.
|
|
label
|
Figure 7.
|
|
caption
|
Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.
|
|
p
|
Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around meiotic DSB hotspots in cucumber. A) DSB hotspots in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 749; Quantile 2 in blue, n = 2,731) or random grouping (Random 1 in red, n = 749; Random 2 in blue, n = 2,731), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 962; Random 2 in blue, n = 2,518). Quantile 1 for DSBs with recombination rate higher than genomic average Quantile 2 for DSBs with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) DSB hotspots in Cuc37. Quantile 1 and Random 1 for DSBs, n = 1,079; Quantile 2 and Random 2 for DSBs, n = 4,526. Random 1 for genomic regions, n = 1,627; Random 2 for genomic regions, n = 3,978.
|
|
p
|
We also examined the profiles of epigenetic information around genes. Compared to random genomic loci that shared similar features with gene regions, the signals of CsRAD51A ChIP-seq, FAIRE-seq, mCHG, and mCHH decreased at gene body regions but slightly increased at the flanking regions. Moreover, CsRAD51A binding and chromatin accessibility were enhanced at TSSs compared to transcription termination sites (TTSs) (Fig. 8). By contrast, there was more H3K4me3 binding on gene body regions than their upstream and downstream regions, and stronger signals were shown near TSSs. We also detected lower mCG levels around TSS and TTS regions. To further explore the properties associated with CO rates in gene regions, we defined 2 groups of cucumber genes based on decreasing recombination rates. We observed significantly higher levels of CsRAD51A binding and H3K4me3 in CO-active gene regions, whereas genes with lower mCG and mCHG levels exhibited higher recombination rates. In summary, CsRAD51A binding, chromatin accessibility, and hypomethylation were positively correlated with local CO rates in gene regions.
|
|
figure
|
Figure 8. Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
|
|
label
|
Figure 8.
|
|
caption
|
Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
|
|
p
|
Aggregated plots of CsRAD51A, H3K4 trimethylation, chromatin accessibility, and DNA methylation around gene regions in cucumber. A) Gene regions in 9930 were divided into 2 groups based on decreasing recombination rate (Quantile 1 in red, n = 10,045; Quantile 2 in blue, n = 13,477) and randomly grouping (Random 1 in red, n = 10,045; Random 2 in blue, n = 13,477), respectively. For comparison, the random genomic region with the same length and chromosome distribution was taken as control (Random 1 in red, n = 11,760; Random 2 in blue, n = 11,762). Quantile 1 for genes with recombination rate higher than genomic average and Quantile 2 for genes with recombination rate lower than genomic average. Random 1 and Random 2 for randomly dividing evenly into 2 groups. Lines show the average epigenetic signals and corresponding shadows for a 95% confidence interval. H3K4me3, histone H3 lysine 4 trimethylation; FAIRE, formaldehyde-assisted isolation of regulatory elements; cM/Mb, centimorgan per megabase. B) Gene regions in Cuc37. Quantile 1 and Random 1 for genes, n = 9,906; Quantile 2 and Random 2 for genes, n = 13,616. Random 1 for genomic regions, n = 11,760; Random 2 for genomic regions, n = 11,762.
|
|
sec
|
Global CO landscape and suppression, as determined by integrating multiomic data
To further verify the relationships between multiomic features and CO landscapes, we predicted megabase-scale CO rates using the random forest algorithm with the following formula: CO rate ∼ CsRAD51A + H3K4me3 + FAIRE + mCG + mCHG + mCHH + gene density + repeat density + SNP density. The nonlinear coefficients between observed and predicted CO frequencies were 0.5342 in Cuc80 × 9930 and 0.3998 in Cuc80 × Cuc37 (Fig. 9; Supplemental Fig. S12, A and B). The ranking of importance for predicting CO frequencies in Cuc80 × 9930 (from high to low) was gene density, mCHG, chromatin accessibility, H3K4me3 modification, CsRAD51A deposition, mCG, repeat density, mCHH, and SNP density (Supplemental Fig. S12C). In predicting the CO landscape of Cuc80 × Cuc37, gene density, H3K4me3 modification, and DNA methylation appeared to be more important than the other features (Supplemental Fig. S12D). When only genomic features were considered, the correlations decreased to 0.4514 in Cuc80 × 9930 and 0.3327 in Cuc80 × Cuc37, highlighting the contributions of epigenetic information to CO formation.
Figure 9. Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.We also performed random forest classification to predict megabase-scale CO deserts (CO cold regions) and plotted the receiver operating characteristic (ROC) curve to evaluate the efficiency of each classifier. The accuracies of predictions were 96.0% in Cuc80 × 9930 and 97.4% in Cuc80 × Cuc37 individually. The random forest classifiers with epigenetic and genetic features outperformed with only genetic features in both populations (Supplemental Fig. S13). The good performance of predictions of the CO landscape and CO suppression region supports the notion that the epigenetic and genetic features discussed here are closely related to CO formation in cucumber.
|
|
title
|
Global CO landscape and suppression, as determined by integrating multiomic data
|
|
p
|
To further verify the relationships between multiomic features and CO landscapes, we predicted megabase-scale CO rates using the random forest algorithm with the following formula: CO rate ∼ CsRAD51A + H3K4me3 + FAIRE + mCG + mCHG + mCHH + gene density + repeat density + SNP density. The nonlinear coefficients between observed and predicted CO frequencies were 0.5342 in Cuc80 × 9930 and 0.3998 in Cuc80 × Cuc37 (Fig. 9; Supplemental Fig. S12, A and B). The ranking of importance for predicting CO frequencies in Cuc80 × 9930 (from high to low) was gene density, mCHG, chromatin accessibility, H3K4me3 modification, CsRAD51A deposition, mCG, repeat density, mCHH, and SNP density (Supplemental Fig. S12C). In predicting the CO landscape of Cuc80 × Cuc37, gene density, H3K4me3 modification, and DNA methylation appeared to be more important than the other features (Supplemental Fig. S12D). When only genomic features were considered, the correlations decreased to 0.4514 in Cuc80 × 9930 and 0.3327 in Cuc80 × Cuc37, highlighting the contributions of epigenetic information to CO formation.
|
|
figure
|
Figure 9. Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.
|
|
label
|
Figure 9.
|
|
caption
|
Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.
|
|
p
|
Comparison of observed and predicted CO frequency along cucumber chromosomes. The predictions of CO landscapes were performed by a random forest regression. Line plots strand for observed and predicted CO rates in 2.5-Mb sliding windows in A) Cuc80 × 9930 and B) Cuc80 × Cuc37.
|
|
p
|
We also performed random forest classification to predict megabase-scale CO deserts (CO cold regions) and plotted the receiver operating characteristic (ROC) curve to evaluate the efficiency of each classifier. The accuracies of predictions were 96.0% in Cuc80 × 9930 and 97.4% in Cuc80 × Cuc37 individually. The random forest classifiers with epigenetic and genetic features outperformed with only genetic features in both populations (Supplemental Fig. S13). The good performance of predictions of the CO landscape and CO suppression region supports the notion that the epigenetic and genetic features discussed here are closely related to CO formation in cucumber.
|
|
sec
|
CO alterations determined by distinct epigenetics and SVs
Although the distribution profiles of multiomic data along chromosomes were similar between the 2 germplasms, many local differences still existed. For the CsRAD51A ChIP-seq data, only 599 DSB hotspots from 9930 overlapped with 596 peaks from Cuc37, accounting for 17.2% and 10.6% of total peaks in 9930 and Cuc37, respectively. The vast majority of H3K4me3 peaks in both inbred lines overlapped, including 92.8% of 23,250 peaks from 9930 and 99.3% of 22,434 peaks from Cuc37. Whereas 13,855 and 14,005 active chromatin regions were identified in 9930 and Cuc37, respectively, only 6,371 peaks overlapped between the 2 lines. We also identified 1,338 mCG, 431 mCHG, and 986 mCHH hypo-differentially methylated regions (hypo-DMRs) in 9930 compared to Cuc37, as well as 2,429 mCG, 1,285 mCHG, and 2,284 mCHH hypo-DMRs in Cuc37.
To explore the potential impacts of genetic and epigenetic factors on differences in CO events between hybrids, we considered nonoverlapping COs between the 2 populations to be “unique COs” and measured the overlap of unique COs with distinct epigenetic modifications and SVs. In total, 33% to 40% of unique COs overlapped with specific DSB sites, epigenetic modifications, and heterozygous SVs from the corresponding populations. The unique COs of Cuc80 × 9930 were significantly enriched within 9930-specific DSB hotspots, SV escape (i.e. the absence of a heterozygous SV in the region), and FAIRE + H3K4me3 + hypo-mC interactions (P < 0.05 for 10,000 times permutation test, Fig. 10A; Supplemental Table S13). We detected significant correlations between Cuc37-specific DSB hotspot, SV escape, hypo-mC, SV escape + hypo-mC, DSB hotspot + SV escape, DSB hotspot + SV escape + hypo-mC interactions, and unique COs of Cuc80 × Cuc37 (P < 0.05 for 10,000 times permutation test, Fig. 10B; Supplemental Table S13). Several unique COs overlapped with these features and interactions at the local scale (Supplemental Fig. S14). These data suggest that CO alterations among hybrids could be controlled by specific DSBs, SVs, epigenetic modifications, or their interactions.
Figure 10. Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
|
|
title
|
CO alterations determined by distinct epigenetics and SVs
|
|
p
|
Although the distribution profiles of multiomic data along chromosomes were similar between the 2 germplasms, many local differences still existed. For the CsRAD51A ChIP-seq data, only 599 DSB hotspots from 9930 overlapped with 596 peaks from Cuc37, accounting for 17.2% and 10.6% of total peaks in 9930 and Cuc37, respectively. The vast majority of H3K4me3 peaks in both inbred lines overlapped, including 92.8% of 23,250 peaks from 9930 and 99.3% of 22,434 peaks from Cuc37. Whereas 13,855 and 14,005 active chromatin regions were identified in 9930 and Cuc37, respectively, only 6,371 peaks overlapped between the 2 lines. We also identified 1,338 mCG, 431 mCHG, and 986 mCHH hypo-differentially methylated regions (hypo-DMRs) in 9930 compared to Cuc37, as well as 2,429 mCG, 1,285 mCHG, and 2,284 mCHH hypo-DMRs in Cuc37.
|
|
p
|
To explore the potential impacts of genetic and epigenetic factors on differences in CO events between hybrids, we considered nonoverlapping COs between the 2 populations to be “unique COs” and measured the overlap of unique COs with distinct epigenetic modifications and SVs. In total, 33% to 40% of unique COs overlapped with specific DSB sites, epigenetic modifications, and heterozygous SVs from the corresponding populations. The unique COs of Cuc80 × 9930 were significantly enriched within 9930-specific DSB hotspots, SV escape (i.e. the absence of a heterozygous SV in the region), and FAIRE + H3K4me3 + hypo-mC interactions (P < 0.05 for 10,000 times permutation test, Fig. 10A; Supplemental Table S13). We detected significant correlations between Cuc37-specific DSB hotspot, SV escape, hypo-mC, SV escape + hypo-mC, DSB hotspot + SV escape, DSB hotspot + SV escape + hypo-mC interactions, and unique COs of Cuc80 × Cuc37 (P < 0.05 for 10,000 times permutation test, Fig. 10B; Supplemental Table S13). Several unique COs overlapped with these features and interactions at the local scale (Supplemental Fig. S14). These data suggest that CO alterations among hybrids could be controlled by specific DSBs, SVs, epigenetic modifications, or their interactions.
|
|
figure
|
Figure 10. Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
|
|
label
|
Figure 10.
|
|
caption
|
Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
|
|
p
|
Enrichment of distinct epigenetic modifications and SVs in CO alterations between cucumber hybrids. Upset diagrams show the number of unique and overlapping with distinct meiotic DSB hotspots, open chromatin (FAIRE), H3K4me3, hypo-DMR, absence of a heterozygous SV (SV escape) and interaction of features in A) Cuc80 × 9930 and B) Cuc80 × Cuc37. Pie charts show the total proportion of unique COs overlapping with distinct features and interactions. Red means features significantly overlapped with unique COs (P < 0.05 for 10,000 permutation test).
|
|
sec
|
Discussion
In this study, we draw the high-resolution landscapes of meiotic DSBs and COs and constructed a multiomic map of meiotic recombination in cucumber. We systematically analyzed the distribution of meiotic COs in different cucumber varieties and compared them. Combined with the genetic information inherited from parental lines, we investigated the impacts of genomic and epigenetic features on CO formation and alterations in the corresponding progeny.
The average number of COs per chromosome per meiosis in the cucumber hybrids was 0.92 to 0.99. The CO number per bivalent in flowering plants spans a limited range (0.76 to 1.9), with genome sizes ranging from 120 Mb to 3 Gb (Li et al. 2015; Zhang et al. 2021; Lian et al. 2022), suggesting that CO assurance is independent of genome size. Most COs occurred within or near gene bodies (Fig. 2). In addition, gene density was a major factor in predicting the CO landscape in cucumber at the megabase scale (Supplemental Fig. S12, C and D). This finding is consistent with the positive relationship between global CO frequency and gene density identified in previous studies (Marand et al. 2017; Kianian et al. 2018; Cai et al. 2023). Perhaps the preferential distribution of COs in gene-rich regions is a by-product of chromatin accessibility around genes.
Meiotic DSB formation is required for CO formation. The number and distribution of COs are tightly associated with those of DSB sites in various plants. In Arabidopsis and wheat (Triticum aestivum), mutations of SPO11-1 decreased the total number of COs (Xue et al. 2018; Hyde et al. 2023). Site-specific CO formation was induced in tomato (Solanum lycopersicum) and maize somatic cells by creating targeted DSB sites (Filler Hayut et al. 2017; Kouranov et al. 2022). Here, we determined that cucumber COs were significantly enriched in meiotic DSB hotspots. At the chromosome scale, we observed a significant positive correlation between the average number of DSB hotspots and chromosome length (Supplemental Fig. S8), which explains the varying CO numbers among different chromosomes in each population (Supplemental Fig. S1, D to F). In addition, DSB hotspots and COs mainly occurred around genes but were significantly depleted in CDS regions due to the presence of abundant nucleosomes.
Plant CO sites always harbor active chromatin and low DNA methylation levels. Here, we revealed preferential H3K4me3 marks and chromatin accessibility within CO intervals in cucumber (Supplemental Table S12). By contrast, H3K4m3 peaks were strongly depleted in 9930 and Cuc37 DSB hotspots, and open chromatin exhibited significantly fewer overlaps with 9930 DSBs than expected (Supplemental Table S12). Further analysis of epigenetic meta-profiles in DSB hotspots showed that a relaxed chromatin environment, coupled with CsRAD51A signals and hypomethylation, was positively correlated with the CO rates of cucumber DSBs (Fig. 7). Our data suggest that these euchromatin marks (H3K4me3 modifications and open chromatin) might participate in DSB repair toward CO formation. Further studies based on higher-resolution CO data are still needed. Although we did not detect significant depletion of DNA methylation in cucumber COs, we observed negative correlations between DNA methylation levels and CO frequency genome-wide (Fig. 5) and at a fine scale (such as DSBs and genes) (Figs. 7 and 8), which aligns with previous findings in Arabidopsis, potato (Solanum tuberosum), and wheat (Marand et al. 2017; Choi et al. 2018; Tock et al. 2021). A deficiency in global DNA methylation is thought to promote COs on chromosome arms but to simultaneously inhibit COs in the pericentromeric region (Melamed-Bessudo and Levy 2012; Mirouze et al. 2012; Yelina et al. 2012). It is reasonable to speculate that abolishing DNA methylation indirectly affects CO formation, perhaps in cooperation with other factors. Although some small inconsistencies existed among varieties within assigned regions, like DSBs and genes (Figs. 7 and 8), which might be partly attributed to unexplored maternal epigenetic imprinting, our data generally support the conserved roles of multiple epigenetic modifications in regulating meiotic CO formation in plants. Further studies are needed to elucidate the precise regulatory mechanisms and their interactions.
We reported that the positioning of meiotic COs is sensitive to polymorphism in cucumber. In Arabidopsis and Brassica oleracea, a parabolic relationship was observed between the SNP density and CO rate genome-wide, and fine-scale CO frequency was both positively and negatively associated with sequence polymorphism (Blackwell et al. 2020; Cai et al. 2023). The strongly increased SNP density around COs corresponds well with MutS Homolog2 (MSH2)–dependent CO remodeling, in which CO frequency in heterozygosity increased when juxtaposed with homozygosity (Blackwell et al. 2020). By contrast, CO depletion in regions with higher densities of polymorphisms might be the production of a loop structure or when homolog pairing is disturbed by heterozygous SVs. Taking advantage of this feature, COs were successfully abolished and restored by introducing and reversing SVs, respectively, between homologs in Arabidopsis (Schmidt et al. 2020; Rönspies et al. 2022). However, there were certain limitations: Szymanska-Lejman et al. (2023) found that SVs do not alter the CO frequency in Arabidopsis CO hotspots unless they are directly located within hotspots and that the stimulation of CO formation in hotspots in response to local polymorphism is dependent on MSH2.
We revealed that the CO landscapes are generally conserved between the 2 cucumber hybrids but differ in the total CO number and local distribution. The high similarity of global CO frequency might be related to the similar gene densities and H3K4me3 modifications between varieties. To date, several CO modifier loci driving allelic differences in CO formation have been identified by genetic mapping of quantitative trait loci in natural recombinant populations. Human Enhancer of Invasion 10 (HEI10) functions in the Class I CO pathway. Natural variations in AtHEI10 explained 23.3% of recombination divergence in Arabidopsis Col × Ler F2 populations, and increased AtHEI10 dosage elevates euchromatic CO (Ziolkowski et al. 2017). A premature stop codon within TBP-ASSOCIATED FACTOR 4b (TAF4b) leads to a genome-wide decrease in CO events in Arabidopsis by mediating meiocyte-specific expression and germline transcription (Lawrence et al. 2019). SUPPRESSOR OF NPR1-1 INDUCIBLE (SNI1) encodes a component of the structural maintenance of chromosome complexes (SMC5/6 complex). Mutations of SNI1 result in reduced CO interference and CO remodeling in Arabidopsis (Zhu et al. 2021). REC8, encoding a cohesin subunit required for meiosis, is a candidate gene underlying the intraspecific divergence of CO within hybrid populations of barley (Hordeumspontaneum and H. vulgare) (Dreissig et al. 2020). In addition, the local recombination rates were both significantly higher and lower in Arabidopsis F1 heterozygotes than in homozygotes (Ziolkowski et al. 2015). The poor associations between CO frequency and polymorphism level suggest that polymorphism does not directly define CO variations between hybrids.
Here, we carried out a pairwise comparison of parental epigenetic information and CO landscapes in the progenies of different cucumber varieties, allowing us to estimate the epigenetic contributions of parental inheritance in CO divergence. Nearly one-third of hybrid-specific COs overlapped with distinct genetic and epigenetic features, and the positioning of CO alterations was tightly linked to specific DSB events, heterozygous SVs, and their interactions with epigenetic modifications (Fig. 10). This observation provides a clue about how parental selection drives the formation of hybrid-specific COs at the epigenetic level. Although DNA methylation imprints are faithfully replicated during cell division, mCHH in male meiocytes undergoes reprogramming during the development of the male sexual lineage (Walker et al. 2018). Together with the germline-specific DNA methylation modifications in meiocytes compared to somatic cells, the actual proportions of CO alterations among hybrids caused by DMRs might be overestimated in this study (Fig. 10). The lack of a more refined meiocyte sampling method in cucumber and additional data prevents us from making a solid conclusion.
Plant breeders are willing to maintain beneficial gene combinations and break unfavorable linkage drag in crop improvement. To meet these requirements, precise and controllable recombination is needed for next-generation plant breeding. Further research on meiotic recombination will advance our understanding of factors that enhance or inhibit CO formation and facilitate the cooperative utilization of these factors to manipulate recombination at target sites during breeding.
|
|
title
|
Discussion
|
|
p
|
In this study, we draw the high-resolution landscapes of meiotic DSBs and COs and constructed a multiomic map of meiotic recombination in cucumber. We systematically analyzed the distribution of meiotic COs in different cucumber varieties and compared them. Combined with the genetic information inherited from parental lines, we investigated the impacts of genomic and epigenetic features on CO formation and alterations in the corresponding progeny.
|
|
p
|
The average number of COs per chromosome per meiosis in the cucumber hybrids was 0.92 to 0.99. The CO number per bivalent in flowering plants spans a limited range (0.76 to 1.9), with genome sizes ranging from 120 Mb to 3 Gb (Li et al. 2015; Zhang et al. 2021; Lian et al. 2022), suggesting that CO assurance is independent of genome size. Most COs occurred within or near gene bodies (Fig. 2). In addition, gene density was a major factor in predicting the CO landscape in cucumber at the megabase scale (Supplemental Fig. S12, C and D). This finding is consistent with the positive relationship between global CO frequency and gene density identified in previous studies (Marand et al. 2017; Kianian et al. 2018; Cai et al. 2023). Perhaps the preferential distribution of COs in gene-rich regions is a by-product of chromatin accessibility around genes.
|
|
p
|
Meiotic DSB formation is required for CO formation. The number and distribution of COs are tightly associated with those of DSB sites in various plants. In Arabidopsis and wheat (Triticum aestivum), mutations of SPO11-1 decreased the total number of COs (Xue et al. 2018; Hyde et al. 2023). Site-specific CO formation was induced in tomato (Solanum lycopersicum) and maize somatic cells by creating targeted DSB sites (Filler Hayut et al. 2017; Kouranov et al. 2022). Here, we determined that cucumber COs were significantly enriched in meiotic DSB hotspots. At the chromosome scale, we observed a significant positive correlation between the average number of DSB hotspots and chromosome length (Supplemental Fig. S8), which explains the varying CO numbers among different chromosomes in each population (Supplemental Fig. S1, D to F). In addition, DSB hotspots and COs mainly occurred around genes but were significantly depleted in CDS regions due to the presence of abundant nucleosomes.
|
|
p
|
Plant CO sites always harbor active chromatin and low DNA methylation levels. Here, we revealed preferential H3K4me3 marks and chromatin accessibility within CO intervals in cucumber (Supplemental Table S12). By contrast, H3K4m3 peaks were strongly depleted in 9930 and Cuc37 DSB hotspots, and open chromatin exhibited significantly fewer overlaps with 9930 DSBs than expected (Supplemental Table S12). Further analysis of epigenetic meta-profiles in DSB hotspots showed that a relaxed chromatin environment, coupled with CsRAD51A signals and hypomethylation, was positively correlated with the CO rates of cucumber DSBs (Fig. 7). Our data suggest that these euchromatin marks (H3K4me3 modifications and open chromatin) might participate in DSB repair toward CO formation. Further studies based on higher-resolution CO data are still needed. Although we did not detect significant depletion of DNA methylation in cucumber COs, we observed negative correlations between DNA methylation levels and CO frequency genome-wide (Fig. 5) and at a fine scale (such as DSBs and genes) (Figs. 7 and 8), which aligns with previous findings in Arabidopsis, potato (Solanum tuberosum), and wheat (Marand et al. 2017; Choi et al. 2018; Tock et al. 2021). A deficiency in global DNA methylation is thought to promote COs on chromosome arms but to simultaneously inhibit COs in the pericentromeric region (Melamed-Bessudo and Levy 2012; Mirouze et al. 2012; Yelina et al. 2012). It is reasonable to speculate that abolishing DNA methylation indirectly affects CO formation, perhaps in cooperation with other factors. Although some small inconsistencies existed among varieties within assigned regions, like DSBs and genes (Figs. 7 and 8), which might be partly attributed to unexplored maternal epigenetic imprinting, our data generally support the conserved roles of multiple epigenetic modifications in regulating meiotic CO formation in plants. Further studies are needed to elucidate the precise regulatory mechanisms and their interactions.
|
|
p
|
We reported that the positioning of meiotic COs is sensitive to polymorphism in cucumber. In Arabidopsis and Brassica oleracea, a parabolic relationship was observed between the SNP density and CO rate genome-wide, and fine-scale CO frequency was both positively and negatively associated with sequence polymorphism (Blackwell et al. 2020; Cai et al. 2023). The strongly increased SNP density around COs corresponds well with MutS Homolog2 (MSH2)–dependent CO remodeling, in which CO frequency in heterozygosity increased when juxtaposed with homozygosity (Blackwell et al. 2020). By contrast, CO depletion in regions with higher densities of polymorphisms might be the production of a loop structure or when homolog pairing is disturbed by heterozygous SVs. Taking advantage of this feature, COs were successfully abolished and restored by introducing and reversing SVs, respectively, between homologs in Arabidopsis (Schmidt et al. 2020; Rönspies et al. 2022). However, there were certain limitations: Szymanska-Lejman et al. (2023) found that SVs do not alter the CO frequency in Arabidopsis CO hotspots unless they are directly located within hotspots and that the stimulation of CO formation in hotspots in response to local polymorphism is dependent on MSH2.
|
|
p
|
We revealed that the CO landscapes are generally conserved between the 2 cucumber hybrids but differ in the total CO number and local distribution. The high similarity of global CO frequency might be related to the similar gene densities and H3K4me3 modifications between varieties. To date, several CO modifier loci driving allelic differences in CO formation have been identified by genetic mapping of quantitative trait loci in natural recombinant populations. Human Enhancer of Invasion 10 (HEI10) functions in the Class I CO pathway. Natural variations in AtHEI10 explained 23.3% of recombination divergence in Arabidopsis Col × Ler F2 populations, and increased AtHEI10 dosage elevates euchromatic CO (Ziolkowski et al. 2017). A premature stop codon within TBP-ASSOCIATED FACTOR 4b (TAF4b) leads to a genome-wide decrease in CO events in Arabidopsis by mediating meiocyte-specific expression and germline transcription (Lawrence et al. 2019). SUPPRESSOR OF NPR1-1 INDUCIBLE (SNI1) encodes a component of the structural maintenance of chromosome complexes (SMC5/6 complex). Mutations of SNI1 result in reduced CO interference and CO remodeling in Arabidopsis (Zhu et al. 2021). REC8, encoding a cohesin subunit required for meiosis, is a candidate gene underlying the intraspecific divergence of CO within hybrid populations of barley (Hordeumspontaneum and H. vulgare) (Dreissig et al. 2020). In addition, the local recombination rates were both significantly higher and lower in Arabidopsis F1 heterozygotes than in homozygotes (Ziolkowski et al. 2015). The poor associations between CO frequency and polymorphism level suggest that polymorphism does not directly define CO variations between hybrids.
|
|
p
|
Here, we carried out a pairwise comparison of parental epigenetic information and CO landscapes in the progenies of different cucumber varieties, allowing us to estimate the epigenetic contributions of parental inheritance in CO divergence. Nearly one-third of hybrid-specific COs overlapped with distinct genetic and epigenetic features, and the positioning of CO alterations was tightly linked to specific DSB events, heterozygous SVs, and their interactions with epigenetic modifications (Fig. 10). This observation provides a clue about how parental selection drives the formation of hybrid-specific COs at the epigenetic level. Although DNA methylation imprints are faithfully replicated during cell division, mCHH in male meiocytes undergoes reprogramming during the development of the male sexual lineage (Walker et al. 2018). Together with the germline-specific DNA methylation modifications in meiocytes compared to somatic cells, the actual proportions of CO alterations among hybrids caused by DMRs might be overestimated in this study (Fig. 10). The lack of a more refined meiocyte sampling method in cucumber and additional data prevents us from making a solid conclusion.
|
|
p
|
Plant breeders are willing to maintain beneficial gene combinations and break unfavorable linkage drag in crop improvement. To meet these requirements, precise and controllable recombination is needed for next-generation plant breeding. Further research on meiotic recombination will advance our understanding of factors that enhance or inhibit CO formation and facilitate the cooperative utilization of these factors to manipulate recombination at target sites during breeding.
|
|
sec
|
Materials and methods
Plant materials and growth conditions
Cucumber (C. sativus L.) inbred lines Cuc37 (Eurasian type), Cuc80 (Xishuangbanna type), and 9930 (East Asian type) were used for the construction of F2 populations. Cuc80 was chosen as a shared female parent to cross with 9930 and Cuc37, respectively. The obtained F1 plants were self-pollinated to produce F2 offspring. Male buds undergoing meiosis stage (Bai et al. 2004) from Cuc37 and 9930 were harvested for CsRAD51 ChIP-seq, and meiotic stamens were collected for H3K4me3 ChIP-seq, FAIRE-seq, WGBS, and RNA-seq analysis. Seedlings at the 2-leaf stage were planted in soil with standard water and pest control management under greenhouse conditions.
Phylogenetic analysis of recA-like proteins
Sequences of recombinase A (recA)-like proteins representing 12 species were collected from NCBI, TAIR, and UniProtKB/Swiss-Prot protein database. Phylogenetic relationship was constructed by ClustalW using maximum likelihood method with 1,000 bootstraps in MEGA X (Kumar et al. 2018) and visualized by Evolview (http://evolgenius.info/#/). Accession numbers of proteins are listed in Supplemental Table S14.
Plasmid construction and hairy root transformation
The full-length CsRAD51A cDNA were cloned with specific primers. The target sgRNA site against CsRAD51A CDS was designed and inserted into a circular GFP CRISPR/Cas9 vector pKSE402, which was modified from pKSE401 (Xing et al. 2014).
Soaked seeds were germinated on moist filter papers in the dark at 28 °C, and cotyledons from 7-d-old seedlings were cut into 2 pieces. Hairy roots derived from cotyledon explants were infected with Agrobacterium rhizogenes strain ArQual harboring pKSE402-CsRAD51A or empty pKSE402 vector and cocultivated for 2 d as previously described (Pak et al. 2009). Inducted hairy roots with GFP grown on antibiotic MS media were firstly in vivo screened with a fluorescence stereo microscope and transformed to fresh media for inducing root extension. The absence of Cas9 and mutation of targeting sites in genomic DNA were detected with specific PCR primers (Supplemental Table S15).
Antibodies
The anti-CsRAD51A (customized from Shanghai Youke) was raised in rabbits with synthetic peptides corresponding to amino acids 168 to 340 of the CsRAD51A protein sequence. Anti-β-actin (Sigma) and anti-CsRAD51A were used as primary antibodies in western blotting and goat anti-rabbit IgG-HRP (Beijing Bersee) as a secondary antibody. After specificity assay, anti-CsRAD51A and a commercial antibody anti-H3K4me3 (Abcam) were further employed in ChIP-seq assays.
Western blotting
Total proteins were extracted using the buffer containing 50 mM Tris–HCl (pH7.4), 150 mM NaCl, 1 mM EDTA, 0.1% (v/v) Triton X-100, protease inhibitor cocktail (Sigma), and phenylmethylsulfonyl fluoride and then quantified using BCA assay kit (Pierce). The protein extracts were separated by 12% Sodium dodecyl-sulfate polyacrylamide gel electrophoresis gels and then electotransferred to Polyvinylidene Difluorid membranes and incubated with anti-β-actin antibody (dilution 1:3,000) and anti-CsRAD51A (dilution 1:500). After washing, the immunoblotting membranes were analyzed using SuperSignal West Femto Kit (Thermo Fisher).
ChIP-seq and FAIRE-seq
Tissues were fixed by vacuum filtration 1% (w/v) formaldehyde and quenched in 0.25 M glycine. ChIP-seq methodology was adapted from previous report (Zhu et al. 2012). Chromatin–protein complexes were isolated with extraction buffer and sonicated with a Bioruptor (Diagenode) for 11 cycles. About 10 µL of chromatin lysate before IP was reserved as input control, and the remains were incubated overnight with the antibody on a rotator at 4 °C. Then ChIP-enriched and input chromatin was released from complexes with reverse crosslinking and purified with phenol and chloroform.
FAIRE-seq experiment was performed following the previous protocol with some modifications (Simon et al. 2012). After fixation and sonication, the chromatin lysate separated nucleosome-depleted DNA from proteins by phenol–chloroform extraction without specific antibodies. Then the purified FAIRE and input DNA products were eluted into 20 µL 10 mM Tris–HCl (pH 7.4).
DNA concentration and size distribution were assessed using Qubit fluorometer (Invitrogen) and Agilent 2100 Bioanalyzer (Agilent). ChIP and FAIRE libraries were constructed using NEXTFLEX ChIP-Seq Kit (PerkinElmer), sequenced on Illumina HiSeq with PE150 mode, and about 6 Gb output data were generated per library (Supplemental Table S16).
RNA-seq, WGBS, and resequencing
Total RNA was extracted using TRIzol reagent following the manufacturer's protocols (Invitrogen). Genomic DNAs were extracted with an established cetyl trimethylammonium bromide protocol (Murray and Thompson 1980). Nucleic acid products were quantified using 1% agarose gels and Agilent 2100 (Agilent). WGBS libraries were constructed as previously described (Zhong et al. 2013). Libraries for RNA-seq and DNA resequencing were prepared with NEBNext Ultra RNA library prep kit (NEB) and Annoroad Universal DNA Fragmentase kit (Annoroad), respectively. Sequencing was performed on Illumina NovaSeq PE150 platform. Each library for WGBS analysis produced 12 Gb of data (Supplemental Table S17). And output data from RNA-seq and resequencing library were about 5 Gb per sample (Supplemental Tables S18 and S19).
ChIP-seq and FAIRE-seq data alignment and processing
For paired-end ChIP-seq and FAIRE-seq, the removal of adaptor sequences and quality filter of reads with quality <Q30 and shorter than 30 bases was performed with FASTQ v0.20.1 (Chen et al. 2018). The trimmed reads were mapped to the cucumber reference genome ‘9930' v3 (Li et al. 2019) using Bowtie2 v2.3.4.3 (Langmead and Salzberg 2012), with the following parameters: --very-sensitive --no-mixed --no-discordant -k 4.
We employed SAMtools v1.9 (Li et al. 2009) to further filter aligned read pairs that contained more than 6 mismatches, only 1 read in pair, or its alignment score Mapping Quality (MAPQ) <2. For the read pairs aligned to multiple positions, only the alignment with the highest alignment score remained. Alignments with MAPQ ≥23 that aligned to only 1 genomic location remained. Then the remaining alignments were written into BAM files, further transformed to bigWig and bedGraph formats utilizing bamCoverage tool from deepTools v3.1.3 (Ramírez et al. 2014), and normalized to counts per million mapped reads (CPM) in 1-bp and 1-Mb adjacent genomic windows, respectively.
Bisulfite sequencing data alignment and processing
For bisulfite sequencing data, adaptor sequences, low-quality bases (quality <20), and reads shorter than 30 bases were filtered using FASTQ. The filtered reads were mapped to the genome sequence using Bismark v0.20.0 (Krueger and Andrews 2011). The best alignment was retained when read pairs mapped to multiple genomic locations. The proportions of DNA methylation contexts (CG, CHG, and CHH) were calculated using Bismark Methylation extractor. And Bismark2bedGraph module in Bismark and bedGraphToBigWig from USCS toolkit were used to generate bigWig files for checking in IGV browser (Thorvaldsdóttir et al. 2013).
RNA-seq data alignment and processing
The raw RNA-seq data were trimmed by FASTQ, and adaptors and reads with low-quality (quality <20) or lengths shorter than 30 bases were removed in the step. The clean reads were aligned to the genome sequence using HISAT2 v2.1.0 (Kim et al. 2015), with the following settings: --no-unal --no-discordant --no-mixed --rna-strandness RF -k 5. Unique read pairs with high-quality alignment (MAPQ = 60) were retained. For alignments with MAPQ <60, containing more than 6 mismatches or consisting of only 1 read in a pair was discarded. Then transcripts per million (TPM) values were calculated, and sorted bam files were transformed into bigWig formats. The read counts for each gene were obtained by FeatureCounts (v.2.0.1) from BAM files (Liao et al. 2014). DEG analysis was performed using DESeq2 (v.1.30.0 R package) with default parameters (Love et al. 2014). To identify DEGs in our study, we set a threshold for statistical significance, where genes showing a P < 0.05 and |fold change| ≥ 2 were deemed to be differentially expressed.
Genotyping and identification of CO events in F2 populations
Syntenic regions between biparents of 2 populations were described in cucumber pan-genome assay (Li et al. 2022). Sequencing reads were aligned against reference genome using Bowtie2, and SNPs and indels were called with GATK v4.0.12.0 (McKenna et al. 2010). To obtain credible markers, variants on syntenic regions were filtered by bcftools with the following parameters: N_ALT >1, QD <2, FS >60, SOR >3, MQ <40, MQRankSum <−12.5, and ReadPosRankSum <−8.0. Missing genotypes in F2 population were imputed with BEAGLE 4.1 (Browning and Browning 2007). Then, every chromosome was split into bins with 1,000 SNPs to roughly estimate COs, and each maker in adjacent bins with genotype changes was scored to find CO boundaries.
Differential recombination regions and CO hotspot identification
Permutation tests were employed to assess the differences in recombination rates between 2 populations. Considering the continuity of meiotic CO in adjacent windows, each chromosome was considered to be a whole unit. Firstly, the order of 7 chromosomes was rearranged, each chromosome was randomly inverted, and the origin of data from which population was also randomly chosen. Secondly, the differences in CO rates of each 50-kb bin between the 2 populations were calculated. Thirdly, a series of hypothetic differences in all the bins were computed by performing permutation 10,000 times. And the P-value represented the proportion of hypothetic difference, which was larger than the observation.
A unique CO hotspot was defined as the local recombination rate in 1 population that was higher than 5 times of genomic average, and the corresponding recombination in another population was lower than the average. The landscape of CO comparison was drawn with shinyChromosome (Yu et al. 2019).
GO term enrichment analysis
We performed enrichment analysis of genes overlapped within regions of interest with R package “GOstats.” Clustered heatmaps of significantly enriched GO terms (P < 0.05) were constructed with “ComplexHeatmap” package.
ChIP-seq and FAIRE-seq peak calling
CsRAD51A IP and input data sets generated from meiotic and nonmeiotic tissues were supplied to PeakRanger with ranger function (Feng et al. 2011). Then the meiosis-specific DSB peaks were obtained after differential peak comparison between meiotic and control peaks by MAnorm (Tu et al. 2021) with 1.8-fold change criteria. H3K4me3 peaks were analyzed with MACS2 (Zhang et al. 2008), and the overlapping peaks between biological replicates were filtered with BEDTools intersect function (Quinlan and Hall 2010). For FAIRE-seq analysis, peak callings were performed by fseq2 (Zhao and Boyle 2021). Before CsRAD51A ChIP-seq and FAIRE-seq peak calling, bam files of different replicates from the same tissues were merged.
Enrichment annotation and motif analysis
The enrichment significance within regions of interest was analyzed with 10,000 times permutation test by R package “regioneR.” DSB-enriched motifs were generated by HOMER using findMotifGenome.pl script (Heinz et al. 2010) with the P < 0.05 and enrichment >30%. A phylogenetic analysis of motif matrixes was built by R package “universalmotif” based on position weight matrix (PWM).
Genome-wide multiomic landscapes and correlation analysis
The log2([ChIP + 1]/[input + 1]) coverage ratios of ChIP-seq and FAIRE-seq, TPM values of RNA-seq, DNA methylation proportions, and densities of the genes, repeats, and SNPs were calculated in nonoverlapping 500-kb bins and plotted using Circos v0.67 (Krzywinski et al. 2009). The pairwise Spearman correlation coefficients between multiomics were calculated with “rcorr” function of the “Hmisc” package, and heatmaps were plotted using the “corrplot” package in R.
Chromatin status around DSB hotspots and genes
Multiomic signal profiles around DSB hotspots and genes were obtained from bigWig files by computeMatrix “scale-regions” mode in deepTools. The average coverage ratios of epigenetic features from 2-kb upstream to 2-kb downstream were calculated per 20-bp window. Each profile of epigenetic features was grouped into 2 groups by decreasing recombination rate or randomly. In addition, genomic regions with similar characteristics were randomly chosen as the control. Average fitted values of epigenetic features of each group were respectively visualized with 95% confidence intervals using R.
Mega-scale CO frequency and CO suppression prediction
Predictions of CO frequency and CO suppression were performed by random forest regression and classification with 7-fold crossvalidation. The whole genome was divided into nonoverlapping 500-kb bins. The coverage profiles of multiomics were taken as independent variables, recombination rate, and whether CO desert as the dependent variable. Every chromosome was split into 7 equals, a seventh of a chromosome was randomly chosen as testing data, and the remaining as training data. These random forest models were carried out by the R package “randomForest” (mtry = 3, ntree = 1,500). The importance of features was identified by the mean decrease in node impurity (%IncMSE) with the “importance” function. Model performance in regression analysis was assessed according to MSE and nonlinear correlation coefficient of predictions and observations (“nlcor” package). For classification analysis, P-value of the testing set was optimized with its training set, and the predictions were appointed as CO suppression when the probabilities were higher than the P-value. The overall performance of random forest classification was given by mean accuracy of predictions and ROC curve.
Accession numbers
All sequencing data generated in this study were deposited in the NCBI database under BioProject accession numbers PRJNA936557 and PRJNA936649.
|
|
title
|
Materials and methods
|
|
sec
|
Plant materials and growth conditions
Cucumber (C. sativus L.) inbred lines Cuc37 (Eurasian type), Cuc80 (Xishuangbanna type), and 9930 (East Asian type) were used for the construction of F2 populations. Cuc80 was chosen as a shared female parent to cross with 9930 and Cuc37, respectively. The obtained F1 plants were self-pollinated to produce F2 offspring. Male buds undergoing meiosis stage (Bai et al. 2004) from Cuc37 and 9930 were harvested for CsRAD51 ChIP-seq, and meiotic stamens were collected for H3K4me3 ChIP-seq, FAIRE-seq, WGBS, and RNA-seq analysis. Seedlings at the 2-leaf stage were planted in soil with standard water and pest control management under greenhouse conditions.
|
|
title
|
Plant materials and growth conditions
|
|
p
|
Cucumber (C. sativus L.) inbred lines Cuc37 (Eurasian type), Cuc80 (Xishuangbanna type), and 9930 (East Asian type) were used for the construction of F2 populations. Cuc80 was chosen as a shared female parent to cross with 9930 and Cuc37, respectively. The obtained F1 plants were self-pollinated to produce F2 offspring. Male buds undergoing meiosis stage (Bai et al. 2004) from Cuc37 and 9930 were harvested for CsRAD51 ChIP-seq, and meiotic stamens were collected for H3K4me3 ChIP-seq, FAIRE-seq, WGBS, and RNA-seq analysis. Seedlings at the 2-leaf stage were planted in soil with standard water and pest control management under greenhouse conditions.
|
|
sec
|
Phylogenetic analysis of recA-like proteins
Sequences of recombinase A (recA)-like proteins representing 12 species were collected from NCBI, TAIR, and UniProtKB/Swiss-Prot protein database. Phylogenetic relationship was constructed by ClustalW using maximum likelihood method with 1,000 bootstraps in MEGA X (Kumar et al. 2018) and visualized by Evolview (http://evolgenius.info/#/). Accession numbers of proteins are listed in Supplemental Table S14.
|
|
title
|
Phylogenetic analysis of recA-like proteins
|
|
p
|
Sequences of recombinase A (recA)-like proteins representing 12 species were collected from NCBI, TAIR, and UniProtKB/Swiss-Prot protein database. Phylogenetic relationship was constructed by ClustalW using maximum likelihood method with 1,000 bootstraps in MEGA X (Kumar et al. 2018) and visualized by Evolview (http://evolgenius.info/#/). Accession numbers of proteins are listed in Supplemental Table S14.
|
|
sec
|
Plasmid construction and hairy root transformation
The full-length CsRAD51A cDNA were cloned with specific primers. The target sgRNA site against CsRAD51A CDS was designed and inserted into a circular GFP CRISPR/Cas9 vector pKSE402, which was modified from pKSE401 (Xing et al. 2014).
Soaked seeds were germinated on moist filter papers in the dark at 28 °C, and cotyledons from 7-d-old seedlings were cut into 2 pieces. Hairy roots derived from cotyledon explants were infected with Agrobacterium rhizogenes strain ArQual harboring pKSE402-CsRAD51A or empty pKSE402 vector and cocultivated for 2 d as previously described (Pak et al. 2009). Inducted hairy roots with GFP grown on antibiotic MS media were firstly in vivo screened with a fluorescence stereo microscope and transformed to fresh media for inducing root extension. The absence of Cas9 and mutation of targeting sites in genomic DNA were detected with specific PCR primers (Supplemental Table S15).
|
|
title
|
Plasmid construction and hairy root transformation
|
|
p
|
The full-length CsRAD51A cDNA were cloned with specific primers. The target sgRNA site against CsRAD51A CDS was designed and inserted into a circular GFP CRISPR/Cas9 vector pKSE402, which was modified from pKSE401 (Xing et al. 2014).
|
|
p
|
Soaked seeds were germinated on moist filter papers in the dark at 28 °C, and cotyledons from 7-d-old seedlings were cut into 2 pieces. Hairy roots derived from cotyledon explants were infected with Agrobacterium rhizogenes strain ArQual harboring pKSE402-CsRAD51A or empty pKSE402 vector and cocultivated for 2 d as previously described (Pak et al. 2009). Inducted hairy roots with GFP grown on antibiotic MS media were firstly in vivo screened with a fluorescence stereo microscope and transformed to fresh media for inducing root extension. The absence of Cas9 and mutation of targeting sites in genomic DNA were detected with specific PCR primers (Supplemental Table S15).
|
|
sec
|
Antibodies
The anti-CsRAD51A (customized from Shanghai Youke) was raised in rabbits with synthetic peptides corresponding to amino acids 168 to 340 of the CsRAD51A protein sequence. Anti-β-actin (Sigma) and anti-CsRAD51A were used as primary antibodies in western blotting and goat anti-rabbit IgG-HRP (Beijing Bersee) as a secondary antibody. After specificity assay, anti-CsRAD51A and a commercial antibody anti-H3K4me3 (Abcam) were further employed in ChIP-seq assays.
|
|
title
|
Antibodies
|
|
p
|
The anti-CsRAD51A (customized from Shanghai Youke) was raised in rabbits with synthetic peptides corresponding to amino acids 168 to 340 of the CsRAD51A protein sequence. Anti-β-actin (Sigma) and anti-CsRAD51A were used as primary antibodies in western blotting and goat anti-rabbit IgG-HRP (Beijing Bersee) as a secondary antibody. After specificity assay, anti-CsRAD51A and a commercial antibody anti-H3K4me3 (Abcam) were further employed in ChIP-seq assays.
|
|
sec
|
Western blotting
Total proteins were extracted using the buffer containing 50 mM Tris–HCl (pH7.4), 150 mM NaCl, 1 mM EDTA, 0.1% (v/v) Triton X-100, protease inhibitor cocktail (Sigma), and phenylmethylsulfonyl fluoride and then quantified using BCA assay kit (Pierce). The protein extracts were separated by 12% Sodium dodecyl-sulfate polyacrylamide gel electrophoresis gels and then electotransferred to Polyvinylidene Difluorid membranes and incubated with anti-β-actin antibody (dilution 1:3,000) and anti-CsRAD51A (dilution 1:500). After washing, the immunoblotting membranes were analyzed using SuperSignal West Femto Kit (Thermo Fisher).
|
|
title
|
Western blotting
|
|
p
|
Total proteins were extracted using the buffer containing 50 mM Tris–HCl (pH7.4), 150 mM NaCl, 1 mM EDTA, 0.1% (v/v) Triton X-100, protease inhibitor cocktail (Sigma), and phenylmethylsulfonyl fluoride and then quantified using BCA assay kit (Pierce). The protein extracts were separated by 12% Sodium dodecyl-sulfate polyacrylamide gel electrophoresis gels and then electotransferred to Polyvinylidene Difluorid membranes and incubated with anti-β-actin antibody (dilution 1:3,000) and anti-CsRAD51A (dilution 1:500). After washing, the immunoblotting membranes were analyzed using SuperSignal West Femto Kit (Thermo Fisher).
|
|
sec
|
ChIP-seq and FAIRE-seq
Tissues were fixed by vacuum filtration 1% (w/v) formaldehyde and quenched in 0.25 M glycine. ChIP-seq methodology was adapted from previous report (Zhu et al. 2012). Chromatin–protein complexes were isolated with extraction buffer and sonicated with a Bioruptor (Diagenode) for 11 cycles. About 10 µL of chromatin lysate before IP was reserved as input control, and the remains were incubated overnight with the antibody on a rotator at 4 °C. Then ChIP-enriched and input chromatin was released from complexes with reverse crosslinking and purified with phenol and chloroform.
FAIRE-seq experiment was performed following the previous protocol with some modifications (Simon et al. 2012). After fixation and sonication, the chromatin lysate separated nucleosome-depleted DNA from proteins by phenol–chloroform extraction without specific antibodies. Then the purified FAIRE and input DNA products were eluted into 20 µL 10 mM Tris–HCl (pH 7.4).
DNA concentration and size distribution were assessed using Qubit fluorometer (Invitrogen) and Agilent 2100 Bioanalyzer (Agilent). ChIP and FAIRE libraries were constructed using NEXTFLEX ChIP-Seq Kit (PerkinElmer), sequenced on Illumina HiSeq with PE150 mode, and about 6 Gb output data were generated per library (Supplemental Table S16).
|
|
title
|
ChIP-seq and FAIRE-seq
|
|
p
|
Tissues were fixed by vacuum filtration 1% (w/v) formaldehyde and quenched in 0.25 M glycine. ChIP-seq methodology was adapted from previous report (Zhu et al. 2012). Chromatin–protein complexes were isolated with extraction buffer and sonicated with a Bioruptor (Diagenode) for 11 cycles. About 10 µL of chromatin lysate before IP was reserved as input control, and the remains were incubated overnight with the antibody on a rotator at 4 °C. Then ChIP-enriched and input chromatin was released from complexes with reverse crosslinking and purified with phenol and chloroform.
|
|
p
|
FAIRE-seq experiment was performed following the previous protocol with some modifications (Simon et al. 2012). After fixation and sonication, the chromatin lysate separated nucleosome-depleted DNA from proteins by phenol–chloroform extraction without specific antibodies. Then the purified FAIRE and input DNA products were eluted into 20 µL 10 mM Tris–HCl (pH 7.4).
|
|
p
|
DNA concentration and size distribution were assessed using Qubit fluorometer (Invitrogen) and Agilent 2100 Bioanalyzer (Agilent). ChIP and FAIRE libraries were constructed using NEXTFLEX ChIP-Seq Kit (PerkinElmer), sequenced on Illumina HiSeq with PE150 mode, and about 6 Gb output data were generated per library (Supplemental Table S16).
|
|
sec
|
RNA-seq, WGBS, and resequencing
Total RNA was extracted using TRIzol reagent following the manufacturer's protocols (Invitrogen). Genomic DNAs were extracted with an established cetyl trimethylammonium bromide protocol (Murray and Thompson 1980). Nucleic acid products were quantified using 1% agarose gels and Agilent 2100 (Agilent). WGBS libraries were constructed as previously described (Zhong et al. 2013). Libraries for RNA-seq and DNA resequencing were prepared with NEBNext Ultra RNA library prep kit (NEB) and Annoroad Universal DNA Fragmentase kit (Annoroad), respectively. Sequencing was performed on Illumina NovaSeq PE150 platform. Each library for WGBS analysis produced 12 Gb of data (Supplemental Table S17). And output data from RNA-seq and resequencing library were about 5 Gb per sample (Supplemental Tables S18 and S19).
|
|
title
|
RNA-seq, WGBS, and resequencing
|
|
p
|
Total RNA was extracted using TRIzol reagent following the manufacturer's protocols (Invitrogen). Genomic DNAs were extracted with an established cetyl trimethylammonium bromide protocol (Murray and Thompson 1980). Nucleic acid products were quantified using 1% agarose gels and Agilent 2100 (Agilent). WGBS libraries were constructed as previously described (Zhong et al. 2013). Libraries for RNA-seq and DNA resequencing were prepared with NEBNext Ultra RNA library prep kit (NEB) and Annoroad Universal DNA Fragmentase kit (Annoroad), respectively. Sequencing was performed on Illumina NovaSeq PE150 platform. Each library for WGBS analysis produced 12 Gb of data (Supplemental Table S17). And output data from RNA-seq and resequencing library were about 5 Gb per sample (Supplemental Tables S18 and S19).
|
|
sec
|
ChIP-seq and FAIRE-seq data alignment and processing
For paired-end ChIP-seq and FAIRE-seq, the removal of adaptor sequences and quality filter of reads with quality <Q30 and shorter than 30 bases was performed with FASTQ v0.20.1 (Chen et al. 2018). The trimmed reads were mapped to the cucumber reference genome ‘9930' v3 (Li et al. 2019) using Bowtie2 v2.3.4.3 (Langmead and Salzberg 2012), with the following parameters: --very-sensitive --no-mixed --no-discordant -k 4.
We employed SAMtools v1.9 (Li et al. 2009) to further filter aligned read pairs that contained more than 6 mismatches, only 1 read in pair, or its alignment score Mapping Quality (MAPQ) <2. For the read pairs aligned to multiple positions, only the alignment with the highest alignment score remained. Alignments with MAPQ ≥23 that aligned to only 1 genomic location remained. Then the remaining alignments were written into BAM files, further transformed to bigWig and bedGraph formats utilizing bamCoverage tool from deepTools v3.1.3 (Ramírez et al. 2014), and normalized to counts per million mapped reads (CPM) in 1-bp and 1-Mb adjacent genomic windows, respectively.
|
|
title
|
ChIP-seq and FAIRE-seq data alignment and processing
|
|
p
|
For paired-end ChIP-seq and FAIRE-seq, the removal of adaptor sequences and quality filter of reads with quality <Q30 and shorter than 30 bases was performed with FASTQ v0.20.1 (Chen et al. 2018). The trimmed reads were mapped to the cucumber reference genome ‘9930' v3 (Li et al. 2019) using Bowtie2 v2.3.4.3 (Langmead and Salzberg 2012), with the following parameters: --very-sensitive --no-mixed --no-discordant -k 4.
|
|
p
|
We employed SAMtools v1.9 (Li et al. 2009) to further filter aligned read pairs that contained more than 6 mismatches, only 1 read in pair, or its alignment score Mapping Quality (MAPQ) <2. For the read pairs aligned to multiple positions, only the alignment with the highest alignment score remained. Alignments with MAPQ ≥23 that aligned to only 1 genomic location remained. Then the remaining alignments were written into BAM files, further transformed to bigWig and bedGraph formats utilizing bamCoverage tool from deepTools v3.1.3 (Ramírez et al. 2014), and normalized to counts per million mapped reads (CPM) in 1-bp and 1-Mb adjacent genomic windows, respectively.
|
|
sec
|
Bisulfite sequencing data alignment and processing
For bisulfite sequencing data, adaptor sequences, low-quality bases (quality <20), and reads shorter than 30 bases were filtered using FASTQ. The filtered reads were mapped to the genome sequence using Bismark v0.20.0 (Krueger and Andrews 2011). The best alignment was retained when read pairs mapped to multiple genomic locations. The proportions of DNA methylation contexts (CG, CHG, and CHH) were calculated using Bismark Methylation extractor. And Bismark2bedGraph module in Bismark and bedGraphToBigWig from USCS toolkit were used to generate bigWig files for checking in IGV browser (Thorvaldsdóttir et al. 2013).
|
|
title
|
Bisulfite sequencing data alignment and processing
|
|
p
|
For bisulfite sequencing data, adaptor sequences, low-quality bases (quality <20), and reads shorter than 30 bases were filtered using FASTQ. The filtered reads were mapped to the genome sequence using Bismark v0.20.0 (Krueger and Andrews 2011). The best alignment was retained when read pairs mapped to multiple genomic locations. The proportions of DNA methylation contexts (CG, CHG, and CHH) were calculated using Bismark Methylation extractor. And Bismark2bedGraph module in Bismark and bedGraphToBigWig from USCS toolkit were used to generate bigWig files for checking in IGV browser (Thorvaldsdóttir et al. 2013).
|
|
sec
|
RNA-seq data alignment and processing
The raw RNA-seq data were trimmed by FASTQ, and adaptors and reads with low-quality (quality <20) or lengths shorter than 30 bases were removed in the step. The clean reads were aligned to the genome sequence using HISAT2 v2.1.0 (Kim et al. 2015), with the following settings: --no-unal --no-discordant --no-mixed --rna-strandness RF -k 5. Unique read pairs with high-quality alignment (MAPQ = 60) were retained. For alignments with MAPQ <60, containing more than 6 mismatches or consisting of only 1 read in a pair was discarded. Then transcripts per million (TPM) values were calculated, and sorted bam files were transformed into bigWig formats. The read counts for each gene were obtained by FeatureCounts (v.2.0.1) from BAM files (Liao et al. 2014). DEG analysis was performed using DESeq2 (v.1.30.0 R package) with default parameters (Love et al. 2014). To identify DEGs in our study, we set a threshold for statistical significance, where genes showing a P < 0.05 and |fold change| ≥ 2 were deemed to be differentially expressed.
|
|
title
|
RNA-seq data alignment and processing
|
|
p
|
The raw RNA-seq data were trimmed by FASTQ, and adaptors and reads with low-quality (quality <20) or lengths shorter than 30 bases were removed in the step. The clean reads were aligned to the genome sequence using HISAT2 v2.1.0 (Kim et al. 2015), with the following settings: --no-unal --no-discordant --no-mixed --rna-strandness RF -k 5. Unique read pairs with high-quality alignment (MAPQ = 60) were retained. For alignments with MAPQ <60, containing more than 6 mismatches or consisting of only 1 read in a pair was discarded. Then transcripts per million (TPM) values were calculated, and sorted bam files were transformed into bigWig formats. The read counts for each gene were obtained by FeatureCounts (v.2.0.1) from BAM files (Liao et al. 2014). DEG analysis was performed using DESeq2 (v.1.30.0 R package) with default parameters (Love et al. 2014). To identify DEGs in our study, we set a threshold for statistical significance, where genes showing a P < 0.05 and |fold change| ≥ 2 were deemed to be differentially expressed.
|
|
sec
|
Genotyping and identification of CO events in F2 populations
Syntenic regions between biparents of 2 populations were described in cucumber pan-genome assay (Li et al. 2022). Sequencing reads were aligned against reference genome using Bowtie2, and SNPs and indels were called with GATK v4.0.12.0 (McKenna et al. 2010). To obtain credible markers, variants on syntenic regions were filtered by bcftools with the following parameters: N_ALT >1, QD <2, FS >60, SOR >3, MQ <40, MQRankSum <−12.5, and ReadPosRankSum <−8.0. Missing genotypes in F2 population were imputed with BEAGLE 4.1 (Browning and Browning 2007). Then, every chromosome was split into bins with 1,000 SNPs to roughly estimate COs, and each maker in adjacent bins with genotype changes was scored to find CO boundaries.
|
|
title
|
Genotyping and identification of CO events in F2 populations
|
|
p
|
Syntenic regions between biparents of 2 populations were described in cucumber pan-genome assay (Li et al. 2022). Sequencing reads were aligned against reference genome using Bowtie2, and SNPs and indels were called with GATK v4.0.12.0 (McKenna et al. 2010). To obtain credible markers, variants on syntenic regions were filtered by bcftools with the following parameters: N_ALT >1, QD <2, FS >60, SOR >3, MQ <40, MQRankSum <−12.5, and ReadPosRankSum <−8.0. Missing genotypes in F2 population were imputed with BEAGLE 4.1 (Browning and Browning 2007). Then, every chromosome was split into bins with 1,000 SNPs to roughly estimate COs, and each maker in adjacent bins with genotype changes was scored to find CO boundaries.
|
|
sec
|
Differential recombination regions and CO hotspot identification
Permutation tests were employed to assess the differences in recombination rates between 2 populations. Considering the continuity of meiotic CO in adjacent windows, each chromosome was considered to be a whole unit. Firstly, the order of 7 chromosomes was rearranged, each chromosome was randomly inverted, and the origin of data from which population was also randomly chosen. Secondly, the differences in CO rates of each 50-kb bin between the 2 populations were calculated. Thirdly, a series of hypothetic differences in all the bins were computed by performing permutation 10,000 times. And the P-value represented the proportion of hypothetic difference, which was larger than the observation.
A unique CO hotspot was defined as the local recombination rate in 1 population that was higher than 5 times of genomic average, and the corresponding recombination in another population was lower than the average. The landscape of CO comparison was drawn with shinyChromosome (Yu et al. 2019).
|
|
title
|
Differential recombination regions and CO hotspot identification
|
|
p
|
Permutation tests were employed to assess the differences in recombination rates between 2 populations. Considering the continuity of meiotic CO in adjacent windows, each chromosome was considered to be a whole unit. Firstly, the order of 7 chromosomes was rearranged, each chromosome was randomly inverted, and the origin of data from which population was also randomly chosen. Secondly, the differences in CO rates of each 50-kb bin between the 2 populations were calculated. Thirdly, a series of hypothetic differences in all the bins were computed by performing permutation 10,000 times. And the P-value represented the proportion of hypothetic difference, which was larger than the observation.
|
|
p
|
A unique CO hotspot was defined as the local recombination rate in 1 population that was higher than 5 times of genomic average, and the corresponding recombination in another population was lower than the average. The landscape of CO comparison was drawn with shinyChromosome (Yu et al. 2019).
|
|
sec
|
GO term enrichment analysis
We performed enrichment analysis of genes overlapped within regions of interest with R package “GOstats.” Clustered heatmaps of significantly enriched GO terms (P < 0.05) were constructed with “ComplexHeatmap” package.
|
|
title
|
GO term enrichment analysis
|
|
p
|
We performed enrichment analysis of genes overlapped within regions of interest with R package “GOstats.” Clustered heatmaps of significantly enriched GO terms (P < 0.05) were constructed with “ComplexHeatmap” package.
|
|
sec
|
ChIP-seq and FAIRE-seq peak calling
CsRAD51A IP and input data sets generated from meiotic and nonmeiotic tissues were supplied to PeakRanger with ranger function (Feng et al. 2011). Then the meiosis-specific DSB peaks were obtained after differential peak comparison between meiotic and control peaks by MAnorm (Tu et al. 2021) with 1.8-fold change criteria. H3K4me3 peaks were analyzed with MACS2 (Zhang et al. 2008), and the overlapping peaks between biological replicates were filtered with BEDTools intersect function (Quinlan and Hall 2010). For FAIRE-seq analysis, peak callings were performed by fseq2 (Zhao and Boyle 2021). Before CsRAD51A ChIP-seq and FAIRE-seq peak calling, bam files of different replicates from the same tissues were merged.
|
|
title
|
ChIP-seq and FAIRE-seq peak calling
|
|
p
|
CsRAD51A IP and input data sets generated from meiotic and nonmeiotic tissues were supplied to PeakRanger with ranger function (Feng et al. 2011). Then the meiosis-specific DSB peaks were obtained after differential peak comparison between meiotic and control peaks by MAnorm (Tu et al. 2021) with 1.8-fold change criteria. H3K4me3 peaks were analyzed with MACS2 (Zhang et al. 2008), and the overlapping peaks between biological replicates were filtered with BEDTools intersect function (Quinlan and Hall 2010). For FAIRE-seq analysis, peak callings were performed by fseq2 (Zhao and Boyle 2021). Before CsRAD51A ChIP-seq and FAIRE-seq peak calling, bam files of different replicates from the same tissues were merged.
|
|
sec
|
Enrichment annotation and motif analysis
The enrichment significance within regions of interest was analyzed with 10,000 times permutation test by R package “regioneR.” DSB-enriched motifs were generated by HOMER using findMotifGenome.pl script (Heinz et al. 2010) with the P < 0.05 and enrichment >30%. A phylogenetic analysis of motif matrixes was built by R package “universalmotif” based on position weight matrix (PWM).
|
|
title
|
Enrichment annotation and motif analysis
|
|
p
|
The enrichment significance within regions of interest was analyzed with 10,000 times permutation test by R package “regioneR.” DSB-enriched motifs were generated by HOMER using findMotifGenome.pl script (Heinz et al. 2010) with the P < 0.05 and enrichment >30%. A phylogenetic analysis of motif matrixes was built by R package “universalmotif” based on position weight matrix (PWM).
|
|
sec
|
Genome-wide multiomic landscapes and correlation analysis
The log2([ChIP + 1]/[input + 1]) coverage ratios of ChIP-seq and FAIRE-seq, TPM values of RNA-seq, DNA methylation proportions, and densities of the genes, repeats, and SNPs were calculated in nonoverlapping 500-kb bins and plotted using Circos v0.67 (Krzywinski et al. 2009). The pairwise Spearman correlation coefficients between multiomics were calculated with “rcorr” function of the “Hmisc” package, and heatmaps were plotted using the “corrplot” package in R.
|
|
title
|
Genome-wide multiomic landscapes and correlation analysis
|
|
p
|
The log2([ChIP + 1]/[input + 1]) coverage ratios of ChIP-seq and FAIRE-seq, TPM values of RNA-seq, DNA methylation proportions, and densities of the genes, repeats, and SNPs were calculated in nonoverlapping 500-kb bins and plotted using Circos v0.67 (Krzywinski et al. 2009). The pairwise Spearman correlation coefficients between multiomics were calculated with “rcorr” function of the “Hmisc” package, and heatmaps were plotted using the “corrplot” package in R.
|
|
sec
|
Chromatin status around DSB hotspots and genes
Multiomic signal profiles around DSB hotspots and genes were obtained from bigWig files by computeMatrix “scale-regions” mode in deepTools. The average coverage ratios of epigenetic features from 2-kb upstream to 2-kb downstream were calculated per 20-bp window. Each profile of epigenetic features was grouped into 2 groups by decreasing recombination rate or randomly. In addition, genomic regions with similar characteristics were randomly chosen as the control. Average fitted values of epigenetic features of each group were respectively visualized with 95% confidence intervals using R.
|
|
title
|
Chromatin status around DSB hotspots and genes
|
|
p
|
Multiomic signal profiles around DSB hotspots and genes were obtained from bigWig files by computeMatrix “scale-regions” mode in deepTools. The average coverage ratios of epigenetic features from 2-kb upstream to 2-kb downstream were calculated per 20-bp window. Each profile of epigenetic features was grouped into 2 groups by decreasing recombination rate or randomly. In addition, genomic regions with similar characteristics were randomly chosen as the control. Average fitted values of epigenetic features of each group were respectively visualized with 95% confidence intervals using R.
|
|
sec
|
Mega-scale CO frequency and CO suppression prediction
Predictions of CO frequency and CO suppression were performed by random forest regression and classification with 7-fold crossvalidation. The whole genome was divided into nonoverlapping 500-kb bins. The coverage profiles of multiomics were taken as independent variables, recombination rate, and whether CO desert as the dependent variable. Every chromosome was split into 7 equals, a seventh of a chromosome was randomly chosen as testing data, and the remaining as training data. These random forest models were carried out by the R package “randomForest” (mtry = 3, ntree = 1,500). The importance of features was identified by the mean decrease in node impurity (%IncMSE) with the “importance” function. Model performance in regression analysis was assessed according to MSE and nonlinear correlation coefficient of predictions and observations (“nlcor” package). For classification analysis, P-value of the testing set was optimized with its training set, and the predictions were appointed as CO suppression when the probabilities were higher than the P-value. The overall performance of random forest classification was given by mean accuracy of predictions and ROC curve.
|
|
title
|
Mega-scale CO frequency and CO suppression prediction
|
|
p
|
Predictions of CO frequency and CO suppression were performed by random forest regression and classification with 7-fold crossvalidation. The whole genome was divided into nonoverlapping 500-kb bins. The coverage profiles of multiomics were taken as independent variables, recombination rate, and whether CO desert as the dependent variable. Every chromosome was split into 7 equals, a seventh of a chromosome was randomly chosen as testing data, and the remaining as training data. These random forest models were carried out by the R package “randomForest” (mtry = 3, ntree = 1,500). The importance of features was identified by the mean decrease in node impurity (%IncMSE) with the “importance” function. Model performance in regression analysis was assessed according to MSE and nonlinear correlation coefficient of predictions and observations (“nlcor” package). For classification analysis, P-value of the testing set was optimized with its training set, and the predictions were appointed as CO suppression when the probabilities were higher than the P-value. The overall performance of random forest classification was given by mean accuracy of predictions and ROC curve.
|
|
sec
|
Accession numbers
All sequencing data generated in this study were deposited in the NCBI database under BioProject accession numbers PRJNA936557 and PRJNA936649.
|
|
title
|
Accession numbers
|
|
p
|
All sequencing data generated in this study were deposited in the NCBI database under BioProject accession numbers PRJNA936557 and PRJNA936649.
|
|
sec
|
Supplementary Material
kiad432_Supplementary_Data Click here for additional data file.
|
|
title
|
Supplementary Material
|
|
label
|
kiad432_Supplementary_Data
|
|
caption
|
Click here for additional data file.
|
|
p
|
Click here for additional data file.
|
|
back
|
Acknowledgments
We thank Yaoyao Wu (AGIS, CAAS), Qian Zhou (Peng Cheng Laboratory), Xiang Li (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences), Shenhao Wang (IVF, CAAS), Tao Lin (College of Horticulture, China Agricultural University), Yuanchao Xu (AGIS, CAAS), and Junbo Gou (Hubei University of Chinese Medicine) for their valuable advice and suggestions. We also thank Shenhao Wang and Tongxu Xin (IVF, CAAS) for providing materials, Xiaoyong Cai (AGIS, CAAS) for helping in F2 genotyping, Hongbo Li (AGIS, CAAS) for sharing cucumber reference genome and pan-genome data, Yang Zhong (AGIS, CAAS) and Bowen Wang (IVF, CAAS) for assistance with cucumber transformation and ChIP-seq experiments, and Wojciech P. Pawlowski (Cornell University) and Minghui Wang (Cornell University) for providing maize DSB motif matrix file and codes.
Author contributions
X.Y., S.H., and Y.W. conceived and designed the research. X.Y. and S.H. coordinated the project. Y.W. performed most of the experiments. Y.M. assisted in sample preparation and biochemistry work. Y.W. and Z.D. carried out bioinformatics analysis. Y.Z. contributed to F2 genotype analysis. Y.W. analyzed the data and drafted the manuscript with the help of Z.D. And X.Y. and S.H. edited the paper. All the authors read and approved the final manuscript.
Supplemental data
The following materials are available in the online version of this article.
Supplemental Figure S1. Distribution of CO frequency in cucumber F2 populations.
Supplemental Figure S2. GO terms of CO-overlapping genes in cucumber.
Supplemental Figure S3. Comparison of CO landscapes between cucumber populations in low resolution.
Supplemental Figure S4. Distribution of distances from COs to the cucurbitadienol synthase cluster in cucumber.
Supplemental Figure S5. Associations between SNP density and CO rate in cucumber.
Supplemental Figure S6. Phylogenetic tree of recA/RAD51 superfamily proteins.
Supplemental Figure S7. Identification of antibody against CsRAD51A in cucumber.
Supplemental Figure S8. Association between meiotic DSB hotspot number per chromosome and chromosome length in cucumber.
Supplemental Figure S9. Enriched de novo motifs in meiotic DSB hotspots of cucumber.
Supplemental Figure S10. Phylogenetic tree of motifs enriched in meiotic DSB hotspots of cucumber.
Supplemental Figure S11. GO terms of meiotic DSB-overlapping genes in cucumber.
Supplemental Figure S12. Performance of predictions of CO frequency landscapes in cucumber.
Supplemental Figure S13. Comparisons of the ROC curves and area under ROC (AUC) values for CO suppression predictions in cucumber.
Supplemental Figure S14. Examples of epigenomic and SV profiling within unique CO between cucumber hybrids.
Supplemental Table S1. Distribution of SNP used for CO genotyping.
Supplemental Table S2. CO genotypes of cucumber F2 populations.
Supplemental Table S3. Overlap of COs with different genomic components.
Supplemental Table S4. GO terms enriched within cucumber COs and CO cold regions.
Supplemental Table S5. Overlaps of cucumber COs with different types of SVs.
Supplemental Table S6. Difference significance of intervals of total COs and SV-overlapping COs in cucumber.
Supplemental Table S7. Information of meiotic DSB hotspots in cucumber.
Supplemental Table S8. Overlap of meiotic DSB with different genomic components in cucumber.
Supplemental Table S9. Significantly enriched motifs in cucumber meiotic DSB hotspots.
Supplemental Table S10. GO terms enriched within meiotic DSB-overlapping genes in cucumber.
Supplemental Table S11. GO terms enriched within meitotic DSB-overlapping down-expressed genes in cucumber stamen.
Supplemental Table S12. Overlaps of meiotic DSB hotspots and CO intervals with epigenetic modifications in cucumber.
Supplemental Table S13. Significantly enriched distinct features within unique COs between 2 populations in cucumber.
Supplemental Table S14. Accessions of recA-like proteins.
Supplemental Table S15. Primers used in this research.
Supplemental Table S16. Information of ChIP-seq and FAIRE-seq libraries.
Supplemental Table S17. Information of WGBS libraries.
Supplemental Table S18. Information of transcriptome libraries.
Supplemental Table S19. Information of cucumber F2 resequencing libraries.
Funding
This work was supported by the National Natural Science Foundation of China (31991181 and 31801859), National Key Research and Development Program of China (2021YFF1000100), and the Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2023-IVF).
Dive Curated Terms
The following phenotypic, genotypic, and functional terms are of significance to the work described in this paper:
MSH2 Gramene: AT3G18524
MSH2 Araport: AT3G18524
TAF4b Gramene: AT1G27720
TAF4b Araport: AT1G27720
h3k4me3 CHEBI: CHEBI:85043
DMC1 Gramene: AT3G22880
DMC1 Araport: AT3G22880
5-methylcytosine CHEBI: CHEBI:27551
SPO11-1 Gramene: AT3G13170
SPO11-1 Araport: AT3G13170
|
|
ack
|
Acknowledgments
We thank Yaoyao Wu (AGIS, CAAS), Qian Zhou (Peng Cheng Laboratory), Xiang Li (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences), Shenhao Wang (IVF, CAAS), Tao Lin (College of Horticulture, China Agricultural University), Yuanchao Xu (AGIS, CAAS), and Junbo Gou (Hubei University of Chinese Medicine) for their valuable advice and suggestions. We also thank Shenhao Wang and Tongxu Xin (IVF, CAAS) for providing materials, Xiaoyong Cai (AGIS, CAAS) for helping in F2 genotyping, Hongbo Li (AGIS, CAAS) for sharing cucumber reference genome and pan-genome data, Yang Zhong (AGIS, CAAS) and Bowen Wang (IVF, CAAS) for assistance with cucumber transformation and ChIP-seq experiments, and Wojciech P. Pawlowski (Cornell University) and Minghui Wang (Cornell University) for providing maize DSB motif matrix file and codes.
|
|
title
|
Acknowledgments
|
|
p
|
We thank Yaoyao Wu (AGIS, CAAS), Qian Zhou (Peng Cheng Laboratory), Xiang Li (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences), Shenhao Wang (IVF, CAAS), Tao Lin (College of Horticulture, China Agricultural University), Yuanchao Xu (AGIS, CAAS), and Junbo Gou (Hubei University of Chinese Medicine) for their valuable advice and suggestions. We also thank Shenhao Wang and Tongxu Xin (IVF, CAAS) for providing materials, Xiaoyong Cai (AGIS, CAAS) for helping in F2 genotyping, Hongbo Li (AGIS, CAAS) for sharing cucumber reference genome and pan-genome data, Yang Zhong (AGIS, CAAS) and Bowen Wang (IVF, CAAS) for assistance with cucumber transformation and ChIP-seq experiments, and Wojciech P. Pawlowski (Cornell University) and Minghui Wang (Cornell University) for providing maize DSB motif matrix file and codes.
|
|
sec
|
Author contributions
X.Y., S.H., and Y.W. conceived and designed the research. X.Y. and S.H. coordinated the project. Y.W. performed most of the experiments. Y.M. assisted in sample preparation and biochemistry work. Y.W. and Z.D. carried out bioinformatics analysis. Y.Z. contributed to F2 genotype analysis. Y.W. analyzed the data and drafted the manuscript with the help of Z.D. And X.Y. and S.H. edited the paper. All the authors read and approved the final manuscript.
|
|
title
|
Author contributions
|
|
p
|
X.Y., S.H., and Y.W. conceived and designed the research. X.Y. and S.H. coordinated the project. Y.W. performed most of the experiments. Y.M. assisted in sample preparation and biochemistry work. Y.W. and Z.D. carried out bioinformatics analysis. Y.Z. contributed to F2 genotype analysis. Y.W. analyzed the data and drafted the manuscript with the help of Z.D. And X.Y. and S.H. edited the paper. All the authors read and approved the final manuscript.
|
|
sec
|
Supplemental data
The following materials are available in the online version of this article.
Supplemental Figure S1. Distribution of CO frequency in cucumber F2 populations.
Supplemental Figure S2. GO terms of CO-overlapping genes in cucumber.
Supplemental Figure S3. Comparison of CO landscapes between cucumber populations in low resolution.
Supplemental Figure S4. Distribution of distances from COs to the cucurbitadienol synthase cluster in cucumber.
Supplemental Figure S5. Associations between SNP density and CO rate in cucumber.
Supplemental Figure S6. Phylogenetic tree of recA/RAD51 superfamily proteins.
Supplemental Figure S7. Identification of antibody against CsRAD51A in cucumber.
Supplemental Figure S8. Association between meiotic DSB hotspot number per chromosome and chromosome length in cucumber.
Supplemental Figure S9. Enriched de novo motifs in meiotic DSB hotspots of cucumber.
Supplemental Figure S10. Phylogenetic tree of motifs enriched in meiotic DSB hotspots of cucumber.
Supplemental Figure S11. GO terms of meiotic DSB-overlapping genes in cucumber.
Supplemental Figure S12. Performance of predictions of CO frequency landscapes in cucumber.
Supplemental Figure S13. Comparisons of the ROC curves and area under ROC (AUC) values for CO suppression predictions in cucumber.
Supplemental Figure S14. Examples of epigenomic and SV profiling within unique CO between cucumber hybrids.
Supplemental Table S1. Distribution of SNP used for CO genotyping.
Supplemental Table S2. CO genotypes of cucumber F2 populations.
Supplemental Table S3. Overlap of COs with different genomic components.
Supplemental Table S4. GO terms enriched within cucumber COs and CO cold regions.
Supplemental Table S5. Overlaps of cucumber COs with different types of SVs.
Supplemental Table S6. Difference significance of intervals of total COs and SV-overlapping COs in cucumber.
Supplemental Table S7. Information of meiotic DSB hotspots in cucumber.
Supplemental Table S8. Overlap of meiotic DSB with different genomic components in cucumber.
Supplemental Table S9. Significantly enriched motifs in cucumber meiotic DSB hotspots.
Supplemental Table S10. GO terms enriched within meiotic DSB-overlapping genes in cucumber.
Supplemental Table S11. GO terms enriched within meitotic DSB-overlapping down-expressed genes in cucumber stamen.
Supplemental Table S12. Overlaps of meiotic DSB hotspots and CO intervals with epigenetic modifications in cucumber.
Supplemental Table S13. Significantly enriched distinct features within unique COs between 2 populations in cucumber.
Supplemental Table S14. Accessions of recA-like proteins.
Supplemental Table S15. Primers used in this research.
Supplemental Table S16. Information of ChIP-seq and FAIRE-seq libraries.
Supplemental Table S17. Information of WGBS libraries.
Supplemental Table S18. Information of transcriptome libraries.
Supplemental Table S19. Information of cucumber F2 resequencing libraries.
|
|
title
|
Supplemental data
|
|
p
|
The following materials are available in the online version of this article.
|
|
p
|
Supplemental Figure S1. Distribution of CO frequency in cucumber F2 populations.
|
|
p
|
Supplemental Figure S2. GO terms of CO-overlapping genes in cucumber.
|
|
p
|
Supplemental Figure S3. Comparison of CO landscapes between cucumber populations in low resolution.
|
|
p
|
Supplemental Figure S4. Distribution of distances from COs to the cucurbitadienol synthase cluster in cucumber.
|
|
p
|
Supplemental Figure S5. Associations between SNP density and CO rate in cucumber.
|
|
p
|
Supplemental Figure S6. Phylogenetic tree of recA/RAD51 superfamily proteins.
|
|
p
|
Supplemental Figure S7. Identification of antibody against CsRAD51A in cucumber.
|
|
p
|
Supplemental Figure S8. Association between meiotic DSB hotspot number per chromosome and chromosome length in cucumber.
|
|
p
|
Supplemental Figure S9. Enriched de novo motifs in meiotic DSB hotspots of cucumber.
|
|
p
|
Supplemental Figure S10. Phylogenetic tree of motifs enriched in meiotic DSB hotspots of cucumber.
|
|
p
|
Supplemental Figure S11. GO terms of meiotic DSB-overlapping genes in cucumber.
|
|
p
|
Supplemental Figure S12. Performance of predictions of CO frequency landscapes in cucumber.
|
|
p
|
Supplemental Figure S13. Comparisons of the ROC curves and area under ROC (AUC) values for CO suppression predictions in cucumber.
|
|
p
|
Supplemental Figure S14. Examples of epigenomic and SV profiling within unique CO between cucumber hybrids.
|
|
p
|
Supplemental Table S1. Distribution of SNP used for CO genotyping.
|
|
p
|
Supplemental Table S2. CO genotypes of cucumber F2 populations.
|
|
p
|
Supplemental Table S3. Overlap of COs with different genomic components.
|
|
p
|
Supplemental Table S4. GO terms enriched within cucumber COs and CO cold regions.
|
|
p
|
Supplemental Table S5. Overlaps of cucumber COs with different types of SVs.
|
|
p
|
Supplemental Table S6. Difference significance of intervals of total COs and SV-overlapping COs in cucumber.
|
|
p
|
Supplemental Table S7. Information of meiotic DSB hotspots in cucumber.
|
|
p
|
Supplemental Table S8. Overlap of meiotic DSB with different genomic components in cucumber.
|
|
p
|
Supplemental Table S9. Significantly enriched motifs in cucumber meiotic DSB hotspots.
|
|
p
|
Supplemental Table S10. GO terms enriched within meiotic DSB-overlapping genes in cucumber.
|
|
p
|
Supplemental Table S11. GO terms enriched within meitotic DSB-overlapping down-expressed genes in cucumber stamen.
|
|
p
|
Supplemental Table S12. Overlaps of meiotic DSB hotspots and CO intervals with epigenetic modifications in cucumber.
|
|
p
|
Supplemental Table S13. Significantly enriched distinct features within unique COs between 2 populations in cucumber.
|
|
p
|
Supplemental Table S14. Accessions of recA-like proteins.
|
|
p
|
Supplemental Table S15. Primers used in this research.
|
|
p
|
Supplemental Table S16. Information of ChIP-seq and FAIRE-seq libraries.
|
|
p
|
Supplemental Table S17. Information of WGBS libraries.
|
|
p
|
Supplemental Table S18. Information of transcriptome libraries.
|
|
p
|
Supplemental Table S19. Information of cucumber F2 resequencing libraries.
|
|
sec
|
Funding
This work was supported by the National Natural Science Foundation of China (31991181 and 31801859), National Key Research and Development Program of China (2021YFF1000100), and the Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2023-IVF).
|
|
title
|
Funding
|
|
p
|
This work was supported by the National Natural Science Foundation of China (31991181 and 31801859), National Key Research and Development Program of China (2021YFF1000100), and the Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2023-IVF).
|
|
sec
|
Dive Curated Terms
The following phenotypic, genotypic, and functional terms are of significance to the work described in this paper:
MSH2 Gramene: AT3G18524
MSH2 Araport: AT3G18524
TAF4b Gramene: AT1G27720
TAF4b Araport: AT1G27720
h3k4me3 CHEBI: CHEBI:85043
DMC1 Gramene: AT3G22880
DMC1 Araport: AT3G22880
5-methylcytosine CHEBI: CHEBI:27551
SPO11-1 Gramene: AT3G13170
SPO11-1 Araport: AT3G13170
|
|
title
|
Dive Curated Terms
|
|
p
|
The following phenotypic, genotypic, and functional terms are of significance to the work described in this paper:
|
|
p
|
MSH2 Gramene: AT3G18524
|
|
p
|
MSH2 Araport: AT3G18524
|
|
p
|
TAF4b Gramene: AT1G27720
|
|
p
|
TAF4b Araport: AT1G27720
|
|
p
|
h3k4me3 CHEBI: CHEBI:85043
|
|
p
|
DMC1 Gramene: AT3G22880
|
|
p
|
DMC1 Araport: AT3G22880
|
|
p
|
5-methylcytosine CHEBI: CHEBI:27551
|
|
p
|
SPO11-1 Gramene: AT3G13170
|
|
p
|
SPO11-1 Araport: AT3G13170
|