PubMed@Masakazu SAGA:29186418
Annnotations
{"target":"https://pubannotation.org/docs/sourcedb/PubMed@Masakazu%20SAGA/sourceid/29186418","sourcedb":"PubMed@Masakazu SAGA","sourceid":"29186418","text":"Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae\r\n\r\nAbstract\r\n\r\nBackground\r\n\r\n Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, shows different characteristics such as thickness, compressive elasticity, and chemical resistance compared with common silk produced from the domesticated silkworm, Bombyx mori. Its unique characteristics have led to its use in many research fields including biotechnology and medical science, and the scientific as well as economic importance of the wild silk moth continues to gradually increase. However, no genomic information for the wild silk moth, including A. yamamai, is currently available.\r\n\r\nFindings\r\n\r\nIn order to construct the A. yamamai genome, a total of 147G base pairs using Illumina and Pacbio sequencing platforms were generated, providing 210-fold coverage based on the 700-Mb estimated genome size of A. yamamai. The assembled genome of A. yamamai was 656 Mb (\u003e2 kb) with 3675 scaffolds, and the N50 length of assembly was 739 Kb with a 34.07% GC ratio. Identified repeat elements covered 37.33% of the total genome, and the completeness of the constructed genome assembly was estimated to be 96.7% by Benchmarking Universal Single-Copy Orthologs v2 analysis. A total of 15 481 genes were identified using Evidence Modeler based on the gene prediction results obtained from 3 different methods (ab initio, RNA-seq-based, known-gene-based) and manual curation.\r\n\r\nConclusions\r\n\r\nHere we present the genome sequence of A. yamamai, the first genome sequence of the wild silk moth. These results provide valuable genomic information, which will help enrich our understanding of the molecular mechanisms relating to not only specific phenotypes such as wild silk itself but also the genomic evolution of Saturniidae.\r\n\r\nData Description\r\n\r\n Antheraea yamamai (NCBI Taxonomy ID: 7121), also known as the Japanese oak silk moth, is a wild silk moth species belonging to the Saturniidae family (Fig. 1). Silk moths can be categorized into 2 families—Bombycidae and Saturniidae. Saterniidae has been estimated to contain approximately 1861 species, with 162 genera, and is known as the largest family in the Lepidoptera. Among the many species in family Saturniidae, only a few species, including A. yamamai, can be utilized for silk production. Previous phylogenetic studies have shown that the family Saturniidae shares common ancestors with the family Sphingidae, including the hawk moth (Macroglossum stellatarum), and the Bombycidae family, including the most representative silkworm, Bombyx mori. The estimated divergence time between A. yamamai and B. mori is 84 million years ago (MYA), making it similar to the 88 MYA estimated divergence time between the human and mouse.\r\n\r\nPhotograph of Antheraea Yamamai. From left: larva, cocoon, and adult A. yamamai, respectively. Green color is one of the representative characteristics of tensan silk.\r\n\r\n A. yamamai produces a specific silk called tensan silk, which shows distinctive characteristics compared with common silk from B. mori, including characteristics such as thickness, bulkiness, compressive elasticity, and resistance to dyeing chemicals. These characteristics have received the attention of researchers as a new biomaterial for use in various fields. Additionally, tensan silk also has been studied for its applications to human health. However, despite the potential importance of the wild silk moth in research and economic fields, no whole genomic information is currently available for this or any other species from the family Saturniidae.\r\n\r\nIn this study, we present the annotated genome sequence of A. yamamai, the first published genome in the family Saturniidae, with transcriptome datasets collected from 10 different body organ tissues. These data will be a fundamental resource for future studies and provide more insight into the genome evolution and molecular phylogeny of the family Saturniidae.\r\n\r\nSample preparation and sequencing\r\n\r\nFor whole-genome sequencing, we selected 1 male sample (Ay-7-male1) from a breeding line (Ay-7) of A. yamamai raised at the National Academy of Agricultural Science, Rural Development Administration, Korea. In lepidopterans, males are homogametic (ZZ), and selecting a male sample can reduce the complexity of assembly from excessive repeats on the W chromosome in females. For genomic library construction, we removed the guts of A. yamamai to prevent contamination of genomes from other organisms such as gut microbes and oak, the main food source of A. yamamai. Details of the sample preparation process used in this study are presented in the Supplementary Data. Genomic DNA was extracted using a DNeasy Animal Mini Kit (Qiagen, Hilden, Germany), and the quality of extracted DNA was checked using trenean, picogreen assay, and gel electrophoresis (1% agarose gel/40 ng loading). After quality control processing, we were left with a total of 61.5 ug of A. yamamai DNA for genome sequencing. Using standard Illumina whole-genome shotgun (WGS) sequencing protocol (paired-end and mate-pair), we added 2 long read sequencing platforms, Moleculo (Illumina synthetic long read) and RS II (Pacific Bioscience). Tables 1–3 show a summary of generated data for each library used in this study. RNA-seq libraries were also constructed for 10 different tissues (hemocyte, malpighian tube, midgut, fat body, anterior-middle/silk gland, posterior/silk gland, head, integument, testis, ovary) with 3 biological replicates following standard manufacturer protocol (Illumina, San Diego, CA, USA). For this, more than 100 individual A. yamamai samples in 5 instar stages from the same breeding line were used for tissue anatomy, and 3 samples from each tissue were selected based on the quality of extracted RNA. Details of transcriptome library construction are shown in the Supplementary Data. Information of libraries and generated data is provided in Table 4, and a total of 147 Gb of genomic data and 76 Gb of transcriptomic data was generated for this study.\r\n\r\nSummary statistics of generated whole-genome shotgun sequencing data using Illumina Nextseq 500\r\n\r\nLibrary name\tLibrary type\tInsert size\tPlatform\tRead length\tNo. of reads\tTotal base, bp\tReads retained after trimming\t \t350 bp\tPaired-end\t350 bp\tNextseq500\t151\t293 176 268\t44 269 616 468\t291 070 362\t \t700 bp\tPaired-end\t700 bp\tNextseq500\t151\t246 945 900\t37 288 830 900\t244 698 580\t \t3 Kbp\tMate-pair\t3 Kbp\tNextseq500\t76\t284 204 762\t21 599 561 912\t195 095 164\t \t6 Kbp\tMate-pair\t6 Kbp\tNextseq500\t76\t246 238 370\t18 714 116 120\t152 496 372\t \t9 Kbp\tMate-pair\t9 Kbp\tNextseq500\t76\t239 919 538\t18 233 884 888\t148 612 724\t \tTotal\t\t\t\t\t1 310 484 838\t140 106 010 288\t1 031 973 202\t \t\r\n\r\nSummary statistics of generated Illumina synthetic long read (Moleculo) library\r\n\r\n\t500–1499 bp\t≥1500 bp\t \tNo. of assembled reads\t302 132\t342 738\t \tNo. of bases in assembled read\t268 853 717\t1 205 349 082\t \tN50 length of assembled read\t960\t4031\t \t\r\n\r\nSummary statistics of generated long reads data using Pacbio RS II system\r\n\r\nNo. of reads\t1005,571\t \tTotal bases\t5836 969 225\t \tLength of longest (shortest) read\t50 132 (50)\t \tAverage read length\t5804.63\t \t\r\n\r\nSummary statistics of generated transcriptome data obtained from 6 organ tissues using Illumina platform\r\n\r\nTissue\tSample name\tRead length\tRead count\tTotal base, bp\t \tHemocyte\tHemocyte_1\t76\t20 815 674\t1 581 991 224\t \t\tHemocyte_2\t76\t26 704 666\t2 029 554 616\t \t\tHemocyte_2\t76\t53 068 562\t4 033 210 712\t \tMalpighian tube\tMalpighi_1\t76\t22 635 428\t1 720 292 528\t \t\tMalpighi_2\t76\t24 893 788\t1 891 927 888\t \t\tMalpighi_3\t76\t45 213 164\t3 436 200 464\t \tMidgut\tMidgut_1\t76\t23 350 138\t1 774 610 488\t \t\tMidgut_2\t76\t24 597 972\t1 869 445 872\t \t\tMidgut_3\t76\t50 949 986\t3 872 198 936\t \tHead\tHead_1\t76\t26 526 276\t2 015 996 976\t \t\tHead_2\t76\t26 581 124\t2 020 165 424\t \t\tHead_3\t76\t40 900 456\t3 108 434 656\t \tIntegument\tSkin_1\t76\t24 592 846\t1 869 056 296\t \t\tSkin_2\t76\t42 775 430\t3 250 932 680\t \t\tSkin_3\t76\t35 043 570\t2 663 311 320\t \tFat body\tFat Body_1\t76\t24 637 810\t1 872 473 560\t \t\tFat Body_2\t76\t24 037 494\t1 826 849 544\t \t\tFat Body_3\t76\t40 817 582\t3 102 136 232\t \tAnterior-middle/silk gland\tAM/Silk Gland_1\t76\t21 399 638\t1 626 372 488\t \t\tAM/Silk Gland_2\t76\t24 292 386\t1 846 221 336\t \t\tAM/Silk Gland_3\t76\t37 331 530\t2 837 196 280\t \tPosterior/silk gland\tP/Silk Gland_1\t76\t27 359 580\t2 079 328 080\t \t\tP/Silk Gland_2\t76\t23 300 962\t1 770 873 112\t \t\tP/Silk Gland_3\t76\t39 421 430\t2 996 028 680\t \tTestis\tTestis_1\t76\t40 890 404\t3 107 670 704\t \t\tTestis_2\t76\t45 733 846\t3 475 772 296\t \t\tTestis_3\t76\t44 985 224\t3 418 877 024\t \tOvary\tOvary_1\t76\t40 797 628\t3 100 619 728\t \t\tOvary_2\t76\t40 409 752\t3 071 141 152\t \t\tOvary_3\t76\t42 417 892\t3 223 759 792\t \t\r\n\r\nGenome assembly and evaluation\r\n\r\nBefore conducting genome assembly, we conducted k-mer distribution analysis using a 350-bp paired-end library in order to estimate the size and characteristics of the A. yamamai genome. The quality of our generated raw data was checked using FASTQC (FastQC, RRID:SCR_014583). Sequencing artifacts such as adapter sequences and low-quality bases were removed using Trimmomatic (Trimmomatic, RRID:SCR_011848). Jellyfish was used to count the k-mer frequency for estimation of the genome size of A. yamamai. Figure 2 shows the 19-mer distribution of the A. yamamai genome using a 350-bp paired-end library. In the 19-mer distribution, the second peak, at approximately half the coverage value (x-axis) of the main peak, indicates heterozygosity. Although the inbred line used in this study was the single pair sib-mating maintained for more than 10 generations, high heterozygosity still remains. This phenomenon has been observed in a previous genomic study of the Diamondback moth (Plutella xylostella), and sustained heterozygosity as an important genomic characteristic was hypothesized to be a result of environmental adaption. The underlying mechanism of the sustained heterozygosity is unclear, but associative overdominance can be one of the candidate explanations of this phenomenon. Based on the result of 19-mer distribution analysis, the genome size of A. yamamai was estimated to be 709 Mb. However, this size might be larger than the real genome size of A. yamamai because high heterozygosity could affect the estimation of genome size based on the K-mer distribution. Next, we conducted error correction on Illumina paired-end libraries using the error correction module of Allpaths-LG before the initial contig assembly process (ALLPATHS-LG, RRID:SCR_010742). After error correction, initial contig assembly with 350-bp and 700-bp libraries was conducted using SOAP denovo2 with the parameter option set at K = 19; this approach showed the best assembly statistics compared with other assemblers and parameters (SOAPdenovo2, RRID:SCR_014986). Quality control processing for mate-pair libraries and scaffolding was conducted using Nxtrim and SSPACE (SSPACE, RRID:SCR_011848), respectively. At each scaffolding step, SOAP Gapcloser with -l 155 and -p 31 parameters was repeatedly used to close the gaps within each scaffold. In order to obtain a higher-quality genome assembly of A. yamamai, we employed several long read scaffolding strategies using SSPACE-LongRead. First, we used an Illumina synthetic long read sequencing platform called Moleculo, which has been proven valuable for the study of highly heterozygous genomes in previous studies. After scaffolding was performed using SSPACE-LongRead with Illumina synthetic long read data, the total number of assembled scaffolds was effectively reduced from 398 446 to 24 558. The average scaffold length was also extended from 1.7 Kb to 24.8 Kb. However, there was no impressive improvement in N50 length (approximately 91 Kb to 112 Kb) of assembled scaffolds. Therefore, we employed another type of long read data generated from 10 cells of Pacbio RS II system with P6-C4 chemistry. After final scaffolding processing using Pacbio long reads, the number of scaffolds was reduced to 3675, and N50 length was effectively extended from 112 Kb to 739 Kb. Summary statistics of the assembled A. yamamai genome are provided in Table 5. Final assembly of the A. yamamai genome was 656 Mb (\u003e2 kb) long with 3675 scaffolds, and the N50 length of assembly was 739 Kb with a 34.07% GC ratio. To evaluate the quality of the assembled genome, we conducted Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis using BUSCO v2.0 with insecta_odb9 including 1658 BUSCOs from 42 species (BUSCO, RRID:SCR_015008). From BUSCO analysis, 96.7% of BUSCOs were completely detected in the assembled genome (1576: complete and single-copy; 27: complete and duplicated) among 1658 tested BUSCOs. The numbers of fragmented and missing BUSCOs were 21 and 34, respectively. Based on the result of BUSCO analysis, the genome of A. yamamai presented here was considered properly constructed for downstream analysis.\r\n\r\n19-mer distribution of A. yamamai genome using jellyfish with 350-bp paired-end whole genome sequencing data.\r\n\r\nSummary statistics of the A. yamamai genome (\u003e2 kb)\r\n\r\nAssembled genome\t\t \tSize, 1n\t656 Mb\t \tGC level\t34.07\t \tNo. of scaffolds\t3675\t \tN50 of scaffolds, bp\t739 388\t \tNo. of bases in scaffolds, %\t19 257 439 (2.93)\t \tLongest (shortest) scaffolds, bp\t3 156 949 (2003)\t \tAverage scaffold length, bp\t178 657.53\t \t\r\n\r\nRepeat identification and comparative repeat analysis\r\n\r\nTo identify repeat elements of the A. yamamai genome, a custom repeat library was constructed using RepeatModeler with RECON, RepeatScout, and TRF. The resulting constructed custom repeat library for A. yamamai was further curated using the CENSOR search, and the curated library was employed in RepeatMasker with Repbase. RepeatMasker (RepeatMasker, RRID:SCR_012954) was conducted with RMBlast and the “no_is” option for skipping bacterial insertion element check. Table 6 summarizes the proportion of identified mobile elements in the A. yamamai genome. The most prevalent repeat elements in the A. yamamai genome were LINE elements (101 Mb, 15.31% of total genome), and total repeat elements accounted for 37.33% of the total genome. In order to compare the repeat elements of A. yamamai with those of other genomes, we conducted the same process for 7 public genomes that are close neighbors of A. yamamai—Aedes aegypti, Bombyx mori, Danaus plexippus, Drosophila melanogaster, Heliconius Melpomene, Melitaea cinxia, and Plutella xylostella. Figure 3 displays the amount and proportion of identified repeat elements from the 8 species. Despite the small genome size of B. mori, the total amount of identified SINE elements in the B. mori genome was 5.77 times larger than that of A. yamamai. The top 5 expanded repeat elements in the A. yamamai genome were DNA/RC, LINE/L2, LINE/RTE-BovB, DNA/TcMar-Mariner, and LINE/CR1. Among these, DNA/TcMar-Mariner was the specifically expanded repeat element in A. yamamai among the 8 species. In B. mori, SINE/tRNA-CR1, LINE/Jockey, DNA/RC, LINE/CR1-Zenon, and LINE/RTE-BovB were the top 5 expanded repeat elements. When comparing the repeat elements of A. yamamai with those of B. mori, which are both producers of the same type of silk, repeat elements showed family and species-specific patterns in the 2 silk moth linages. Particularly, we found that the mariner repeat element, which was found specifically expanded in the A. yamamai genome, was also included in the fibroin gene. A previous sequencing study also showed that the mariner repeat element was inserted in the 5’-end of the fibroin gene of A. yamamai. Fibroin is the core component of the silk protein found in the silk moth, and the physical characteristics of silk mainly depend on the types and unique repeat motifs of the fibroin. This gene is known to have hundreds of tandem repeat motifs, and these kinds of tandem repeats can be derived through transposable elements. This indicates that the mariner repeat element, specifically expanded in the A. yamamai genome, may play an important role in the development of the unique silk of A. yamamai, and the lineage-specific repeat elements may be one of the candidate evolution forces related to host-specific phenotype during genome evolution.\r\n\r\nAmount and proportion of identified repeat element from 8 species including A. yamamai. A) Absolute amount of repeat element classified into 8 different categories. B) Proportion of each repeat element in identified total repeat element.\r\n\r\nSummary of identified repeat elements in the A. yamamai genome\r\n\r\nRepeat element\tNo. of elements\tLength, %\t \tSINE\t59 968\t8 615 338 (1.30)\t \tLINE\t426 522\t101 251 176 (15.31)\t \tLTR element\t53 977\t4 552 386 (0.69)\t \tDNA element\t512 760\t69 071 227 (10.44)\t \tSmall RNA\t43 645\t6 691 619 (1.01)\t \tSimple repeat\t135 989\t6 256 839 (0.95)\t \tLow complexity\t19 937\t932 829 (0.14)\t \tUnclassified\t294 190\t54 552 009 (8.25)\t \t\r\n\r\nGene prediction and annotation\r\n\r\nThree different algorithms were used for gene prediction of the A. yamamai genome: ab initio, RNA-seq transcript-based, and protein homology-based approaches. For ab initio gene prediction, Augustus (Augustus: Gene Prediction, RRID:SCR_008417), Geneid, and GeneMarks-ET were employed. Augustus was trained using known genes of A. yamamai in the NCBI database, and mapping information of RNA-seq data obtained from Tophat (TopHat, RRID:SCR_013035) was also utilized for gene prediction. Geneid was used with predefined parameters for Drosophila melanogaster. GeneMarks-ET was employed using junction information of genes from transcriptome data alignment. For RNA-seq transcript-based prediction, generated transcriptome data from 10 organ tissues of A. yamamai were aligned to the assembled genome and gene information was predicted using Cufflinks (Cufflinks, RRID:SCR_014597). The longest CDS sequences were identified from Cufflinks results using Transdecoder. For the homology-based approach, all known genes of the order Lepidoptera in the NCBI database were aligned using PASA. Table 7 shows the gene prediction results from each method. Gene prediction results from different prediction algorithms were combined using Evidence Modeler, and a consensus gene set of the A. yamamai genome was created. Manual curation was performed based on the 5 types of evidence (3 in silico, known protein, and RNA-seq) using IGV and BLASTP (BLASTP, RRID:SCR_001010). Using IGV with each gene evidence and comparing results with known genes via BLASTP, we mainly focused on removing false positively predicted genes that don’t have enough evidence. Merged and spliced gene structured were corrected by comparing the gene structure with known exon structure in the NCBI NR database. In addition, fibroin and sericin genes, which couldn’t be properly predicted because of their high repeat motif, were also manually identified with previously known sequences with RNA-seq data. The final gene set of the A. yamamai genome contains 15 481 genes. Summary statistics for the consensus gene set are provided in Table 8. The average gene length was 11 016.34 bp with a 34.38% GC ratio, and the number of exons per gene was 5.64. In order to identify the function of predicted genes in A yamamai, 3 nonredundant sequence databases (Swiss-Prot, Uniref100, and NCBI NR) as well as the gene information of 2 species (B. mori and D. melanogaster) were used for target databases using BLASTP. Additionally, protein domain searches were conducted on the consensus gene set using InterproScan5 (InterProScan, RRID:SCR_005829). Figure S1 shows the top 20 identified terms from 7 different InterproScan5 analyses. Among the various analyses conducted using InterproScan5, gene ontology analysis with the Pfam database showed that a large proportion of genes in the A. yamamai genome were related with the function of molecular binding, catalytic activity, internal component of membrane, metabolic process, oxidation-reduction process, and transmembrane transport.\r\n\r\nSummary statistics of ab initio, RNA-seq-based, and homology-based gene prediction results\r\n\r\nEvidence type\tPrograms\tElement\tTotal count\tExon/gene\tTotal length, bp\tMean length, bp\t \t\t\tGene\t14 576\t\t142 415 318\t9770.53\t \t\tAugustus\t\t\t4.85\t\t\t \t\t\tExon\t70 733\t\t14 736 668\t208.34\t \t\t\tGene\t10 946\t\t46 119 402\t4213.35\t \tab_initio\tGeneid\t\t\t2.25\t\t\t \t\t\tExon\t24 686\t\t3 925 563\t159.01\t \t\t\tGene\t27 754\t\t273 745 951\t9863.29\t \t\tGeneMarks-ET\t\t\t5.50\t\t\t \t\t\tExon\t152 660\t\t30 847 503\t202.06\t \t\t\tGene\t36 213\t\t840 429 061\t23 207.94\t \tRNA-seq\tCufflinks Transdecoder\t\t\t7.03\t\t\t \t\t\tExon\t254 770\t\t201 721 675\t791.77\t \tKnown gene (NCBI lepidoptera)\tPASA (gmap)\t\t44 561\t\t22 484 151\t504.57\t \t\r\n\r\nSummary statistics for the consensus gene set of the A. yamamai genome\r\n\r\nElement\tNo. of elements\tExon/gene\tAvg. length\tTotal length\tGenome coverage, %\t \tGene\t15 481\t\t11 016.34\t170 543 958\t25.78\t \t\t\t5.64\t\t\t\t \tExon\t87 346\t\t1346.23\t20 840 925\t3.31\t \t\r\n\r\nComparative genome analysis\r\n\r\nWe used OrthoMCL and Reciprocal Best Hit within BLASTP for identification of gene family clusters and 1:1 orthologous gene sets. Gene information of 7 taxa (A. aegypti, B. mori, D. plexippus, D. melanogaster, H. melpomene, M. cinxia, and P. xylostella), the same taxa used in repeat analysis, was employed for OrthoMCL with A. yamamai. A total of 17 406 gene family clusters were constructed, and 3586 1:1 orthologous genes were identified. Before conducting comparative genome analysis, we constructed phylogenetic trees for the 8 species. In order to build the phylogenetic tree, multiple sequence alignment for the 1:1 orthologous genes of all 8 species was conducted using PRANK, and Gblocks was used to obtain conserved blocks for the phylogenetic tree. Conserved block sequences were sequentially concatenated to obtain 1 consensus sequence for each species. MEGA was used for constructing Neighbor-Joining Trees (bootstrap 1000, maximum composite likelihood, transitions + transversions, and gamma distributed option), and MrBayes (MrBayes, RRID:SCR_012067) was employed for the construction of Bayesian inference trees. To select the best evolution model for our data, Modeltest was conducted and the GTR-based invariant model was chosen based on the AIC value of Modeltest. Figure 4 shows the constructed phylogenetic tree of the 8 species using 3586 orthologous genes. The bootstrap value and Bayesian poster probability value of all nodes were 100 and 1, respectively. The closest neighbor of A. yamamai was B. mori, which is included in Bombycidae family; this result is consistent with that of previous studies. Three butterfly species (D. plexippus, M. cinxia, and H. meplmene) included in Nymphalidae family were also shown to share a common ancestor with the families Saturniidae and Bombycidae.\r\n\r\nConstructed phylogenetic tree and comparative gene family analysis. Node values indicate Bayesian posterior probability, bootstrap and gene expansion, and contraction value. Orange and blue colors indicate expansion and contraction, respectively. Bar chart indicates the number of genes cauterized into 4 groups (Specific, 1:Multi, Multi:Multi, and 1:1) using OrthoMCL.\r\n\r\nBased on the constructed phylogenetic tree, gene family expansion and contraction analysis were conducted using a 2-parameter model in CAFÉ and the gene tree was constructed using protein sequence via MEGA. Figure 4 shows the result of gene family expansion and contraction analysis of 8 species; 938 and 1987 gene families of A. yamamai and 567 and 715 gene families of B. mori were estimated to be expanded and contracted from the common ancestors, respectively. Among these, 15 gene families in A. yamamai were estimated to be under rapid expansion during the evolution process. Functions of genes in the rapidly expanded gene families of A. yamamai were transposase, fatty acid synthase, zinc finger protein, chorion (eggshell protein), reverse transcriptase, prostaglandin dehydrogenase, RNA-directed DNA polymerase, gag like protein, juvenile hormone acid methyltransferase, facilitated trehalose transporter, and glucose dehydrogenase. Figure 5 shows the gene tree of 2 chorion gene (chorion class A and B) family clusters rapidly expanded in the A. yamamai genome. Chorion, called eggshell protein, composes the surface of the egg and protects the embryo from environmental threats such as desiccation, flooding, freezing, infection of microorganisms, and physical destruction. It also provides channels, such as aeropyle, that enable gas exchange and maintain proper conditions for diapause of the egg. These diverse functions of eggshell are implemented by the specific eggshell structure, and the surface structure of eggshell varies between species for the adaptation in a different environment. The ancestor of Antherea has the unique aeropyle structure called “aerophyle crown” on the eggshell surface. This unique structure is formed by the circular vertical projection of lamellar chorion from the follicle cell, and it surrounds the aeropyles near the end of oogenesis. Acquiring this kind of de novo complex structure requires numerous genetic changes, and a previous study about Antheraea polyphemus has shown that more than 100 chorion-specific polypeptides were involved in this unique ultra-structure. Therefore, the specific rapid expansion of the chorion class A and B gene family in the A. yamamai genome might be one of the convincing molecular explanations for acquiring this unique ultrastructure in the eggshell surface of Antheraea genus. However, this unique ultrastructure tends to be reduced during the current evolution process of the Antheraea genus. Types of eggshell structures in the Antheraea genus can be categorized into multiple classes based on the morphology and regional distribution of the aeropyle. The shape of the aeropyle in the A. yamamai egg is known to be converted to a mound shape from the crown shape, and these aeropyle mounds only exist in the narrow band surrounding the micropyle region. Only a very few, small aeropyle crowns remain, and it is entirely different from the ancestral form of eggshell surface mostly covered by aeropyle crowns. These regional differences were known to be adjusted by regional differences of filler genes during choriogenesis, and the additional regulations of related genes for choriogenesis have to be considered. This indicates that the specifically expanded chorion gene families of A. yamamai may be one of the remaining evolutionary tracks in the genome of the Antheraea genus. However, further functional studies must be conducted to resolve the limited understanding about the relationship between these expanded chorion gene families and the current eggshell surface formation of A. yamamai.\r\n\r\nExpansion of chorion gene in the A. yamamai genome. A, B) The gene trees of chorion A and B in the rapid expanded gene family cluster, respectively. Color of terminal node indicates each taxon identified in the gene family cluster.\r\n\r\nThe constructed genome of A. yamamai presented here is the first announced genome in the family Saturniidae, and the karyotyping analysis using gamete in the metaphase showed that the genome of A. yamamai consists of 31 chromosomes (Fig. 6). This constructed genome information provides more insight into the genome evolution and phylogeny of the family Saturniidae, which contains the largest number of species in Lepidoptera. For example, although 2 silk moths, A. yamamai and B. mori, appear similar, comparative genome analysis showed the significant differences in the genome size and the specific expansion of repeat elements and gene families between the families Saturniidae and Bombycidae. In case of molecular phylogeny, most previous phylogenetic studies were limited to few genes due to the lack of genomic information on the family Saturniidae. We expect that our study and resulting constructed genome will resolve some limitations of molecular phylogenetic and ecological research on the Saturniidae species. Additionally, constructed genome information will help researchers better understand the molecular background of wild silk and its production. Silk produced by A. yamamai, referred to as tensan silk, shows unique characteristics that have made it valuable in various fields. However, A. yamamai has not been completely domesticated compared with B. mori, making mass production of tensan silk infeasible. Understanding of the molecular mechanisms behind the tensan silk production process is essential for mass production using biotechnology, and this genome sequence with manually curated gene information is a fundamental resource for related research and industrial improvement. Additionally, the transcriptome data of 10 different organ tissues with the 3 biological replications presented here may be also useful resources for uncovering the molecular mechanisms related to specific phenotypes of A. yamamai and the family Saturniidae.\r\n\r\nKaryotype of A. yamamai using a gamete of testis in metaphase.\r\n\r\nAvailability of supporting data\r\n\r\nThe generated genome sequence and gene information of A. yamamai are available in GigaDB, and generated raw data are available under project accession PRJNA383008 and PRJNA383025 of the NCBI database.\r\n\r\nAdditional file\r\n\r\nAdditional file 1: Figure S1. Top 20 terms in each 7 Interproscan5 analysis (CDD, Pfam, PRINTS, ProSitePatterns, ProSiteProfiles, TIGRF, Gene Ontology).\r\n\r\nAdditional file 1: Supplementrary_information2.xlsx.\r\n\r\nCompeting interests\r\n\r\nAll authors report no competing interests.\r\n\r\nAbbreviations\r\n\r\nBUSCO: Benchmarking Universal Single-Copy Orthologs; MYA: million years ago.\r\n\r\nAuthor contributions\r\n\r\nSampling: Kee-Young Kim, Su-Bae Kim; sequencing: Kwang-Ho Choi, Seong-Wan Kim genome assembly: Seong-Ryul Kim, Woori Kwak, Jae-Sam Hwang, Seung-Won Park repeat element analysis: Seong-Ryul Kim, Woori Kwak, Seung-Won Park gene prediction: Seong-Ryul Kim, Woori Kwak, Hyaekang Kim, Jae-Sam Hwang comparative genome analysis: Seong-Ryul Kim, Woori Kwak, Min-Jae Kim, Kelsey Caetano-Anolles funding and experimental design: Seong-Ryul Kim, Seung-Won Park.\r\n\r\nSupplementary Material\r\n\r\n\r\n","tracks":[]}