PMC:5056902 / 3070-5148 JSON TXT


Methods

Data sources

We downloaded the genomic data and gene annotation data (gff3) for two green algae genomes (C. reinhardtii and C. subellipsoidea) from the Joint Genome Institute, four plant genomes (O. sativa, Z. mays, V. vinifera, and A. thaliana) from the Ensembl genome database, and two plant genomes (G. max and S. bicolor) from the Plant Genome Database. The B. rapa genome and annotation data were obtained online from the B. rapa Database. We also collected the mitochondrial genome sequence of each species from the National Center for Biotechnology Information (NCBI). All these data sources are summarized in Table 1.

Detection of Numts and data generation

Plant Numt insertions were identified using the BLASTN local alignment tool in the BLAST program package (ver. 2.2.26), with mitochondrial genomic DNA as the query sequence and each genome dataset as the BLAST database. The execution options included an e-value cutoff of 0.01, filtering switched off (-dust no), a mismatch penalty of –2, and a word size of 9. Neighboring Numt hits within 10 kb of each other were merged, where necessary, into a single Numt insertion event. All of these analytical processes were carried out with in-house Python code.

Calculation of odds ratio for Numt insertion loci

To calculate the relative abundance of each genomic feature, we gathered the length information for categories such as gene, coding sequence (CDS), exon, pseudogene, and noncoding RNA (ncRNA) (tRNA, rRNA, and long non-coding RNA [lncRNA]) from the gene annotation files of each species (gff3 format). The total lengths of exons and introns included only protein-coding genes. The total length of introns was computed by subtracting the total exon length from the sum of all gene lengths. With all of the length information, we estimated the proportion of each feature by dividing the total length of each feature (Si) by the whole-genome length (G).
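The length bookkeeping described here can be illustrated with a minimal Python sketch. It is not the authors' in-house code: the function name `feature_fractions`, the tuple layout, and the toy coordinates are hypothetical, and it assumes that features of the same type do not overlap (for example, one transcript per gene), so that lengths can simply be summed.

```python
from collections import defaultdict


def feature_fractions(features, genome_length):
    """Return the genome fraction (Si / G) covered by each feature type.

    `features` is an iterable of (feature_type, start, end) tuples using
    1-based inclusive coordinates, as in GFF3. Assumes features of one type
    do not overlap; overlapping records would be double-counted.
    """
    totals = defaultdict(int)
    for ftype, start, end in features:
        totals[ftype] += end - start + 1  # inclusive interval length

    # Intron length is not annotated directly: it is taken as the sum of
    # gene lengths minus the total exon length, as described in the text.
    if "gene" in totals and "exon" in totals:
        totals["intron"] = totals["gene"] - totals["exon"]

    return {ftype: length / genome_length for ftype, length in totals.items()}


# Toy annotation (made-up coordinates): two protein-coding genes, three exons.
toy_features = [
    ("gene", 1, 1000),
    ("exon", 1, 200),
    ("exon", 801, 1000),
    ("gene", 5001, 5500),
    ("exon", 5001, 5500),
]
fractions = feature_fractions(toy_features, genome_length=10_000)
```

In a real analysis the tuples would come from columns 3 to 5 of each gff3 file, restricted to protein-coding genes for the exon and intron totals.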
The relative abundance (RAi) of each feature was then calculated as follows: RAi = Ci / (Si/G), where Ci is the count of the genic feature i in a species.
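As a worked illustration of the ratio RAi = Ci / (Si/G), the following Python sketch divides a per-feature count by the genome fraction of that feature. The function name `relative_abundance` and all numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study.

```python
def relative_abundance(counts, fractions):
    """Compute RAi = Ci / (Si / G) for every feature type in `counts`.

    `counts` maps feature type -> Ci, and `fractions` maps feature type ->
    Si / G. Features with a missing or zero fraction are skipped to avoid
    division by zero.
    """
    return {
        ftype: count / fractions[ftype]
        for ftype, count in counts.items()
        if fractions.get(ftype, 0) > 0
    }


# Made-up inputs: exons cover 25% of a toy genome, introns 12.5%.
example_counts = {"exon": 5, "intron": 3}
example_fractions = {"exon": 0.25, "intron": 0.125}

ra = relative_abundance(example_counts, example_fractions)
print(ra)  # {'exon': 20.0, 'intron': 24.0}
```

In this toy case the intron category has fewer counts than the exon category but a higher relative abundance, because it occupies a smaller share of the genome.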
