Methods

Genome sequences

Genome sequences and transcript (EST, CDS, or cDNA) sequences of 10 plant species were downloaded from the sources listed in Table 3 [19-27]. Table 3 lists the names of the 10 plant species, the source websites, and the genome sequence versions used in this study.

Comparative BLAT and Sim4cc analysis

Using cDNA sequences and gene sequences, we identified rice and Arabidopsis introns with two methods (BLAT and Sim4cc) and then compared the results with the annotated information. The steps of this method are as follows (Fig. 1):

1) We aligned each gene sequence with its own cDNA sequence using BLAT and extracted intron information from the BLAT results with a Perl script.
2) We split the gene sequences and the cDNA sequences into folders with a Perl script. In these folders, there was one sequence per file, and the gene name was the file name. Matching gene and cDNA files by gene name, we aligned each gene sequence with its cDNA sequence using Sim4cc. We then extracted intron information from the Sim4cc results with a Perl script.
3) We compared the results of the two types of software (BLAT and Sim4cc) and then obtained the annotated intron information.
4) We aligned intron sequences with their own gene sequences to derive detailed intron information, such as the intron position in the gene, intron length, intron number, forward-exon length, and backward-exon length.
5) We compared the results from the two types of software with the annotated information to validate the methods.

Intron length distribution analysis

Using a Perl script, we extracted the intron information of the 10 plant genomes from the genome annotation. For each of the 10 plants, we then counted the number and percentage of introns whose lengths were of the form 3n, 3n + 1, and 3n + 2.
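
The length-class count described above can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in this study and is not the authors' script. The function names `intron_lengths` and `length_class_summary` are hypothetical, and the sketch assumes that exon coordinates are 1-based and inclusive (as in GFF-style annotation) and that the exons of a transcript do not overlap.

```python
from collections import Counter


def intron_lengths(exons):
    """Return intron lengths for one transcript.

    `exons` is a list of (start, end) pairs, assumed 1-based and inclusive.
    An intron is the gap between two consecutive exons.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        lengths.append(next_start - prev_end - 1)
    return lengths


def length_class_summary(lengths):
    """Count introns by length modulo 3.

    Returns a dict mapping "3n", "3n+1", and "3n+2" to a
    (count, percentage) tuple. Percentages are 0.0 for empty input.
    """
    labels = {0: "3n", 1: "3n+1", 2: "3n+2"}
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    summary = {}
    for remainder, label in labels.items():
        count = counts.get(remainder, 0)
        percent = 100.0 * count / total if total else 0.0
        summary[label] = (count, percent)
    return summary


if __name__ == "__main__":
    # Hypothetical transcript with four exons, hence three introns.
    exons = [(1, 100), (201, 300), (402, 500), (603, 700)]
    lengths = intron_lengths(exons)
    for label, (count, percent) in length_class_summary(lengths).items():
        print(f"{label}\t{count}\t{percent:.2f}%")
```

In practice, the lengths from all annotated transcripts of one species would be pooled before calling `length_class_summary`, giving one count and one percentage per length class for each genome.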