PMC:3623602

miR-Explore: Predicting MicroRNA Precursors by Class Grouping and Secondary Structure Positional Alignment

Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression by targeting mRNAs, particularly in their 3′UTR regions. miRNAs have been identified both by biological experiments and by computational prediction. Computational prediction follows two major approaches: comparative and noncomparative. The comparative approach depends on the conservation of miRNA sequences and secondary structures, whereas the noncomparative approach does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNAs by class before using them as training data should improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within-class alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNAs by class yields better sensitivity with very high specificity for pre-miRNA prediction, even when a simple position-based alignment of secondary and primary structure is used.

Introduction
Mature miRNAs are small noncoding RNAs approximately 22 nucleotides long.
They are expressed in a wide variety of organisms, including viruses, plants, and animals.1–3 They play a significant role in posttranscriptional gene regulation in eukaryotes, causing degradation of mRNA transcripts or blocking their translation.4–5 In most eukaryotic genomes, miRNA genes are transcribed by RNA polymerase II into primary miRNAs (pri-miRNAs).6 The pri-miRNA is then processed by the RNase III endonuclease Drosha into a precursor miRNA (pre-miRNA), which is 60 to 100 nucleotides long on average and forms a stem-loop secondary structure.7 The pre-miRNA is then cleaved into a small double-stranded RNA by the endonuclease Dicer, which also initiates the formation of the RNA-induced silencing complex (RISC).8 One strand of the double-stranded RNA is incorporated into RISC and targets mRNA transcripts to block gene expression.9 miRNAs have been implicated in several critical diseases, including heart disease10 and cancer.11 In cancer, miRNAs can function both as oncogenes and as tumor suppressors.12 Several miRNA classes, such as miR-15, let-7, miR-16, miR-342, miR-223, and miR-107, have been reported to be involved in acute promyelocytic leukemia (APL).13 The oncogenes YES and STAT1, which are responsible for the proliferation of colon cancer cells, are targeted by miR-145, making this miRNA class a colon cancer suppressor.14 In lung cancer, miR-34 has been shown to play an important role in PRIMA-1 regulation; PRIMA-1 is a small molecule that restores tumor-suppressor function in cancer cells.15 Identifying the involvement of miRNAs in several cancers has helped researchers develop potential cancer therapeutics.16 Because of the significant role of miRNAs in human health, the continued discovery of novel miRNAs in the genome is important. Laboratory experiments have been conducted to discover miRNAs by direct cloning and short RNA sequencing.
However, miRNAs with low expression levels are difficult to identify using laboratory techniques alone.17 Computational prediction is therefore needed to support the identification of novel miRNAs. There are two major computational approaches for predicting miRNAs: comparative and noncomparative methods. Comparative methods rely on the conservation of miRNAs across the genomes of different organisms; examples include miRScan,2,18 miRAlign,19 ERPIN,20 and microHarvester.21 For miRNAs without conserved sequences, noncomparative methods are needed; examples include triplet-SVM,22 miPred,23 PromiR,24 and miR-abela.25 The rationale behind noncomparative methods is to train a computer program to distinguish real pre-miRNAs from pseudo pre-miRNAs. Hence, most noncomparative methods are based on support vector machines (SVMs) or hidden Markov models (HMMs). Many miRNA prediction algorithms were developed under broad assumptions.26 A study by Bram and Aggrey27 showed marked variability in sensitivity and specificity across classes when ERPIN, PromiR, and miR-abela were used to predict chicken pre-miRNAs. In this study, we developed a comparative method (miR-Explore) to demonstrate that grouping chicken miRNAs by class increases the sensitivity and specificity of prediction, even when a simple, direct positional alignment of secondary structure is used. The basic idea of the approach was to build a consensus pre-miRNA structure for each miRNA class and to align the query sequence against this consensus structure.
Materials and Methods
Training data
The data set was drawn from known pre-miRNAs of chicken, human, mouse, zebra fish, fugu, worms, frog, chimpanzee, gorilla, platypus, pig, fruit fly, and buffalo, available in the microRNA database miRBase.9 Sequences were selected according to the miRNA classes present in chicken. For example, if chicken has the miR-1 class, all known miR-1 pre-miRNAs from the other organisms were included in the data set. The training data consisted of the 80% of sequences in the data set that best represent each miRNA class. To ensure a reliable consensus structure, only classes with 5 or more known pre-miRNAs were included in the training data.
Positive data and negative data
The remaining 20% of the sequences, which were not used for training, served as queries (positive data) for prediction. The negative data were derived from all human and mouse coding sequences in the University of California Santa Cruz (UCSC) genome browser.28 These coding sequences were scanned for hairpin-like structures, which formed the final negative data set: 33,932 hairpin-like structures in mouse and 36,662 in human. The scan for hairpin-like structures was performed with the program written by Sewer et al for miR-abela.25 For the comparison with PromiR, the negative data were 5000 random sequences of 300 nucleotides each, taken from chicken coding genes, tRNA, and rRNA in the UCSC genome browser.
Programming and testing
The training data set was aligned using multiple alignment of RNAs (MARNA)29 to obtain a consensus structure with 80% primary and secondary structure identity. The identity percentage was defined in terms of the number of nucleotides conserved among all training sequences in a given column of the secondary structure.
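The class grouping and 80/20 split described under Training data and Positive data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `split_by_class`, the input format (a list of `(class, sequence)` pairs), and the simple order-preserving split, which stands in for the paper's unspecified choice of the sequences that "best represent" each class, are all hypothetical.

```python
from collections import defaultdict


def split_by_class(records, train_percent=80, min_class_size=5):
    """Group (mirna_class, sequence) pairs by class and split each class.

    Hypothetical helper: classes with fewer than min_class_size members are
    dropped; of the rest, the first train_percent% (rounded up, in input
    order) become consensus-building training data and the remainder become
    positive queries. The paper's "best represents the class" selection is
    not specified, so input order is used here as a stand-in.
    """
    by_class = defaultdict(list)
    for mirna_class, seq in records:
        by_class[mirna_class].append(seq)

    training, queries = {}, {}
    for mirna_class, seqs in by_class.items():
        if len(seqs) < min_class_size:
            continue  # too few known pre-miRNAs for a reliable consensus
        # Integer ceiling of train_percent% of the class size (avoids float rounding).
        n_train = (len(seqs) * train_percent + 99) // 100
        training[mirna_class] = seqs[:n_train]
        queries[mirna_class] = seqs[n_train:]
    return training, queries
```

With five members, a class contributes four training sequences and one query; a class with only two members is excluded from both sets.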
These consensus structures, together with their primary sequence information, were then aligned with the query sequence. Query sequences were required to be at least 300 nucleotides long. Because miR-Explore cannot accept a query shorter than the consensus structure, whose average length was 150 nucleotides, this minimum length ensured that the alignment process could be initiated. The alignment was based on the positions of the secondary and primary structure. The MARNA alignment output gives the consensus secondary structure in the first row and the primary structure information in the remaining rows (Fig. 1). A sliding window was created whose length equals the number of characters, including gaps and unpaired nucleotides, in the MARNA alignment output. This window was used to scan the query sequence starting from position 1. The consensus structures consist of gaps and hairpins, each made up of a stem and a loop. In Figure 1, the stem starts at positions 1 and 75. Because the query is a primary sequence without gaps, the alignment inserts gaps into the query sequence and shifts them so that the query aligns with the consensus structure and its corresponding stem nucleotide pairs. If the alignment matched the secondary structure and the corresponding stem nucleotides at the exact positions shown in the MARNA alignment, the position scored one; otherwise it scored zero. A match was defined as identical nucleotides in the query sequence and the training sequence at a given column of the consensus secondary structure. Scoring was therefore binary: 1 for a match and 0 for a mismatch. The maximum score an alignment can receive equals the number of base pairs in the consensus secondary structure.
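The positional matching described above can be sketched as follows. This is a simplified, hypothetical reading rather than the miR-Explore implementation: it assumes a single consensus-nucleotide row aligned column-for-column with a dot-bracket structure row, with `-` marking gap columns, and it assumes the query's gaps fall exactly at those gap columns instead of being shifted. Under that assumption, a window covering as many query nucleotides as the consensus has non-gap columns, once padded with gaps, gives an aligned row as long as the MARNA alignment. Each consensus base pair scores 1 only when the query carries the consensus nucleotides at both paired columns. The names `parse_pairs`, `score_window`, and `scan_query` are illustrative.

```python
def parse_pairs(structure):
    """Return sorted (i, j) column pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for col, ch in enumerate(structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            pairs.append((stack.pop(), col))
    return sorted(pairs)


def score_window(window, structure, consensus):
    """Count consensus base pairs whose two nucleotides both match the query.

    Assumption: query nucleotides fill the non-gap consensus columns in
    order, i.e. gaps are inserted into the query where the consensus has '-'.
    """
    nucleotides = iter(window)
    aligned = [None if c == "-" else next(nucleotides) for c in consensus]
    score = 0
    for i, j in parse_pairs(structure):
        # A pair scores 1 only if both paired columns match exactly; otherwise 0.
        if aligned[i] == consensus[i] and aligned[j] == consensus[j]:
            score += 1
    return score


def scan_query(query, structure, consensus):
    """Slide a window along the query from its first position.

    Returns (best_score, best_start), where best_start is a 0-based index.
    """
    width = sum(1 for c in consensus if c != "-")  # query nucleotides per window
    if len(query) < width:
        raise ValueError("query is shorter than the consensus structure")
    best_score, best_start = -1, -1
    for start in range(len(query) - width + 1):
        score = score_window(query[start:start + width], structure, consensus)
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

For example, `scan_query("TTGCAAGCTT", "((..))", "GCAAGC")` finds both consensus pairs in the window starting at index 2 and returns `(2, 2)`; the maximum attainable score equals the number of pairs, here 2.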
Because each miRNA class has a different number of pairs in its secondary structure, the maximum attainable score differs between classes. Scores were therefore normalized so that all classes share a common scale: standardized score = (NMA / NMC) × 100%, where NMA is the number of matched secondary-structure positions and stem nucleotides in the alignment and NMC is the number of nucleotide pairs in the stem of the consensus structure. The same scoring system was used for both negative and positive data. Because the consensus structures were built at 80% identity and the approach is a direct, position-to-position exact comparison, a query sequence was counted as a hit when its standardized score was at least 80%. We compared the current approach (miR-Explore) with miR-abela25 and PromiR.24 miR-abela was tested using the same positive and negative data, whereas PromiR was tested using negative data consisting of 5000 randomly selected 300-nucleotide sequences from chicken coding genes, tRNA, and rRNA. The positive data for the PromiR test came from the same data set used to test the other two programs, except that only chicken data were used, because PromiR requires sequences to be entered individually, which is laborious and time consuming. Therefore, PromiR was compared with miR-Explore and miR-abela using only chicken data, and miR-Explore was compared with miR-abela using the data set across species. Sensitivity and specificity were calculated as follows. Sensitivity = TP/(TP + FN), where TP (true positives) is the number of pre-miRNAs predicted as pre-miRNAs, and FN (false negatives) is the number of pre-miRNAs predicted as non-pre-miRNAs. Specificity = TN/(TN + FP), where TN (true negatives) is the number of non-pre-miRNAs predicted as non-pre-miRNAs, and FP (false positives) is the number of non-pre-miRNAs predicted as pre-miRNAs.
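The scoring and evaluation arithmetic of this section can be sketched directly from the stated formulas: the standardized score with its 80% hit rule, sensitivity and specificity, and the Matthews correlation coefficient (MCC) used for accuracy measurement. The function names are illustrative, and returning 0.0 for the MCC when a marginal sum is zero is a common convention assumed here, not something the paper specifies.

```python
import math


def standardized_score(nma, nmc):
    """Standardized score = (NMA / NMC) x 100%.

    nma: matched consensus pairs in the alignment; nmc: nucleotide pairs in
    the stem of the consensus structure (the maximum attainable score).
    """
    if nmc <= 0:
        raise ValueError("NMC must be positive")
    return 100.0 * nma / nmc


def is_hit(nma, nmc, threshold=80.0):
    """Count a query as a predicted pre-miRNA if its score is at least 80%."""
    return standardized_score(nma, nmc) >= threshold


def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of true pre-miRNAs that are predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """TN / (TN + FP): fraction of non-pre-miRNAs that are rejected."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when any marginal sum is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

As a check against the supplementary data, the let-7 row of Table S1 pools to 46 of 47 test precursors predicted, and `sensitivity(46, 1)` gives about 0.9787, matching the reported 97.87%. A consensus with 30 pairs needs at least 24 matched pairs to reach the 80% threshold.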
For accuracy measurement, we used the Matthews correlation coefficient (MCC),30 a measure of the quality of binary classification that takes into account true and false positives and negatives. MCC ranges from −1 to +1, where +1 represents a perfect prediction, 0 a random prediction, and −1 a completely inverted prediction. MCC is calculated as follows, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Results and Discussion
The sensitivity and specificity of miR-Explore, miR-abela, and PromiR on chicken data alone are shown in Table 1. The specificity was 99.99% for miR-Explore and miR-abela and 99.00% for PromiR, so all three programs were highly specific on chicken data. However, their sensitivity varied markedly: miR-Explore had 88% sensitivity, whereas miR-abela and PromiR had 78% and 53%, respectively. The differences in sensitivity could be due to the fact that both miR-abela and PromiR were developed using mostly human and other mammalian training data, whereas miR-Explore included chicken data in its training data. For programs that rely on secondary structure profiling, the choice of training data can affect sensitivity. In principle, PromiR, a probabilistic co-learning method based on a hidden Markov model,24 was developed to identify both close and distant homologs. Our results suggest either that the chicken is too distant from humans or that some of the generalized assumptions under which PromiR was developed do not hold. PromiR scans the stems of stem-loop candidates for the signal of the Drosha cleavage site.
However, multiple factors govern the Drosha cleavage site.31 It is therefore possible that PromiR did not capture most of the factors affecting Drosha cleavage in chicken, which limited its sensitivity. miR-abela, in contrast, was developed using a support vector machine (SVM): it used both real and pseudo pre-miRNAs as training data and relied on general pre-miRNA features to learn to distinguish between the two. In general, pre-miRNAs share some global features that prediction programs can use, including but not limited to nucleotide length, number of bulges, and minimum free energy. In the most recent miRNA prediction programs, these features have been generalized.32 The advantage of such generalized features is that they allow the prediction of novel miRNAs belonging to previously unknown classes. Despite this advantage, the results in Table 1 suggest that programs using global features without training data representative of all organisms will have reduced sensitivity for new miRNAs in species that are poorly represented in the training data. Table 1 also shows the results of comparing miR-Explore and miR-abela on an expanded set of positive and negative data. Detailed results for individual organisms are provided in Supplementary Tables S1 (miR-Explore) and S2 (miR-abela). The central hypothesis of this study was that each miRNA class has its own unique set of features; therefore, grouping miRNAs by class before using them as training data should improve sensitivity and specificity. Both miR-Explore and miR-abela were highly effective at correctly rejecting non-pre-miRNA sequences (true negatives). The sensitivity of miR-Explore was higher than that of miR-abela for every species compared, including humans.
The average sensitivity was 88.62% for miR-Explore and 70.82% for miR-abela. The MCC was 0.90 for miR-Explore and 0.75 for miR-abela. Compared with global alignment, grouping miRNAs by class yielded better sensitivity with very high specificity for pre-miRNA prediction, even when a simple position-based alignment of secondary and primary structure was used. The grouping technique also yielded a higher MCC than the program using generalized pre-miRNA features. It can be argued that, although each class has its own unique features, other features are conserved across classes and species. The ability to predict miRNAs with high sensitivity depends on how many conserved elements are captured in the secondary structure. For example, the training data for miR-21 came from human, chicken, and mouse, yet miR-Explore had 100% sensitivity in predicting miR-21 in other species that were not used in the training data. The features of miR-21 may be well conserved across species. In contrast, miR-125 may be poorly conserved both within the class and across species, so any predictive algorithm based on global alignment would yield relatively poor sensitivity. miR-Explore has some limitations. First, it depends on previously known miRNA precursor classes and can only predict novel miRNAs belonging to those classes. Second, it relies on the conservation of miRNAs within a class and across species, and on the availability of known miRNA data; when too few known miRNAs are available to build the consensus structure, sensitivity may decline. Third, its alignment method is very simple: we used only the positional information of the secondary structure together with the corresponding primary-structure nucleotides, yet we achieved high sensitivity and specificity.
A more sophisticated alignment algorithm could further improve sensitivity and specificity.

Conclusion
It is difficult to develop a perfect miRNA prediction program; therefore, more than one program may be needed to detect miRNAs computationally.23 In this study, we have shown that grouping miRNAs by class before using them as training data improves sensitivity and specificity. However, this approach can only predict pre-miRNAs of known classes. Although the sensitivity of PromiR and miR-abela may not be as good as that of miR-Explore, as ab initio methods they have the potential to predict novel miRNAs in unknown classes. Each program has its own strengths and limitations, and the programs can complement each other. miR-Explore is a new technique that can contribute to the future discovery of novel miRNAs.

Supplementary Tables

Table S1 Sensitivity and specificity of miR-Explore in predicting pre-miRNA across species.

Class    Specificity  Chicken  Human  Mouse  Zebra fish  Others  Average
let-7    99.75%       2,2      2,2    2,2    2,2         38,39   97.87%
miR1     100.00%      2,2      2,2    2,2    2,2         12,12   100.00%
miR7     100.00%      1,2      1,2    2,2    1,2         13,16   75.00%
miR9     100.00%      2,2      2,2    2,2    2,2         14,19   81.48%
miR10    100.00%      2,2      2,2    2,2    2,2         10,14   81.82%
miR15    100.00%      2,2      2,2    2,2    2,2         9,10    94.44%
miR16    100.00%      2,2      2,2    2,2    2,2         8,8     100.00%
miR17    100.00%      1,1      1,1    1,1    2,2         5,5     100.00%
miR18    100.00%      1,1      1,1    1,1    2,2         7,7     100.00%
miR19    99.96%       2,2      2,2    2,2    2,2         10,13   85.71%
miR20    100.00%      2,2      2,2    2,2    2,2         6,6     100.00%
miR21    100.00%      1,1      1,1    1,1    2,2         4,4     100.00%
miR22    100.00%      1,1      1,1    1,1    2,2         4,5     90.00%
miR23    100.00%      1,1      2,2    2,2    2,2         9,9     100.00%
miR24    100.00%      1,1      2,2    1,2    2,2         6,6     92.31%
miR26    100.00%      1,1      2,2    2,2    2,2         6,6     100.00%
miR27    100.00%      1,1      2,2    2,2    2,2         9,9     100.00%
miR29    99.98%       2,2      2,2    2,2    2,2         14,17   88.00%
miR30    99.99%       1,2      2,2    1,2    2,2         16,18   84.62%
miR31    99.99%       1,1      0,1    1,1    2,2         11,11   93.75%
miR32    100.00%      1,1      1,1    1,1    N/A         3,3     100.00%
miR33    100.00%      2,2      1,1    1,1    N/A         9,9     100.00%
miR34    100.00%      1,2      2,2    2,2    0,1         14,14   90.48%
miR92    99.81%       0,1      0,2    0,2    2,2         17,21   67.86%
miR99    100.00%      1,1      2,2    2,2    2,2         4,5     91.67%
miR100   100.00%      1,1      1,1    1,1    2,2         8,8     100.00%
miR101   100.00%      2,2      2,2    2,2    2,2         7,8     93.75%
miR103   100.00%      2,2      2,2    2,2    1,1         7,7     100.00%
miR106   100.00%      1,1      2,2    2,2    N/A         6,6     100.00%
miR107   100.00%      1,1      1,1    1,1    1,1         4,4     100.00%
miR122   100.00%      2,2      1,1    1,1    1,1         3,3     100.00%
miR124   99.99%       2,2      2,2    2,2    2,2         15,15   100.00%
miR125   99.97%       0,1      0,2    0,2    0,2         0,16    0.00%
miR126   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR128   100.00%      2,2      2,2    2,2    2,2         6,6     100.00%
miR130   100.00%      2,2      2,2    2,2    2,2         7,7     100.00%
miR133   100.00%      2,2      2,2    2,2    2,2         11,14   86.36%
miR135   100.00%      2,2      2,2    2,2    2,2         9,10    94.44%
miR137   100.00%      1,1      1,1    1,1    2,2         5,6     90.91%
miR138   100.00%      1,2      2,2    1,2    1,1         1,4     54.55%
miR140   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR142   100.00%      1,1      1,1    1,1    1,1         4,4     100.00%
miR144   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR146   100.00%      2,2      2,2    2,2    2,2         5,5     100.00%
miR147   100.00%      2,2      2,2    1,1    N/A         3,3     100.00%
miR148   100.00%      1,1      2,2    2,2    1,1         4,4     100.00%
miR153   100.00%      1,1      2,2    1,1    2,2         6,6     100.00%
miR155   100.00%      0,1      0,1    0,1    0,1         0,2     0.00%
miR181   100.00%      2,2      2,2    2,2    2,2         15,18   88.46%
miR183   100.00%      1,1      1,1    1,1    1,1         2,5     66.67%
miR184   100.00%      1,1      1,1    1,1    1,1         9,9     100.00%
miR187   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR190   99.99%       1,1      1,1    1,1    1,1         7,8     91.67%
miR193   100.00%      2,2      1,1    1,1    2,2         3,6     75.00%
miR194   100.00%      1,1      2,2    2,2    2,2         5,5     100.00%
miR196   100.00%      2,2      2,2    2,2    2,2         9,9     100.00%
miR199   100.00%      2,2      2,2    2,2    2,2         8,9     94.12%
miR200   100.00%      2,2      2,2    2,2    2,2         9,9     100.00%
miR202   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR203   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR204   100.00%      2,2      1,1    1,1    2,2         5,5     100.00%
miR205   100.00%      2,2      1,1    1,1    1,1         4,4     100.00%
miR206   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR211   100.00%      1,1      1,1    1,1    N/A         2,2     100.00%
miR214   100.00%      1,1      1,1    1,1    1,1         4,4     100.00%
miR215   100.00%      1,1      1,1    1,1    N/A         3,3     100.00%
miR216   99.99%       2,2      1,1    1,1    2,2         4,8     85.71%
miR217   100.00%      1,1      1,1    1,1    1,1         4,4     100.00%
miR218   100.00%      2,2      2,2    2,2    2,2         8,8     100.00%
miR219   100.00%      1,1      1,1    1,1    1,1         8,10    85.71%
miR221   100.00%      1,1      1,1    1,1    1,1         4,4     100.00%
miR222   100.00%      1,1      1,1    1,1    1,1         3,3     100.00%
miR223   100.00%      1,1      1,1    1,1    1,1         3,4     87.50%
miR301   100.00%      1,1      1,1    1,1    1,1         5,5     100.00%
miR302   100.00%      0,2      0,2    0,2    N/A         0,8     0.00%
miR365   100.00%      2,2      2,2    2,2    2,2         3,5     84.62%
miR367   100.00%      1,1      1,1    1,1    N/A         2,2     100.00%
miR375   100.00%      1,1      1,1    1,1    1,1         5,5     100.00%
miR383   100.00%      1,1      1,1    1,1    N/A         3,3     100.00%
miR429   100.00%      1,1      1,1    1,1    1,1         3,5     77.78%
miR449   100.00%      0,1      0,1    0,1    N/A         0,4     0.00%
miR451   100.00%      1,1      1,1    1,1    1,1         2,2     100.00%
miR454   100.00%      1,1      N/A    N/A    1,1         2,2     100.00%
miR455   100.00%      1,1      1,1    1,1    1,1         2,2     100.00%
miR460   100.00%      N/A      N/A    N/A    N/A         1,1     100.00%
miR466   100.00%      0,1      N/A    0,2    N/A         1,4     14.29%
miR489   100.00%      1,1      1,1    1,1    1,1         2,2     100.00%
miR490   100.00%      N/A      N/A    N/A    N/A         2,2     100.00%
miR499   100.00%      1,1      1,1    N/A    N/A         3,3     100.00%
miR551   100.00%      1,1      1,1    1,1    N/A         3,3     100.00%
miR1306  100.00%      N/A      N/A    N/A    N/A         1,1     100.00%

Note: The columns Chicken to Others report sensitivity as the number of pre-miRNAs correctly predicted, followed by the number tested. Average is the overall sensitivity for the class (total predicted/total tested). N/A, not available.

Table S2 Sensitivity of miR-abela in predicting pre-miRNA across species.

Class    Chicken  Human  Mouse  Zebra fish  Others  Average
let-7    2,2      0,2    1,2    2,2         26,39   65.96%
miR1     2,2      2,2    1,2    2,2         10,12   85.00%
miR7     2,2      2,2    2,2    2,2         14,16   91.67%
miR9     2,2      2,2    2,2    2,2         16,19   88.89%
miR10    2,2      2,2    1,2    2,2         11,14   81.82%
miR15    1,2      1,2    0,2    2,2         3,9     41.18%
miR16    2,2      2,2    2,2    1,2         7,8     87.50%
miR17    0,1      0,1    0,1    0,2         1,5     10.00%
miR18    2,2      0,1    1,1    2,2         3,7     61.54%
miR19    2,2      2,2    2,2    2,2         8,13    76.19%
miR20    1,2      1,2    1,2    1,2         3,6     50.00%
miR21    1,1      1,1    1,1    2,2         3,4     88.89%
miR22    1,1      1,1    1,1    2,2         3,5     80.00%
miR23    1,1      2,2    2,2    2,2         5,9     75.00%
miR24    0,1      1,2    1,2    2,2         1,6     38.46%
miR26    1,1      2,2    2,2    2,2         3,6     100.00%
miR27    0,1      1,2    1,2    1,2         4,9     43.75%
miR29    2,2      2,2    2,2    2,2         13,17   84.00%
miR30    1,2      2,2    2,2    2,2         11,18   69.23%
miR31    1,1      1,1    1,1    0,2         8,11    68.75%
miR32    1,1      1,1    1,1    N/A         2,3     83.33%
miR33    2,2      1,1    1,1    N/A         8,9     92.31%
miR34    1,2      1,2    1,2    1,1         14,14   85.71%
miR92    1,1      2,2    2,2    2,2         17,21   92.86%
miR99    1,1      1,2    1,2    1,2         0,5     33.33%
miR100   0,1      0,1    0,1    2,2         5,8     53.85%
miR101   2,2      2,2    2,2    2,2         5,8     81.25%
miR103   2,2      1,2    2,2    1,1         1,7     50.00%
miR106   0,1      1,2    0,2    N/A         4,6     45.45%
miR107   0,1      0,1    1,1    1,1         0,4     25.00%
miR122   2,2      1,1    1,1    1,1         2,3     87.50%
miR124   2,2      2,2    1,2    1,2         11,15   73.91%
miR125   1,1      2,2    1,2    2,2         6,16    52.17%
miR126   1,1      1,1    1,1    1,1         2,3     85.71%
miR128   2,2      2,2    2,2    2,2         3,6     78.57%
miR130   1,2      2,2    1,2    2,2         5,7     73.33%
miR133   2,2      2,2    2,2    2,2         9,14    77.27%
miR135   2,2      2,2    2,2    0,2         9,10    83.33%
miR137   1,1      1,1    1,1    2,2         5,6     90.91%
miR138   0,2      1,2    0,2    0,1         1,4     18.18%
miR140   1,1      1,1    1,1    1,1         3,3     100.00%
miR142   1,1      1,1    0,1    1,1         3,4     75.00%
miR144   1,1      1,1    1,1    1,1         2,3     85.71%
miR146   2,2      1,2    2,2    2,2         4,5     84.62%
miR147   0,2      1,2    0,1    N/A         2,3     37.50%
miR148   0,1      1,2    2,2    0,1         1,4     40.00%
miR153   1,1      2,2    1,1    2,2         5,6     91.67%
miR155   0,1      0,1    1,1    1,1         1,2     50.00%
miR181   1,2      1,2    1,2    1,2         5,18    34.62%
miR183   0,1      0,1    0,1    0,1         3,5     33.33%
miR184   1,1      1,1    1,1    1,1         7,9     84.62%
miR187   1,1      0,1    0,1    0,1         0,3     14.29%
miR190   1,1      1,1    1,1    1,1         6,7     90.91%
miR193   1,2      1,1    0,1    2,2         4,6     66.67%
miR194   1,1      1,2    1,2    2,2         4,5     75.00%
miR196   2,2      2,2    2,2    2,2         6,9     100.00%
miR199   2,2      2,2    1,2    2,2         3,9     58.82%
miR200   2,2      2,2    2,2    2,2         7,9     88.24%
miR202   1,1      1,1    1,1    1,1         3,3     100.00%
miR203   1,1      1,1    1,1    1,1         3,3     100.00%
miR204   2,2      1,1    1,1    1,2         4,5     81.82%
miR205   2,2      1,1    1,1    1,1         3,4     88.89%
miR206   1,1      1,1    1,1    1,1         2,3     85.71%
miR211   1,1      1,1    1,1    N/A         2,2     100.00%
miR214   1,1      1,1    1,1    1,1         4,4     100.00%
miR215   1,1      1,1    1,1    N/A         3,3     100.00%
miR216   0,2      1,1    0,1    0,2         5,8     42.86%
miR217   1,1      1,1    1,1    1,1         3,4     87.50%
miR218   2,2      2,2    2,2    2,2         5,8     81.25%
miR219   1,1      1,1    1,1    1,1         5,10    64.29%
miR221   1,1      1,1    1,1    1,1         3,4     87.50%
miR222   1,1      1,1    1,1    1,1         3,3     100.00%
miR223   1,1      1,1    1,1    1,1         4,4     100.00%
miR301   1,1      0,1    0,1    0,1         0,5     11.11%
miR302   1,2      2,2    1,2    N/A         8,8     85.71%
miR365   1,2      1,2    2,2    1,2         3,5     61.54%
miR367   0,1      1,1    1,1    N/A         0,2     40.00%
miR375   1,1      0,1    0,1    1,1         3,5     55.56%
miR383   0,1      0,1    0,1    N/A         0,3     0.00%
miR429   1,1      0,1    1,1    1,1         4,5     77.78%
miR449   1,1      1,1    1,1    N/A         2,4     71.43%
miR451   1,1      1,1    1,1    1,1         2,2     100.00%
miR454   1,1      N/A    N/A    1,1         1,2     75.00%
miR455   1,1      1,1    1,1    1,1         2,2     100.00%
miR460   N/A      N/A    N/A    N/A         1,1     100.00%
miR466   0,1      N/A    1,2    N/A         4,4     71.43%
miR489   1,1      1,1    1,1    1,1         2,2     100.00%
miR490   N/A      N/A    N/A    N/A         0,2     0.00%
miR499   1,1      1,1    N/A    N/A         2,3     80.00%
miR551   1,1      1,1    1,1    N/A         3,3     100.00%
miR1306  N/A      N/A    N/A    N/A         0,1     0.00%

Note: Cells and columns as in Table S1.

Acknowledgement
We thank Dr. Alain Sewer of the Université de Lausanne, Switzerland, who provided the program used to scan the human and mouse coding sequences for hairpin-like structures.

Author Contributions
Conceived and designed the experiments: SEA. Analyzed the data: BS. Wrote the first draft of the manuscript: BS. Contributed to the writing of the manuscript: SEA, BS. Agree with manuscript results and conclusions: SEA, BS. Jointly developed the structure and arguments for the paper: SEA, BS. Made critical revisions and approved final version: SEA, BS. All authors reviewed and approved the final manuscript.

Competing Interests
Author(s) disclose no potential conflicts of interest.

Disclosures and Ethics
As a requirement of publication, the authors have provided signed confirmation of their compliance with ethical and legal obligations, including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.

Funding
Author(s) disclose no funding sources.

Figure 1 An example of alignment output from MARNA, where ‘(’ and ‘)’ indicate the stem in the secondary structure, ‘.’ is a mismatch, and ‘-’ is a gap.

Table 1 Sensitivity and specificity of PromiR, miR-Explore and miR-abela in predicting pre-miRNA.
Program      Species(a)  Sensitivity (%)  Specificity (%)
PromiR       GGA         53.00            90.00
miR-Explore  GGA         91.00            99.99
miR-Explore  HSA         92.00            99.99
miR-Explore  MMU         89.00            99.99
miR-Explore  DRE         95.00            99.99
miR-Explore  Others      86.00            99.99
miR-Explore  Average(b)  88.00            99.99
miR-abela    GGA         78.00            99.99
miR-abela    HSA         78.00            99.99
miR-abela    MMU         74.00            99.99
miR-abela    DRE         82.00            99.99
miR-abela    Others      65.00            99.99
miR-abela    Average(b)  71.00            99.99

Notes: (a) GGA = chicken; HSA = human; MMU = mouse; DRE = zebra fish; Others = fugu, worms, frog, chimpanzee, gorilla, platypus, pig, Drosophila, and buffalo. (b) Average was calculated as the ratio of the total number of predicted pre-miRNAs to the total number of pre-miRNAs in the test data.
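Note (b) describes a pooled average, and the Average columns of Tables S1 and S2 are consistent with the same rule when each species cell is read as "predicted,tested": the let-7 row of Table S1 sums to 46 of 47, or 97.87%. A minimal sketch of that pooling, using the hypothetical helper name `pooled_sensitivity`, is:

```python
def pooled_sensitivity(cells):
    """Pool 'predicted,tested' cells (e.g. '2,2', '38,39') into one percentage.

    Follows note (b): total predicted / total tested. 'N/A' cells (no test
    data for that species group) are skipped.
    """
    predicted = tested = 0
    for cell in cells:
        if cell == "N/A":
            continue
        p, t = (int(x) for x in cell.split(","))
        predicted += p
        tested += t
    return 100.0 * predicted / tested
```

Because counts are summed before dividing, species groups with more test sequences (typically "Others") carry more weight than they would in an unweighted mean of the per-species ratios.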

Document structure show

article-title	miR-Explore: Predicting MicroRNA Precursors by Class Grouping and Secondary Structure Positional Alignment
abstract	MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expressions by targeting the mRNAs especially in the 3′UTR regions. The identification of miRNAs has been done by biological experiment and computational prediction. The computational prediction approach has been done using two major methods: comparative and noncomparative. The comparative method is dependent on the conservation of the miRNA sequences and secondary structure. The noncomparative method, on the other hand, does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNA by classes before using them as training data will improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within miRNA class alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNA by classes yields a better sensitivity with very high specificity for pre-miRNA prediction even when a simple positional based secondary and primary structure alignment are used.
p	MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expressions by targeting the mRNAs especially in the 3′UTR regions. The identification of miRNAs has been done by biological experiment and computational prediction. The computational prediction approach has been done using two major methods: comparative and noncomparative. The comparative method is dependent on the conservation of the miRNA sequences and secondary structure. The noncomparative method, on the other hand, does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNA by classes before using them as training data will improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within miRNA class alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNA by classes yields a better sensitivity with very high specificity for pre-miRNA prediction even when a simple positional based secondary and primary structure alignment are used.
body	Introduction Mature miRNAs are small 22 nucleotides long, non-coding RNAs. They are expressed in a wide variety of organisms including viruses, plants, and animals.1–3 They have significant role in posttranscriptional control of eukaryotes genome causing degradation of the mRNA transcripts or blocking translation.4–5 In most eukaryotic genomes, miRNA genes are transcribed by RNA-polymerase-II to primary miRNA (pri-miRNA).6 This pri-miRNA is then further processed by RNAse-III endonuclease Drosha into precursor miRNA (pre-miRNA), which is 60 to 100 nucleotides long, on average, and forms a stem-loop secondary structure.7 The pre-miRNA is then processed to form the small double stranded RNAs by the endonuclease Dicer, which also initiates the formation of the RNAinduced silencing complex (RISC).8 One strand of the double stranded RNA is incorporated with the RISC and targets the mRNA transcripts to block gene expression.9 miRNAs have been involved in some critical diseases including heart disease10 and cancer.11 In cancer, miRNAs can function as both oncogenes and suppressors.12 Several classes of miRNAs such as miR-15, let-7, miR-16, miR-342, miR-223, and miR-107 have been reported to be involved in acute promyelocytic leukemia (APL).13 The oncogene YES and STAT1, which are responsible in the proliferation of the colon cancer cells, are targeted by miR-145, thereby making this particular miRNA class a colon cancer suppressor.14 In lung cancer, miR-34 has been shown to have an important role on the PRIMA-1 regulation, which is a small molecule that restores the cancer cell suppression function.15 The identification of the involvement of miRNAs in several cancers has assisted researchers to develop some potential therapeutics for cancer cure.16 The significant role of miRNAs in human health has made the continuing discovery of novel miRNAs in the genome important. 
Laboratory experiments have been conducted to discover miRNAs by direct cloning and short RNA sequencing. However, it is difficult to identify miRNAs with low levels of expression using only laboratory techniques.17 Hence, computational predictions have been needed to support the identification of novel miRNAs. There are two major computational approaches for predicting miRNAs: comparative and noncomparative methods. The comparative method relies on the conservation of miRNAs across the genome of organisms. Some examples of computational method using the comparative method are miRScan,2,18 miRAlign,19 ERPIN,20 and microHarvester.21 For miRNAs that do not have conserve sequences, noncomparative methods are needed. Some examples of the noncomparative method are triplet-SVM,22 miPred,23 PromiR,24 and miR-abela.25 The rationale behind the noncomparative methods is to design computer program to learn and distinguish between real pre-miRNA and pseudo pre-miRNA. Hence most of the noncomparative methods are based on the utilization of Support Vector Learning Machine (SVM) and hidden Markov model (HMM). Many of the miRNA prediction algorithms were developed based on broad assumptions.26 A study by Bram and Aggrey27 showed marked variability in sensitivity and specificity in predicting chicken pre-miRNA across classes using ERPIN, PromiR, and miR-abela. In this study, we have developed a comparative method (miR-Explore) to demonstrate that grouping chicken miRNAs by classes increases sensitivity and specificity of the prediction method even when a simple direct positional secondary structure alignment is used. The basic idea of the current approach was to create a consensus structure of the pre-miRNA for each miRNA class and use this consensus structure to perform alignment with the query sequence. 
Materials and Methods
Training data
The data set consisted of known pre-miRNA sequences from chicken, human, mouse, zebra fish, fugu, worms, frog, chimpanzee, gorilla, platypus, pig, fruit fly, and buffalo, available in the miRNA database miRBase.9 Sequences were selected according to the miRNA classes present in chicken. For example, if chicken has a miR-1 class, then all available known miR-1 pre-miRNAs from the other organisms were included in the data set. The training data consisted of the 80% of sequences in the data set that best represented each miRNA class. To ensure a reliable consensus structure, only classes with 5 or more known pre-miRNAs were included in the training data.
Positive data and negative data
The remaining 20% of the sequences, which were not used for training, served as queries (positive data) for prediction. The negative data were taken from all human and mouse coding sequences in the University of California Santa Cruz (UCSC) genome browser.28 These coding sequences were scanned for hairpin-like structures, which formed the final negative data set: 33,932 hairpin-like structures in mouse and 36,662 in human. The scan for hairpin-like structures used the program written by Sewer et al for miR-abela.25 The negative data for the comparison with PromiR were 5000 randomly selected 300-nucleotide sequences from chicken coding genes, tRNA, and rRNA, taken from the UCSC genome browser.
Programming and testing
The training data set was aligned using multiple alignment of RNAs (MARNA)29 to obtain a consensus structure with 80% primary and secondary structure identity. The identity percentage was based on the number of nucleotides conserved among all training sequences in a particular column of the secondary structure alignment.
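The data preparation and consensus rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: records are assumed to be (class name, sequence) pairs already taken from miRBase; the seeded random shuffle stands in for the unspecified selection of the 80% of sequences that "best represent" a class; `aligned` stands for the gapped rows of a MARNA alignment, which is not reproduced here; and per-column identity is read as the fraction of sequences sharing the column's most common character. All function and constant names, and the class label "miR-x", are hypothetical.

```python
import random
from collections import Counter, defaultdict

MIN_CLASS_SIZE = 5     # classes with fewer known pre-miRNAs are skipped
TRAIN_FRACTION = 0.8   # 80% training, remaining 20% used as positive queries
IDENTITY_CUTOFF = 0.8  # assumption: a column is conserved if >= 80% of rows agree


def split_by_class(records, seed=0):
    """Group (class_name, sequence) records by class, drop small classes,
    and split each remaining class 80/20 into (training, query) lists."""
    by_class = defaultdict(list)
    for class_name, seq in records:
        by_class[class_name].append(seq)
    rng = random.Random(seed)
    splits = {}
    for class_name, seqs in by_class.items():
        if len(seqs) < MIN_CLASS_SIZE:
            continue
        shuffled = seqs[:]
        rng.shuffle(shuffled)  # stand-in for the paper's "best represents" choice
        n_train = round(TRAIN_FRACTION * len(shuffled))
        splits[class_name] = (shuffled[:n_train], shuffled[n_train:])
    return splits


def column_consensus(aligned, cutoff=IDENTITY_CUTOFF):
    """Given equal-length aligned rows, keep each column's most common character
    when its frequency reaches the cutoff; otherwise emit 'N'."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        char, count = Counter(column).most_common(1)[0]
        consensus.append(char if count / len(column) >= cutoff else "N")
    return "".join(consensus)


records = [("miR-1", s) for s in ["AC", "AG", "AU", "CA", "GA"]] + [("miR-x", "AA")]
splits = split_by_class(records)
print({name: (len(t), len(q)) for name, (t, q) in splits.items()})  # {'miR-1': (4, 1)}
print(column_consensus(["ACGU", "ACGA", "ACGU", "ACGA", "UCGU"]))  # ACGN
```

Under this reading, a column in which four of five sequences share a nucleotide reaches exactly 80% identity and is kept, while a three-of-five column is masked as N; a class with a single known sequence ("miR-x") is excluded before splitting.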
These consensus structures, together with their primary sequence information, were then aligned with the query sequence. Query sequences were required to be at least 300 nucleotides long: the consensus structures averaged 150 nucleotides in length, and because miR-Explore cannot align a query that is shorter than the consensus structure, this minimum length allows the program to initiate the alignment process. The alignment was based on the positions of the secondary and primary structure. The MARNA alignment output provides the consensus secondary structure in the first row and the primary sequence information in the remaining rows (Fig. 1). A sliding window was created whose length matches the number of characters, including gaps and unpaired nucleotides, in the MARNA alignment output, and this window was used to scan the query sequence starting from position 1. The consensus structures consist of gaps and a hairpin, which comprises a stem and a loop. In Figure 1, the stem starts at positions 1 and 75. Because the query is a primary sequence without gaps, the alignment inserts gaps into the query sequence and shifts them so that the query aligns with the consensus structure and its corresponding stem nucleotide pairs. Each stem pair that matched the secondary structure and the corresponding stem nucleotides at the exact position shown in the MARNA alignment was scored 1; otherwise it was scored 0. A match was defined as identical nucleotides in the query and training sequences at a given column of the consensus secondary structure. The maximum score an alignment can receive therefore equals the number of base pairs in the consensus secondary structure.
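The positional 0/1 scoring described above can be approximated as follows. This sketch assumes the consensus is given as an ungapped dot-bracket string plus a consensus nucleotide string of the same length, and it omits the gap insertion and shifting that miR-Explore applies to the query; `stem_pairs`, `pair_matches`, and `scan` are hypothetical names. Each consensus base pair scores 1 when both query nucleotides at the window positions equal the consensus nucleotides (a simplification of "matched the secondary structure and the corresponding stem nucleotides"), so the maximum raw score equals the number of pairs.

```python
def stem_pairs(structure):
    """Return sorted (i, j) index pairs for matched brackets in dot-bracket notation."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def pair_matches(window, consensus_seq, pairs):
    """Count consensus base pairs whose two window nucleotides equal the consensus ones."""
    return sum(
        1
        for i, j in pairs
        if window[i] == consensus_seq[i] and window[j] == consensus_seq[j]
    )


def scan(query, structure, consensus_seq):
    """Slide a window of the consensus length along an ungapped query, starting at
    the first position, and return (best_offset, best_raw_score)."""
    width = len(structure)
    if len(query) < width:
        raise ValueError("query is shorter than the consensus structure")
    pairs = stem_pairs(structure)
    best_offset, best_score = 0, -1
    for offset in range(len(query) - width + 1):
        score = pair_matches(query[offset:offset + width], consensus_seq, pairs)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Toy example: 2 stem pairs, so the maximum raw score is 2.
print(scan("UUGCAAGCUU", "((..))", "GCAAGC"))  # (2, 2)
```

Keeping only the highest-scoring offset is a design choice of this sketch; the text does not state how multiple windows within one query are reported.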
Because each miRNA class has a different number of pairs in its consensus secondary structure, raw scores differ between classes and had to be normalized to a standardized score, calculated as standardized score = (NMA/NMC) × 100%, where NMA is the number of matched secondary structure and stem nucleotide pairs in the alignment and NMC is the number of nucleotide pairs in the stem of the consensus structure. The same scoring system was used for both negative and positive data. Because the consensus structure was built at 80% identity and the approach is a direct, position-to-position exact comparison, a query sequence was considered a hit when its standardized score was at least 80%.
We compared the current approach (miR-Explore) with miR-abela25 and PromiR.24 miR-abela was tested using the same positive and negative data, while PromiR was tested using the negative data of 5000 randomly selected 300-nucleotide sequences from chicken coding genes, tRNA, and rRNA. The positive data for the PromiR test came from the same data set used to test the other two programs, restricted to chicken sequences, because PromiR requires sequences to be submitted individually, which is laborious and time consuming. Therefore, PromiR was compared with miR-Explore and miR-abela using only chicken data, and miR-Explore was compared with miR-abela using the cross-species data set.
Sensitivity and specificity were calculated as Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP), where TP (true positives) is the number of pre-miRNAs predicted as pre-miRNAs, FN (false negatives) is the number of pre-miRNAs predicted as non-pre-miRNAs, TN (true negatives) is the number of non-pre-miRNAs predicted as non-pre-miRNAs, and FP (false positives) is the number of non-pre-miRNAs predicted as pre-miRNAs.
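The normalization, hit rule, and evaluation metrics in this section, together with the Matthews correlation coefficient (MCC) defined next, reduce to a few lines of arithmetic. This is a minimal sketch with hypothetical function names; returning 0 when a factor of the MCC denominator is zero is a common convention, not something the text specifies.

```python
import math

HIT_THRESHOLD = 80.0  # percent; mirrors the 80% identity used to build consensus structures


def standardized_score(nma, nmc):
    """(NMA / NMC) x 100%: matched stem pairs over stem pairs in the consensus."""
    return 100.0 * nma / nmc


def is_hit(nma, nmc, threshold=HIT_THRESHOLD):
    """A query counts as a predicted pre-miRNA when its standardized score reaches the threshold."""
    return standardized_score(nma, nmc) >= threshold


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any denominator factor is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(standardized_score(20, 25), is_hit(20, 25), is_hit(19, 25))  # 80.0 True False
```

For instance, a class whose consensus has 25 stem pairs needs at least 20 matched pairs (80%) for a query to count as a hit; 19 matched pairs give 76% and are rejected.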
For accuracy measurement, we used the Matthews correlation coefficient (MCC),30 a measure of the quality of binary classification that takes into account true and false positives and negatives. MCC ranges from −1 to +1, where +1 represents a perfect prediction, 0 a random prediction, and −1 completely inverted predictions. With TP, TN, FP, and FN defined as above, MCC is calculated as
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Results and Discussion
The sensitivity and specificity of miR-Explore, miR-abela, and PromiR on chicken data alone are shown in Table 1. The specificity of miR-Explore and miR-abela was 99.99%, and that of PromiR was 99.00%. Although all three programs had high specificity on chicken data, their sensitivity varied markedly: miR-Explore had 88% sensitivity, whereas miR-abela and PromiR had 78% and 53%, respectively. The differences in sensitivity could be due to the fact that miR-abela and PromiR were developed using mostly human and other mammalian training data, whereas miR-Explore included chicken data in its training data. The choice of training data for secondary structure profiling can affect the sensitivity of a program that depends on such profiling. In principle, PromiR, a probabilistic colearning method based on a hidden Markov model,24 was developed to identify both close and distant homologs. Our results indicate that either chicken is too distant from humans or some of the generalized assumptions under which PromiR was developed do not hold. PromiR scans the stem of stem-loop candidates to detect the signal of the Drosha cleavage site.
However, multiple factors govern the Drosha cleavage site.31 It is therefore possible that PromiR did not capture most of the factors affecting Drosha cleavage in chicken, thereby limiting its sensitivity. miR-abela, in contrast, was developed based on a support vector machine (SVM): it uses both real and pseudo pre-miRNAs as training data and general pre-miRNA features to learn to distinguish between the two. In general, some global features of pre-miRNAs, including, but not limited to, nucleotide length, number of bulges, and minimum free energy, can be used as input to pre-miRNA prediction programs. In the most recent miRNA prediction programs, these features have been generalized.32 The advantage of such generalized features is that they allow the prediction of novel miRNAs belonging to previously unknown classes. Despite this advantage, the results in Table 1 suggest that relying on global features without training data that adequately represent all organisms reduces sensitivity in identifying new miRNAs in species that are poorly represented in the training data.
The results of comparing miR-Explore and miR-abela on an expanded set of positive and negative data are also shown in Table 1. Detailed prediction results by organism for miR-Explore and miR-abela are provided in Supplementary Tables S1 and S2, respectively. The central hypothesis of this study was that each miRNA class has its own unique set of features; therefore, grouping miRNAs by class before using them as training data should improve sensitivity and specificity. Both miR-Explore and miR-abela were highly effective at identifying true negatives. The sensitivity of miR-Explore was higher than that of miR-abela for every species compared, including humans.
The average sensitivity was 88.62% for miR-Explore and 70.82% for miR-abela, and the MCC was 0.90 for miR-Explore versus 0.75 for miR-abela. Compared with global alignment, grouping miRNAs by class yielded better sensitivity with very high specificity for pre-miRNA prediction, even when a simple positional secondary and primary structure alignment was used. The grouping technique also yielded a higher MCC than the program using generalized pre-miRNA features.
It can be argued that, although each class has its own unique features, other features are conserved across classes and species. The ability to predict miRNAs with high sensitivity depends on how many conserved elements are captured in the secondary structure. For example, the training data for miR-21 came from humans, chickens, and mice, yet miR-Explore had 100% sensitivity in predicting miR-21 in other species that were not used in the training data. The features of miR-21 may therefore be well conserved across species. On the other hand, miR-125 may be the least conserved, both within its class and across species, and any predictive algorithm based on global alignment would yield relatively poor sensitivity.
There are some limitations to miR-Explore. First, the approach depends on previously known miRNA precursor classes and can only predict novel miRNAs that belong to those classes. Second, miR-Explore relies on the conservation of miRNAs within a class and across species, and on the availability of known miRNA data; when too few known miRNAs are available to build the consensus structure, sensitivity may decline. Third, the alignment method is very simple: we used only the positional information of the secondary structure along with the corresponding primary structure nucleotides, yet still achieved high sensitivity and specificity.
Improving the alignment algorithm could further improve sensitivity and specificity.
Conclusion
It is difficult to develop a perfect miRNA prediction program; therefore, more than one program may be needed to detect miRNAs computationally.23 In this study, we have shown that grouping miRNAs by class before using them as training data improves sensitivity and specificity. However, this approach can only predict pre-miRNAs of known classes. Although the sensitivity of PromiR and miR-abela may not match that of miR-Explore, as ab initio methods they have the potential to predict novel miRNAs in unknown classes. Each program has its own strengths and limitations, and the programs can complement each other. miR-Explore is a new technique that can contribute to the future discovery of novel miRNAs.
Supplementary Tables
Table S1 Sensitivity and specificity of miR-Explore in predicting pre-miRNA across species.
Per-species columns give sensitivity as correctly predicted/tested pre-miRNAs; Average is the overall sensitivity across species; N/A, not available.
Class     Specificity  Chicken  Human  Mouse  Zebra fish  Others  Average
let-7     99.75%       2/2      2/2    2/2    2/2         38/39   97.87%
miR1      100.00%      2/2      2/2    2/2    2/2         12/12   100.00%
miR7      100.00%      1/2      1/2    2/2    1/2         13/16   75.00%
miR9      100.00%      2/2      2/2    2/2    2/2         14/19   81.48%
miR10     100.00%      2/2      2/2    2/2    2/2         10/14   81.82%
miR15     100.00%      2/2      2/2    2/2    2/2         9/10    94.44%
miR16     100.00%      2/2      2/2    2/2    2/2         8/8     100.00%
miR17     100.00%      1/1      1/1    1/1    2/2         5/5     100.00%
miR18     100.00%      1/1      1/1    1/1    2/2         7/7     100.00%
miR19     99.96%       2/2      2/2    2/2    2/2         10/13   85.71%
miR20     100.00%      2/2      2/2    2/2    2/2         6/6     100.00%
miR21     100.00%      1/1      1/1    1/1    2/2         4/4     100.00%
miR22     100.00%      1/1      1/1    1/1    2/2         4/5     90.00%
miR23     100.00%      1/1      2/2    2/2    2/2         9/9     100.00%
miR24     100.00%      1/1      2/2    1/2    2/2         6/6     92.31%
miR26     100.00%      1/1      2/2    2/2    2/2         6/6     100.00%
miR27     100.00%      1/1      2/2    2/2    2/2         9/9     100.00%
miR29     99.98%       2/2      2/2    2/2    2/2         14/17   88.00%
miR30     99.99%       1/2      2/2    1/2    2/2         16/18   84.62%
miR31     99.99%       1/1      0/1    1/1    2/2         11/11   93.75%
miR32     100.00%      1/1      1/1    1/1    N/A         3/3     100.00%
miR33     100.00%      2/2      1/1    1/1    N/A         9/9     100.00%
miR34     100.00%      1/2      2/2    2/2    0/1         14/14   90.48%
miR92     99.81%       0/1      0/2    0/2    2/2         17/21   67.86%
miR99     100.00%      1/1      2/2    2/2    2/2         4/5     91.67%
miR100    100.00%      1/1      1/1    1/1    2/2         8/8     100.00%
miR101    100.00%      2/2      2/2    2/2    2/2         7/8     93.75%
miR103    100.00%      2/2      2/2    2/2    1/1         7/7     100.00%
miR106    100.00%      1/1      2/2    2/2    N/A         6/6     100.00%
miR107    100.00%      1/1      1/1    1/1    1/1         4/4     100.00%
miR122    100.00%      2/2      1/1    1/1    1/1         3/3     100.00%
miR124    99.99%       2/2      2/2    2/2    2/2         15/15   100.00%
miR125    99.97%       0/1      0/2    0/2    0/2         0/16    0.00%
miR126    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR128    100.00%      2/2      2/2    2/2    2/2         6/6     100.00%
miR130    100.00%      2/2      2/2    2/2    2/2         7/7     100.00%
miR133    100.00%      2/2      2/2    2/2    2/2         11/14   86.36%
miR135    100.00%      2/2      2/2    2/2    2/2         9/10    94.44%
miR137    100.00%      1/1      1/1    1/1    2/2         5/6     90.91%
miR138    100.00%      1/2      2/2    1/2    1/1         1/4     54.55%
miR140    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR142    100.00%      1/1      1/1    1/1    1/1         4/4     100.00%
miR144    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR146    100.00%      2/2      2/2    2/2    2/2         5/5     100.00%
miR147    100.00%      2/2      2/2    1/1    N/A         3/3     100.00%
miR148    100.00%      1/1      2/2    2/2    1/1         4/4     100.00%
miR153    100.00%      1/1      2/2    1/1    2/2         6/6     100.00%
miR155    100.00%      0/1      0/1    0/1    0/1         0/2     0.00%
miR181    100.00%      2/2      2/2    2/2    2/2         15/18   88.46%
miR183    100.00%      1/1      1/1    1/1    1/1         2/5     66.67%
miR184    100.00%      1/1      1/1    1/1    1/1         9/9     100.00%
miR187    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR190    99.99%       1/1      1/1    1/1    1/1         7/8     91.67%
miR193    100.00%      2/2      1/1    1/1    2/2         3/6     75.00%
miR194    100.00%      1/1      2/2    2/2    2/2         5/5     100.00%
miR196    100.00%      2/2      2/2    2/2    2/2         9/9     100.00%
miR199    100.00%      2/2      2/2    2/2    2/2         8/9     94.12%
miR200    100.00%      2/2      2/2    2/2    2/2         9/9     100.00%
miR202    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR203    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR204    100.00%      2/2      1/1    1/1    2/2         5/5     100.00%
miR205    100.00%      2/2      1/1    1/1    1/1         4/4     100.00%
miR206    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR211    100.00%      1/1      1/1    1/1    N/A         2/2     100.00%
miR214    100.00%      1/1      1/1    1/1    1/1         4/4     100.00%
miR215    100.00%      1/1      1/1    1/1    N/A         3/3     100.00%
miR216    99.99%       2/2      1/1    1/1    2/2         4/8     85.71%
miR217    100.00%      1/1      1/1    1/1    1/1         4/4     100.00%
miR218    100.00%      2/2      2/2    2/2    2/2         8/8     100.00%
miR219    100.00%      1/1      1/1    1/1    1/1         8/10    85.71%
miR221    100.00%      1/1      1/1    1/1    1/1         4/4     100.00%
miR222    100.00%      1/1      1/1    1/1    1/1         3/3     100.00%
miR223    100.00%      1/1      1/1    1/1    1/1         3/4     87.50%
miR301    100.00%      1/1      1/1    1/1    1/1         5/5     100.00%
miR302    100.00%      0/2      0/2    0/2    N/A         0/8     0.00%
miR365    100.00%      2/2      2/2    2/2    2/2         3/5     84.62%
miR367    100.00%      1/1      1/1    1/1    N/A         2/2     100.00%
miR375    100.00%      1/1      1/1    1/1    1/1         5/5     100.00%
miR383    100.00%      1/1      1/1    1/1    N/A         3/3     100.00%
miR429    100.00%      1/1      1/1    1/1    1/1         3/5     77.78%
miR449    100.00%      0/1      0/1    0/1    N/A         0/4     0.00%
miR451    100.00%      1/1      1/1    1/1    1/1         2/2     100.00%
miR454    100.00%      1/1      N/A    N/A    1/1         2/2     100.00%
miR455    100.00%      1/1      1/1    1/1    1/1         2/2     100.00%
miR460    100.00%      N/A      N/A    N/A    N/A         1/1     100.00%
miR466    100.00%      0/1      N/A    0/2    N/A         1/4     14.29%
miR489    100.00%      1/1      1/1    1/1    1/1         2/2     100.00%
miR490    100.00%      N/A      N/A    N/A    N/A         2/2     100.00%
miR499    100.00%      1/1      1/1    N/A    N/A         3/3     100.00%
miR551    100.00%      1/1      1/1    1/1    N/A         3/3     100.00%
miR1306   100.00%      N/A      N/A    N/A    N/A         1/1     100.00%
Table S2 Sensitivity of miR-abela in predicting pre-miRNA across species.
Columns as in Table S1.
Class     Chicken  Human  Mouse  Zebra fish  Others  Average
let-7     2/2      0/2    1/2    2/2         26/39   65.96%
miR1      2/2      2/2    1/2    2/2         10/12   85.00%
miR7      2/2      2/2    2/2    2/2         14/16   91.67%
miR9      2/2      2/2    2/2    2/2         16/19   88.89%
miR10     2/2      2/2    1/2    2/2         11/14   81.82%
miR15     1/2      1/2    0/2    2/2         3/9     41.18%
miR16     2/2      2/2    2/2    1/2         7/8     87.50%
miR17     0/1      0/1    0/1    0/2         1/5     10.00%
miR18     2/2      0/1    1/1    2/2         3/7     61.54%
miR19     2/2      2/2    2/2    2/2         8/13    76.19%
miR20     1/2      1/2    1/2    1/2         3/6     50.00%
miR21     1/1      1/1    1/1    2/2         3/4     88.89%
miR22     1/1      1/1    1/1    2/2         3/5     80.00%
miR23     1/1      2/2    2/2    2/2         5/9     75.00%
miR24     0/1      1/2    1/2    2/2         1/6     38.46%
miR26     1/1      2/2    2/2    2/2         3/6     100.00%
miR27     0/1      1/2    1/2    1/2         4/9     43.75%
miR29     2/2      2/2    2/2    2/2         13/17   84.00%
miR30     1/2      2/2    2/2    2/2         11/18   69.23%
miR31     1/1      1/1    1/1    0/2         8/11    68.75%
miR32     1/1      1/1    1/1    N/A         2/3     83.33%
miR33     2/2      1/1    1/1    N/A         8/9     92.31%
miR34     1/2      1/2    1/2    1/1         14/14   85.71%
miR92     1/1      2/2    2/2    2/2         17/21   92.86%
miR99     1/1      1/2    1/2    1/2         0/5     33.33%
miR100    0/1      0/1    0/1    2/2         5/8     53.85%
miR101    2/2      2/2    2/2    2/2         5/8     81.25%
miR103    2/2      1/2    2/2    1/1         1/7     50.00%
miR106    0/1      1/2    0/2    N/A         4/6     45.45%
miR107    0/1      0/1    1/1    1/1         0/4     25.00%
miR122    2/2      1/1    1/1    1/1         2/3     87.50%
miR124    2/2      2/2    1/2    1/2         11/15   73.91%
miR125    1/1      2/2    1/2    2/2         6/16    52.17%
miR126    1/1      1/1    1/1    1/1         2/3     85.71%
miR128    2/2      2/2    2/2    2/2         3/6     78.57%
miR130    1/2      2/2    1/2    2/2         5/7     73.33%
miR133    2/2      2/2    2/2    2/2         9/14    77.27%
miR135    2/2      2/2    2/2    0/2         9/10    83.33%
miR137    1/1      1/1    1/1    2/2         5/6     90.91%
miR138    0/2      1/2    0/2    0/1         1/4     18.18%
miR140    1/1      1/1    1/1    1/1         3/3     100.00%
miR142    1/1      1/1    0/1    1/1         3/4     75.00%
miR144    1/1      1/1    1/1    1/1         2/3     85.71%
miR146    2/2      1/2    2/2    2/2         4/5     84.62%
miR147    0/2      1/2    0/1    N/A         2/3     37.50%
miR148    0/1      1/2    2/2    0/1         1/4     40.00%
miR153    1/1      2/2    1/1    2/2         5/6     91.67%
miR155    0/1      0/1    1/1    1/1         1/2     50.00%
miR181    1/2      1/2    1/2    1/2         5/18    34.62%
miR183    0/1      0/1    0/1    0/1         3/5     33.33%
miR184    1/1      1/1    1/1    1/1         7/9     84.62%
miR187    1/1      0/1    0/1    0/1         0/3     14.29%
miR190    1/1      1/1    1/1    1/1         6/7     90.91%
miR193    1/2      1/1    0/1    2/2         4/6     66.67%
miR194    1/1      1/2    1/2    2/2         4/5     75.00%
miR196    2/2      2/2    2/2    2/2         6/9     100.00%
miR199    2/2      2/2    1/2    2/2         3/9     58.82%
miR200    2/2      2/2    2/2    2/2         7/9     88.24%
miR202    1/1      1/1    1/1    1/1         3/3     100.00%
miR203    1/1      1/1    1/1    1/1         3/3     100.00%
miR204    2/2      1/1    1/1    1/2         4/5     81.82%
miR205    2/2      1/1    1/1    1/1         3/4     88.89%
miR206    1/1      1/1    1/1    1/1         2/3     85.71%
miR211    1/1      1/1    1/1    N/A         2/2     100.00%
miR214    1/1      1/1    1/1    1/1         4/4     100.00%
miR215    1/1      1/1    1/1    N/A         3/3     100.00%
miR216    0/2      1/1    0/1    0/2         5/8     42.86%
miR217    1/1      1/1    1/1    1/1         3/4     87.50%
miR218    2/2      2/2    2/2    2/2         5/8     81.25%
miR219    1/1      1/1    1/1    1/1         5/10    64.29%
miR221    1/1      1/1    1/1    1/1         3/4     87.50%
miR222    1/1      1/1    1/1    1/1         3/3     100.00%
miR223    1/1      1/1    1/1    1/1         4/4     100.00%
miR301    1/1      0/1    0/1    0/1         0/5     11.11%
miR302    1/2      2/2    1/2    N/A         8/8     85.71%
miR365    1/2      1/2    2/2    1/2         3/5     61.54%
miR367    0/1      1/1    1/1    N/A         0/2     40.00%
miR375    1/1      0/1    0/1    1/1         3/5     55.56%
miR383    0/1      0/1    0/1    N/A         0/3     0.00%
miR429    1/1      0/1    1/1    1/1         4/5     77.78%
miR449    1/1      1/1    1/1    N/A         2/4     71.43%
miR451    1/1      1/1    1/1    1/1         2/2     100.00%
miR454    1/1      N/A    N/A    1/1         1/2     75.00%
miR455    1/1      1/1    1/1    1/1         2/2     100.00%
miR460    N/A      N/A    N/A    N/A         1/1     100.00%
miR466    0/1      N/A    1/2    N/A         4/4     71.43%
miR489    1/1      1/1    1/1    1/1         2/2     100.00%
miR490    N/A      N/A    N/A    N/A         0/2     0.00%
miR499    1/1      1/1    N/A    N/A         2/3     80.00%
miR551    1/1      1/1    1/1    N/A         3/3     100.00%
miR1306   N/A      N/A    N/A    N/A         0/1     0.00%
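Each per-species cell in Tables S1 and S2 appears to be a pair of counts (correctly predicted, tested). For many rows, the reported average is reproduced by pooling these counts across species and dividing total correct predictions by total tested pre-miRNAs; for let-7 in Table S1, this gives 46 of 47, or 97.87%. This is an inference from the numbers rather than a definition stated by the authors. The helper below, with hypothetical names, performs that check and accepts cells written either as 2,2 or 2/2.

```python
def parse_cell(cell):
    """Parse a per-species cell such as '2/2' or '2,2' into (correct, tested).
    Returns None for 'N/A' (no data for that species)."""
    if cell.strip() == "N/A":
        return None
    correct, tested = cell.replace(",", "/").split("/")
    return int(correct), int(tested)


def pooled_sensitivity(cells):
    """Sum correct and tested counts over all species, then divide (in percent)."""
    correct = tested = 0
    for cell in cells:
        parsed = parse_cell(cell)
        if parsed is None:
            continue
        correct += parsed[0]
        tested += parsed[1]
    return 100.0 * correct / tested


# let-7 rows: chicken, human, mouse, zebra fish, others
print(f"{pooled_sensitivity(['2/2', '2/2', '2/2', '2/2', '38/39']):.2f}%")  # 97.87% (Table S1)
print(f"{pooled_sensitivity(['2/2', '0/2', '1/2', '2/2', '26/39']):.2f}%")  # 65.96% (Table S2)
```

Pooling weights each tested pre-miRNA equally, so the "Others" column, which holds most test sequences, dominates the average for classes such as let-7.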
title	Materials and Methods
title	Training data
p	Known pre-miRNA sequences from chicken, human, mouse, zebra fish, fugu, worms, frog, chimpanzee, gorilla, platypus, pig, fruit fly, and buffalo were obtained from the microRNA database miRBase.9 Sequences were selected according to the miRNA classes available in chicken. For example, if chicken has a miR-1 class, then all known miR-1 class pre-miRNAs in the other organisms were included in this data set. For each miRNA class, the 80% of the sequences that best represented the class were used as training data. To ensure a reliable consensus structure, only classes with 5 or more known pre-miRNAs were included in the training data.
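The class filter and 80/20 split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The record layout, the function name `split_by_class`, and the `representativeness` scoring function are all hypothetical. `representativeness` stands in for the unspecified criterion of which sequences "best represent" a class. Rounding the 80% share with `round()` is likewise an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (miRNA class, species, pre-miRNA sequence).
Record = Tuple[str, str, str]


def split_by_class(
    records: List[Record],
    representativeness: Callable[[Record], float],
    min_class_size: int = 5,
    train_fraction: float = 0.8,
) -> Dict[str, Tuple[List[Record], List[Record]]]:
    """Group pre-miRNAs by class and split each class into training and query sets."""
    by_class: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_class[rec[0]].append(rec)

    splits: Dict[str, Tuple[List[Record], List[Record]]] = {}
    for cls, members in by_class.items():
        # Keep only classes that are known in chicken.
        if not any(species == "chicken" for _, species, _ in members):
            continue
        # Require at least 5 known pre-miRNAs for a reliable consensus.
        if len(members) < min_class_size:
            continue
        # Rank by the hypothetical representativeness score; the top 80%
        # become training data and the remainder become positive queries.
        ranked = sorted(members, key=representativeness, reverse=True)
        n_train = round(train_fraction * len(ranked))
        splits[cls] = (ranked[:n_train], ranked[n_train:])
    return splits
```

The held-out 20% returned as the second element of each pair corresponds to the positive query data described in the next section.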
title	Positive data and negative data
p	The remaining 20% of the sequences, which were not used as training data, were used as queries (positive data) for prediction. The negative data were taken from all human and mouse coding sequences in the University of California Santa Cruz (UCSC) genome browser.28 These coding sequences were scanned for hairpin-like structures, which formed the final negative data set: 33,932 hairpin-like structures in mouse and 36,662 in human. The scan for hairpin-like structures used the program written by Sewer et al for miR-abela.25 Negative data for the comparison with PromiR were 5000 random sequences of 300 nucleotides from chicken coding genes, tRNA, and rRNA, taken from the UCSC genome browser.
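The PromiR negative set (5000 random 300-nucleotide sequences) could be drawn along the lines of the sketch below. The text does not say how sequences or start positions were chosen, so the sampling scheme here is an assumption. A source sequence of at least 300 nt is picked uniformly at random, and a window start is then picked uniformly within it. The function name and the fixed seed are hypothetical.

```python
import random
from typing import List


def sample_random_windows(
    sequences: List[str], n: int = 5000, length: int = 300, seed: int = 0
) -> List[str]:
    """Draw `n` random windows of `length` nt from the given sequences.

    Assumption: a source sequence is chosen uniformly among those at least
    `length` nt long, then a start position is chosen uniformly within it.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    eligible = [s for s in sequences if len(s) >= length]
    if not eligible:
        raise ValueError("no sequence is long enough")
    windows = []
    for _ in range(n):
        seq = rng.choice(eligible)
        start = rng.randint(0, len(seq) - length)  # both bounds inclusive
        windows.append(seq[start:start + length])
    return windows
```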
title	Programming and testing
p	The training data set was aligned using multiple alignment of RNAs (MARNA)29 to obtain a consensus structure with 80% primary and secondary structure identity. The identity percentage was based on the number of nucleotides conserved among all of the training sequences in a given column of the secondary structure. These consensus structures, together with their primary sequence information, were then aligned with the query sequence. Query sequences were required to be at least 300 nucleotides long so that the program could initiate the alignment. miR-Explore cannot accept a query shorter than the consensus structure, and the consensus sequences were 150 nucleotides long on average. The alignment was based on the positions of the secondary and primary structure. The MARNA alignment output gives the consensus secondary structure in the first row and the primary structure information in the remaining rows (Fig. 1). A sliding window was defined with a length equal to the number of characters in the MARNA alignment output, including gaps and unpaired nucleotides. This window was used to scan the query sequence starting from position 1. The consensus structures were made up of gaps and hairpin structures, each consisting of a stem and a loop; in Figure 1, the stem starts at positions 1 and 75. Because the query is a primary sequence without any gaps, the alignment inserts gaps into the query sequence and shifts them so that the query lines up with the consensus structure and its corresponding nucleotide pairs in the stem. If an alignment matched the secondary structure and the corresponding stem nucleotides at the exact position shown in the MARNA alignment, it was scored as one; otherwise it was scored as zero.
A match was defined as the query sequence and the training sequences having the same nucleotide in a given column of the consensus secondary structure. Scoring was binary, 1 for a match and 0 for a mismatch, so the maximum score an alignment can receive equals the number of pairs in the consensus secondary structure. Each miRNA class has its own number of pairs in the secondary structure, so raw scores differed between classes and had to be normalized to a standardized score: standardized score = (NMA/NMC) × 100%. Here, NMA is the number of matched secondary structure positions and stem nucleotides in the alignment, and NMC is the number of nucleotide pairs in the stem of the consensus structure.
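The positional comparison can be illustrated with the simplified sketch below. It slides a consensus hairpin along an ungapped query and counts stem pairs whose query nucleotides match the consensus exactly. The hairpin is written in dot-bracket notation, with `-` marking gap columns. This is an assumption-laden illustration, not miR-Explore itself:

- It drops the consensus gap columns instead of inserting and shifting gaps in the query as described above.
- It uses a single consensus sequence rather than the full set of MARNA rows.
- It counts a pair as matched only when both of its nucleotides agree.

All function names are hypothetical.

```python
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) index pairs from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            pairs.append((stack.pop(), k))
    return sorted(pairs)


def best_window_score(query: str, structure: str, consensus: str) -> Tuple[int, int, int]:
    """Slide the consensus over the query and return (best_offset, NMA, NMC).

    Simplification (assumption): gap columns ('-') are removed from the
    consensus and the query is compared without inserting gaps. A stem pair
    scores 1 only if both query nucleotides equal the consensus nucleotides
    at those positions; otherwise it scores 0.
    """
    keep = [k for k, ch in enumerate(structure) if ch != "-"]
    structure = "".join(structure[k] for k in keep)
    consensus = "".join(consensus[k] for k in keep)
    pairs = base_pairs(structure)
    width = len(structure)
    if len(query) < width:
        raise ValueError("query is shorter than the consensus structure")
    best = (0, -1, len(pairs))
    for off in range(len(query) - width + 1):
        nma = sum(
            query[off + i] == consensus[i] and query[off + j] == consensus[j]
            for i, j in pairs
        )
        if nma > best[1]:
            best = (off, nma, len(pairs))
    return best
```

Returning NMA together with NMC leaves the class-independent normalization to the caller, mirroring the standardized score defined in the text.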
p	The same scoring system was used for both negative and positive data. The consensus structures were built at 80% identity, and the approach is an exact position-to-position comparison, so a query sequence was counted as a hit when its standardized score was at least 80%. We compared miR-Explore with miR-abela25 and PromiR.24 miR-abela was tested using the same positive and negative data. PromiR was tested using negative data consisting of 5000 randomly selected 300-nucleotide sequences from chicken coding genes, tRNA, and rRNA. The positive data for the PromiR test came from the same data set used to test the other two programs, restricted to chicken sequences. PromiR was tested with chicken positive data only because it requires sequences to be entered one at a time, which is laborious and time consuming. Therefore, PromiR was compared with miR-Explore and miR-abela using chicken data only, whereas miR-Explore and miR-abela were compared using the full cross-species data set.
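The normalization and the 80% hit rule reduce to a few lines. The sketch below assumes NMA and NMC have already been counted, for example by a positional scan, and the function names are hypothetical. As a worked example, a class whose consensus stem has 30 pairs needs at least 24 matched pairs to count as a hit, since 24/30 × 100% = 80%.

```python
def standardized_score(nma: int, nmc: int) -> float:
    """Standardized score = (NMA / NMC) x 100%, comparable across miRNA classes."""
    if nmc <= 0:
        raise ValueError("consensus structure must contain at least one pair")
    return 100.0 * nma / nmc


def is_hit(nma: int, nmc: int, threshold: float = 80.0) -> bool:
    """Call a query a pre-miRNA hit when its standardized score is at least 80%."""
    return standardized_score(nma, nmc) >= threshold
```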
p	The sensitivity and specificity were calculated.
p	Sensitivity = TP/(TP + FN), where TP (true positive) is the number of pre-miRNAs predicted as pre-miRNA, and FN (false negative) is the number of pre-miRNAs predicted as non–pre-miRNA.
p	Specificity = TN/(TN + FP), where TN (true negative) is the number of non–pre-miRNAs predicted as non–pre-miRNA, and FP (false positive) is the number of non–pre-miRNAs predicted as pre-miRNA.
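The two definitions above translate directly into code. The function names are hypothetical, and the counts in the usage note are illustrative, not results from this study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true pre-miRNAs predicted as pre-miRNA: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-pre-miRNAs predicted as non-pre-miRNA: TN / (TN + FP)."""
    return tn / (tn + fp)
```

For example, `sensitivity(44, 6)` gives 0.88 and `specificity(9999, 1)` gives 0.9999.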
p	To measure overall accuracy, we used the Matthews correlation coefficient (MCC).30 MCC measures the quality of a binary classification and takes into account true and false positives and negatives. Its value ranges from −1 to +1, where +1 represents a perfect prediction, 0 a random prediction, and −1 a completely inverted prediction. MCC is calculated as $\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$, where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
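The MCC formula can be transcribed directly, as in the sketch below. The text does not cover a zero denominator, which occurs when any of TP + FP, TP + FN, TN + FP, or TN + FN is zero. Returning 0 in that case is a common convention and is an assumption here, as is the function name.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1.

    Assumption: when any marginal sum is zero the denominator vanishes,
    and 0.0 is returned by convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

`mcc(10, 10, 0, 0)` returns 1.0 and `mcc(0, 0, 10, 10)` returns -1.0. These match the perfect and completely inverted cases described above.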
title	Results and Discussion
p	The sensitivity and specificity of miR-Explore, miR-abela, and PromiR on chicken data only are shown in Table 1. Specificity was 99.99% for both miR-Explore and miR-abela and 99.00% for PromiR, so all three programs were highly specific on chicken data. Their sensitivity, however, varied markedly: miR-Explore had 88% sensitivity, whereas miR-abela and PromiR had 78% and 53%, respectively. These differences could be due to the fact that miR-abela and PromiR were both developed using mostly human and other mammalian training data, whereas miR-Explore included chicken data in its training data. The composition of the training data used in secondary structure profiling can therefore affect a program's sensitivity, to an extent that depends on the program. In principle, PromiR, a probabilistic colearning method based on a hidden Markov model,24 was developed to identify both close and distant homologs. Our results suggest that either chicken is too distant from human or some of the generalized assumptions under which PromiR was developed do not hold. PromiR scans the stem of stem-loop candidates for the signal of the Drosha cleavage site. However, multiple factors govern the Drosha cleavage site.31 PromiR may therefore not have captured most of the factors affecting Drosha cleavage in chicken, which would limit its sensitivity. miR-abela, in contrast, was developed using a Support Vector Machine (SVM). It was trained on both real pre-miRNAs and pseudo pre-miRNAs and uses general features of pre-miRNAs to learn to distinguish between the two.
p	In general, some global features of pre-miRNAs can serve as input for pre-miRNA prediction programs. These features include, but are not limited to, nucleotide length, number of bulges, and minimum free energy. In the most recent miRNA prediction programs, these features have been generalized.32 The advantage of generalized miRNA features is that they allow the prediction of novel miRNAs belonging to previously unknown classes. Despite this advantage, the results in Table 1 suggest a limitation. Relying on global features without training data that adequately represent all organisms reduces sensitivity in identifying new miRNAs in species that are poorly represented in the training data.
p	Table 1 also shows the results of comparing miR-Explore and miR-abela on an expanded set of positive and negative data. The detailed results for individual organisms are provided in Supplementary Tables S1 (miR-Explore) and S2 (miR-abela). The central hypothesis of this study was that each miRNA class has its own unique set of features, so grouping miRNAs by class before using them as training data will improve sensitivity and specificity. Both miR-Explore and miR-abela were highly effective at correctly rejecting negative (non–pre-miRNA) sequences. The sensitivity of miR-Explore was higher than that of miR-abela for every species compared, including human. The average sensitivity was 88.62% for miR-Explore and 70.82% for miR-abela, and the MCC was 0.90 for miR-Explore and 0.75 for miR-abela. Compared with global alignment, grouping miRNAs by class yielded better sensitivity with very high specificity for pre-miRNA prediction, even when a simple positional alignment of secondary and primary structure was used. The grouping technique also yielded a higher MCC than the program using generalized pre-miRNA features. It can be argued that, although each class has its own unique features, other features are conserved across classes and species. The ability to predict miRNAs with high sensitivity depends on how many conserved elements are captured in the secondary structure. For example, the training data for miR-21 came from human, chicken, and mouse, yet miR-Explore predicted miR-21 with 100% sensitivity in other species that were not used in the training data. This suggests that the features of miR-21 are well conserved across species. In contrast, miR-125 may be poorly conserved both within the class and across species, in which case any alignment-based predictor would be expected to yield relatively poor sensitivity; miR-Explore detected none of the miR-125 test sequences (Table S1).
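In Supplementary Tables S1 and S2, each species cell appears to report "predicted, total" for the held-out sequences. The per-class Average column is consistent with pooling those counts across species. For let-7 in Table S1, (2 + 2 + 2 + 2 + 38)/(2 + 2 + 2 + 2 + 39) = 46/47 ≈ 97.87%, as reported. The sketch below computes that pooled value. This reading of the tables is an inference from the table values, the function name is hypothetical, and N/A cells are assumed to be skipped. It does not attempt to reproduce the overall 88.62% and 70.82% averages, whose exact calculation is not stated.

```python
from typing import Optional, Sequence, Tuple

# One supplementary-table cell: (predicted, total), or None for "N/A".
Cell = Optional[Tuple[int, int]]


def pooled_sensitivity(cells: Sequence[Cell]) -> float:
    """Pool 'predicted,total' cells across species; return sensitivity in %.

    Assumption: the per-class Average is sum(predicted) / sum(total),
    with N/A cells skipped.
    """
    present = [c for c in cells if c is not None]
    hits = sum(p for p, _ in present)
    total = sum(t for _, t in present)
    return 100.0 * hits / total
```

Applied to the miR7 row of Table S1, (1,2), (1,2), (2,2), (1,2), (13,16) gives 18/24 = 75.00%, matching the reported average.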
p	miR-Explore has some limitations. First, the approach depends on previously known miRNA precursor classes and can only predict novel miRNAs within those classes. Second, miR-Explore relies on the conservation of miRNAs within a class and across species, and on the availability of known miRNA data. When too few known miRNAs are available to build a consensus structure, sensitivity may decline. Third, the alignment method is very simple: it uses only the positional information of the secondary structure together with the corresponding primary structure nucleotides. Even so, it achieved high sensitivity and specificity, and an improved alignment algorithm could improve both further.
title	Conclusion
p	It is difficult to develop a perfect miRNA prediction program; therefore, more than one program may be needed to detect miRNAs computationally.23 In this study, we have shown that grouping miRNAs by class before using them as training data improves sensitivity and specificity. However, this approach can only predict pre-miRNAs of known classes. Although PromiR and miR-abela may be less sensitive than miR-Explore, as ab initio methods they have the potential to predict novel miRNAs in unknown classes. Each program has its own strengths and limitations, and the programs can complement each other. miR-Explore is a new technique that can contribute to the future discovery of novel miRNAs.
title	Supplementary Tables
label	Table S1
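caption	Sensitivity and specificity of miR-Explore in predicting pre-miRNAs across species. Each species entry gives the number of pre-miRNAs of the class that were correctly predicted and the number tested; Average is the overall sensitivity across all species.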
table
Class    Specificity  Sensitivity
                      Chicken  Human  Mouse  Zebrafish  Others  Average
let-7    99.75%       2/2      2/2    2/2    2/2        38/39   97.87%
miR1     100.00%      2/2      2/2    2/2    2/2        12/12   100.00%
miR7     100.00%      1/2      1/2    2/2    1/2        13/16   75.00%
miR9     100.00%      2/2      2/2    2/2    2/2        14/19   81.48%
miR10    100.00%      2/2      2/2    2/2    2/2        10/14   81.82%
miR15    100.00%      2/2      2/2    2/2    2/2        9/10    94.44%
miR16    100.00%      2/2      2/2    2/2    2/2        8/8     100.00%
miR17    100.00%      1/1      1/1    1/1    2/2        5/5     100.00%
miR18    100.00%      1/1      1/1    1/1    2/2        7/7     100.00%
miR19    99.96%       2/2      2/2    2/2    2/2        10/13   85.71%
miR20    100.00%      2/2      2/2    2/2    2/2        6/6     100.00%
miR21    100.00%      1/1      1/1    1/1    2/2        4/4     100.00%
miR22    100.00%      1/1      1/1    1/1    2/2        4/5     90.00%
miR23    100.00%      1/1      2/2    2/2    2/2        9/9     100.00%
miR24    100.00%      1/1      2/2    1/2    2/2        6/6     92.31%
miR26    100.00%      1/1      2/2    2/2    2/2        6/6     100.00%
miR27    100.00%      1/1      2/2    2/2    2/2        9/9     100.00%
miR29    99.98%       2/2      2/2    2/2    2/2        14/17   88.00%
miR30    99.99%       1/2      2/2    1/2    2/2        16/18   84.62%
miR31    99.99%       1/1      0/1    1/1    2/2        11/11   93.75%
miR32    100.00%      1/1      1/1    1/1    N/A        3/3     100.00%
miR33    100.00%      2/2      1/1    1/1    N/A        9/9     100.00%
miR34    100.00%      1/2      2/2    2/2    0/1        14/14   90.48%
miR92    99.81%       0/1      0/2    0/2    2/2        17/21   67.86%
miR99    100.00%      1/1      2/2    2/2    2/2        4/5     91.67%
miR100   100.00%      1/1      1/1    1/1    2/2        8/8     100.00%
miR101   100.00%      2/2      2/2    2/2    2/2        7/8     93.75%
miR103   100.00%      2/2      2/2    2/2    1/1        7/7     100.00%
miR106   100.00%      1/1      2/2    2/2    N/A        6/6     100.00%
miR107   100.00%      1/1      1/1    1/1    1/1        4/4     100.00%
miR122   100.00%      2/2      1/1    1/1    1/1        3/3     100.00%
miR124   99.99%       2/2      2/2    2/2    2/2        15/15   100.00%
miR125   99.97%       0/1      0/2    0/2    0/2        0/16    0.00%
miR126   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR128   100.00%      2/2      2/2    2/2    2/2        6/6     100.00%
miR130   100.00%      2/2      2/2    2/2    2/2        7/7     100.00%
miR133   100.00%      2/2      2/2    2/2    2/2        11/14   86.36%
miR135   100.00%      2/2      2/2    2/2    2/2        9/10    94.44%
miR137   100.00%      1/1      1/1    1/1    2/2        5/6     90.91%
miR138   100.00%      1/2      2/2    1/2    1/1        1/4     54.55%
miR140   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR142   100.00%      1/1      1/1    1/1    1/1        4/4     100.00%
miR144   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR146   100.00%      2/2      2/2    2/2    2/2        5/5     100.00%
miR147   100.00%      2/2      2/2    1/1    N/A        3/3     100.00%
miR148   100.00%      1/1      2/2    2/2    1/1        4/4     100.00%
miR153   100.00%      1/1      2/2    1/1    2/2        6/6     100.00%
miR155   100.00%      0/1      0/1    0/1    0/1        0/2     0.00%
miR181   100.00%      2/2      2/2    2/2    2/2        15/18   88.46%
miR183   100.00%      1/1      1/1    1/1    1/1        2/5     66.67%
miR184   100.00%      1/1      1/1    1/1    1/1        9/9     100.00%
miR187   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR190   99.99%       1/1      1/1    1/1    1/1        7/8     91.67%
miR193   100.00%      2/2      1/1    1/1    2/2        3/6     75.00%
miR194   100.00%      1/1      2/2    2/2    2/2        5/5     100.00%
miR196   100.00%      2/2      2/2    2/2    2/2        9/9     100.00%
miR199   100.00%      2/2      2/2    2/2    2/2        8/9     94.12%
miR200   100.00%      2/2      2/2    2/2    2/2        9/9     100.00%
miR202   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR203   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR204   100.00%      2/2      1/1    1/1    2/2        5/5     100.00%
miR205   100.00%      2/2      1/1    1/1    1/1        4/4     100.00%
miR206   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR211   100.00%      1/1      1/1    1/1    N/A        2/2     100.00%
miR214   100.00%      1/1      1/1    1/1    1/1        4/4     100.00%
miR215   100.00%      1/1      1/1    1/1    N/A        3/3     100.00%
miR216   99.99%       2/2      1/1    1/1    2/2        4/8     85.71%
miR217   100.00%      1/1      1/1    1/1    1/1        4/4     100.00%
miR218   100.00%      2/2      2/2    2/2    2/2        8/8     100.00%
miR219   100.00%      1/1      1/1    1/1    1/1        8/10    85.71%
miR221   100.00%      1/1      1/1    1/1    1/1        4/4     100.00%
miR222   100.00%      1/1      1/1    1/1    1/1        3/3     100.00%
miR223   100.00%      1/1      1/1    1/1    1/1        3/4     87.50%
miR301   100.00%      1/1      1/1    1/1    1/1        5/5     100.00%
miR302   100.00%      0/2      0/2    0/2    N/A        0/8     0.00%
miR365   100.00%      2/2      2/2    2/2    2/2        3/5     84.62%
miR367   100.00%      1/1      1/1    1/1    N/A        2/2     100.00%
miR375   100.00%      1/1      1/1    1/1    1/1        5/5     100.00%
miR383   100.00%      1/1      1/1    1/1    N/A        3/3     100.00%
miR429   100.00%      1/1      1/1    1/1    1/1        3/5     77.78%
miR449   100.00%      0/1      0/1    0/1    N/A        0/4     0.00%
miR451   100.00%      1/1      1/1    1/1    1/1        2/2     100.00%
miR454   100.00%      1/1      N/A    N/A    1/1        2/2     100.00%
miR455   100.00%      1/1      1/1    1/1    1/1        2/2     100.00%
miR460   100.00%      N/A      N/A    N/A    N/A        1/1     100.00%
miR466   100.00%      0/1      N/A    0/2    N/A        1/4     14.29%
miR489   100.00%      1/1      1/1    1/1    1/1        2/2     100.00%
miR490   100.00%      N/A      N/A    N/A    N/A        2/2     100.00%
miR499   100.00%      1/1      1/1    N/A    N/A        3/3     100.00%
miR551   100.00%      1/1      1/1    1/1    N/A        3/3     100.00%
miR1306  100.00%      N/A      N/A    N/A    N/A        1/1     100.00%
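If each species cell is read as (pre-miRNAs correctly predicted, pre-miRNAs tested), then in the rows checked here the Average column equals the sensitivity pooled over species: the summed correct counts divided by the summed totals. The Python sketch below illustrates that arithmetic under this reading. It is not the authors' code. The helper name `pooled_sensitivity` is hypothetical, and skipping N/A cells instead of counting them as misses is an assumption.

```python
from typing import Optional, Sequence, Tuple

# One species cell: (pre-miRNAs correctly predicted, pre-miRNAs tested); None marks N/A.
Cell = Optional[Tuple[int, int]]


def pooled_sensitivity(cells: Sequence[Cell]) -> float:
    """Pooled sensitivity in percent: total correct / total tested * 100.

    Hypothetical helper; N/A cells are skipped rather than counted as misses.
    """
    present = [cell for cell in cells if cell is not None]
    total = sum(tested for _, tested in present)
    if total == 0:
        raise ValueError("no pre-miRNAs were tested for this class")
    correct = sum(hit for hit, _ in present)
    return 100.0 * correct / total


# Rows copied from Table S1, columns Chicken, Human, Mouse, Zebrafish, Others.
let7 = [(2, 2), (2, 2), (2, 2), (2, 2), (38, 39)]
mir7 = [(1, 2), (1, 2), (2, 2), (1, 2), (13, 16)]
mir454 = [(1, 1), None, None, (1, 1), (2, 2)]

for name, row in [("let-7", let7), ("miR7", mir7), ("miR454", mir454)]:
    print(f"{name}: {pooled_sensitivity(row):.2f}%")
```

Running it prints 97.87%, 75.00% and 100.00%. These values match the let-7, miR7 and miR454 rows of Table S1. Pooling weights each species by how many pre-miRNAs it contributes, so the Others column usually dominates the average.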
label	Table S2
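caption	Sensitivity of miR-abela in predicting pre-miRNAs across species. Each species entry gives the number of pre-miRNAs of the class that were correctly predicted and the number tested; Average is the overall sensitivity across all species.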
table	Class Sensitivity miR-abela Chicken Human Mouse Zebra fish Others Average let-7 2,2 0,2 1,2 2,2 26,39 65.96% miR1 2,2 2,2 1,2 2,2 10,12 85.00% miR7 2,2 2,2 2,2 2,2 14,16 91.67% miR9 2,2 2,2 2,2 2,2 16,19 88.89% miR10 2,2 2,2 1,2 2,2 11,14 81.82% miR15 1,2 1,2 0,2 2,2 3,9 41.18% miR16 2,2 2,2 2,2 1,2 7,8 87.50% miR17 0,1 0,1 0,1 0,2 1,5 10.00% miR18 2,2 0,1 1,1 2,2 3,7 61.54% miR19 2,2 2,2 2,2 2,2 8,13 76.19% miR20 1,2 1,2 1,2 1,2 3,6 50.00% miR21 1,1 1,1 1,1 2,2 3,4 88.89% miR22 1,1 1,1 1,1 2,2 3,5 80.00% miR23 1,1 2,2 2,2 2,2 5,9 75.00% miR24 0,1 1,2 1,2 2,2 1,6 38.46% miR26 1,1 2,2 2,2 2,2 3,6 100.00% miR27 0,1 1,2 1,2 1,2 4,9 43.75% miR29 2,2 2,2 2,2 2,2 13,17 84.00% miR30 1,2 2,2 2,2 2,2 11,18 69.23% miR31 1,1 1,1 1,1 0,2 8,11 68.75% miR32 1,1 1,1 1,1 N/A 2,3 83.33% miR33 2,2 1,1 1,1 N/A 8,9 92.31% miR34 1,2 1,2 1,2 1,1 14,14 85.71% miR92 1,1 2,2 2,2 2,2 17,21 92.86% miR99 1,1 1,2 1,2 1,2 0,5 33.33% miR100 0,1 0,1 0,1 2,2 5,8 53.85% miR101 2,2 2,2 2,2 2,2 5,8 81.25% miR103 2,2 1,2 2,2 1,1 1,7 50.00% miR106 0,1 1,2 0,2 N/A 4,6 45.45% miR107 0,1 0,1 1,1 1,1 0,4 25.00% miR122 2,2 1,1 1,1 1,1 2,3 87.50% miR124 2,2 2,2 1,2 1,2 11,15 73.91% miR125 1,1 2,2 1,2 2,2 6,16 52.17% miR126 1,1 1,1 1,1 1,1 2,3 85.71% miR128 2,2 2,2 2,2 2,2 3,6 78.57% miR130 1,2 2,2 1,2 2,2 5,7 73.33% miR133 2,2 2,2 2,2 2,2 9,14 77.27% miR135 2,2 2,2 2,2 0,2 9,10 83.33% miR137 1,1 1,1 1,1 2,2 5,6 90.91% miR138 0,2 1,2 0,2 0,1 1,4 18.18% miR140 1,1 1,1 1,1 1,1 3,3 100.00% miR142 1,1 1,1 0,1 1,1 3,4 75.00% miR144 1,1 1,1 1,1 1,1 2,3 85.71% miR146 2,2 1,2 2,2 2,2 4,5 84.62% miR147 0,2 1,2 0,1 N/A 2,3 37.50% miR148 0,1 1,2 2,2 0,1 1,4 40.00% miR153 1,1 2,2 1,1 2,2 5,6 91.67% miR155 0,1 0,1 1,1 1,1 1,2 50.00% miR181 1,2 1,2 1,2 1,2 5,18 34.62% miR183 0,1 0,1 0,1 0,1 3,5 33.33% miR184 1,1 1,1 1,1 1,1 7,9 84.62% miR187 1,1 0,1 0,1 0,1 0,3 14.29% miR190 1,1 1,1 1,1 1,1 6,7 90.91% miR193 1,2 1,1 0,1 2,2 4,6 66.67% miR194 1,1 1,2 1,2 2,2 4,5 75.00% miR196 2,2 2,2 2,2 2,2 6,9 100.00% miR199 2,2 2,2 
1,2 2,2 3,9 58.82% miR200 2,2 2,2 2,2 2,2 7,9 88.24% miR202 1,1 1,1 1,1 1,1 3,3 100.00% miR203 1,1 1,1 1,1 1,1 3,3 100.00% miR204 2,2 1,1 1,1 1,2 4,5 81.82% miR205 2,2 1,1 1,1 1,1 3,4 88.89% miR206 1,1 1,1 1,1 1,1 2,3 85.71% miR211 1,1 1,1 1,1 N/A 2,2 100.00% miR214 1,1 1,1 1,1 1,1 4,4 100.00% miR215 1,1 1,1 1,1 N/A 3,3 100.00% miR216 0,2 1,1 0,1 0,2 5,8 42.86% miR217 1,1 1,1 1,1 1,1 3,4 87.50% miR218 2,2 2,2 2,2 2,2 5,8 81.25% miR219 1,1 1,1 1,1 1,1 5,10 64.29% miR221 1,1 1,1 1,1 1,1 3,4 87.50% miR222 1,1 1,1 1,1 1,1 3,3 100.00% miR223 1,1 1,1 1,1 1,1 4,4 100.00% miR301 1,1 0,1 0,1 0,1 0,5 11.11% miR302 1,2 2,2 1,2 N/A 8,8 85.71% miR365 1,2 1,2 2,2 1,2 3,5 61.54% miR367 0,1 1,1 1,1 N/A 0,2 40.00% miR375 1,1 0,1 0,1 1,1 3,5 55.56% miR383 0,1 0,1 0,1 N/A 0,3 0.00% miR429 1,1 0,1 1,1 1,1 4,5 77.78% miR449 1,1 1,1 1,1 N/A 2,4 71.43% miR451 1,1 1,1 1,1 1,1 2,2 100.00% miR454 1,1 N/A N/A 1,1 1,2 75.00% miR455 1,1 1,1 1,1 1,1 2,2 100.00% miR460 N/A N/A N/A N/A 1,1 100.00% miR466 0,1 N/A 1,2 N/A 4,4 71.43% miR489 1,1 1,1 1,1 1,1 2,2 100.00% miR490 N/A N/A N/A N/A 0,2 0.00% miR499 1,1 1,1 N/A N/A 2,3 80.00% miR551 1,1 1,1 1,1 N/A 3,3 100.00% miR1306 N/A N/A N/A N/A 0,1 0.00%
tr	Class Sensitivity miR-abela
th	Class
th	Sensitivity miR-abela
tr	Chicken Human Mouse Zebra fish Others Average
th	Chicken
th	Human
th	Mouse
th	Zebra fish
th	Others
th	Average
tr	let-7 2,2 0,2 1,2 2,2 26,39 65.96%
td	let-7
td	2,2
td	0,2
td	1,2
td	2,2
td	26,39
td	65.96%
tr	miR1 2,2 2,2 1,2 2,2 10,12 85.00%
td	miR1
td	2,2
td	2,2
td	1,2
td	2,2
td	10,12
td	85.00%
tr	miR7 2,2 2,2 2,2 2,2 14,16 91.67%
td	miR7
td	2,2
td	2,2
td	2,2
td	2,2
td	14,16
td	91.67%
tr	miR9 2,2 2,2 2,2 2,2 16,19 88.89%
td	miR9
td	2,2
td	2,2
td	2,2
td	2,2
td	16,19
td	88.89%
tr	miR10 2,2 2,2 1,2 2,2 11,14 81.82%
td	miR10
td	2,2
td	2,2
td	1,2
td	2,2
td	11,14
td	81.82%
tr	miR15 1,2 1,2 0,2 2,2 3,9 41.18%
td	miR15
td	1,2
td	1,2
td	0,2
td	2,2
td	3,9
td	41.18%
tr	miR16 2,2 2,2 2,2 1,2 7,8 87.50%
td	miR16
td	2,2
td	2,2
td	2,2
td	1,2
td	7,8
td	87.50%
tr	miR17 0,1 0,1 0,1 0,2 1,5 10.00%
td	miR17
td	0,1
td	0,1
td	0,1
td	0,2
td	1,5
td	10.00%
tr	miR18 2,2 0,1 1,1 2,2 3,7 61.54%
td	miR18
td	2,2
td	0,1
td	1,1
td	2,2
td	3,7
td	61.54%
tr	miR19 2,2 2,2 2,2 2,2 8,13 76.19%
td	miR19
td	2,2
td	2,2
td	2,2
td	2,2
td	8,13
td	76.19%
tr	miR20 1,2 1,2 1,2 1,2 3,6 50.00%
td	miR20
td	1,2
td	1,2
td	1,2
td	1,2
td	3,6
td	50.00%
tr	miR21 1,1 1,1 1,1 2,2 3,4 88.89%
td	miR21
td	1,1
td	1,1
td	1,1
td	2,2
td	3,4
td	88.89%
tr	miR22 1,1 1,1 1,1 2,2 3,5 80.00%
td	miR22
td	1,1
td	1,1
td	1,1
td	2,2
td	3,5
td	80.00%
tr	miR23 1,1 2,2 2,2 2,2 5,9 75.00%
td	miR23
td	1,1
td	2,2
td	2,2
td	2,2
td	5,9
td	75.00%
tr	miR24 0,1 1,2 1,2 2,2 1,6 38.46%
td	miR24
td	0,1
td	1,2
td	1,2
td	2,2
td	1,6
td	38.46%
tr	miR26 1,1 2,2 2,2 2,2 3,6 100.00%
td	miR26
td	1,1
td	2,2
td	2,2
td	2,2
td	3,6
td	100.00%
tr	miR27 0,1 1,2 1,2 1,2 4,9 43.75%
td	miR27
td	0,1
td	1,2
td	1,2
td	1,2
td	4,9
td	43.75%
tr	miR29 2,2 2,2 2,2 2,2 13,17 84.00%
td	miR29
td	2,2
td	2,2
td	2,2
td	2,2
td	13,17
td	84.00%
tr	miR30 1,2 2,2 2,2 2,2 11,18 69.23%
td	miR30
td	1,2
td	2,2
td	2,2
td	2,2
td	11,18
td	69.23%
tr	miR31 1,1 1,1 1,1 0,2 8,11 68.75%
td	miR31
td	1,1
td	1,1
td	1,1
td	0,2
td	8,11
td	68.75%
tr	miR32 1,1 1,1 1,1 N/A 2,3 83.33%
td	miR32
td	1,1
td	1,1
td	1,1
td	N/A
td	2,3
td	83.33%
tr	miR33 2,2 1,1 1,1 N/A 8,9 92.31%
td	miR33
td	2,2
td	1,1
td	1,1
td	N/A
td	8,9
td	92.31%
tr	miR34 1,2 1,2 1,2 1,1 14,14 85.71%
td	miR34
td	1,2
td	1,2
td	1,2
td	1,1
td	14,14
td	85.71%
tr	miR92 1,1 2,2 2,2 2,2 17,21 92.86%
td	miR92
td	1,1
td	2,2
td	2,2
td	2,2
td	17,21
td	92.86%
tr	miR99 1,1 1,2 1,2 1,2 0,5 33.33%
td	miR99
td	1,1
td	1,2
td	1,2
td	1,2
td	0,5
td	33.33%
tr	miR100 0,1 0,1 0,1 2,2 5,8 53.85%
td	miR100
td	0,1
td	0,1
td	0,1
td	2,2
td	5,8
td	53.85%
tr	miR101 2,2 2,2 2,2 2,2 5,8 81.25%
td	miR101
td	2,2
td	2,2
td	2,2
td	2,2
td	5,8
td	81.25%
tr	miR103 2,2 1,2 2,2 1,1 1,7 50.00%
td	miR103
td	2,2
td	1,2
td	2,2
td	1,1
td	1,7
td	50.00%
tr	miR106 0,1 1,2 0,2 N/A 4,6 45.45%
td	miR106
td	0,1
td	1,2
td	0,2
td	N/A
td	4,6
td	45.45%
tr	miR107 0,1 0,1 1,1 1,1 0,4 25.00%
td	miR107
td	0,1
td	0,1
td	1,1
td	1,1
td	0,4
td	25.00%
tr	miR122 2,2 1,1 1,1 1,1 2,3 87.50%
td	miR122
td	2,2
td	1,1
td	1,1
td	1,1
td	2,3
td	87.50%
tr	miR124 2,2 2,2 1,2 1,2 11,15 73.91%
td	miR124
td	2,2
td	2,2
td	1,2
td	1,2
td	11,15
td	73.91%
tr	miR125 1,1 2,2 1,2 2,2 6,16 52.17%
td	miR125
td	1,1
td	2,2
td	1,2
td	2,2
td	6,16
td	52.17%
tr	miR126 1,1 1,1 1,1 1,1 2,3 85.71%
td	miR126
td	1,1
td	1,1
td	1,1
td	1,1
td	2,3
td	85.71%
tr	miR128 2,2 2,2 2,2 2,2 3,6 78.57%
td	miR128
td	2,2
td	2,2
td	2,2
td	2,2
td	3,6
td	78.57%
tr	miR130 1,2 2,2 1,2 2,2 5,7 73.33%
td	miR130
td	1,2
td	2,2
td	1,2
td	2,2
td	5,7
td	73.33%
tr	miR133 2,2 2,2 2,2 2,2 9,14 77.27%
td	miR133
td	2,2
td	2,2
td	2,2
td	2,2
td	9,14
td	77.27%
tr	miR135 2,2 2,2 2,2 0,2 9,10 83.33%
td	miR135
td	2,2
td	2,2
td	2,2
td	0,2
td	9,10
td	83.33%
tr	miR137 1,1 1,1 1,1 2,2 5,6 90.91%
td	miR137
td	1,1
td	1,1
td	1,1
td	2,2
td	5,6
td	90.91%
tr	miR138 0,2 1,2 0,2 0,1 1,4 18.18%
td	miR138
td	0,2
td	1,2
td	0,2
td	0,1
td	1,4
td	18.18%
tr	miR140 1,1 1,1 1,1 1,1 3,3 100.00%
td	miR140
td	1,1
td	1,1
td	1,1
td	1,1
td	3,3
td	100.00%
tr	miR142 1,1 1,1 0,1 1,1 3,4 75.00%
td	miR142
td	1,1
td	1,1
td	0,1
td	1,1
td	3,4
td	75.00%
tr	miR144 1,1 1,1 1,1 1,1 2,3 85.71%
td	miR144
td	1,1
td	1,1
td	1,1
td	1,1
td	2,3
td	85.71%
tr	miR146 2,2 1,2 2,2 2,2 4,5 84.62%
td	miR146
td	2,2
td	1,2
td	2,2
td	2,2
td	4,5
td	84.62%
tr	miR147 0,2 1,2 0,1 N/A 2,3 37.50%
td	miR147
td	0,2
td	1,2
td	0,1
td	N/A
td	2,3
td	37.50%
tr	miR148 0,1 1,2 2,2 0,1 1,4 40.00%
td	miR148
td	0,1
td	1,2
td	2,2
td	0,1
td	1,4
td	40.00%
tr	miR153 1,1 2,2 1,1 2,2 5,6 91.67%
td	miR153
td	1,1
td	2,2
td	1,1
td	2,2
td	5,6
td	91.67%
tr	miR155 0,1 0,1 1,1 1,1 1,2 50.00%
td	miR155
td	0,1
td	0,1
td	1,1
td	1,1
td	1,2
td	50.00%
tr	miR181 1,2 1,2 1,2 1,2 5,18 34.62%
td	miR181
td	1,2
td	1,2
td	1,2
td	1,2
td	5,18
td	34.62%
tr	miR183 0,1 0,1 0,1 0,1 3,5 33.33%
td	miR183
td	0,1
td	0,1
td	0,1
td	0,1
td	3,5
td	33.33%
tr	miR184 1,1 1,1 1,1 1,1 7,9 84.62%
td	miR184
td	1,1
td	1,1
td	1,1
td	1,1
td	7,9
td	84.62%
tr	miR187 1,1 0,1 0,1 0,1 0,3 14.29%
td	miR187
td	1,1
td	0,1
td	0,1
td	0,1
td	0,3
td	14.29%
tr	miR190 1,1 1,1 1,1 1,1 6,7 90.91%
td	miR190
td	1,1
td	1,1
td	1,1
td	1,1
td	6,7
td	90.91%
tr	miR193 1,2 1,1 0,1 2,2 4,6 66.67%
td	miR193
td	1,2
td	1,1
td	0,1
td	2,2
td	4,6
td	66.67%
tr	miR194 1,1 1,2 1,2 2,2 4,5 75.00%
td	miR194
td	1,1
td	1,2
td	1,2
td	2,2
td	4,5
td	75.00%
tr	miR196 2,2 2,2 2,2 2,2 6,9 100.00%
td	miR196
td	2,2
td	2,2
td	2,2
td	2,2
td	6,9
td	100.00%
tr	miR199 2,2 2,2 1,2 2,2 3,9 58.82%
td	miR199
td	2,2
td	2,2
td	1,2
td	2,2
td	3,9
td	58.82%
tr	miR200 2,2 2,2 2,2 2,2 7,9 88.24%
td	miR200
td	2,2
td	2,2
td	2,2
td	2,2
td	7,9
td	88.24%
tr	miR202 1,1 1,1 1,1 1,1 3,3 100.00%
td	miR202
td	1,1
td	1,1
td	1,1
td	1,1
td	3,3
td	100.00%
tr	miR203 1,1 1,1 1,1 1,1 3,3 100.00%
td	miR203
td	1,1
td	1,1
td	1,1
td	1,1
td	3,3
td	100.00%
tr	miR204 2,2 1,1 1,1 1,2 4,5 81.82%
td	miR204
td	2,2
td	1,1
td	1,1
td	1,2
td	4,5
td	81.82%
tr	miR205 2,2 1,1 1,1 1,1 3,4 88.89%
td	miR205
td	2,2
td	1,1
td	1,1
td	1,1
td	3,4
td	88.89%
tr	miR206 1,1 1,1 1,1 1,1 2,3 85.71%
td	miR206
td	1,1
td	1,1
td	1,1
td	1,1
td	2,3
td	85.71%
tr	miR211 1,1 1,1 1,1 N/A 2,2 100.00%
td	miR211
td	1,1
td	1,1
td	1,1
td	N/A
td	2,2
td	100.00%
tr	miR214 1,1 1,1 1,1 1,1 4,4 100.00%
td	miR214
td	1,1
td	1,1
td	1,1
td	1,1
td	4,4
td	100.00%
tr	miR215 1,1 1,1 1,1 N/A 3,3 100.00%
td	miR215
td	1,1
td	1,1
td	1,1
td	N/A
td	3,3
td	100.00%
tr	miR216 0,2 1,1 0,1 0,2 5,8 42.86%
td	miR216
td	0,2
td	1,1
td	0,1
td	0,2
td	5,8
td	42.86%
tr	miR217 1,1 1,1 1,1 1,1 3,4 87.50%
td	miR217
td	1,1
td	1,1
td	1,1
td	1,1
td	3,4
td	87.50%
tr	miR218 2,2 2,2 2,2 2,2 5,8 81.25%
td	miR218
td	2,2
td	2,2
td	2,2
td	2,2
td	5,8
td	81.25%
tr	miR219 1,1 1,1 1,1 1,1 5,10 64.29%
td	miR219
td	1,1
td	1,1
td	1,1
td	1,1
td	5,10
td	64.29%
tr	miR221 1,1 1,1 1,1 1,1 3,4 87.50%
td	miR221
td	1,1
td	1,1
td	1,1
td	1,1
td	3,4
td	87.50%
tr	miR222 1,1 1,1 1,1 1,1 3,3 100.00%
td	miR222
td	1,1
td	1,1
td	1,1
td	1,1
td	3,3
td	100.00%
tr	miR223 1,1 1,1 1,1 1,1 4,4 100.00%
td	miR223
td	1,1
td	1,1
td	1,1
td	1,1
td	4,4
td	100.00%
tr	miR301 1,1 0,1 0,1 0,1 0,5 11.11%
td	miR301
td	1,1
td	0,1
td	0,1
td	0,1
td	0,5
td	11.11%
tr	miR302 1,2 2,2 1,2 N/A 8,8 85.71%
td	miR302
td	1,2
td	2,2
td	1,2
td	N/A
td	8,8
td	85.71%
tr	miR365 1,2 1,2 2,2 1,2 3,5 61.54%
td	miR365
td	1,2
td	1,2
td	2,2
td	1,2
td	3,5
td	61.54%
tr	miR367 0,1 1,1 1,1 N/A 0,2 40.00%
td	miR367
td	0,1
td	1,1
td	1,1
td	N/A
td	0,2
td	40.00%
tr	miR375 1,1 0,1 0,1 1,1 3,5 55.56%
td	miR375
td	1,1
td	0,1
td	0,1
td	1,1
td	3,5
td	55.56%
tr	miR383 0,1 0,1 0,1 N/A 0,3 0.00%
td	miR383
td	0,1
td	0,1
td	0,1
td	N/A
td	0,3
td	0.00%
tr	miR429 1,1 0,1 1,1 1,1 4,5 77.78%
td	miR429
td	1,1
td	0,1
td	1,1
td	1,1
td	4,5
td	77.78%
tr	miR449 1,1 1,1 1,1 N/A 2,4 71.43%
td	miR449
td	1,1
td	1,1
td	1,1
td	N/A
td	2,4
td	71.43%
tr	miR451 1,1 1,1 1,1 1,1 2,2 100.00%
td	miR451
td	1,1
td	1,1
td	1,1
td	1,1
td	2,2
td	100.00%
tr	miR454 1,1 N/A N/A 1,1 1,2 75.00%
td	miR454
td	1,1
td	N/A
td	N/A
td	1,1
td	1,2
td	75.00%
tr	miR455 1,1 1,1 1,1 1,1 2,2 100.00%
td	miR455
td	1,1
td	1,1
td	1,1
td	1,1
td	2,2
td	100.00%
tr	miR460 N/A N/A N/A N/A 1,1 100.00%
td	miR460
td	N/A
td	N/A
td	N/A
td	N/A
td	1,1
td	100.00%
tr	miR466 0,1 N/A 1,2 N/A 4,4 71.43%
td	miR466
td	0,1
td	N/A
td	1,2
td	N/A
td	4,4
td	71.43%
tr	miR489 1,1 1,1 1,1 1,1 2,2 100.00%
td	miR489
td	1,1
td	1,1
td	1,1
td	1,1
td	2,2
td	100.00%
tr	miR490 N/A N/A N/A N/A 0,2 0.00%
td	miR490
td	N/A
td	N/A
td	N/A
td	N/A
td	0,2
td	0.00%
tr	miR499 1,1 1,1 N/A N/A 2,3 80.00%
td	miR499
td	1,1
td	1,1
td	N/A
td	N/A
td	2,3
td	80.00%
tr	miR551 1,1 1,1 1,1 N/A 3,3 100.00%
td	miR551
td	1,1
td	1,1
td	1,1
td	N/A
td	3,3
td	100.00%
tr	miR1306 N/A N/A N/A N/A 0,1 0.00%
td	miR1306
td	N/A
td	N/A
td	N/A
td	N/A
td	0,1
td	0.00%
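Each cell in the rows above appears to be a pair of counts (predicted, total) for one species column, with N/A where a class has no test pre-miRNA in that column. For the rows checked here, the last column matches the summed predicted counts divided by the summed totals (for example, miR184: 11/13 = 84.62%; miR187: 1/7 = 14.29%). The following is a minimal Python sketch of that pooled calculation under this reading of the cells. `parse_cell` and `class_sensitivity` are hypothetical helper names, not part of miR-Explore.

```python
def parse_cell(cell: str):
    """Return (predicted, total) for a 'p,t' cell, or None for 'N/A'."""
    if cell.strip() == "N/A":
        return None
    predicted, total = (int(x) for x in cell.split(","))
    return predicted, total


def class_sensitivity(cells):
    """Pooled sensitivity (%) over all non-N/A cells of one table row."""
    pairs = [p for p in (parse_cell(c) for c in cells) if p is not None]
    predicted = sum(p for p, _ in pairs)
    total = sum(t for _, t in pairs)
    if total == 0:
        raise ValueError("row has no test pre-miRNAs")
    return 100.0 * predicted / total


# Cell values copied from three rows of the table above.
rows = {
    "miR184": ["1,1", "1,1", "1,1", "1,1", "7,9"],
    "miR187": ["1,1", "0,1", "0,1", "0,1", "0,3"],
    "miR454": ["1,1", "N/A", "N/A", "1,1", "1,2"],
}
for name, cells in rows.items():
    print(f"{name}: {class_sensitivity(cells):.2f}%")
```

Columns marked N/A are skipped rather than counted as zero, so a class is scored only on the species where it has test examples.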
back	Acknowledgement We thank Dr. Alain Sewer of the Université de Lausanne, Switzerland, who provided us with the program used to scan the human and mouse coding sequences for hairpin-like structures. Author Contributions Conceived and designed the experiments: SEA. Analyzed the data: BS. Wrote the first draft of the manuscript: BS. Contributed to the writing of the manuscript: SEA, BS. Agree with manuscript results and conclusions: SEA, BS. Jointly developed the structure and arguments for the paper: SEA, BS. Made critical revisions and approved final version: SEA, BS. All authors reviewed and approved the final manuscript. Competing Interests Author(s) disclose no potential conflicts of interest. Disclosures and Ethics As a requirement of publication, the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests. Funding Author(s) disclose no funding sources.
ack	Acknowledgement We thank Dr. Alain Sewer of the Université de Lausanne, Switzerland, who provided us with the program used to scan the human and mouse coding sequences for hairpin-like structures.
title	Acknowledgement
p	We thank Dr. Alain Sewer of the Université de Lausanne, Switzerland, who provided us with the program used to scan the human and mouse coding sequences for hairpin-like structures.
footnote	Author Contributions Conceived and designed the experiments: SEA. Analyzed the data: BS. Wrote the first draft of the manuscript: BS. Contributed to the writing of the manuscript: SEA, BS. Agree with manuscript results and conclusions: SEA, BS. Jointly developed the structure and arguments for the paper: SEA, BS. Made critical revisions and approved final version: SEA, BS. All authors reviewed and approved the final manuscript.
p	Author Contributions
p	Conceived and designed the experiments: SEA. Analyzed the data: BS. Wrote the first draft of the manuscript: BS. Contributed to the writing of the manuscript: SEA, BS. Agree with manuscript results and conclusions: SEA, BS. Jointly developed the structure and arguments for the paper: SEA, BS. Made critical revisions and approved final version: SEA, BS. All authors reviewed and approved the final manuscript.
footnote	Competing Interests Author(s) disclose no potential conflicts of interest.
p	Competing Interests
p	Author(s) disclose no potential conflicts of interest.
footnote	Disclosures and Ethics As a requirement of publication, the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.
p	Disclosures and Ethics
p	As a requirement of publication, the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.
footnote	Funding Author(s) disclose no funding sources.
p	Funding
p	Author(s) disclose no funding sources.
figure	Figure 1 An example of the alignment output from MARNA, where ‘(’ and ‘)’ indicate the stems in the secondary structure, ‘.’ is a mismatch, and ‘—’ is a gap.
label	Figure 1
caption	An example of the alignment output from MARNA, where ‘(’ and ‘)’ indicate the stems in the secondary structure, ‘.’ is a mismatch, and ‘—’ is a gap.
p	An example of the alignment output from MARNA, where ‘(’ and ‘)’ indicate the stems in the secondary structure, ‘.’ is a mismatch, and ‘—’ is a gap.
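The bracket symbols in the Figure 1 caption can be read with a stack: each ‘)’ closes the most recent unmatched ‘(’, and the resulting index pairs are the base pairs that form the stems. The following is a minimal Python sketch, assuming standard dot-bracket conventions and a plain hyphen as the gap symbol (pass another `gap` character if the output uses a different glyph). `stem_pairs` is a hypothetical helper, not part of MARNA.

```python
def stem_pairs(structure: str, gap: str = "-"):
    """Return sorted (i, j) pairs of matched brackets, using 0-based columns.

    '(' and ')' mark stem (paired) columns; '.' (a mismatch, per the
    caption) and the gap symbol are skipped.
    """
    stack = []   # columns of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            pairs.append((stack.pop(), i))
        elif ch not in (".", gap):
            raise ValueError(f"unexpected symbol {ch!r} at column {i}")
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


print(stem_pairs("((.-((...))..))"))
# [(0, 14), (1, 13), (4, 10), (5, 9)]
```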
table-wrap	Table 1 Sensitivity and specificity (%) of PromiR, miR-Explore and miR-abela in predicting pre-miRNAs. Species^a PromiR miR-Explore miR-abela GGA GGA HSA MMU DRE Others Average^b GGA HSA MMU DRE Others Average^b Sensitivity 53.00 91.00 92.00 89.00 95.00 86.00 88.00 78.00 78.00 74.00 82.00 65.00 71.00 Specificity 90.00 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 Notes: ^a GGA = chicken; HSA = human; MMU = mouse; DRE = zebrafish; Others = fugu, worms, frog, chimpanzee, gorilla, platypus, pig, Drosophila and buffalo; ^b The average was calculated as the ratio of the total number of predicted pre-miRNAs to the total number of pre-miRNAs in the test data.
label	Table 1
caption	Sensitivity and specificity (%) of PromiR, miR-Explore and miR-abela in predicting pre-miRNAs.
p	Sensitivity and specificity (%) of PromiR, miR-Explore and miR-abela in predicting pre-miRNAs.
table	Species^a PromiR miR-Explore miR-abela GGA GGA HSA MMU DRE Others Average^b GGA HSA MMU DRE Others Average^b Sensitivity 53.00 91.00 92.00 89.00 95.00 86.00 88.00 78.00 78.00 74.00 82.00 65.00 71.00 Specificity 90.00 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99
tr	Species^a PromiR miR-Explore miR-abela
th	Species^a
th	PromiR
th	miR-Explore
th	miR-abela
tr	GGA GGA HSA MMU DRE Others Average^b GGA HSA MMU DRE Others Average^b
th	GGA
th	GGA
th	HSA
th	MMU
th	DRE
th	Others
th	Average^b
th	GGA
th	HSA
th	MMU
th	DRE
th	Others
th	Average^b
tr	Sensitivity 53.00 91.00 92.00 89.00 95.00 86.00 88.00 78.00 78.00 74.00 82.00 65.00 71.00
td	Sensitivity
td	53.00
td	91.00
td	92.00
td	89.00
td	95.00
td	86.00
td	88.00
td	78.00
td	78.00
td	74.00
td	82.00
td	65.00
td	71.00
tr	Specificity 90.00 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99 99.99
td	Specificity
td	90.00
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
td	99.99
table-wrap-foot	Notes: ^a GGA = chicken; HSA = human; MMU = mouse; DRE = zebrafish; Others = fugu, worms, frog, chimpanzee, gorilla, platypus, pig, Drosophila and buffalo; ^b The average was calculated as the ratio of the total number of predicted pre-miRNAs to the total number of pre-miRNAs in the test data.
footnote	Notes:
p	Notes:
footnote	a GGA = chicken; HSA = human; MMU = mouse; DRE = zebrafish; Others = fugu, worms, frog, chimpanzee, gorilla, platypus, pig, Drosophila and buffalo;
label	a
p	GGA = chicken; HSA = human; MMU = mouse; DRE = zebrafish; Others = fugu, worms, frog, chimpanzee, gorilla, platypus, pig, Drosophila and buffalo;
footnote	b The average was calculated as the ratio of the total number of predicted pre-miRNAs to the total number of pre-miRNAs in the test data.
label	b
p	average was calculated as the ratio between the total number of predicted pre-miRNA and the total number of pre-miRNA in the test data.
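Footnote b defines the Table 1 average as a pooled ratio: total predicted over total pre-miRNAs. This weights each group by its size, so it generally differs from the unweighted mean of the per-group percentages. The following is a minimal sketch of the difference. It uses, as an illustration only, the pooled counts of two classes from the per-class results table (miR184: 11 of 13; miR187: 1 of 7), because the per-species counts behind Table 1 are not listed here. `pooled_average` and `macro_average` are hypothetical names.

```python
def pooled_average(groups):
    """Footnote-b style average: sum(predicted) / sum(total), in %."""
    predicted = sum(p for p, _ in groups)
    total = sum(t for _, t in groups)
    return 100.0 * predicted / total


def macro_average(groups):
    """Unweighted mean of per-group percentages, for comparison only."""
    return sum(100.0 * p / t for p, t in groups) / len(groups)


groups = [(11, 13), (1, 7)]  # (predicted, total) for miR184 and miR187
print(f"pooled: {pooled_average(groups):.2f}%")  # 12 / 20
print(f"macro:  {macro_average(groups):.2f}%")
```

Here the pooled value is 60.00%, while the unweighted mean of 84.62% and 14.29% is about 49.45%. This shows why an average computed as in footnote b need not equal the mean of the per-species columns.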