PMC:1892782 / 36906-39308
Annotations
2_test
{"project":"2_test","denotations":[{"id":"17540014-10329189-1690065","span":{"begin":239,"end":240},"obj":"10329189"},{"id":"17540014-15024064-1690066","span":{"begin":1009,"end":1011},"obj":"15024064"},{"id":"17540014-12912925-1690067","span":{"begin":1050,"end":1052},"obj":"12912925"}],"text":"3 Discussion\nRNA molecules and their reverse complements in general form fairly similar secondary structures [25]. For individual sequences, small differences between plus and minus strand arise from small asymmetries in the energy model [9]. In a multiple sequence alignment, GU pairs in an evolutionary conserved stem provide information on the correct reading direction since their reverse complement, AC, is not a canonical base pair. Nevertheless, it is a surprisingly hard problem to recognize the correct reading direction of a structured RNA from a multiple sequence alignment in practise. This is an important task in genome annotation, however, since without reliable strand information it is not even possible to determine whether an evolutionarily conserved secondary structure is located in an UTR or intron, or in an antisense transcript. The reading direction is also of obvious importance in context of recognizing class membership by means of short sequence motifs such as SMN-binding sites [26] or a Cajal body localization signal [27].\nThe RNAstrand tool presented in this contribution uses a SVM to predict strand information from a set of four thermodynamic features that can readily be computed for any multiple sequence alignment based on well-established energy parameters and dynamic programming algorithms. We show here that, together with basic information on the size, sequence and GU base pair variation in the input alignment, these features are sufficient to determine the reading direction of an RNA motif with an evolutionary conserved secondary structure. The tool RNAstrand achieves classification accuracies of 90% and above for most ncRNA families. 
On microRNAs, its performance is comparable to that of EvoFold. In applications to data from organisms for which little genomic DNA has been sequenced, RNAstrand has the advantage that it does not require fairly accurate estimates of evolutionary distances as input.\nThe main area of application for a tool like RNAstrand is, of course, large-scale surveys for evolutionarily conserved ncRNAs. On known ncRNAs, RNAstrand achieves a 2-fold reduction of misclassifications compared to the naïve approach of determining the likely reading direction by comparing the scores of an ncRNA detector (here, RNAz) in both directions. It has therefore been integrated into the current release 1.0 of the RNAz package [28]."}
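
The Discussion text above notes that GU pairs in a conserved stem indicate the reading direction because their reverse complement, AC, is not a canonical base pair. The following is a minimal sketch of that observation only, not of the RNAstrand feature set or its SVM. The function names and the representation of a base pair as a (5' base, 3' base) tuple are assumptions made for illustration.

```python
# Canonical RNA base pairs, written as (5' base, 3' base).
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_pair(pair):
    """Return the pair seen at the same stem position on the opposite strand.

    Both bases are complemented, and their 5'->3' order is swapped.
    """
    five_prime, three_prime = pair
    return (COMPLEMENT[three_prime], COMPLEMENT[five_prime])


def is_canonical(pair):
    """True if the pair is Watson-Crick or a GU wobble pair."""
    return pair in CANONICAL


def strand_informative_pairs(pairs):
    """Keep pairs that are canonical on this strand but not on the other."""
    return [
        p for p in pairs
        if is_canonical(p) and not is_canonical(reverse_complement_pair(p))
    ]


if __name__ == "__main__":
    stem = [("G", "C"), ("G", "U"), ("A", "U"), ("U", "G")]
    for pair in stem:
        print(pair, "->", reverse_complement_pair(pair))
    print("strand-informative:", strand_informative_pairs(stem))
```

In this sketch, Watson-Crick pairs map onto Watson-Crick pairs under reverse complementation, so only the wobble pairs are returned as strand-informative.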