PMC:1892782 / 16859-18270
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1892782","sourcedb":"PMC","sourceid":"1892782","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1892782","text":"Classification performance is evaluated using 30920 automatically generated ClustalW alignments of 313 of the 503 ncRNA families from RFAM (version 7.0). All sequences that appear in the training alignments were excluded from the test set. For each family, at most 500 ClustalW alignments were randomly constructed for each alignment size from 2 to 6 sequences, resulting in at most 2500 alignments per family. Since the alignments used to train the SVM are no longer than 400 nt, have a minimal pairwise sequence identity of 60% and contain at most six sequences, test alignments were created to meet the same criteria. For alignments that do not fall into those ranges, probability estimates of the SVM need to be regarded with caution. Eight families had no alignments between 40 and 400 nt and were hence discarded from the test set. Another 67 families are not included because they consist of only one or two sequences. Two families had no sampled alignments with a mean pairwise sequence identity larger than 60%. Lastly, the sampled alignments of 113 families were not recognized as ncRNA by RNAz in at least one reading direction and were also discarded from the test data set. A list of families excluded from the test data can be found in the supplementary material (see Additional file 1). All alignments in the test set were used as positive test cases and their realigned reverse complements as negative test cases.","tracks":[]}
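The selection criteria and family counts quoted in the text can be checked with a minimal Python sketch. It is an illustration, not the authors' pipeline. The function names, the gap character `-`, and the way pairwise identity is computed (matching columns over columns where at least one sequence is not a gap) are all assumptions. The sketch also does not run ClustalW or RNAz: it only reverse-complements a sequence, whereas the text states that the negative cases were realigned.

```python
from itertools import combinations

# Counts taken from the text; the dictionary keys are paraphrases.
TOTAL_FAMILIES = 503
EXCLUDED = {
    "no alignments between 40 and 400 nt": 8,
    "only one or two sequences": 67,
    "no sampled alignment above 60% identity": 2,
    "not recognized as ncRNA by RNAz": 113,
}
MAX_PER_SIZE = 500           # at most 500 alignments per alignment size
SIZES = range(2, 7)          # alignments of 2 to 6 sequences


def remaining_families():
    """Families left after the four exclusion steps."""
    return TOTAL_FAMILIES - sum(EXCLUDED.values())


def pairwise_identity(a, b):
    """Fraction of identical columns, ignoring columns where both rows are gaps.

    This definition is an assumption; the text does not specify one.
    """
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def mean_pairwise_identity(rows):
    """Average identity over all pairs of aligned rows."""
    pairs = list(combinations(rows, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def meets_criteria(rows, min_len=40, max_len=400, min_identity=0.60):
    """Check sequence count, alignment length and mean pairwise identity."""
    if len(rows) not in SIZES:
        return False
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        return False  # rows of an alignment must have equal length
    if not min_len <= length <= max_len:
        return False
    return mean_pairwise_identity(rows) >= min_identity


# RNA complement table; T is mapped like U, gaps are left untouched.
COMPLEMENT = str.maketrans("ACGUTacgut", "UGCAAugcaa")


def reverse_complement(seq):
    """Reverse complement of one (possibly gapped) RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]
```

With these definitions, `remaining_families()` reproduces the 313 families named in the text, and `MAX_PER_SIZE * len(SIZES)` gives the per-family cap of 2500 alignments.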