PMC:1687185 / 41499-44064
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1687185","sourcedb":"PMC","sourceid":"1687185","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1687185","text":"6.2 Real Datasets\nTable 5 shows the results of the Exit-point methodology on real biological sequences. We have chosen l = 20 and d = 2. 't' indicates the number of sequences in the real data. For the biological samples taken from [1,12], the value m once again is the average number of random projection + EM cycles required to discover the motif. All other parameter values (like projection size k = 7 and threshold s = 4) are chosen to be the same as those used in the Random projection paper [1]. All of the motifs were recovered with m = 1 using the Exit-point strategy. The Random Projection algorithm alone needed multiple cycles (m = 8 in some cases and m = 15 in others) in order to retrieve the correct motif. This elucidates the fact that global methods can only be used to a certain extent and should be combined with refined local heuristics in order to obtain better efficiency. Since the random projection algorithm has outperformed other prominent motif finding algorithms like SP-STAR, WINNOWER, Gibbs sampling etc., we did not repeat the same experiments that were conducted in [1]. Running one cycle of random projection + EM is much more expensive computationally. The main advantage of our strategy comes from the deterministic nature of our algorithm in refining motifs.\nTable 5 Results on real datasets. Results of Exit-point method on biological samples. The real motifs were obtained in all the six cases using the Exit-point framework.\nSequence Sample Size t Best (20,2) Motif Reference Motif\nE. 
coli CRP 1890 18 TGTGAAATAGATCACATTTT TGTGANNNNGNTCACA\npreproinsulin 7689 4 GGAAATTGCAGCCTCAGCCC CCTCAGCCC\nDHFR 800 4 CTGCAATTTCGCGCCAAACT ATTTCNNGCCA\nmetallothionein 6823 4 CCCTCTGCGCCCGGACCGGT TGCRCYCGG\nc-fos 3695 5 CCATATTAGGACATCTGCGT CCATATTAGAGACTCT\nyeast ECB 5000 5 GTATTTCCCGTTTAGGAAAA TTTCCCNNTNAGGAAA Let the cost of applying EM algorithm for a given bucket be f and let the average number of buckets for a given projection be b. Then the running time of the Exit-point method will be O(cbf) where c is a constant that is linear in l-the length of the motif. If there were m projections, then cost of the random projection algorithm using restarts will be O(mbf). The two main advantages of using Exit-point strategy compared to random projection algorithm are :\n• It avoids multiple random projections which often provide similar optimal motifs.\n• It provides multiple optimal solutions in a promising region of a given bucket as opposed to a single solution provided by random projection 
algorithm.","divisions":[{"label":"title","span":{"begin":0,"end":17}},{"label":"p","span":{"begin":18,"end":1292}},{"label":"table-wrap","span":{"begin":1293,"end":1865}},{"label":"label","span":{"begin":1293,"end":1300}},{"label":"caption","span":{"begin":1302,"end":1462}},{"label":"p","span":{"begin":1302,"end":1462}},{"label":"table","span":{"begin":1463,"end":1865}},{"label":"tr","span":{"begin":1463,"end":1527}},{"label":"td","span":{"begin":1463,"end":1472}},{"label":"td","span":{"begin":1473,"end":1485}},{"label":"td","span":{"begin":1487,"end":1489}},{"label":"td","span":{"begin":1491,"end":1509}},{"label":"td","span":{"begin":1511,"end":1527}},{"label":"tr","span":{"begin":1528,"end":1589}},{"label":"td","span":{"begin":1528,"end":1539}},{"label":"td","span":{"begin":1541,"end":1545}},{"label":"td","span":{"begin":1547,"end":1549}},{"label":"td","span":{"begin":1551,"end":1571}},{"label":"td","span":{"begin":1573,"end":1589}},{"label":"tr","span":{"begin":1590,"end":1645}},{"label":"td","span":{"begin":1590,"end":1603}},{"label":"td","span":{"begin":1605,"end":1609}},{"label":"td","span":{"begin":1611,"end":1612}},{"label":"td","span":{"begin":1614,"end":1634}},{"label":"td","span":{"begin":1636,"end":1645}},{"label":"tr","span":{"begin":1646,"end":1693}},{"label":"td","span":{"begin":1646,"end":1650}},{"label":"td","span":{"begin":1652,"end":1655}},{"label":"td","span":{"begin":1657,"end":1658}},{"label":"td","span":{"begin":1660,"end":1680}},{"label":"td","span":{"begin":1682,"end":1693}},{"label":"tr","span":{"begin":1694,"end":1751}},{"label":"td","span":{"begin":1694,"end":1709}},{"label":"td","span":{"begin":1711,"end":1715}},{"label":"td","span":{"begin":1717,"end":1718}},{"label":"td","span":{"begin":1720,"end":1740}},{"label":"td","span":{"begin":1742,"end":1751}},{"label":"tr","span":{"begin":1752,"end":1806}},{"label":"td","span":{"begin":1752,"end":1757}},{"label":"td","span":{"begin":1759,"end":1763}},{"label":"td","span":{"begin":1765,"end":1
766}},{"label":"td","span":{"begin":1768,"end":1788}},{"label":"td","span":{"begin":1790,"end":1806}},{"label":"tr","span":{"begin":1807,"end":1865}},{"label":"td","span":{"begin":1807,"end":1816}},{"label":"td","span":{"begin":1818,"end":1822}},{"label":"td","span":{"begin":1824,"end":1825}},{"label":"td","span":{"begin":1827,"end":1847}},{"label":"td","span":{"begin":1849,"end":1865}},{"label":"p","span":{"begin":1866,"end":2327}},{"label":"p","span":{"begin":2328,"end":2411}}],"tracks":[]}