PMC:1570465 / 7922-9851
Annotations
2_test
{"project":"2_test","denotations":[{"id":"16916460-8211139-1687075","span":{"begin":240,"end":242},"obj":"8211139"},{"id":"16916460-12015879-1687076","span":{"begin":739,"end":741},"obj":"12015879"},{"id":"16916460-12824370-1687077","span":{"begin":742,"end":744},"obj":"12824370"},{"id":"16916460-11997340-1687078","span":{"begin":1100,"end":1101},"obj":"11997340"}],"text":"We test our coupled mathematical programming and pruning approach, LP/DEE, in diverse settings. First, we consider the problem of finding shared motifs in protein sequences. Unlike commonly-used PSSM-based methods for motif finding (e.g., [15,18]), our combinatorial formulation naturally incorporates amino acid substitution matrices. For all tested datasets, we find the actual protein motifs exactly, and these motifs correspond to optimal solutions according to the SP scoring scheme. Second, we consider sets of genes known to be regulated by the same E. coli transcription factor, and apply our approach to find the corresponding binding sites in genomic sequence data. We compare our results with those of three popular methods [18,22,39], and show that our method is often able to better locate the actual binding sites. Using the same dataset, we also embed E. coli binding sites within sequences of increasingly biased composition, and show that our scoring scheme and motif finding procedure is effective in this scenario as well. Third, we consider the phylogenetic footprinting problem [9], and find shared motifs upstream of orthologous genes. The difficulty of this problem lies in that the sequences may not have had enough evolutionary time to diverge and may share sequence level similarity beyond the functionally important site; incorporation of additional information, in the form of weightings obtained from a phylogenetic tree relating the species, proves useful in this context. 
Finally, we demonstrate in the context of phylogenetic footprinting that our formulation can be used to find multiple solutions, corresponding to several distinct motifs. In all scenarios, we test the uncovered motifs for statistical significance. We show that our method works well in practice, typically recovering statistically significant motifs that correspond to either known motifs or other motifs of high conservation."}