PMC:1679804 / 18979-19930
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1679804","sourcedb":"PMC","sourceid":"1679804","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1679804","text":"In this paper, we focus on the problem of searching for a given structured motif in one or more sequences. We propose SMOTIF, an efficient algorithm for structured motif searches. It uses an inverted index of symbol positions, and it finds all occurrences by positional joins over this index. For structured pattern search problem, we propose two main variants of our approach: i) a direct search for simple motifs and the structured motif via positional joins, and ii) a two-step approach, where we use a suffix tree to search for simple motifs and then use positional joins for the structured motif. For structured profile search problem, we first search each simple motif by aligning its profile with the sequences, and then search structured motifs with positional joins. SMOTIF allows missing components, overlapping motifs, and also approximate matches (when using the two-step approach). SMOTIF also allows flexible matches using IUPAC symbols.","tracks":[]}
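The passage above describes searching with an inverted index of symbol positions and positional joins over that index. The following is a minimal illustrative sketch of that general idea, not the SMOTIF implementation itself. The function names (`build_index`, `find_simple`, `find_structured`), the representation of a structured motif as a list of exact component strings plus a list of `(min_gap, max_gap)` ranges, and the convention that a gap counts the symbols between the end of one component and the start of the next are all assumptions made for illustration. The sketch handles exact matches only and leaves out missing components, IUPAC symbols, approximate matches, and profiles.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(seq):
    """Inverted index: symbol -> ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, sym in enumerate(seq):
        index[sym].append(pos)
    return index


def find_simple(index, motif):
    """Start positions of exact occurrences of a simple motif.

    Joins the position list of the first symbol with the position list of
    each later symbol at its fixed offset from the start.
    """
    starts = list(index.get(motif[0], []))
    for offset, sym in enumerate(motif[1:], start=1):
        positions = set(index.get(sym, []))
        starts = [s for s in starts if s + offset in positions]
    return starts


def find_structured(seq, components, gaps):
    """Occurrences of a structured motif as tuples of component start positions.

    components: list of simple motifs (hypothetical representation).
    gaps: list of (min_gap, max_gap), one per pair of consecutive components.
    """
    index = build_index(seq)
    occurrences = [find_simple(index, m) for m in components]
    partial = [(s,) for s in occurrences[0]]
    for i in range(1, len(components)):
        min_gap, max_gap = gaps[i - 1]
        prev_len = len(components[i - 1])
        nxt = occurrences[i]  # ascending, so binary search applies
        extended = []
        for match in partial:
            end = match[-1] + prev_len  # first position after the previous component
            lo = bisect_left(nxt, end + min_gap)
            hi = bisect_right(nxt, end + max_gap)
            extended.extend(match + (p,) for p in nxt[lo:hi])
        partial = extended
    return partial
```

For example, under these assumptions `find_structured("ACGTTTACGAAGT", ["ACG", "GT"], [(0, 3)])` pairs the `ACG` at position 6 with the `GT` at position 11, since two symbols separate them. Keeping each position list sorted lets every join step use binary search rather than scanning the sequence again.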