PMC:1679804 / 1408-3586 JSONTXT

    2_test

    {"project":"2_test","denotations":[{"id":"17118189-10487868-1688062","span":{"begin":516,"end":517},"obj":"10487868"},{"id":"17118189-16093699-1688063","span":{"begin":1499,"end":1500},"obj":"16093699"},{"id":"17118189-8324630-1688064","span":{"begin":1892,"end":1893},"obj":"8324630"},{"id":"17118189-8697238-1688064","span":{"begin":1892,"end":1893},"obj":"8697238"},{"id":"17118189-14980017-1688064","span":{"begin":1892,"end":1893},"obj":"14980017"}],"text":"Searching biological sequence(s) for motifs is a fundamental task in bioinformatics. Motifs can be represented as either patterns over a specific alphabet, or profiles (also called positional weight matrices (PWMs)), which give the probability of observing each symbol in each position. Motifs can be classified into two main types. If no variable gaps are allowed in the motif, it is called a simple motif. For example, in the genome of Saccharomyces cerevisiae, the binding sites of the transcription factor GAL4 [1] can be characterized by the simple motif shown in Table 1, which illustrates the pattern over the IUPAC alphabet (ΣIUPAC; see Table 2), as well as its profile (which gives the frequency of each DNA base at each position). The motif in Table 1 consists of only one component and is thus a simple motif. Since the symbols in the first 3 positions (CGS) and in the last 3 positions (SCG) are well conserved, we can also represent this motif as CGS[11,11]SCG, where [11,11] means that there is a fixed \"gap\" of length 11 between the two components. If variable gaps are allowed in a motif, it is called a structured motif. A structured motif can be regarded as an ordered collection of simple motifs with gap constraints between each pair of adjacent simple motifs. For example, the LTR retrotransposons from the Copia group, corresponding to genes encoding reverse transcriptase, in A. thaliana can be characterized by the structured motif M1 [2,5] M2 [6,7] M3, as shown in Table 3 [2]. Here M1, M2 and M3 are three simple motifs; [2,5] and [6,7] are variable gap constraints ([minimum gap, maximum gap]) allowed between the adjacent simple motifs. Note that each simple motif Mi (with 1 ≤ i ≤ 3) can be either a pattern over ΣIUPAC or a profile over ΣDNA. Searching for structured motifs is more complicated than searching for simple motifs, and is an ongoing research area [3-7]. The sequence to be searched can be very long, e.g., chromosome 1 of Homo sapiens contains 245 million (245M) base pairs. The structured motif can also be as long as several kilobases. All these factors need to be considered when designing an efficient structured motif search algorithm."}
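The definitions in the text can be illustrated with a small brute-force matcher. These are simple motifs written as IUPAC patterns, a fixed gap as in CGS[11,11]SCG, and variable [minimum gap, maximum gap] constraints as in M1 [2,5] M2 [6,7] M3.

This is only a sketch for illustration and not the search algorithm of the paper this text comes from. It rests on several assumptions:

- It handles pattern components only, not profiles (PWMs).
- A gap is taken to count the symbols strictly between the end of one component and the start of the next.
- The names `matches_at` and `find_structured` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# IUPAC nucleotide codes mapped to the DNA bases each one matches.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(seq: str, pattern: str, pos: int) -> bool:
    """True if the IUPAC pattern (a simple motif) matches seq starting at pos."""
    if pos + len(pattern) > len(seq):
        return False
    return all(seq[pos + k] in IUPAC[code] for k, code in enumerate(pattern))


def find_structured(
    seq: str,
    components: Sequence[str],
    gaps: Sequence[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Report every occurrence of M1 [min1,max1] M2 ... Mk in seq (hypothetical helper).

    components[i] is an IUPAC pattern; gaps[i] = (min, max) bounds the number of
    symbols between the end of components[i] and the start of components[i + 1]
    (assumed gap convention). Each occurrence is the tuple of component start positions.
    """
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap constraint per adjacent pair")
    seq = seq.upper()
    hits: List[Tuple[int, ...]] = []

    def extend(i: int, starts: List[int]) -> None:
        # Components 0..i are placed; starts[-1] is where component i begins.
        if i == len(components) - 1:
            hits.append(tuple(starts))
            return
        end = starts[-1] + len(components[i])
        lo, hi = gaps[i]
        for gap in range(lo, hi + 1):
            nxt = end + gap
            if matches_at(seq, components[i + 1], nxt):
                extend(i + 1, starts + [nxt])

    # Anchor on every match of the first simple motif, then try each allowed gap.
    for p in range(len(seq) - len(components[0]) + 1):
        if matches_at(seq, components[0], p):
            extend(0, [p])
    return hits


if __name__ == "__main__":
    # CGS[11,11]SCG: fixed gap of length 11 between the two components.
    print(find_structured("AACGGATATATATATACCGTT", ["CGS", "SCG"], [(11, 11)]))
    # -> [(2, 16)]
    # A three-component motif shaped like M1 [2,5] M2 [6,7] M3 (toy components).
    print(find_structured("ACTTGGTTTTTTCC", ["AC", "GG", "CC"], [(2, 5), (6, 7)]))
    # -> [(0, 4, 12)]
```

From every match of M1, the sketch tries every allowed gap for each later component, so its work grows with the sequence length and with the widths of the gap ranges. That cost is the reason the text stresses efficiency for chromosome-scale sequences and motifs several kilobases long.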