PMC:1687185 / 2169-4726
Annnotations
2_test
{"project":"2_test","denotations":[{"id":"17129371-15637633-1691516","span":{"begin":1180,"end":1181},"obj":"15637633"},{"id":"17129371-8211139-1691517","span":{"begin":1972,"end":1973},"obj":"8211139"}],"text":"1 Introduction\nRecent developments in DNA sequencing have allowed biologists to obtain complete genomes for several species. However, knowledge of the sequence does not imply the understanding of how genes interact and regulate one another within the genome. Many transcription factor binding sites are highly conserved throughout the sequences and the discovery of the location of such binding sites plays an important role in understanding gene interaction and gene regulation.\nWe consider a precise version of the motif discovery problem in computational biology as discussed in [1,2]. The planted (l, d) motif problem [2] considered in this paper is described as follows: Suppose there is a fixed but unknown nucleotide sequence M(the motif) of length l. The problem is to determine M, given t sequences with ti being the length of the ith sequence and each containing a planted variant of M. More precisely, each such planted variant is a substring that is M with exactly d point substitutions (see Fig. 1). More details about the complexity of the motif finding problem is given in [3]. A detailed assessment of different motif finding algorithms was published recently in [4].\nFigure 1 Synthetic DNA sequences containing some instance of the pattern 'CCGATTACCGA' with a maximum number of 2 mutations. The motifs in each sequence are highlighted in the box. We have a (11,2) motif where 11 is the length of the motif and 2 is the number of mutations allowed. Although there are several variations of the motif finding algorithms, the problem discussed in this paper is defined as follows: without any previous knowledge of the consensus pattern, discover all the occurences of the motifs and then recover a pattern for which all of these instances are within a given number of mutations (or substitutions). Despite the significant amount of literature available on the motif finding problem, many do not exploit the probabilistic models used for motif refinement [5,6].\nWe provide a novel optimization framework for refining motifs using systematic subspace exploration and neighborhood search techniques. This paper is organized as follows: Section 2 gives some relevant background about the existing approaches used for finding motifs. Section 3 describes the problem formulation in detail. Section 4 discusses our new framework and Section 5 details our implementation. Section 6 gives the experimental results from running our algorithm on synthetic and real datasets. Finally, Section 7 concludes our discussion with future research directions."}