6.2 Real Datasets Table 5 shows the results of the Exit-point methodology on real biological sequences. We have chosen l = 20 and d = 2. 't' indicates the number of sequences in the real data. For the biological samples taken from [1,12], the value m once again is the average number of random projection + EM cycles required to discover the motif. All other parameter values (like projection size k = 7 and threshold s = 4) are chosen to be the same as those used in the Random projection paper [1]. All of the motifs were recovered with m = 1 using the Exit-point strategy. The Random Projection algorithm alone needed multiple cycles (m = 8 in some cases and m = 15 in others) in order to retrieve the correct motif. This elucidates the fact that global methods can only be used to a certain extent and should be combined with refined local heuristics in order to obtain better efficiency. Since the random projection algorithm has outperformed other prominent motif finding algorithms like SP-STAR, WINNOWER, Gibbs sampling etc., we did not repeat the same experiments that were conducted in [1]. Running one cycle of random projection + EM is much more expensive computationally. The main advantage of our strategy comes from the deterministic nature of our algorithm in refining motifs. Table 5 Results on real datasets. Results of Exit-point method on biological samples. The real motifs were obtained in all the six cases using the Exit-point framework. Sequence Sample Size t Best (20,2) Motif Reference Motif E. coli CRP 1890 18 TGTGAAATAGATCACATTTT TGTGANNNNGNTCACA preproinsulin 7689 4 GGAAATTGCAGCCTCAGCCC CCTCAGCCC DHFR 800 4 CTGCAATTTCGCGCCAAACT ATTTCNNGCCA metallothionein 6823 4 CCCTCTGCGCCCGGACCGGT TGCRCYCGG c-fos 3695 5 CCATATTAGGACATCTGCGT CCATATTAGAGACTCT yeast ECB 5000 5 GTATTTCCCGTTTAGGAAAA TTTCCCNNTNAGGAAA Let the cost of applying EM algorithm for a given bucket be f and let the average number of buckets for a given projection be b. Then the running time of the Exit-point method will be O(cbf) where c is a constant that is linear in l-the length of the motif. If there were m projections, then cost of the random projection algorithm using restarts will be O(mbf). The two main advantages of using Exit-point strategy compared to random projection algorithm are : • It avoids multiple random projections which often provide similar optimal motifs. • It provides multiple optimal solutions in a promising region of a given bucket as opposed to a single solution provided by random projection algorithm.