PMC:1570465 / 45136-45982
Annotations
2_test
{"project":"2_test","denotations":[{"id":"16916460-10854408-1687100","span":{"begin":123,"end":124},"obj":"10854408"},{"id":"16916460-9813115-1687101","span":{"begin":125,"end":127},"obj":"9813115"},{"id":"16916460-15297295-1687102","span":{"begin":147,"end":149},"obj":"15297295"},{"id":"16916460-9813115-1687103","span":{"begin":515,"end":517},"obj":"9813115"}],"text":"We apply our method to identify the binding sites of 36 E.coli regulatory proteins. We construct our dataset from that of [6,28], as described in [32]. For each binding site, we locate it within the genome and extract up to 600 bp of DNA sequence upstream from the gene it regulates. We remove binding sites for sigma factors, binding sites for transcription factors with fewer than three known sites, and those that could not be unambiguously located in the genome. Motif length parameters are set as reported by [28], except for crp, where a length of 18 instead of 22 is used. Background nucleotide frequencies are computed using the upstream regions for each dataset individually. The final dataset consists of 36 transcription factors, each regulating between 3 and 33 genes, with binding site length ranging between 11 and 48 (see Table 2)."}
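The dataset preparation described in the annotated text (taking up to 600 bp upstream of a gene, dropping transcription factors with fewer than three known sites, and computing background nucleotide frequencies per dataset) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the forward-strand coordinate convention, and the in-memory data layout are all assumptions made for the example.

```python
from collections import Counter


def upstream_region(genome, gene_start, max_len=600):
    """Return up to max_len bases immediately upstream of gene_start.

    Assumption: forward strand only, 0-based gene_start. Reverse-strand
    genes would need a reverse complement, which is not handled here.
    """
    return genome[max(0, gene_start - max_len):gene_start]


def filter_factors(sites_by_factor, min_sites=3):
    """Keep only factors that have at least min_sites known binding sites."""
    return {
        factor: sites
        for factor, sites in sites_by_factor.items()
        if len(sites) >= min_sites
    }


def background_frequencies(upstream_regions):
    """Relative frequency of A, C, G and T over one dataset's upstream regions."""
    counts = Counter()
    for seq in upstream_regions:
        # Ignore ambiguous symbols such as N.
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found in the upstream regions")
    return {base: counts[base] / total for base in "ACGT"}
```

Calling `background_frequencies` once per transcription factor, on that factor's own upstream regions, mirrors the statement that background frequencies are computed for each dataset individually.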