PMC:1794230 / 51771-53429
Annotations
2_test
{"project":"2_test","denotations":[{"id":"17244358-11125071-1689330","span":{"begin":115,"end":117},"obj":"11125071"},{"id":"17244358-11743721-1689331","span":{"begin":564,"end":566},"obj":"11743721"},{"id":"17244358-9254694-1689332","span":{"begin":594,"end":596},"obj":"9254694"}],"text":"Locating orthologous sequences\nGenome sequence data and annotations were downloaded from the NCBI RefSeq database [42]: Escherichia coli K12 (NC_000913.1), Salmonella enterica serovar Typhi (S. typhi)(NC_003198), Yersinia pestis CO92 (NC_003143), Haemophilus influenzae Rd (NC_000907), Vibrio cholerae El Tor (NC_002505 and NC_002506), Shewanella oneidensis MR-1 (NC_004347 and NC_004349), and Pseudomonas aeruginosa PAO1 (NC_002516). Orthologs for each of the annotated E. coli genes were identified in each of the remaining six species, using INPARANOID v.1.35 [43]. This program uses BLAST [44] to compare the complete set of predicted protein sequences from one genome with that of another, and identifies the reciprocal best hits. We set the parameters to use the BLOSUM62 matrix and a minimum bit score of 30, and we required that the alignment cover at least 50% of both proteins.\nIn the examples presented in this study, E. coli was the primary species of interest; we therefore defined a set of E. coli promoter-containing sequences by selecting each E. coli protein-coding gene (excluding 111 genes encoded on transposons or prophage elements) that has at least 20 bp of upstream intergenic sequence. By these criteria, there are 2379 E. coli intergenic regions of interest. Orthologous upstream intergenic-sequence data files were then generated for this set of 2379 E. coli regions, using the results from INPARANOID to identify orthologs and the seven genome annotations to define intergenic boundaries. A table with these data [see Additional file 2] and a caption for the table [see Additional file 1] are provided in the Supplementary Materials."}
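The ortholog criteria described in the annotated text (reciprocal best hits, a minimum bit score of 30, and alignment coverage of at least 50% of both proteins) can be illustrated with a minimal sketch. This is not INPARANOID's implementation, which does more than reciprocal-best-hit matching. The `Hit` record, its field names, and the function names are hypothetical and assume BLAST results have already been parsed into per-hit bit scores, aligned lengths, and protein lengths.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    query: str            # protein ID in the query genome
    subject: str          # protein ID in the other genome
    bit_score: float
    query_aligned: int    # query residues covered by the alignment
    subject_aligned: int  # subject residues covered by the alignment
    query_len: int
    subject_len: int


MIN_BIT_SCORE = 30.0  # threshold stated in the text
MIN_COVERAGE = 0.5    # alignment must cover >= 50% of both proteins


def passes_filters(hit: Hit) -> bool:
    """Apply the bit-score and two-sided coverage thresholds."""
    return (
        hit.bit_score >= MIN_BIT_SCORE
        and hit.query_aligned / hit.query_len >= MIN_COVERAGE
        and hit.subject_aligned / hit.subject_len >= MIN_COVERAGE
    )


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among passing hits."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes_filters(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}
```

Calling `reciprocal_best_hits` once per comparison genome, with E. coli as genome A, would give one E. coli-to-ortholog mapping for each of the other six species.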