Table 2. Top 20 Predictions by PhyloScan

                  C1              C2              C3              C4              C5              C6
E. coli Sequence  Full^a          Full^a          Reduced^b       Reduced^b       Reduced &       Reduced &
                                                                                  Aligned^c       Aligned^c
Indep. Species    No              Yes             No              Yes             No              Yes
Rank              Gene   log(q)   Gene   log(q)   Gene   log(q)   Gene   log(q)   Gene   log(q)   Gene   log(q)
1                 yibI    -4.65   cdd     -9.28   mtlA    -5.14   mtlA    -9.76   mtlA    -7.66   mtlA   -12.15
2                 yqcE    -2.86   glpT    -7.21   ygcW    -2.89   cdd     -9.60   yjcB    -4.55   glpA    -9.19
3                 b1904   -2.61   mglB    -6.01   yjcB    -2.62   glpA    -8.31   gcd     -3.99   cdd     -9.16
4                 fucA    -2.51   yibI    -5.26   yjiY    -2.60   mglB    -6.53   b2146   -3.97   mglB    -7.60
5                 deaD    -2.51   yjiY    -4.57   b2146   -2.53   gapA    -5.21   fucA    -3.93   udp     -6.26
6                 yjiY    -2.42   hemC    -4.38   fucA    -2.51   udp     -5.17   ygcW    -3.42   gapA    -6.02
7                 cdd     -2.29   deaD    -4.35   deaD    -2.47   yjiY    -4.79   flhD    -3.03   yjcB    -5.09
8                 yeaA    -2.22   ysgA    -4.33   cdd     -2.31   cyaA    -4.70   gapA    -3.03   cyaA    -5.04
9                 yhcR    -2.06   yhcR    -3.99   gapA    -2.22   deaD    -4.37   ycdZ    -3.01   malE    -4.83
10                ycdZ    -1.96   yqcE    -3.56   qseA    -2.03   malE    -4.29   udp     -2.78   ycdZ    -4.69
11                b2736   -1.87   adhE    -3.47   ycdZ    -1.98   ygcW    -3.63   b2248   -2.76   adhE    -4.56
12                uxaC    -1.81   ycdZ    -3.45   mglB    -1.90   adhE    -3.58   glpA    -2.76   b2146   -4.53
13                ysgA    -1.77   yeaA    -3.44   udp     -1.86   ycdZ    -3.52   mglB    -2.73   fucA    -4.46
14                glpT    -1.75   mlc     -3.37   uxaC    -1.85   mlc     -3.48   qseA    -2.68   pckA    -4.09
15                mglB    -1.63   b1904   -3.31   glpA    -1.84   fucA    -3.32   pckA    -2.36   aer     -3.97
16                pckA    -1.39   fucA    -3.23   pckA    -1.45   yjcB    -3.32   adhE    -2.14   ygcW    -3.78
17                serA    -1.23   b2736   -3.18   malE    -1.36   pckA    -3.23   aer     -2.13   gcd     -3.67
18                aer     -1.23   pckA    -3.17   aer     -1.32   aer     -3.17   cdd     -2.10   deaD    -3.65
19                adhE    -1.22   aer     -3.08   serA    -1.32   qseA    -3.07   deaD    -2.04   serA    -3.62
20                mlc     -1.01   yjeG    -3.05   adhE    -1.28   uxaC    -3.07   uxaC    -2.02   mlc     -3.62
# Diffs from C6   10              11              3               3               4               0

Because it is sometimes instructive to examine a fixed number of top hits regardless of the reported q-values, in this table we compare the six approaches' best 20 intergenic regions for Crp.
By comparing each column to Column C6, which is the best approach we employed, we see that the C1-C5 approaches give significantly different q-values for, and orderings of, the predicted regulated genes. As indicated in the bottom row, the C1-C5 approaches miss several of the top-20 genes reported in C6, replacing them with genes that did not make the C6 top-20 list. In particular, although it uses all of the sequence data except S. typhi, C2 is significantly different from C6. Furthermore, although C3 has few differences from C6 in the set of genes indicated, the q-values of C3 are considerably worse and the gene order is substantially rearranged. These data suggest that the ability to simultaneously handle both aligned and unaligned data is important in obtaining accurate predictions.
Notes: a, b, c: See the caption notes for Table 1. Also see the Table 1 caption for descriptions of Columns C1-C6.
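The bottom row of the table can be checked with a short script. The sketch below assumes that "# Diffs from C6" counts the genes in a column's top-20 list that are absent from the C6 top-20 list; the caption describes the row this way, but the exact procedure is not stated, so treat this as one plausible reading. The names `TOP20` and `diffs_from` are illustrative only and are not part of PhyloScan.

```python
# Top-20 gene lists per column, transcribed in rank order from Table 2.
TOP20 = {
    "C1": "yibI yqcE b1904 fucA deaD yjiY cdd yeaA yhcR ycdZ "
          "b2736 uxaC ysgA glpT mglB pckA serA aer adhE mlc".split(),
    "C2": "cdd glpT mglB yibI yjiY hemC deaD ysgA yhcR yqcE "
          "adhE ycdZ yeaA mlc b1904 fucA b2736 pckA aer yjeG".split(),
    "C3": "mtlA ygcW yjcB yjiY b2146 fucA deaD cdd gapA qseA "
          "ycdZ mglB udp uxaC glpA pckA malE aer serA adhE".split(),
    "C4": "mtlA cdd glpA mglB gapA udp yjiY cyaA deaD malE "
          "ygcW adhE ycdZ mlc fucA yjcB pckA aer qseA uxaC".split(),
    "C5": "mtlA yjcB gcd b2146 fucA ygcW flhD gapA ycdZ udp "
          "b2248 glpA mglB qseA pckA adhE aer cdd deaD uxaC".split(),
    "C6": "mtlA glpA cdd mglB udp gapA yjcB cyaA malE ycdZ "
          "adhE b2146 fucA pckA aer ygcW gcd deaD serA mlc".split(),
}


def diffs_from(reference, columns):
    """Count, per column, the genes that are not in the reference column.

    Assumption: a "diff" is a gene present in a column's list but absent
    from the reference list (a plain set difference, ignoring rank).
    """
    ref = set(columns[reference])
    return {name: len(set(genes) - ref) for name, genes in columns.items()}


diffs = diffs_from("C6", TOP20)
print(diffs)
# {'C1': 10, 'C2': 11, 'C3': 3, 'C4': 3, 'C5': 4, 'C6': 0}
```

Under this reading the counts match the "# Diffs from C6" row. Because the comparison ignores rank, it does not capture the reordering noted for C3, whose gene set is close to that of C6 but whose order and q-values differ.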