PMC:1794230 / 36055-38376

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1794230","sourcedb":"PMC","sourceid":"1794230","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1794230","text":"Quality of reported sites\nThe second cutoff that our approach requires is the q-value cutoff that specifies which sites will be reported. We chose a default value of 0.001, meaning that according to our model, at most 0.1% of the intergenic sequences that we report as binding the transcription factor are chance false positives. While we have incorporated a fairly accurate phylogenetic model, we have not incorporated into this model such effects as the non-independence of the positions in a site (e.g., the effect of di- or tri-nucleotide energy terms, also known as stacking energies), nor effects from the cooperative binding of multiple transcription factors on the ability of a factor to bind to a DNA site. Because our model does not capture these and other features, the actual rate of false positives is likely to be higher than 0.1%.\nOn the other hand, in calculating the q-value, we have assumed that the vast majority of intergenic sequences in a genome will likely not contain a transcription factor binding site for the particular transcription factor under study, i.e., we are looking for rare events. Under this assumption, the proportion of all intergenic sequences that are truly null will approach 1.0 in Storey and Tibshirani's q-value calculation (the π^0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWFapaCgaqcamaaBaaaleaacqaIWaamaeqaaaaa@2F9A@ term of [16]), and so does not appear in our q-value equation (see Methods). 
In a case where this assumption does not hold, the q-values provided by our approach will be overly conservative.\nNote that the scan technology, first described by Staden [10] and employed here, is a frequentist hypothesis testing approach. A Bayesian approach presents an alternative through the use of Bayesian posterior probabilities for each site. Such an approach would require the specification of a model from which alternative sequences are drawn as well as null sequences. When a large number of observations are available the approach of Efron et al. [36] provides a compromise that yields local false discovery rates through the use of empirical Bayesian methods.","divisions":[{"label":"title","span":{"begin":0,"end":25}},{"label":"p","span":{"begin":26,"end":845}},{"label":"p","span":{"begin":846,"end":1760}}],"tracks":[{"project":"2_test","denotations":[{"id":"17244358-12883005-1689320","span":{"begin":1579,"end":1581},"obj":"12883005"},{"id":"17244358-2720468-1689321","span":{"begin":1819,"end":1821},"obj":"2720468"}],"attributes":[{"subj":"17244358-12883005-1689320","pred":"source","obj":"2_test"},{"subj":"17244358-2720468-1689321","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#ecb393","default":true}]}]}}