Methods
Let $I$ denote the set of all genes, $J$ the set of all experiments, and $K$ the set of all biclusters. A bicluster $k \in K$ contains the genes $I_k \subseteq I$ and the experiments $J_k \subseteq J$.
In the original cMonkey [3], the variance for each experiment $j$ is calculated as $\sigma_j^2 = |I|^{-1} \sum_{i \in I} (x_{ij} - \bar{x}_j)^2$, where $x_{ij}$ is the expression level of gene $i$ in experiment $j$ and $\bar{x}_j = \sum_{i \in I} x_{ij} / |I|$. The likelihood of a given $x_{ij}$ in cluster $k$ is
(1)  $$p(x_{ij}) = \frac{1}{\sqrt{2\pi(\sigma_j^2 + \varepsilon^2)}} \exp\left[-\frac{(x_{ij} - \bar{x}_{jk})^2 + \varepsilon^2}{\sigma_j^2 + \varepsilon^2}\right]$$
where $\varepsilon$ is a constant error term, $\bar{x}_{jk} = \sum_{i \in I_k} x_{ij} / |I_k|$, and $I_k$ is the set of genes in cluster $k$. The co-expression p-value, $r_{ik}$, for each gene $i$ is derived from Equation (1). This is combined with weighted log p-values calculated for the TF binding motifs ($Q_{ik}$) and known gene associations ($S_{ik}$) as $g_{ik} = r_0 \log \tilde{r}_{ik} + Q_{ik} + S_{ik}$, where $\log \tilde{r}_{ik}$ is a z-score-normalized version of $\log r_{ik}$ and $r_0$ is a weight that adjusts the relative importance of $r_{ik}$. A final score for each bicluster is calculated as
(2)  $$\mathrm{score}_k = \frac{\sum_{i \in I_k} g_{ik}}{|I_k|}$$
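The quantities in Equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes the Gaussian-style form of Equation (1) with a square-root normalization. The function names and the default values for `eps` and `r0` are hypothetical placeholders, not cMonkey settings, and the derivation of the p-value $r_{ik}$ from the likelihood is not shown because the text does not specify it.

```python
import math

def experiment_variance(column):
    """sigma_j^2: variance of one experiment over all genes (divides by |I|)."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def likelihood(x_ij, cluster_mean, var_j, eps=0.1):
    """Equation (1) as written in the text; eps=0.1 is an arbitrary placeholder."""
    denom = var_j + eps ** 2
    exponent = -((x_ij - cluster_mean) ** 2 + eps ** 2) / denom
    return math.exp(exponent) / math.sqrt(2 * math.pi * denom)

def combined_score(log_r_tilde, q_ik, s_ik, r0=1.0):
    """g_ik = r0 * log(r~_ik) + Q_ik + S_ik; r0=1.0 is an arbitrary placeholder."""
    return r0 * log_r_tilde + q_ik + s_ik

def bicluster_score(g_values):
    """Equation (2): mean of g_ik over the genes in the bicluster."""
    return sum(g_values) / len(g_values)
```

For example, `bicluster_score([combined_score(-1.0, -2.0, -3.0) for _ in range(4)])` averages four identical gene scores and returns the same value as a single gene score.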

Bicluster Sampled Coherence Metric (BSCM) method
Here we change how the co-expression p-value, $r_{ik}$, is calculated, as follows:
(3)  $$r_{jk} = \frac{1}{\sqrt{2\pi\sigma_{\bar{\sigma}_{j|k|}}^2}} \exp\left[-\frac{\sigma_{jk}^2 - \bar{\sigma}_{j|k|}^2}{\sigma_{\bar{\sigma}_{j|k|}}^2}\right]$$
(4)  $$r_{ik} = \frac{\sum_{j \in J_k} r_{jk}}{|J_k|}$$
$\bar{\sigma}_{j|k|}^2$ is the mean variance for the number of genes in bicluster $k$, as determined by bootstrap sampling, and $\sigma_{\bar{\sigma}_{j|k|}}^2$ is the standard deviation of the values used to calculate $\bar{\sigma}_{j|k|}^2$. The background distribution is calculated for each condition $j \in J$ and for each number of genes that occurs in a given bicluster $k$ by sampling $|k|$ genes 200 times from experimental condition $j$ and drawing additional samples in sets of 200 until $\bar{\sigma}_{j|k|}^2$ and $\sigma_{\bar{\sigma}_{j|k|}}^2$ change by less than 1%. To determine which genes should be added to or removed from a cluster, we calculate a new $r_{ik}$ supposing gene $i$ were added or removed. As a practical matter, background distributions are pre-calculated for all cluster sizes less than or equal to the maximum size represented in the initial seed clusters, and additional background distributions are calculated as needed during program execution.
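The sampling procedure and Equations (3) and (4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation: the cluster variance in experiment $j$ is assumed to be a population variance over the bicluster's genes, genes are drawn without replacement, and the `max_batches` cap, the `seed`, and all function names are hypothetical. The text calls the $\sigma^2$ term in Equation (3) a standard deviation, so `r_jk` takes that term as a parameter and leaves it to the caller whether to pass the standard deviation or its square.

```python
import math
import random
import statistics

def sample_variances(column, size, n, rng):
    """Variance of n random gene subsets of the given size from one condition."""
    return [statistics.pvariance(rng.sample(column, size)) for _ in range(n)]

def background(column, size, batch=200, tol=0.01, max_batches=50, seed=0):
    """Mean and standard deviation of sampled variances for one cluster size.

    Batches of `batch` samples are added until both statistics change by
    no more than `tol` (1%), or until the hypothetical max_batches cap is hit.
    """
    rng = random.Random(seed)
    draws = sample_variances(column, size, batch, rng)
    mean, spread = statistics.fmean(draws), statistics.pstdev(draws)
    for _ in range(max_batches):
        draws += sample_variances(column, size, batch, rng)
        new_mean, new_spread = statistics.fmean(draws), statistics.pstdev(draws)
        converged = (abs(new_mean - mean) <= tol * abs(mean)
                     and abs(new_spread - spread) <= tol * abs(spread))
        mean, spread = new_mean, new_spread
        if converged:
            break
    return mean, spread

def r_jk(cluster_var, bg_mean, bg_spread_term):
    """Equation (3) as written; bg_spread_term is the sigma^2 term in it."""
    exponent = -(cluster_var - bg_mean) / bg_spread_term
    return math.exp(exponent) / math.sqrt(2 * math.pi * bg_spread_term)

def r_ik(r_values):
    """Equation (4): mean of r_jk over the experiments in the bicluster."""
    return sum(r_values) / len(r_values)
```

Caching the result of `background` by (condition, cluster size) would mirror the pre-calculation described above, since the background depends only on those two values.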

Cluster scoring based on GO terms
To independently evaluate the quality of the clusters, we calculate a Gene Ontology [7] based GOScore from the binomial enrichment of the GO slim terms, $G$:
(5)  $$\mathrm{GOScore} = \sum_{k} \sum_{g \in G} -\log(\mathrm{pGO}_{k,g})$$
where $\mathrm{pGO}_{k,g}$ is the enrichment p-value for term $g$ in cluster $k$.
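A minimal Python sketch of Equation (5) follows. The text does not give the exact enrichment test or the base of the logarithm, so the sketch assumes an upper-tail binomial probability, with the background rate taken as the fraction of all genes annotated with the term, and a natural logarithm. The function names and the input layout (sets of gene identifiers) are hypothetical.

```python
import math

def binomial_enrichment_p(hits, cluster_size, background_rate):
    """Upper-tail binomial probability P(X >= hits) for X ~ Bin(cluster_size, rate)."""
    return sum(
        math.comb(cluster_size, x)
        * background_rate ** x
        * (1 - background_rate) ** (cluster_size - x)
        for x in range(hits, cluster_size + 1)
    )

def go_score(clusters, annotations, n_genes):
    """Equation (5): sum of -log(p) over all clusters and all GO slim terms.

    clusters:    list of sets of gene identifiers, one set per cluster
    annotations: dict mapping a GO slim term to the set of genes carrying it
    n_genes:     total number of genes, used for the background rate
    """
    total = 0.0
    for genes in clusters:
        for term_genes in annotations.values():
            hits = len(genes & term_genes)
            rate = len(term_genes) / n_genes
            total += -math.log(binomial_enrichment_p(hits, len(genes), rate))
    return total
```

A cluster with no annotated genes for a term contributes approximately zero, because the upper tail from zero hits covers the whole distribution.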

Classifier construction
We tested whether $r_{ik}$ could be used with a p-value cutoff of 0.05 to predict whether experimental conditions would result in peroxisome proliferation ("YES") or not ("NO"). We built 544 yeast biclusters using 233 experiments from seven experimental conditions with known peroxisome proliferation status: thirty glucose ("NO"), twenty early oleate ("YES"), and twenty-one late oleate ("YES") experiments [2], and seventy-five galactose ("NO"), eighteen lactate ("YES"), five rho- ("YES"), and sixty-four antimycin ("YES") experiments [8,9,13,17]. For every bicluster, each of the 233 experiments was assigned a value indicating whether its genes were "UP"- or "DOWN"-regulated if the experiment was included in that bicluster, or "EXCLUDED" otherwise. Many experiments were replicates, so standard n-fold cross-validation was inappropriate. Instead, each of the seven growth conditions was treated as a splitting boundary; thus, when the classifier predicted proliferation in antimycin, antimycin was absent from the training set. During each split we downsampled, which introduced stochasticity. Predictions were made using decision trees, logistic regression, support vector machines (SVMs), and naive Bayes [42,43]. (See supplemental code and data for implementation.)
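The condition-wise splitting can be sketched in Python as below. The supplemental code is the authoritative implementation; this sketch only illustrates the idea of holding out one growth condition at a time. The text does not say how the downsampling was done, so the sketch assumes that each training set is balanced by randomly drawing equal numbers of "YES" and "NO" experiments. The function name, the tuple layout, and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict

def leave_one_condition_out(samples, seed=0):
    """Yield (held_out_condition, train, test) for each growth condition.

    samples: list of (condition, label, features) tuples.
    The training set excludes every experiment from the held-out condition
    and is downsampled so that each label appears equally often (assumption).
    """
    rng = random.Random(seed)
    conditions = sorted({condition for condition, _, _ in samples})
    for held_out in conditions:
        test = [s for s in samples if s[0] == held_out]
        by_label = defaultdict(list)
        for s in samples:
            if s[0] != held_out:
                by_label[s[1]].append(s)
        # Size of the rarest class among the remaining conditions.
        n = min(len(group) for group in by_label.values())
        train = [s for group in by_label.values() for s in rng.sample(group, n)]
        yield held_out, train, test
```

With the experiment counts listed above, holding out antimycin leaves 105 "NO" and 64 "YES" experiments, so under this balancing assumption the training set would contain 64 of each. Changing `seed` changes which experiments are drawn, which is one way the downsampling could introduce variation between runs.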