In Equation 6, C is the size of the cluster, which contains k proteins with a given function; G is the size of the universal set of known proteins, which contains n proteins with that function. Because p-values lie between 0 and 1 and are frequently very small, their negative base-10 logarithms (denoted -log p) are used. A -log p value of 2 or greater corresponds to p ≤ 0.01, i.e., statistical significance at α = 0.01.
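
As an illustration, the calculation can be sketched in Python. Equation 6 is not reproduced in this passage, so the sketch assumes it is the hypergeometric tail probability commonly used for functional enrichment, i.e., the probability of observing k or more proteins with the function in a cluster of size C drawn from G proteins, n of which have the function. The function names `enrichment_p_value` and `neg_log10_p` are hypothetical and introduced only for this example.

```python
from math import comb, log10


def enrichment_p_value(k, C, n, G):
    """Upper-tail hypergeometric probability P(X >= k).

    Assumption: this is the form of Equation 6. X is the number of
    proteins with the function in a cluster of size C drawn without
    replacement from G proteins, n of which have the function.
    """
    total = comb(G, C)       # number of possible clusters of size C
    upper = min(C, n)        # X cannot exceed the cluster size or n
    tail = sum(comb(n, i) * comb(G - n, C - i) for i in range(k, upper + 1))
    return tail / total


def neg_log10_p(p):
    """Negative base-10 logarithm of a p-value (the -log p of the text)."""
    return -log10(p)


def is_significant(p, threshold=2.0):
    """True when -log p meets the threshold (2 corresponds to alpha = 0.01)."""
    return neg_log10_p(p) >= threshold
```

For example, `enrichment_p_value(2, 4, 3, 10)` evaluates the tail for a cluster of 4 proteins, 2 of which carry a function shared by 3 of 10 known proteins. Exact integer binomial coefficients are used so that very small p-values are not lost to rounding before the logarithm is taken; a p-value of exactly 0 would need separate handling, since its logarithm is undefined.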