1 INTRODUCTION

Protein phosphorylation is the addition of a phosphate group to serine, threonine or tyrosine residues of proteins. Phosphorylations are precise, reversible modifications that regulate intracellular events such as protein complex formation, cell signaling, cytoskeleton remodeling and cell cycle control. Consequently, protein kinases, which catalyze these phosphorylations, play an important role in controlling protein function, regulating cellular machines and transferring information through cell signaling pathways. Kinase activities therefore have definitive regulatory effects on a broad variety of biological processes, in which activated kinases typically target a large number of different substrate proteins. There are over 500 protein kinases encoded in the human genome, and it is estimated that 40% of all proteins are phosphorylated at some stage in different cell types and cell states (Manning et al., 2002). Furthermore, kinases regulate each other through phosphorylation, resulting in a complex web of regulatory relations (Ma'ayan et al., 2005). High-throughput techniques such as stable isotope labeling coupled with affinity purification and mass-spectrometry proteomics can now identify phosphorylation sites on many proteins under different experimental conditions. Databases that integrate the results from such studies are emerging, e.g. PhosphoSite (Hornbeck et al., 2004). However, such data do not identify the kinases responsible for the phosphorylation. Several resources are available to link identified phosphorylation sites to the kinases most likely responsible for them (Huang et al., 2005; Linding et al., 2008). For example, NetworKIN (Linding et al., 2007; Linding et al., 2008) uses an algorithm to predict the kinase most likely to phosphorylate an identified phosphosite.
The NetworKIN algorithm is accompanied by a database containing ∼1450 predicted mammalian substrates mapped to 73 upstream protein kinases belonging to 21 kinase families. Although useful, this dataset's coverage is not comprehensive enough for statistical kinase enrichment analysis. To build a more comprehensive prior-knowledge kinase–substrate dataset, large enough for statistical enrichment analysis, we merged interactions from several other online sources reporting mammalian kinase–substrate relations. Additionally, we included binary protein–protein interactions involving kinases from protein–protein interaction databases, as these were recently proposed to be highly enriched in kinase–substrate relations. A recent study that identified ∼14 000 phosphosites at different stages of the cell cycle in HeLa cells (Dephoure et al., 2008) showed that many phosphosites experimentally identified using phosphoproteomics can be associated with four known kinases (CDC2, PLK1, Aurora-B and Aurora-A) using the literature-based protein–protein interactions from the HPRD database (Mishra et al., 2008). Hence, with a large background knowledge dataset of kinase–substrate interactions and protein–protein interactions that involve kinases, we can associate large lists of proteins/genes with many kinases that phosphorylate them. This allows the computation of statistical enrichment, which can be used to suggest the kinases most likely to regulate the proteins/genes in a list generated under specific experimental conditions.
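As an illustration of the enrichment idea described above, the following is a minimal sketch in Python. It assumes a one-sided hypergeometric test, which this section does not specify, and it assumes that every gene in the input list belongs to the background. The function names (`hypergeom_tail`, `kinase_enrichment`), the substrate identifiers and the background size are hypothetical and chosen only for illustration; they are not taken from any of the cited resources.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of proteins/genes in the background
    successes:  number of known substrates of the kinase
    draws:      size of the input list
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def kinase_enrichment(gene_list, kinase_substrates, background_size):
    """Rank kinases by how over-represented their substrates are in gene_list.

    Assumes all genes in gene_list are part of the background.
    Returns (kinase, overlap size, p-value) tuples, smallest p-value first.
    """
    genes = set(gene_list)
    results = []
    for kinase, substrates in kinase_substrates.items():
        overlap = len(genes & substrates)
        p_value = hypergeom_tail(overlap, background_size, len(substrates), len(genes))
        results.append((kinase, overlap, p_value))
    return sorted(results, key=lambda row: row[2])


# Toy data: the substrate identifiers are made up, not real annotations.
toy_substrates = {
    "CDC2": {"A", "B", "C", "D"},
    "PLK1": {"E", "F", "G", "H"},
}
toy_list = ["A", "B", "C", "X"]

for kinase, overlap, p_value in kinase_enrichment(toy_list, toy_substrates, 20):
    print(f"{kinase}\toverlap={overlap}\tp={p_value:.4g}")
```

In practice the p-values would also need correction for testing many kinases at once, and the choice of background strongly affects the result; both are left out of this sketch.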