DATABASE CONTENT

Over the past eight years, the number of analyzed genomes in the PEDANT database has grown steadily (Figure 1) and stands at 334 at the time of writing, including 228 completely sequenced and 106 unfinished genomic sequences from all three kingdoms of life (Figure 2). Most of these genomes were annotated in a fully unsupervised fashion. However, the database also includes several genomes that were manually annotated and, in many cases, published by MIPS. These are Saccharomyces cerevisiae (8), Thermoplasma acidophilum (9), Arabidopsis thaliana (10), Neurospora crassa (11), Parachlamydia UWE25 (12), Listeria monocytogenes EGD, Listeria innocua Clip 11262 and Helicobacter pylori KE26695. The total amount of data managed by PEDANT via the relational database system MySQL is ∼360 GB, more than one gigabyte per genome on average.

To illustrate the functional and structural content of the PEDANT database, we calculated the coverage of all 1 240 000 annotated protein sequences by three popular annotation categories: PFAM sequence motifs (13), SCOP structural domains (14) and MIPS functional role categories (15). As seen in Figure 3, the coverage varies widely, from 64.3% for PFAM to 34.5% for SCOP. Only 15.2% of proteins possess all three attributes, which emphasizes the usefulness of applying many complementary bioinformatics techniques. PEDANT computes more than 20 attributes for each sequence. The PEDANT database thus represents a valuable resource for large-scale association rule mining in automatically generated protein annotation.
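
The coverage figures above amount to simple set arithmetic over protein identifiers. The following is a minimal sketch of that calculation, not the procedure actually used by PEDANT: the function name `coverage_report`, the dictionary layout and the ten toy protein identifiers are all hypothetical and chosen only for illustration, and the toy percentages are unrelated to the values reported in Figure 3.

```python
from typing import Dict, Set


def coverage_report(all_proteins: Set[str],
                    annotated: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of proteins carrying each attribute, plus the
    percentage carrying every attribute at once (key "all")."""
    total = len(all_proteins)
    if total == 0:
        raise ValueError("no proteins supplied")

    report = {}
    for category, ids in annotated.items():
        # Count only identifiers that belong to the protein set.
        report[category] = 100.0 * len(ids & all_proteins) / total

    # Proteins annotated in every category: intersect all sets.
    in_all = set(all_proteins)
    for ids in annotated.values():
        in_all &= ids
    report["all"] = 100.0 * len(in_all) / total
    return report


# Hypothetical toy data: ten proteins and three attribute categories.
proteins = {f"p{i}" for i in range(1, 11)}
annotations = {
    "PFAM": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "SCOP": {"p1", "p2", "p3", "p7"},
    "MIPS": {"p1", "p2", "p5", "p8", "p9"},
}

report = coverage_report(proteins, annotations)
for name, percent in report.items():
    print(f"{name}: {percent:.1f}%")
```

In this toy example only two of the ten proteins appear in all three sets, so the joint coverage is much lower than any single-category coverage, which mirrors the qualitative pattern described in the text.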