Genome sequencing and data collation

We determined the genome sequence of Staphylococcus simiae type strain CCM 7213T (= LMG 22723T), isolated from the faeces of a South American squirrel monkey [11]. The genome was sequenced by Roche/454 pyrosequencing in a single full run of the GS-20 sequencer, and the reads were assembled de novo with Newbler software into 565 contigs. The genome was annotated with the NCBI Prokaryotic Genomes Automatic Annotation Pipeline. The S. simiae whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AEUN00000000. The version described in this paper is the first version, AEUN01000000.

For comparative analysis, genome sequences of bacteria in GenBank format [12] were retrieved from the National Center for Biotechnology Information (NCBI) site. We analyzed sequences of 28 Staphylococcus strains belonging to 12 different species, and an outgroup, Macrococcus caseolyticus JCSCS5402 [13] (Table 1 and Additional file 1, Table S1). The 16 Staphylococcus aureus strains were COL [14], ED133 [15], ED98 [16], JH1, JH9, MRSA252 [17], MSSA476 [17], Mu3 [18], Mu50 [19], MW2 [20], N315 [19], NCTC_8325, Newman [21], RF122/ET3-1 [22], USA300_FPR3757 [23], and USA300_TCH1516 [24]. The remaining 12 Staphylococcus strains were Staphylococcus capitis SK14 [25], Staphylococcus caprae C87, Staphylococcus carnosus TM300 [26], Staphylococcus epidermidis ATCC 12228 [27], Staphylococcus epidermidis RP62a [14], Staphylococcus haemolyticus JCSC1435 [28], Staphylococcus hominis SK119, Staphylococcus lugdunensis HKU09-01 [29], Staphylococcus pseudintermedius HKU10-03 [30], Staphylococcus saprophyticus ATCC_15305 [31], Staphylococcus simiae CCM 7213T [11], and Staphylococcus warneri L37603.

Genome sequence analyses were implemented using BioPerl version 1.6.1 [32] and G-language Genome Analysis Environment version 1.8.12 [33-35].
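As an illustration of the kind of processing applied to GenBank-format records, the following minimal Python sketch counts CDS feature entries in a flat-file record. It is not the BioPerl or G-language code used in this study. The function name `count_cds_features` and the `SAMPLE` record are hypothetical, and the sketch assumes that every CDS feature key is counted, which may differ from how the CDS counts reported here were derived (for example, in the treatment of pseudogenes).

```python
def count_cds_features(genbank_text: str) -> int:
    """Count CDS feature entries in GenBank flat-file text (illustrative only)."""
    in_features = False
    count = 0
    for line in genbank_text.splitlines():
        if not line:
            continue
        # Any keyword in column 1 (FEATURES, ORIGIN, CONTIG, //) opens or
        # closes the feature table.
        if not line[0].isspace():
            in_features = line.startswith("FEATURES")
            continue
        # Feature keys start in column 6; qualifier lines leave that field blank.
        if in_features and line[5:20].strip() == "CDS":
            count += 1
    return count


# Hypothetical toy record, not taken from any genome analysed here.
SAMPLE = """\
LOCUS       DEMO                      60 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="hypothetical"
     gene            1..30
     CDS             1..30
                     /product="hypothetical protein"
     CDS             complement(31..60)
                     /note="CDS appears here only as qualifier text"
ORIGIN
        1 atgaaaaaaa aaaaaaaaaa aaaaaaataa ttatttcatt tttttttttt tttttttcat
//
"""

print(count_cds_features(SAMPLE))
```

For real analyses, an established parser such as BioPerl or an equivalent library is preferable to hand-written column slicing.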
Statistical tests and graphics were implemented using R, version 2.11.1 [36].

Table 1. Genomic features of Macrococcus caseolyticus and 28 Staphylococcus strains. %G + C = 100 × (G + C)/(A + T + G + C). S = Selected codon usage bias. No.CDS = Number of protein-coding sequences. No.MCL = Number of protein families built by BLAST and Markov clustering.
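The %G + C definition in the Table 1 legend can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study. The function name `gc_percent` is hypothetical, and the sketch assumes that characters other than A, T, G, and C (such as N) are excluded from the denominator, as the formula implies.

```python
def gc_percent(sequence: str) -> float:
    """Return %G+C = 100 * (G + C) / (A + T + G + C) for a DNA sequence."""
    seq = sequence.upper()
    # Count only the four unambiguous bases; anything else is ignored.
    counts = {base: seq.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G, or C")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy sequence (hypothetical): two each of A, T, G, and C plus two ambiguous bases.
print(gc_percent("ATGCGCNNAT"))  # prints 50.0
```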