Gene content

The 69171 protein-coding sequences from the 29 strains were classified into 5361 homologous groups or protein families (see Additional file 1, Table S2). A dendrogram constructed by hierarchical clustering (Figure 2) indicates that the overall similarity of the 29 strains based on gene content (binary data for presence or absence of different protein families) did not strictly follow their phylogenetic history (Figure 1 and Additional file 2, Figure S1). This suggests that the Staphylococcus gene repertoire reflects not only vertical inheritance of genes, but also probable instances of one or more of the following: lineage-specific gene loss, non-orthologous gene displacement, or gene gain through horizontal gene transfer [64].

Figure 2 Gene content dendrogram. A dendrogram constructed by hierarchical clustering based on dissimilarities in gene content (binary data for presence or absence of protein families) for the 28 Staphylococcus strains and Macrococcus caseolyticus JCSC5402. The dissimilarities were measured using the Jaccard distance, ranging from 0 to 1, represented by the horizontal bar at the base of the figure.

We assessed the presence of virulence factors in the Staphylococcus strains based on the gene content table (Additional file 1, Table S2) and percent identity values of TBLASTN best hits against VFDB (Additional file 3, Figure S2). Many virulence genes of S. aureus are encoded on mobile genetic elements such as staphylococcal cassette chromosomes (SCC), genomic islands, pathogenicity islands, prophages, plasmids, insertion sequences, and transposons [2,3,65]. For mobility, SCC carries one or more cassette chromosome recombinase (ccr) genes (ccrAB or ccrC) [66,67]. The three ccr genes (ccrA, ccrB, and ccrC) are homologous and have no homolog in S. carnosus. The genetic determinant of methicillin resistance (mec) is encoded on SCC in S. aureus, designated as SCCmec [68].
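Returning to the gene content dendrogram (Figure 2), the distance calculation can be illustrated with a minimal sketch. It computes Jaccard distances between presence/absence profiles and then merges clusters by average linkage. The linkage method is an assumption, since the text specifies only "hierarchical clustering", and the strain and family names are hypothetical toy data rather than values from Table S2.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of protein-family IDs.

    0 means identical gene content, 1 means no shared families.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).

    profiles: dict mapping strain name -> set of family IDs present.
    Returns the merges in order as (cluster_1, cluster_2, distance).
    """
    names = list(profiles)
    dist = {
        frozenset((x, y)): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(names, 2)
    }

    def avg(c1, c2):
        # Mean pairwise distance between members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy profiles (not data from the study).
profiles = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam6", "fam7"},
}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In this toy example, strain_A and strain_B share three of five families in their union and therefore merge first; strain_C joins last at a larger distance.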
Expression of the beta-lactamase gene (blaZ) and the penicillin-binding protein 2a (PBP 2a) gene (mecA) is controlled by the BlaR-BlaI-BlaZ and MecR-MecI-MecA regulatory systems, respectively [69]. There is homology between blaI and mecI, between blaR1 and mecR1, and between the promoter and N-terminal portions of blaZ and mecA [70]. mecA gene homologs were present in all Staphylococcus species, while the presence of blaI/mecI and blaR1/mecR1 gene homologs varied among different Staphylococcus species and even between different strains within S. aureus.

S. aureus genomic islands and pathogenicity islands carry the superantigen toxic shock syndrome toxin-1 (TSST-1), encoded by tst [71], which is homologous to the staphylococcal exotoxin-like (set) proteins, since renamed staphylococcal superantigen-like (ssl) proteins. The tst gene homolog was present in S. carnosus TM300 (Sca_0436 and Sca_0905) and S. pseudintermedius HKU10-03 (SPSINT_0099). In contrast, a previous study [26] reported that S. carnosus TM300 lacks the known superantigens such as toxic shock syndrome toxin 1 (tst) and enterotoxins (sea to sep). The serine protease (spl) gene homolog was not found in S. lugdunensis. Lipoprotein (lpl) gene homologs were present in S. aureus, S. epidermidis, S. haemolyticus, and S. lugdunensis.

S. aureus prophages carry virulence factors such as Panton-Valentine leukocidin (lukS-PV and lukF-PV), staphylokinase (sak), exfoliative toxin A (eta), and enterotoxins [72]. The sak gene homolog was present in 12 S. aureus strains but absent in the remaining 4 (COL, ED133, ED98, and RF122). The eta gene homolog was present in S. aureus, S. carnosus TM300 (Sca_2302), and S. pseudintermedius HKU10-03 (SPSINT_0069). S. aureus can produce several homologous two-component pore-forming toxins including Panton-Valentine leukocidin (lukS-PV and lukF-PV on prophage), leukotoxin D and E (lukD and lukE on genomic island), and gamma-hemolysin (hlgA, hlgB, and hlgC) [73,74], with homologs present in S.
pseudintermedius HKU10-03 (SPSINT_1566 and SPSINT_1567). Staphylococcal enterotoxins (entD, entE, sea, seb, sec1, sec3, sed, seg2, seh, and sek2) encoded on S. aureus mobile genetic elements [2] are homologous and have a single homolog in S. pseudintermedius HKU10-03 (SPSINT_0513). As expected, a secreted von Willebrand factor-binding protein (coagulase) [75] was present in the coagulase-positive staphylococci (S. aureus and S. pseudintermedius) but absent in the coagulase-negative staphylococci [76].

To identify genes unique to S. aureus and to S. simiae, we compared gene presence and absence between the 16 S. aureus strains and the other 12 Staphylococcus strains, and between the single S. simiae strain and the other 27 Staphylococcus strains. A total of 272 protein families were present in S. aureus but absent in the other Staphylococcus species (Additional file 1, Table S3). This set included known as well as candidate virulence factors of S. aureus such as staphylococcal complement inhibitor SCIN (fibrinogen-binding protein), hyaluronate lyase (hysA), GntR family transcriptional regulator, secretory extracellular matrix and plasma binding protein, isdD (Iron uptake; Heme uptake), zinc finger SWIM domain-containing protein, 1-phosphatidylinositol phosphodiesterase, known as a virulence factor (Exoenzyme; Membrane-damaging; Phospholipase) of Listeria monocytogenes (serovar 1/2a) EGD-e, formyl peptide receptor-like 1 inhibitory protein, NADH dehydrogenase subunit, 3-methyladenine DNA glycosylase, and probable exported proteins and membrane proteins. Genes encoding the quaternary ammonium compound-resistance protein SugE were absent in S. aureus but present in the other Staphylococcus species. It was previously shown that high-level expression of SugE of Escherichia coli leads to resistance to a subset of toxic quaternary ammonium compounds [77]. A total of 129 unique protein families were present in S.
simiae but absent in other Staphylococcus species (Additional file 1, Table S4). This set included surface anchored protein, DNA-3-methyladenine glycosylase II, reverse transcriptase, transcriptional regulators, and phage-related proteins. The S. aureus and S. simiae unique genes may have been gained on the branches leading to the S. aureus ancestor and the S. simiae strain, respectively, and could be linked to their specific host adaptation and pathogenesis. Many of these genes were, however, quite short (< 150 bp) and functionally unknown, and thus could be errors in protein-coding sequence prediction.

Enrichment tests across functional categories indicated that the JCVI mainrole categories "Cell envelope" (odds ratio = 1.15) and "Mobile and extrachromosomal element functions" (odds ratio = 1.38), the JCVI subrole categories "Pathogenesis" (odds ratio = 1.40) and "Prophage functions" (odds ratio = 1.38), the KEGG pathway map "Staphylococcus aureus infection" (odds ratio = 1.91), and the VFDB keyword "Type VII secretion system" (odds ratio = 7.06) were overrepresented in S. aureus relative to S. simiae (Additional file 1, Table S5). However, none of the functional categories was significantly over- or underrepresented at P < 0.05 based on Fisher's exact test after false discovery rate correction for multiple comparisons.

A total of 52 protein families associated with cell envelope were identified here, and the numbers were higher in S. aureus (ranging from 48 to 50) than in other Staphylococcus species (ranging from 33 to 45). Cell wall-associated proteins are involved in host-pathogen interactions, and those from S. aureus ED133 have been shown to be under diversifying selection pressure [15]. A total of 79 protein families associated with cell wall were identified here, and the numbers were higher in S. aureus (ranging from 60 to 64) than in other Staphylococcus species (ranging from 47 to 60).
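The enrichment analysis reported above (odds ratios, Fisher's exact test, and false discovery rate correction) can be sketched as follows. This is a minimal illustration under stated assumptions: the layout of the 2x2 table (genes inside or outside a category, in one genome versus another), the use of the sample odds ratio, and the Benjamini-Hochberg procedure as the FDR method are all assumptions, because the text does not specify them. The counts and p-values are hypothetical toy numbers, not values from Table S5.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Assumed layout: a, b = genes in / not in the category for genome 1;
    c, d = genes in / not in the category for genome 2.
    """
    return (a * d) / (b * c) if b * c else float("inf")


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tail = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, tail)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy table and p-values (not data from the study).
table = (3, 1, 1, 3)
toy_or = odds_ratio(*table)
toy_p = fisher_exact_two_sided(*table)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(toy_or, round(toy_p, 4), [round(p, 4) for p in toy_adjusted])
```

The toy table shows how an odds ratio well above 1 can coexist with a non-significant p-value when counts are small, which is the same pattern as categories appearing overrepresented by odds ratio yet failing the corrected significance threshold.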

A cluster of eight genes, esxA, esaA, essA, essB, esaB, essC, esaC, and esxB, related to the type VII secretion system [78] was present in 15 S. aureus strains. Of the eight genes, esxA, esaA, essA, essB, esaB, and essC were present but esaC and esxB were absent in S. aureus MRSA252 and S. lugdunensis HKU09-01.

S. aureus is known to carry a variety of mobile genetic elements such as prophages, plasmids, and transposons [2,72]. A total of 302, 166, and 27 protein families associated with phage, plasmid, and transposase, respectively, were identified here. The numbers of protein families annotated as phage, plasmid, and transposase in S. simiae were 126, 75, and 13, respectively, whereas the numbers present in genomes of S. aureus ranged from 130 to 195, 84 to 124, and 11 to 20, respectively. This ranks S. aureus near the top among Staphylococcus genomes in terms of abundance of genes related to mobile genetic elements. Our results suggest that pathogenesis in the S. aureus group has developed by gene gain through horizontal transfer of mobile genetic elements, after divergence of S. simiae and S. aureus from their common ancestor.
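The group-specific comparison used earlier in this section to identify protein families unique to S. aureus or to S. simiae can be sketched as a set operation on presence/absence profiles. The text does not state whether a family had to occur in every strain of the focal group or in at least one, so this minimal sketch returns both variants. The function name, strain labels, and family IDs are hypothetical.

```python
def group_specific_families(presence, group):
    """Families found in the focal group but in no strain outside it.

    presence: dict mapping strain name -> set of family IDs present.
    group: set of strain names forming the focal group.
    Returns (in_any, in_all): families present in at least one focal
    strain, and families present in every focal strain, in both cases
    absent from all non-focal strains.
    """
    inside = [fams for strain, fams in presence.items() if strain in group]
    outside = [fams for strain, fams in presence.items() if strain not in group]
    if not inside:
        raise ValueError("focal group has no strains in the presence table")
    seen_outside = set().union(*outside)
    in_any = set().union(*inside) - seen_outside
    in_all = set.intersection(*inside) - seen_outside
    return in_any, in_all


# Hypothetical toy presence table (not data from the study).
presence = {
    "aureus_1": {"f1", "f2", "f3"},
    "aureus_2": {"f1", "f2", "f4"},
    "simiae_1": {"f1", "f5"},
    "other_1": {"f1", "f4"},
}
aureus_any, aureus_all = group_specific_families(presence, {"aureus_1", "aureus_2"})
simiae_any, simiae_all = group_specific_families(presence, {"simiae_1"})
print(sorted(aureus_any), sorted(aureus_all), sorted(simiae_any))
```

With a single-strain group such as S. simiae the two variants coincide, whereas for a multi-strain group such as S. aureus the stricter "every strain" definition yields a smaller set.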