Results

We used a comparative genomics approach to identify genetic characteristics that were either unique to C. pneumoniae or shared with other chlamydial species and other organisms. In total, we analysed 66 genes for key similarities and differences between human and animal strains of C. pneumoniae (see Additional file 1 for the list of 66 genes analysed). We used our C. pneumoniae animal genome as the reference genome and compared the available human genomes against it.

Chlamydia pneumoniae-specific genes

Using tblastx and tblastn, we searched other genomes for orthologs of the genes identified in gene-by-gene comparisons of the five C. pneumoniae genomes. This comparison identified 140 genes that were specific to C. pneumoniae and for which no significant similarity was detected in any other organism (Additional file 2). Many of these species-specific genes are short open reading frames (ORFs) that have been annotated as genes. Because so many of them are short hypothetical ORFs, it is difficult to determine whether they are 'real' genes or artefacts of the sequencing or annotation process. Until these proteins are studied systematically, it cannot be determined whether they are genuine proteins or ORFs too short to be protein-coding genes. Genes with suggested or predicted functions include putative lipoproteins and chlamydial inclusion membrane proteins (IncA). Several hypothetical proteins are clustered together (including CPK_ORF00340-343, 389-401, 496-498, 567-569, 658-661 and 969-980; CPK denotes the LPCoLN locus designation), suggesting that they may form operons and might be functionally related.

Genes with a demonstrated role in chlamydial biology and/or pathogenicity

One of the most striking differences between the C. pneumoniae koala and human genomes involved the polymorphic membrane proteins (Pmps). Like C. pneumoniae of human origin [25,26], koala LPCoLN is predicted to encode 21 Pmps, each phylogenetically related to one of six basic subtypes (pmpA, B/C, D, E/F, G/I and H; Additional files 3 and 4) [25,27-29]. The organisation of the pmp loci of koala LPCoLN is conserved relative to the C. pneumoniae human isolates. However, whereas the four human isolates carry several interrupted pmp genes, koala LPCoLN carries uninterrupted, full-length versions of the same genes, including pmpG3, pmpG4 and pmpE3 (Additional file 3). A global comparison of all pmp sequences revealed a total of 2015 SNPs (994 of which resulted in an amino acid change; see Additional file 5 for approximate SNP positions) differentiating koala LPCoLN from the four C. pneumoniae human isolates, with the highest percentage of SNPs observed in pmpE4 (30.46%) and pmpE3 (6.95%). In addition to SNPs, several strain-specific (human versus animal) indels (insertions and deletions) were evident in pmpB, pmpE1, pmpG2, pmpG4, pmpG5, pmpG7, pmpG10 and pmpG13 (Additional file 3). Interestingly, the pmpG5 pseudogene is interrupted by a stop codon in all five strains, albeit at different sites: LPCoLN carries a seven nt indel (GAT GTA C) at nt position 332, resulting in a TAA stop codon at nt position 346, whereas the four human isolates have a SNP (C to T) at nt 1483, resulting in a TAA stop codon (data not shown). Previous analyses of human isolates revealed variable numbers of 393 nt tandem repeat segments in pmpG6: two repeats in AR39 [25] and J138 [30], and three repeats in TW183 (Geng MM, Schuhmacher A, Muehldorfer I, Bensch KW, Schaefer KP, Schneider S, Pohl T, Essig A, Marre R, Melchers K: The genome sequence of Chlamydia pneumoniae TW183 and comparison with other Chlamydia strains based on whole genome sequence analysis, submitted) and CWL029 [26]. The LPCoLN genome carries three tandem repeat segments in pmpG6.
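The pmp SNP counts separate substitutions that change the encoded amino acid from those that do not. The following is a minimal Python sketch of that classification, not the pipeline used to produce the numbers above. It assumes the two coding sequences are already aligned without gaps, share the same reading frame, and contain only A/C/G/T. Genes with indels, such as several pmps, would need their gapped regions handled first. The name `classify_snps` is hypothetical. Each SNP is scored on its own, by substituting the query base into the reference codon, which is a simplification when two SNPs fall in one codon.

```python
# Standard genetic code. Bacterial translation table 11 uses the same codon
# assignments as table 1 and differs only in its alternative start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def classify_snps(ref, query):
    """Return (snps, amino_acid_changing) for two aligned, gap-free CDSs.

    Each SNP is classified independently: the query base is placed into the
    reference codon and the two translations are compared.
    """
    ref, query = ref.upper(), query.upper()
    if len(ref) != len(query) or len(ref) % 3:
        raise ValueError("sequences must be equal-length and in frame")
    snps = changing = 0
    for pos, (r, q) in enumerate(zip(ref, query)):
        if r == q:
            continue
        snps += 1
        start, offset = pos - pos % 3, pos % 3  # codon start and position in it
        codon = ref[start:start + 3]
        mutant = codon[:offset] + q + codon[offset + 1:]
        if CODON_TABLE[codon] != CODON_TABLE[mutant]:
            changing += 1  # missense or nonsense substitution
    return snps, changing


print(classify_snps("ATGAAATTT", "ATGAAGTTA"))  # (2, 1): AAA->AAG silent, TTT->TTA F->L
print(classify_snps("ATGCAA", "ATGTAA"))        # (1, 1): C to T turns CAA (Q) into TAA (stop)
```

The second call shows how a single C to T change, like the one reported for pmpG5 in the human isolates, can create a TAA stop codon.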
Type III secretion (T3S) occurs independently of the sec pathway and requires assembly of a secretion apparatus composed of approximately 20 proteins. In this analysis, we looked beyond the apparatus proteins and also examined potential secreted (effector) proteins and chaperone proteins involved in T3S. Additional file 6 compares 26 proteins from LPCoLN with their putative orthologs in other chlamydial species. We examined ten apparatus-encoding genes (CDSs CPK_ORF00106, 111, 115, 231, 232, 233, 234, 236, 830 and 831) that were annotated as such and/or were homologous to previously studied proteins in other chlamydial spp. Each koala LPCoLN T3S apparatus gene shared ≥ 98.2% nucleotide sequence identity with its orthologs in the human isolates. Similarly, genes annotated as, or similar to, chaperone-encoding genes that assist in the folding of effectors were highly conserved, with ≥ 99.3% sequence identity to the equivalent genes in the human C. pneumoniae isolates. Nine putative effector-encoding genes (CDSs CPK_ORF00107, 216, 217, 430, 445, 446, 799, 800 and 832) of koala LPCoLN were compared with their counterparts in the human isolates. As with the T3S apparatus genes, all koala LPCoLN effectors shared ≥ 98.2% sequence identity with their human isolate counterparts (Additional file 6).

The plasticity zone, or replication termination region, is a hypervariable region associated with genetic differences in chlamydial pathogen-host relationships. The membrane attack complex/perforin (MACPF) protein encoded in the plasticity zone is one example, showing varying degrees of polymorphism among chlamydial species. A marked length polymorphism distinguished the MACPF of C. pneumoniae koala (CPK_ORF00685, 2457 nt) from that of all four human isolates, which encode a predicted defective MACPF split across two ORFs, CP_0594 (381 nt) and CP_0593 (1236 nt) (Figure 1). A comparison with other chlamydial species indicated that the C. pneumoniae ancestor separated from the other 'Chlamydophila spp.' before the large indel arose in the human isolates. Like the koala LPCoLN MACPF, the MACPF of C. trachomatis serovars A/HAR, B/Jali20/OT, D/UQ-3/CX, L2b/UCH-1/proctitis and L2/434/Bu and of C. muridarum Nigg was full length (Additional file 7), whereas the MACPF sequences of C. abortus S26/3, C. caviae GPIC and C. felis Fe/C-56 showed frame disruptions and are likely pseudogenes (Additional file 7). Protochlamydia amoebophila UWE25 had no detectable MACPF ortholog. MotifScan analysis of the chlamydial MACPF sequences (except those of C. caviae, C. abortus and C. felis) revealed an MIR (Mannosyltransferase, Inositol 1,4,5-trisphosphate receptor and Ryanodine receptor) motif, suggesting a possible ligand transferase function. The size variation of the C. pneumoniae MACPF may serve as a useful marker in future genetic investigations; for example, the MACPF gene sequence could help differentiate C. pneumoniae animal isolates from human isolates.

Figure 1 Organisation of the C. pneumoniae plasticity zone. A comparison of the C. pneumoniae koala LPCoLN and human AR39 genomes revealed evidence of fragmentation, gene decay and gene gain/loss in the plasticity zone. Genes are labelled with the published locus numbers. Lines connect orthologs. Role categories and colours are as follows: fatty acid and phospholipid metabolism, magenta; conserved hypothetical proteins, blue; cell envelope, light green; hypothetical proteins, black; biosynthesis of purines, pyrimidines, nucleosides and nucleotides, orange; energy metabolism, light gray. Arrows indicate the direction of transcription.
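Frame disruptions, such as those in the C. abortus, C. caviae and C. felis MACPF genes and the premature TAA in pmpG5, usually show up as in-frame stop codons before the annotated end of a gene, or as a length that is no longer a multiple of three. The sketch below is a minimal, hypothetical illustration of that check and not the annotation procedure used in this study. It assumes the sequence begins at the annotated start codon and is read in frame 1. The function names are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds):
    """Return 0-based positions of in-frame stop codons that occur before the
    last complete codon of `cds` (read in frame 1 from the first base)."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i * 3
        for i in range(n_codons - 1)  # the terminal codon is allowed to be a stop
        if cds[i * 3:i * 3 + 3] in STOP_CODONS
    ]


def looks_disrupted(cds):
    """Flag a candidate pseudogene: a premature in-frame stop, or a length
    that is not a multiple of three (a possible frameshift)."""
    return bool(internal_stop_positions(cds)) or len(cds) % 3 != 0


intact = "ATGGCTAAATGGTAG"                 # ATG GCT AAA TGG TAG
shifted = intact[:6] + "T" + intact[6:]    # 1 nt insertion after GCT
print(internal_stop_positions(intact))     # []
print(internal_stop_positions(shifted))    # [6]: the shifted frame reads TAA
```

The toy insertion shows how an indel can shift the reading frame so that a downstream triplet is read as TAA. This is the same kind of event described for the seven nt indel in LPCoLN pmpG5.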
Genes involved in nucleotide salvaging pathways or amino acid biosynthesis

All sequenced chlamydial genomes to date encode a CTP synthetase, the enzyme that converts UTP to CTP, and an ATP/ADP translocase [25,31]. However, comparative genomic analysis suggests that multiple modifications have occurred in nucleotide salvage pathways during chlamydial evolution, as shown by the variable presence of udk (uridine kinase), pyrE (pyrimidine phosphoribosyl transferase), guaB (IMP dehydrogenase), guaA (GMP synthase) and add (adenosine deaminase) in different isolates (Table 1).

Table 1 Chlamydiaceae genome features with suspected host and niche specific genes

| Species | Genome size (nt) | Protein coding sequences | Tryptophan metabolism | Toxin genes | Plasmid | Bacteriophage | tyrP copies | Nucleotide salvaging |
|---|---|---|---|---|---|---|---|---|
| C. pneumoniae AR39 | 1229853 | 1052 | tph | Absent | Absent | Present | Two | guaBA-add, udk, pyrE |
| C. pneumoniae CWL029 | 1230230 | 1073 | tph | Absent | Absent | Absent | Two | guaBA-add, udk, pyrE |
| C. pneumoniae J138 | 1226565 | 1072 | tph | Absent | Absent | Absent | One | guaBA-add, udk, pyrE |
| C. pneumoniae TW183 | 1225935 | 1113 | tph | Absent | Absent | Absent | One | guaBA-add, udk, pyrE |
| C. pneumoniae LPCoLN | 1241024 | 1095 | tph | Absent | Present | Present* | One | udk, pyrE |
| C. felis Fe/C-56 | 1166239 | 1005 | trpABFCDR, kynU | Absent | Present | Absent | One | guaBA-add, pyrE |
| C. caviae GPIC | 1173390 | 1009 | trpABFCDR, kynU, prsA, tph | Present | Present | Present | One | guaBA-add, pyrE |
| C. abortus S26/3 | 1144377 | 961 | tph | Absent | Absent | Present | One | guaB (pseudogene), pyrE |
| C. muridarum Nigg | 1069412 | 924 | none | Present | Present | Absent | Two | guaBA-add, upp |
| C. trachomatis serovar D | 1042519 | 894 | trpABCR | Present | Present | Absent | Two | None of the above |

tph, tryptophan hydroxylase; trp, tryptophan biosynthesis; kynU, kynureninase; prsA, ribose-phosphate pyrophosphokinase; gua, purine biosynthesis; add, adenosine deaminase; udk, uridine kinase; pyrE, UMP synthase; upp, uracil phosphoribosyl transferase. *Remnants only.

C. pneumoniae is the only chlamydial species to have a udk gene for UMP production. Our examination of the C. pneumoniae sequence revealed that the 3' end of the gene is unique to the species: no significant sequence similarity to other organisms was observed for nt 541-669, indicating that this region may be specific to C. pneumoniae. An alignment of the full-length udk gene identified only three SNPs (one amino acid change) differentiating koala LPCoLN from the sequenced human isolates.

The bacterial pyrimidine biosynthesis pathway includes several enzymes for the conversion of UMP into CTP. However, all chlamydial genomes sequenced thus far lack the genes for most of the pathway, with the exception of the last few steps (Additional file 8). C. pneumoniae of koala and human origin, C. felis, C. caviae and C. abortus carry the pyrE gene, encoding an orotate phosphoribosyl transferase involved in pyrimidine biosynthesis. However, the final step in de novo pyrimidine biosynthesis is catalysed by orotidine-5'-monophosphate decarboxylase (pyrF), and this gene is missing from all chlamydial genomes. Like the pyrimidine pathway, the purine biosynthesis pathway is incomplete, with many genes variably missing from the chlamydial genomes (Table 1). Only four genes are present in the human genomes but absent from the koala LPCoLN genome: CP_0597, which encodes a hypothetical protein, and guaB (IMP dehydrogenase), guaA (GMP synthase) and add (adenosine deaminase), which are involved in purine ribonucleotide biosynthesis. The guaA and add sequences of the C. pneumoniae human isolates were identical, whereas guaB was fragmented in the TW183, CWL029 and J138 isolates, which carry a 324 nt deletion at the 5' end of the gene (Figure 1). CP_0597 lies next to this guaBA-add cluster (Figure 1), which may indicate that it is also involved in purine biosynthesis.
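The statement that only four genes are present in the human genomes but absent from LPCoLN comes from a set comparison: take the genes shared by all human isolates and subtract the koala gene set. The sketch below is minimal and deliberately restricted, so it does not reproduce the genome-wide search. It uses only the nucleotide-salvage genes from Table 1, with guaBA-add expanded into its three genes, plus CP_0597, which is taken from the text because it is not listed in the table. The variable and function names are hypothetical.

```python
# Nucleotide-salvage genes from Table 1 ("guaBA-add" expanded to three genes).
# CP_0597 is added from the text; it is not a column entry in Table 1.
# guaB is fragmented in TW183, CWL029 and J138 but is still counted as present.
HUMAN_SALVAGE = {"guaB", "guaA", "add", "udk", "pyrE", "CP_0597"}
HUMAN_ISOLATES = {
    name: set(HUMAN_SALVAGE) for name in ("AR39", "CWL029", "J138", "TW183")
}
KOALA_LPCOLN = {"udk", "pyrE"}


def genes_absent_from(reference, genomes):
    """Genes shared by every genome in `genomes` but missing from `reference`."""
    shared = set.intersection(*genomes.values())
    return shared - reference


print(sorted(genes_absent_from(KOALA_LPCOLN, HUMAN_ISOLATES)))
# ['CP_0597', 'add', 'guaA', 'guaB']
```

Intersecting across isolates first means that a gene present in only some human isolates would not be reported as lost in the koala genome.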
The tryptophan biosynthesis operon is missing from several chlamydial species, including C. muridarum Nigg, C. abortus S26/3 and C. pneumoniae of both koala and human origin (Table 1). Nevertheless, both the koala and human C. pneumoniae isolates encode a functional aromatic amino acid (tryptophan) hydroxylase, although the koala LPCoLN version lacks the extended N-terminal region [32]. Whereas tyrP (tryptophan tyrosine permease) copy number varies among C. pneumoniae human isolates [33], koala LPCoLN, a respiratory isolate, carries a single copy of tyrP. A comparison of tyrP from all five fully sequenced C. pneumoniae genomes showed that the sequence was highly conserved across the seven tyrP copies, with only nine SNPs, including seven unique to koala LPCoLN, four of which led to an amino acid change. Gieffers et al. [33] previously published a tyrP-specific SNP profile of 20 C. pneumoniae human isolates; here we report the profiles of two additional respiratory isolates, J138 (CGGGG) and LPCoLN (CAAGG).

Extrachromosomal elements of the Chlamydiaceae

The plasmid sequences pCpnKo from C. pneumoniae koala LPCoLN [24], pCpnE1 from C. pneumoniae horse N16 [34], pCpA1 from C. psittaci avian N352 (Lusher ME, Gregory J, Storey CC, Richmond SJ: Analysis of the complete nucleotide sequence of the plasmid pCpA1 isolated from an avian strain of Chlamydia psittaci, Submitted), pCfe1 from C. felis feline Fe/C-56 [35], pCpGP1 from C. caviae guinea pig GPIC [36], pMoPn from C. muridarum mouse Nigg [25], and pCTA, pJALI, pSW2 and pLVG440 from C. trachomatis human serovars A [37], B [38], E [38] and L1 [39], respectively, were compared for overall synteny and relatedness (Additional file 9). Each of these plasmids contains eight major ORFs, designated ORF1-8.
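Chlamydial plasmids are circular. Comparing their synteny therefore means asking whether ORF1-8 occur in the same order regardless of where each sequence was linearised or which strand was deposited. The following is a minimal, hypothetical sketch of that gene-order check. It assumes each plasmid has already been reduced to an ordered list of ORF labels. It tests gene order only, not sequence similarity, and it is not the method used to produce Additional file 9.

```python
def same_circular_order(order_a, order_b):
    """True if two circular gene orders match up to rotation (a different
    linearisation point) or reversal (the opposite strand)."""
    if len(order_a) != len(order_b) or sorted(order_a) != sorted(order_b):
        return False
    target = list(order_a)
    n = len(target)
    if n == 0:
        return True
    for candidate in (list(order_b), list(reversed(order_b))):
        for shift in range(n):  # try every rotation of the candidate order
            if candidate[shift:] + candidate[:shift] == target:
                return True
    return False


orfs = [f"ORF{i}" for i in range(1, 9)]
print(same_circular_order(orfs, orfs[3:] + orfs[:3]))           # True: rotated
print(same_circular_order(orfs, orfs[::-1]))                    # True: other strand
print(same_circular_order(orfs, ["ORF2", "ORF1"] + orfs[2:]))   # False: rearranged
```

Brute-force rotation is quadratic in the number of genes. That is negligible for eight ORFs and keeps the logic easy to check.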
Although the plasmid sequences vary in size (7169-7966 bp), alignment of their amino acid sequences revealed a high degree of similarity and large conserved regions (Additional file 9). The pCpnKo and pCpnE1 plasmid sequences share 96.2% identity and are more closely related to pCpA1 (81.3% and 78.9% identity, respectively), pCpGP1 (81.6% and 79.1%) and pCfe1 (77.5% and 75.1%) than to pMoPn (69.5% and 67.7%) or to pCTA, pJALI, pSW2 and pLVG440 (63.9-69.5%) (Additional file 10). Overall, there were approximately 30 indels among the species; three of the longest were in ORF1 of pCpnE1 (deletion), pCfe1 (insertion) and pSW2 (deletion). A phylogenetic tree was inferred from a multiple alignment of the amino acid sequences (Figure 2). Three main branches, supported by high bootstrap values, were evident: (1) C. pneumoniae LPCoLN and N16; (2) C. psittaci N352, C. felis Fe/C-56 and C. caviae GPIC; (3) C. muridarum and C. trachomatis isolates.

Figure 2 Phylogeny of the chlamydial plasmid. Phylogenetic relationships of C. pneumoniae koala LPCoLN (pCpnKo), C. pneumoniae horse N16 (pCpnE1), C. psittaci avian N352 (pCpA1), C. felis feline Fe/C-56 (pCfe1), C. caviae guinea pig GPIC (pCpGP1), C. muridarum mouse Nigg (pMoPn) and C. trachomatis human serovars A (pCTA), B (pJALI), E (pSW2) and L1 (pLVG440) were inferred from predicted amino acid sequences using Neighbor-Joining analysis with 1,000 bootstrap replicates.

Bacteriophages are another strain-specific extrachromosomal element reported in chlamydial species. In C. pneumoniae, AR39 is the only human isolate with an extrachromosomal bacteriophage (a 4524 nt single-stranded DNA genome) [25], whereas the koala LPCoLN genome contains phage remnants.
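Figure 2 was constructed by Neighbor-Joining. The following is a minimal sketch of the Neighbor-Joining criterion of Saitou and Nei, not the software used for Figure 2. It assumes a precomputed matrix of pairwise distances. The five-taxon matrix in the example is a small textbook-style toy, not plasmid data. Bootstrap support is omitted. Obtaining it would mean rerunning the same procedure on distance matrices recomputed from alignment columns resampled with replacement. The function name `neighbor_joining` is hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, pair_dist):
    """Return an unrooted Neighbor-Joining tree as a Newick string.

    names: at least three taxon labels.
    pair_dist: maps (x, y) label pairs to distances, one entry per pair.
    """
    d = {frozenset(pair): float(v) for pair, v in pair_dist.items()}

    def dist(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[x]: summed distance from x to every other active node.
        r = {x: sum(dist(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r[i] - r[j];
        # min() keeps the first pair in iteration order on ties.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to their new parent node u.
        d_iu = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        d_ju = dist(i, j) - d_iu
        u = f"({i}:{d_iu:.4g},{j}:{d_ju:.4g})"
        # Distance from u to every remaining node k.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (dist(i, k) + dist(j, k) - dist(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Three nodes remain: attach them to one central node.
    x, y, z = nodes
    dx = (dist(x, y) + dist(x, z) - dist(y, z)) / 2
    dy = (dist(x, y) + dist(y, z) - dist(x, z)) / 2
    dz = (dist(x, z) + dist(y, z) - dist(x, y)) / 2
    return f"({x}:{dx:.4g},{y}:{dy:.4g},{z}:{dz:.4g});"


toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining(list("abcde"), toy))
# (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For real plasmid data, the input distances would come from the amino acid alignment, for example as 1 minus the fractional identity, possibly with an evolutionary correction.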
The first phage remnant in the koala LPCoLN genome is CPK_ORF00729, a 366 nt incomplete and presumably defective ORF that shares approximately 79% similarity (301/382 nt) with part of the human AR39 phage. CPK_ORF00729 also shares 97% similarity (324/333 nt) with the Chp1 remnant (C. psittaci phage), which is present in the four C. pneumoniae human genomes. This suggests an earlier integration of the phage genome into the C. pneumoniae genome (Geng MM, Schuhmacher A, Muehldorfer I, Bensch KW, Schaefer KP, Schneider S, Pohl T, Essig A, Marre R, Melchers K: The genome sequence of Chlamydia pneumoniae TW183 and comparison with other Chlamydia strains based on whole genome sequence analysis, submitted). The second koala LPCoLN phage remnant is a 445 nt ORF, CPK_ORF00730, which appears to be 'intact' and shares approximately 77% similarity (342/445 nt) with the human AR39 phage. Both koala LPCoLN phage remnants lie between hypothetical genes, and there appear to be no further phage remnants in this region of the genome. A comparison of the koala LPCoLN phage remnants with phages of other chlamydial species revealed approximately 77-79% sequence identity to partial sequences of the C. psittaci Chp2 phage, C. pecorum phage 3, C. caviae phiCPG1 phage and C. abortus Chp4 phage.
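The phage similarity values are ratios of identical positions to aligned positions. For example, 301/382 nt is about 78.8%, which is reported as approximately 79%. The following is a minimal sketch, assuming a pairwise alignment is already available. It treats the reported nucleotide similarity as identity. It divides by the number of alignment columns, including gapped ones but ignoring columns that are gaps in both sequences. This is one common convention, and tools differ on it. The name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Identical non-gap columns divided by alignment columns, as a percentage.

    Columns that are gaps in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


print(percent_identity("AC-T", "ACGT"))  # 75.0: 3 identical of 4 columns

# The ratios quoted in the text, recomputed from their identical/aligned counts.
reported = {
    "CPK_ORF00729 vs AR39 phage": (301, 382),
    "CPK_ORF00729 vs Chp1 remnant": (324, 333),
    "CPK_ORF00730 vs AR39 phage": (342, 445),
}
for label, (identical, aligned) in reported.items():
    print(f"{label}: {100 * identical / aligned:.1f}%")  # 78.8%, 97.3%, 76.9%
```

Recomputing the ratios confirms that the rounded values in the text (79%, 97% and 77%) follow from the counts given in parentheses.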