Visualization

We compared our multidimensional scaling (MDS) approach with correspondence analysis (CA) as implemented in J. Peden's CodonW program [31]. Computations were based on relative synonymous codon usage (RSCU) values, the most common input for CA of codon usage data [6]. For both methods, the resulting coordinates were scaled to unit variance along the two leading factors (CA) and the two leading principal components (MDS), respectively.

The CA-based visualization for E. coli (Fig. 1) shows the typical "rabbit head" structure described in [1]. The "ears" correspond to two low-density branches of the distribution. The "left ear" in the upper left corner contains a cluster of ribosomal protein genes, while putative alien genes are located mainly around the other branch. The MDS plot in Fig. 1 shows a similar picture: ribosomal protein genes and putative alien genes are again concentrated in the two branches of the distribution, which here appears rotated by 180 degrees. Most ribosomal protein genes are well clustered in both plots, whereas putative alien genes are slightly more concentrated in the MDS plot. Note that the CA-based visualization contains an outlier at its lower boundary that is not among the putative alien genes.

Figure 1 Scatter plots for E. coli based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS). Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.

For B. subtilis (Fig. 2), both visualization methods show good clustering of putative alien genes and of ribosomal protein genes in the branches of the distribution. Again, the lower boundary of the CA plot is determined by an outlier that does not belong to the set of putative alien genes.

Figure 2 Scatter plots for B. subtilis based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS). Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.

For the first chromosome of V. cholerae (Fig. 3), the comparison shows a situation similar to that of B. subtilis: in both plots, most ribosomal protein genes and putative alien genes are well clustered in the two branches of the distribution. In the lower left corner of the CA-based plot there is an outlier that is not in the set of putative alien genes. Because chromosome II of V. cholerae contains no ribosomal protein genes, only putative alien genes are highlighted in the visualization of this replicon (Fig. 4). These genes are slightly more concentrated in the MDS-based plot. Again, the lower boundary of the CA plot is determined by an outlier that is not among the putative alien genes.

Figure 3 Scatter plots for V. cholerae (chromosome 1) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS). Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.

Figure 4 Scatter plots for V. cholerae (chromosome 2) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS). Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.

For T. thermophilus (Fig. 5), the sensitivity of CA to outliers produces a highly distorted plot from which hardly any conclusions can be drawn. Ribosomal protein genes are clumped together with the remaining genes in a small region of the plot, while putative alien genes are spread over a region of low density.
In contrast, the MDS-based plot groups putative alien genes in a tail at the right border, and the ribosomal protein genes show at least weak clustering in the upper right part of the core distribution.

Figure 5 Scatter plots for T. thermophilus based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS). Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.
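The CA side of the comparison (RSCU values per gene, correspondence analysis of the gene-by-codon matrix, then scaling of the two leading axes to unit variance) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not CodonW's implementation. The function names `rscu`, `ca_row_coords` and `unit_variance` are hypothetical. Excluding stop codons, Met and Trp, and assigning RSCU 0 to codons of amino acids absent from a gene, are assumptions of this sketch. It also assumes every gene contains at least one codon from the retained set. The P-value based distances behind the MDS plots are not defined in this section and are therefore not sketched.

```python
import numpy as np

BASES = "TCAG"
# Standard genetic code in TCAG order (codon index = 16*b1 + 4*b2 + b3).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO))

# Assumption: drop stop codons and the single-codon amino acids Met and Trp,
# leaving 59 codons that carry synonymous-usage information.
SENSE = [c for c in CODONS if CODE[c] not in "*MW"]
FAMILY = {c: [d for d in SENSE if CODE[d] == CODE[c]] for c in SENSE}


def rscu(seq: str) -> np.ndarray:
    """RSCU over SENSE codons: count / mean count of the synonymous family.

    Assumption: codons of amino acids absent from the gene get RSCU 0.
    """
    counts = dict.fromkeys(SENSE, 0)
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3].upper()
        if codon in counts:
            counts[codon] += 1
    out = np.zeros(len(SENSE))
    for j, codon in enumerate(SENSE):
        fam = FAMILY[codon]
        mean = sum(counts[c] for c in fam) / len(fam)
        out[j] = counts[codon] / mean if mean > 0 else 0.0
    return out


def ca_row_coords(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Row principal coordinates of correspondence analysis of X (genes x codons)."""
    X = X[:, X.sum(axis=0) > 0]           # drop codons no gene uses
    P = X / X.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
    # Standardized residuals; the trivial CA dimension is removed by P - r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :k] * s[:k] / np.sqrt(r)[:, None]


def unit_variance(F: np.ndarray) -> np.ndarray:
    """Scale each plotted axis to unit variance (the origin is kept)."""
    return F / F.std(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genes = ["".join(rng.choice(SENSE, size=300)) for _ in range(50)]
    X = np.vstack([rscu(g) for g in genes])
    coords = unit_variance(ca_row_coords(X))
    print(coords.shape)  # one (x, y) point per gene for the scatter plot
```

Scaling both methods' coordinates to unit variance per axis, as described in the text, puts the CA and MDS scatter plots on a comparable scale. Note that it does not remove the distortion a single extreme gene causes in CA: the outlier still inflates the variance of its axis and compresses the remaining genes, as seen for T. thermophilus.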