4. Discussion

4.1. Strain history

‘PCC-M’ shows sequence differences in several genes compared with the reference sequence of ‘GT-Kazusa’ and also with the recently sequenced ‘GT-S’ strain. Kanesaki et al.9 concluded that 15 differences between the resequenced strains and the published ‘GT-Kazusa’ sequence were annotation errors in the latter due to sequencing artefacts, a list to which we add two more putative errors in the database, differences #4 and #42 in Table 1. According to the strain history proposed by Ikeuchi and Tabata,8 the early division of Synechocystis sp. PCC 6803 into two branches occurred due to an insertion in spkA. Thus, our data suggest that the motile ‘PCC-M’ strain belongs to the motile PCC 6803 branch, whereas the non-motile ‘GT-Kazusa’, ‘GT-S’ and ‘GT-V’ strains are more closely related to each other and belong to the ATCC 27184 branch. However, the 1-bp insertion in pilC that leads to ‘GT-Kazusa’ in the proposed strain history8 is not present in either ‘GT-S’ or ‘GT-V’, which characterizes ‘GT-Kazusa’ as a more derived substrain.

That ‘PCC-M’ belongs to the motile PCC 6803 branch is further reinforced by our finding of six SNPs specifically shared between ‘PCC-M’ and the ‘PCC-N’ and ‘PCC-P’ substrains (Tables 1 and 2).9 These six SNPs are located in slr1865; in sll1951, encoding a haemolysin-like protein; in ssr1176, encoding a transposase; and, interestingly, in genes encoding sensor and/or regulatory proteins (slr1983, slr0222 and slr0302) (Tables 1 and 2). They must already have been present in the progenitor strain of ‘PCC-M’, ‘PCC-N’ and ‘PCC-P’.

Additional support comes from the analysis of two larger indels (#2 and #6 in Table 1). The preceding paper, Kanesaki et al.,9 described difficulties in detecting indels between direct repeat sequences, such as those in slr1084 and slr2031, from short-read re-sequencing data. Therefore, these two regions were analysed by PCR and Sanger sequencing in addition to the re-sequencing analysis.
Indeed, finding indels between the direct repeat sequences in genes slr1084 and slr2031 was not straightforward in our analysis either. Compared with the reference, we found additional sequences of 102 and 154 bp, respectively, to be present in ‘PCC-M’. This result is relevant for the lineage relationships among substrains. The additional 102 bp in gene slr1084 are shared between ‘PCC-M’ and the substrains ‘PCC-P’, ‘PCC-N’ and ‘GT-I’. Therefore, their absence from ‘GT-Kazusa’ and ‘GT-S’ must result from a deletion in the lineage leading to these two strains. In contrast, the additional 154 bp within and upstream of gene slr2031 are shared between ‘PCC-M’, ‘PCC-P’ and ‘PCC-N’ and are absent from all studied GT substrains. These 154 bp comprise the conserved start codon of slr2031 and extend the gene by 29 codons compared with ‘GT-Kazusa’. Hence, the lack of these 154 bp in GT strains indicates a functionally adverse deletion there. In fact, the 154-bp deletion in GT substrains was noticed before,46 as was the activity of slr2031 in the original Synechocystis sp. PCC 6803 substrains.47

From these considerations, the tree shown in Fig. 3 can be derived. In this tree, ‘GT-Kazusa’ is displayed as the strain with the longest evolutionary distance from the original isolate, whereas the ‘PCC-M’ substrain belongs to the ‘PCC’ group of substrains and is probably close to the original isolate in its characteristics. All strains belonging to the ‘PCC’ group of substrains exhibit twitching motility, as was also shown for the original PCC strain deposited in the Pasteur Culture Collection,6 with variations in the motility behaviour.48,49 Since ‘PCC-M’ is motile and tolerant to glucose, it appears physiologically as a sort of intermediate between the two major branches, the motile and the GT branches, consistent with its characterization as being close to the original isolate.

Figure 3. Visualization of phylogenetic relationships between various strains of Synechocystis sp. PCC 6803. The occurrence of the identified SNPs and other known events is indicated along the branches. The eight events separating the ‘GT’ and ‘PCC’ strains from each other are given at the branch point where these two lineages split or on the respective branches where they occurred. Putative insertions and deletions are labelled ‘Ins.’ and ‘Del.’, respectively.

4.2. Re-sequencing studies of Synechocystis sp. PCC 6803

The analysis of genome sequences of cyanobacteria has had a large impact on photosynthesis, ecology and biotechnology research.50 The present re-sequencing project delivers the new and complete sequence of Synechocystis sp. PCC 6803 ‘PCC-M’, a substrain used in many laboratories and in several respects close to the original isolate. Altogether, chromosomal sequences are now available for seven substrains of Synechocystis sp. PCC 6803: ‘PCC-M’ (this study); ‘PCC-P’ (positive phototaxis) and ‘PCC-N’ (negative phototaxis), both based on single colonies isolated from the PCC strain and designated according to their direction of phototactic movement;24 ‘GT-I’, the standard strain in Dr Ikeuchi's group;8 ‘YF’17 and ‘GT-S’,10 a current derivative of the original stock of Synechocystis sp. PCC 6803 from which the chromosomal reference sequence for ‘GT-Kazusa’ was determined in 1996 (ref. 2) and the sequences of the large plasmids in 2003,20 whereas the three small plasmids had already been sequenced earlier.37,51,52

4.3. Mutations potentially linked to phenotype

It is likely that most of the identified differences between the sequenced substrains result from differences in the cultivation conditions in the different laboratories, which have selected for the fixation of one or another mutation. This also implies that the majority of the identified mutations are not silent but linked to a certain effect. Indeed, most mutations in coding regions are not silent as might be expected but lead to frameshifts, amino acid substitutions or the truncation of reading frames.
Similarly, SNPs in non-coding regions are probably biologically meaningful, too. This idea received support here by linking three ‘PCC-M’-specific SNPs in intergenic regions (IGRs) to the promoter regions controlling the expression of two protein-coding genes and one antisense RNA. For all these reasons, it appears likely that several of the mutations specific to ‘PCC-M’ or shared with ‘PCC-P’ and ‘PCC-N’ are related to the known phenotypes of these strains. For example, the truncation of sll1951 (haemolysin) and the possible truncation of slr1753 (surface protein) may contribute to a stress-induced clumping phenotype. Several other mutations might cause alterations in the glucose tolerance or phototactic behaviour of these substrains. Differences at other loci may affect phage resistance, the stress response or functions in primary metabolism, potentially relevant for the synthesis of alkanes or for N and C metabolism. The absence of ISY203g in the sll1473–5 regions in PCC substrains leads to an intact photoreceptor that regulates the expression of an alternative phycobilisome linker gene.53

Regarding phenotypic differences among motile PCC substrains, it might be noteworthy that ‘PCC-M’, despite its general ability to be motile, is not phototactic towards blue light (see the direct comparison of strains in Fig. 1 of Fiedler et al.48). Here, SNP #39 in the sigF gene, which is known to be involved in the control of phototactic movement,30 might be considered, as the resulting M231K substitution could influence the DNA–protein interaction of this group 3 sigma factor in a very subtle way. Certainly, the subtle differences in genome sequences have to be considered when choosing a particular substrain for certain experiments and when comparing phenotypes of mutant lines from different laboratories with the wild-type strain.
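As a small illustration of why a single SNP is sufficient for the M231K substitution in sigF mentioned above, the following Python sketch lists every single-nucleotide change that turns a methionine codon into a lysine codon. It assumes the standard genetic code and reads codons on the coding strand. The actual nucleotide change of SNP #39 is the one given in Table 1 and is not reproduced here, and the function and variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered with T, C, A, G at each position
# (third position varying fastest); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_snp_paths(from_aa, to_aa):
    """Return (codon, mutant codon, codon position 1-3) for every
    single-nucleotide change that converts from_aa into to_aa."""
    paths = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    paths.append((codon, mutant, pos + 1))
    return paths


paths = single_snp_paths("M", "K")
print(paths)
```

Under these assumptions the only route is ATG to AAG, a change at the second codon position, so a Met-to-Lys exchange is compatible with one point mutation.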
Information on the re-sequenced genome and plasmid sequences including precisely annotated SNPs can be found in the eight sequence files available from GenBank under the accession numbers CP003265–CP003272.
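For readers who want to retrieve all eight records, the accession range can be expanded into individual identifiers. The following is a minimal Python sketch that assumes the range CP003265–CP003272 is contiguous and consecutively numbered. The helper name is hypothetical, and the sketch does not assign accessions to the chromosome or to individual plasmids, since that mapping is not stated here.

```python
def expand_accession_range(first, last):
    """Expand a contiguous accession range such as CP003265-CP003272
    into a list of individual accession strings."""
    prefix = first.rstrip("0123456789")      # alphabetic prefix, e.g. "CP"
    if not last.startswith(prefix):
        raise ValueError("accessions do not share a prefix")
    width = len(first) - len(prefix)         # digits, to keep leading zeros
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("CP003265", "CP003272")
for accession in accessions:
    print(accession)
```

The resulting list contains eight identifiers, matching the eight sequence files mentioned above.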