Genome sequence analysis

Comparing the two genome sequences with a non-redundant selection of representatives from all known CoV families, by alignment [10] and phylogenetic tree (Figure 1A) [11], shows that they belong to the SARS family of betacoronaviruses. While related to SARS-CoV (80% genome identity), they were most closely related to SARS-like bat CoVs from China (88% identity), the closest known sequences at the time of emergence.

Figure 1
Phylogenetic trees of Thai sequences in the context of all coronavirus families (A) and structural mapping of mutations in the spike glycoprotein between SARS-CoV (PDB:6ACG [12]) and the current SARS-CoV-2 using YASARA [20] (B)

CoV: coronavirus; MERS: Middle East respiratory syndrome coronavirus; SARS: severe acute respiratory syndrome.

Panel A: blue: SARS-CoV-2; red: SARS; purple: MERS; green: common cold. Panel B: cyan: ACE2 human host receptor; grey: CoV spike glycoprotein trimer (PDB:6ACG); red: mutations between SARS-CoV and SARS-CoV-2.

The phylogenetic tree was created from a whole-genome alignment with MAFFT, using the neighbour-joining method with the maximum composite likelihood (MCL) model, uniform site rates and 500 bootstrap tests in MEGA X.

The spike glycoprotein of the two SARS-CoV-2 cases reported here shares only 76% identity with that of SARS-CoV at the protein level; the differing residues are mapped onto the structure in Figure 1B. This surface protein is critical for interaction with the ACE2 host receptor and is also a target of the immune response [12,13]. Given several mutations in the binding interface, the SARS-CoV-2 spike may differ from that of SARS-CoV in host cell binding efficiency, which could result in differences in virulence and transmission potential [14,15].
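To make the identity percentages above concrete, the following Python sketch computes percent identity between two sequences that have already been aligned (for example by MAFFT). It is an illustration under an assumed convention, not the authors' pipeline: columns where either sequence has a gap are excluded from the denominator, whereas other tools count gaps as mismatches. Note also that the MCL distances MEGA X uses for the tree are model-corrected rather than raw identities. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption (one of several conventions): columns where either sequence
    has a gap ('-') are skipped and do not count towards the denominator.
    Works for nucleotide genomes and for protein sequences such as spike.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: excluded under the assumed convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy example: 5 ungapped columns, 4 identical -> 80.0
print(percent_identity("ACGT-A", "ACGTTC"))
```

Because gap handling changes the denominator, identity values reported by different tools for the same pair of genomes can differ by a few percentage points.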
The genomes of the two separate cases of coronavirus disease 2019 (COVID-19) are identical over their full length of close to 30 kb and are also identical to five other sequences (four from Wuhan and one from Zhejiang). Together these seven sequences form the largest cluster of identical sequences within the early outbreak, comprising a core of at least indirectly linked cases (Figure 2). Within-outbreak sequence divergence is generally low, with 0–9 nt differences over the whole genome; mutations unique to individual strains may reflect differences in sample quality and noise in the sequencing methods used.

Figure 2
Within-outbreak SARS-CoV-2 sequence divergence and clusters, China and Thailand, January 2020

Number of pairwise nt differences across whole genomes, colour-coded from zero (green) to nine (red). Blue: Thai sample names. Orange: samples with sequences identical to each other and to the Thai sequences.

We gratefully acknowledge the authors and the originating and submitting laboratories for the sequences and metadata shared through GISAID, on which this research is based (as listed in Supplementary Table 1).
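The pairwise difference counts and the grouping of identical genomes described here can be sketched in Python as follows. This is a hypothetical illustration on toy data (sample names and sequences are invented), assuming the genomes are already aligned to equal length and that every mismatching column counts, including gaps and ambiguous bases; real pipelines often mask such positions. Under that assumption a pair with zero differences is exactly a pair of identical strings, so clusters can be formed by grouping equal sequences.

```python
from collections import defaultdict
from itertools import combinations


def nt_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned columns where two sequences differ.

    Assumption: every mismatching column counts, including gaps and 'N'.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def difference_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise nt differences for every unordered pair of named sequences."""
    return {
        (x, y): nt_differences(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


def identical_clusters(seqs: dict[str, str]) -> list[list[str]]:
    """Group names whose sequences are identical; largest cluster first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(seqs):
        groups[seqs[name].upper()].append(name)
    clusters = [names for names in groups.values() if len(names) > 1]
    clusters.sort(key=len, reverse=True)
    return clusters


# Hypothetical toy alignment (not real GISAID data)
toy = {
    "thai_1": "ACGTACGT",
    "thai_2": "ACGTACGT",
    "wuhan_1": "ACGTACGT",
    "wuhan_2": "ACGTTCGA",
}
print(difference_matrix(toy))
print(identical_clusters(toy))
```

Colour-coding the values of `difference_matrix` would give a heat map in the style of Figure 2, and the first entry of `identical_clusters` corresponds to the largest cluster of identical sequences.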