An immunogenic domain in the S2 subunit of SARS-CoV-1 is highly conserved in SARS-CoV-2 but not in MERS and common cold HCoV

Sequence alignment of the S2 fragment corresponding to residues 1029 to 1192 shows that this fragment, which encompasses heptad repeat 2 (HR2) but not HR1, is highly conserved between SARS-CoV-1 and SARS-CoV-2 (Figure 1). Comparison with additional reference sequences from bat RaTG13 (the closest bat precursor), MERS and the human common cold coronaviruses 229E, NL63, OC43 and HKU1 (Figure 1) shows that the amino-acid identity between SARS-CoV-2 and SARS-CoV-1 is much higher in this region (93%, Table) than over the full protein length (78%, Table). Identity drops sharply (< 40% in this region) for MERS and the other coronaviruses that regularly infect humans.

Figure 1 Multiple sequence alignment for the S2 subunit fragment of SARS-CoV-1 spike glycoprotein with other relevant coronaviruses

MERS: Middle East respiratory syndrome; SARS-CoV-1: severe acute respiratory syndrome coronavirus 1; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.

The names of the viruses whose sequences are compared appear on the left side of the alignment, together with the GenBank accession numbers of the respective sequences. Colour schemes represent the following categories of amino acids: blue – hydrophobic, cyan – aromatic, green – polar, magenta – negative charge, orange – glycines, pink – cysteines, red – positive charge, yellow – prolines, white – unconserved.
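To illustrate how pairwise amino-acid identities of this kind can be derived from a multiple sequence alignment, the following is a minimal Python sketch. It is not the procedure used in this study: the function names and the toy sequences are hypothetical, and it assumes that identity is the share of identical residues among alignment columns in which neither sequence has a gap. Other denominator conventions exist and give slightly different values.

```python
from typing import Dict, Tuple


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows taken from the same alignment.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """One-way (lower-triangle) comparisons only, since identity is symmetric."""
    names = list(alignment)
    result = {}
    for i, query in enumerate(names):
        for reference in names[: i + 1]:
            result[(query, reference)] = pairwise_identity(
                alignment[query], alignment[reference]
            )
    return result


# Hypothetical toy alignment, not real spike sequences.
toy_alignment = {
    "virus_A": "MKT-LLVA",
    "virus_B": "MKTALIVA",
    "virus_C": "MRT-LIVG",
}

if __name__ == "__main__":
    for (query, reference), value in identity_matrix(toy_alignment).items():
        print(f"{query} vs {reference}: {value:.2f}%")
```

Restricting the aligned strings to the columns of the fragment of interest before calling `pairwise_identity` would give a region-specific value, while passing the full rows would give the full-length value.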
Table Pairwise amino-acid identity across relevant coronaviruses in the sequence fragment of the spike glycoprotein S2 subunit recognised by monoclonal antibody 1A9 or the sequence of the full spike glycoprotein

Pairwise amino-acid identity (%)

| Query/reference | SARS-CoV-2 | BatRaTG13 | SARS-CoV-1 | MERS | OC43 | HKU1 | 229E | NL63 |
|---|---|---|---|---|---|---|---|---|
| Fragment region of spike S2 | | | | | | | | |
| SARS-CoV-2 | 100.00 | SB | SB | SB | SB | SB | SB | SB |
| BatRaTG13 | 99.40 | 100.00 | SB | SB | SB | SB | SB | SB |
| SARS-CoV-1 | 93.10 | 92.50 | 100.00 | SB | SB | SB | SB | SB |
| MERS | 39.00 | 39.00 | 39.00 | 100.00 | SB | SB | SB | SB |
| OC43 | 39.00 | 39.00 | 38.40 | 51.20 | 100.00 | SB | SB | SB |
| HKU1 | 32.70 | 32.70 | 30.80 | 50.60 | 68.40 | 100.00 | SB | SB |
| 229E | 30.80 | 30.20 | 32.10 | 31.50 | 29.70 | 30.40 | 100.00 | SB |
| NL63 | 30.80 | 30.20 | 30.20 | 32.10 | 31.60 | 33.50 | 64.20 | 100.00 |
| Full spike protein | | | | | | | | |
| SARS-CoV-2 | 100.00 | SB | SB | SB | SB | SB | SB | SB |
| BatRaTG13 | 97.70 | 100.00 | SB | SB | SB | SB | SB | SB |
| SARS-CoV-1 | 77.80 | 78.20 | 100.00 | SB | SB | SB | SB | SB |
| MERS | 35.40 | 35.40 | 35.20 | 100.00 | SB | SB | SB | SB |
| OC43 | 37.30 | 37.10 | 36.90 | 39.50 | 100.00 | SB | SB | SB |
| HKU1 | 35.20 | 35.30 | 35.00 | 39.00 | 67.00 | 100.00 | SB | SB |
| 229E | 41.70 | 41.50 | 41.80 | 41.80 | 43.50 | 43.50 | 100.00 | SB |
| NL63 | 36.30 | 36.20 | 36.20 | 35.40 | 39.70 | 37.80 | 64.70 | 100.00 |

MERS: Middle East respiratory syndrome; SARS-CoV-1: severe acute respiratory syndrome coronavirus 1; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; SB: shown below.

High to low pairwise amino-acid identities are colour-coded with backgrounds ranging from green to red, respectively. Sequence identity is not affected by the order in which paired sequences are compared, so only one-way comparisons are shown to avoid redundancy; the abbreviation 'SB' is used when the pairwise amino-acid identity in question is already shown in a cell further down the table.

We also studied sequence diversity across 174 SARS-CoV-2 S proteins derived from nucleotide sequences shared via the GISAID platform [27]. Only four amino-acid mutations were found within the putative antibody-binding region, compared with 30 mutations over the full-length protein (Supplementary Table 2).
Two of these four amino-acid mutations are from a sequence flagged in GISAID’s EpiCoV database as lower quality due to many undetermined bases.
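A comparison of this kind, mutations inside a region of interest versus mutations over the whole protein, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline behind Supplementary Table 2: it assumes that all sequences are already aligned to a reference without insertions or deletions, that 'X' marks an undetermined residue (as arises from undetermined bases) and is skipped, and that each distinct substitution is counted once. The function names, the toy sequences and the region coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

Substitution = Tuple[int, str, str]  # (1-based position, reference aa, sample aa)


def substitutions(reference: str, sample: str, ambiguous: str = "X-") -> Set[Substitution]:
    """Positions where the sample differs from the reference.

    Assumption: undetermined residues and gaps are not counted as mutations.
    """
    if len(reference) != len(sample):
        raise ValueError("sample must be aligned to the reference")
    found = set()
    for position, (ref_aa, aa) in enumerate(zip(reference, sample), start=1):
        if aa in ambiguous or ref_aa in ambiguous:
            continue
        if aa != ref_aa:
            found.add((position, ref_aa, aa))
    return found


def count_unique_substitutions(
    reference: str, samples: Iterable[str], region: Tuple[int, int]
) -> Tuple[int, int]:
    """Return (distinct substitutions overall, distinct substitutions in region).

    The region is given as inclusive 1-based (start, end) coordinates.
    """
    start, end = region
    unique: Set[Substitution] = set()
    for sample in samples:
        unique |= substitutions(reference, sample)
    in_region = {sub for sub in unique if start <= sub[0] <= end}
    return len(unique), len(in_region)


# Hypothetical toy data, not real spike sequences or coordinates.
toy_reference = "MFVFLVLLPL"
toy_samples = [
    "MFVFLVLLPL",  # identical to the reference
    "MFVYLVLLPL",  # F4Y
    "MFVYLVLXPS",  # F4Y again, one undetermined residue, L10S
    "LFVFLVLLPL",  # M1L
]

if __name__ == "__main__":
    total, within = count_unique_substitutions(toy_reference, toy_samples, (4, 8))
    print(f"{total} distinct substitutions overall, {within} within the region")
```

Skipping undetermined residues matters for sequences with many undetermined bases, where apparent substitutions may otherwise reflect sequencing quality rather than true variation.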