PMC:7067204

Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan

EMERGING MICROBES AND INFECTIONS
J. F-W. CHAN ET AL.

Abstract
A mysterious outbreak of atypical pneumonia in late 2019 was traced to a seafood wholesale market in Wuhan, China. Within a few weeks, a novel coronavirus tentatively named 2019 novel coronavirus (2019-nCoV) was announced by the World Health Organization. We performed bioinformatics analysis on a virus genome from a patient with 2019-nCoV infection and compared it with other related coronavirus genomes. Overall, the genome of 2019-nCoV has 89% nucleotide identity with bat SARS-like-CoVZXC21 and 82% with that of human SARS-CoV. The phylogenetic trees of its orf1a/b, Spike, Envelope, Membrane and Nucleoprotein also clustered closely with those of the bat, civet and human SARS coronaviruses. However, the external subdomain of the Spike receptor binding domain of 2019-nCoV shares only 40% amino acid identity with other SARS-related coronaviruses. Remarkably, its orf3b encodes a completely novel short protein. Furthermore, its new orf8 likely encodes a secreted protein with an alpha-helix, followed by a beta-sheet(s) containing six strands. Given the roles of the civet in SARS and the camel in MERS, hunting for the animal source of 2019-nCoV and its more ancestral virus would be important for understanding the origin and evolution of this novel lineage B betacoronavirus. These findings provide the basis for starting further studies on the pathogenesis, and for optimizing the design of diagnostic, antiviral and vaccination strategies for this emerging infection.

Introduction
Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded RNA viruses that belong to the subfamily Coronavirinae, family Coronaviridae, order Nidovirales.
There are four genera of CoVs, namely, Alphacoronavirus (αCoV), Betacoronavirus (βCoV), Deltacoronavirus (δCoV), and Gammacoronavirus (γCoV) [1]. Evolutionary analyses have shown that bats and rodents are the gene sources of most αCoVs and βCoVs, while avian species are the gene sources of most δCoVs and γCoVs. CoVs have repeatedly crossed species barriers and some have emerged as important human pathogens. The best-known examples include severe acute respiratory syndrome CoV (SARS-CoV), which emerged in China in 2002–2003 to cause a large-scale epidemic with about 8000 infections and 800 deaths, and Middle East respiratory syndrome CoV (MERS-CoV), which has caused a persistent epidemic in the Arabian Peninsula since 2012 [2,3]. In both of these epidemics, the viruses likely originated from bats and then jumped into another amplification mammalian host [the Himalayan palm civet (Paguma larvata) for SARS-CoV and the dromedary camel (Camelus dromedarius) for MERS-CoV] before crossing species barriers to infect humans. Prior to December 2019, 6 CoVs were known to infect humans, including 2 αCoVs (HCoV-229E and HCoV-NL63) and 4 βCoVs (HCoV-OC43 [lineage A], HCoV-HKU1 [lineage A], SARS-CoV [lineage B] and MERS-CoV [lineage C]). The βCoV lineage A HCoV-OC43 and HCoV-HKU1 usually cause self-limiting upper respiratory infections in immunocompetent hosts and occasionally lower respiratory tract infections in immunocompromised hosts and the elderly [4]. In contrast, SARS-CoV (lineage B βCoV) and MERS-CoV (lineage C βCoV) may cause severe lower respiratory tract infection with acute respiratory distress syndrome and extrapulmonary manifestations, such as diarrhea, lymphopenia, deranged liver and renal function tests, and multiorgan dysfunction syndrome, among both immunocompetent and immunocompromised hosts, with mortality rates of ∼10% and ∼35%, respectively [5,6].
On 31 December 2019, the World Health Organization (WHO) was informed of cases of pneumonia of unknown cause in Wuhan City, Hubei Province, China [7]. Subsequent virological testing detected a novel CoV in these patients. As of 16 January 2020, 43 patients had been diagnosed with infection by this novel CoV, including two exported cases of mild pneumonia in Thailand and Japan [8,9]. The earliest date of symptom onset was 1 December 2019 [10]. The symptomatology of these patients included fever, malaise, dry cough, and dyspnea. Among 41 patients admitted to a designated hospital in Wuhan, 13 (32%) required intensive care and 6 (15%) died. All 41 patients had pneumonia with abnormal findings on chest computerized tomography scans [10]. We recently reported a familial cluster of 2019-nCoV infection in a Shenzhen family with travel history to Wuhan [11]. In the present study, we analyzed a 2019-nCoV complete genome from a patient in this familial cluster and compared it with the genomes of related βCoVs to provide insights into the potential source and control strategies.

Materials and methods

Viral sequences
The complete genome sequence of 2019-nCoV HKU-SZ-005b was available at GenBank (accession no. MN975262) (Table 1). The representative complete genomes of other related βCoV strains collected from humans or other mammals were included for comparative analysis. These included strains collected from humans, bats, and Himalayan palm civets between 2003 and 2018, with one 229E coronavirus strain as the outgroup.

Table 1. List of coronaviruses used in this study.
Accession number | Name displayed on the tree | Name of full-length genome | Year
AY274119 | Human SARS-CoV Tor2 2003 | SARS-related coronavirus isolate Tor2 | 2003
AY278488 | Human SARS-CoV BJ01 2003 | SARS coronavirus BJ01 | 2003
AY278491 | SARS coronavirus HKU-39849 2003 | SARS coronavirus HKU-39849 2003 | 2003
AY390556 | Human SARS-CoV GZ02 2003 | SARS coronavirus GZ02 | 2003
AY391777 | Human CoV OC43 2003 | Human coronavirus OC43 | 2003
AY515512 | Paguma SARS CoV HC/SZ/61/03 2003 | SARS coronavirus HC/SZ/61/03 (paguma SARS) | 2018
EF065513 | Bat CoV HKU9-1 2006 | Bat coronavirus HKU9-1 | 2006
FJ588686 | Bat SL-CoV Rs672 2006 | Bat SARS CoV Rs672/2006 | 2006
KC881005 | Bat SL-CoV RsSHC014 2013 | Bat SARS-like coronavirus RsSHC014 | 2013
KC881006 | Bat SL-CoV Rs3367 2013 | Bat SARS-like coronavirus Rs3367 | 2013
KY417146 | Bat SL-CoV Rs4231 2016 | Bat SARS-like coronavirus isolate Rs4231 | 2016
KY417149 | Bat SL-CoV Rs4255 2016 | Bat SARS-like coronavirus isolate Rs4255 | 2016
MG772933 | Bat SL-CoV ZC45 2018 | Bat SARS-like coronavirus isolate bat-SL-CoVZC45 | 2018
MG772934 | Bat SL-CoV ZXC21 2018 | Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 | 2018
MK211377 | Bat CoV YN2018C 2018 | Coronavirus BtRs-BetaCoV/YN2018C | 2018
MK211378 | Bat CoV YN2018D 2018 | Coronavirus BtRs-BetaCoV/YN2018D (a) | 2018
MN975262 | HKU-SZ-005b | Human 2019-nCoV HKU-SZ-005b | 2020
NC002645 | Human CoV 229E 2000 | Human coronavirus 229E | 2000
NC006577 | Human CoV HKU1 2004 | Human coronavirus HKU1 | 2004
NC009019 | Bat CoV HKU4-1 2006 | Bat coronavirus HKU4-1 | 2006
NC009020 | Bat CoV HKU5-1 2006 | Bat coronavirus HKU5-1 | 2006
NC014470 | Bat SARS-related CoV BM48-31 2009 | Bat coronavirus BM48-31/BGR/2008 | 2008
NC019843 | Human MERS-CoV 2012 | Middle East respiratory syndrome coronavirus | 2012
(a) One nucleotide was added within M gene to maintain the sequence in-frame.

Genome characterization and phylogenetic analysis
Phylogenetic tree construction by the neighbour joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees [12].
The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [13]. The tree was drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site [14]. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X [15]. Multiple alignment was performed using CLUSTAL 2.1 and further visualized using BOXSHADE 3.21. Structural analysis of orf8 was performed using PSI-blast-based secondary structure PREDiction (PSIPRED) [16]. For the prediction of protein secondary structure, including beta sheet, alpha helix, and coil, the amino acid sequences were input and analysed by the program's neural network-based algorithm. Predicted structures were visualized and highlighted on the BOXSHADE alignment. Prediction of transmembrane domains was performed using the TMHMM 2.0 server (http://www.cbs.dtu.dk/services/TMHMM/). Secondary structure prediction in the 5′-untranslated region (UTR) and 3′-UTR was performed using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) with minimum free energy (MFE) and partition function in Fold algorithms and basic options. The human SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results.

Results and discussion

Genome organization
The single-stranded RNA genome of the 2019-nCoV was 29891 nucleotides in size, encoding 9860 amino acids. The G + C content was 38%. Similar to other βCoVs, the 2019-nCoV genome contains two flanking untranslated regions (UTRs) and a single long open reading frame encoding a polyprotein.
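The G + C content quoted above is the fraction of G and C bases in the genome. A minimal Python sketch of that calculation follows. The function name `gc_content` and the 20-nucleotide toy sequence are illustrative assumptions, not part of the study; the reported 38% would come from running the same count over the full 29891-nucleotide sequence (MN975262). Ignoring ambiguous bases is also an assumption of this sketch.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    # Assumption: ambiguity codes such as N are ignored rather than counted.
    counted = [base for base in seq if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Hypothetical toy fragment, for illustration only (not taken from the genome).
toy = "ATGCGTATTAACGTTAGCAT"
print(f"G+C content: {gc_content(toy):.0%}")
```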
The 2019-nCoV genome is arranged in the order of 5′-replicase (orf1a/b)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]-3′ and lacks the hemagglutinin-esterase gene which is characteristically found in lineage A βCoVs (Figure 1).

Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.

There are 12 putative, functional open reading frames (orfs) expressed from a nested set of 9 subgenomic mRNAs carrying a conserved leader sequence in the genome, 9 transcription-regulatory sequences, and 2 terminal untranslated regions. The 5′- and 3′-UTRs are 265 and 358 nucleotides long, respectively. The 5′- and 3′-UTR sequences of 2019-nCoV are similar to those of other βCoVs, with nucleotide identities of ⩾83.6%. The large replicase polyproteins pp1a and pp1ab, encoded by the partially overlapping 5′-terminal orf1a/b within the 5′ two-thirds of the genome, are proteolytically cleaved into 16 putative non-structural proteins (nsps).
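The cleavage coordinates reported in Table 2 can be turned into per-nsp lengths with a few lines of Python. The sketch below copies the first and last residue positions from Table 2; the names `NSP_COORDS`, `nsp_lengths` and `non_contiguous` are hypothetical helpers introduced for illustration only. The contiguity check simply flags any nsp that does not begin one residue after the previous entry ends, which, for the coordinates as listed in Table 2, is the case only for nsp12 (it shares its start position with nsp11).

```python
# (first residue, last residue) on the replicase polyprotein, copied from Table 2.
NSP_COORDS = {
    "nsp1": (1, 180),      "nsp2": (181, 818),    "nsp3": (819, 2763),
    "nsp4": (2764, 3263),  "nsp5": (3264, 3569),  "nsp6": (3570, 3859),
    "nsp7": (3860, 3942),  "nsp8": (3943, 4140),  "nsp9": (4141, 4253),
    "nsp10": (4254, 4392), "nsp11": (4393, 4405), "nsp12": (4393, 5324),
    "nsp13": (5325, 5925), "nsp14": (5926, 6452), "nsp15": (6453, 6798),
    "nsp16": (6799, 7096),
}


def nsp_lengths(coords):
    """Length in amino acids of each nsp (inclusive coordinates)."""
    return {name: end - start + 1 for name, (start, end) in coords.items()}


def non_contiguous(coords):
    """Names of nsps that do not start right after the preceding entry ends."""
    names = list(coords)
    return [
        cur for prev, cur in zip(names, names[1:])
        if coords[cur][0] != coords[prev][1] + 1
    ]


lengths = nsp_lengths(NSP_COORDS)
for name, length in lengths.items():
    print(f"{name}: {length} aa")
print("not contiguous with previous entry:", non_contiguous(NSP_COORDS))
```

Leaving out nsp11, the lengths add up to the last listed position (N7096), which is a quick consistency check on the transcribed coordinates.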
These putative nsps included two viral cysteine proteases, namely, nsp3 (papain-like protease) and nsp5 (chymotrypsin-like, 3C-like, or main protease), nsp12 (RNA-dependent RNA polymerase [RdRp]), nsp13 (helicase), and other nsps which are likely involved in the transcription and replication of the virus (Table 2). There are no remarkable differences between the orfs and nsps of 2019-nCoV and those of SARS-CoV (Table 3). The major distinctions between SARSr-CoV and SARS-CoV are in orf3b, Spike and orf8, with Spike S1 and orf8 being especially variable; these were previously shown to be recombination hot spots.

Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics.
NSP | Putative function/domain | Amino acid position | Putative cleavage site
nsp1 | suppress antiviral host response | M1 – G180 | (LNGG'AYTR)
nsp2 | unknown | A181 – G818 | (LKGG'APTK)
nsp3 | putative PL-pro domain | A819 – G2763 | (LKGG'KIVN)
nsp4 | complex with nsp3 and 6: DMV formation | K2764 – Q3263 | (AVLQ'SGFR)
nsp5 | 3CL-pro domain | S3264 – Q3569 | (VTFQ'SAVK)
nsp6 | complex with nsp3 and 4: DMV formation | S3570 – Q3859 | (ATVQ'SKMS)
nsp7 | complex with nsp8: primase | S3860 – Q3942 | (ATLQ'AIAS)
nsp8 | complex with nsp7: primase | A3943 – Q4140 | (VKLQ'NNEL)
nsp9 | RNA/DNA binding activity | N4141 – Q4253 | (VRLQ'AGNA)
nsp10 | complex with nsp14: replication fidelity | A4254 – Q4392 | (PMLQ'SADA)
nsp11 | short peptide at the end of orf1a | S4393 – V4405 | (end of orf1a)
nsp12 | RNA-dependent RNA polymerase | S4393 – Q5324 | (TVLQ'AVGA)
nsp13 | helicase | A5325 – Q5925 | (ATLQ'AENV)
nsp14 | ExoN: 3′–5′ exonuclease | A5926 – Q6452 | (TRLQ'SLEN)
nsp15 | XendoU: poly(U)-specific endoribonuclease | S6453 – Q6798 | (PKLQ'SSQA)
nsp16 | 2'-O-MT: 2'-O-ribose methyltransferase | S6799 – N7096 | (end of orf1b)

Table 3. Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV.
Amino acid identity (%)
Protein | 2019-nCoV vs. bat-SL-CoVZXC21 | 2019-nCoV vs. SARS-CoV
NSP1 | 96 | 84
NSP2 | 96 | 68
NSP3 | 93 | 76
NSP4 | 96 | 80
NSP5 | 99 | 96
NSP6 | 98 | 88
NSP7 | 99 | 99
NSP8 | 96 | 97
NSP9 | 96 | 97
NSP10 | 98 | 97
NSP11 | 85 | 85
NSP12 | 96 | 96
NSP13 | 99 | 100
NSP14 | 95 | 95
NSP15 | 88 | 89
NSP16 | 98 | 93
Spike | 80 | 76
Orf3a | 92 | 72
Orf3b | 32 | 32
Envelope | 100 | 95
Membrane | 99 | 91
Orf6 | 94 | 69
Orf7a | 89 | 85
Orf7b | 93 | 81
Orf8/Orf8b | 94 | 40
Nucleoprotein | 94 | 94
Orf9b | 73 | 73

Spike
The Spike glycoprotein comprises S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and receptor-binding domain (RBD), while the S2 subunit contains a conserved fusion peptide (FP), heptad repeat (HR) 1 and 2, transmembrane domain (TM), and cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Thus, broad-spectrum antiviral peptides against S2 would be an important preventive and treatment modality for testing in animal models before clinical trials [18]. Though the S1 subunit of 2019-nCoV shares only around 70% identity with that of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of the RBD (excluding the external subdomain) is highly conserved (Figure 3(B)). Most of the amino acid differences in the RBD are located in the external subdomain, which is responsible for the direct interaction with the host receptor. Further investigation of this soluble variable external subdomain region will reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the spike receptor binding domain (RBD) when compared with that of human SARS-CoV. However, some Yunnan strains such as WIV1 have no such deletions and can use human ACE2 as a cellular entry receptor.
It is interesting to note that the two bat SARS-related coronaviruses ZXC21 and ZC45, being closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue, and pathological changes in the lung and intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity.

Figure 2. Comparison of protein sequences of Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black boxes represent the identity while the grey boxes represent the similarity of the four amino acid sequences.

Figure 3. Comparison of protein sequences of A. Spike globular head S1, and B. S1 receptor-binding domain (RBD) subunit. Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession number MG772934.1, MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession number AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. Orange box indicates the region of signal peptide, while green and blue boxes indicate the core domain and receptor binding domain, respectively.
Sequences of RBD, highlighted in (A), were used for comparison. External subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. Purple box indicates the external subdomain region.

Orf3b
A novel short putative protein with 4 helices and no homology to existing SARS-CoV or SARSr-CoV proteins was found within Orf3b (Figure 4). It is notable that SARS-CoV deletion mutants lacking orf3b replicate to levels similar to those of wild-type virus in several cell types [19], suggesting that orf3b is dispensable for viral replication in vitro. However, orf3b may have a role in viral pathogenicity, as Vero E6 but not 293T cells transfected with a construct expressing Orf3b underwent necrosis as early as 6 h after transfection and underwent simultaneous necrosis and apoptosis at later time points [20]. Orf3b was also shown to inhibit expression of IFN-β at the synthesis and signalling levels [21]. Subsequently, orf3b homologues identified from three bat SARS-related-CoV strains were found to be C-terminally truncated and to lack the C-terminal nuclear localization signal of SARS-CoV [22]. IFN antagonist activity analysis demonstrated that one SARS-related-CoV orf3b still possessed IFN antagonist and IRF3-modulating activities. These results indicated that different orf3b proteins display different IFN antagonist activities and that this function is independent of the protein's nuclear localization, suggesting a potential link between bat SARS-related-CoV orf3b function and pathogenesis. The importance of this new protein in 2019-nCoV will require further validation and study.

Figure 4. Analysis of orf3b. A. Multiple alignment of orf3b protein sequence between 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoV. B. A novel putative short protein found in orf3b.

Orf8
orf8 is an accessory protein found in the Betacoronavirus lineage B coronaviruses.
Human SARS-CoVs isolated from early-phase patients, all civet SARS-CoVs, and other bat SARS-related CoVs contain full-length orf8 [23]. However, a 29-nucleotide deletion, which splits the full-length orf8 into putative orf8a and orf8b, has been found in all SARS-CoVs isolated from mid- and late-phase human patients [24]. In addition, we have previously identified two bat SARS-related-CoVs (Bat-CoV YNLF_31C and YNLF_34C) and proposed that the original SARS-CoV full-length orf8 was acquired from these two bat SARS-related-CoVs [25]. Since SARS-CoV is the closest human pathogenic virus to 2019-nCoV, we performed phylogenetic analysis and multiple alignments to investigate the orf8 amino acid sequences. The orf8 protein sequences used in the analysis were derived from early-phase SARS-CoV that includes full-length orf8 (human SARS-CoV GZ02), the mid- and late-phase SARS-CoV that includes the split orf8b (human SARS-CoV Tor2), civet SARS-CoV (paguma SARS-CoV), two bat SARS-related-CoVs containing full-length orf8 (bat-CoV YNLF_31C and YNLF_34C), 2019-nCoV, the two bat SARS-related-CoVs closest to 2019-nCoV (SL-CoV ZXC21 and ZC45), and bat SARS-related-CoV HKU3-1 (Figure 5(A)). As expected, orf8 derived from 2019-nCoV belongs to the group that includes the closest genome sequences of bat SARS-related-CoV ZXC21 and ZC45. Interestingly, the new 2019-nCoV orf8 is distant from the conserved orf8 or orf8b derived from human SARS-CoV or its related viruses derived from civet (paguma SARS-CoV) and bat (bat-CoV YNLF_31C and YNLF_34C). This new orf8 of 2019-nCoV does not contain any known functional domain or motif. An aggregation motif VLVVL (amino acid 75–79) has been found in SARS-CoV orf8b (Figure 5(B)), which was shown to trigger intracellular stress pathways and activate NLRP3 inflammasomes [26], but this motif is absent in the novel orf8 of 2019-nCoV.
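Checking a protein for the VLVVL aggregation motif mentioned above amounts to an exact substring search. The following is a minimal Python sketch, assuming plain one-letter amino acid strings; the function name `find_motif` and both toy sequences are hypothetical and are not the real orf8 or orf8b sequences.

```python
def find_motif(protein: str, motif: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) exact match."""
    protein, motif = protein.upper(), motif.upper()
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append(start + 1)  # convert 0-based index to 1-based residue number
        start = protein.find(motif, start + 1)
    return hits


# Hypothetical toy sequences for illustration only.
with_motif = "MKAVLVVLQ"
without_motif = "MKAAAQ"
print(find_motif(with_motif, "VLVVL"))
print(find_motif(without_motif, "VLVVL"))
```

An empty list means the motif is absent. Applied to real data, the inputs would be the orf8 and orf8b protein sequences from the accessions listed in Table 1.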
Based on a secondary structure prediction, this novel orf8 is highly likely to form a protein with an alpha-helix, followed by a beta-sheet(s) containing six strands (Figure 5(C)).

Figure 5. Analysis of orf8 to show novel putative protein. (A) Phylogenetic analysis of orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with bootstrap 1000. The evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. (C) Structural analysis of Orf8 was performed using PSI-blast-based secondary structure PREDiction (PSIPRED). Predicted helix structure (h) and strand (s) were boxed with red and yellow, respectively.

Phylogenetic relationship among 2019-nCoV and other βCoVs
The genome of 2019-nCoV has overall 89% nucleotide identity with bat SARS-related-CoV SL-CoVZXC21 (MG772934.1), and 82% with human SARS-CoV BJ01 2003 (AY278488) and human SARS-CoV Tor2 (AY274119). The phylogenetic trees constructed using the amino acid sequences of orf1a/b and the 4 structural genes (S, E, M, and N) are shown in Figure 6(A–E). For all these 5 genes, the 2019-nCoV clustered with lineage B βCoVs. It was most closely related to the bat SARS-related CoVs ZXC21 and ZC45 found in Chinese horseshoe bats (Rhinolophus sinicus) collected from Zhoushan city, Zhejiang province, China between 2015 and 2017. Thus this novel coronavirus should belong to the genus Betacoronavirus, subgenus Sarbecovirus (previously lineage 2b of Group 2 coronavirus).
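To make the identity percentages and the Poisson-corrected distances named in the methods more concrete, the sketch below computes percent identity between two already-aligned sequences and converts it to a Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing sites. This is a simplified illustration, not the MEGA X implementation: the function names and toy sequences are hypothetical, and skipping any column with a gap is an assumption made here to mimic the pairwise deletion option.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Percent identical columns between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def poisson_distance(identity_pct: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p), with p = 1 - identity."""
    p = 1.0 - identity_pct / 100.0
    if p >= 1.0:
        raise ValueError("distance undefined when no sites are identical")
    return -math.log(1.0 - p)


# Hypothetical toy alignment for illustration only.
ident = percent_identity("MKT-LV", "MKTALI")
print(f"identity = {ident:.1f}%, Poisson distance = {poisson_distance(ident):.4f}")
```

The correction matters more as sequences diverge: for small p the distance is close to p itself, while for low identities it grows faster than the raw proportion of differences, reflecting multiple substitutions at the same site.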
SARS-related coronaviruses have been found continuously, especially in horseshoe bat species, in the last 13 years. Between 2003 and 2018, 339 complete SARS-related coronavirus genomes were sequenced, including 274 human SARS-CoVs, 18 civet SARS coronaviruses, and 47 bat SARS-related coronaviruses mainly from Rhinolophus bat species. Together, they formed a distinct subclade among other lineage B βCoVs. These results suggest that the 2019-nCoV might also have originated from bats. However, we cannot ascertain whether another intermediate or amplification animal host infected by 2019-nCoV could be found in the epidemiologically linked market, as was the case for Paguma civets and SARS-CoV.

Figure 6. Phylogenetic tree construction by the neighbour joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees using amino acid sequences of (A) orf1ab polypeptide; (B) Spike glycoprotein; (C) Envelope protein; (D) Membrane protein; (E) Nucleoprotein.

RNA secondary structures
As shown in Figure 7(A–C), the SARS-CoV 5′-UTR contains SL1, SL2, SL3, SL4, S5, SL5A, SL5B, SL5C, SL6, SL7, and SL8. The SL3 contains the trans–cis motif [27]. The SL1, SL2, SL3, SL4, S5, SL5A, SL5B, and SL5C structures were similar among the 2019-nCoV, human SARS-CoV and the bat SARS-related ZC45. In the 2019-nCoV, part of the S5 was found inside orf1a/b (marked in red), which was similar to SARS-CoV. In bat SARS-related CoV ZC45, the S5 was not found inside orf1a/b. The 2019-nCoV had the same SL6, SL7, and SL8 as SARS-CoV, and an additional stem loop. Bat SARS-related CoV ZC45 did not have the SARS-CoV SL6-like stem loop. Instead, it possessed two other stem loops in this region. All three strains had similar SL7 and SL8. The bat SARS-like CoV ZC45 also had an additional stem loop between SL7 and SL8. Overall, the 5′-UTR of 2019-nCoV was more similar to that of SARS-CoV than to that of the bat SARS-related CoV ZC45.
The biological relevance of the 5′-UTR structures and their effects on virulence should be investigated further. The 2019-nCoV had various 3′-UTR structures, including BSL, S1, S2, S3, S4, L1, L2, L3, and HVR (Figure 7(D–F)). The 3′-UTR was conserved among 2019-nCoV, human SARS-CoV and SARS-related CoVs [27].

Figure 7. Secondary structure prediction and comparison in the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (with minimum free energy and partition function in Fold algorithms and basic options). The SARS 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5'-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5'-UTR; (C) ZC45 5'-UTR; (D) SARS-CoV 3'-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3'-UTR; (F) ZC45 3'-UTR.

In summary, 2019-nCoV is a novel lineage B betacoronavirus closely related to bat SARS-related coronaviruses. It also has unique genomic features which deserve further investigation to ascertain their roles in the viral replication cycle and pathogenesis. More animal sampling to determine its natural animal reservoir and intermediate animal host in the market is important. This will shed light on the evolutionary history of this emerging coronavirus, which has jumped into humans after the other two zoonotic betacoronaviruses, SARS-CoV and MERS-CoV.

Acknowledgements
The funding sources had no role in the study design, data collection, analysis, interpretation, or writing of the report.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Jasper Fuk-Woo Chan http://orcid.org/0000-0001-6336-6657
Kin-Hang Kok http://orcid.org/0000-0003-3426-332X

Document structure show

article-title	Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
alt-title	EMERGING MICROBES AND INFECTIONS
alt-title	J. F-W. CHAN ET AL.
abstract	ABSTRACT A mysterious outbreak of atypical pneumonia in late 2019 was traced to a seafood wholesale market in Wuhan of China. Within a few weeks, a novel coronavirus tentatively named as 2019 novel coronavirus (2019-nCoV) was announced by the World Health Organization. We performed bioinformatics analysis on a virus genome from a patient with 2019-nCoV infection and compared it with other related coronavirus genomes. Overall, the genome of 2019-nCoV has 89% nucleotide identity with bat SARS-like-CoVZXC21 and 82% with that of human SARS-CoV. The phylogenetic trees of their orf1a/b, Spike, Envelope, Membrane and Nucleoprotein also clustered closely with those of the bat, civet and human SARS coronaviruses. However, the external subdomain of Spike’s receptor binding domain of 2019-nCoV shares only 40% amino acid identity with other SARS-related coronaviruses. Remarkably, its orf3b encodes a completely novel short protein. Furthermore, its new orf8 likely encodes a secreted protein with an alpha-helix, following with a beta-sheet(s) containing six strands. Learning from the roles of civet in SARS and camel in MERS, hunting for the animal source of 2019-nCoV and its more ancestral virus would be important for understanding the origin and evolution of this novel lineage B betacoronavirus. These findings provide the basis for starting further studies on the pathogenesis, and optimizing the design of diagnostic, antiviral and vaccination strategies for this emerging infection.
title	ABSTRACT
p	A mysterious outbreak of atypical pneumonia in late 2019 was traced to a seafood wholesale market in Wuhan of China. Within a few weeks, a novel coronavirus tentatively named as 2019 novel coronavirus (2019-nCoV) was announced by the World Health Organization. We performed bioinformatics analysis on a virus genome from a patient with 2019-nCoV infection and compared it with other related coronavirus genomes. Overall, the genome of 2019-nCoV has 89% nucleotide identity with bat SARS-like-CoVZXC21 and 82% with that of human SARS-CoV. The phylogenetic trees of their orf1a/b, Spike, Envelope, Membrane and Nucleoprotein also clustered closely with those of the bat, civet and human SARS coronaviruses. However, the external subdomain of Spike’s receptor binding domain of 2019-nCoV shares only 40% amino acid identity with other SARS-related coronaviruses. Remarkably, its orf3b encodes a completely novel short protein. Furthermore, its new orf8 likely encodes a secreted protein with an alpha-helix, following with a beta-sheet(s) containing six strands. Learning from the roles of civet in SARS and camel in MERS, hunting for the animal source of 2019-nCoV and its more ancestral virus would be important for understanding the origin and evolution of this novel lineage B betacoronavirus. These findings provide the basis for starting further studies on the pathogenesis, and optimizing the design of diagnostic, antiviral and vaccination strategies for this emerging infection.
body	Introduction Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded RNA viruses that belong to the subfamily Coronavirinae, family Coronavirdiae, order Nidovirales. There are four genera of CoVs, namely, Alphacoronavirus (αCoV), Betacoronavirus (βCoV), Deltacoronavirus (δCoV), and Gammacoronavirus (γCoV) [1]. Evolutionary analyses have shown that bats and rodents are the gene sources of most αCoVs and βCoVs, while avian species are the gene sources of most δCoVs and γCoVs. CoVs have repeatedly crossed species barriers and some have emerged as important human pathogens. The best-known examples include severe acute respiratory syndrome CoV (SARS-CoV) which emerged in China in 2002–2003 to cause a large-scale epidemic with about 8000 infections and 800 deaths, and Middle East respiratory syndrome CoV (MERS-CoV) which has caused a persistent epidemic in the Arabian Peninsula since 2012 [2,3]. In both of these epidemics, these viruses have likely originated from bats and then jumped into another amplification mammalian host [the Himalayan palm civet (Paguma larvata) for SARS-CoV and the dromedary camel (Camelus dromedarius) for MERS-CoV] before crossing species barriers to infect humans. Prior to December 2019, 6 CoVs were known to infect human, including 2 αCoV (HCoV-229E and HKU-NL63) and 4 βCoV (HCoV-OC43 [lineage A], HCoV-HKU1 [lineage A], SARS-CoV [lineage B] and MERS-CoV [lineage C]). The βCoV lineage A HCoV-OC43 and HCoV-HKU1 usually cause self-limiting upper respiratory infections in immunocompetent hosts and occasionally lower respiratory tract infections in immunocompromised hosts and elderly [4]. 
In contrast, SARS-CoV (lineage B βCoV) and MERS-CoV (lineage C βCoV) may cause severe lower respiratory tract infection with acute respiratory distress syndrome and extrapulmonary manifestations, such as diarrhea, lymphopenia, deranged liver and renal function tests, and multiorgan dysfunction syndrome, among both immunocompetent and immunocompromised hosts with mortality rates of ∼10% and ∼35%, respectively [5,6]. On 31 December 2019, the World Health Organization (WHO) was informed of cases of pneumonia of unknown cause in Wuhan City, Hubei Province, China [7]. Subsequent virological testing showed that a novel CoV was detected in these patients. As of 16 January 2020, 43 patients have been diagnosed to have infection with this novel CoV, including two exported cases of mild pneumonia in Thailand and Japan [8,9]. The earliest date of symptom onset was 1 December 2019 [10]. The symptomatology of these patients included fever, malaise, dry cough, and dyspnea. Among 41 patients admitted to a designated hospital in Wuhan, 13 (32%) required intensive care and 6 (15%) died. All 41 patients had pneumonia with abnormal findings on chest computerized tomography scans [10]. We recently reported a familial cluster of 2019-nCoV infection in a Shenzhen family with travel history to Wuhan [11]. In the present study, we analyzed a 2019-nCoV complete genome from a patient in this familial cluster and compared it with the genomes of related β CoVs to provide insights into the potential source and control strategies. Materials and methods Viral sequences The complete genome sequence of 2019-nCoV HKU-SZ-005b was available at GenBank (accession no. MN975262) (Table 1). The representative complete genomes of other related βCoVs strains collected from human or mammals were included for comparative analysis. These included strains collected from human, bats, and Himalayan palm civet between 2003 and 2018, with one 229E coronavirus strain as the outgroup. Table 1. 
List of coronaviruses used in this study. Accession number Name displayed on the tree Name of full-length genome Year AY274119 Human SARS-CoV Tor2 2003 SARS-related coronavirus isolate Tor2 2003 AY278488 Human SARS-CoV BJ01 2003 SARS coronavirus BJ01 2003 AY278491 SARS coronavirus HKU-39849 2003 SARS coronavirus HKU-39849 2003 2003 AY390556 Human SARS-CoV GZ02 2003 SARS coronavirus GZ02 2003 AY391777 Human CoV OC43 2003 Human coronavirus OC43 2003 AY515512 Paguma SARS CoV HC/SZ/61/03 2003 SARS coronavirus HC/SZ/61/03 (paguma SARS) 2018 EF065513 Bat CoV HKU9-1 2006 Bat coronavirus HKU9-1 2006 FJ588686 Bat SL-CoV Rs672 2006 Bat SARS CoV Rs672/2006 2006 KC881005 Bat SL-CoV RsSHC014 2013 Bat SARS-like coronavirus RsSHC014 2013 KC881006 Bat SL-CoV Rs3367 2013 Bat SARS-like coronavirus Rs3367 2013 KY417146 Bat SL-CoV Rs4231 2016 Bat SARS-like coronavirus isolate Rs4231 2016 KY417149 Bat SL-CoV Rs4255 2016 Bat SARS-like coronavirus isolate Rs4255 2016 MG772933 Bat SL-CoV ZC45 2018 Bat SARS-like coronavirus isolate bat-SL-CoVZC45 2018 MG772934 Bat SL-CoV ZXC21 2018 Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 2018 MK211377 Bat CoV YN2018C 2018 Coronavirus BtRs-BetaCoV/YN2018C 2018 MK211378 Bat CoV YN2018D 2018 Coronavirus BtRs-BetaCoV/YN2018Da 2018 MN975262 HKU-SZ-005b Human 2019-nCoV HKU-SZ-005b 2020 NC002645 Human CoV 229E 2000 Human coronavirus 229E 2000 NC006577 Human CoV HKU1 2004 Human coronavirus HKU1 2004 NC009019 Bat CoV HKU4-1 2006 Bat coronavirus HKU4-1 2006 NC009020 Bat CoV HKU5-1 2006 Bat coronavirus HKU5-1 2006 NC014470 Bat SARS-related CoV BM48-31 2009 Bat coronavirus BM48-31/BGR/2008 2008 NC019843 Human MERS-CoV 2012 Middle East respiratory syndrome coronavirus 2012 aOne nucleotide was added within M gene to maintain the sequence in-frame. Genome characterization and phylogenetic analysis Phylogenetic tree construction by the neighbour joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees [12]. 
The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [13]. The tree was drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site [14]. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X [15]. Multiple alignment was performed using CLUSTAL 2.1 and further visualized using BOXSHADE 3.21. Structural analysis of orf8 was performed using PSI-BLAST-based secondary structure PREDiction (PSIPRED) [16]. To predict protein secondary structure, including beta sheet, alpha helix, and coil, the amino acid sequences were submitted to PSIPRED, which analyses them with its neural-network-based algorithm. Predicted structures were visualized and highlighted on the BOXSHADE alignment. Prediction of transmembrane domains was performed using the TMHMM 2.0 server (http://www.cbs.dtu.dk/services/TMHMM/). Secondary structure prediction in the 5′-untranslated region (UTR) and 3′-UTR was performed using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) with the minimum free energy (MFE) and partition function option under Fold algorithms and basic options. The human SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results. Results and discussion Genome organization The single-stranded RNA genome of the 2019-nCoV was 29891 nucleotides in size, encoding 9860 amino acids. The G + C content was 38%. Similar to other βCoVs, the 2019-nCoV genome contains two flanking untranslated regions (UTRs) and a single long open reading frame encoding a polyprotein. 
The 2019-nCoV genome is arranged in the order of 5′-replicase (orf1a/b)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]−3′ and lacks the hemagglutinin-esterase gene which is characteristically found in lineage A βCoVs (Figure 1). Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineage A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale. There are 12 putative, functional open reading frames (orfs) expressed from a nested set of 9 subgenomic mRNAs carrying a conserved leader sequence in the genome, 9 transcription-regulatory sequences, and 2 terminal untranslated regions. The 5′- and 3′-UTRs are 265 and 358 nucleotides long, respectively. The 5′- and 3′-UTR sequences of 2019-nCoV are similar to those of other βCoVs with nucleotide identities of ⩾83.6%. The large replicase polyproteins pp1a and pp1ab, encoded by the partially overlapping 5′-terminal orf1a/b within the 5′ two-thirds of the genome, are proteolytically cleaved into 16 putative non-structural proteins (nsps). 
These putative nsps included two viral cysteine proteases, namely, nsp3 (papain-like protease) and nsp5 (chymotrypsin-like, 3C-like, or main protease), nsp12 (RNA-dependent RNA polymerase [RdRp]), nsp13 (helicase), and other nsps which are likely involved in the transcription and replication of the virus (Table 2). There are no remarkable differences between the orfs and nsps of 2019-nCoV with those of SARS-CoV (Table 3). The major distinction between SARSr-CoV and SARS-CoV is in orf3b, Spike and orf8 but especially variable in Spike S1 and orf8 which were previously shown to be recombination hot spots. Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics. NSP Putative function/domain Amino acid position Putative cleave site nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR) nsp2 unknown A181 – G818 (LKGG'APTK) nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN) nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR) nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK) nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS) nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS) nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL) nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA) nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA) nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a) nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA) nsp13 helicase A5325 – Q5925 (ATLQ'AENV) nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN) nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA) nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b) Table 3. Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV. Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. 
SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73 Spike Spike glycoprotein comprised of S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and receptor-binding domain (RBD), while the S2 subunit contains conserved fusion peptide (FP), heptad repeat (HR) 1 and 2, transmembrane domain (TM), and cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Thus the broad spectrum antiviral peptides against S2 would be an important preventive and treatment modality for testing in animal models before clinical trials [18]. Though the S1 subunit of 2019-nCoV shares around 70% identity to that of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of RBD (excluding the external subdomain) are highly conserved (Figure 3(B)). Most of the amino acid differences of RBD are located in the external subdomain, which is responsible for the direct interaction with the host receptor. Further investigation of this soluble variable external subdomain region will reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the spike receptor binding domain (RBD) when compared with that of human SARS-CoV. But some Yunnan strains such as the WIV1 had no such deletions and can use human ACE2 as a cellular entry receptor. 
It is interesting to note that the two bat SARS-related coronavirus ZXC21 and ZC45, being closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue, and pathological changes in lung & intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity. Figure 2. Comparison of protein sequences of Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZXC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21 respectively. The black boxes represent the identity while the grey boxes represent the similarity of the four amino acid sequences. Figure 3. Comparison of protein sequences of A. Spike globular head S1, and B. S1 receptor-binding domain (RBD) subunit. Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZXC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession number MG772934.1 and MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession number AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. Orange box indicates the region of signal peptide, while green and blue boxes indicate the core domain and receptor binding domain respectively. 
Sequences of RBD, highlighted in (A) were used for comparison. External subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. Purple box indicates the external subdomain region. Orf3b A novel short putative protein with 4 helices and no homology to existing SARS-CoV or SARS-r-CoV protein was found within Orf3b (Figure 4). It is notable that SARS-CoV deletion mutants lacking orf3b replicate to levels similar to those of wild-type virus in several cell types [19], suggesting that orf3b is dispensable for viral replication in vitro. But orf3b may have a role in viral pathogenicity as Vero E6 but not 293T cells transfected with a construct expressing Orf3b underwent necrosis as early as 6 h after transfection and underwent simultaneous necrosis and apoptosis at later time points [20]. Orf3b was also shown to inhibit expression of IFN-β at synthesis and signalling [21]. Subsequently, orf3b homologues identified from three bat SARS-related-CoV strains were C-terminally truncated and lacked the C-terminal nucleus localization signal of SARS-CoV [22]. IFN antagonist activity analysis demonstrated that one SARS-related-CoV orf3b still possessed IFN antagonist and IRF3-modulating activities. These results indicated that different orf3b proteins display different IFN antagonist activities and this function is independent of the protein's nuclear localization, suggesting a potential link between bat SARS-related-CoV orf3b function and pathogenesis. The importance of this new protein in 2019-nCoV will require further validation and study. Figure 4. Analysis of orf3b. A. Multiple alignment of orf3b protein sequence between 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoV. B. A novel putative short protein found in orf3b. Orf8 orf8 is an accessory protein found in the Betacoronavirus lineage B coronaviruses. 
Human SARS-CoVs isolated from early-phase patients, all civet SARS-CoVs, and other bat SARS-related CoVs contain full-length orf8 [23]. However, a 29-nucleotide deletion, which causes the split of full length of orf8 into putative orf8a and orf8b, has been found in all SARS-CoV isolated from mid- and late- phase human patients [24]. In addition, we have previously identified two bat SARS-related-CoV (Bat-CoV YNLF_31C and YNLF_34C) and proposed that the original SARS-CoV full-length orf8 is acquired from these two bat SARS-related-CoV [25]. Since the SARS-CoV is the closest human pathogenic virus to the 2019-nCoV, we performed phylogenetic analysis and multiple alignments to investigate the orf8 amino acid sequences. The orf8 protein sequences used in the analysis derived from early phase SARS-CoV that includes full-length orf8 (human SARS-CoV GZ02), the mid- and late-phase SARS-CoV that includes the split orf8b (human SARS-CoV Tor2), civet SARS-CoV (paguma SARS-CoV), two bat SARS-related-CoV containing full-length orf8 (bat-CoV YNLF_31C and YNLF_34C), 2019-nCoV, the other two closest bat SARS-related-CoV to 2019-nCoV SL-CoV ZXC21 and ZC45), and bat SARS-related-CoV HKU3-1 (Figure 5(A)). As expected, orf8 derived from 2019-nCoV belongs to the group that includes the closest genome sequences of bat SARS-related-CoV ZXC21 and ZC45. Interestingly, the new 2019-nCoV orf8 is distant from the conserved orf8 or orf8b derived from human SARS-CoV or its related viruses derived from civet (paguma SARS-CoV) and bat (bat-CoV YNLF_31C and YNLF_34C). This new orf8 of 2019-nCoV does not contain known functional domain or motif. An aggregation motif VLVVL (amino acid 75–79) has been found in SARS-CoV orf8b (Figure 5(B)) which was shown to trigger intracellular stress pathways and activates NLRP3 inflammasomes [26], but this is absent in this novel orf8 of 2019-nCoV. 
Based on a secondary structure prediction, this novel orf8 is likely to form a protein with an alpha-helix followed by beta-sheet(s) containing six strands (Figure 5(C)). Figure 5. Analysis of orf8 to show novel putative protein. (A) Phylogenetic analysis of orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with 1000 bootstrap replicates. The evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. (C) Structural analysis of Orf8 was performed using PSI-BLAST-based secondary structure PREDiction (PSIPRED). Predicted helix structures (h) and strands (s) are boxed in red and yellow, respectively. Phylogenetic relationship among 2019-nCoV and other βCoVs The genome of 2019-nCoV has overall 89% nucleotide identity with bat SARS-related-CoV SL-CoVZXC21 (MG772934.1), and 82% with human SARS-CoV BJ01 2003 (AY278488) and human SARS-CoV Tor2 (AY274119). The phylogenetic trees constructed using the amino acid sequences of orf1a/b and the 4 structural genes (S, E, M, and N) are shown in Figure 6(A–E). For all these 5 genes, the 2019-nCoV clustered with lineage B βCoVs. It was most closely related to the bat SARS-related CoVs ZXC21 and ZC45 found in Chinese horseshoe bats (Rhinolophus sinicus) collected from Zhoushan city, Zhejiang province, China between 2015 and 2017. Thus this novel coronavirus should belong to the genus Betacoronavirus, subgenus Sarbecovirus (previously lineage 2b of Group 2 coronavirus). 
SARS-related coronaviruses have been found continuously especially in horseshoe bat species in the last 13 years. Between 2003 and 2018, 339 complete SARS-related coronavirus genomes have been sequenced, including 274 human SARS-CoV, 18 civet SARS coronavirus, and 47 bat SARS-related coronaviruses mainly from Rhinolophus bat species. Together, they formed a distinct subclade among other lineage B βCoVs. These results suggested that the 2019-nCoV might have also originated from bats. But we cannot ascertain whether another intermediate or amplification animal host infected by 2019-nCoV could be found in the epidemiological market, just as in the case of Paguma civets for SARS-CoV. Figure 6. Phylogenetic tree construction by the neighbour joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees using amino acid sequences of (A) orf1ab polypeptide; (B) Spike glycoprotein; (C) Envelope protein; (D) Membrane protein; (E) Nucleoprotein. RNA secondary structures As shown in Figure 7(A–C), the SARS-CoV 5′-UTR contains SL1, SL2, SL3, SL4, S5, SL5A, SL5B, SL5C, SL6, SL7, and SL8. The SL3 contains trans–cis motif [27]. The SL1, SL2, SL3, SL4, S5, SL5A, SL5B, and SL5C structures were similar among the 2019-nCoV, human SARS-CoV and the bat SARS-related ZC45. In the 2019-nCoV, part of the S5 found was inside the orf1a/b (marked in red), which was similar to SARS-CoV. In bat SARS-related CoV ZC45, the S5 was not found inside orf1a/b. The 2019-nCoV had the same SL6, SL7, and SL8 as SARS-CoV, and an additional stem loop. Bat SARS-related CoV ZC45 did not have the SARS-COV SL6-like stem loop. Instead, it possessed two other stem loops in this region. All three strains had similar SL7 and SL8. The bat SARS-like CoV ZC45 also had an additional stem loop between SL7 and SL8. Overall, the 5′-UTR of 2019-nCoV was more similar to that of SARS-CoV than the bat SARS-related CoV ZC 45. 
The biological relevance of the 5′-UTR structures and their effects on virulence should be investigated further. The 2019-nCoV had various 3′-UTR structures, including BSL, S1, S2, S3, S4, L1, L2, L3, and HVR (Figure 7(D–F)). The 3′-UTR was conserved among 2019-nCoV, human SARS-CoV and SARS-related CoVs [27]. Figure 7. Secondary structure prediction and comparison in the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (with minimum free energy and partition function in Fold algorithms and basic options). The SARS 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5'-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5'-UTR; (C) ZC45 5'-UTR; (D) SARS-CoV 3'-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3'-UTR; (F) ZC45 3'-UTR. In summary, 2019-nCoV is a novel lineage B Betacoronavirus closely related to bat SARS-related coronaviruses. It also has unique genomic features which deserve further investigation to ascertain their roles in the viral replication cycle and pathogenesis. More animal sampling to determine its natural animal reservoir and intermediate animal host in the market is important. This will shed light on the evolutionary history of this emerging coronavirus, which has jumped into humans after the other two zoonotic betacoronaviruses, SARS-CoV and MERS-CoV.
sec	Introduction Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded RNA viruses that belong to the subfamily Coronavirinae, family Coronaviridae, order Nidovirales. There are four genera of CoVs, namely, Alphacoronavirus (αCoV), Betacoronavirus (βCoV), Deltacoronavirus (δCoV), and Gammacoronavirus (γCoV) [1]. Evolutionary analyses have shown that bats and rodents are the gene sources of most αCoVs and βCoVs, while avian species are the gene sources of most δCoVs and γCoVs. CoVs have repeatedly crossed species barriers and some have emerged as important human pathogens. The best-known examples include severe acute respiratory syndrome CoV (SARS-CoV), which emerged in China in 2002–2003 and caused a large-scale epidemic with about 8000 infections and 800 deaths, and Middle East respiratory syndrome CoV (MERS-CoV), which has caused a persistent epidemic in the Arabian Peninsula since 2012 [2,3]. In both epidemics, the viruses likely originated from bats and then jumped into a mammalian amplification host [the Himalayan palm civet (Paguma larvata) for SARS-CoV and the dromedary camel (Camelus dromedarius) for MERS-CoV] before crossing species barriers to infect humans. Prior to December 2019, six CoVs were known to infect humans, including two αCoVs (HCoV-229E and HCoV-NL63) and four βCoVs (HCoV-OC43 [lineage A], HCoV-HKU1 [lineage A], SARS-CoV [lineage B] and MERS-CoV [lineage C]). The lineage A βCoVs HCoV-OC43 and HCoV-HKU1 usually cause self-limiting upper respiratory infections in immunocompetent hosts and occasionally lower respiratory tract infections in immunocompromised hosts and the elderly [4]. 
In contrast, SARS-CoV (lineage B βCoV) and MERS-CoV (lineage C βCoV) may cause severe lower respiratory tract infection with acute respiratory distress syndrome and extrapulmonary manifestations, such as diarrhea, lymphopenia, deranged liver and renal function tests, and multiorgan dysfunction syndrome, among both immunocompetent and immunocompromised hosts, with mortality rates of ∼10% and ∼35%, respectively [5,6]. On 31 December 2019, the World Health Organization (WHO) was informed of cases of pneumonia of unknown cause in Wuhan City, Hubei Province, China [7]. Subsequent virological testing detected a novel CoV in these patients. As of 16 January 2020, 43 patients had been diagnosed with infection by this novel CoV, including two exported cases of mild pneumonia in Thailand and Japan [8,9]. The earliest date of symptom onset was 1 December 2019 [10]. The symptoms of these patients included fever, malaise, dry cough, and dyspnea. Among 41 patients admitted to a designated hospital in Wuhan, 13 (32%) required intensive care and 6 (15%) died. All 41 patients had pneumonia with abnormal findings on chest computerized tomography scans [10]. We recently reported a familial cluster of 2019-nCoV infection in a Shenzhen family with travel history to Wuhan [11]. In the present study, we analyzed a complete 2019-nCoV genome from a patient in this familial cluster and compared it with the genomes of related βCoVs to provide insights into the potential source and control strategies.
title	Introduction
p	Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded RNA viruses that belong to the subfamily Coronavirinae, family Coronaviridae, order Nidovirales. There are four genera of CoVs, namely, Alphacoronavirus (αCoV), Betacoronavirus (βCoV), Deltacoronavirus (δCoV), and Gammacoronavirus (γCoV) [1]. Evolutionary analyses have shown that bats and rodents are the gene sources of most αCoVs and βCoVs, while avian species are the gene sources of most δCoVs and γCoVs. CoVs have repeatedly crossed species barriers and some have emerged as important human pathogens. The best-known examples include severe acute respiratory syndrome CoV (SARS-CoV), which emerged in China in 2002–2003 and caused a large-scale epidemic with about 8000 infections and 800 deaths, and Middle East respiratory syndrome CoV (MERS-CoV), which has caused a persistent epidemic in the Arabian Peninsula since 2012 [2,3]. In both epidemics, the viruses likely originated from bats and then jumped into a mammalian amplification host [the Himalayan palm civet (Paguma larvata) for SARS-CoV and the dromedary camel (Camelus dromedarius) for MERS-CoV] before crossing species barriers to infect humans.
p	Prior to December 2019, six CoVs were known to infect humans, including two αCoVs (HCoV-229E and HCoV-NL63) and four βCoVs (HCoV-OC43 [lineage A], HCoV-HKU1 [lineage A], SARS-CoV [lineage B] and MERS-CoV [lineage C]). The lineage A βCoVs HCoV-OC43 and HCoV-HKU1 usually cause self-limiting upper respiratory infections in immunocompetent hosts and occasionally lower respiratory tract infections in immunocompromised hosts and the elderly [4]. In contrast, SARS-CoV (lineage B βCoV) and MERS-CoV (lineage C βCoV) may cause severe lower respiratory tract infection with acute respiratory distress syndrome and extrapulmonary manifestations, such as diarrhea, lymphopenia, deranged liver and renal function tests, and multiorgan dysfunction syndrome, among both immunocompetent and immunocompromised hosts, with mortality rates of ∼10% and ∼35%, respectively [5,6]. On 31 December 2019, the World Health Organization (WHO) was informed of cases of pneumonia of unknown cause in Wuhan City, Hubei Province, China [7]. Subsequent virological testing detected a novel CoV in these patients. As of 16 January 2020, 43 patients had been diagnosed with infection by this novel CoV, including two exported cases of mild pneumonia in Thailand and Japan [8,9]. The earliest date of symptom onset was 1 December 2019 [10]. The symptoms of these patients included fever, malaise, dry cough, and dyspnea. Among 41 patients admitted to a designated hospital in Wuhan, 13 (32%) required intensive care and 6 (15%) died. All 41 patients had pneumonia with abnormal findings on chest computerized tomography scans [10].
p	We recently reported a familial cluster of 2019-nCoV infection in a Shenzhen family with travel history to Wuhan [11]. In the present study, we analyzed a complete 2019-nCoV genome from a patient in this familial cluster and compared it with the genomes of related βCoVs to provide insights into the potential source and control strategies.
sec	Materials and methods Viral sequences The complete genome sequence of 2019-nCoV HKU-SZ-005b was available at GenBank (accession no. MN975262) (Table 1). The representative complete genomes of other related βCoVs strains collected from human or mammals were included for comparative analysis. These included strains collected from human, bats, and Himalayan palm civet between 2003 and 2018, with one 229E coronavirus strain as the outgroup. Table 1. List of coronaviruses used in this study. Accession number Name displayed on the tree Name of full-length genome Year AY274119 Human SARS-CoV Tor2 2003 SARS-related coronavirus isolate Tor2 2003 AY278488 Human SARS-CoV BJ01 2003 SARS coronavirus BJ01 2003 AY278491 SARS coronavirus HKU-39849 2003 SARS coronavirus HKU-39849 2003 2003 AY390556 Human SARS-CoV GZ02 2003 SARS coronavirus GZ02 2003 AY391777 Human CoV OC43 2003 Human coronavirus OC43 2003 AY515512 Paguma SARS CoV HC/SZ/61/03 2003 SARS coronavirus HC/SZ/61/03 (paguma SARS) 2018 EF065513 Bat CoV HKU9-1 2006 Bat coronavirus HKU9-1 2006 FJ588686 Bat SL-CoV Rs672 2006 Bat SARS CoV Rs672/2006 2006 KC881005 Bat SL-CoV RsSHC014 2013 Bat SARS-like coronavirus RsSHC014 2013 KC881006 Bat SL-CoV Rs3367 2013 Bat SARS-like coronavirus Rs3367 2013 KY417146 Bat SL-CoV Rs4231 2016 Bat SARS-like coronavirus isolate Rs4231 2016 KY417149 Bat SL-CoV Rs4255 2016 Bat SARS-like coronavirus isolate Rs4255 2016 MG772933 Bat SL-CoV ZC45 2018 Bat SARS-like coronavirus isolate bat-SL-CoVZC45 2018 MG772934 Bat SL-CoV ZXC21 2018 Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 2018 MK211377 Bat CoV YN2018C 2018 Coronavirus BtRs-BetaCoV/YN2018C 2018 MK211378 Bat CoV YN2018D 2018 Coronavirus BtRs-BetaCoV/YN2018Da 2018 MN975262 HKU-SZ-005b Human 2019-nCoV HKU-SZ-005b 2020 NC002645 Human CoV 229E 2000 Human coronavirus 229E 2000 NC006577 Human CoV HKU1 2004 Human coronavirus HKU1 2004 NC009019 Bat CoV HKU4-1 2006 Bat coronavirus HKU4-1 2006 NC009020 Bat CoV HKU5-1 2006 Bat coronavirus HKU5-1 
2006 NC014470 Bat SARS-related CoV BM48-31 2009 Bat coronavirus BM48-31/BGR/2008 2008 NC019843 Human MERS-CoV 2012 Middle East respiratory syndrome coronavirus 2012 a: One nucleotide was added within the M gene to maintain the sequence in-frame. Genome characterization and phylogenetic analysis Phylogenetic tree construction by the neighbour joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees [12]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [13]. The tree was drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site [14]. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X [15]. Multiple alignment was performed using CLUSTAL 2.1 and further visualized using BOXSHADE 3.21. Structural analysis of orf8 was performed using PSI-BLAST-based secondary structure PREDiction (PSIPRED) [16]. To predict protein secondary structure, including beta sheet, alpha helix, and coil, the amino acid sequences were submitted to PSIPRED, which analyses them with its neural-network-based algorithm. Predicted structures were visualized and highlighted on the BOXSHADE alignment. Prediction of transmembrane domains was performed using the TMHMM 2.0 server (http://www.cbs.dtu.dk/services/TMHMM/). Secondary structure prediction in the 5′-untranslated region (UTR) and 3′-UTR was performed using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) with the minimum free energy (MFE) and partition function option under Fold algorithms and basic options. 
The human SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results.
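The Poisson correction with pairwise deletion described in the methods can be sketched as follows. This is a minimal illustration, not the MEGA X implementation: the function names, the set of characters treated as gaps or ambiguous residues, and the toy sequences are assumptions made for this example. For each aligned pair, sites where either sequence carries a gap or ambiguous residue are skipped, the proportion p of differing sites is computed over the remaining sites, and the distance is d = -ln(1 - p) amino acid substitutions per site.

```python
import math

# Assumption: characters treated as gaps or ambiguous residues and skipped
# per sequence pair (pairwise deletion). MEGA X may treat more symbols this way.
GAP_OR_AMBIGUOUS = set("-?X")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned protein sequences,
    counting only sites where both sequences have an unambiguous residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_OR_AMBIGUOUS or b in GAP_OR_AMBIGUOUS:
            continue  # pairwise deletion: skip this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from any sequence in Table 1.
    first = "MKV-LA"
    second = "MKILXA"
    print(p_distance(first, second))
    print(poisson_distance(first, second))
```

A matrix of such pairwise distances is the kind of input a neighbour-joining tree builder takes. The correction matters little for closely related sequences, where d is close to p, and grows as sequences diverge.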
title	Materials and methods
title	Viral sequences
p	The complete genome sequence of 2019-nCoV HKU-SZ-005b is available at GenBank (accession no. MN975262) (Table 1). Representative complete genomes of related βCoV strains collected from humans or other mammals were included for comparative analysis. These comprised strains collected from humans, bats, and Himalayan palm civets between 2003 and 2018, with one human coronavirus 229E strain as the outgroup.
label	Table 1.
caption	List of coronaviruses used in this study.
tr	Accession number	Name displayed on the tree	Name of full-length genome	Year
tr	AY274119	Human SARS-CoV Tor2 2003	SARS-related coronavirus isolate Tor2	2003
tr	AY278488	Human SARS-CoV BJ01 2003	SARS coronavirus BJ01	2003
tr	AY278491	SARS coronavirus HKU-39849 2003	SARS coronavirus HKU-39849 2003	2003
tr	AY390556	Human SARS-CoV GZ02 2003	SARS coronavirus GZ02	2003
tr	AY391777	Human CoV OC43 2003	Human coronavirus OC43	2003
tr	AY515512	Paguma SARS CoV HC/SZ/61/03 2003	SARS coronavirus HC/SZ/61/03 (paguma SARS)	2018
tr	EF065513	Bat CoV HKU9-1 2006	Bat coronavirus HKU9-1	2006
tr	FJ588686	Bat SL-CoV Rs672 2006	Bat SARS CoV Rs672/2006	2006
tr	KC881005	Bat SL-CoV RsSHC014 2013	Bat SARS-like coronavirus RsSHC014	2013
tr	KC881006	Bat SL-CoV Rs3367 2013	Bat SARS-like coronavirus Rs3367	2013
tr	KY417146	Bat SL-CoV Rs4231 2016	Bat SARS-like coronavirus isolate Rs4231	2016
tr	KY417149	Bat SL-CoV Rs4255 2016	Bat SARS-like coronavirus isolate Rs4255	2016
tr	MG772933	Bat SL-CoV ZC45 2018	Bat SARS-like coronavirus isolate bat-SL-CoVZC45	2018
tr	MG772934	Bat SL-CoV ZXC21 2018	Bat SARS-like coronavirus isolate bat-SL-CoVZXC21	2018
tr	MK211377	Bat CoV YN2018C 2018	Coronavirus BtRs-BetaCoV/YN2018C	2018
tr	MK211378	Bat CoV YN2018D 2018	Coronavirus BtRs-BetaCoV/YN2018Dᵃ	2018
tr	MN975262	HKU-SZ-005b	Human 2019-nCoV HKU-SZ-005b	2020
tr	NC002645	Human CoV 229E 2000	Human coronavirus 229E	2000
tr	NC006577	Human CoV HKU1 2004	Human coronavirus HKU1	2004
tr	NC009019	Bat CoV HKU4-1 2006	Bat coronavirus HKU4-1	2006
tr	NC009020	Bat CoV HKU5-1 2006	Bat coronavirus HKU5-1	2006
tr	NC014470	Bat SARS-related CoV BM48-31 2009	Bat coronavirus BM48-31/BGR/2008	2008
tr	NC019843	Human MERS-CoV 2012	Middle East respiratory syndrome coronavirus	2012
table-wrap-foot	ᵃ One nucleotide was added within the M gene to maintain the sequence in-frame.
title	Genome characterization and phylogenetic analysis
p	Phylogenetic trees were constructed by the neighbour-joining method using MEGA X software, with bootstrap values calculated from 1000 replicate trees [12]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [13]. Each tree was drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer it. The evolutionary distances were computed using the Poisson correction method and are expressed as the number of amino acid substitutions per site [14]. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X [15]. Multiple alignment was performed using CLUSTAL 2.1 and visualized using BOXSHADE 3.21. Structural analysis of orf8 was performed using PSI-blast-based secondary structure PREDiction (PSIPRED) [16]. To predict protein secondary structure (beta sheet, alpha helix, and coil), the amino acid sequences were submitted to PSIPRED and analysed by its neural-network-based algorithm. Predicted structures were visualized and highlighted on the BOXSHADE alignment. Transmembrane domains were predicted using the TMHMM 2.0 server (http://www.cbs.dtu.dk/services/TMHMM/). Secondary structures in the 5′-untranslated region (UTR) and 3′-UTR were predicted using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) with the minimum free energy (MFE) and partition function fold algorithm and basic options. The human SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results.
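p	The Poisson correction with pairwise deletion described above can be illustrated with a minimal Python sketch. It is not the MEGA X implementation; the function names and the toy sequences are hypothetical, and the sketch assumes the standard form of the correction, d = -ln(1 - p), where p is the proportion of differing amino acid sites among the positions that are unambiguous in both sequences.

```python
import math

# The 20 standard amino acids; anything else (gaps, X, B, Z, ...) is treated
# as ambiguous and dropped for that sequence pair only (pairwise deletion).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Pairwise deletion: skip a column if either residue is ambiguous.
        if a not in AMINO_ACIDS or b not in AMINO_ACIDS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in this sequence pair")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance in amino acid substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical toy alignment: 6 comparable sites, 1 difference.
    print(round(poisson_distance("MKT-LLVA", "MRTALXVA"), 4))
```

p	A matrix of such pairwise distances is the input that a neighbour-joining routine would then use to build the tree; the bootstrap step repeats the calculation on alignments whose columns have been resampled with replacement.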
sec	Results and discussion Genome organization The single-stranded RNA genome of the 2019-nCoV was 29891 nucleotides in length, encoding 9860 amino acids. The G + C content was 38%. Similar to other βCoVs, the 2019-nCoV genome contains two flanking untranslated regions (UTRs) and a single long open reading frame encoding a polyprotein. The 2019-nCoV genome is arranged in the order of 5′-replicase (orf1a/b)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]−3′ and lacks the hemagglutinin-esterase gene which is characteristically found in lineage A βCoVs (Figure 1). Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5′-untranslated region (5′-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3′-untranslated region (3′-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale. There are 12 putative, functional open reading frames (orfs) expressed from a nested set of 9 subgenomic mRNAs carrying a conserved leader sequence in the genome, 9 transcription-regulatory sequences, and 2 terminal untranslated regions. The 5′- and 3′-UTRs are 265 and 358 nucleotides long, respectively. The 5′- and 3′-UTR sequences of 2019-nCoV are similar to those of other βCoVs with nucleotide identities of ⩾83.6%.
The large replicase polyproteins pp1a and pp1ab, encoded by the partially overlapping 5′-terminal orf1a/b within the 5′ two-thirds of the genome, are proteolytically cleaved into 16 putative non-structural proteins (nsps). These putative nsps include two viral cysteine proteases, namely, nsp3 (papain-like protease) and nsp5 (chymotrypsin-like, 3C-like, or main protease), nsp12 (RNA-dependent RNA polymerase [RdRp]), nsp13 (helicase), and other nsps which are likely involved in the transcription and replication of the virus (Table 2). There are no remarkable differences between the orfs and nsps of 2019-nCoV and those of SARS-CoV (Table 3). The major distinctions between SARSr-CoV and SARS-CoV are in orf3b, Spike and orf8, with Spike S1 and orf8 being especially variable; both were previously shown to be recombination hot spots. Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics. Columns: NSP; putative function/domain; amino acid position; putative cleavage site. nsp1; suppress antiviral host response; M1 – G180; (LNGG'AYTR). nsp2; unknown; A181 – G818; (LKGG'APTK). nsp3; putative PL-pro domain; A819 – G2763; (LKGG'KIVN). nsp4; complex with nsp3 and 6: DMV formation; K2764 – Q3263; (AVLQ'SGFR). nsp5; 3CL-pro domain; S3264 – Q3569; (VTFQ'SAVK). nsp6; complex with nsp3 and 4: DMV formation; S3570 – Q3859; (ATVQ'SKMS). nsp7; complex with nsp8: primase; S3860 – Q3942; (ATLQ'AIAS). nsp8; complex with nsp7: primase; A3943 – Q4140; (VKLQ'NNEL). nsp9; RNA/DNA binding activity; N4141 – Q4253; (VRLQ'AGNA). nsp10; complex with nsp14: replication fidelity; A4254 – Q4392; (PMLQ'SADA). nsp11; short peptide at the end of orf1a; S4393 – V4405; (end of orf1a). nsp12; RNA-dependent RNA polymerase; S4393 – Q5324; (TVLQ'AVGA). nsp13; helicase; A5325 – Q5925; (ATLQ'AENV). nsp14; ExoN: 3′–5′ exonuclease; A5926 – Q6452; (TRLQ'SLEN). nsp15; XendoU: poly(U)-specific endoribonuclease; S6453 – Q6798; (PKLQ'SSQA). nsp16; 2'-O-MT: 2'-O-ribose methyltransferase; S6799 – N7096; (end of orf1b). Table 3.
Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV. Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73 Spike Spike glycoprotein comprised of S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and receptor-binding domain (RBD), while the S2 subunit contains conserved fusion peptide (FP), heptad repeat (HR) 1 and 2, transmembrane domain (TM), and cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Thus the broad spectrum antiviral peptides against S2 would be an important preventive and treatment modality for testing in animal models before clinical trials [18]. Though the S1 subunit of 2019-nCoV shares around 70% identity to that of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of RBD (excluding the external subdomain) are highly conserved (Figure 3(B)). Most of the amino acid differences of RBD are located in the external subdomain, which is responsible for the direct interaction with the host receptor. Further investigation of this soluble variable external subdomain region will reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the spike receptor binding domain (RBD) when compared with that of human SARS-CoV. 
But some Yunnan strains such as the WIV1 had no such deletions and can use human ACE2 as a cellular entry receptor. It is interesting to note that the two bat SARS-related coronavirus ZXC21 and ZC45, being closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue, and pathological changes in lung & intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity. Figure 2. Comparison of protein sequences of Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZXC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21 respectively. The black boxes represent the identity while the grey boxes represent the similarity of the four amino acid sequences. Figure 3. Comparison of protein sequences of A. Spike globular head S1, and B. S1 receptor-binding domain (RBD) subunit. Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZXC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession number MG772934.1 and MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession number AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. 
Orange box indicates the region of signal peptide, while green and blue boxes indicate the core domain and receptor binding domain respectively. Sequences of RBD, highlighted in (A) were used for comparison. External subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. Purple box indicates the external subdomain region. Orf3b A novel short putative protein with 4 helices and no homology to existing SARS-CoV or SARS-r-CoV protein was found within Orf3b (Figure 4). It is notable that SARS-CoV deletion mutants lacking orf3b replicate to levels similar to those of wild-type virus in several cell types [19], suggesting that orf3b is dispensable for viral replication in vitro. But orf3b may have a role in viral pathogenicity as Vero E6 but not 293T cells transfected with a construct expressing Orf3b underwent necrosis as early as 6 h after transfection and underwent simultaneous necrosis and apoptosis at later time points [20]. Orf3b was also shown to inhibit expression of IFN-β at synthesis and signalling [21]. Subsequently, orf3b homologues identified from three bat SARS-related-CoV strains were C-terminally truncated and lacked the C-terminal nucleus localization signal of SARS-CoV [22]. IFN antagonist activity analysis demonstrated that one SARS-related-CoV orf3b still possessed IFN antagonist and IRF3-modulating activities. These results indicated that different orf3b proteins display different IFN antagonist activities and this function is independent of the protein's nuclear localization, suggesting a potential link between bat SARS-related-CoV orf3b function and pathogenesis. The importance of this new protein in 2019-nCoV will require further validation and study. Figure 4. Analysis of orf3b. A. Multiple alignment of orf3b protein sequence between 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoV. B. A novel putative short protein found in orf3b. 
Orf8 orf8 is an accessory protein found in the Betacoronavirus lineage B coronaviruses. Human SARS-CoVs isolated from early-phase patients, all civet SARS-CoVs, and other bat SARS-related CoVs contain full-length orf8 [23]. However, a 29-nucleotide deletion, which causes the split of full length of orf8 into putative orf8a and orf8b, has been found in all SARS-CoV isolated from mid- and late- phase human patients [24]. In addition, we have previously identified two bat SARS-related-CoV (Bat-CoV YNLF_31C and YNLF_34C) and proposed that the original SARS-CoV full-length orf8 is acquired from these two bat SARS-related-CoV [25]. Since the SARS-CoV is the closest human pathogenic virus to the 2019-nCoV, we performed phylogenetic analysis and multiple alignments to investigate the orf8 amino acid sequences. The orf8 protein sequences used in the analysis derived from early phase SARS-CoV that includes full-length orf8 (human SARS-CoV GZ02), the mid- and late-phase SARS-CoV that includes the split orf8b (human SARS-CoV Tor2), civet SARS-CoV (paguma SARS-CoV), two bat SARS-related-CoV containing full-length orf8 (bat-CoV YNLF_31C and YNLF_34C), 2019-nCoV, the other two closest bat SARS-related-CoV to 2019-nCoV SL-CoV ZXC21 and ZC45), and bat SARS-related-CoV HKU3-1 (Figure 5(A)). As expected, orf8 derived from 2019-nCoV belongs to the group that includes the closest genome sequences of bat SARS-related-CoV ZXC21 and ZC45. Interestingly, the new 2019-nCoV orf8 is distant from the conserved orf8 or orf8b derived from human SARS-CoV or its related viruses derived from civet (paguma SARS-CoV) and bat (bat-CoV YNLF_31C and YNLF_34C). This new orf8 of 2019-nCoV does not contain known functional domain or motif. An aggregation motif VLVVL (amino acid 75–79) has been found in SARS-CoV orf8b (Figure 5(B)) which was shown to trigger intracellular stress pathways and activates NLRP3 inflammasomes [26], but this is absent in this novel orf8 of 2019-nCoV. 
Based on a secondary structure prediction, this novel orf8 is highly likely to form a protein with an alpha-helix, followed by beta-sheet(s) containing six strands (Figure 5(C)). Figure 5. Analysis of orf8 to show novel putative protein. (A) Phylogenetic analysis of orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession number MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with 1000 bootstrap replicates. The evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. (C) Structural analysis of Orf8 was performed using PSI-blast-based secondary structure PREDiction (PSIPRED). Predicted helix structure (h) and strand (s) were boxed in red and yellow, respectively. Phylogenetic relationship among 2019-nCoV and other βCoVs The genome of 2019-nCoV has overall 89% nucleotide identity with bat SARS-related-CoV SL-CoVZXC21 (MG772934.1), and 82% with human SARS-CoV BJ01 2003 (AY278488) and human SARS-CoV Tor2 (AY274119). The phylogenetic trees constructed using the amino acid sequences of orf1a/b and the 4 structural genes (S, E, M, and N) are shown (Figure 6(A–E)). For all these 5 genes, the 2019-nCoV clustered with lineage B βCoVs. It was most closely related to the bat SARS-related CoVs ZXC21 and ZC45 found in Chinese horseshoe bats (Rhinolophus sinicus) collected from Zhoushan city, Zhejiang province, China between 2015 and 2017. Thus this novel coronavirus should belong to the genus Betacoronavirus, subgenus Sarbecovirus (previously lineage 2b of Group 2 coronavirus).
SARS-related coronaviruses have been found continuously, especially in horseshoe bat species, in the last 13 years. Between 2003 and 2018, 339 complete SARS-related coronavirus genomes were sequenced, including 274 human SARS-CoV, 18 civet SARS coronavirus, and 47 bat SARS-related coronaviruses mainly from Rhinolophus bat species. Together, they formed a distinct subclade among other lineage B βCoVs. These results suggested that the 2019-nCoV might have also originated from bats. But we cannot ascertain whether another intermediate or amplification animal host infected by 2019-nCoV could be found in the epidemiologically linked market, just as in the case of Paguma civets for SARS-CoV. Figure 6. Phylogenetic tree construction by the neighbour-joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees using amino acid sequences of (A) orf1ab polypeptide; (B) Spike glycoprotein; (C) Envelope protein; (D) Membrane protein; (E) Nucleoprotein. RNA secondary structures As shown in Figure 7(A–C), the SARS-CoV 5′-UTR contains SL1, SL2, SL3, SL4, S5, SL5A, SL5B, SL5C, SL6, SL7, and SL8. The SL3 contains trans–cis motif [27]. The SL1, SL2, SL3, SL4, S5, SL5A, SL5B, and SL5C structures were similar among the 2019-nCoV, human SARS-CoV and the bat SARS-related CoV ZC45. In the 2019-nCoV, part of the S5 found was inside the orf1a/b (marked in red), which was similar to SARS-CoV. In bat SARS-related CoV ZC45, the S5 was not found inside orf1a/b. The 2019-nCoV had the same SL6, SL7, and SL8 as SARS-CoV, and an additional stem loop. Bat SARS-related CoV ZC45 did not have the SARS-CoV SL6-like stem loop. Instead, it possessed two other stem loops in this region. All three strains had similar SL7 and SL8. The bat SARS-like CoV ZC45 also had an additional stem loop between SL7 and SL8. Overall, the 5′-UTR of 2019-nCoV was more similar to that of SARS-CoV than to that of the bat SARS-related CoV ZC45.
The biological relevance of the 5′-UTR structures and their effects on virulence should be investigated further. The 2019-nCoV had various 3′-UTR structures, including BSL, S1, S2, S3, S4, L1, L2, L3, and HVR (Figure 7(D–F)). The 3′-UTR was conserved among 2019-nCoV, human SARS-CoV and SARS-related CoVs [27]. Figure 7. Secondary structure prediction and comparison in the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (with minimum free energy and partition function in Fold algorithms and basic options). The SARS 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5′-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5′-UTR; (C) ZC45 5′-UTR; (D) SARS-CoV 3′-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3′-UTR; (F) ZC45 3′-UTR. In summary, 2019-nCoV is a novel lineage B Betacoronavirus closely related to bat SARS-related coronaviruses. It also has unique genomic features that deserve further investigation to ascertain their roles in the viral replication cycle and pathogenesis. More animal sampling to determine its natural animal reservoir and intermediate animal host in the market is important. This will shed light on the evolutionary history of this emerging coronavirus, which has jumped into humans after the other two zoonotic betacoronaviruses, SARS-CoV and MERS-CoV.
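p	The G + C content and the pairwise amino acid identities reported in this section (for example in Table 3) can be illustrated with a small Python sketch. The function names and toy sequences are hypothetical, and the identity denominator used here (alignment columns in which neither sequence has a gap) is an assumption; the source does not state which convention was used, and other tools divide by the full alignment length or by the shorter sequence.

```python
def gc_content(nucleotides):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in nucleotides.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without gaps.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy inputs, not taken from any of the genomes in Table 1.
    print(gc_content("ATGC"))
    print(percent_identity("MKT-LV", "MRTALV"))
```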
title	Results and discussion
sec	Genome organization The single-stranded RNA genome of the 2019-nCoV was 29891 nucleotides in length, encoding 9860 amino acids. The G + C content was 38%. Similar to other βCoVs, the 2019-nCoV genome contains two flanking untranslated regions (UTRs) and a single long open reading frame encoding a polyprotein. The 2019-nCoV genome is arranged in the order of 5′-replicase (orf1a/b)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]−3′ and lacks the hemagglutinin-esterase gene which is characteristically found in lineage A βCoVs (Figure 1). Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5′-untranslated region (5′-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3′-untranslated region (3′-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale. There are 12 putative, functional open reading frames (orfs) expressed from a nested set of 9 subgenomic mRNAs carrying a conserved leader sequence in the genome, 9 transcription-regulatory sequences, and 2 terminal untranslated regions. The 5′- and 3′-UTRs are 265 and 358 nucleotides long, respectively. The 5′- and 3′-UTR sequences of 2019-nCoV are similar to those of other βCoVs with nucleotide identities of ⩾83.6%.
The large replicase polyproteins pp1a and pp1ab encoded by the partially overlapping 5′-terminal orf1a/b within the 5′ two-thirds of the genome are proteolytically cleaved into 16 putative non-structural proteins (nsps). These putative nsps include two viral cysteine proteases, namely, nsp3 (papain-like protease) and nsp5 (chymotrypsin-like, 3C-like, or main protease), nsp12 (RNA-dependent RNA polymerase [RdRp]), nsp13 (helicase), and other nsps which are likely involved in the transcription and replication of the virus (Table 2). There are no remarkable differences between the orfs and nsps of 2019-nCoV and those of SARS-CoV (Table 3). The major distinctions between SARSr-CoV and SARS-CoV are in orf3b, Spike and orf8, with Spike S1 and orf8, which were previously shown to be recombination hot spots, being especially variable. Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics. NSP Putative function/domain Amino acid position Putative cleavage site nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR) nsp2 unknown A181 – G818 (LKGG'APTK) nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN) nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR) nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK) nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS) nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS) nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL) nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA) nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA) nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a) nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA) nsp13 helicase A5325 – Q5925 (ATLQ'AENV) nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN) nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA) nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b) Table 3. 
Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV. Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73 Spike The Spike glycoprotein comprises S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and receptor-binding domain (RBD), while the S2 subunit contains a conserved fusion peptide (FP), heptad repeat (HR) 1 and 2, transmembrane domain (TM), and cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Thus, broad-spectrum antiviral peptides against S2 would be an important preventive and treatment modality for testing in animal models before clinical trials [18]. Though the S1 subunit of 2019-nCoV shares around 70% identity with that of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of RBD (excluding the external subdomain) is highly conserved (Figure 3(B)). Most of the amino acid differences of RBD are located in the external subdomain, which is responsible for the direct interaction with the host receptor. Further investigation of this soluble variable external subdomain region will help reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the spike receptor binding domain (RBD) when compared with that of human SARS-CoV. 
But some Yunnan strains such as WIV1 have no such deletions and can use human ACE2 as a cellular entry receptor. It is interesting to note that the two bat SARS-related coronaviruses ZXC21 and ZC45, being closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue, and pathological changes in the lung and intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity. Figure 2. Comparison of protein sequences of Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZXC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black boxes represent the identity while the grey boxes represent the similarity of the four amino acid sequences. Figure 3. Comparison of protein sequences of A. Spike globular head S1, and B. S1 receptor-binding domain (RBD) subunit. Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZXC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession numbers MG772934.1, MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession numbers AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. 
The orange box indicates the region of the signal peptide, while the green and blue boxes indicate the core domain and receptor binding domain, respectively. Sequences of RBD highlighted in (A) were used for comparison. The external subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. The purple box indicates the external subdomain region. Orf3b A novel short putative protein with 4 helices and no homology to existing SARS-CoV or SARS-r-CoV proteins was found within Orf3b (Figure 4). It is notable that SARS-CoV deletion mutants lacking orf3b replicate to levels similar to those of wild-type virus in several cell types [19], suggesting that orf3b is dispensable for viral replication in vitro. But orf3b may have a role in viral pathogenicity as Vero E6 but not 293T cells transfected with a construct expressing Orf3b underwent necrosis as early as 6 h after transfection and underwent simultaneous necrosis and apoptosis at later time points [20]. Orf3b was also shown to inhibit expression of IFN-β at synthesis and signalling [21]. Subsequently, orf3b homologues identified from three bat SARS-related-CoV strains were found to be C-terminally truncated and to lack the C-terminal nuclear localization signal of SARS-CoV [22]. IFN antagonist activity analysis demonstrated that one SARS-related-CoV orf3b still possessed IFN antagonist and IRF3-modulating activities. These results indicated that different orf3b proteins display different IFN antagonist activities and that this function is independent of the protein's nuclear localization, suggesting a potential link between bat SARS-related-CoV orf3b function and pathogenesis. The importance of this new protein in 2019-nCoV will require further validation and study. Figure 4. Analysis of orf3b. A. Multiple alignment of orf3b protein sequence between 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoV. B. A novel putative short protein found in orf3b. 
Orf8 orf8 is an accessory protein found in the Betacoronavirus lineage B coronaviruses. Human SARS-CoVs isolated from early-phase patients, all civet SARS-CoVs, and other bat SARS-related CoVs contain full-length orf8 [23]. However, a 29-nucleotide deletion, which splits the full-length orf8 into putative orf8a and orf8b, has been found in all SARS-CoVs isolated from mid- and late-phase human patients [24]. In addition, we have previously identified two bat SARS-related-CoVs (Bat-CoV YNLF_31C and YNLF_34C) and proposed that the original SARS-CoV full-length orf8 was acquired from these two bat SARS-related-CoVs [25]. Since SARS-CoV is the closest human pathogenic virus to the 2019-nCoV, we performed phylogenetic analysis and multiple alignments to investigate the orf8 amino acid sequences. The orf8 protein sequences used in the analysis were derived from early-phase SARS-CoV that includes full-length orf8 (human SARS-CoV GZ02), the mid- and late-phase SARS-CoV that includes the split orf8b (human SARS-CoV Tor2), civet SARS-CoV (paguma SARS-CoV), two bat SARS-related-CoVs containing full-length orf8 (bat-CoV YNLF_31C and YNLF_34C), 2019-nCoV, the other two bat SARS-related-CoVs closest to 2019-nCoV (SL-CoV ZXC21 and ZC45), and bat SARS-related-CoV HKU3-1 (Figure 5(A)). As expected, orf8 derived from 2019-nCoV belongs to the group that includes the closest genome sequences of bat SARS-related-CoV ZXC21 and ZC45. Interestingly, the new 2019-nCoV orf8 is distant from the conserved orf8 or orf8b derived from human SARS-CoV or its related viruses derived from civet (paguma SARS-CoV) and bat (bat-CoV YNLF_31C and YNLF_34C). This new orf8 of 2019-nCoV does not contain any known functional domain or motif. An aggregation motif VLVVL (amino acids 75–79) has been found in SARS-CoV orf8b (Figure 5(B)), which was shown to trigger intracellular stress pathways and activate NLRP3 inflammasomes [26], but this motif is absent in this novel orf8 of 2019-nCoV. 
Based on a secondary structure prediction, this novel orf8 is likely to form a protein with an alpha-helix, followed by a beta-sheet(s) containing six strands (Figure 5(C)). Figure 5. Analysis of orf8 to show novel putative protein. (A) Phylogenetic analysis of orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZXC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with 1000 bootstrap replicates. The evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. (C) Structural analysis of Orf8 was performed using PSI-blast-based secondary structure PREDiction (PSIPRED). Predicted helix structure (h) and strand (s) were boxed with red and yellow, respectively. Phylogenetic relationship among 2019-nCoV and other βCoVs The genome of 2019-nCoV has overall 89% nucleotide identity with bat SARS-related-CoV SL-CoVZXC21 (MG772934.1), and 82% with human SARS-CoV BJ01 2003 (AY278488) and human SARS-CoV Tor2 (AY274119). The phylogenetic trees constructed using the amino acid sequences of orf1a/b and the 4 structural genes (S, E, M, and N) are shown in Figure 6(A–E). For all these 5 genes, the 2019-nCoV was clustered with lineage B βCoVs. It was most closely related to the bat SARS-related CoVs ZXC21 and ZC45 found in Chinese horseshoe bats (Rhinolophus sinicus) collected from Zhoushan city, Zhejiang province, China between 2015 and 2017. Thus this novel coronavirus should belong to the genus Betacoronavirus, subgenus Sarbecovirus (previously lineage 2b of Group 2 coronavirus). 
SARS-related coronaviruses have been found continuously, especially in horseshoe bat species, in the last 13 years. Between 2003 and 2018, 339 complete SARS-related coronavirus genomes were sequenced, including 274 human SARS-CoV, 18 civet SARS coronavirus, and 47 bat SARS-related coronaviruses mainly from Rhinolophus bat species. Together, they formed a distinct subclade among other lineage B βCoVs. These results suggested that the 2019-nCoV might have also originated from bats. But we cannot ascertain whether another intermediate or amplification animal host infected by 2019-nCoV could be found in the epidemiologically linked market, just as in the case of Paguma civets for SARS-CoV. Figure 6. Phylogenetic tree construction by the neighbour-joining method was performed using MEGA X software, with bootstrap values being calculated from 1000 trees using amino acid sequences of (A) orf1ab polypeptide; (B) Spike glycoprotein; (C) Envelope protein; (D) Membrane protein; (E) Nucleoprotein. RNA secondary structures As shown in Figure 7(A–C), the SARS-CoV 5′-UTR contains SL1, SL2, SL3, SL4, S5, SL5A, SL5B, SL5C, SL6, SL7, and SL8. The SL3 contains a trans–cis motif [27]. The SL1, SL2, SL3, SL4, S5, SL5A, SL5B, and SL5C structures were similar among the 2019-nCoV, human SARS-CoV and the bat SARS-related ZC45. In the 2019-nCoV, part of the S5 was found inside orf1a/b (marked in red), which was similar to SARS-CoV. In bat SARS-related CoV ZC45, the S5 was not found inside orf1a/b. The 2019-nCoV had the same SL6, SL7, and SL8 as SARS-CoV, and an additional stem loop. Bat SARS-related CoV ZC45 did not have the SARS-CoV SL6-like stem loop. Instead, it possessed two other stem loops in this region. All three strains had similar SL7 and SL8. The bat SARS-like CoV ZC45 also had an additional stem loop between SL7 and SL8. Overall, the 5′-UTR of 2019-nCoV was more similar to that of SARS-CoV than to that of the bat SARS-related CoV ZC45. 
The biological relevance of the 5′-UTR structures and their effects on virulence should be investigated further. The 2019-nCoV had various 3′-UTR structures, including BSL, S1, S2, S3, S4, L1, L2, L3, and HVR (Figure 7(D–F)). The 3′-UTR was conserved among 2019-nCoV, human SARS-CoV and SARS-related CoVs [27]. Figure 7. Secondary structure prediction and comparison in the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (with minimum free energy and partition function in Fold algorithms and basic options). The SARS 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5'-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5'-UTR; (C) ZC45 5'-UTR; (D) SARS-CoV 3'-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3'-UTR; (F) ZC45 3'-UTR. In summary, 2019-nCoV is a novel lineage B Betacoronavirus closely related to bat SARS-related coronaviruses. It also has unique genomic features which deserve further investigation to ascertain their roles in the viral replication cycle and pathogenesis. More animal sampling to determine its natural animal reservoir and intermediate animal host in the market is important. This will shed light on the evolutionary history of this emerging coronavirus, which has jumped into humans after the other two zoonotic Betacoronaviruses, SARS-CoV and MERS-CoV.
title	Genome organization
p	The single-stranded RNA genome of the 2019-nCoV was 29891 nucleotides in size, encoding 9860 amino acids. The G + C content was 38%. Similar to other βCoVs, the 2019-nCoV genome contains two flanking untranslated regions (UTRs) and a single long open reading frame encoding a polyprotein. The 2019-nCoV genome is arranged in the order of 5′-replicase (orf1a/b)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]−3′ and lacks the hemagglutinin-esterase gene which is characteristically found in lineage A βCoVs (Figure 1). Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.
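The G + C content quoted above is the share of G and C bases among all bases in the genome. A minimal Python sketch of that calculation follows, assuming the genome is available as a plain nucleotide string; the function name `gc_content` and the toy sequence are illustrative and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    # Count only unambiguous bases; U is accepted so RNA input also works.
    total = sum(seq.count(base) for base in "ACGTU")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / total


# Toy sequence for illustration only (not the 2019-nCoV genome).
toy = "ATGCGCTTAA"
print(f"G + C content: {gc_content(toy):.1f}%")
```

Applied to the full 29891-nucleotide genome sequence, the same function would be expected to give a value close to the reported 38%, although the paper does not say how ambiguous bases were handled.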
figure	Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.
label	Figure 1.
caption	Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.
p	Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR), open reading frame (orf) 1a/b (yellow box) encoding non-structural proteins (nsp) for replication, structural proteins including spike (blue box), envelope (orange box), membrane (red box), and nucleocapsid (cyan box) proteins, accessory proteins (purple boxes) such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome, and the 3'-untranslated region (3'-UTR). Examples of lineages A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.
p	There are 12 putative, functional open reading frames (orfs) expressed from a nested set of 9 subgenomic mRNAs carrying a conserved leader sequence in the genome, 9 transcription-regulatory sequences, and 2 terminal untranslated regions. The 5′- and 3′-UTRs are 265 and 358 nucleotides long, respectively. The 5′- and 3′-UTR sequences of 2019-nCoV are similar to those of other βCoVs with nucleotide identities of ⩾83.6%. The large replicase polyproteins pp1a and pp1ab encoded by the partially overlapping 5′-terminal orf1a/b within the 5′ two-thirds of the genome are proteolytically cleaved into 16 putative non-structural proteins (nsps). These putative nsps include two viral cysteine proteases, namely, nsp3 (papain-like protease) and nsp5 (chymotrypsin-like, 3C-like, or main protease), nsp12 (RNA-dependent RNA polymerase [RdRp]), nsp13 (helicase), and other nsps which are likely involved in the transcription and replication of the virus (Table 2). There are no remarkable differences between the orfs and nsps of 2019-nCoV and those of SARS-CoV (Table 3). The major distinctions between SARSr-CoV and SARS-CoV are in orf3b, Spike and orf8, with Spike S1 and orf8, which were previously shown to be recombination hot spots, being especially variable. Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics. 
NSP Putative function/domain Amino acid position Putative cleavage site nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR) nsp2 unknown A181 – G818 (LKGG'APTK) nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN) nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR) nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK) nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS) nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS) nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL) nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA) nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA) nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a) nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA) nsp13 helicase A5325 – Q5925 (ATLQ'AENV) nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN) nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA) nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b) Table 3. Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV. Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73
table-wrap	Table 2. Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics. NSP Putative function/domain Amino acid position Putative cleavage site nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR) nsp2 unknown A181 – G818 (LKGG'APTK) nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN) nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR) nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK) nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS) nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS) nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL) nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA) nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA) nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a) nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA) nsp13 helicase A5325 – Q5925 (ATLQ'AENV) nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN) nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA) nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b)
label	Table 2.
caption	Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics.
title	Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics.
table	NSP Putative function/domain Amino acid position Putative cleavage site nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR) nsp2 unknown A181 – G818 (LKGG'APTK) nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN) nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR) nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK) nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS) nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS) nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL) nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA) nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA) nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a) nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA) nsp13 helicase A5325 – Q5925 (ATLQ'AENV) nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN) nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA) nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b)
tr	NSP Putative function/domain Amino acid position Putative cleavage site
th	NSP
th	Putative function/domain
th	Amino acid position
th	Putative cleavage site
tr	nsp1 suppress antiviral host response M1 – G180 (LNGG'AYTR)
td	nsp1
td	suppress antiviral host response
td	M1 – G180
td	(LNGG'AYTR)
tr	nsp2 unknown A181 – G818 (LKGG'APTK)
td	nsp2
td	unknown
td	A181 – G818
td	(LKGG'APTK)
tr	nsp3 putative PL-pro domain A819 – G2763 (LKGG'KIVN)
td	nsp3
td	putative PL-pro domain
td	A819 – G2763
td	(LKGG'KIVN)
tr	nsp4 complex with nsp3 and 6: DMV formation K2764 – Q3263 (AVLQ'SGFR)
td	nsp4
td	complex with nsp3 and 6: DMV formation
td	K2764 – Q3263
td	(AVLQ'SGFR)
tr	nsp5 3CL-pro domain S3264 – Q3569 (VTFQ'SAVK)
td	nsp5
td	3CL-pro domain
td	S3264 – Q3569
td	(VTFQ'SAVK)
tr	nsp6 complex with nsp3 and 4: DMV formation S3570 – Q3859 (ATVQ'SKMS)
td	nsp6
td	complex with nsp3 and 4: DMV formation
td	S3570 – Q3859
td	(ATVQ'SKMS)
tr	nsp7 complex with nsp8: primase S3860 – Q3942 (ATLQ'AIAS)
td	nsp7
td	complex with nsp8: primase
td	S3860 – Q3942
td	(ATLQ'AIAS)
tr	nsp8 complex with nsp7: primase A3943 – Q4140 (VKLQ'NNEL)
td	nsp8
td	complex with nsp7: primase
td	A3943 – Q4140
td	(VKLQ'NNEL)
tr	nsp9 RNA/DNA binding activity N4141 – Q4253 (VRLQ'AGNA)
td	nsp9
td	RNA/DNA binding activity
td	N4141 – Q4253
td	(VRLQ'AGNA)
tr	nsp10 complex with nsp14: replication fidelity A4254 – Q4392 (PMLQ'SADA)
td	nsp10
td	complex with nsp14: replication fidelity
td	A4254 – Q4392
td	(PMLQ'SADA)
tr	nsp11 short peptide at the end of orf1a S4393 – V4405 (end of orf1a)
td	nsp11
td	short peptide at the end of orf1a
td	S4393 – V4405
td	(end of orf1a)
tr	nsp12 RNA-dependent RNA polymerase S4393 – Q5324 (TVLQ'AVGA)
td	nsp12
td	RNA-dependent RNA polymerase
td	S4393 – Q5324
td	(TVLQ'AVGA)
tr	nsp13 helicase A5325 – Q5925 (ATLQ'AENV)
td	nsp13
td	helicase
td	A5325 – Q5925
td	(ATLQ'AENV)
tr	nsp14 ExoN: 3′–5′ exonuclease A5926 – Q6452 (TRLQ'SLEN)
td	nsp14
td	ExoN: 3′–5′ exonuclease
td	A5926 – Q6452
td	(TRLQ'SLEN)
tr	nsp15 XendoU: poly(U)-specific endoribonuclease S6453 – Q6798 (PKLQ'SSQA)
td	nsp15
td	XendoU: poly(U)-specific endoribonuclease
td	S6453 – Q6798
td	(PKLQ'SSQA)
tr	nsp16 2'-O-MT: 2'-O-ribose methyltransferase S6799 – N7096 (end of orf1b)
td	nsp16
td	2'-O-MT: 2'-O-ribose methyltransferase
td	S6799 – N7096
td	(end of orf1b)
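The amino acid positions in Table 2 can be turned into nsp lengths, and checked for contiguity, with a few lines of Python. The sketch below copies the start and end positions from the table; the names `NSP_POSITIONS`, `nsp_lengths` and `non_contiguous` are illustrative and are not part of the original analysis.

```python
# Start and end amino acid positions copied from Table 2.
NSP_POSITIONS = [
    ("nsp1", 1, 180),
    ("nsp2", 181, 818),
    ("nsp3", 819, 2763),
    ("nsp4", 2764, 3263),
    ("nsp5", 3264, 3569),
    ("nsp6", 3570, 3859),
    ("nsp7", 3860, 3942),
    ("nsp8", 3943, 4140),
    ("nsp9", 4141, 4253),
    ("nsp10", 4254, 4392),
    ("nsp11", 4393, 4405),
    ("nsp12", 4393, 5324),
    ("nsp13", 5325, 5925),
    ("nsp14", 5926, 6452),
    ("nsp15", 6453, 6798),
    ("nsp16", 6799, 7096),
]


def nsp_lengths(positions):
    """Map each nsp name to its length in amino acids (inclusive range)."""
    return {name: end - start + 1 for name, start, end in positions}


def non_contiguous(positions):
    """Return names of nsps that do not start right after the previous one ends."""
    flagged = []
    for (_, _, prev_end), (name, start, _) in zip(positions, positions[1:]):
        if start != prev_end + 1:
            flagged.append(name)
    return flagged


lengths = nsp_lengths(NSP_POSITIONS)
for name, length in lengths.items():
    print(f"{name}: {length} aa")
print("Not contiguous with previous nsp:", non_contiguous(NSP_POSITIONS))
```

With the values in Table 2, the only entry flagged is nsp12, which shares its start position (S4393) with nsp11. This matches the table, where nsp11 is the short peptide at the end of orf1a and nsp12 continues into orf1b.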
table-wrap	Table 3. Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV. Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73
label	Table 3.
caption	Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV.
title	Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV.
table	Amino acid identity (%) 2019-nCoV 2019-nCoV vs. bat-SL-CoVZXC21 vs. SARS-CoV NSP1 96 84 NSP2 96 68 NSP3 93 76 NSP4 96 80 NSP5 99 96 NSP6 98 88 NSP7 99 99 NSP8 96 97 NSP9 96 97 NSP10 98 97 NSP11 85 85 NSP12 96 96 NSP13 99 100 NSP14 95 95 NSP15 88 89 NSP16 98 93 Spike 80 76 Orf3a 92 72 Orf3b 32 32 Envelope 100 95 Membrane 99 91 Orf6 94 69 Orf7a 89 85 Orf7b 93 81 Orf8/Orf8b 94 40 Nucleoprotein 94 94 Orf9b 73 73
tr	Amino acid identity (%) 2019-nCoV 2019-nCoV
th	Amino acid identity (%)
th	2019-nCoV
th	2019-nCoV
tr	vs. bat-SL-CoVZXC21 vs. SARS-CoV
th
th	vs. bat-SL-CoVZXC21
th	vs. SARS-CoV
tr	NSP1 96 84
td	NSP1
td	96
td	84
tr	NSP2 96 68
td	NSP2
td	96
td	68
tr	NSP3 93 76
td	NSP3
td	93
td	76
tr	NSP4 96 80
td	NSP4
td	96
td	80
tr	NSP5 99 96
td	NSP5
td	99
td	96
tr	NSP6 98 88
td	NSP6
td	98
td	88
tr	NSP7 99 99
td	NSP7
td	99
td	99
tr	NSP8 96 97
td	NSP8
td	96
td	97
tr	NSP9 96 97
td	NSP9
td	96
td	97
tr	NSP10 98 97
td	NSP10
td	98
td	97
tr	NSP11 85 85
td	NSP11
td	85
td	85
tr	NSP12 96 96
td	NSP12
td	96
td	96
tr	NSP13 99 100
td	NSP13
td	99
td	100
tr	NSP14 95 95
td	NSP14
td	95
td	95
tr	NSP15 88 89
td	NSP15
td	88
td	89
tr	NSP16 98 93
td	NSP16
td	98
td	93
tr	Spike 80 76
td	Spike
td	80
td	76
tr	Orf3a 92 72
td	Orf3a
td	92
td	72
tr	Orf3b 32 32
td	Orf3b
td	32
td	32
tr	Envelope 100 95
td	Envelope
td	100
td	95
tr	Membrane 99 91
td	Membrane
td	99
td	91
tr	Orf6 94 69
td	Orf6
td	94
td	69
tr	Orf7a 89 85
td	Orf7a
td	89
td	85
tr	Orf7b 93 81
td	Orf7b
td	93
td	81
tr	Orf8/Orf8b 94 40
td	Orf8/Orf8b
td	94
td	40
tr	Nucleoprotein 94 94
td	Nucleoprotein
td	94
td	94
tr	Orf9b 73 73
td	Orf9b
td	73
td	73
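Percent identities such as those in Table 3 are computed from pairwise alignments. The paper does not state which denominator was used, so the Python sketch below makes an explicit assumption: identity is the number of matching columns divided by the number of aligned columns, skipping columns that are gaps in both sequences. The function name `percent_identity` and the toy fragments are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator is the number of alignment columns in which
    at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column is a gap in both sequences; ignore it
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy aligned fragments for illustration (not real coronavirus sequences).
print(round(percent_identity("MKV-LA", "MKVALA"), 1))
```

Other conventions, such as dividing by the length of the shorter sequence or excluding all gapped columns, give different numbers, so values from this sketch are not guaranteed to reproduce Table 3.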
sec	Spike The Spike glycoprotein comprises S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and receptor-binding domain (RBD), while the S2 subunit contains a conserved fusion peptide (FP), heptad repeat (HR) 1 and 2, transmembrane domain (TM), and cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Thus, broad-spectrum antiviral peptides against S2 would be an important preventive and treatment modality for testing in animal models before clinical trials [18]. Though the S1 subunit of 2019-nCoV shares around 70% identity with that of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of RBD (excluding the external subdomain) is highly conserved (Figure 3(B)). Most of the amino acid differences of RBD are located in the external subdomain, which is responsible for the direct interaction with the host receptor. Further investigation of this soluble variable external subdomain region will help reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the spike receptor binding domain (RBD) when compared with that of human SARS-CoV. But some Yunnan strains such as WIV1 have no such deletions and can use human ACE2 as a cellular entry receptor. It is interesting to note that the two bat SARS-related coronaviruses ZXC21 and ZC45, being closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue, and pathological changes in the lung and intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity. Figure 2. 
Comparison of protein sequences of Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZXC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black boxes represent the identity while the grey boxes represent the similarity of the four amino acid sequences. Figure 3. Comparison of protein sequences of A. Spike globular head S1, and B. S1 receptor-binding domain (RBD) subunit. Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZXC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession numbers MG772934.1, MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession numbers AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents the identity while the grey background represents the similarity of the amino acid sequences. The orange box indicates the region of the signal peptide, while the green and blue boxes indicate the core domain and receptor binding domain, respectively. Sequences of RBD highlighted in (A) were used for comparison. The external subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. The purple box indicates the external subdomain region.
title	Spike
p	The Spike glycoprotein comprises S1 and S2 subunits. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and a receptor-binding domain (RBD), while the S2 subunit contains a conserved fusion peptide (FP), heptad repeats (HR) 1 and 2, a transmembrane domain (TM), and a cytoplasmic domain (CP). We found that the S2 subunit of 2019-nCoV is highly conserved and shares 99% identity with those of the two bat SARS-like CoVs (SL-CoV ZXC21 and ZC45) and human SARS-CoV (Figure 2). Broad-spectrum antiviral peptides against S2 would therefore be an important preventive and treatment modality to test in animal models before clinical trials [18]. Although the S1 subunit of 2019-nCoV shares only around 70% identity with those of the two bat SARS-like CoVs and human SARS-CoV (Figure 3(A)), the core domain of the RBD (excluding the external subdomain) is highly conserved (Figure 3(B)). Most of the amino acid differences in the RBD are located in the external subdomain, which is responsible for direct interaction with the host receptor. Further investigation of this soluble, variable external subdomain region will help to reveal its receptor usage, interspecies transmission and pathogenesis. Unlike 2019-nCoV and human SARS-CoV, most known bat SARSr-CoVs have two stretches of deletions in the Spike RBD when compared with that of human SARS-CoV. However, some Yunnan strains such as WIV1 have no such deletions and can use human ACE2 as a cellular entry receptor. It is interesting to note that the two bat SARS-related coronaviruses ZXC21 and ZC45, which are the closest to 2019-nCoV, can infect suckling rats and cause inflammation in the brain tissue and pathological changes in the lung and intestine. However, these two viruses could not be isolated in Vero E6 cells and were not investigated further. The two retained deletion sites in the Spike genes of ZXC21 and ZC45 may lessen their likelihood of jumping species barriers imposed by receptor specificity.
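p	The identity percentages quoted for the S1 and S2 subunits can be illustrated with a minimal sketch. It assumes that identity is counted over aligned columns in which neither sequence has a gap; the source does not state which denominator was used, so this convention, the function name `percent_identity` and the toy fragments are all assumptions made for illustration, not the authors' pipeline.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the full alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, NOT real Spike sequences).
toy_a = "MKT-LLVAG"
toy_b = "MKTALIVAG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # prints "87.5% identity"
```

p	In practice the two strings would be rows taken from a multiple alignment such as the CLUSTAL output described in the figure captions; the sketch only shows the counting step.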
label	Figure 2.
caption	Comparison of protein sequences of the Spike stalk S2 subunit. Multiple alignment of Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC_004718) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. Black boxes represent identity and grey boxes represent similarity among the four amino acid sequences.
label	Figure 3.
caption	Comparison of protein sequences of (A) the Spike globular head S1 and (B) the S1 receptor-binding domain (RBD). Multiple alignment of Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession numbers MG772934.1, MG772933.1, KP886808, KP886809 and DQ022305, respectively), human SARS coronavirus GZ02 and Tor2 (accession numbers AY390556 and AY274119, respectively) and Paguma SARS-CoV (accession number AY515512) was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents identity while the grey background represents similarity of the amino acid sequences. The orange box indicates the signal peptide region, while the green and blue boxes indicate the core domain and the receptor-binding domain, respectively. The RBD sequences highlighted in (A) were used for comparison. The external subdomain variable region of 2019-nCoV HKU-SZ-005b was predicted by comparison of amino acid similarity and published structural analysis [17]. The purple box indicates the external subdomain region.
title	Orf3b
p	A novel short putative protein with four helices and no homology to existing SARS-CoV or SARS-related-CoV proteins was found within orf3b (Figure 4). Notably, SARS-CoV deletion mutants lacking orf3b replicate to levels similar to those of wild-type virus in several cell types [19], suggesting that orf3b is dispensable for viral replication in vitro. However, orf3b may have a role in viral pathogenicity, as Vero E6 cells, but not 293T cells, transfected with a construct expressing orf3b underwent necrosis as early as 6 h after transfection and underwent simultaneous necrosis and apoptosis at later time points [20]. Orf3b was also shown to inhibit expression of IFN-β at the synthesis and signalling levels [21]. Subsequently, orf3b homologues identified from three bat SARS-related-CoV strains were found to be C-terminally truncated and to lack the C-terminal nuclear localization signal of SARS-CoV [22]. IFN antagonist activity analysis demonstrated that one SARS-related-CoV orf3b still possessed IFN antagonist and IRF3-modulating activities. These results indicated that different orf3b proteins display different IFN antagonist activities and that this function is independent of the protein's nuclear localization, suggesting a potential link between bat SARS-related-CoV orf3b function and pathogenesis. The importance of this new protein in 2019-nCoV will require further validation and study.
label	Figure 4.
caption	Analysis of orf3b. (A) Multiple alignment of orf3b protein sequences of 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoVs. (B) A novel putative short protein found in orf3b.
title	Orf8
p	Orf8 is an accessory protein found in the Betacoronavirus lineage B coronaviruses. Human SARS-CoVs isolated from early-phase patients, all civet SARS-CoVs, and other bat SARS-related CoVs contain full-length orf8 [23]. However, a 29-nucleotide deletion, which splits the full-length orf8 into putative orf8a and orf8b, has been found in all SARS-CoVs isolated from mid- and late-phase human patients [24]. In addition, we previously identified two bat SARS-related CoVs (bat-CoV YNLF_31C and YNLF_34C) and proposed that the original SARS-CoV full-length orf8 was acquired from these two bat SARS-related CoVs [25]. Since SARS-CoV is the closest human-pathogenic virus to 2019-nCoV, we performed phylogenetic analysis and multiple alignments to investigate the orf8 amino acid sequences. The orf8 protein sequences used in the analysis were derived from an early-phase SARS-CoV with full-length orf8 (human SARS-CoV GZ02), a mid- and late-phase SARS-CoV with the split orf8b (human SARS-CoV Tor2), civet SARS-CoV (paguma SARS-CoV), two bat SARS-related CoVs containing full-length orf8 (bat-CoV YNLF_31C and YNLF_34C), 2019-nCoV, the two bat SARS-related CoVs closest to 2019-nCoV (SL-CoV ZXC21 and ZC45), and bat SARS-related CoV HKU3-1 (Figure 5(A)). As expected, orf8 derived from 2019-nCoV belongs to the group that includes the closest genome sequences, those of bat SARS-related CoVs ZXC21 and ZC45. Interestingly, the new 2019-nCoV orf8 is distant from the conserved orf8 or orf8b derived from human SARS-CoV or its related viruses derived from civet (paguma SARS-CoV) and bat (bat-CoV YNLF_31C and YNLF_34C). This new orf8 of 2019-nCoV does not contain any known functional domain or motif. An aggregation motif, VLVVL (amino acids 75–79), has been found in SARS-CoV orf8b (Figure 5(B)) and was shown to trigger intracellular stress pathways and activate NLRP3 inflammasomes [26], but this motif is absent from the novel orf8 of 2019-nCoV. Based on a secondary structure prediction, this novel orf8 is highly likely to form a protein with an alpha-helix followed by a beta-sheet(s) containing six strands (Figure 5(C)).
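p	The check for the VLVVL aggregation motif amounts to a substring search with 1-based residue numbering. The sketch below is illustrative only: `find_motif` is a hypothetical helper, and the toy sequence is an artificial string built so that the motif falls at residues 75–79; it is not a real orf8b sequence.

```python
from typing import List, Tuple


def find_motif(protein: str, motif: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of every occurrence.

    Overlapping occurrences are reported as well.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    hits = []
    start = protein.find(motif)
    while start != -1:
        # str.find is 0-based; residue numbering in the text is 1-based.
        hits.append((start + 1, start + len(motif)))
        start = protein.find(motif, start + 1)
    return hits


# Artificial sequence (assumption): 74 alanines, the motif, then 5 alanines.
toy_orf8b = "A" * 74 + "VLVVL" + "A" * 5
print(find_motif(toy_orf8b, "VLVVL"))  # prints [(75, 79)]
print(find_motif("A" * 84, "VLVVL"))   # prints [] when the motif is absent
```

p	An empty result for a sequence corresponds to the situation described above for the novel orf8, where the motif is reported as absent.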
label	Figure 5.
caption	Analysis of orf8 to show the novel putative protein. (A) Phylogenetic analysis of orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with 1000 bootstrap replicates. The evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed and displayed using CLUSTAL 2.1 and BOXSHADE 3.21, respectively. The black background represents identity while the grey background represents similarity of the amino acid sequences. (C) Structural analysis of orf8 was performed using PSI-BLAST-based secondary structure PREDiction (PSIPRED). Predicted helices (h) and strands (s) are boxed in red and yellow, respectively.
title	Phylogenetic relationship among 2019-nCoV and other βCoVs
p	The genome of 2019-nCoV has 89% overall nucleotide identity with bat SARS-related-CoV SL-CoVZXC21 (MG772934.1), and 82% with human SARS-CoV BJ01 2003 (AY278488) and human SARS-CoV Tor2 (AY274119). The phylogenetic trees constructed using the amino acid sequences of orf1a/b and the four structural genes (S, E, M, and N) are shown in Figure 6(A–E). For all five genes, 2019-nCoV clustered with lineage B βCoVs. It was most closely related to the bat SARS-related CoVs ZXC21 and ZC45 found in Chinese horseshoe bats (Rhinolophus sinicus) collected from Zhoushan city, Zhejiang province, China between 2015 and 2017. Thus, this novel coronavirus should belong to the genus Betacoronavirus, subgenus Sarbecovirus (previously lineage 2b of Group 2 coronavirus). SARS-related coronaviruses have been found continuously, especially in horseshoe bat species, in the last 13 years. Between 2003 and 2018, 339 complete SARS-related coronavirus genomes were sequenced, including 274 human SARS-CoV, 18 civet SARS coronavirus, and 47 bat SARS-related coronaviruses mainly from Rhinolophus bat species. Together, they formed a distinct subclade among other lineage B βCoVs. These results suggest that 2019-nCoV might also have originated from bats. However, we cannot ascertain whether another intermediate or amplification animal host infected by 2019-nCoV could be found in the epidemiologically linked market, as in the case of Paguma civets for SARS-CoV.
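p	The trees referred to above were built with the neighbour-joining method. As a rough illustration of how that method chooses which two taxa to join first, the sketch below evaluates the standard criterion Q(i, j) = (n − 2)·d(i, j) − Σ d(i, k) − Σ d(j, k) on a small distance matrix. The taxon names and distances are a hypothetical toy example, not values from this study, and the sketch covers only the first selection step; it does not reproduce MEGA X, the JTT distances or the bootstrap procedure.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Return (taxon_i, taxon_j, Q) for the pair with the smallest NJ criterion.

    names: list of taxon labels.
    dist:  symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    totals = [sum(row) for row in dist]  # total distance from each taxon to all others
    best = None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - totals[i] - totals[j]
        if best is None or q < best[2]:
            best = (names[i], names[j], q)
    return best


# Hypothetical toy distance matrix for five taxa (assumption, for illustration).
taxa = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(taxa, toy_dist))  # prints ('a', 'b', -50)
```

p	Note that the pair joined first (a, b) is not the pair with the smallest raw distance (d, e); the criterion corrects each distance for how far the two taxa are from everything else, which is the distinguishing feature of the method.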
label	Figure 6.
caption	Phylogenetic trees were constructed by the neighbour-joining method using MEGA X software, with bootstrap values calculated from 1000 trees, using amino acid sequences of the (A) orf1ab polypeptide; (B) Spike glycoprotein; (C) Envelope protein; (D) Membrane protein; (E) Nucleoprotein.
title	RNA secondary structures
p	As shown in Figure 7(A–C), the SARS-CoV 5′-UTR contains SL1, SL2, SL3, SL4, S5, SL5A, SL5B, SL5C, SL6, SL7, and SL8. SL3 contains the trans–cis motif [27]. The SL1, SL2, SL3, SL4, S5, SL5A, SL5B, and SL5C structures were similar among 2019-nCoV, human SARS-CoV and the bat SARS-related CoV ZC45. In 2019-nCoV, part of S5 was found inside orf1a/b (marked in red), which was similar to SARS-CoV. In bat SARS-related CoV ZC45, S5 was not found inside orf1a/b. The 2019-nCoV had the same SL6, SL7, and SL8 as SARS-CoV, and an additional stem loop. Bat SARS-related CoV ZC45 did not have the SARS-CoV SL6-like stem loop. Instead, it possessed two other stem loops in this region. All three strains had similar SL7 and SL8. The bat SARS-like CoV ZC45 also had an additional stem loop between SL7 and SL8. Overall, the 5′-UTR of 2019-nCoV was more similar to that of SARS-CoV than to that of the bat SARS-related CoV ZC45. The biological relevance of the 5′-UTR structures and their effects on virulence should be investigated further. The 2019-nCoV had various 3′-UTR structures, including BSL, S1, S2, S3, S4, L1, L2, L3, and HVR (Figure 7(D–F)). The 3′-UTR was conserved among 2019-nCoV, human SARS-CoV and SARS-related CoVs [27].
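p	Stem-loop comparisons of this kind are usually made on predicted structures written in dot-bracket notation, the text format that RNAfold reports. The sketch below shows how base pairs and hairpin loops can be read from such a string. The helper names and the toy structure are hypothetical and are not taken from the predictions in Figure 7.

```python
import re


def base_pairs(structure: str):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string."""
    stack = []
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def count_hairpins(structure: str) -> int:
    """Count hairpin loops: an opening bracket, unpaired bases, a closing bracket."""
    return len(re.findall(r"\(\.+\)", structure))


# Toy structure (assumption): two stem loops separated by two unpaired bases.
toy_structure = "((((...))))..(((....)))"
print(count_hairpins(toy_structure))  # prints 2
print(base_pairs(toy_structure)[0])   # prints (1, 11)
```

p	Counting hairpins gives only a coarse summary; deciding whether two stem loops are "similar", as in the comparison above, additionally depends on their positions and sequences.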
label	Figure 7.
caption	Secondary structure prediction and comparison of the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (with minimum free energy and partition function in Fold algorithms and basic options). The SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5′-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5′-UTR; (C) ZC45 5′-UTR; (D) SARS-CoV 3′-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3′-UTR; (F) ZC45 3′-UTR.
p	In summary, 2019-nCoV is a novel lineage B Betacoronavirus closely related to bat SARS-related coronaviruses. It also has unique genomic features which deserve further investigation to ascertain their roles in the viral replication cycle and pathogenesis. More animal sampling to determine its natural animal reservoir and intermediate animal host in the market is important. This will shed light on the evolutionary history of this emerging coronavirus, which has jumped into humans after the other two zoonotic betacoronaviruses, SARS-CoV and MERS-CoV.
title	Acknowledgements
p	The funding sources had no role in the study design, data collection, analysis, interpretation, or writing of the report.
title	Disclosure statement
p	No potential conflict of interest was reported by the author(s).
title	ORCID
p	Jasper Fuk-Woo Chan http://orcid.org/0000-0001-6336-6657
p	Kin-Hang Kok http://orcid.org/0000-0003-3426-332X
last updated at 2021-06-01 21:10:12 UTC