CORD-19:fd14ed7c073b7ff03afa517e9c0fd1e849878252

Direct RNA sequencing and early evolution of SARS-CoV-2

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

The rapid sharing of sequence information seen throughout the current SARS-CoV-2 epidemic represents an inflection point for genomic epidemiology. Here we describe aspects of coronavirus evolutionary genetics revealed from these data, and provide the first direct RNA sequence of SARS-CoV-2, detailing coronaviral subgenome-length mRNA architecture.

The ongoing epidemic of 2019 novel coronavirus (now called SARS-CoV-2, causing the disease COVID-19), which originated in Wuhan, China, has been declared a public health emergency of international concern by the World Health Organisation (WHO) [1][2][3][4]. SARS-CoV-2 is a positive-sense single-stranded RNA ((+)ssRNA) virus of the Coronaviridae family, with related Betacoronaviruses capable of infecting mammalian and avian hosts, resulting in illness in humans such as Middle East respiratory syndrome (MERS) and the original severe acute respiratory syndrome (SARS) [2][5][6]. Based on limited sampling of potential reservoir species, SARS-CoV-2 has been found to be most similar to bat coronaviruses at the genomic level, potentially indicating that bats are its natural reservoir [7][8].

Following the emergence of SARS-CoV-2, genomic analyses have played a key role in the public health response by informing the design of appropriate molecular diagnostics and corroborating epidemiological efforts to trace contacts [8][9][10]. Taken together, publicly available sequence data suggest a recently occurring, point-source outbreak, as described in online sources [10][11][12]. Aspects of the response assume that the genetics of SARS-CoV-2, including mechanisms of gene expression and molecular evolutionary rates, are comparable with those of previously characterised coronaviruses [11][12].
It remains highly relevant to validate these assumptions experimentally with SARS-CoV-2-specific data, with the potential to reveal further insights into the biology of this emergent pathogen. To address this, we describe (i) the architecture of the coronaviral subgenome-length mRNAs, and (ii) phylogenetic approaches able to provide robust estimates of coronaviral evolutionary rates and timescales at this early stage of the outbreak.

Characterised coronaviral species produce a nested set of polyadenylated subgenome-length mRNA transcripts through a mechanism termed discontinuous extension of minus strands, which yields mRNA transcripts of different lengths. The discontinuous transcription mechanism repositions the 5′ leader sequence upstream of consecutive viral open-reading frames (ORFs), so that each translation start site becomes located at the primary position for ribosome scanning (Figure 1a). Subgenome-length mRNAs have a common 5′ leader sequence, near-identical to that located in the 5′-UTR of the viral genome, with the genome-length RNAs also having an mRNA function [13][14].

To define the architecture of the coronaviral subgenome-length mRNAs, a recently established direct RNA sequencing approach was used, based on a highly parallel array of nanopores [16]. In brief, nucleic acids were prepared from culture material with high levels of SARS-CoV-2 growth, and sequenced using poly(T) adaptors and an R9.4 flowcell on a [...]

On alignment to the genome of the cultured SARS-CoV-2 isolate (MT007544.1), a subset of reads (28.9%) was attributed to coronavirus sequence, comprising 367 Mb of sequence distributed across the 29,893 base genome. Of these, a number had lengths >20,000 bases, [...]
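As a rough sanity check on the figures quoted above, the mean sequencing depth implied by 367 Mb of coronavirus-attributed sequence over a 29,893 base genome can be worked out directly. This is a minimal sketch, assuming 1 Mb means 10^6 bases; the variable names are illustrative and do not come from the paper. It gives only an average: the nested subgenome-length mRNAs overlap toward the 3′ end of the genome, so real depth is unlikely to be uniform.

```python
# Figures quoted in the text.
# Assumption: "Mb" is read as 10**6 bases.
total_aligned_bases = 367 * 10**6   # coronavirus-attributed sequence
genome_length = 29_893              # bases in the reference genome

# Mean depth = total aligned bases / genome length.
mean_depth = total_aligned_bases / genome_length

print(f"Mean depth is roughly {mean_depth:,.0f}-fold")
```

The result is on the order of twelve thousand-fold average coverage.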
bioRxiv preprint doi: https://doi.org/10.1101/2020.03.05.976167

In addition to methylation at the 5′ cap structure and 3′ polyadenylation needed for efficient [...] Table 2). The sampling times were sufficient to calibrate a molecular clock and infer the evolutionary rate and timescale of the outbreak; the evolutionary rate of SARS-CoV-2 was estimated to be 1.16 × 10⁻³ substitutions/site/year (95% HPD 6.32 × 10⁻⁴ - [...]

Coverage statistics were determined from the resulting read alignments. To identify intact subgenome-length mRNAs, reads were aligned to a 62 base SARS-CoV-2 leader sequence (5′ACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGAAC), with reads aligning to the leader sequence being pooled and visualized in a length histogram. Significant peaks were identified visually and confirmed with a smoothed z-score algorithm. Reads captured in this binning-by-length strategy were re-aligned to the reference genome using the above methods and visualized in Tablet.
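The binning-by-length step can be illustrated with a short sketch. It assumes the lengths of leader-containing reads have already been extracted by an aligner, and it uses one common formulation of a smoothed z-score peak detector (a rolling mean and standard deviation over a lag window, with flagged points down-weighted). The text does not give the authors' parameters or implementation, so the function names, `bin_width`, `lag`, `threshold`, and `influence` below are all hypothetical choices for illustration.

```python
from statistics import mean, pstdev


def length_histogram(lengths, bin_width):
    """Count reads per length bin; bin i covers [i*bin_width, (i+1)*bin_width)."""
    counts = [0] * (max(lengths) // bin_width + 1)
    for n in lengths:
        counts[n // bin_width] += 1
    return counts


def smoothed_zscore(y, lag, threshold, influence):
    """Return one signal per element of y: 1 (peak), -1 (trough) or 0.

    A point is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `lag` smoothed values.
    Flagged points enter the smoothed series with weight `influence`.
    """
    if len(y) <= lag:
        raise ValueError("need more data points than the lag window")
    signals = [0] * len(y)
    filtered = list(y[:lag])
    avg, std = mean(filtered), pstdev(filtered)
    for i in range(lag, len(y)):
        if abs(y[i] - avg) > threshold * std:
            signals[i] = 1 if y[i] > avg else -1
            filtered.append(influence * y[i] + (1 - influence) * filtered[-1])
        else:
            filtered.append(y[i])
        window = filtered[i - lag + 1 : i + 1]
        avg, std = mean(window), pstdev(window)
    return signals


def peak_lengths(lengths, bin_width, lag=5, threshold=3.5, influence=0.5):
    """Return the start length of every histogram bin flagged as a peak."""
    counts = length_histogram(lengths, bin_width)
    signals = smoothed_zscore(counts, lag, threshold, influence)
    return [i * bin_width for i, s in enumerate(signals) if s == 1]


# Toy data (not from the paper): a noisy baseline with one enriched bin.
toy_counts = [1, 2, 1, 2, 1, 2, 1, 2, 10, 2, 1, 2, 1]
toy_lengths = [b * 100 + 50 for b, c in enumerate(toy_counts) for _ in range(c)]
print(peak_lengths(toy_lengths, bin_width=100))
```

In practice the flagged bins would be compared against the peaks identified by eye in the length histogram, and the thresholds tuned to the noise level of the data.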
