CORD-19:cf1631f486167031d3429ac3e81d2bb99f33f15e


Id Subject Object Predicate Lexical cue
TextSentencer_T1 0-61 Sentence denotes Modular organization of SARS coronavirus nucleocapsid protein
TextSentencer_T2 63-71 Sentence denotes Abstract
TextSentencer_T3 72-166 Sentence denotes The SARS-CoV nucleocapsid (N) protein is a major antigen in severe acute respiratory syndrome.
TextSentencer_T4 167-237 Sentence denotes It binds to the viral RNA genome and forms the ribonucleoprotein core.
TextSentencer_T5 238-353 Sentence denotes The SARS-CoV N protein has also been suggested to be involved in other important functions in the viral life cycle.
TextSentencer_T6 354-591 Sentence denotes Here we show that the N protein consists of two non-interacting structural domains, the N-terminal RNA-binding domain (RBD) (residues 45-181) and the C-terminal dimerization domain (residues 248-365) (DD), surrounded by flexible linkers.
TextSentencer_T7 592-656 Sentence denotes The C-terminal domain exists exclusively as a dimer in solution.
TextSentencer_T8 657-793 Sentence denotes The flexible linkers are intrinsically disordered and represent potential interaction sites with other protein and protein-RNA partners.
TextSentencer_T9 794-892 Sentence denotes Bioinformatics reveal that other coronavirus N proteins could share the same modular organization.
TextSentencer_T10 893-1093 Sentence denotes This study provides information on the domain structure partition of SARS-CoV N protein and insights into the differing roles of structured and disordered regions in coronavirus nucleocapsid proteins.
TextSentencer_T11 1095-1246 Sentence denotes Coronaviruses are the causative agents of a number of mammalian diseases which often have significant economic and health-related consequences [1, 2] .
TextSentencer_T12 1247-1415 Sentence denotes Diseases such as transmissible gastroenteritis in pigs and avian infectious bronchitis in chicken often have great impact on the agricultural industry of a nation [3] .
TextSentencer_T13 1416-1517 Sentence denotes In humans, coronaviruses are often associated with mild respiratory illnesses, including common cold.
TextSentencer_T14 1518-1671 Sentence denotes However, a novel coronavirus has been identified as the etiological agent of severe acute respiratory syndrome (SARS), which has a case fatality rate of ca.
TextSentencer_T15 1672-1680 Sentence denotes 8% [4] .
TextSentencer_T16 1681-1820 Sentence denotes Sequence analysis reveals that SARS-CoV represents either a new coronavirus group or an outlier of group 2 coronaviruses [5] [6] [7] [8] .
TextSentencer_T17 1821-2017 Sentence denotes The SARS CoV genome contains five major open reading frames that encode the replicase polyprotein, the spike protein (S), envelope (E), membrane glycoprotein (M), and the nucleocapsid protein (N).
TextSentencer_T18 2018-2099 Sentence denotes SARS-CoV is an enveloped virus with S, M and E proteins as the envelope proteins.
TextSentencer_T19 2100-2214 Sentence denotes The N protein binds to the viral RNA genome and forms the ribonucleoprotein core, which is presumed to be helical.
TextSentencer_T20 2215-2326 Sentence denotes The M protein may also be involved in the formation of the nucleocapsid through interaction with the N protein.
TextSentencer_T21 2327-2470 Sentence denotes Upon infection, the N protein enters the host cell with the ribonucleoprotein core and is able to interact with a number of host proteins [9] .
TextSentencer_T22 2471-2638 Sentence denotes The high abundance of the N protein makes it a major antigen, an attribute which has often been used in the development of rapid-diagnosis kits against SARS [10, 11] .
TextSentencer_T23 2639-2773 Sentence denotes The nucleocapsid protein is a 422 amino-acid protein, sharing only 20-30% homology with the N proteins of other coronaviruses [6, 7] .
TextSentencer_T24 2774-2955 Sentence denotes From genetic and bioinformatics studies, the N protein can be divided into three putative regions: an N-terminal domain, an RNA-binding domain (RBD) and a C-terminal domain [12, 13] .
TextSentencer_T25 2956-3047 Sentence denotes The N- and C-terminal domains are believed to play a role in interaction with other proteins.
TextSentencer_T26 3048-3211 Sentence denotes A number of recent studies have shown that part of the C-terminus in the N protein of SARS-CoV is involved in the oligomerization process of the protein [14, 15] .
TextSentencer_T27 3212-3484 Sentence denotes Rather surprisingly, the mid-portion of the protein has been shown to interact with the M protein and hnRNP A1 [16, 17] , and structural studies have identified the region between amino acids 45-181 as the putative RNA-binding region, which is close to the N-terminus [18] .
TextSentencer_T28 3485-3636 Sentence denotes These discrepancies from the putative domain partition necessitate the determination of both the functional and structural organization of the protein.
TextSentencer_T29 3637-3747 Sentence denotes However, the structural organization of coronavirus N proteins in general remains largely unknown to this day.
TextSentencer_T30 3748-3888 Sentence denotes We have employed a blend of experimental techniques and bioinformatics analyses to define the structural organization of SARS-CoV N protein.
TextSentencer_T31 3889-4062 Sentence denotes Through the power of nuclear magnetic resonance (NMR) spectroscopy, we present the first evidence that the SARS-CoV N protein consists of two independent structural domains.
TextSentencer_T32 4063-4161 Sentence denotes The first domain lies inside the putative RNA-binding domain identified in a previous report [18] .
TextSentencer_T33 4162-4268 Sentence denotes The second domain lies in the C-terminal half of the protein and is capable of forming dimers in solution.
TextSentencer_T34 4269-4406 Sentence denotes The rest of the protein is highly accessible to the solvent, and bioinformatics analysis predicts that they are intrinsically disordered.
TextSentencer_T35 4407-4537 Sentence denotes Other coronavirus N proteins share similar features of SARS-CoV N protein at the sequence level, implying functional significance.
TextSentencer_T36 4538-4770 Sentence denotes The elucidation of the modular organization of the SARS-CoV N protein, particularly the boundary between disordered and structured regions, facilitates future studies of this class of proteins at the functional and structural level.
TextSentencer_T37 4771-4839 Sentence denotes Sequence alignment, secondary structure and order-disorder prediction
TextSentencer_T38 4840-5053 Sentence denotes The full-length sequences of SARS and other coronavirus N proteins were aligned using CLUSTALW version 1.83 with the slow algorithm, an identity matrix, a window of 4 amino acids and standard gap penalties [19] .
TextSentencer_T39 5054-5170 Sentence denotes The result was then edited with SeaView based on the position of the known structural domains of SARS-CoV N protein.
TextSentencer_T40 5171-5237 Sentence denotes The JPred server [20] was used for secondary structure prediction.
TextSentencer_T41 5238-5444 Sentence denotes Order-disorder prediction was obtained through sequence submission to the PONDR server (http://www.pondr.com) using the predictor VSL1, which is an implementation of the IST-Zoran predictor [21] [22] [23] .
TextSentencer_T42 5445-5521 Sentence denotes Access to PONDR® was provided by Molecular Kinetics (Indianapolis, IN, USA).
TextSentencer_T43 5522-5713 Sentence denotes We cloned fragments spanning the different ordered and disordered regions of the SARS-CoV N protein ( Figure 1B ) based on PONDR information ( Figure 1a ) and reports in the literature [18] .
TextSentencer_T44 5714-5790 Sentence denotes SARS-CoV TW1 strain cDNA sequencing clones were kindly provided to us by Dr.
TextSentencer_T45 5791-5796 Sentence denotes P.-J.
TextSentencer_T46 5797-5847 Sentence denotes Chen of National Taiwan University Hospital [24] .
TextSentencer_T47 5848-6008 Sentence denotes Clones for SARS-CoV N protein fragments were obtained by polymerase chain reaction (PCR) on a RoboCycler Gradient 96 (Stratagene, CA) using appropriate primers.
TextSentencer_T48 6009-6101 Sentence denotes The resulting PCR fragments contained an NcoI site at one end and a BamHI site at the other.
TextSentencer_T49 6102-6203 Sentence denotes After restriction enzyme digestion, the resulting fragments were cloned into pET6H (a gift from Prof.
TextSentencer_T50 6204-6209 Sentence denotes J.-J.
TextSentencer_T51 6210-6289 Sentence denotes Lin, National Yang Ming University, Taiwan) containing a His-tag coding region.
TextSentencer_T52 6290-6438 Sentence denotes Full-length SARS-CoV N protein construct was obtained by sequential ligation of the cloned PCR fragments using appropriate restriction enzyme sites.
TextSentencer_T53 6439-6504 Sentence denotes The sequences of all constructs were confirmed by DNA sequencing.
TextSentencer_T54 6505-6596 Sentence denotes The resultant protein fragments all include an extra MHHHHHHAMG sequence at the N-terminus.
TextSentencer_T55 6597-6750 Sentence denotes For biochemical studies, the SARS-CoV N protein clones were expressed in Escherichia coli BL21(DE3) strain in Luria broth media using standard protocols.
TextSentencer_T56 6751-6922 Sentence denotes To prepare samples suitable for NMR studies, the cells were cultured in standard M9 media supplemented with 15NH4Cl (1 g/l) and 15N-Isogro (0.5 g/l) (Isotec, OH, USA).
TextSentencer_T57 6923-7124 Sentence denotes The cells were then broken with a microfluidizer and the protein purified through a Ni-NTA affinity column (Qiagen, CA, USA) in buffer (50 mM sodium phosphate, 150 mM NaCl, pH 7.4) containing 7 M urea.
TextSentencer_T58 7125-7335 Sentence denotes The protein was then allowed to refold by gradually lowering the denaturant concentration through dialysis in liquid chromatography buffer (50 mM sodium phosphate, 150 mM NaCl, 1 mM EDTA, 0.01% NaN3, pH 7.4).
TextSentencer_T59 7336-7525 Sentence denotes Renatured protein was loaded onto an AKTA-EXPLORER fast protein liquid chromatography (FPLC) system equipped with a HiLoad 16/60 Superdex 75 column (Amersham Pharmacia Biotech, Sweden).
TextSentencer_T60 7526-7614 Sentence denotes Complete Protease Inhibitor cocktail (Roche, Germany) was added to the purified protein.
TextSentencer_T61 7615-7748 Sentence denotes Protein concentration was determined with the Bio-Rad Protein Assay kit as per instructions from the manufacturer (Bio-Rad, CA, USA).
TextSentencer_T62 7749-7841 Sentence denotes The correct molecular weights of the expressed proteins were confirmed by mass spectroscopy.
TextSentencer_T63 7842-8005 Sentence denotes The experiments were conducted using a FPLC System (Pharmacia Biotech, Sweden) with a Hi-Load 16/60 Superdex 75 (prep grade) column at an elution rate of 1 ml/min.
TextSentencer_T64 8006-8154 Sentence denotes The molecular weights of the proteins were estimated from the elution profile calibrated with the LMW Gel Filtration Calibration Kit (Amersham, UK).
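The calibration step described above can be sketched as follows. This is a minimal illustration, assuming the usual approach of fitting log10(molecular weight) linearly against elution volume; the text does not state how the calibration curve was fitted. The standard masses and elution volumes in the code are hypothetical placeholders, not values from the paper.

```python
import math

def fit_calibration(standards):
    """Least-squares fit of log10(MW) = intercept + slope * elution_volume.

    `standards` is a list of (elution_volume_ml, mw_kda) pairs.
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def estimate_mw(volume_ml, intercept, slope):
    """Apparent molecular weight (kDa) at a given elution volume."""
    return 10 ** (intercept + slope * volume_ml)

# Hypothetical calibration points (ml, kDa); assumptions for illustration only.
HYPOTHETICAL_STANDARDS = [(50.0, 100.0), (70.0, 10.0)]

if __name__ == "__main__":
    intercept, slope = fit_calibration(HYPOTHETICAL_STANDARDS)
    print(round(estimate_mw(60.0, intercept, slope), 2))
```

An apparent mass read off such a curve depends on molecular shape as well as mass, so it is best treated as an estimate to be cross-checked with an independent method.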
TextSentencer_T65 8156-8344 Sentence denotes The homo-bifunctional amine cross-linker disuccinimidyl suberate was purchased from Sigma-Aldrich (MO, USA) and was dissolved in N,N-dimethylformamide (DMF) to a concentration of 25 mg/ml.
TextSentencer_T66 8345-8474 Sentence denotes Reactions were carried out in a final protein concentration of 0.35 mM and a final disuccinimidyl suberate concentration of 5 mM.
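The concentrations quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the molar mass of disuccinimidyl suberate (368.34 g/mol, from its formula C16H20N2O8) is not given in the text, and the 100 µl reaction volume is a hypothetical value chosen only to illustrate the dilution.

```python
# Molar mass of disuccinimidyl suberate (C16H20N2O8); an assumption, not stated in the text.
DSS_MW_G_PER_MOL = 368.34

def stock_molarity_mm(mg_per_ml, mw_g_per_mol):
    """Convert a mass concentration (mg/ml, i.e. g/l) to millimolar."""
    return mg_per_ml / mw_g_per_mol * 1000.0

def molar_excess(crosslinker_mm, protein_mm):
    """Molar ratio of cross-linker to protein in the reaction."""
    return crosslinker_mm / protein_mm

def stock_volume_ul(final_mm, stock_mm, reaction_volume_ul):
    """Volume of stock needed to reach `final_mm` in the given reaction volume."""
    return final_mm / stock_mm * reaction_volume_ul

if __name__ == "__main__":
    stock = stock_molarity_mm(25.0, DSS_MW_G_PER_MOL)   # 25 mg/ml stock in DMF
    print(f"stock: {stock:.1f} mM")
    print(f"excess over protein: {molar_excess(5.0, 0.35):.1f}-fold")
    # 100 µl is a hypothetical reaction volume.
    print(f"stock per 100 µl reaction: {stock_volume_ul(5.0, stock, 100.0):.1f} µl")
```

Under these assumptions the 25 mg/ml stock is roughly 68 mM, and 5 mM cross-linker corresponds to about a 14-fold molar excess over 0.35 mM protein.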
TextSentencer_T67 8475-8596 Sentence denotes Mock reactions were set up as controls which contained only the protein solution and DMF without disuccinimidyl suberate.
TextSentencer_T68 8597-8736 Sentence denotes The reaction mixtures in standard buffer were allowed to react for 1 h at 4°C prior to quenching with 100 mM glycine (final concentration).
TextSentencer_T69 8737-8818 Sentence denotes The results were visualized on SDS-PhastGel minigels (Pharmacia Biotech, Sweden).
TextSentencer_T70 8819-8966 Sentence denotes Sedimentation velocity studies were carried out with a Beckman-Coulter XL-A analytical ultracentrifuge with an An60Ti rotor at 20°C and 40,000 rpm.
TextSentencer_T71 8967-9111 Sentence denotes Protein samples were diluted to 0.40-0.75 mg/ml and loaded into standard double sector cells with aluminum or Epon charcoal-filled centerpieces.
TextSentencer_T72 9112-9217 Sentence denotes The UV absorption of the cells was scanned at 280 nm in continuous mode every 10 min for a period of 5 h.
TextSentencer_T73 9218-9266 Sentence denotes The data were analyzed with Sedfit version 8.9d.
TextSentencer_T74 9267-9430 Sentence denotes Collections of 10-15 radial scans were used for analysis, and 200 sedimentation coefficients between 2 and 10 S were employed in calculating the c(S) distribution.
TextSentencer_T75 9431-9549 Sentence denotes The positions of the meniscus and cell bottom were determined by visual inspection, and then refined in the final fit.
TextSentencer_T76 9550-9713 Sentence denotes The partial specific volumes for N45-181, N245-365 and N45-365 were calculated from the amino acid compositions to be 0.7192, 0.7244 and 0.7198 ml/g, respectively.
TextSentencer_T77 9714-9791 Sentence denotes The solvent density and viscosity were calculated with Sednterp version 1.08.
TextSentencer_T78 9792-9910 Sentence denotes All samples were visually checked for clarity after ultracentrifugation, and no indication of precipitation was found.
TextSentencer_T79 9911-10236 Sentence denotes NMR spectroscopy. 15N-labeled protein samples were extensively exchanged with NMR buffer (100 mM sodium phosphate buffer, pH 6.0, containing 50 mM NaCl, 1 mM EDTA, 1 mM 2,2-dimethyl-2-silapentane-5-sulfonate, 0.01% NaN3, 10% D2O and Complete Protease Inhibitor cocktail) using an Amicon-15 concentrator (Amicon, MA, USA).
TextSentencer_T80 10237-10359 Sentence denotes The final concentrations of the samples were between 0.2 and 3 mM, depending on the solubility of the different fragments.
TextSentencer_T81 10360-10559 Sentence denotes All the NMR data were acquired at 27 and 30°C on 500, 600 or 800 MHz Bruker AVANCE spectrometers equipped with a triple resonance (1H, 13C and 15N) TXI probe with an actively shielded Z-gradient.
TextSentencer_T82 10560-10627 Sentence denotes Experimental parameters were set as described previously [25, 26] .
TextSentencer_T83 10628-10762 Sentence denotes CLEANEX-PM spectra, which only show resonances exchanging rapidly with the solvent (kex > 2 Hz), were obtained as described [27, 28] .
TextSentencer_T84 10763-10865 Sentence denotes Data were processed with the XWINNMR suite and AURELIA software (Bruker, Germany) on SGI workstations.
TextSentencer_T85 10866-10955 Sentence denotes The 1H chemical shift was referenced to 2,2-dimethyl-2-silapentane-5-sulfonate at 0 ppm.
TextSentencer_T86 10956-11043 Sentence denotes The 15N chemical shift was referenced using the consensus ratio Ξ of 0.101329118 for 15N/1H [29] .
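Indirect referencing with the quoted 15N/1H ratio can be sketched as below. This is a minimal illustration of the general procedure, in which the absolute 15N frequency at 0 ppm is obtained by multiplying the absolute 1H frequency of the reference at 0 ppm by the ratio. The spectrometer frequency used in the example is a hypothetical value, and the function names are illustrative.

```python
# 15N/1H frequency ratio quoted in the text.
XI_15N = 0.101329118

def n15_zero_frequency_mhz(h1_zero_mhz, xi=XI_15N):
    """Absolute 15N frequency (MHz) that defines 0 ppm on the 15N axis.

    `h1_zero_mhz` is the absolute 1H frequency of the reference signal at 0 ppm.
    """
    return h1_zero_mhz * xi

def shift_ppm(freq_mhz, zero_mhz):
    """Chemical shift in ppm of a resonance at `freq_mhz` relative to `zero_mhz`."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6

if __name__ == "__main__":
    h1_zero = 600.0  # hypothetical 1H reference frequency in MHz
    n15_zero = n15_zero_frequency_mhz(h1_zero)
    print(f"15N 0 ppm frequency: {n15_zero:.7f} MHz")
    # A resonance 120 ppm above the 15N zero point:
    print(f"{shift_ppm(n15_zero * (1 + 120e-6), n15_zero):.1f} ppm")
```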
TextSentencer_T87 11044-11163 Sentence denotes A series of N protein fragments spanning different regions were constructed based on the PONDR prediction ( Figure 1 ).
TextSentencer_T88 11164-11305 Sentence denotes We used a series of 15N-HSQC spectra of these fragments to define the position of the structural domains of SARS-CoV N protein ( Figure 2 ).
TextSentencer_T89 11306-11487 Sentence denotes NMR chemical shifts of amide resonances are sensitive to structural changes, and the pattern of the 15N-HSQC spectrum is commonly used to monitor order and disorder in proteins [30] .
TextSentencer_T90 11488-11677 Sentence denotes Well-dispersed spectra are indicative of structured protein, whilst congested spectra with resonances clustered in a narrow region of 8.3±0.5 ppm in the proton dimension are indicative of disordered protein.
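The dispersion criterion above can be expressed as a simple count. The sketch below reports the fraction of amide proton shifts that fall inside the 8.3±0.5 ppm window; it is an illustration only, the peak lists are hypothetical, and no cut-off for calling a spectrum "disordered" is implied by the text.

```python
def fraction_in_window(h1_shifts_ppm, center=8.3, half_width=0.5):
    """Fraction of 1H amide shifts lying within center ± half_width ppm."""
    if not h1_shifts_ppm:
        raise ValueError("peak list is empty")
    inside = sum(1 for shift in h1_shifts_ppm if abs(shift - center) <= half_width)
    return inside / len(h1_shifts_ppm)

if __name__ == "__main__":
    # Hypothetical peak lists, not taken from the paper.
    dispersed = [6.9, 7.5, 8.3, 9.4, 10.1]
    congested = [8.1, 8.2, 8.3, 8.5, 8.7]
    print(fraction_in_window(dispersed))  # low fraction: well-dispersed pattern
    print(fraction_in_window(congested))  # high fraction: congested pattern
```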
TextSentencer_T91 11678-11838 Sentence denotes We observed that the resonances from residues N45-181 have good chemical shift dispersion (Figure 2a) , indicating that the fragment has a structured character.
TextSentencer_T92 11839-11982 Sentence denotes The spectrum of N1-181 is a superposition of well-dispersed resonances and a cluster of overlapping resonances around 8.3±0.4 ppm (Figure 2b ).
TextSentencer_T93 11983-12153 Sentence denotes Comparing the spectra of N1-181 and N45-181 revealed that all resonances belonging to N45-181 were present in the spectrum of N1-181 with no change in resonance position.
TextSentencer_T94 12154-12290 Sentence denotes These results indicate that the N-terminal flanking region between amino acids 1-44 does not affect the structure of the N45-181 domain.
TextSentencer_T95 12291-12423 Sentence denotes To assess the structure of the C-terminal region, several C-terminal fragments were prepared for the collection of 15N-HSQC spectra.
TextSentencer_T96 12424-12551 Sentence denotes We found that the resonances from N248-365 are well-dispersed (Figure 2c ), suggesting that N248-365 forms an ordered structure.
TextSentencer_T97 12552-12652 Sentence denotes To define the structural boundaries we constructed fragments containing N-and C-terminal extensions.
TextSentencer_T98 12653-12734 Sentence denotes Figure 2d shows the 15N-HSQC spectrum of a uniformly 15N-labeled N248-422 sample.
TextSentencer_T99 12735-12883 Sentence denotes Comparing the spectrum of N248-422 with that of N248-365 ( Figure 2c ) we found that all resonances due to N248-365 can be identified in Figure 2d .
TextSentencer_T100 12884-12988 Sentence denotes These results indicate that residues from 365 to the C-terminus do not affect the structure of N248-365.
TextSentencer_T101 12989-13195 Sentence denotes Shortening the fragment to span amino acids 274-365 changes the 15N-HSQC resonance pattern, which indicates that the 248-273 region is important for structure stabilization of this domain (data not shown).
TextSentencer_T102 13196-13546 Sentence denotes To explore the structure of the region between residues 182-247 and their effect on the structure of N45-181 and N248-365, we constructed the fragment N45-365 which contains the two struc- The lack of resonance perturbation when the two domains are linked together suggests that interaction between these two domains is weak, if they interact at all.
TextSentencer_T103 13547-13651 Sentence denotes Our results show that SARS-CoV N protein contains two independent structural domains located at a.a.
TextSentencer_T104 13652-13671 Sentence denotes 45-181 and 248-365.
TextSentencer_T105 13672-13723 Sentence denotes These results are consistent with PONDR prediction.
TextSentencer_T106 13724-13891 Sentence denotes PONDR predicts three intrinsically disordered regions in SARS-CoV N protein located at the N-terminus, the C-terminus and between the two ordered regions (Figure 1b) .
TextSentencer_T107 13892-14065 Sentence denotes We also observed additional resonances clustered around 8.3±0.5 ppm in the proton dimension whenever the fragment was extended beyond the two structural domains (Figure 2 ).
TextSentencer_T108 14066-14233 Sentence denotes To test whether the residues beyond the structural domains are truly disordered, we employed the CLEANEX-PM experiment to identify solvent-accessible resonances [27] .
TextSentencer_T109 14234-14356 Sentence denotes The 15N-HSQC spectrum obtained with the CLEANEX-PM pulse sequence contains only resonances from solvent-exposed amide groups.
TextSentencer_T110 14357-14530 Sentence denotes When we compared the CLEANEX-PM spectrum of N1-181 (Figure 3b) with that of N45-181 (Figure 3a) , we observed 40 resonances that only appeared in N1-181 but not in N45-181.
TextSentencer_T111 14531-14709 Sentence denotes This number agrees with that expected for the N-terminal region (5 prolines), indicating that all amide protons in the N-terminus of SARS-CoV N protein are exposed to the solvent.
TextSentencer_T112 14710-14959 Sentence denotes We counted 39 additional peaks in the CLEANEX-PM spectrum of N248-422 (Figure 3d ) compared to that of N248-365 ( Figure 3c ) (51 expected since there are 6 prolines), suggesting that the majority of the C-terminal residues are also solvent-exposed.
TextSentencer_T113 14960-15156 Sentence denotes When we compared the CLEANEX-PM spectra of N45-181 (Figure 3a) , N248-365 ( Figure 3c ) and N45-365 (Figure 3f) , we observed the extra resonances representing the region between residues 182-247.
TextSentencer_T114 15157-15342 Sentence denotes A total of 27 additional peaks can be resolved, compared to 64 expected (2 prolines), indicating that about half of the linker region between residues 182-247 is exposed to the solvent.
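The expected peak counts quoted above follow from simple arithmetic: the number of residues in a region minus its prolines, which have no backbone amide proton. The sketch below reproduces the two expected counts stated in the text (51 for residues 366-422 and 64 for residues 182-247) and the fraction observed. The function and variable names are illustrative.

```python
def expected_amide_peaks(first_residue, last_residue, n_prolines):
    """Residues in the inclusive range minus prolines (no backbone amide proton)."""
    return (last_residue - first_residue + 1) - n_prolines

# (first, last, prolines, observed extra peaks), all taken from the text.
REGIONS = {
    "C-terminal 366-422": (366, 422, 6, 39),
    "linker 182-247": (182, 247, 2, 27),
}

if __name__ == "__main__":
    for name, (first, last, prolines, observed) in REGIONS.items():
        expected = expected_amide_peaks(first, last, prolines)
        print(f"{name}: expected {expected}, observed {observed}, "
              f"fraction {observed / expected:.2f}")
```

Because overlapping resonances cannot be counted separately, the observed fractions are lower bounds on the solvent-exposed fraction.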
TextSentencer_T115 15343-15498 Sentence denotes It should be noted here that due to resonance overlapping the numbers counted should be viewed as a lower limit for the number of solvent-exposed residues.
TextSentencer_T116 15499-15717 Sentence denotes Nevertheless, we can conclude that all N-terminal residues are solvent-exposed, whilst most of the residues in the C-terminus and in the linker region between the two structural domains are exposed to the solvent as well.
TextSentencer_T117 15718-15934 Sentence denotes In conjunction with the PONDR results and the observation that all additional resonances fall within 8.3±0.5 ppm in the proton dimension, we conclude that amino acids 1-44, 182-247 and 366-422 are disordered.
TextSentencer_T118 15935-16089 Sentence denotes The long disordered linker between the two structural domains is consistent with the observation that there is little interaction between the two domains.
TextSentencer_T119 16090-16343 Sentence denotes However, the number of counted peaks in the CLEANEX-PM spectra of the C-terminus and the linker region is less than expected, so it is likely that parts of these regions are solvent-protected, possibly through the formation of transient structures.
TextSentencer_T120 16344-16505 Sentence denotes An attempt to obtain a spectrum of the linker region alone was unsuccessful due to the extremely poor protein expression of the clone harboring the linker sequence.
TextSentencer_T121 16506-16559 Sentence denotes N45-181 has been identified as an RNA-binding domain.
TextSentencer_T122 16560-16722 Sentence denotes The function of the N248-365 is not clear, but many reports have identified the C-terminal half of SARS-CoV N protein to be involved in oligomerization [14, 15] .
TextSentencer_T123 16723-16931 Sentence denotes To test this possibility, we have applied analytical gel-filtration chromatography, chemical cross-linking and analytical ultracentrifugation to assay the self-association property of the N protein fragments.
TextSentencer_T124 16932-17128 Sentence denotes As shown in Figure 4a , N45-181 elutes out at a molecular weight of 18 kDa and N248-365 elutes out as a 28-kDa molecule, suggesting that N45-181 exists as a monomer and N248-365 exists as a dimer.
TextSentencer_T125 17129-17249 Sentence denotes The self-association between the two N248-365 monomers is very strong, since we could not detect any monomeric fraction.
TextSentencer_T126 17250-17359 Sentence denotes Similarly, N45-365 eluted at a molecular weight of ~70 kDa, suggesting that N45-365 also exists as a dimer.
TextSentencer_T127 17360-17558 Sentence denotes Furthermore, when N45-181 sample was mixed with N248-365 sample two peaks at 18 and 28 kDa were observed in the elution profile, demonstrating that the two fragments do not interact with each other.
TextSentencer_T128 17559-17659 Sentence denotes Chemical cross-linking (Figure 4b) detected only monomer for N45-181 and both monomer and dimer for N248-365.
TextSentencer_T129 17660-17785 Sentence denotes The quaternary structures of N45-181, N248-365 and N45-365 fragments were further examined by analytical ultracentrifugation.
TextSentencer_T130 17786-17924 Sentence denotes Only one major peak was detected for each of these three protein fragments, indicating that they are structurally homogeneous in solution.
TextSentencer_T131 17925-18157 Sentence denotes The results of data analysis with Sedfit version 8.9d showed that protein fragments N45-181, N248-365 and N45-365 sediment at 1.4 S, 2.6 S and 3.7 S (Figure 4c ), corresponding to a molecular mass of 10, 36 and 68 kDa, respectively.
TextSentencer_T132 18158-18358 Sentence denotes These results confirmed that N45-181, N248-365 and N45-365 exist as a monomer, dimer and dimer, respectively, in agreement with the results of gel-filtration chromatography and chemical cross-linking.
TextSentencer_T133 18359-18459 Sentence denotes Taken together, all three results indicate that N45-181 exists as a monomer and N248-365 as a dimer.
TextSentencer_T134 18460-18586 Sentence denotes The fact that dimerization occurs through a structural domain strongly suggests that the process is dependent on the structure.
TextSentencer_T135 18587-18685 Sentence denotes A model of the SARS-CoV N protein interaction based on our current results is shown in Figure 4d .
TextSentencer_T136 18686-18883 Sentence denotes It is interesting to note that we did not observe the formation of higher-order multimer in our studies, which may be important for the formation of the ribonucleoprotein complex within the virion.
TextSentencer_T137 18884-19063 Sentence denotes A possible explanation is that multimer formation may require additional factors, such as the presence of RNA or other parts of the N protein that were not present in our samples.
TextSentencer_T138 19064-19201 Sentence denotes Also, we cannot exclude the possibility that multimers form at much higher protein concentrations than the ones used in these studies.
TextSentencer_T139 19202-19301 Sentence denotes We suggest that the dimeric form represents a basic building block of the nucleocapsid of SARS-CoV.
TextSentencer_T140 19302-19425 Sentence denotes Since coronavirus N proteins belong to the same protein family, it is probable that they share similar structural features.
TextSentencer_T141 19426-19570 Sentence denotes Comparison of the order-disorder profile of these proteins ( Figure 5 ) shows that they all share the same disordered regions (hatched regions).
TextSentencer_T142 19571-19739 Sentence denotes There are two long disordered regions in the middle and at the C-termini of the proteins, whereas the length of the N-terminal disordered region shows more variability.
TextSentencer_T143 19740-19894 Sentence denotes Two ordered regions are located between the disordered regions, and their locations generally match those of the structural domains in SARS-CoV N protein.
TextSentencer_T144 19895-19962 Sentence denotes Disordered regions are often involved in biomolecular interactions.
TextSentencer_T145 19963-20163 Sentence denotes The C-terminus of MHV N protein, which is disordered, has been shown to interact with hnRNP A1 [31] , whereas the disordered region in the middle is responsible for its RNA-binding activity [13, 32] .
TextSentencer_T146 20164-20353 Sentence denotes In SARS-CoV, the disordered region in the middle of the N protein has been implicated in N-protein self-interaction [33] , interaction with the M protein [16] and hnRNP A1 interaction [17] .
TextSentencer_T147 20354-20504 Sentence denotes These experimental observations suggest that disordered regions of coronavirus N proteins are probable interaction sites with functional implications.
TextSentencer_T148 20505-20789 Sentence denotes Ordered regions of coronavirus N proteins share similar secondary structure profiles. Secondary structure alignment of coronavirus N protein sequences based on the two structural domains of SARS-CoV N protein shows that they share very similar secondary structure profiles ( Figure 6 ).
TextSentencer_T149 20790-20902 Sentence denotes The N-terminal domain has three conserved β strands which have been implicated in RNA binding in SARS-CoV [18] .
TextSentencer_T150 20903-21011 Sentence denotes The C-terminal domain is also mostly conserved in terms of secondary structure position within the sequence.
TextSentencer_T151 21012-21190 Sentence denotes The extensive secondary structure and high similarity suggest that the two structural domains observed in SARS-CoV N protein also exist in the N proteins of other coronaviruses.
TextSentencer_T152 21191-21381 Sentence denotes The results from the order-disorder prediction and secondary structure prediction coupled with sequence alignment suggest that coronavirus N proteins all share the same modular organization.
TextSentencer_T153 21382-21506 Sentence denotes The two structural domains are connected by a disordered linker and capped by a disordered N-terminal head and C-terminal tail.
TextSentencer_T154 21507-21589 Sentence denotes The two structural domains of SARS-CoV N protein carry out two distinct functions.
TextSentencer_T155 21590-21693 Sentence denotes The N-terminal domain is able to bind RNA, whereas the C-terminal domain acts as a dimerization domain.
TextSentencer_T156 21694-21779 Sentence denotes The ability of the N-terminal domain to bind RNA is closely related to its structure.
TextSentencer_T157 21780-21910 Sentence denotes Although the structure of the C-terminal domain has not been determined, we suggest that dimerization is also structure-dependent.
TextSentencer_T158 21911-21972 Sentence denotes A number of experimental observations support our hypothesis:
TextSentencer_T159 21973-22228 Sentence denotes First, it has been found that oligomer dissociation and protein unfolding of SARS-CoV N protein occur simultaneously [34] ; second, most self-interaction studies have mapped the oligomerization domain to regions containing the structural domain [14, 15] .
TextSentencer_T160 22229-22288 Sentence denotes The structural domains may also serve additional functions.
TextSentencer_T161 22289-22415 Sentence denotes For example, a putative loop between W302 and P310 in the C-terminal domain has been suggested to bind to cyclophilin A [35] .
TextSentencer_T162 22416-22497 Sentence denotes These additional functions may also be dependent on the structure of the protein.
TextSentencer_T163 22498-22688 Sentence denotes Although the two structural domains do not interact with each other, we cannot discount the possibility that the two domains could act in concert to carry out important biological functions.
TextSentencer_T164 22689-22793 Sentence denotes The long flexible linker between the two domains provides enough freedom to make this scenario possible.
TextSentencer_T165 22794-22909 Sentence denotes Previously, the lack of information on structural organization precluded the study of multiple-domain interactions.
TextSentencer_T166 22910-22982 Sentence denotes Now our findings provide a structural framework to perform such studies.
TextSentencer_T167 22983-23060 Sentence denotes The flexible linker between the two structural domains is largely disordered.
TextSentencer_T168 23061-23162 Sentence denotes This disordered region may enable transient interactions with several structurally distinct partners.
TextSentencer_T169 23163-23245 Sentence denotes It has been shown that the M protein of SARS-CoV binds to this region between a.a.
TextSentencer_T170 23246-23260 Sentence denotes 168-208 [16] .
TextSentencer_T171 23261-23366 Sentence denotes Interestingly, human cellular hnRNP A1 has also been shown to bind to almost the same region between a.a.
TextSentencer_T172 23367-23381 Sentence denotes 161-210 [17] .
TextSentencer_T173 23382-23589 Sentence denotes The disordered state of this region potentially allows it to interact with different partners depending on context, e.g. with the M protein during virus assembly and with hnRNP A1 during host cell infection.
TextSentencer_T174 23590-23775 Sentence denotes The exact mechanism by which this occurs is not known, but it could involve different induced folding pathways, which has been shown to occur in other disordered proteins [23, 36, 37] .
TextSentencer_T175 23776-23840 Sentence denotes The same phenomenon is observed in other coronavirus N proteins.
TextSentencer_T176 23841-23975 Sentence denotes In mouse hepatitis virus (MHV), the region corresponding to the flexible linker in its N protein is involved in RNA binding [13, 32] .
TextSentencer_T177 23976-24060 Sentence denotes The same region has also been shown to bind murine hnRNP A1 in infected cells [31] .
TextSentencer_T178 24061-24294 Sentence denotes It seems that the coronavirus N proteins share the common theme of using the flexible linker as an interaction "hotspot", and use characteristics of disordered regions to achieve multiple functions within a limited sequence length.
TextSentencer_T179 24295-24395 Sentence denotes Phosphorylation is one of the most important regulatory post-translational modifications in proteins.
TextSentencer_T180 24396-24593 Sentence denotes SARS-CoV N protein has been shown to be serine-phosphorylated by multiple kinases, and phosphorylation has been proposed as a possible mechanism for nucleocytoplasmic shuttling of the N protein [38] .
TextSentencer_T181 24594-24659 Sentence denotes Disordered regions represent potential sites for phosphorylation.
TextSentencer_T182 24660-24776 Sentence denotes The flexible linker of SARS-CoV N protein contains an SR-rich region, which is targeted by a number of kinases [39] .
TextSentencer_T183 24777-24833 Sentence denotes In fact, this region can be phosphorylated in vitro (Dr.
TextSentencer_T184 24834-24839 Sentence denotes W.-Y.
TextSentencer_T185 24840-24870 Sentence denotes Tarn, personal communication).
TextSentencer_T186 24871-25078 Sentence denotes Recent in silico prediction suggested that most of the potential phosphorylation sites fall in the disordered regions, although the exact phosphorylation sites have not been identified experimentally [38] .
TextSentencer_T187 25079-25246 Sentence denotes Although the exact role of phosphorylation has not been elucidated, it could be involved in regulating functions such as RNA binding and localization within the host cell.
TextSentencer_T188 25247-25368 Sentence denotes The phosphorylation patterns of other coronavirus N proteins which have been studied also fall in the disordered regions.
TextSentencer_T189 25369-25480 Sentence denotes In avian infectious bronchitis virus (IBV), the phosphorylation sites of the N protein have been mapped to a.a.
TextSentencer_T190 25481-25507 Sentence denotes 186-198 and 367-394 [40] .
TextSentencer_T191 25508-25602 Sentence denotes Both regions are located in the disordered regions predicted by PONDR (Figure 5).
TextSentencer_T192 25603-25783 Sentence denotes Phosphorylation of transmissible gastroenteritis virus (TGEV) N protein has also been mapped to residues 9, 156, 254 and 256, which are at or close to the disordered regions [41] .
TextSentencer_T193 25784-25920 Sentence denotes Phosphorylation in disordered regions of structural proteins is also observed in other virus families, such as in Paramyxovirinae [42] .
TextSentencer_T194 25921-26007 Sentence denotes Coronavirus N proteins thus seem to share this widespread use of disordered regions as sites for modification.
TextSentencer_T195 26008-26170 Sentence denotes Whether or not such modification affects the folding or structural properties of the protein and how these properties affect its function remain to be determined.
TextSentencer_T196 26171-26293 Sentence denotes Identification of the disordered regions of SARS-CoV N protein provides a blueprint for structural studies of the protein.
TextSentencer_T197 26294-26423 Sentence denotes The structural domains are logical candidates for structural determination through X-ray crystallography or solution NMR studies.
TextSentencer_T198 26424-26572 Sentence denotes However, structure determination of the full-length protein is hindered by the disordered regions, which often interfere with crystallization [43] .
TextSentencer_T199 26573-26615 Sentence denotes The large size of the dimeric protein (ca.
TextSentencer_T200 26616-26720 Sentence denotes 90 kDa) also makes full-length structure determination through NMR extremely difficult because of fast transverse (T2) relaxation.
TextSentencer_T201 26721-26818 Sentence denotes The fact that the two structural domains do not interact provides a handle to solve this problem.
TextSentencer_T202 26819-26939 Sentence denotes The two structural domains can be solved independently and still provide a fair representation of the full-length protein.
TextSentencer_T203 26940-27021 Sentence denotes The modular organization of SARS-CoV N protein is shared among other coronaviruses.
TextSentencer_T204 27022-27192 Sentence denotes The relative positions of the two structural domains are fairly conserved in all coronavirus N proteins, making them excellent targets for comparative structural studies.
TextSentencer_T205 27193-27396 Sentence denotes The structures of the N-terminal domains would be of special interest, since in SARS-CoV this domain has been identified as an RNA-binding domain, whereas in other coronaviruses its exact function is not yet known.
TextSentencer_T206 27397-27543 Sentence denotes Of special note is the RNA-binding domain of MHV, which has been mapped to the flexible linker region instead of the N-terminal structural domain.
TextSentencer_T207 27544-27709 Sentence denotes At present the molecular mechanism involving N protein/RNA interaction is still not fully understood and the RNA binding site(s) have not been unequivocally defined.
TextSentencer_T208 27710-27865 Sentence denotes It is possible that the N-terminal structural domain folds into different tertiary structures and plays different roles in different coronavirus N proteins.
TextSentencer_T209 27866-27945 Sentence denotes It is also possible that the linker region is involved in RNA binding.
TextSentencer_T210 27946-28045 Sentence denotes Another interesting point that needs further study is the role of the C-terminal structural domain.
TextSentencer_T211 28046-28196 Sentence denotes It is not yet known whether it plays the same dimerization role in other coronaviruses as in SARS-CoV, although there are hints in the literature [44] .
TextSentencer_T212 28197-28327 Sentence denotes In summary, we draw the following conclusions: (1) The N protein of SARS-CoV is a two-domain protein whose structural domains are connected by a flexible linker.
TextSentencer_T213 28328-28513 Sentence denotes The protein is capped by a disordered N-terminal head and a disordered C-terminal tail. (2) The C-terminal structural domain is sufficient for dimerization, implying a structural role in the process.