Sequencing of coronavirus IBV genomic RNA: a 195-base open reading frame encoded by mRNA B (Avian infectious bronchitis virus; cDNA clones; recombinant DNA)
Abstract
DNA sequencing of genomic cDNA clones of avian infectious bronchitis virus (IBV) has been carried out. A total of 770 bases have been determined, which include genomic sequences spanning the 5' termini of the two smallest mRNAs of the 3'-coterminal "nested" set: mRNA A and mRNA B. This region contains the complete coding sequences for mRNA B which are additional to those present in mRNA A. Two open reading frames are present, predicting proteins of Mr 7500 and 9500.
Avian IBV, in common with other coronaviruses, has a single-stranded, polyadenylated, infectious RNA genome approx. 20 kb in length (Stern and Kennedy, 1980a). In infected cells multiple subgenomic positive-stranded RNAs are produced. For IBV and MHV these have been shown to consist of a 3'-coterminal "nested" set (Stern and Kennedy, 1980a; Lai et al., 1981; Leibowitz et al., 1981). For both IBV and MHV the messenger function of these subgenomic RNAs has been demonstrated (Rottier et al., 1981; Stern et al., 1982; Siddell, 1983). However, certain differences of genome organisation have become apparent between ... (Stern and Kennedy, 1980b; Lai et al., 1981). Second, the coding function of the various messenger RNAs is different. Both IBV and MHV contain three main structural polypeptides: the nucleocapsid, the membrane (E1), and the spike or peplomer (E2) polypeptides (Cavanagh, 1981; Siddell et al., 1983). In both systems the smallest RNA codes for the nucleocapsid protein, but in MHV the next smallest RNA codes for the membrane polypeptide, whereas in IBV the membrane polypeptide is coded for by the third smallest RNA (Siddell et al., 1980; Siddell, 1983; Rottier et al., 1981; Stern et al., 1982; Stern and Sefton, 1984). These differences are summarised in Fig. 1. The organisation of the messenger RNAs and in vitro translation studies have led to the hypothesis that the 5'-most sequences of each mRNA, which are not present in the next smallest mRNA, contain the complete coding sequences for the major protein product produced by that messenger species (Stern and Kennedy, 1980b; Lai et al., 1981). However, since the only coronavirus sequences published are those of the smallest RNA species (Skinner and Siddell, 1983), it has not been possible to examine this hypothesis at the RNA sequence level. RNA sequence data from this region might also enable us to predict the properties, and thus aid the identification, of possible polypeptides coded for by mRNA B.
In this paper we report the nucleotide sequence of a cloned cDNA copy of IBV genomic RNA in the region corresponding to the 5' end of mRNA B. The sequence shows that the 5'-most sequences of mRNA B could code for a hydrophobic 7.5-kDa protein.
The preparation of cDNA clones has been previously described (Brown and Boursnell, 1984). Briefly, virion RNA was isolated from IBV strain Beaudette grown in embryonated eggs. cDNA was produced by oligo(dT)-primed reverse transcription of the RNA, followed by self-primed reverse transcription to generate the second strand. S1 nuclease-treated cDNA was dC-tailed using terminal transferase, annealed to dG-tailed PstI-cleaved pAT153 (Twigg and Sherratt, 1980) and transformed into Escherichia coli HB101. Ampicillin-sensitive colonies were selected for further characterisation.
Viral clones were identified by hybridisation with a probe prepared by polynucleotide kinase labelling of alkali-treated, full-length IBV genomic RNA.
Restriction sites were mapped on a series of clones and this enabled construction of a continuous map. The presence of sequences from the 3'-terminus of the viral genome was confirmed by hybridisation with a kinase-labelled poly(U) probe.
(c) Formaldehyde-agarose gel analysis of IBV mRNAs
1.5% formaldehyde-agarose gels were run essentially as described by Maniatis et al. (1982). Total RNA samples from IBV-infected chick kidney cell cultures were run overnight at 60 V on 16 cm vertical gels. IBV mRNAs were detected by blotting onto nitrocellulose and probing with nick-translated cloned IBV sequences (Maniatis et al., 1982).
... kinase, were sequenced essentially as described by Maxam and Gilbert (1980). The depurination reaction was carried out in 66% formic acid for 10 min at 20°C, after which the samples were treated in the same way as the pyrimidine reaction. For sequencing some regions of the DNA, restriction digests of the viral insert were recloned into the plasmid pUC9, allowing sequencing from adjacent vector restriction sites (Messing and Vieira, 1982). Sequence data were stored and analysed on an Apple IIe microcomputer using the programs of Larson and Messing (1983) and on a VAX 11/780 minicomputer using the programs of Staden (1984).
RESULTS
770 bp of DNA sequence from one IBV genomic clone, C5.136 (Brown and Boursnell, 1984), have been determined. This sequence corresponds to the genomic RNA sequence stretching from 2.40 kb to 1.63 kb from the 3' end of the viral genome. In ... 95% of the sequence has been determined on both strands, and on each strand most regions have been sequenced more than once from different restriction sites. Fig. 2c shows the positions of the initiation and termination codons in the three reading frames and the positions of the 5' ends of mRNAs A and B as determined by S1 nuclease mapping (Brown and Boursnell, 1984). It should be noted that S1 nuclease mapping will determine the 5' end of the "body" of the mRNA (see DISCUSSION). The DNA sequence of 770 nucleotides, with a translation of the three main ORFs, is shown in Fig. 3 (Brown and Boursnell, 1984).
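The kind of map described for Fig. 2c (initiation and termination codons in each of the three reading frames) can be illustrated with a minimal Python sketch. This is not the software used in this work; the function names and the short toy sequence are hypothetical and serve only to show the principle on a DNA-sense strand.

```python
# Minimal three-frame scan for initiation (ATG) and termination codons.
# TOY is an invented sequence, not IBV data.
STOP_CODONS = {"TAA", "TAG", "TGA"}
TOY = "CCATGAAATTAGCTTAACATGGGTTGACC"


def scan_frames(dna):
    """For each forward frame (0, 1, 2), list the 0-based positions
    of ATG codons and of stop codons read in that frame."""
    dna = dna.upper()
    result = {}
    for frame in range(3):
        starts, stops = [], []
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon == "ATG":
                starts.append(i)
            elif codon in STOP_CODONS:
                stops.append(i)
        result[frame] = {"starts": starts, "stops": stops}
    return result


def open_reading_frames(dna, min_codons=2):
    """Return (frame, start, end) for each ORF that runs from the first
    ATG after a stop (or after the frame start) to the next in-frame stop.
    'end' is the index just past the stop codon; min_codons counts
    sense codons only (the stop is excluded)."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For the toy sequence, `open_reading_frames(TOY)` returns `[(0, 18, 27), (2, 2, 17)]`, i.e. one short ORF in frame 0 and one in frame 2. Applied to a real 770-nucleotide sequence, a higher `min_codons` threshold would be needed to exclude trivially short frames.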
In ... Skinner and Siddell, 1983) reveals no significant homology at the RNA or protein level. Although some homology might be expected, it should be noted that no serological cross-reaction between IBV and any of the mammalian coronaviruses has been reported.
The region of the IBV sequence presented in this paper contains the 5' ends, on the viral genome, of mRNAs A and B. The messenger RNAs of coronavirus IBV are probably transcribed from non-contiguous regions of the viral genome. Work on the murine coronavirus MHV has shown that a common 5' leader sequence, originating from the extreme 5' end of the viral genome, is fused to the body of each messenger RNA (Lai et al., 1982; 1983; Spaan et al., 1983; Baric et al., 1983). It is likely that the IBV messenger RNAs have the same structure. However, whether or not leader sequences are present, S1 nuclease mapping experiments have shown that the ends of the bodies of mRNAs A and B lie at positions 139 and 443 respectively (Brown and Boursnell, 1984), as shown in Fig. 3.
Of the two ORFs which are present at the 5' end of mRNA B, the Mr 7500 ORF seems to be the more likely candidate for translation in vivo. The location of the coding sequence for the putative Mr 7500 polypeptide fits in well with the hypothesis that the major polypeptide product of each mRNA is translated from those 5' sequences not present in the next smallest RNA. If the Mr 9500 polypeptide were translated then this would no longer be true, since its coding region stretches well into mRNA A. Furthermore, the RNA sequences flanking the initiation codons for the putative Mr 7500 and nucleocapsid genes correspond well to those preferred for functional eukaryotic initiation codons (Kozak, 1983). However, the sequence GNNAUGA around the AUG codon at the start of the Mr 9500 ORF is rare in this context. In addition, the AUG codon at the start of the Mr 7500 ORF is the first initiation codon to occur in the body of mRNA B; since initiation of translation at anything other than the first AUG codon is known to be rare (Kozak, 1983), this suggests that this ORF codes for the major product of mRNA B.
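The two criteria used above, the position of the first AUG and the bases flanking each AUG, can be extracted mechanically. The following Python sketch is illustrative only: the function names and the toy RNA are hypothetical, and the code merely reports each context in the notation used in the text (base at -3, NN, AUG, base at +4). It does not score a context as favourable or unfavourable.

```python
# Report the flanking context of every AUG in an RNA sequence.
# TOY_RNA is an invented sequence, not IBV data.
TOY_RNA = "CCGUCAUGACCAACAUGGCU"


def aug_contexts(rna):
    """Return a list of (position, context) for each AUG, where context
    is written as <base at -3> + 'NN' + 'AUG' + <base at +4>.
    AUGs lacking either flanking base (too near an end) are skipped."""
    rna = rna.upper().replace("T", "U")
    found = []
    pos = rna.find("AUG")
    while pos != -1:
        if pos >= 3 and pos + 3 < len(rna):
            found.append((pos, rna[pos - 3] + "NN" + "AUG" + rna[pos + 3]))
        pos = rna.find("AUG", pos + 1)
    return found


def first_aug(rna):
    """Return the 0-based position of the 5'-most AUG, or -1 if none."""
    return rna.upper().replace("T", "U").find("AUG")
```

For the toy sequence, `aug_contexts(TOY_RNA)` gives `[(5, 'GNNAUGA'), (14, 'ANNAUGG')]` and `first_aug(TOY_RNA)` gives `5`. Whether a given context is favourable must still be judged against the published preferences (Kozak, 1983).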
The amino acid sequence of the putative Mr 7500 polypeptide shows it to be hydrophobic in nature and to have an unusual composition in that 26% (17 out of 65) of its residues are leucine. Of the six possible triplets coding for leucine, one (UUA) is used 8 times out of 17. This unusual composition and codon bias, which would not be expected from a chance ORF, suggests that this polypeptide is translated in vivo. A computer analysis of the sequences presented here has been carried out using the program ANALYSEQ (Staden, 1984). This program uses certain criteria to select one of the three reading frames as being the most likely protein coding frame. Although the program is better suited to analysing large ORFs, it is interesting to note that a search based on looking for codon biases above those expected from the base composition selects 85% of the Mr 7500 ORF as the most likely coding frame. The 15% of codons not selected are not in a single block, which might have suggested a sequencing error leading to an artificial frameshift. All of the nucleocapsid which has so far been sequenced was selected as the most likely coding frame, but none of the Mr 9500 ORF.
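The composition figures quoted above follow from simple counting, which the Python sketch below illustrates. It is not the ANALYSEQ method; the function name and the toy ORF are hypothetical, and the input is assumed to be an in-frame coding sequence without its stop codon.

```python
# Count leucine codons in an in-frame ORF (stop codon excluded).
# TOY_ORF is an invented sequence, not IBV data.
from collections import Counter

LEU_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")
TOY_ORF = "AUGUUAUUACUGGCUUUAAAA"


def leucine_usage(orf_rna):
    """Return (total codons, leucine codons, per-codon counts)
    for an RNA (or DNA-sense) ORF read from its first base."""
    rna = orf_rna.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    counts = Counter(c for c in codons if c in LEU_CODONS)
    return len(codons), sum(counts.values()), dict(counts)


# Arithmetic behind the figures given in the text:
N_CODONS = 195 // 3                 # a 195-base ORF encodes 65 residues
LEU_PERCENT = 100 * 17 / N_CODONS   # 17 leucines out of 65, about 26%
UUA_PERCENT = 100 * 8 / 17          # UUA used 8 times out of 17, about 47%
MR_PER_RESIDUE = 7500 / N_CODONS    # about 115 per residue for Mr 7500
```

With six synonymous leucine codons, equal usage would give each roughly one sixth of the total (about 17%), so a single codon accounting for about 47% is a marked bias. For the toy ORF, `leucine_usage(TOY_ORF)` returns `(7, 4, {'UUA': 3, 'CUG': 1})`.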
We have carried out in vitro translation of total and poly(A)+ RNA populations from IBV-infected chick kidney cell cultures using a rabbit reticulocyte lysate system. However, analysis of the products on 10-18% polyacrylamide gradient gels containing urea could not resolve any small polypeptides due to high background in the relevant low-Mr range. A similar problem has been found by Stern and Sefton (1984; Stern, D.F., personal communication), who have carried out in vitro translation studies of gel-purified and gradient-fractionated mRNAs and have identified no major specific product from mRNAs B or D.
At present, therefore, it is not possible to say definitively whether either of these polypeptides is produced. In a search for small structural polypeptides, a [3H]leucine-labelled preparation of virus has been analysed on a 12.5% polyacrylamide tube gel (Cavanagh, D., personal communication), using the phosphate buffer system of Swank and Munkres (1971). This showed that there were three detectable polypeptides of apparent Mr 16000, 12000, and 10000. The percentage of the total counts in the gel accounted for by these polypeptides was <1%. Thus, if either of these ORFs codes for a structural polypeptide, it must be present only in very small quantities. It is also possible that they could code for non-structural polypeptides. These would be difficult to identify in IBV-infected cells by pulse-labelling techniques without using immunoprecipitation to lower the background of host-cell incorporation, which is poorly shut off by IBV infection. The availability of sequence data for these putative polypeptides, however, opens up the possibility of using immunoprecipitation with antisera prepared against synthetic oligopeptides to search for the presence of these polypeptides in IBV-infected cells.