Results

At least 419 mouse olfactory receptor genes are expressed in the olfactory epithelium

We have isolated 1,264 olfactory receptor cDNA clones, which together confirm the olfactory epithelial expression of 419 annotated olfactory receptor genes. We used low-stringency hybridization with degenerate olfactory receptor DNA probes to screen around 4,100,000 plaque-forming units (pfu) of an adult mouse olfactory epithelium cDNA library and around 640,000 pfu of an embryonic olfactory epithelium library. We obtained sequences from 1,715 hybridization-positive cDNAs following secondary screens to isolate single clones. Of these clones, 1,264 yielded olfactory receptor-containing sequences. The 26% false-positive rate is a consequence of using low-stringency hybridization to obtain maximal sensitivity. Continuing the screen would have yielded cDNAs from additional olfactory receptors, but we reached a point of diminishing returns: our final batch of 45 olfactory receptor-positive sequences represented 33 different olfactory receptors, of which only four had not been encountered previously in our screen.

Sequence analysis shows that the libraries are of high quality. Firstly, directional cloning was successful: only eight out of 1,430 cDNA sequences with any protein homology matched that protein on the reverse strand. Secondly, genomic contamination is rare: when the 83 olfactory receptor-containing sequences that had a 5' UTR of 400 bp or longer were aligned to genomic sequence, 80 spliced across at least one intron, leaving a maximum of three clones (3.6%) that potentially represent genomic contamination of the libraries. Thirdly, most clones are of a reasonable length: although we did not determine whether clones are full-length, 881 of 1,264 (70%) olfactory receptor cDNAs contain the gene's start codon and at least some 5' UTR sequence.
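The library quality figures quoted above follow directly from the clone counts. The short sketch below simply recomputes them; the variable names are illustrative and are not taken from our analysis scripts.

```python
# Clone counts quoted in the text; variable names are illustrative only.
sequenced_clones = 1715   # hybridization-positive cDNAs that were sequenced
or_clones = 1264          # clones yielding olfactory receptor sequence
long_utr_clones = 83      # receptor clones with a 5' UTR of 400 bp or longer
spliced_long_utr = 80     # of those, clones spliced across at least one intron
with_start_codon = 881    # receptor cDNAs with the start codon and some 5' UTR


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Clones that hybridized but contained no olfactory receptor sequence.
false_positive_rate = percent(sequenced_clones - or_clones, sequenced_clones)
# Unspliced long-UTR clones give an upper bound on genomic contamination.
max_genomic_contamination = percent(long_utr_clones - spliced_long_utr,
                                    long_utr_clones)
# Fraction of receptor cDNAs that reach the start codon.
start_codon_fraction = percent(with_start_codon, or_clones)

print(f"false positives: {false_positive_rate:.1f}%")
print(f"possible genomic contamination: {max_genomic_contamination:.1f}%")
print(f"clones with start codon: {start_codon_fraction:.1f}%")
```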
In order to match cDNAs to their genomic counterparts, we first updated our catalog of mouse olfactory receptor genes [1] based on Celera's most recent genome assembly (Release 13) [29]. Previous reports of the mouse olfactory receptor repertoire [1,2] were based on the Release 12 assembly. Release 13 consists of fewer, longer scaffold sequences containing fewer, smaller gaps than Release 12. Using the BLAST-based methods detailed previously [1], we identified 1,490 olfactory receptor sequences in the new assembly, including 1,107 intact olfactory receptor genes (compared to 866 intact olfactory receptors in the old assembly), reflecting the reduced sequence error rate and increased coverage of the new assembly (Table 1). We created a local database of genomic sequences including all olfactory receptor loci and 0.5 Mb flanking sequences (if available) and compared each cDNA sequence to this 'olfactory subgenome' database using sim4 [30]. cDNAs were assigned to individual genes based on their best match to an olfactory receptor coding region or its upstream region (see Materials and methods). Of the 1,264 olfactory receptor cDNAs, 1,176 matched a total of 419 olfactory receptor genes; the remaining cDNAs either matched an olfactory receptor below our 96% nucleotide identity threshold or had ambiguous matches encompassing more than one olfactory receptor gene region (see below).

A class I olfactory receptor degenerate primer broadens phylogenetic distribution of confirmed olfactory receptor genes

Previous analyses of the mammalian olfactory receptor family define two major phylogenetic clades, referred to as class I and II olfactory receptors, and suggest that class I olfactory receptors are more similar to fish olfactory receptors than are class II olfactory receptors [5].
Figure 1 illustrates the phylogenetic diversity of our cDNA collection, showing that we have confirmed expression of at least one olfactory receptor gene in each major clade of the class II olfactory receptor genes, or 391 out of 983 (40%) of all intact class II olfactory receptor genes where full-length genomic sequence data are available (blue branches). The screen thus appears relatively unbiased in its coverage of class II olfactory receptors. However, our random screen provided cDNAs for only two out of 124 intact, full-length class I olfactory receptors. In an attempt to broaden the phylogenetic coverage of our hybridization screen, we used additional degenerate probes on the adult library and screened an embryonic library (Table 2). These experiments did not increase the diversity of clones identified (not shown). This severe class I underrepresentation could be due to experimental bias - a consequence of using degenerate primers to create our hybridization probe. Alternatively, class I genes might be expressed at extremely low levels in the olfactory epithelium. In order to determine whether class I olfactory receptors are expressed in the olfactory epithelium, we designed a reverse-strand degenerate primer to recognize a motif in transmembrane domain 7 (PP{V/M/A/T}{F/L/I/M}NP) enriched among class I olfactory receptor sequences. Most of the motif is shared among all olfactory receptors, but the first proline residue (at the primer's 3' end) is found in 121 out of 124 (98%) intact class I genes compared to only 37 out of 983 (4%) intact class II genes. When combined with another olfactory receptor degenerate primer, P26 [17], this primer preferentially amplifies class I olfactory receptors from mouse genomic DNA: of 33 sequenced, cloned PCR products, 17 represented seven different class I olfactory receptors, six represented three different class II olfactory receptors, and ten represented five different non-olfactory receptor contaminants. 
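The transmembrane domain 7 motif described above can be written as a simple pattern over protein sequence. The sketch below is an illustration at the protein level only: it does not reproduce the degenerate primer itself, and the function name and example peptides are invented for the demonstration.

```python
import re

# Protein-level pattern for the motif PP{V/M/A/T}{F/L/I/M}NP.
# The leading proline is the residue enriched among class I receptors.
CLASS_I_MOTIF = re.compile(r"PP[VMAT][FLIM]NP")


def has_class_i_motif(protein):
    """Return True if the protein sequence contains the class I-enriched motif."""
    return CLASS_I_MOTIF.search(protein.upper()) is not None


# Invented peptides for illustration; they are not real receptor sequences.
examples = {
    "with_first_proline": "GALTPPVFNPLIYS",
    "without_first_proline": "GALTNPMLNPLIYS",
}

for name, peptide in examples.items():
    print(name, has_class_i_motif(peptide))
```

Because a small fraction of class II genes also carry the first proline, a match should be read as enrichment for class I rather than as a definitive classification.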
Degenerate PCR, cloning, and sequencing from reverse-transcribed olfactory epithelium RNA showed that at least seven class I olfactory receptors are expressed, as well as one additional class II gene (colored red in Figure 1). However, no products could be obtained from the adult or the fetal olfactory epithelium cDNA libraries using the class I primer, suggesting that the libraries contain very low levels of class I olfactory receptors. We also confirmed expression of nine additional olfactory receptors (three class I and six class II, colored red in Figure 1) from subclades that were poorly represented in our cDNA screen using gene-specific primer pairs to amplify cDNA library or reverse-transcribed RNA templates. For two of the class I genes we had shown to be expressed, we determined relative transcript levels using quantitative RT-PCR (see below). Expression levels were similar to those observed for genes that were represented in our cDNA collection, suggesting that class I olfactory receptors are not under-represented in the olfactory epithelium, and that the dearth of class I cDNAs in our screen is likely to be due to bias in the libraries and/or hybridization probes.

Some olfactory receptor genes are expressed at higher levels than others

Our cDNA screen suggests that some olfactory receptor genes are expressed at higher levels than others. If all olfactory receptor genes were expressed at equal levels, and our screen and library were unbiased in their coverage of the class II olfactory receptors, the number of cDNAs detected per class II olfactory receptor should follow a Poisson distribution, calculated based on the assumption that all 983 intact class II olfactory receptors have an equal chance of being represented in the screen, but that class I olfactory receptors and pseudogenes cannot be found (Figure 2).
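The Poisson expectation described above can be sketched as follows, assuming that the 1,176 cDNAs assigned to single genes are spread evenly over the 983 intact class II genes and that genes behave independently. These simplifications, and the function names, are assumptions of this illustration rather than a record of the exact calculation used.

```python
import math

n_cdnas = 1176   # cDNAs assigned to single olfactory receptor genes
n_genes = 983    # intact class II genes assumed equally likely to be sampled
lam = n_cdnas / n_genes   # expected number of cDNAs per gene


def poisson_pmf(k, lam):
    """Probability of observing exactly k cDNAs for one gene."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(k_min, lam):
    """Probability of observing k_min or more cDNAs for one gene."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


# Per-gene chance of eight or more cDNAs under equal expression.
p_gene = poisson_tail(8, lam)
# Chance that at least one of the genes reaches that count,
# treating genes as independent (an approximation).
p_any = 1.0 - (1.0 - p_gene) ** n_genes

print(f"mean cDNAs per gene: {lam:.2f}")
print(f"P(any gene with >= 8 cDNAs): {p_any:.4f}")
```

Under these assumptions the chance of seeing even one gene with eight or more cDNAs is only a few percent, so genes with many more matching cDNAs than that are unlikely to arise from equal expression and unbiased sampling.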
We calculate a low probability (approximately one in 28) that we would observe any gene with at least eight matching cDNAs in the set of 1,176 cDNAs we assigned to single olfactory receptor sequences. However, for 17 olfactory receptors, we found ten or more matching cDNAs, suggesting that they might be expressed at higher levels than other olfactory receptor genes (Figure 2). The two genes for which we found most cDNAs (AY318726/MOR28 and AY318727/MOR10) are genomically adjacent and in the well-studied olfactory receptor cluster next to the T-cell receptor α/δ locus [18,31]. Quantitative RT-PCR of six olfactory receptors confirms that expression levels do indeed vary considerably between genes. We used quantitative (real-time) PCR to measure olfactory epithelium transcript levels of six olfactory receptor genes and the ribosomal S16 gene in three mice of the same inbred strain (Figure 3). These genes include two olfactory receptors with more than 20 matching cDNAs, two with one or two matching cDNAs and two class I olfactory receptors with no matching cDNAs. In these assays, we measure transcript level per genomic copy of the gene by comparing how well a gene-specific primer pair amplifies reverse-transcribed RNA, relative to a standard curve of amplification of mouse genomic DNA. We find that expression levels can vary by almost 300-fold between genes (for example, genes A and D, Figure 3). However, cDNA numbers are not a good indicator of expression level, a discrepancy that is likely to be due to bias in the screen (we used degenerate primers to make the probes, which will favor some genes over others) and in the libraries (oligo-dT priming will favor genes with shorter 3' UTRs). 
For example, we observe large expression differences in all three mice between two genes for which similar numbers of cDNAs were found (genes A and B, Figure 3), and conversely, similar expression levels between two genes with a ten-fold difference in number of cDNAs found (genes B and C, Figure 3). Expression levels are mostly consistent between different mice: we find similar expression-level differences between olfactory receptor genes in all three mice examined (that is, the rank order of the six genes is similar among the three mice), although there is variation in expression level of some genes between mice (for example, gene E, Figure 3). In situ hybridization (Figure 4) shows that increased numbers of expressing cells account for some, but not all, of the difference in transcript levels between two of the genes tested by real-time PCR (genes A and D in Figure 3). We hybridized alternate coronal serial sections spanning an entire olfactory epithelium of a young mouse (P6) with probes for gene A and gene D. Southern blot and BLAST analyses show that both probes are likely to hybridize to their intended target genes and no others (not shown). Gene A is expressed in zone 4 of the epithelium according to the nomenclature of Sullivan et al. [32] (Figure 4a). The expression pattern of gene D does not correspond to any of the four 'classical' olfactory epithelial zones [14,15,32]: positive cells are found in regions of endoturbinates II and III and ectoturbinate 3, resembling the expression pattern seen previously for the OR37 subfamily and ORZ6 olfactory receptors [33,34] (Figure 4b). Counting the total number of positive cells in alternate sections across the entire epithelium, we find that gene A is expressed in 2,905 cells, about 12 times more cells than gene D, which is expressed in a total of 249 cells. 
This 12-fold difference in numbers of expressing cells does not account for the almost 300-fold difference in RNA levels observed by real-time PCR, implying that the transcript level per expressing cell for gene A is about 25 times higher than transcript level in each expressing cell for gene D. We note that hybridization intensities per positive neuron appear stronger for gene A than gene D after comparable exposure times, in accordance with the idea that transcript levels are higher per cell. Thus, we suggest that expression in more cells and in higher levels per cell together account for the almost 300-fold higher olfactory epithelial RNA levels of gene A relative to gene D (Figure 3).

Most olfactory receptor genes have several transcriptional isoforms

Our cDNA collection reveals that at least two thirds of the olfactory receptors sampled show alternative splicing of their 5' untranslated exons. Using a custom script to process sim4 alignments of cDNA and genomic sequences, we find two to eight different splice forms for 85 (45%) of the 191 genes for which we have had some opportunity to observe alternate splicing (that is, a minimum of two cDNAs, at least one of which is spliced), and 55 (67%) of the 82 genes for which we have four or more cDNAs (and thus a higher chance of observing any alternate splicing) (Figure 5). These alternative splice events are almost all restricted to the 5' UTR and include exon skipping and alternate splice-donor and -acceptor use.

At least half of the olfactory receptors represented in our cDNA collection utilize more than one polyadenylation site, resulting in alternative 3' UTR isoforms. We have crudely estimated 3' UTR size for 1,169 cDNA clones by combining approximate insert size information with 5' sequence data. More than one 3' UTR isoform is predicted for 43 of the 77 (56%) genes for which there are at least four cDNAs with 3' UTR size information.
We confirmed the alternative polyadenylation isoforms of four out of five selected genes by sequencing the 3' end of 14 cDNA clones. These 14 sequences also revealed one cDNA where the poly(A) tail was added 27 bp before the stop codon, and another where an intron was spliced out of the 3' UTR, contrary to the conventional stereotype of olfactory receptor gene structure.

A subset of olfactory receptors shows unusual splicing

We identified 62 cDNAs (5% of all olfactory receptor clones) from 38 intact olfactory receptors and one olfactory receptor pseudogene where a splice site within the protein-coding region is used. For two genes (top two cDNAs, Figure 6), the predicted protein appears to be an intact olfactory receptor with three or ten amino acids, including the initiating methionine, contributed by an upstream exon. A similar gene structure was described previously for a human olfactory receptor [25]. One of these two mouse genes has no start codon in its otherwise intact main coding exon. The unusual splicing thus rescues what would otherwise be a dysfunctional gene. In most cases (60 out of 62 cDNAs), the unusual transcript appears to be an aberrant splice form - the transcript would probably not encode a functional protein because the splice introduces a frameshift or removes conserved functional residues (Figure 6). For two clones (bottom two cDNAs, Figure 6), exon order in the cDNA clone is inconsistent with the corresponding genomic sequence. It is difficult to imagine what kind of cloning artefact resulted in these severely scrambled cDNAs: we suggest that they derive from real but rare transcripts. However, their low frequency in our cDNA collection suggests that splicing contrary to genomic organization does not contribute significantly to the olfactory receptor transcript repertoire.
For 21 of the 34 genes for which unusually spliced cDNAs were found, we also observe an alternative ('normal') isoform that does not use splice sites within the coding region. (For the remaining 13 of the 34 genes showing odd splicing, we have identified only one cDNA so have not determined whether normal isoforms are present.)

We were intrigued both by previous reports of splicing of human olfactory receptors near the major histocompatibility complex (MHC) cluster, where several genes splice over long distances to a common upstream exon [26,27], and by the idea that olfactory receptor transcriptional control could be achieved by DNA recombination mechanisms, perhaps with the result that transcripts would contain some sequence from another locus. We therefore verified that the entire sequence of each olfactory receptor EST matches the corresponding gene's genomic 'territory' (defined for this purpose as from 1 kb after the preceding gene to 1 kb after the gene's stop codon). We found no cDNAs where introns encompassed other olfactory receptor genes, as reported for olfactory receptors in the human MHC region [26,27]. Six cDNAs do extend further than a single gene's 'territory' and appear not to be artifacts of the sequencing or analysis process. In each of these cases, the clones use splice sites within the 3' UTR and thus extend further than the arbitrary 1 kb downstream of the stop codon. Five of these six cDNAs also use splice-donor sites within the coding region and encode disrupted olfactory receptors (Figure 6). In the sixth cDNA, a 2.6-kb intron is spliced out of the 3' UTR, leaving the coding region intact.

If olfactory receptor transcriptional control is achieved by DNA recombination, the beginning of each transcript might derive from a donated promoter region, with the rest of the transcript coming from the native ORF-containing locus.
In order to examine the recombination hypothesis, we analyzed 115 cDNA clones for which sim4 failed to align 20 bp or more to the corresponding genomic locus. In most cases, the missing sequence was explained by gaps in the genomic sequence or by matches that fell below our percent identity-based cutoff for reporting matches. For three cDNAs (from three different olfactory receptors), we found that the missing piece of sequence matched elsewhere in the genome. Comparison with the public mouse genome assembly confirmed the distant matches. With such a small number of cDNAs exhibiting a possible sign of DNA recombination (a sign that could also be interpreted as chimeric cDNA clones), we conclude that such rearrangement is unlikely to occur. However, the possibility remains that DNA recombination is responsible for olfactory receptor transcriptional regulation, with the donated region contributing only promoter sequences but no part of the transcript.

Both unclustered olfactory receptors and olfactory receptor pseudogenes can be expressed

We were interested in whether olfactory receptors need to be part of a cluster in the genome in order to be transcribed, or if the clustered genomic organization of olfactory receptors is simply a consequence of the fact that local duplication is the major mechanism for expanding the gene family [1]. 'Singleton' olfactory receptors (defined as full-length olfactory receptors without another olfactory receptor within 0.5 Mb) are more often pseudogenes than are olfactory receptors in clusters (8 out of 16 versus 271 out of 1,358; χ2 = 8.8, P < 0.005). Of the eight intact singleton olfactory receptors, two have matching cDNAs in our collection, a similar proportion as found for olfactory receptors in clusters, showing that clustering is not an absolute requirement for olfactory receptor expression.
However, it is possible these two expressed singleton genes are part of 'extended' olfactory receptor clusters - their nearest olfactory receptor neighbors are 1.7 Mb and 2.6 Mb away, respectively. We also find that some olfactory receptor pseudogenes are expressed, albeit with a lower probability than intact olfactory receptors. Considering the 1,392 olfactory receptor gene sequences for which reliable full-length data are available, 15 out of 285 (5%) apparent pseudogenes are represented in our cDNA collection, compared to 393 out of 1,107 (36%) intact olfactory receptors. However, three of these 15 'expressed pseudogenes' are intact genes in the public mouse genome sequence. The defects in Celera's version of these genes may be due to sequencing errors or true polymorphism. Publicly available mouse sequence confirms that 11 of the 12 remaining expressed pseudogenes are indeed pseudogenes. No public sequence matches the 12th 'expressed pseudogene' with 99% identity or more.
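
The comparison of pseudogene frequency between singleton and clustered olfactory receptors reported in this section (8 out of 16 versus 271 out of 1,358) can be reproduced with a standard 2 x 2 chi-square test. The sketch below assumes a Pearson chi-square without continuity correction and one degree of freedom; the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    cells = [(a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2)]
    # Sum (observed - expected)^2 / expected over the four cells.
    return sum((obs - r * col / n) ** 2 / (r * col / n) for obs, r, col in cells)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: singleton, clustered. Columns: pseudogene, intact.
chi2 = chi_square_2x2(8, 16 - 8, 271, 1358 - 271)
p = p_value_df1(chi2)

print(f"chi-square = {chi2:.1f}, P = {p:.4f}")
```

With only 16 singleton genes the expected count of singleton pseudogenes is small, so an exact test could be considered as a more conservative alternative.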