Materials and methods
Identification of olfactory receptor cDNAs
An adult mouse cDNA library made from the olfactory epithelium of a single animal was provided by Leslie Vosshall (Rockefeller University, New York, NY, USA), and an embryonic library (made from the olfactory epithelia of several E16.5-E18.5 embryos) was provided by Tyler Cutforth (Columbia University, New York, NY, USA). Both libraries were oligo-dT primed and directionally cloned into the lambdaZAP-XR vector (Stratagene, La Jolla, CA, USA). The adult library has a complexity of 6.5 × 10^6 primary clones, and the embryonic library has a complexity of 1.65 × 10^6. Libraries were amplified to give titers of 5 × 10^9 pfu/ml (adult) or 2 × 10^10 pfu/ml (embryonic). Hybridization probes were made by degenerate PCR of mouse genomic DNA, in a fashion similar to that described previously [1], with primer pairs and annealing temperatures given in Table 2. Low-stringency hybridization conditions were as described [1]. Clonally pure plaques were obtained through secondary screens using the same probe as the corresponding primary screen. PCR with vector primers (M13F/R) was performed to prepare sequencing templates. cDNA size estimates were obtained by agarose gel electrophoresis, and inserts were sequenced from the 5' end using the M13R primer and big-dye terminator chemistry according to ABI's protocols (Applied Biosystems, Foster City, CA, USA). To obtain 3' sequence, selected phage clones were converted to plasmid stocks following a scaled-down version of Stratagene's in vivo excision protocol. Plasmid DNA gave better 3'-end sequence than PCR products, which often suffered from polymerase stuttering through the poly(A) tract. cDNA sequences and associated information are available through dbEST (Genbank accessions CB172832-CB174569) and our olfactory receptor database [46]. The updated olfactory receptor gene catalog is available through Genbank (accessions AY317244-AY318733).
Throughout the manuscript, genes are referred to by their Genbank accession numbers.
Sequence analysis
cDNA sequences were base-called and quality-trimmed using phred (trim_cutoff = 0.05) [47], and vector sequences were removed using cross_match [48]. Any sequences shorter than 50 bp after trimming were discarded. 3' UTR lengths were estimated where possible by combining approximate insert sizes determined by PCR with 5' sequence data (if the 5' sequence did not extend into the coding region, we could not estimate 3' UTR size). We counted cDNAs from a given gene as showing alternative polyadenylation site usage if 3' UTR length estimates varied by at least 400 bp; smaller variation could be real but may not be distinguishable from error in our size estimates.
To assign cDNAs to their corresponding olfactory receptor genes, we first defined a genomic 'territory' for each gene, with the following attributes: strand, start position (100 kb upstream of the start codon or 1 kb after the previous gene upstream on the same strand, whichever is closer) and end position (1 kb downstream of the stop codon). Trimmed sequences were compared with genomic sequences using sim4 [30] (settings P = 1 to remove poly(A) tails and N = 1 to perform an intensive search for small exons). The sim4 algorithm uses splice-site consensus sequences to refine alignments. Only matches of 96% or greater nucleotide identity were considered. RepeatMasked sequences [49] were also compared to genomic sequences; cDNA:genomic sequence pairings not found in both masked and unmasked alignments were rejected. Coordinates from the unmasked alignment were used for further analysis. Any cDNA sequence matching entirely within a territory was assigned to that gene. If a cDNA matched more than one gene territory, the best match was chosen (that is, the one with the highest 'score', where score is the sum over all exons of exon length multiplied by percent identity).
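The territory and scoring rules above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the names `Territory`, `Exon`, `Alignment`, `plus_strand_territory` and `assign` are hypothetical, only plus-strand genes are handled, the alignment is assumed to have been parsed from sim4 output already, and the 96% cutoff is applied to a length-weighted mean identity, which is an assumption about how the threshold was evaluated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Territory:
    gene: str
    strand: str  # '+' or '-'
    start: int   # genomic coordinate, inclusive
    end: int     # genomic coordinate, inclusive


@dataclass
class Exon:
    start: int       # genomic coordinate, inclusive
    end: int         # genomic coordinate, inclusive
    identity: float  # percent identity, 0-100


@dataclass
class Alignment:
    strand: str
    exons: List[Exon]


def plus_strand_territory(gene: str, start_codon: int, stop_codon: int,
                          prev_gene_end: Optional[int] = None) -> Territory:
    """Territory for a plus-strand gene (minus strand would be mirrored)."""
    start = start_codon - 100_000  # 100 kb upstream of the start codon
    if prev_gene_end is not None:
        # 1 kb after the previous same-strand gene, whichever is closer
        start = max(start, prev_gene_end + 1_000)
    return Territory(gene, '+', start, stop_codon + 1_000)


def score(aln: Alignment) -> float:
    """Sum over exons of exon length times percent identity."""
    return sum((e.end - e.start + 1) * e.identity for e in aln.exons)


def weighted_identity(aln: Alignment) -> float:
    """Length-weighted mean percent identity (an assumed definition)."""
    total_len = sum(e.end - e.start + 1 for e in aln.exons)
    return score(aln) / total_len


def within(aln: Alignment, t: Territory) -> bool:
    """True if every exon lies inside the territory on the same strand."""
    return aln.strand == t.strand and all(
        t.start <= e.start and e.end <= t.end for e in aln.exons)


def assign(candidates: List[Tuple[Territory, Alignment]]) -> Optional[str]:
    """Return the gene whose territory holds the best-scoring alignment."""
    best_gene, best_score = None, -1.0
    for territory, aln in candidates:
        if weighted_identity(aln) < 96.0 or not within(aln, territory):
            continue
        s = score(aln)
        if s > best_score:
            best_gene, best_score = territory.gene, s
    return best_gene
```

In this sketch a cDNA whose exons extend beyond every territory returns `None`, which corresponds to the cases that would need manual inspection.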
We found 27 cDNAs that spanned a larger genomic range than one gene territory and flagged them for more careful analysis. Of these, six cDNAs showed unusual splicing within the 3' UTR, but the remaining 'territory violators' were found to be artifacts of the analysis process, which fell into three types: cDNAs where the insert appeared to be cloned in the reverse orientation (six cDNAs); sequences from recently duplicated gene pairs, where sim4 assigned coding region and upstream exons to different members of the pair, although exons could equally well have been aligned closer to one another (six cDNAs); and artifacts due to use of sim4's N = 1 parameter (nine cDNAs). This parameter instructs the program to make extra effort to match small upstream exons, allowing a greater total length of EST sequence to be matched. Occasionally, however, the N = 1 parameter caused the program to assign very small sequences (1-4 bp) to distant upstream exons, when they probably match nearer to the corresponding coding sequence.
The expected distribution shown in Figure 2 was calculated using the equation $P(x) = e^{-\mu}\mu^{x}/x!$, where P(x) is the Poisson probability of observing x cDNAs per gene, and μ is the mean number of cDNAs observed per gene (μ = 1,176/983: 1,176 cDNAs matching olfactory receptor genes in our dataset and 983 intact class II olfactory receptors).
In our analysis of expressed pseudogenes, we ignored two olfactory receptor pseudogenes found very near the ends of genomic sequences and thus likely to be error-prone.
A protein sequence alignment of intact mouse olfactory receptors was generated using CLUSTALW [50], edited by hand, and used to produce the phylogenetic tree shown in Figure 1 using PAUP's neighbor-joining algorithm (version 4.0b6, Sinauer Associates, Sunderland, MA, USA). The tree was colored using a custom script.
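As an illustration of the Poisson calculation used for the expected distribution in Figure 2, the following Python sketch evaluates P(x) with μ = 1,176/983 (about 1.2). The function names are hypothetical, and scaling P(x) by the number of genes to obtain an expected gene count per bin is an assumption about how the figure was drawn, not something stated in the text.

```python
import math

N_CDNAS = 1176   # cDNAs matching olfactory receptor genes
N_GENES = 983    # intact class II olfactory receptors
MU = N_CDNAS / N_GENES  # mean number of cDNAs per gene


def poisson_pmf(x: int, mu: float = MU) -> float:
    """Poisson probability of observing x cDNAs for one gene."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


def expected_gene_counts(max_x: int, n_genes: int = N_GENES,
                         mu: float = MU) -> list:
    """Expected number of genes with 0..max_x cDNAs (assumed scaling)."""
    return [n_genes * poisson_pmf(x, mu) for x in range(max_x + 1)]


if __name__ == "__main__":
    for x, n in enumerate(expected_gene_counts(8)):
        print(f"{x} cDNAs: {n:.1f} genes expected")
```

Comparing these expected counts with the observed number of genes at each cDNA count shows whether cDNAs are spread across genes more unevenly than random sampling would predict.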
Information content (the measure of sequence conservation shown in Figure 6) was calculated for each position in the alignment using alpro [51].
To determine the number of transcriptional isoforms for each gene, we examined the sim4 output for every matching cDNA in decreasing order of number of exons. The first cDNA was counted as one splice form, and for each subsequent cDNA, we determined whether its exon structure was mutually exclusive with the isoforms already counted. We were conservative in our definition of mutually exclusive, and thus our count represents the minimum number of isoforms represented in the cDNA collection.
RT-PCR
The olfactory epithelia were dissected from three adult female C57BL/6 mice, including tissues attached to the skull and septum. RNA was isolated using the Qiagen RNeasy midi kit (Qiagen, Valencia, CA, USA), including a DNase treatment step. First-strand cDNA was produced from 2.5 μg of RNA in a volume of 50 μl using random hexamers and Invitrogen's Superscript II reverse transcriptase (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's recommendations. One twenty-fifth of the resulting cDNA was used as template in subsequent PCR reactions. PCR amplification biased towards class I olfactory receptors was performed using degenerate primers P26 [17] and classI_R1 (5'-GGRTTIADIRYIGGNGG-3') with an annealing temperature of 44°C. The product was cloned (TA cloning kit, Invitrogen), and individual clones were sequenced. Specific PCR primers used to confirm expression of individual olfactory receptor genes are given in Additional data file 1. Each PCR product was sequenced to confirm that the expected gene and no others had been amplified. Control reactions on a template made by omitting reverse transcriptase gave no product, indicating that the RNA preparation was uncontaminated by genomic DNA.
Relative transcript levels were estimated using real-time PCR according to Applied Biosystems' protocols, with magnesium concentration, primer pair and fluorescent probe given in Additional data file 2. The increase in fluorescence during thermocycling was measured on an ABI PRISM 7900HT. Standard curves were constructed for each primer pair using triplicate samples of mouse genomic DNA at nine known concentrations (range 0.01-100 ng, or about 3-30,000 copies of the haploid genome). The relative expression level of each gene was determined by comparing the mean Ct (the cycle at which fluorescence reaches an arbitrary threshold value) obtained with six replicate samples of reverse-transcribed RNA to the standard curve for the corresponding primers. Relative RNA levels of a housekeeping gene, ribosomal S16, were measured as previously described [52]. Control reactions on template prepared by omitting reverse transcriptase amplified at a relative level of 0.03 ± 0.01 ng or less in each case. Expression measurements of the seven genes were normalized for each mouse so that S16 levels were equal to 1 (arbitrary units).
In situ hybridization
Coronal sections were cut from the olfactory epithelia of an adult mouse (Figure 4) and a young (P6) C57BL/6 mouse. RNA in situ hybridization was carried out as described previously [15,53] with digoxigenin-labeled antisense riboprobes specific for the 3' UTRs of genes AY318555 (0.5 kb) and AY317365 (0.5 kb). Riboprobe sequences were generated by PCR using the primer pair 5'-TCTTCCAAACCTGGACCCCCC-3' and 5'-ATCTCTCCAGCACCTTACTTG-3' for AY318555 and the primer pair 5'-TAAGATGTAAGTGATAATTTAGATTACAGG-3' and 5'-TTTCTGCCTCAGCTATGACAG-3' for AY317365. Hybridization was carried out in 50% formamide at 65°C, and slides were washed at high stringency (65°C, 0.2 × SSC). Each probe hybridizes to only one band on a Southern blot, indicating that each probe detects only one olfactory receptor gene.
BLAST analyses show that the AY318555 probe is unique in Celera's mouse genome assembly (Release 13), and that the AY317365 probe is similar to only one other genomic region. This potential cross-hybridizing region is over 10 Mb from the nearest olfactory receptor coding region and is thus highly unlikely to be included in any olfactory receptor transcript. Low-power images are composed of three overlapping micrographs (40×) assembled in Adobe Photoshop 7.0.
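For the real-time PCR quantification described in the RT-PCR section, the standard-curve step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code used here: the function names are hypothetical, the curve is assumed to be an ordinary least-squares line of Ct against log10 of the genomic DNA amount, and S16 is assumed to be quantified against its own curve in the same way before normalization.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def fit_standard_curve(quantities_ng: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(ng) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    x_bar, y_bar = mean(xs), mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Genomic-DNA-equivalent amount (ng) read off the standard curve."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(gene_cts: Sequence[float],
                        gene_curve: Tuple[float, float],
                        s16_cts: Sequence[float],
                        s16_curve: Tuple[float, float]) -> float:
    """Expression normalized so that the S16 level equals 1 (assumed)."""
    gene_q = quantity_from_ct(mean(gene_cts), *gene_curve)  # mean Ct of replicates
    s16_q = quantity_from_ct(mean(s16_cts), *s16_curve)
    return gene_q / s16_q
```

Fitting against log10 of the template amount reflects the roughly exponential nature of PCR amplification, in which each tenfold dilution shifts Ct by an approximately constant number of cycles.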