To assign cDNAs to their corresponding olfactory receptor genes, we first defined a genomic 'territory' for each gene, with the following attributes: strand, start position (100 kb upstream of the start codon or 1 kb after the previous gene upstream on the same strand, whichever is closer to the start codon) and end position (1 kb downstream of the stop codon). Trimmed sequences were compared with genomic sequences using sim4 [30] (settings P = 1 to remove polyA tails and N = 1 to perform an intensive search for small exons). The sim4 algorithm uses splice-site consensus sequences to refine alignments. Only matches of 96% or greater nucleotide identity were considered. RepeatMasked sequences [49] were also compared to genomic sequences; cDNA:genomic sequence pairings not found in both masked and unmasked alignments were rejected. Coordinates from the unmasked alignment were used for further analysis. Any cDNA sequence matching entirely within a territory was assigned to that gene. If a cDNA matched more than one gene territory, the best match was chosen (that is, the one with the highest 'score', where score is the sum over all exons of exon length multiplied by percent identity). We found 27 cDNAs that spanned a larger genomic range than one gene territory and flagged them for more careful analysis. Of these, six cDNAs showed unusual splicing within the 3' UTR, but the remaining 21 'territory violators' were found to be artifacts of the analysis process, which fell into three types: cDNAs in which the insert appeared to have been cloned in the reverse orientation (six cDNAs); sequences from recently duplicated gene pairs, where sim4 assigned the coding region and upstream exons to different members of the pair, although the exons could equally well have been aligned closer to one another (six cDNAs); and artifacts due to use of sim4's N = 1 parameter (nine cDNAs).
This parameter instructs the program to make extra effort to match small upstream exons, allowing a greater total length of EST sequence to be matched. Occasionally, however, it caused the program to assign very short sequences (1-4 bp) to distant upstream exons when they probably match nearer to the corresponding coding sequence.
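The territory definition and the score-based assignment described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The names `Gene`, `Exon`, `territories`, `score` and `assign` are hypothetical, as is the coordinate convention (1-based, inclusive). The sketch also assumes that '1 kb after the previous gene' means 1 kb past that gene's stop codon, that an alignment must lie on the same strand as the territory, and that the 96% identity filter and the masked/unmasked consistency check have already been applied.

```python
from dataclasses import dataclass

UPSTREAM_MAX = 100_000  # 100 kb upstream of the start codon
FLANK = 1_000           # 1 kb past a stop codon


@dataclass(frozen=True)
class Gene:
    name: str
    strand: int       # +1 or -1
    start_codon: int  # genomic coordinate of the start codon
    stop_codon: int   # genomic coordinate of the stop codon


@dataclass(frozen=True)
class Exon:
    start: int        # genomic start (inclusive)
    end: int          # genomic end (inclusive)
    identity: float   # percent identity of this exon's alignment


def territories(genes):
    """Return {gene name: (strand, low, high)} for every gene."""
    result = {}
    for strand in (+1, -1):
        # Order genes 5'->3' along their own strand.
        same = sorted((g for g in genes if g.strand == strand),
                      key=lambda g: strand * g.start_codon)
        prev = None
        for g in same:
            upstream = g.start_codon - strand * UPSTREAM_MAX
            downstream = g.stop_codon + strand * FLANK
            if prev is not None:
                # Assumption: the limit is 1 kb past the previous stop codon.
                limit = prev.stop_codon + strand * FLANK
                # Keep whichever boundary is closer to the start codon.
                if strand * limit > strand * upstream:
                    upstream = limit
            low = max(1, min(upstream, downstream))
            high = max(upstream, downstream)
            result[g.name] = (strand, low, high)
            prev = g
    return result


def score(exons):
    """Sum over exons of exon length multiplied by percent identity."""
    return sum((e.end - e.start + 1) * e.identity for e in exons)


def assign(alignments, terrs):
    """Choose a gene for one cDNA, or None if no territory contains it.

    alignments: list of (strand, [Exon, ...]) candidate alignments.
    """
    best_name, best_score = None, float("-inf")
    for strand, exons in alignments:
        low = min(e.start for e in exons)
        high = max(e.end for e in exons)
        for name, (t_strand, t_low, t_high) in terrs.items():
            # The alignment must lie entirely within the territory.
            if strand == t_strand and t_low <= low and high <= t_high:
                s = score(exons)
                if s > best_score:
                    best_name, best_score = name, s
    return best_name
```

In this sketch, an alignment that spans more than one territory is contained by none of them, so `assign` returns `None`; such cDNAs correspond to the 'territory violators' that were flagged for closer inspection.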