Materials and Methods The basic setup of disease-mapping studies is as follows: The number of cases of a particular disease Yi occurring in area Ai is recorded, where the set of areas {Ai},i = 1, 2,…,n represents a partition of the region under study. For each area Ai, the expected number of cases Ei is also computed using reference rates for the disease incidence (or mortality) and the sociodemographic strata (with respect to age, sex, and perhaps socioeconomic characteristics) where census data are available. The distribution of the counts Yi is typically assumed to come from a Poisson distribution, as the diseases usually considered in such studies are rare and this distribution gives a good approximation to the underlying binomial distribution that would hold for each risk stratum. The local variability of the counts is thus modeled as follows: The parameter of interest is θi, the relative risk that quantifies whether the area i has a higher (θi > 1) or lower (θi < 1) occurrence of cases than that expected from the reference rates. It is this parameter that we are trying to estimate to quantify the heterogeneity of the risk and to highlight unusual patterns of risks. Data Generation The spatial structure used throughout the simulations is that of the 532 wards in the county of Yorkshire, England. Wards are administrative areas in the United Kingdom, with a total population of approximately 5,000 on average. We base our expected counts Ei on those calculated by Jarup et al. 2002 for prostate cancer in males 45–64 years of age over the period from 1975 to 1991. We then simulate three spatial patterns of increased risks. For each pattern, we examine three magnitudes for the elevated risks. We also examine how the inference is changed if the expected counts are multiplicatively increased by a scale factor (SF) varying from 2 to 10. Three spatial patterns for areas of elevated risk were chosen. The choice of patterns was intended to span a spectrum ranging from a scenario with single isolated areas with elevated risks (the hardest test case for any smoothing method) to a scenario with a number of larger clusters of several contiguous areas with elevated risks (a situation with a substantial amount of heterogeneity). In all cases the elevated areas were selected in turn at random from the set of areas with the required expected counts. In the Simu 1 and Simu 3 cases, once an area was selected, a buffer of neighboring areas with background risk (excluded thereafter from the random selection) was placed around it to produce the required pattern of isolated high-risk clusters. The three generated patterns were defined as follows: Simu 1: five isolated single wards with expected counts ranging from 0.8 to 7.3 corresponding, respectively, to the 10th, 25th, 50th, 75th, and 90th percentiles of the distribution of the expected counts Simu 2: a group of contiguous wards representing 1% of the total expected counts. In effect, this chosen 1% cluster grouped four wards with fairly comparable expected counts ranging from 3.6 to 7.0, giving an average expected count per ward of 5.4 over the four wards Simu 3: a situation with high heterogeneity comprising 20 such 1% clusters that are nonoverlapping. Note that for Simu 3, the twenty 1% clusters each have a total expected count close to 17 but a large disparity in terms of numbers of constitute areas: 10 clusters had 2 or 3 areas, whereas 8 clusters had more than 8 areas, up to a maximum of 18 areas. Correspondingly, the expected counts in each of the wards in the clusters ranged from 0.3 for some wards in the 18-area cluster to 12 for the cluster with 2 areas. Simu 3 thus corresponds to a realistic situation of heterogeneity of risk where both small clusters with high expected counts, for example, typically a populated area, and large clusters each with small expected counts, for example, in rural areas, are present. This high degree of heterogeneity has to be considered when interpreting the results for the Simu 3 case where an average over all the 20 clusters is presented. Note also that contrary to the Simu 2 case, about half the background areas in Simu 3 have a neighbor that belongs to one of the 20 clusters. In each case, apart from the elevated risk areas described above, all other areas are called background areas. For each spatial pattern in Simu 1 and Simu 2, counts Yi were generated as follows: Counts in all background areas were generated from a Poisson distribution with mean Ei. For all the other areas, an elevated relative risk with magnitude θi > 1 was used and counts were simulated as Poisson variables with mean θiEi . The simulation was repeated for three values of θi (1.5, 2, and 3) and for different SFs that multiply the expected counts Ei for all areas. Thus, results reported, for example, for an area with E = 1.92, θ = 2, and SF = 4, correspond to counts generated from a Poisson with mean 15.36 (2 × 4 × 1.92). For Simu 3 a slightly different procedure for generating the cases was used to ensure that Σ Yi = Σ Ei (Appendix A). Note that for Simu 1 and Simu 2, the simulation procedure meant only that ΣYi ≈ ΣEi. This corresponds, for instance, to an epidemiologic situation where expected counts Ei are calculated based on an external reference rate. However, Simu 3 uses internal reference rates because otherwise ΣYi would have been much larger than Ei , which could distort the overall risk estimates. The multinomial procedure used in Simu 3 and detailed in Appendix A implies that, in effect, the multiplicative contrast between areas of elevated risk and background areas is still 3, 2, and 1.5, respectively, but the corresponding relative risks in each area (denoted * θ i ) relative to the internal (i.e., study region average) reference rates are now 2.1, 1.65, and 1.35 for the elevated areas and 0.7, 0.82, and 0.9 for the background areas. To allow for sampling variability, each simulation case was replicated 100 times. The results presented are averaged over these 100 replications. A total of 36 simulation scenarios were investigated, corresponding to three spatial patterns (Simu 1, 2, and 3) × three different magnitudes of elevated risk (θ = 3, 2, and 1.5) × 4 SFs for the expected counts Ei (SF = 1, 2, 4, and 10). Bayesian Disease-Mapping Models Bayesian disease-mapping models treat the relative risks {θi } as random variables and specify a distribution for them. This part of the model is crucial, as the distributional assumptions thus made allow borrowing of information across the areas. The distribution specified is referred to as the second hierarchical level of the model to distinguish it from the first-level distribution specified in equation 1 that pertains to the random sampling variability of the observed counts about their local mean. It is at this second level that the spatial dependence between the relative risks is introduced. This spatial dependence is represented by means of a prescribed neighborhood graph that defines the set of neighbors (denoted by ∂i) for each area i. The most commonly used parametric model at the second level of the hierarchy is the CAR model. This specifies the distribution of the log relative risks vi = log(θi ) by where σ2 is an unknown variance parameter, and &vmacr;i = Σj∈∂ivj/ni , where ni is the number of neighbors of area i. Thus, essentially the log relative risk in one area is influenced by the average log relative risk of its neighbors, with variability characterized by a conditional variance σ2/ni This CAR model makes a strong spatial assumption and has only one free parameter linked to the conditional variance σ 2. To increase flexibility, Besag et al. (1991) recommend modeling log (θi) as the sum of a CAR process and an unstructured exchangeable component δi ~ N(0,τ2), i = 1, …,n independently: This is the BYM model introduced by Besag et al. (1991) that we referred to earlier. We use this model as a benchmark, as its use in disease-mapping studies has been widespread since 1991. The Gaussian distribution used in the CAR specification above induces a high level of smoothness. In the same 1991 article, Besag et al. (1991) discussed an alternative specification using the heavier-tailed, double-exponential distribution rather than the Gaussian distribution in Equation 2. In effect, this is similar to performing a median-based local smoothing (or L1 norm) rather than a mean-based smoothing, thus allowing more abrupt changes in the geographical pattern of risk. We will refer to this model as L1-BYM. With any such parametric specification, the amount of smoothing performed (e.g., controlled by the parameters σ2 and τ2) is affected globally by all the areas and is not adaptive. Concerns that such parametric models could oversmooth have led several authors to develop semiparametric spatial models that replace the continuously varying spatial distribution for {θi} by discrete allocation or partition models. Such models allow discontinuities in the risk surface and make fewer distributional assumptions. Partition models that allow a variable number of clusters have been proposed by Denison and Holmes (2001) and Knorr-Held and Rasser (2000). In this article we investigate the performance of a related spatial mixture model recently proposed by Green and Richardson (2002) that we refer to as MIX. This model leads to good estimation of the relative risks compared with the BYM model for a variety of cases of discontinuities of the risk surface. The idea underlying the MIX model is to replace a continuous model for θi by a mixture model that uses a variable number of risk classes and a spatially correlated allocation model to distribute each area to a class. By averaging over a large number of possible configurations, the marginal distribution of the relative risk is nevertheless smooth. To be precise, it is assumed that θi = θZi , where Zi, i = 1, 2,…,n are allocation variables taking values in 1, 2,…,k and θj, j = 1,2,…,k are the values of the relative risks that characterize the k different components or risk classes. To have maximum flexibility, the number of components k of the mixture is treated as unknown. Given k, the allocations Zi follow a spatially correlated process, the Potts model, which has been used in image processing and other spatial applications and involves a positive interaction parameter ψ (similar to an autocorrelation parameter) that influences the degree of spatial dependence of the allocations. Specifically, the allocation of an area to a risk component will be favored probabilistically by the number of neighbors currently attributed to that component scaled multiplicatively by ψ. In this way the prior knowledge that areas close by tend to have similar risks can be reflected through the allocation structure. The interaction parameter ψ is treated as unknown and jointly estimated with the number of components and their associated risk. The MIX model can adapt to various patterns of risk and model discontinuities by creating a new risk class if there is sufficient information in the data to warrant this. Further details on the specification of the model are given in Green and Richardson (2002). Thus, in the comparison described later, we have implemented one reference model BYM and two alternative models, the parametric L1-BYM and the semiparametric MIX model. Implementation Bayesian inference is based on the joint posterior distribution of all parameters given the data. In our case this joint distribution is mathematically intractable and is simulated using the framework of Markov chain Monte Carlo techniques now commonly used in Bayesian analyses (Gilks et al. 1996). All parameters involved in the models described above, for example, the variances σ2 or τ2 or the interaction parameter ψ, are given prior distributions at a third level of the hierarchy. Implementation of the BYM and L1-BYM was carried out using the free software WinBUGS (Spiegelhalter et al. 2002). Implementation of the MIX model was carried out using a purpose-built Fortran code.