Interpreting the Posterior Distribution of the Risk

Mapping the posterior mean relative risk as discussed previously does not make full use of the output of the Bayesian analysis, which provides, for each area, samples from the whole posterior distribution of the relative risk. Mapping the probability that a relative risk exceeds a specified threshold of interest has been proposed by several authors [e.g., Clayton and Bernardinelli (1992)]. We carry this further and investigate the performance of decision rules for classifying an area Ai as having an increased risk based on how much of the posterior distribution of θi exceeds a reference threshold. Figure 4 presents an example of the posterior distribution of the relative risk for such an area; the shaded proportion corresponds to the posterior probability that θ > 1.

To be precise, to classify any area as having an elevated risk, we define the decision rule D(c, R0), which depends on a cutoff probability c and a reference threshold R0, such that area Ai is classified as having an elevated risk under D(c, R0) if and only if Prob(θi > R0) > c.

The appropriate rules to investigate depend on the shape of the posterior distribution of θi for the elevated areas. We first discuss rules adapted to the autoregressive BYM and L1-BYM models. For these two models we have seen that, in general, the mean of the posterior distribution of θi in the raised-risk areas is greater than 1 but rarely above 1.5 in many of the scenarios investigated. Thus, it seems sensible to take R0 = 1 as a reference threshold. We would also expect the bulk of the posterior distribution to be shifted above 1 for these areas, suggesting cutoff probabilities well above 0.5. In the first instance, we choose c = 0.8. Thus, for the BYM and L1-BYM models, we report results corresponding to the decision rule D(0.8, 1).
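The rule D(c, R0) is straightforward to apply to MCMC output: estimate Prob(θi > R0) by the fraction of posterior draws of θi exceeding R0, and flag the area when that fraction exceeds c. The following sketch illustrates this under assumed inputs; the function name, the array layout, and the simulated "posterior draws" are ours for illustration, not output from the models discussed here.

```python
import numpy as np

def classify_elevated(theta_samples, R0=1.0, c=0.8):
    """Apply the decision rule D(c, R0): flag area i as having elevated
    risk when the estimated posterior probability Prob(theta_i > R0)
    exceeds the cutoff c.

    theta_samples: (n_areas, n_draws) array of posterior draws of the
    relative risk theta_i for each area (hypothetical input layout).
    """
    exceedance = (theta_samples > R0).mean(axis=1)  # Prob(theta_i > R0)
    return exceedance > c

# Toy illustration with simulated draws (not real model output):
rng = np.random.default_rng(0)
draws = np.vstack([
    rng.lognormal(mean=0.0, sigma=0.15, size=2000),   # background area
    rng.lognormal(mean=0.45, sigma=0.15, size=2000),  # raised-risk area
])
print(classify_elevated(draws, R0=1.0, c=0.8))  # rule D(0.8, 1), as for BYM
```

For the background area roughly half the draws fall above 1, well short of the 0.8 cutoff, while the raised-risk area has nearly all its posterior mass above 1 and is flagged.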
See Appendix B for a detailed justification of this choice of c and for the performance of different decision rules.

In contrast, we have seen that the mean of the posterior distribution of θi for raised-risk areas under the MIX model is closer to the true value in many scenarios, and there is clear indication that the upper tail of this distribution can be well above 1. Furthermore, the spread of this distribution is smaller than the corresponding one for the BYM or L1-BYM models, as noted by Green and Richardson (2002). The choice of threshold is thus more critical for this model, making it harder to find an appropriate decision rule. After some exploratory analyses of the simple clusters in Simu 1 and Simu 2, we found that a suitable decision rule for the MIX model in these two scenarios is to choose R0 = 1.5. For such a high threshold, one would expect it to be enough for a small fraction (e.g., 5 or 10%) of the posterior distribution of θi to lie above 1.5 to indicate that an area has elevated risk. Thus, for the MIX model we report results corresponding to the decision rule D(0.05, 1.5).

Two types of errors are associated with any decision rule: (a) a false-positive result, that is, declaring an area to have elevated risk when in fact its underlying true rate equals the background level (traditionally referred to as a type I error or lack of specificity); and (b) a false-negative result, that is, declaring an area to be in the background when in fact its underlying rate is elevated (also referred to as a type II error or lack of sensitivity). In epidemiology, performance is discussed either by reporting these error rates or by reporting their complementary quantities, which measure the success rates of the decision rule.
The two goals of disease mapping can be summarized as follows: not to overinterpret excesses arising by chance, that is, to minimize the false-positive rate, and to detect patterns of true heterogeneity, that is, to maximize the sensitivity. We thus choose to report these two easily interpretable quantities. To be precise, for any decision rule D(c, R0), we compute (a) the false-positive rate (or 1 – specificity), that is, the proportion of background areas falsely declared elevated by the decision rule D(c, R0); and (b) the sensitivity (or 1 – false-negative rate), that is, the proportion of areas generated with elevated rates correctly declared elevated by the decision rule D(c, R0).

There is clearly a compromise between these two goals: a stricter rule (i.e., one with a higher value of c or R0 or both) reduces the false-positive rate but also decreases the sensitivity and thus increases the false-negative rate. To judge the performance of any decision rule, one therefore has to consider both types of errors, not necessarily equally weighted. See Appendix B for an illustration of the implication of different weightings on the overall performance of the decision rule.

Table 4 summarizes the probabilities of false-positive results for the three models. For BYM and L1-BYM, the probabilities stay below 10% with no discernible pattern for Simu 1 and Simu 2. The error rates are clearly smaller, around 3%, for Simu 3. In this scenario, the background relative risk is shifted below 1, so a decision rule with R0 = 1 is, in effect, more stringent than in Simu 1 and Simu 2, where the background relative risks are close to 1. For the MIX model, the false-positive rates are quite low for Simu 1 and Simu 2 and stay mostly below 3%. However, as shown in the last line of Table 4, these rates are greatly increased for the Simu 3 scenario, indicating that the decision rule D(0.05, 1.5) is no longer appropriate in this heterogeneous context.
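In a simulation study, both quantities in (a) and (b) reduce to simple proportions once the rule's declarations are compared with the known simulated status of each area. A minimal sketch, with hypothetical input names of our own choosing:

```python
import numpy as np

def error_rates(declared, truly_elevated):
    """False-positive rate and sensitivity of a decision rule's output.

    declared:       boolean array, areas flagged elevated by D(c, R0)
    truly_elevated: boolean array, areas simulated with a raised rate
    (illustrative names; both inputs come from the simulation design)
    """
    background = ~truly_elevated
    false_positive_rate = declared[background].mean()   # 1 - specificity
    sensitivity = declared[truly_elevated].mean()       # 1 - false-neg. rate
    return false_positive_rate, sensitivity

# Toy example: 3 background areas, 2 truly elevated areas.
truth = np.array([False, False, False, True, True])
flags = np.array([False, True, False, True, False])
fpr, sens = error_rates(flags, truth)
print(fpr, sens)  # FPR = 1/3 (one background area flagged), sensitivity = 0.5
```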
The heterogeneity creates a lot of uncertainty, with some background areas being grouped with nearby high-risk areas; consequently, the rule D(0.05, 1.5) is not stringent (specific) enough. We therefore investigated a series of rules D(c, 1.5) for c = 0.1–0.4 for the MIX model in the Simu 3 scenario. As c increases, the false-positive rate decreases; for D(0.4, 1.5), it is, on average, around 3% and always below 7% (Table 5).

Concerning the detection of truly increased relative risks, that is, the sensitivity, we first discuss the results for the BYM and L1-BYM models. As expected from the posterior means shown in Tables 1 and 2, the ability to detect truly increased risk areas is limited when the increase is only of the order of 1.5. If one takes as a guideline the cases where the detection rate of true positives is 50% or more, Tables 6 and 7 show that this sensitivity is reached for an expected count of around 50 in the case of a single isolated area and around 20 for the 1% cluster scenario. Thus, for rare diseases and small areas, there is little chance of detecting increased risks of around 1.5 while adequately controlling the false-positive rate. True relative risks of 2 are detected with at least 75% probability when expected counts are between 10 and 20 per area, depending on the spatial structure of the risk surface, whereas true relative risks of 3 are detected almost certainly when expected counts per area are 5 or more. There is no clear pattern of difference between the results for BYM and L1-BYM; overall, the sensitivity is similar.

For Simu 3 we see that the sensitivity is lower than for the other simulation scenarios with equivalent expected counts (as were the false-positive rates in Table 4), in line with the true relative risks being closer to 1 than in Simu 1 and Simu 2. Hence, the decision rule D(0.8, 1) is more specific but less sensitive in this scenario.
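The effect of tightening c in the family D(c, 1.5) can be sketched directly from posterior draws: because the per-area exceedance probability is fixed, raising c can only shrink the set of flagged areas, so the false-positive rate is non-increasing in c. The toy setup below (heterogeneous background loosely in the spirit of Simu 3; all numbers and the simulated "posterior draws" are our own illustration, not the paper's output) makes the trade-off visible:

```python
import numpy as np

rng = np.random.default_rng(1)
# 90 background areas whose true log-relative-risks scatter around 0,
# plus 10 elevated areas with true relative risk 2 (illustrative values).
m_bg = rng.normal(0.0, 0.3, size=(90, 1))
theta = np.vstack([
    rng.lognormal(m_bg, 0.25, size=(90, 2000)),        # "posterior draws"
    rng.lognormal(np.log(2.0), 0.25, size=(10, 2000)),
])
truly_elevated = np.arange(100) >= 90

exceedance = (theta > 1.5).mean(axis=1)                # Prob(theta_i > 1.5)
for c in (0.1, 0.2, 0.3, 0.4):
    declared = exceedance > c                          # rule D(c, 1.5)
    print(f"c = {c:.1f}: FPR {declared[~truly_elevated].mean():.2f}, "
          f"sensitivity {declared[truly_elevated].mean():.2f}")
```

In this toy example the elevated areas keep most of their posterior mass above 1.5, so sensitivity stays high across the whole range of c while the false-positive rate falls as c increases.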
In situations with a large degree of heterogeneity akin to Simu 3, it might thus be advantageous to consider alternative rules, even if the false-positive rate is less well controlled. For example, in the case of a true relative risk θ = 1.65 and SF = 4, the use of rule D(0.7, 1) for the BYM model leads to a higher false-positive rate (6% compared with the 3% shown in Table 4). However, the corresponding gain in sensitivity is more than 10%, with the probability of detecting a true positive increasing to 82%, compared with 71% under the rule D(0.8, 1) (Table 5). Nevertheless, even with this relaxed and more sensitive rule, the chance of detecting a true relative risk as small as 1.3 is only around 50% if the SF is 4 (i.e., an average cluster with total expected count around 80). On the other hand, true relative risks of around 2 are detected with high probability as soon as the SF is 2 (which corresponds, on average, to a cluster with total expected count of 40).

The contrasting behavior of the MIX model is again apparent in Table 8 when one compares the results for the θ = 1.5 scenario with the other columns. For Simu 1 and Simu 2 the sensitivity is generally below that of the BYM model, especially when the true relative risk is 1.5; single clusters with θ = 1.5 are simply not detected. In the 1% cluster case, expected counts of at least 20 (10) are necessary to be over 95% certain of detecting a true relative risk of 2 (3) (Table 8). Note that the results in the last line of Table 8 should be discounted in view of the high probability of false-positive results for this scenario (Simu 3) under the D(0.05, 1.5) rule shown in Table 4. It is thus apparent that for the MIX model, it is hard to calibrate a decision rule that performs well across a variety of spatial patterns of elevated risk.
In Table 5 we summarize the results corresponding to the decision rule D(0.4, 1.5), which offers a reasonable compromise between keeping the false-positive rate below 7% and maintaining an acceptable detection rate of true clusters. With this rule, true relative risks of 1.65 with an SF of 2 (i.e., an average cluster with total expected count slightly under 40) or larger have more than a 50% chance of being detected, and true relative risks of around 2 are nearly always detected. However, this model fails to detect true relative risks as small as 1.3.