Discussion
The results of this study show that readers can improve their detection performance when they use CAD interactively for the interpretation of mass lesions. The beneficial effect of CAD can be attributed fully to improved interpretation, because traditional CAD prompts intended to prevent perceptual oversights were not shown. The effect is remarkable given that the readers in this study used the interactive system for the first time and had limited training. In a previous experiment using a similar observer study design and dataset, no significant improvement with traditional CAD prompting was found when readers had limited training [23]. This suggests that, for mass detection, interactive CAD may be more effective than traditional CAD. This is in accordance with studies suggesting that interpretation errors are more common than perception errors [10, 11].
The results also show that readers are able to exploit the predictive power of CAD to improve their decisions. This may come as a surprise because, owing to the large number of false positives, the performance of CAD for masses is often believed to be much lower than that of an experienced reader. However, a previous study showed that the performance of the CAD system was comparable to that of experienced readers when the analysis was restricted to locations identified by the radiologists [9]. This is what counts in this study, because CAD results were shown only for regions probed by the readers. Interestingly, malignancy ratings of CAD were also used previously in the large CADET II trial [1] conducted in the UK, where the size of the CAD marks represented the computed likelihood of cancer. The positive results of that trial could also be related to the use of CAD as decision support.
The potential gain of using CAD for decision making was also demonstrated in a previous study, in which CAD information was independently combined with reader scores [10]. The results of this study confirm that performance can be improved by independently combining reader scores with CAD (Table 2). On average, the improvement in performance was larger when readers used CAD themselves than when CAD was independently combined with their scores, but the difference was not significant. Interestingly, for one of the radiologists (number 8), detection performance decreased with interactive CAD, whereas it increased with independent combination. This may well be due to insufficient training: readers need to learn how to weight CAD information in their decisions.
Table 3 shows the average reading times per reader for the sessions with and without CAD. For the non-radiologists, the average reading time was slightly reduced when they used CAD. For the radiologists, the reading time increased by less than 3 s on average with CAD. It seems that interactive use of CAD does not cost much extra time, because the information is presented at the moment the reader asks for it.
In the experiments we used a threshold to adjust the average number of CAD regions per image that could be activated. On average, there were two false positives per normal image. In clinical practice, the operating point of prompting systems for masses in mammography is often set to a level near 0.5 false positives per image. We used more regions because we expected more false positives to be tolerable in the interactive system: many of them are never activated, and if they are activated they are perceived very differently from traditional prompts. The radiologists queried far fewer false-positive CAD regions than the non-radiologists, which may indicate that they are more confident in their reading.
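As an illustration of how an operating point such as two or 0.5 false positives per normal image can be set, the following minimal sketch picks a score threshold from the CAD region scores of a set of normal images. It is not the procedure used in this study, which is not described here. The function names `threshold_for_fp_rate` and `mean_fp_per_image`, the data layout (one list of region scores per normal image), and the example scores are assumptions made for illustration only.

```python
from typing import Sequence


def threshold_for_fp_rate(normal_scores: Sequence[Sequence[float]],
                          target_fp_per_image: float) -> float:
    """Pick a score threshold for a target false-positive rate.

    normal_scores holds, for each normal image, the CAD scores of its
    candidate regions (hypothetical layout). Regions scoring at or above
    the returned threshold count as available (false-positive) regions.
    Tied scores at the threshold can let slightly more regions through.
    """
    n_images = len(normal_scores)
    if n_images == 0:
        raise ValueError("at least one normal image is required")
    # All region scores on normal images, most suspicious first.
    ranked = sorted((s for image in normal_scores for s in image), reverse=True)
    n_allowed = int(target_fp_per_image * n_images)
    if n_allowed <= 0:
        return float("inf")   # no region may be shown
    if n_allowed >= len(ranked):
        return float("-inf")  # every region may be shown
    return ranked[n_allowed - 1]


def mean_fp_per_image(normal_scores: Sequence[Sequence[float]],
                      threshold: float) -> float:
    """Average number of regions per normal image scoring >= threshold."""
    kept = sum(1 for image in normal_scores for s in image if s >= threshold)
    return kept / len(normal_scores)


# Made-up scores for four normal images (nine candidate regions in total).
scores = [[0.9, 0.4, 0.1], [0.8, 0.7, 0.2], [0.6, 0.3], [0.5]]
interactive_threshold = threshold_for_fp_rate(scores, 2.0)  # interactive setting
prompting_threshold = threshold_for_fp_rate(scores, 0.5)    # typical prompting level
```

With these made-up scores, the lower threshold leaves more regions available for querying, which mirrors the choice described above of tolerating more false positives in the interactive setting than in a prompting system.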
Interactive CAD is intended to aid the reader in decision making and will not help to avoid perceptual oversights. The success of the interactive approach may be explained by assuming that perceptual oversights do not occur frequently. In our study this appeared to be the case. On average, only 5 (12.2%) of the true-positive CAD regions were not probed by the reader. Thus, in the reader study at most 12.2% of the cancers were overlooked, while none of them were reported in the original screening. The results also show that on average 274.2 (50.2%) of the false-positive CAD regions were not activated, which limits the number of false positives to which the readers are exposed. The system can easily be extended by displaying the most suspicious non-queried CAD regions as traditional prompts after the reading is completed.
In general, the response of the radiologists to the interactive CAD system was very positive, and they preferred it to conventional CAD prompting systems. An advantage of the proposed system is that obvious false positives of the CAD system are rarely shown, as the readers do not probe these regions. This may increase confidence in CAD.
In our study the reading conditions were less optimal than in screening practice, because a 4-megapixel color display was used instead of the two 5-megapixel grayscale monitors commonly used in mammography. This might have a negative effect on detection performance, especially for microcalcifications. As microcalcification cases were not included in our study, we do not believe that image quality influenced our study outcome. This is supported by a study by Kamitani et al. [24], in which no significant differences were found between observer performances for detecting breast cancer masses in soft-copy reading on 3- or 5-megapixel LCD monitors. Another limitation of our study is the absence of CC views in most cases.
In the Dutch screening program, two-view mammography is not always performed at subsequent screens. Obviously, the absence of additional CC views might affect the radiologists’ detection performance. However, the readers in our study are used to interpreting single-view mammography. We note that neither limitation affected the difference in detection performance described in this paper, because the conditions were similar in the sessions with CAD and the sessions without CAD.
Participants in this study were not reading under normal screening conditions. Their alertness, concentration, and decision thresholds may have been affected by the knowledge that this was a controlled laboratory experiment in which their decisions would be recorded, and that the balance between cancer and normal cases was artificial. Because their assessments of the mammographic cases in this retrospective observer study would not affect patient care, their decisions could differ from those in an actual clinical setting. This effect has been described, among others, by Gur et al. [25]. However, the reading conditions in the with-CAD and without-CAD sessions were similar, and therefore the observed effect on detection performance can be attributed solely to the use of the interactive CAD system. Because we performed LROC analysis, decision thresholds did not affect study results.
As in many other studies, the sample was heavily weighted towards cancer cases. Not doing so would make this form of research extremely expensive. The effect on the sensitivity and recall rates of radiologists using this interactive CAD system in real-life screening can only be determined by a large randomized controlled trial in which radiologists use the system routinely and for a substantial period [17]. Nevertheless, a laboratory study is generally a first step to demonstrate the usefulness of a CAD concept before a large trial is performed.
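The statement above that LROC analysis makes the results insensitive to decision thresholds can be illustrated with a minimal sketch of an empirical LROC curve. The curve is traced over all possible rating thresholds, so a reader who rates every case more cautiously, but ranks the cases in the same order, produces the same curve. The function name `lroc_points`, the input layout, and the example ratings are assumptions for illustration. The LROC software and figure of merit actually used in this study are not described in this section.

```python
from typing import List, Sequence, Tuple


def lroc_points(cancer_cases: Sequence[Tuple[float, bool]],
                normal_ratings: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical LROC operating points, from strictest to most lenient threshold.

    cancer_cases: (rating, correctly_localized) per cancer case (hypothetical layout).
    normal_ratings: highest suspicion rating per normal case.
    Each point is (false-positive fraction, fraction of cancer cases that are
    rated at or above the threshold AND correctly localized).
    """
    if not cancer_cases or not normal_ratings:
        raise ValueError("both cancer and normal cases are required")
    thresholds = sorted({r for r, _ in cancer_cases} | set(normal_ratings),
                        reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(r >= t for r in normal_ratings) / len(normal_ratings)
        tplf = sum(r >= t and located for r, located in cancer_cases) / len(cancer_cases)
        points.append((fpf, tplf))
    return points


# Made-up ratings: three cancer cases (one mislocalized) and four normal cases.
cancers = [(0.9, True), (0.7, False), (0.4, True)]
normals = [0.8, 0.3, 0.1, 0.1]
curve = lroc_points(cancers, normals)

# A more cautious reader: every rating halved, case ordering unchanged.
cautious_curve = lroc_points([(r / 2, loc) for r, loc in cancers],
                             [r / 2 for r in normals])
```

In this sketch `curve` and `cautious_curve` contain the same points, because only the ordering of the ratings matters. Note also that the curve ends below a correct-localization fraction of 1 whenever a cancer is marked at the wrong location, which is what distinguishes LROC from ordinary ROC analysis.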
The readers participating in this study had different backgrounds and levels of experience. We expect that, as readers gain more experience with the system, they will learn how to optimize their use of it. In addition, readers need to find out how to weight CAD information in their decisions, and we expect them to improve at this as they gain more understanding of the strengths and weaknesses of the CAD software.