Results

Sample

Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well matched, with a nonsignificant mean difference of 31.0 days in age between abnormal and normal cases (t = 1.47, df = 50, p = 0.15). Table 1 lists the types and sizes of cancers in the test set.

Identification of cases for further imaging

Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60–58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47–67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65–75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU.

Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases)

| Reader #a | True positives, M | True positives, M+A | True negatives, M | True negatives, M+A | False positives, M | False positives, M+A | False negatives, M | False negatives, M+A |
|---|---|---|---|---|---|---|---|---|
| 1 | 28 | 45 | 32 | 21 | 27 | 30 | 15 | 6 |
| 2 | 28 | 45 | 25 | 21 | 33 | 30 | 16 | 6 |
| 3 | 25 | 44 | 30 | 20 | 32 | 31 | 15 | 7 |
| 4 | 26 | 43 | 20 | 28 | 43 | 23 | 13 | 8 |
| 5 | 26 | 43 | 32 | 30 | 28 | 21 | 16 | 8 |
| 6 | 32 | 43 | 20 | 37 | 43 | 14 | 7 | 8 |
| 7 | 26 | 41 | 25 | 27 | 33 | 24 | 18 | 10 |
| 8 | 23 | 40 | 35 | 31 | 17 | 20 | 27 | 11 |
| 9 | 16 | 40 | 43 | 25 | 21 | 26 | 22 | 11 |
| 10 | 27 | 39 | 34 | 36 | 27 | 15 | 14 | 12 |
| 11 | 26 | 37 | 35 | 41 | 24 | 10 | 17 | 14 |
| 12 | 22 | 37 | 37 | 34 | 20 | 17 | 23 | 14 |
| Mean # of cases | 25.4 | 41.4 | 30.7 | 29.3 | 29.0 | 21.8 | 16.9 | 9.6 |
| % of 51 cases | 49.8% | 81.2% | 60.2% | 57.5% | 56.9% | 42.7% | 33.1% | 18.8% |
| Mean # of added cases | | 16.0 | | −1.4 | | −7.2 | | −7.3 |
| Mean % of 51 cases added | | 31.4% | | −2.7% | | −14.1% | | −14.3% |
| % improvement compared with M alone | | 63% | | −4% | | −25% | | −43% |

M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU)
a Reader # presented by best to worst performance based on sensitivity on M+A

The individual gain ranged from 11 to 24 additional cancer cases detected with AWBU. As a percentage of the cancers detected with mammography, the range in improvement was 42–150%.
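The summary figures for sensitivity can be recomputed from the true-positive columns of Table 2. The following is a minimal Python sketch of that arithmetic only; the variable names are illustrative assumptions, and it does not reproduce the F tests reported in the text.

```python
# True-positive counts per reader (readers 1-12), transcribed from Table 2.
tp_mammo = [28, 28, 25, 26, 26, 32, 26, 23, 16, 27, 26, 22]      # M
tp_combined = [45, 45, 44, 43, 43, 43, 41, 40, 40, 39, 37, 37]   # M+A
N_CANCER_CASES = 51  # positive cases in the test set


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mean_m = mean(tp_mammo)
mean_ma = mean(tp_combined)

# Mean sensitivity = mean true positives / number of cancer cases.
sensitivity_m = mean_m / N_CANCER_CASES
sensitivity_ma = mean_ma / N_CANCER_CASES

# Additional cancers found per reader once AWBU is added.
added = [ma - m for m, ma in zip(tp_mammo, tp_combined)]

# Relative improvement in the mean number of cancers identified.
relative_improvement = (mean_ma - mean_m) / mean_m

print(f"Mean TP: {mean_m:.1f} (M) vs. {mean_ma:.1f} (M+A)")
print(f"Sensitivity: {sensitivity_m:.1%} (M) vs. {sensitivity_ma:.1%} (M+A)")
print(f"Added cases per reader: min {min(added)}, max {max(added)}, mean {mean(added):.1f}")
print(f"Relative improvement: {relative_improvement:.0%}")
```

Run as written, the sketch reproduces the mean true-positive counts, the two sensitivity percentages, the mean of 16.0 added cases, and the 63% relative improvement shown in the summary rows of Table 2.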
Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best-performing mammography reader, the number of cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography. For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing his overall cancer detection rate near the average for the group.

Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. Adding AWBU increased the detection of these small cancers to 65%.

Table 3 Reader performance with 45 invasive cases

| | ≤1 cm, # | ≤1 cm, % | >1 to ≤2 cm, # | >1 to ≤2 cm, % | >2 cm, # | >2 cm, % | Total, # | Total, % |
|---|---|---|---|---|---|---|---|---|
| # of cancers | 17 | 100 | 22 | 100 | 6 | 100 | 45 | 100 |
| Mean cancers by mammography | 4.4 | 26 | 13.5 | 61 | 3.0 | 50 | 20.9 | 46 |
| Mean additional cancers by AWBU | 6.7 | 39 | 6.6 | 30 | 2.0 | 33 | 15.3 | 34 |
| Mean total cases detected | 11.1 | 65 | 20.1 | 91 | 5.0 | 83 | 36.2 | 80 |
| % improvement compared to mammography alone | | 151% | | 49% | | 67% | | 73% |

For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram; otherwise the diameter is that reported by pathological diagnosis.

Accuracy

The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader.
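To illustrate what a per-reader ROC area and its average across readers look like computationally, the following Python sketch uses the empirical (Mann-Whitney) estimate of the area under the ROC curve. It is an assumption-laden illustration, not the multi-reader multi-case procedure used in the study: the function name, the reader labels, and all ratings below are hypothetical and are not study data.

```python
def empirical_auc(cancer_scores, normal_scores):
    """Empirical ROC area: the fraction of (cancer, normal) pairs in which
    the cancer case receives the higher score; ties count as one half."""
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


# HYPOTHETICAL ordinal ratings (1 = low suspicion, 5 = high suspicion)
# for two made-up readers; these are not values from the study.
readers = {
    "reader_a": {"cancer": [5, 4, 4, 2], "normal": [1, 2, 3, 1]},
    "reader_b": {"cancer": [3, 5, 2, 4], "normal": [2, 1, 4, 1]},
}

# One ROC area per reader, then a simple unweighted average across readers.
per_reader_auc = {
    name: empirical_auc(r["cancer"], r["normal"]) for name, r in readers.items()
}
mean_auc = sum(per_reader_auc.values()) / len(per_reader_auc)

for name, auc in per_reader_auc.items():
    print(f"{name}: AUC = {auc:.3f}")
print(f"Average across readers: AUC = {mean_auc:.3f}")
```

A simple average of this kind summarizes the readers but, unlike a multi-reader multi-case analysis, does not account for reader and case variability when testing the difference between techniques.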
For both the BIRADS and likelihood scores, the ROC curves for mammography and mammography plus AWBU almost superimpose when confidence in malignancy by mammography is high. When confidence in malignancy by mammography is low, as in the lower portions of the graphs, the curves diverge markedly. In both cases the mammography plus AWBU curve lies closer to the y-axis, indicating better cancer recognition.

Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)

Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2.

Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)

Similar to the ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU than for mammography alone across all readers, using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and the DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices.

Confidence in identification of cases for further imaging

Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging (i.e., true-positive readings) using mammography plus AWBU than using mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher.

Interpretation times

Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58), ranging from 5:54 to 12:51. The difference in review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96).