PMC:2948156

Breast cancer detection: radiologists’ performance using mammography with and without automated whole-breast ultrasound Abstract Objective Radiologist reader performance for breast cancer detection using mammography plus automated whole-breast ultrasound (AWBU) was compared with mammography alone. Methods Screenings for non-palpable breast malignancies in women with radiographically dense breasts with contemporaneous mammograms and AWBU were reviewed by 12 radiologists blinded to the diagnoses; half the studies were abnormal. Readers first reviewed the 102 mammograms. The American College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS) and Digital Mammographic Imaging Screening Trial (DMIST) likelihood ratings were recorded with location information for identified abnormalities. Readers then reviewed the mammograms and AWBU with knowledge of previous mammogram-only evaluation. We compared reader performance across screening techniques using absolute callback, areas under the curve (AUC), and figure of merit (FOM). Results True positivity of cancer detection increased 63%, with only a 4% decrease in true negativity. Reader-averaged AUC was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.808 versus 0.701) and likelihood scores (0.810 versus 0.703). Similarly, FOM was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.786 versus 0.613) and likelihood scores (0.791 versus 0.614). Conclusion Adding AWBU to mammography improved callback rates, accuracy of breast cancer detection, and confidence in callbacks for dense-breasted women. Introduction Screening with mammography has been shown to reduce mortality from breast cancer [1, 2]. However, the sensitivity to non-palpable cancer of screening mammography in radiographically dense-breasted women is as low as 30–48% [3]. 
Extremely dense-breasted women have an 18-fold increase in interval cancer found between annual mammograms, compared with fatty-breasted women [4]. Magnetic resonance imaging (MRI) has been demonstrated to be, and recommended as, an efficacious adjunct to mammography for very high-risk, dense-breasted women [5, 6]. It has not been recommended for all dense-breasted women. Three limitations to MRI screening for breast cancer are cost, intravenous injection of gadolinium-containing contrast medium, and lower specificity of MRI compared with mammography [7] with increased false positive callbacks and biopsies. For radiographically dense-breasted women, whole-breast ultrasound as an adjunct to screening mammography has shown promise. Berg et al. increased cancer discovery 42% by adding handheld whole-breast ultrasound performed by radiologists [8]. Kelly et al. used an automated whole-breast ultrasound (AWBU) device capturing a ciné loop of 2D breast images [9]. This blinded study of mostly dense-breasted women showed a 100% increase in cancer detection, and a 200% increase in discovery of invasive cancers 1 cm or less, compared with mammograms alone. These ciné loops were recorded and are available for reader trials similar to those performed for comparison of screening mammography with and without computer-aided detection (CAD) [10]. For AWBU to be a useful adjunct to screening mammography for dense-breasted women, interpretation of examinations must be shown as beneficial, when performed by community radiologists. This paper evaluates the performance of such radiologists in detection before and after AWBU is added to a test set of screening mammograms of radiographically dense-breasted women. Materials and methods Imaging studies Mammograms Standard cranio-caudal (CC) and medio-lateral oblique (MLO) views of each breast were available for all cases. If implants were present, displacement views were included. 
Original analog films (66 cases) or prints of digital films (36 cases) were used for review. All patients whose cases were used in the study provided informed consent, and the protocol was approved by the Institutional Review Boards at each hospital, or The Western Institutional Review Board [9]. AWBUs Automated whole-breast ultrasound (AWBU) is a computer-based system for performing, recording, and reading whole-breast ultrasound examinations similar in appearance to 2D freehand imaging (SonoCine, Reno, NV). Images were collected with 7- to 12-MHz multi-frequency transducers. The transducer is attached to a computer-guided mechanical arm that acquires images in CC rows overlapping 7 to 10 mm, ensuring complete coverage of both breasts. Images are collected approximately 0.8 mm apart. The AWBU software creates a ciné loop for review of approximately 3,000 images, simulating real-time imaging. The Windows®-based reading station uses a high-definition 1,600 × 1,200 monitor and special software to increase cancers’ conspicuity. The AWBU procedure was described more fully in a previous publication [9]. Readers Twelve board-certified breast radiologists who use breast ultrasound in their practices were recruited as readers for the trial. Remuneration for 3.5 days was at the prevailing US rate. Eleven readers were from the USA and one from Great Britain. Eleven had no experience with AWBU. One had reviewed limited AWBUs 8 years earlier during the developmental phase of the technology. No reader had foreknowledge of the positivity rate of the test set. Each reader had a 4-h tutorial with one author (KK) explaining the AWBU reading station operation. The readers reviewed and discussed approximately 12 AWBUs with known cancers, not part of the test set. They were not in the test set because either palpable findings were present or there were no concurrent mammograms. 
Nothing concerning the study was discussed, other than the use of the data form (Appendix A) and the number of cases to be reviewed. Procedure A set of 51 malignant cases (3 cases with bilateral cancers), including invasive and in situ cancers, was collected for the trial (Table 1). Screening mammography and AWBU were performed within 2 months of each other. No cancers were associated with prospective palpable findings or symptoms suggestive of cancer. The mammograms showed heterogeneously dense or extremely dense breast tissue (BIRADS 3 or 4) on the original reports. All imaging was performed from 2003 to 2008. The data set included all cases meeting the above criteria in the AWBU archives. Twelve cancers were included that were not prospectively reported on either imaging technique, but are visible in retrospect. Four of these became palpable within 1 year, three in more than 1 year; five were discovered in a subsequent screening round, three by AWBU only, and two by both AWBU and mammography.

Table 1 Pathological diagnosis of 51 positive cases (54 cancers)

                     ≤1 cm   >1 to ≤2 cm   >2 cm   Total
DCIS                     2             0       4       6
IDC                     17            19       5      41
ILC                      3             2       1       6
Mixed IDC and ILC        0             1       0       1
Total                   22            22      10      54

DCIS ductal carcinoma in situ, IDC invasive ductal carcinoma, ILC invasive lobular carcinoma

Fifty-one normal cases performed from 2003 to 2008 were matched with each of the positive cases for the following factors:
1. Facility
2. Digital or analog mammogram
3. Ultrasound machine model
4. American breast cup size (A–DD)
5. ACR BIRADS breast density
6. Implant (saline or silicone) and location (pre- or retropectoral)
7. Breast cancer history
8. Age
Among the normal cases that matched a positive case on factors 1 to 7, the one closest in age to the positive case was selected as its normal partner case. The mean difference in age between the positive case and its matched normal was 31 days. Testing occurred on a subsequent date at each reader’s own site with only the reader and a research assistant (monitor) present. The same monitor was present for all readers. 
She had no knowledge of the test set makeup, had no mammography or ultrasound training, reviewed the test data forms in real-time for completeness, and transferred the data to the study database. At each test site 102 mammograms were placed on a film alternator in random order, generated once, and used for all readers. Excluding breaks, the test subject’s time for review was recorded. The upper half of a data form (Appendix A) was completed for each case, checked by the monitor, and entered into the database. Four questions were asked: Would you request further evaluation based on this mammogram, or recommend routine screening? Where is/are the most suspicious (up to 3) lesions, identifying their location by breast and clock face position? What would be your prediction of the final ACR BIRADS after any needed diagnostic workup was completed? What is the reader’s confidence level that the woman has or does not have cancer (DMIST likelihood scale)? The American College of Radiology Breast Imaging Reporting and Data System (BIRADS) is a seven-point scale (0 = incomplete, needs additional assessment; 1 = normal; 2 = benign; 3 = probably benign; 4a = possible malignancy; 4b = probable malignancy, or 5 = highly suggestive of malignancy) designed to categorize the results of mammography and other imaging studies [3, 11]. Scores from 1 to 5 were allowed. Similar to the DMIST [12], readers were asked to predict a BIRADS score before any diagnostic workup. The DMIST likelihood rating is a seven-point scale to express the confidence of the diagnosis, and ranges from definitely not cancer to definitely cancer [3, 11, 12]. A correct location response was recorded for an hour position marked within the half of the breast centered at the middle of the cancer. A true positive (TP) was recorded for mammography for any malignant case if ‘callback’ was marked for mammography and any correct tumor location was identified. 
A TP was recorded for mammography plus AWBU if ‘callback’ was marked on either or both halves of the form in the malignant cases, with at least one correct location identified. Thus, a correctly identified TP found with mammography would remain TP even were it not identified again on AWBU. AWBU findings could change the outcome to TP if a cancer was correctly identified with AWBU but missed with mammography. We evaluated readings on a per-case (i.e., per-patient) basis rather than a per-score basis because screening serves as a “go no-go” gatekeeper for subsequent workup [13]. A true negative (TN) was recorded for mammography for any normal case if ‘callback’ was not marked for mammography. A TN was recorded for mammography plus AWBU for any normal case if ‘callback’ was not marked on the second half of the form. This allowed the reader to reverse a callback for an asymmetric density seen mammographically but cleared by the AWBU as no suspicion. To validate TN cases, all cases were followed for at least 1 year. A false positive (FP) was recorded for mammography in two situations: (1) callback was marked for mammography in a normal case; (2) callback was marked for mammography in a cancer case, but none of the marked locations corresponded to the cancer. An FP was recorded for mammography plus AWBU in the same two situations as above when callback was marked for AWBU. A false negative (FN) was recorded for mammography when callback was not marked in a cancer case in the mammography portion of the form. Similarly, an FN was recorded for mammography plus AWBU when callback was not marked in a cancer case in either portion of the form. The 102 AWBUs were reviewed by readers on a review station brought by the research assistant acting as a monitor. They worked approximately 8 h daily for 3 days, with breaks at the readers’ choosing. 
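The per-case scoring rules described above (TP, TN, FP, FN for mammography alone and for mammography plus AWBU) can be sketched in a few lines of Python. This is only an illustrative reading of the rules, not the study's software: the function and argument names are hypothetical, and one branch is an explicit assumption because the text does not spell it out (a cancer case called back on mammography at a wrong location and not called back on AWBU is treated here as FP).

```python
def classify_mammography(is_cancer, callback, location_correct):
    """Per-case outcome for the mammography-only half of the form."""
    if not is_cancer:
        # Normal case: any callback is a false positive.
        return "FP" if callback else "TN"
    if not callback:
        return "FN"
    # Cancer case with callback: TP only if a marked location matches the cancer.
    return "TP" if location_correct else "FP"


def classify_combined(is_cancer, mammo_callback, mammo_location_correct,
                      awbu_callback, awbu_location_correct):
    """Per-case outcome for mammography plus AWBU (both halves of the form)."""
    if not is_cancer:
        # Only the second (AWBU) half decides, so a mammographic callback
        # can be reversed by AWBU.
        return "FP" if awbu_callback else "TN"
    found = ((mammo_callback and mammo_location_correct)
             or (awbu_callback and awbu_location_correct))
    if found:
        # A TP on mammography stays TP even if AWBU does not repeat it.
        return "TP"
    if not mammo_callback and not awbu_callback:
        return "FN"
    # ASSUMPTION: a callback on either half with no correct location is
    # counted as FP; the text defines this case only for an AWBU callback.
    return "FP"
```

For example, `classify_combined(True, False, False, True, True)` returns `"TP"`, which corresponds to a cancer missed on mammography but correctly located on AWBU.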
The readers were given the corresponding mammograms with each AWBU and completed the second half of the data sheet with the knowledge from the mammogram-only evaluation available. The same questions were answered for AWBU and the reading time of each AWBU recorded. Statistical analysis Analyses were conducted in a multi-reader multi-case (MRMC) framework where each reader screened all cases and each case contained both screening techniques. The MRMC design efficiently reduces the number of readers and cases needed to detect improvements across techniques [14]. Analyses appropriate for an MRMC design were chosen both to correctly model correlations between readings on the same case across readers and to estimate standard errors correctly. Unless specified otherwise, analyses were conducted in SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA). We present F statistics, shown as F(numerator degrees of freedom, denominator degrees of freedom), and p values for comparisons between mammography plus AWBU and mammography alone. Cases identified for further imaging were assessed by four binary measures: sensitivity = number of TP/number of cancer cases; specificity = number of TN/number of non-cancer cases; positive predictive value (PPV) = number of TP/(number of TP + FP cases); and negative predictive value (NPV) = number of TN/(number of FN + TN cases). Random-effect logistic regression models were used to test whether each binary measure differed significantly between mammography plus AWBU versus mammography alone. To account for the MRMC framework, we included random effects for readers and cases similar to the DBM model [15]. Accuracy was assessed through BIRADS ratings and DMIST likelihood scores, comparing two commonly used indicators of accuracy between mammography plus AWBU versus mammography alone: areas under the curve (AUC) and figures of merit (FOM). 
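As a worked illustration of the four binary measures, the sketch below uses the conventional definitions PPV = TP/(TP + FP) and NPV = TN/(TN + FN), with sensitivity and specificity taken over the 51 cancer and 51 non-cancer cases. The function name is hypothetical, and the example counts are reader 1's mammography-alone values from Table 2. Note that sensitivity is divided by the number of cancer cases rather than by TP + FN, because a callback at the wrong location in a cancer case is scored as FP.

```python
def binary_measures(tp, tn, fp, fn, n_cancer, n_normal):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / n_cancer,   # TP / number of cancer cases
        "specificity": tn / n_normal,   # TN / number of non-cancer cases
        "ppv": tp / (tp + fp),          # TP / (TP + FP)
        "npv": tn / (tn + fn),          # TN / (TN + FN)
    }


# Reader 1, mammography alone (Table 2): TP = 28, TN = 32, FP = 27, FN = 15.
reader1 = binary_measures(tp=28, tn=32, fp=27, fn=15, n_cancer=51, n_normal=51)
for name, value in reader1.items():
    print(f"{name}: {value:.3f}")
```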
The FOM incorporates information from each reader on the region of suspected malignancy, as well as their confidence level in the finding, incorporated in an AUC. Because it includes both confidence level and location accuracy, the FOM is more powerful than AUC in detecting differences between techniques. We include both analyses, as described below: Areas under the curve (AUC) were estimated in DBM MRMC 2.1 [15] (available from http://perception.radiology.uiowa.edu) using the trapezoidal/Wilcoxon method. Readers and patients were treated as random factors. We also present reader-averaged receiver operating characteristic (ROC) curves; average values were calculated from separate ROC analyses conducted on each reader in the PROC LOGISTIC procedure. Figures of merit (FOM) were estimated by using jackknife alternative free-response receiver operating characteristic methodology as implemented in JAFROC Version 1.0 [16] (available from http://www.devchakraborty.com). The FOM is defined as the probability that a cancer on an abnormal image is scored higher than a falsely marked location on a normal image and is analogous to the ROC curve; a higher FOM indicates improvement in reader performance. Confidence in identification of cases for further imaging We used linear regression, comparing BIRADS ratings and DMIST likelihood scores across the two screening techniques among TP cases; mean ratings and scores are estimated by the regression for each screening technique. To account for the MRMC framework, we included random effects similar to the DBM model [15]; the model included a fixed effect for technique, classified as mammography plus AWBU or mammography alone, and random effects for readers and cases. Results Sample Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well-matched with an insignificant mean difference of 31.0 days in age between abnormal and normal cases (t test = 1.47, df = 50, p = 0.15). 
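To make the two accuracy indices concrete, here is a minimal sketch of the trapezoidal/Wilcoxon AUC estimate and of a figure of merit following the definition quoted above (the probability that a cancer is scored higher than a falsely marked location on a normal image). It is not the DBM MRMC or JAFROC software, which also handle reader and case variability through jackknifing; the function names, the convention of scoring unmarked lesions and unmarked normal cases as 0, and the toy ratings are all assumptions made for illustration.

```python
def wilcoxon_auc(positive_scores, negative_scores):
    """Empirical AUC: P(positive > negative) + 0.5 * P(positive == negative)."""
    total = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positive_scores) * len(negative_scores))


def fom_sketch(lesion_ratings, normal_case_false_marks, unmarked=0):
    """Figure of merit in the spirit of the quoted definition.

    lesion_ratings: one rating per cancer (use `unmarked` if it was not marked).
    normal_case_false_marks: for each normal case, the ratings of its false marks.
    ASSUMPTION: each normal case is summarized by its highest-rated false mark,
    and a normal case with no marks gets the `unmarked` rating.
    """
    highest_false = [max(marks) if marks else unmarked
                     for marks in normal_case_false_marks]
    return wilcoxon_auc(lesion_ratings, highest_false)


# Toy ratings on a 1-5 scale (hypothetical, not study data).
auc = wilcoxon_auc([5, 4, 3], [1, 3, 2])
fom = fom_sketch([4, 0, 5], [[2], [], [3, 5]])
```

Because the figure of merit credits a cancer only when it is marked at the right location, a reader who calls back a cancer case for the wrong finding gains in AUC but not in FOM.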
Table 1 lists the types and size of cancers in the test set. Identification of cases for further imaging Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60–58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47–67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65–75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU.

Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases)

                                      True positives  True negatives False positives False negatives
Reader #a                                  M     M+A       M     M+A       M     M+A       M     M+A
1                                         28      45      32      21      27      30      15       6
2                                         28      45      25      21      33      30      16       6
3                                         25      44      30      20      32      31      15       7
4                                         26      43      20      28      43      23      13       8
5                                         26      43      32      30      28      21      16       8
6                                         32      43      20      37      43      14       7       8
7                                         26      41      25      27      33      24      18      10
8                                         23      40      35      31      17      20      27      11
9                                         16      40      43      25      21      26      22      11
10                                        27      39      34      36      27      15      14      12
11                                        26      37      35      41      24      10      17      14
12                                        22      37      37      34      20      17      23      14
Mean # of cases                         25.4    41.4    30.7    29.3    29.0    21.8    16.9     9.6
% of 51 cases                          49.8%   81.2%   60.2%   57.5%   56.9%   42.7%   33.1%   18.8%
Mean # of added cases                           16.0            −1.4            −7.2            −7.3
Mean % of 51 cases added                       31.4%           −2.7%          −14.1%          −14.3%
% improvement compared with M alone              63%             −4%            −25%            −43%

M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU)
a Reader # presented by best to worst performance based on sensitivity on M+A

Individual success varied from 11 to 24 more cancer cases detected by AWBU. As a percentage of the cancers detected with mammography the range in improvement was 42–150%. Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best performing mammography reader the cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography. 
For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing his overall cancer detection rate near the average for the group. Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. By adding AWBU, the detection of these small cancers was increased to 65%.

Table 3 Reader performance with 45 invasive cases

                                                     ≤1 cm   >1 to ≤2 cm         >2 cm         Total
                                                   #     %       #     %       #     %       #     %
# of cancers                                      17   100      22   100       6   100      45   100
Mean cancers by mammography                      4.4    26    13.5    61     3.0    50    20.9    46
Mean additional cancers by AWBU                  6.7    39     6.6    30     2.0    33    15.3    34
Mean total cases detected                       11.1    65    20.1    91     5.0    83    36.2    80
% improvement compared to mammography alone           151%           49%           67%           73%

For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram; otherwise the diameter is that reported by pathological diagnosis.

Accuracy The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader. The BIRADS and likelihood ROC curves for mammography and mammography plus AWBU in both cases almost superimpose when confidence in malignancy by mammography is high, but when confidence in malignancy by mammography is low, as in the lower portions of the graphs, the curves in both cases diverge significantly. 
In both cases the mammography plus AWBU approaches the y-axis indicating better cancer recognition. Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line) Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2. Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles) Similar to ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU across all readers, compared with mammography alone using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices. Confidence in identification of cases for further imaging Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging, i.e., TP reading, using mammography plus AWBU compared with mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher. Interpretation times Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58) varying from 5:54 to 12:51. The difference in review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96). Discussion Significant improvement in identification of asymptomatic cancers occurred for all readers in this study. This is shown by a 63% increase in callbacks of cancer cases with only a 4% decrease in correct identification of the true negative cases. 
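The headline percentages can be recomputed from the per-reader counts in Table 2. The short sketch below is an arithmetic check only; the variable and helper names are hypothetical. With these counts the relative change in mean true negatives comes out at roughly −4.6%, which is the figure the table and text report as 4%.

```python
# Per-reader counts copied from Table 2 (M = mammography, MA = mammography + AWBU).
tp_m = [28, 28, 25, 26, 26, 32, 26, 23, 16, 27, 26, 22]
tp_ma = [45, 45, 44, 43, 43, 43, 41, 40, 40, 39, 37, 37]
tn_m = [32, 25, 30, 20, 32, 20, 25, 35, 43, 34, 35, 37]
tn_ma = [21, 21, 20, 28, 30, 37, 27, 31, 25, 36, 41, 34]


def mean(values):
    return sum(values) / len(values)


def percent_change(before, after):
    """Relative change of the reader-averaged count, in percent."""
    return 100.0 * (mean(after) - mean(before)) / mean(before)


tp_change = percent_change(tp_m, tp_ma)   # relative gain in true positives
tn_change = percent_change(tn_m, tn_ma)   # relative change in true negatives

# Smallest and largest number of cancer cases added by AWBU for any one reader.
added = [after - before for before, after in zip(tp_m, tp_ma)]

print(f"mean TP: {mean(tp_m):.1f} -> {mean(tp_ma):.1f} ({tp_change:+.1f}%)")
print(f"mean TN: {mean(tn_m):.1f} -> {mean(tn_ma):.1f} ({tn_change:+.1f}%)")
print(f"added cancer cases per reader: {min(added)} to {max(added)}")
```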
The confidence of the diagnoses of the 102 cases with predictive BIRADS and DMIST likelihood scales was confirmed by using AUC and FOM methodology. With a short training period experienced radiologists using 2D AWBU significantly improve their ability to diagnose cancer in dense-breasted women. This type of AWBU is similar in appearance to real-time ultrasound images. The slower transducer speed enforced by the AWBU decreases inter-image distance, allowing the reader more time to identify small masses. At a review speed of 10 images per second the observer has 0.5 s to identify a 5-mm mass. A high-resolution computer screen, along with a post-processing technique to expand the grayscale at the black end of the spectrum, results in visually sharper margins and more contrast of masses against the background tissue. These factors are designed to make recognition of invasive cancers easier and more reliable. This automated process for breast ultrasound eliminates operator variability, provides greater consistency, and ensures reproducibility of quality images. Study radiologists increased discovery of T1a and T1b invasive cancers 150% over mammography alone (Table 3). The average review time per AWBU study was about 11 min shorter than the 19 min for radiologists in the ACRIN 6666 trial of handheld screening ultrasound [8]. As half of our test set subjects had cancers, it would be expected that the average review time we observed for AWBU would be significantly longer than in a typical screening population with mostly normal studies. Our study had a number of inherent weaknesses. Although the test set was confidential, the readers probably quickly realized that it was enriched. They may have been extraordinarily vigilant resulting in increases in both TPs and FPs. A false increase in TPs would occur if all the correctly identified cancers were not subsequently confirmed with biopsy. 
Also, analysis was performed on a case basis in the three patients in whom cancers were present in both breasts; it was assumed if one of the cancers was identified, the cancer in the other breast would be found by the subsequent workup. This assumption might have falsely raised the TPs and reduced the FNs. In addition, we did not have a comparison with hand-held ultrasound. Any of the following factors could have decreased the readers’ accuracy with AWBU (decreased true positives and negatives, and increased false positives and negatives) compared with a normal screening situation: (1) Fatigue: readers reviewed an average of 34 AWBUs daily. (2) Inexperience with ultrasound screening: some readers do not perform screening ultrasound. (3) Limited experience with AWBU: this was the first exposure to AWBU for 11 of the readers. (4) Unfamiliarity with some ultrasound formats: images from many different manufacturers were used. In spite of these hindrances our observations clearly show that radiologists improve detection of cancers, especially small invasive ones, by adding AWBU to mammography findings. Conclusion This article demonstrates that experienced breast radiologists can learn to interpret 2D AWBU quickly. Radiologists will significantly improve their cancer detection rates in dense-breasted women by adding AWBU to mammography. This procedure has the potential for both standardizing the performance of whole-breast ultrasound and shortening the time required for radiologists. Appendix A We thank the radiologists participating in this trial: Catherine Babcook, Debra Copit, Ruth English, Jon Fish, Thalia B. Forte, Michael N. Linver, Laleh Lourie, Carrie Morrisson, Susan Roux, Thomas Stavros, Lakshmi Tegulapalle, and Richard Vanesian. Financial disclosures Dr. Kelly is the Majority Stockholder of Sonocine, Inc. Dr. Dean owns stock in Sonocine, Inc. Neither author has received any form of payment from the company. Drs. Lee and Comulada served as statistical consultants for the study and have no conflict of interest relevant to Sonocine, Inc. The study was funded by Sonocine, Inc. Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Document structure show

article-title	Breast cancer detection: radiologists’ performance using mammography with and without automated whole-breast ultrasound
abstract	Objective Radiologist reader performance for breast cancer detection using mammography plus automated whole-breast ultrasound (AWBU) was compared with mammography alone. Methods Screenings for non-palpable breast malignancies in women with radiographically dense breasts with contemporaneous mammograms and AWBU were reviewed by 12 radiologists blinded to the diagnoses; half the studies were abnormal. Readers first reviewed the 102 mammograms. The American College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS) and Digital Mammographic Imaging Screening Trial (DMIST) likelihood ratings were recorded with location information for identified abnormalities. Readers then reviewed the mammograms and AWBU with knowledge of previous mammogram-only evaluation. We compared reader performance across screening techniques using absolute callback, areas under the curve (AUC), and figure of merit (FOM). Results True positivity of cancer detection increased 63%, with only a 4% decrease in true negativity. Reader-averaged AUC was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.808 versus 0.701) and likelihood scores (0.810 versus 0.703). Similarly, FOM was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.786 versus 0.613) and likelihood scores (0.791 versus 0.614). Conclusion Adding AWBU to mammography improved callback rates, accuracy of breast cancer detection, and confidence in callbacks for dense-breasted women.
sec	Objective Radiologist reader performance for breast cancer detection using mammography plus automated whole-breast ultrasound (AWBU) was compared with mammography alone.
title	Objective
p	Radiologist reader performance for breast cancer detection using mammography plus automated whole-breast ultrasound (AWBU) was compared with mammography alone.
sec	Methods Screenings for non-palpable breast malignancies in women with radiographically dense breasts with contemporaneous mammograms and AWBU were reviewed by 12 radiologists blinded to the diagnoses; half the studies were abnormal. Readers first reviewed the 102 mammograms. The American College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS) and Digital Mammographic Imaging Screening Trial (DMIST) likelihood ratings were recorded with location information for identified abnormalities. Readers then reviewed the mammograms and AWBU with knowledge of previous mammogram-only evaluation. We compared reader performance across screening techniques using absolute callback, areas under the curve (AUC), and figure of merit (FOM).
title	Methods
p	Screenings for non-palpable breast malignancies in women with radiographically dense breasts with contemporaneous mammograms and AWBU were reviewed by 12 radiologists blinded to the diagnoses; half the studies were abnormal. Readers first reviewed the 102 mammograms. The American College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS) and Digital Mammographic Imaging Screening Trial (DMIST) likelihood ratings were recorded with location information for identified abnormalities. Readers then reviewed the mammograms and AWBU with knowledge of previous mammogram-only evaluation. We compared reader performance across screening techniques using absolute callback, areas under the curve (AUC), and figure of merit (FOM).
sec	Results True positivity of cancer detection increased 63%, with only a 4% decrease in true negativity. Reader-averaged AUC was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.808 versus 0.701) and likelihood scores (0.810 versus 0.703). Similarly, FOM was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.786 versus 0.613) and likelihood scores (0.791 versus 0.614).
title	Results
p	True positivity of cancer detection increased 63%, with only a 4% decrease in true negativity. Reader-averaged AUC was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.808 versus 0.701) and likelihood scores (0.810 versus 0.703). Similarly, FOM was higher for mammography plus AWBU compared with mammography alone by BIRADS (0.786 versus 0.613) and likelihood scores (0.791 versus 0.614).
sec	Conclusion Adding AWBU to mammography improved callback rates, accuracy of breast cancer detection, and confidence in callbacks for dense-breasted women.
title	Conclusion
p	Adding AWBU to mammography improved callback rates, accuracy of breast cancer detection, and confidence in callbacks for dense-breasted women.
body	Introduction Screening with mammography has been shown to reduce mortality from breast cancer [1, 2]. However, the sensitivity to non-palpable cancer of screening mammography in radiographically dense-breasted women is as low as 30–48% [3]. Extremely dense-breasted women have an 18-fold increase in interval cancer found between annual mammograms, compared with fatty-breasted women [4]. Magnetic resonance imaging (MRI) has been demonstrated to be, and recommended as, an efficacious adjunct to mammography for very high-risk, dense-breasted women [5, 6]. It has not been recommended for all dense-breasted women. Three limitations to MRI screening for breast cancer are cost, intravenous injection of gadolinium-containing contrast medium, and lower specificity of MRI compared with mammography [7] with increased false positive callbacks and biopsies. For radiographically dense-breasted women, whole-breast ultrasound as an adjunct to screening mammography has shown promise. Berg et al. increased cancer discovery 42% by adding handheld whole-breast ultrasound performed by radiologists [8]. Kelly et al. used an automated whole-breast ultrasound (AWBU) device capturing a ciné loop of 2D breast images [9]. This blinded study of mostly dense-breasted women showed a 100% increase in cancer detection, and a 200% increase in discovery of invasive cancers 1 cm or less, compared with mammograms alone. These ciné loops were recorded and are available for reader trials similar to those performed for comparison of screening mammography with and without computer-aided detection (CAD) [10]. For AWBU to be a useful adjunct to screening mammography for dense-breasted women, interpretation of examinations must be shown as beneficial, when performed by community radiologists. This paper evaluates the performance of such radiologists in detection before and after AWBU is added to a test set of screening mammograms of radiographically dense-breasted women. 
Materials and methods
Imaging studies
Mammograms
Standard cranio-caudal (CC) and medio-lateral oblique (MLO) views of each breast were available for all cases. If implants were present, displacement views were included. Original analog films (66 cases) or prints of digital films (36 cases) were used for review. Informed consent was obtained for all cases used in the study, and the protocol was approved by the Institutional Review Board at each hospital or by the Western Institutional Review Board [9].
AWBUs
Automated whole-breast ultrasound (AWBU) is a computer-based system for performing, recording, and reading whole-breast ultrasound examinations similar in appearance to 2D freehand imaging (SonoCine, Reno, NV). Images were collected with 7- to 12-MHz multi-frequency transducers. The transducer is attached to a computer-guided mechanical arm that acquires images in CC rows overlapping by 7 to 10 mm, ensuring complete coverage of both breasts. Images are collected approximately 0.8 mm apart. The AWBU software creates a ciné loop of approximately 3,000 images for review, simulating real-time imaging. The Windows®-based reading station uses a high-definition 1,600 × 1,200 monitor and special software to increase cancers’ conspicuity. The AWBU procedure was described more fully in a previous publication [9].
Readers
Twelve board-certified breast radiologists who use breast ultrasound in their practices were recruited as readers for the trial. Remuneration for 3.5 days was at the prevailing US rate. Eleven readers were from the USA and one from Great Britain. Eleven had no experience with AWBU. One had reviewed a limited number of AWBUs 8 years earlier during the developmental phase of the technology. No reader had foreknowledge of the positivity rate of the test set. Each reader had a 4-h tutorial with one author (KK) explaining the operation of the AWBU reading station. The readers reviewed and discussed approximately 12 AWBUs with known cancers, not part of the test set.
They were not in the test set because either palpable findings were present or there were no concurrent mammograms. Nothing concerning the study was discussed, other than the use of the data form (Appendix A) and the number of cases to be reviewed.
Procedure
A set of 51 malignant cases (3 with bilateral cancers), including invasive and in situ cancers, was collected for the trial (Table 1). Screening mammography and AWBU were performed within 2 months of each other. No cancers were associated with prospective palpable findings or symptoms suggestive of cancer. The mammograms showed heterogeneously dense or extremely dense breast tissue (BIRADS 3 or 4) on the original reports. All imaging was performed from 2003 to 2008. The data set included all cases in the AWBU archives meeting the above criteria. Twelve cancers were included that were not prospectively reported on either imaging technique but are visible in retrospect. Four of these became palpable within 1 year and three after more than 1 year; five were discovered in a subsequent screening round, three by AWBU only and two by both AWBU and mammography.
Table 1 Pathological diagnosis of 51 positive cases (54 cancers)
                     ≤1 cm   >1 to ≤2 cm   >2 cm   Total
DCIS                 2       0             4       6
IDC                  17      19            5       41
ILC                  3       2             1       6
Mixed IDC and ILC    0       1             0       1
Total                22      22            10      54
DCIS ductal carcinoma in situ, IDC invasive ductal carcinoma, ILC invasive lobular carcinoma
Fifty-one normal cases imaged from 2003 to 2008 were matched with the positive cases on the following factors:
1. Facility
2. Digital or analog mammogram
3. Ultrasound machine model
4. American breast cup size (A–DD)
5. ACR BIRADS breast density
6. Implant (saline or silicone) and location (pre- or retropectoral)
7. Breast cancer history
8. Age
For each positive case, the normal case that matched on factors 1 to 7 and was closest in age was selected as its normal partner. The mean difference in age between a positive case and its matched normal was 31 days.
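The matching rule can be illustrated with a minimal Python sketch. The `Case` record, the function names, and the greedy one-partner-per-case strategy are assumptions made for illustration only; the text does not state how ties or the reuse of normal cases were handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Case:
    case_id: str
    factors: Tuple      # matching factors 1 to 7, in the order listed above
    age_days: int       # age at imaging, expressed in days


def closest_normal(positive: Case, normals: List[Case]) -> Optional[Case]:
    """Return the normal case with identical factors 1-7 and the nearest age."""
    candidates = [n for n in normals if n.factors == positive.factors]
    if not candidates:
        return None
    return min(candidates, key=lambda n: abs(n.age_days - positive.age_days))


def match_all(positives: List[Case], normals: List[Case]) -> Dict[str, str]:
    """Greedy matching; each normal case is used at most once (an assumption)."""
    pool = list(normals)
    pairs: Dict[str, str] = {}
    for p in positives:
        partner = closest_normal(p, pool)
        if partner is not None:
            pool.remove(partner)
            pairs[p.case_id] = partner.case_id
    return pairs
```

A greedy pass is the simplest choice; an optimal assignment that minimises the total age difference would need a different algorithm and is not implied by the text.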
Testing occurred on a subsequent date at each reader’s own site with only the reader and a research assistant (monitor) present. The same monitor was present for all readers. She had no knowledge of the test set makeup, had no mammography or ultrasound training, reviewed the test data forms in real time for completeness, and transferred the data to the study database. At each test site the 102 mammograms were placed on a film alternator in a random order that was generated once and used for all readers. Each reader’s review time, excluding breaks, was recorded. The upper half of a data form (Appendix A) was completed for each case, checked by the monitor, and entered into the database. Four questions were asked:
1. Would you request further evaluation based on this mammogram, or recommend routine screening?
2. Where are the most suspicious lesions (up to 3), identified by breast and clock-face position?
3. What would be your prediction of the final ACR BIRADS after any needed diagnostic workup was completed?
4. What is the reader’s confidence level that the woman has or does not have cancer (DMIST likelihood scale)?
The American College of Radiology Breast Imaging Reporting and Data System (BIRADS) is a seven-point scale (0 = incomplete, needs additional assessment; 1 = normal; 2 = benign; 3 = probably benign; 4a = possible malignancy; 4b = probable malignancy; 5 = highly suggestive of malignancy) designed to categorize the results of mammography and other imaging studies [3, 11]. Scores from 1 to 5 were allowed. Similar to the DMIST [12], readers were asked to predict a BIRADS score before any diagnostic workup. The DMIST likelihood rating is a seven-point scale expressing the confidence of the diagnosis, ranging from definitely not cancer to definitely cancer [3, 11, 12]. A correct location response was recorded for an hour position marked within the half of the breast centered at the middle of the cancer.
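The location criterion can be sketched in Python as follows. This is an illustration under a stated assumption: "the half of the breast centered at the middle of the cancer" is read here as any clock-face hour within 3 h of the cancer's hour position, in the same breast. The function names and the `half_width_hours` parameter are hypothetical, and lesions without a clock position (for example, central lesions) are not handled.

```python
def clock_distance(hour_a: int, hour_b: int) -> int:
    """Shortest distance in hours between two clock-face positions (1-12)."""
    diff = abs(hour_a - hour_b) % 12
    return min(diff, 12 - diff)


def is_correct_location(marked_breast: str, marked_hour: int,
                        cancer_breast: str, cancer_hour: int,
                        half_width_hours: int = 3) -> bool:
    """True if the mark is in the same breast and within the assumed half-breast arc."""
    same_breast = marked_breast == cancer_breast
    return same_breast and clock_distance(marked_hour, cancer_hour) <= half_width_hours
```

Under this reading, a mark at 11 o'clock counts as correct for a cancer at 2 o'clock in the same breast, whereas a mark at 6 o'clock does not count for a cancer at 12 o'clock.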
A true positive (TP) was recorded for mammography for any malignant case if ‘callback’ was marked for mammography and any correct tumor location was identified. A TP was recorded for mammography plus AWBU if ‘callback’ was marked on either or both halves of the form in a malignant case, with at least one correct location identified. Thus, a correctly identified TP found with mammography would remain a TP even if it was not identified again on AWBU. AWBU findings could change the outcome to TP if a cancer was correctly identified with AWBU but missed with mammography. We evaluated readings on a per-case (i.e., per-patient) basis rather than a per-score basis because screening serves as a “go no-go” gatekeeper for subsequent workup [13]. A true negative (TN) was recorded for mammography for any normal case if ‘callback’ was not marked for mammography. A TN was recorded for mammography plus AWBU for any normal case if ‘callback’ was not marked on the second half of the form. This allowed the reader to reverse a callback for an asymmetric density seen mammographically but cleared as not suspicious by AWBU. To validate TN cases, all cases were followed for at least 1 year. A false positive (FP) was recorded for mammography in two situations:
1. Callback was marked for mammography in a normal case.
2. Callback was marked for mammography in a cancer case, but none of the marked locations corresponded to the cancer.
An FP was recorded for mammography plus AWBU in the same two situations when callback was marked for AWBU. A false negative (FN) was recorded for mammography when callback was not marked in a cancer case in the mammography portion of the form. Similarly, an FN was recorded for mammography plus AWBU when callback was not marked in a cancer case in either portion of the form. The 102 AWBUs were reviewed by readers on a review station brought by the research assistant acting as monitor.
They worked approximately 8 h daily for 3 days, with breaks at the readers’ choosing. The readers were given the corresponding mammograms with each AWBU and completed the second half of the data sheet with the knowledge from the mammogram-only evaluation available. The same questions were answered for AWBU, and the reading time of each AWBU was recorded.
Statistical analysis
Analyses were conducted in a multi-reader multi-case (MRMC) framework in which each reader screened all cases and each case was read with both screening techniques. The MRMC design efficiently reduces the number of readers and cases needed to detect improvements across techniques [14]. Analyses appropriate for an MRMC design were chosen both to model correctly the correlations between readings on the same case across readers and to estimate standard errors correctly. Unless specified otherwise, analyses were conducted in SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA). We present F statistics, shown as F(numerator degrees of freedom, denominator degrees of freedom), and p values for comparisons between mammography plus AWBU and mammography alone. Cases identified for further imaging were assessed by four binary measures: sensitivity = number of TP/number of cancer cases; specificity = number of TN/number of non-cancer cases; positive predictive value (PPV) = number of TP/(number of TP + FP); and negative predictive value (NPV) = number of TN/(number of FN + TN). Random-effect logistic regression models were used to test whether each binary measure differed significantly between mammography plus AWBU and mammography alone. To account for the MRMC framework, we included random effects for readers and cases similar to the DBM model [15].
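The per-case outcome definitions and the four binary measures can be illustrated with a short Python sketch. The `Reading` record and the function names are hypothetical. One case is not fully specified in the text and is resolved by assumption here: a cancer case called back with no correct location on either half of the form is counted as FP for mammography plus AWBU. PPV and NPV are taken in their usual forms, TP/(TP + FP) and TN/(TN + FN), which agree with the mean percentages reported alongside Table 2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reading:
    has_cancer: bool
    callback_mammo: bool   # callback marked on the mammography half of the form
    located_mammo: bool    # at least one correct location on that half
    callback_awbu: bool    # callback marked on the AWBU half of the form
    located_awbu: bool     # at least one correct location on that half


def classify_mammography(r: Reading) -> str:
    if r.has_cancer:
        if not r.callback_mammo:
            return "FN"
        return "TP" if r.located_mammo else "FP"
    return "FP" if r.callback_mammo else "TN"


def classify_combined(r: Reading) -> str:
    if r.has_cancer:
        if not (r.callback_mammo or r.callback_awbu):
            return "FN"
        located = (r.callback_mammo and r.located_mammo) or \
                  (r.callback_awbu and r.located_awbu)
        return "TP" if located else "FP"   # assumption for the unlocated case
    # For normal cases only the second (AWBU) half decides the outcome.
    return "FP" if r.callback_awbu else "TN"


def binary_measures(tp: float, tn: float, fp: float, fn: float,
                    n_cancer: int, n_normal: int) -> Dict[str, float]:
    """Sensitivity and specificity use the case totals, as defined in the text."""
    return {
        "sensitivity": tp / n_cancer,
        "specificity": tn / n_normal,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Mean mammography-alone counts from Table 2 (TP, TN, FP, FN) with 51 + 51 cases.
print(binary_measures(25.4, 30.7, 29.0, 16.9, n_cancer=51, n_normal=51))
```

Sensitivity is divided by the number of cancer cases rather than by TP + FN because a cancer case called back at the wrong location is counted as an FP, so TP + FN can be smaller than 51.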
Accuracy was assessed through BIRADS ratings and DMIST likelihood scores, comparing two commonly used indicators of accuracy between mammography plus AWBU and mammography alone: areas under the curve (AUC) and figures of merit (FOM). The FOM incorporates information from each reader on the region of suspected malignancy, as well as their confidence level in the finding, incorporated in an AUC. Because it includes both confidence level and location accuracy, the FOM is more powerful than AUC in detecting differences between techniques. We include both analyses, as described below:
Areas under the curve (AUC) were estimated in DBM MRMC 2.1 [15] (available from http://perception.radiology.uiowa.edu) using the trapezoidal/Wilcoxon method. Readers and patients were treated as random factors. We also present reader-averaged receiver operating characteristic (ROC) curves; average values were calculated from separate ROC analyses conducted on each reader in the PROC LOGISTIC procedure.
Figures of merit (FOM) were estimated by using jackknife alternative free-response receiver operating characteristic methodology as implemented in JAFROC Version 1.0 [16] (available from http://www.devchakraborty.com). The FOM is defined as the probability that a cancer on an abnormal image is scored higher than a falsely marked location on a normal image and is analogous to the area under the ROC curve; a higher FOM indicates improvement in reader performance.
Confidence in identification of cases for further imaging
We used linear regression, comparing BIRADS ratings and DMIST likelihood scores across the two screening techniques among TP cases; mean ratings and scores are estimated by the regression for each screening technique. To account for the MRMC framework, we included random effects similar to the DBM model [15]; the model included a fixed effect for technique, classified as mammography plus AWBU or mammography alone, and random effects for readers and cases.
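For a single reader, the two accuracy indices can be illustrated with a minimal Python sketch. It is not the DBM MRMC or JAFROC software: the jackknife and the reader and case variance components are omitted, and the function names and input layout are assumptions. The AUC is computed as the Wilcoxon statistic, which equals the trapezoidal area under the empirical ROC curve. The FOM follows a simplified reading of the definition above, with unmarked cancers and unmarked normal images assigned the lowest possible rating and ties counted as one half.

```python
from typing import List, Sequence

NEG_INF = float("-inf")   # rating assigned to an unmarked cancer or normal image


def _kernel(x: float, y: float) -> float:
    """1 if x outranks y, 0.5 for a tie, 0 otherwise."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def wilcoxon_auc(cancer_scores: Sequence[float],
                 normal_scores: Sequence[float]) -> float:
    """Probability that a cancer case is rated above a normal case (ties count 0.5)."""
    total = sum(_kernel(c, n) for c in cancer_scores for n in normal_scores)
    return total / (len(cancer_scores) * len(normal_scores))


def jafroc_style_fom(lesion_ratings: Sequence[float],
                     normal_case_marks: Sequence[List[float]]) -> float:
    """lesion_ratings: one rating per cancer, NEG_INF if it was not marked.
    normal_case_marks: for each normal case, ratings of all (false) marks."""
    highest_false = [max(marks, default=NEG_INF) for marks in normal_case_marks]
    total = sum(_kernel(l, f) for l in lesion_ratings for f in highest_false)
    return total / (len(lesion_ratings) * len(highest_false))
```

Because a cancer that was called back at the wrong location contributes nothing to the FOM, the FOM penalises location errors that a case-level AUC cannot see.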
Results
Sample
Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well matched, with a non-significant mean difference of 31.0 days in age between abnormal and normal cases (t test = 1.47, df = 50, p = 0.15). Table 1 lists the types and sizes of cancers in the test set.
Identification of cases for further imaging
Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60–58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47–67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65–75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU.
Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases)
                                      True positives   True negatives   False positives   False negatives
Reader #a                             M       M+A      M       M+A      M       M+A       M       M+A
1                                     28      45       32      21       27      30        15      6
2                                     28      45       25      21       33      30        16      6
3                                     25      44       30      20       32      31        15      7
4                                     26      43       20      28       43      23        13      8
5                                     26      43       32      30       28      21        16      8
6                                     32      43       20      37       43      14        7       8
7                                     26      41       25      27       33      24        18      10
8                                     23      40       35      31       17      20        27      11
9                                     16      40       43      25       21      26        22      11
10                                    27      39       34      36       27      15        14      12
11                                    26      37       35      41       24      10        17      14
12                                    22      37       37      34       20      17        23      14
Mean # of cases                       25.4    41.4     30.7    29.3     29.0    21.8      16.9    9.6
% of 51 cases                         49.8%   81.2%    60.2%   57.5%    56.9%   42.7%     33.1%   18.8%
Mean # of added cases                 16.0             −1.4             −7.2              −7.3
Mean % of 51 cases added              31.4%            −2.7%            −14.1%            −14.3%
% improvement compared with M alone   63%              −4%              −25%              −43%
M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU)
a Readers are listed from best to worst performance based on sensitivity with M+A
Individual success varied from 11 to 24 more cancer cases detected by AWBU. As a percentage of the cancers detected with mammography the range in improvement was 42–150%.
Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best performing mammography reader the number of cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography. For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing his overall cancer detection rate near the average for the group. Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. By adding AWBU, the detection of these small cancers was increased to 65%.
Table 3 Reader performance with 45 invasive cases
                                              ≤1 cm         >1 to ≤2 cm   >2 cm         Total
                                              #      %      #      %      #      %      #      %
# of cancers                                  17     100    22     100    6      100    45     100
Mean cancers by mammography                   4.4    26     13.5   61     3.0    50     20.9   46
Mean additional cancers by AWBU               6.7    39     6.6    30     2.0    33     15.3   34
Mean total cases detected                     11.1   65     20.1   91     5.0    83     36.2   80
% improvement compared to mammography alone   151%          49%           67%           73%
For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram; otherwise the diameter is that reported by pathological diagnosis.
Accuracy
The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader.
The BIRADS and likelihood ROC curves for mammography and mammography plus AWBU in both cases almost superimpose when confidence in malignancy by mammography is high, but when confidence in malignancy by mammography is low, as in the lower portions of the graphs, the curves in both cases diverge significantly. In both cases the mammography plus AWBU curve approaches the y-axis, indicating better cancer recognition.
Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)
Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2.
Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)
Similar to ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU across all readers, compared with mammography alone, using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices.
Confidence in identification of cases for further imaging
Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging, i.e., TP readings, using mammography plus AWBU compared with mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher.
Interpretation times
Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58), varying from 5:54 to 12:51. The difference in review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96).
Discussion
Significant improvement in identification of asymptomatic cancers occurred for all readers in this study. This is shown by a 63% increase in callbacks of cancer cases with only a 4% decrease in correct identification of the true negative cases. The confidence of the diagnoses of the 102 cases with predictive BIRADS and DMIST likelihood scales was confirmed by using AUC and FOM methodology. With a short training period, experienced radiologists using 2D AWBU significantly improve their ability to diagnose cancer in dense-breasted women.
This type of AWBU is similar in appearance to real-time ultrasound images. The slower transducer speed enforced by the AWBU decreases inter-image distance, allowing the reader more time to identify small masses. At a review speed of 10 images per second the observer has 0.5 s to identify a 5-mm mass. A high-resolution computer screen, along with a post-processing technique to expand the grayscale at the black end of the spectrum, results in visually sharper margins and more contrast of masses against the background tissue. These factors are designed to make recognition of invasive cancers easier and more reliable. This automated process for breast ultrasound eliminates operator variability, provides greater consistency, and ensures reproducibility of quality images. Study radiologists increased discovery of T1a and T1b invasive cancers by 150% over mammography alone (Table 3).
The average review time per AWBU study was about 11 min shorter than the 19 min for radiologists in the ACRIN 6666 trial of handheld screening ultrasound [8]. As half of our test set subjects had cancers, it would be expected that the average review time we observed for AWBU would be significantly longer than in a typical screening population with mostly normal studies.
Our study had a number of inherent weaknesses. Although the test set was confidential, the readers probably quickly realized that it was enriched.
They may have been extraordinarily vigilant, resulting in increases in both TPs and FPs. A false increase in TPs would occur if not all of the correctly identified cancers were subsequently confirmed by biopsy. Also, analysis was performed on a case basis in the three patients in whom cancers were present in both breasts; it was assumed that if one of the cancers was identified, the cancer in the other breast would be found by the subsequent workup. This assumption might have falsely raised the TPs and reduced the FNs. In addition, we did not have a comparison with handheld ultrasound.
Any of the following factors could have decreased the readers’ accuracy with AWBU (decreased true positives and negatives, and increased false positives and negatives) compared with a normal screening situation:
Fatigue: Readers reviewed an average of 34 AWBUs daily.
Inexperience with ultrasound screening: Some readers do not perform screening ultrasound.
Limited experience with AWBU: This was the first exposure to AWBU for 11 of the readers.
Unfamiliarity with some ultrasound formats: Images from many different manufacturers were used.
In spite of these hindrances, our observations clearly show that radiologists improve detection of cancers, especially small invasive ones, by adding AWBU to mammography findings.
Conclusion
This article demonstrates that experienced breast radiologists can learn to interpret 2D AWBU quickly. Radiologists will significantly improve their cancer detection rates in dense-breasted women by adding AWBU to mammography. This procedure has the potential for both standardizing the performance of whole-breast ultrasound and shortening the time required for radiologists.
sec	Materials and methods Imaging studies Mammograms Standard cranio-caudal (CC) and medio-lateral oblique (MLO) views of each breast were available for all cases. If implants were present, displacement views were included. Original analog films (66 cases) or prints of digital films (36 cases) were used for review. All cases used in the study provided informed consent, and the protocol was approved by the Institutional Review Boards at each hospital, or The Western Institutional Review Board [9]. AWBUs Automated whole-breast ultrasound (AWBU) is a computer-based system for performing, recording, and reading whole-breast ultrasound examinations similar in appearance to 2D freehand imaging (SonoCine, Reno, NV). Images were collected with 7- to 12-MHz multi-frequency transducers. The transducer is attached to a computer-guided mechanical arm that acquires images in CC rows overlapping 7 to 10 mm insuring complete coverage of both breasts. Images are collected approximately 0.8 mm apart. The AWBU software creates a ciné loop for review of approximately 3,000 images, simulating real-time imaging. The Windows®-based reading station uses a high-definition 1,600 × 1,200 monitor and special software to increase cancers’ conspicuity. The AWBU procedure was described more fully in a previous publication [9]. Readers Twelve board-certified breast radiologists who use breast ultrasound in their practices were recruited as readers for the trial. Remuneration for 3.5 days was at the prevailing US rate. Eleven readers were from the USA and one from Great Britain. Eleven had no experience with AWBU. One had reviewed limited AWBUs 8 years earlier during the developmental phase of the technology. No reader had foreknowledge of the positivity rate of the test set. Each reader had a 4-h tutorial with one author (KK) explaining the AWBU reading station operation. The readers reviewed and discussed approximately 12 AWBUs with known cancers, not part of the test set. 
They were not in the test set because either palpable findings were present or there were no concurrent mammograms. Nothing concerning the study was discussed, other than the use of the data form (Appendix A) and the number of cases to be reviewed. Procedure A set of 51 malignant cases (3 cases with bilateral cancers), including invasive and in situ cancer were collected for the trial (Table 1). Screening mammography and AWBU were performed within 2 months of each other. No cancers were associated with prospective palpable findings or symptoms suggestive of cancer. The mammograms were heterogeneously dense or extremely dense breast tissue (BIRADS 3 or 4) on the original reports. All imaging was performed from 2003 to 2008. The data set included all cases meeting the above criteria in the AWBU archives. Twelve cancers were included that were not prospectively reported on either imaging technique, but are visible in retrospect. Four of these became palpable within 1 year, three in more than 1 year; five were discovered in a subsequent screening round, three by AWBU only, and two by both AWBU and mammography. Table 1 Pathological diagnosis of 51 positive cases (54 cancers) ≤1 cm >1 to ≤2 cm >2 cm Total DCIS 2 0 4 6 IDC 17 19 5 41 ILC 3 2 1 6 Mixed IDC and ILC 0 1 0 1 Total 22 22 10 54 DCIS ductal carcinoma in situ, IDC invasive ductal carcinoma, ILC invasive lobular carcinoma Fifty-one normal cases performed from 2003 to 2008 were matched with each of the positive cases for the following factors: Facility Digital or analog mammogram Ultrasound machine model American breast cup size (A–DD) ACR BIRADS breast density Implant (saline or silicone) and location (pre- or retropectoral) Breast cancer history Age The normal case matching factors 1 to 7 closest to the age of the positive case was matched as the normal partner case. The mean difference in age between the positive case and its matched normal was 31 days. 
Testing occurred on a subsequent date at each reader’s own site with only the reader and a research assistant (monitor) present. The same monitor was present for all readers. She had no knowledge of the test set makeup, had no mammography or ultrasound training, reviewed the test data forms in real-time for completeness, and transferred the data to the study database. At each test site 102 mammograms were placed on a film alternator in random order, generated once, and used for all readers. Excluding breaks, the test subject’s time for review was recorded. The upper half of a data form (Appendix A) was completed for each case, checked by the monitor, and entered into the database. Four questions were asked: Would you request further evaluation based on this mammogram, or recommend routine screening? Where is/are the most suspicious (up to 3) lesions, identifying their location by breast and clock face position? What would be your prediction of the final ACR BIRADS after any needed diagnostic workup was completed? What is the reader’s confidence level that the woman has or does not have cancer (DMIST likelihood scale)? The American College of Radiology Breast Imaging Reporting and Data System (BIRADS) is a seven-point scale (0 = incomplete, needs additional assessment; 1 = normal; 2 = benign; 3 = probably benign; 4a = possible malignancy; 4b = probable malignancy, or 5 = highly suggestive of malignancy) designed to categorize the results of mammography and other imaging studies [3, 11]. Scores from 1 to 5 were allowed. Similar to the DMIST [12], readers were asked to predict a BIRADS score before any diagnostic workup. The DMIST likelihood rating is a seven-point scale to express the confidence of the diagnosis, and ranges from definitely not cancer to definitely cancer [3, 11, 12]. A correct location response was recorded for an hour position marked within the half of the breast centered at the middle of the cancer. 
A true positive (TP) was recorded for mammography for any malignant case if ‘callback’ was marked for mammography and any correct tumor location was identified. A TP was recorded for mammography plus AWBU if ‘callback’ was marked on either or both halves of the form in the malignant cases, with at least one correct location identified. Thus, a correctly identified TP found with mammography would remain TP even were it not identified again on AWBU. AWBU findings could change the outcome to TP if a cancer was correctly identified with AWBU , but missed with mammography. We evaluated readings on a per-case (i.e., per-patient) basis rather than a per-score basis because screening serves as a “go no-go” gatekeeper for subsequent workup [13]. A true negative (TN) was recorded for mammography for any normal case if ‘callback’ was not marked for mammography. A TN was recorded for mammography plus AWBU for any normal case if ‘callback’ was not marked on the second half of the form. This allowed the reader to reverse a callback for an asymmetric density seen mammographically but cleared by the AWBU as no suspicion. To validate TN cases, all cases were followed for at least 1 year or more. A false positive (FP) was recorded for mammography in two situations: Callback was marked for mammography in a normal case. Callback was marked for mammography in a cancer case, but none of the marked locations corresponded to the cancer. An FP was recorded for mammography plus AWBU in the same two situations as above when callback was marked for AWBU. A false negative (FN) was recorded for mammography when callback was not marked in a cancer case in the mammography portion of the form. Similarly, an FN was recorded for mammography plus AWBU when callback was not marked in a cancer case in either portion of the form. The 102 ABWUs were reviewed by readers on a review station brought by the research assistant acting as a monitor. 
Readers worked approximately 8 h daily for 3 days, with breaks at the readers’ choosing. The readers were given the corresponding mammograms with each AWBU and completed the second half of the data sheet with the knowledge from the mammogram-only evaluation available. The same questions were answered for AWBU, and the reading time of each AWBU was recorded. Statistical analysis Analyses were conducted in a multi-reader multi-case (MRMC) framework where each reader screened all cases and each case contained both screening techniques. The MRMC design efficiently reduces the number of readers and cases needed to detect improvements across techniques [14]. Analyses appropriate for an MRMC design were chosen both to model correlations between readings on the same case across readers correctly and to estimate standard errors correctly. Unless specified otherwise, analyses were conducted in SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA). We present F statistics, shown as F(numerator degrees of freedom, denominator degrees of freedom), and p values for comparisons between mammography plus AWBU and mammography alone. Cases identified for further imaging were assessed by four binary measures: sensitivity = number of TP/number of cancer cases; specificity = number of TN/number of non-cancer cases; positive predictive value (PPV) = number of TP/(number of TP + FP cases); and negative predictive value (NPV) = number of TN/(number of FN + TN cases). Random-effect logistic regression models were used to test whether each binary measure differed significantly between mammography plus AWBU and mammography alone. To account for the MRMC framework, we included random effects for readers and cases similar to the DBM model [15]. 
Accuracy was assessed through BIRADS ratings and DMIST likelihood scores, comparing two commonly used indicators of accuracy between mammography plus AWBU versus mammography alone: areas under the curve (AUC) and figures of merit (FOM). The FOM incorporates information from each reader on the region of suspected malignancy, as well as their confidence level in the finding, incorporated in an AUC. Because it includes both confidence level and location accuracy, the FOM is more powerful than AUC in detecting differences between techniques. We include both analyses, as described below: Areas under the curve (AUC) were estimated in DBM MRMC 2.1 [15] (available from http://perception.radiology.uiowa.edu) using the trapezoidal/Wilcoxon method. Readers and patients were treated as random factors. We also present reader-averaged receiver operating characteristic (ROC) curves; average values were calculated from separate ROC analyses conducted on each reader in the PROC LOGISTIC procedure. Figures of merit (FOM) were estimated by using jackknife alternative free-response receiver operating characteristic methodology as implemented in JAFROC Version 1.0 [16] (available from http://www.devchakraborty.com). The FOM is defined as the probability that a cancer on an abnormal image is scored higher than a falsely marked location on a normal image and is analogous to the ROC curve; a higher FOM indicates improvement in reader performance. Confidence in identification of cases for further imaging We used linear regression, comparing BIRADS ratings and DMIST likelihood scores across the two screening techniques among TP cases; mean ratings and scores are estimated by the regression for each screening technique. To account for the MRMC framework, we included random effects similar to the DBM model [15]; the model included a fixed effect for technique, classified as mammography plus AWBU or mammography alone, and random effects for readers and cases.
title	Materials and methods
title	Imaging studies
title	Mammograms
p	Standard cranio-caudal (CC) and medio-lateral oblique (MLO) views of each breast were available for all cases. If implants were present, displacement views were included. Original analog films (66 cases) or prints of digital films (36 cases) were used for review. All patients whose cases were used in the study provided informed consent, and the protocol was approved by the Institutional Review Board at each hospital or by the Western Institutional Review Board [9].
title	AWBUs
p	Automated whole-breast ultrasound (AWBU) is a computer-based system for performing, recording, and reading whole-breast ultrasound examinations similar in appearance to 2D freehand imaging (SonoCine, Reno, NV). Images were collected with 7- to 12-MHz multi-frequency transducers. The transducer is attached to a computer-guided mechanical arm that acquires images in CC rows that overlap by 7 to 10 mm, ensuring complete coverage of both breasts. Images are collected approximately 0.8 mm apart. The AWBU software creates a ciné loop of approximately 3,000 images for review, simulating real-time imaging. The Windows®-based reading station uses a high-definition 1,600 × 1,200 monitor and special software to increase the conspicuity of cancers. The AWBU procedure was described more fully in a previous publication [9].
title	Readers
p	Twelve board-certified breast radiologists who use breast ultrasound in their practices were recruited as readers for the trial. Remuneration for 3.5 days was at the prevailing US rate. Eleven readers were from the USA and one from Great Britain. Eleven had no experience with AWBU. One had reviewed limited AWBUs 8 years earlier during the developmental phase of the technology. No reader had foreknowledge of the positivity rate of the test set.
p	Each reader had a 4-h tutorial with one author (KK) explaining the AWBU reading station operation. The readers reviewed and discussed approximately 12 AWBUs with known cancers, not part of the test set. They were not in the test set because either palpable findings were present or there were no concurrent mammograms. Nothing concerning the study was discussed, other than the use of the data form (Appendix A) and the number of cases to be reviewed.
title	Procedure
p	A set of 51 malignant cases (3 cases with bilateral cancers), including invasive and in situ cancers, was collected for the trial (Table 1). Screening mammography and AWBU were performed within 2 months of each other. No cancers were associated with prospective palpable findings or symptoms suggestive of cancer. The mammograms showed heterogeneously dense or extremely dense breast tissue (BIRADS 3 or 4) on the original reports. All imaging was performed from 2003 to 2008. The data set included all cases meeting the above criteria in the AWBU archives. Twelve cancers were included that were not prospectively reported on either imaging technique, but are visible in retrospect. Four of these became palpable within 1 year, three in more than 1 year; five were discovered in a subsequent screening round, three by AWBU only, and two by both AWBU and mammography.
label	Table 1
caption	Pathological diagnosis of 51 positive cases (54 cancers)
tr	Diagnosis	≤1 cm	>1 to ≤2 cm	>2 cm	Total
tr	DCIS	2	0	4	6
tr	IDC	17	19	5	41
tr	ILC	3	2	1	6
tr	Mixed IDC and ILC	0	1	0	1
tr	Total	22	22	10	54
table-wrap-foot	DCIS ductal carcinoma in situ, IDC invasive ductal carcinoma, ILC invasive lobular carcinoma
p	Fifty-one normal cases imaged from 2003 to 2008 were matched with the positive cases, one normal case per positive case, on the following factors:
p	1. Facility
p	2. Digital or analog mammogram
p	3. Ultrasound machine model
p	4. American breast cup size (A–DD)
p	5. ACR BIRADS breast density
p	6. Implant (saline or silicone) and location (pre- or retropectoral)
p	7. Breast cancer history
p	8. Age
p	Among the normal cases that matched a positive case on factors 1 to 7, the one closest in age to the positive case was selected as its normal partner case. The mean difference in age between the positive case and its matched normal was 31 days.
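The matching rule described here (exact agreement on factors 1 to 7, then the nearest age) can be sketched as follows. This is only an illustration under stated assumptions: the `Case` record, its field names, and `pick_matched_normal` are hypothetical and are not taken from the study's software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Case:
    """Hypothetical case record; field names are illustrative assumptions."""
    case_id: str
    facility: str
    mammogram_type: str    # "digital" or "analog"
    ultrasound_model: str
    cup_size: str          # "A" to "DD"
    breast_density: int    # ACR BIRADS breast density category
    implant: str           # e.g. "none", "saline-retropectoral"
    cancer_history: bool
    age_days: int          # age at imaging, in days


def matching_key(case: Case) -> tuple:
    """Factors 1 to 7, which must agree exactly."""
    return (case.facility, case.mammogram_type, case.ultrasound_model,
            case.cup_size, case.breast_density, case.implant,
            case.cancer_history)


def pick_matched_normal(positive: Case, normals: List[Case]) -> Optional[Case]:
    """Return the normal case equal on factors 1 to 7 and closest in age."""
    candidates = [n for n in normals
                  if matching_key(n) == matching_key(positive)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: abs(n.age_days - positive.age_days))
```

Age is deliberately kept out of the exact-match key, since it is the only factor matched by nearest value rather than by equality.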
p	Testing occurred on a subsequent date at each reader’s own site with only the reader and a research assistant (monitor) present. The same monitor was present for all readers. She had no knowledge of the test set makeup, had no mammography or ultrasound training, reviewed the test data forms in real-time for completeness, and transferred the data to the study database.
p	At each test site, the 102 mammograms were placed on a film alternator in a random order that was generated once and used for all readers. Each reader's review time, excluding breaks, was recorded. The upper half of a data form (Appendix A) was completed for each case, checked by the monitor, and entered into the database.
p	Four questions were asked:
p	Would you request further evaluation based on this mammogram, or recommend routine screening?
p	Where is/are the most suspicious (up to 3) lesions, identifying their location by breast and clock face position?
p	What would be your prediction of the final ACR BIRADS after any needed diagnostic workup was completed?
p	What is the reader’s confidence level that the woman has or does not have cancer (DMIST likelihood scale)?
p	The American College of Radiology Breast Imaging Reporting and Data System (BIRADS) is a seven-point scale (0 = incomplete, needs additional assessment; 1 = normal; 2 = benign; 3 = probably benign; 4a = possible malignancy; 4b = probable malignancy, or 5 = highly suggestive of malignancy) designed to categorize the results of mammography and other imaging studies [3, 11]. Scores from 1 to 5 were allowed. Similar to the DMIST [12], readers were asked to predict a BIRADS score before any diagnostic workup.
p	The DMIST likelihood rating is a seven-point scale to express the confidence of the diagnosis, and ranges from definitely not cancer to definitely cancer [3, 11, 12].
p	A correct location response was recorded for an hour position marked within the half of the breast centered at the middle of the cancer.
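As an illustration of this location rule, the sketch below treats "the half of the breast centered at the middle of the cancer" as a window of three clock hours on either side of the cancer's clock position, boundary included. That reading of the boundary, and the function names, are assumptions made for this example.

```python
def clock_distance(hour_a: int, hour_b: int) -> int:
    """Shortest distance in hours between two clock-face positions (1-12)."""
    diff = abs(hour_a - hour_b) % 12
    return min(diff, 12 - diff)


def is_correct_location(marked_breast: str, marked_hour: int,
                        cancer_breast: str, cancer_hour: int,
                        max_hours: int = 3) -> bool:
    """True if the mark is in the same breast and within max_hours of the cancer.

    max_hours=3 encodes the assumed half-breast window (6 of 12 clock hours).
    """
    return (marked_breast == cancer_breast
            and clock_distance(marked_hour, cancer_hour) <= max_hours)
```

For example, a mark at 11 o'clock counts as correct for a cancer at 2 o'clock in the same breast, because the shortest path around the clock face is three hours.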
p	A true positive (TP) was recorded for mammography for any malignant case if ‘callback’ was marked for mammography and any correct tumor location was identified. A TP was recorded for mammography plus AWBU if ‘callback’ was marked on either or both halves of the form in the malignant cases, with at least one correct location identified. Thus, a correctly identified TP found with mammography would remain TP even were it not identified again on AWBU. AWBU findings could change the outcome to TP if a cancer was correctly identified with AWBU but missed with mammography. We evaluated readings on a per-case (i.e., per-patient) basis rather than a per-score basis because screening serves as a “go no-go” gatekeeper for subsequent workup [13].
p	A true negative (TN) was recorded for mammography for any normal case if ‘callback’ was not marked for mammography. A TN was recorded for mammography plus AWBU for any normal case if ‘callback’ was not marked on the second half of the form. This allowed the reader to reverse a callback for an asymmetric density seen mammographically but cleared by the AWBU as not suspicious. To validate TN cases, all cases were followed for at least 1 year.
p	A false positive (FP) was recorded for mammography in two situations:
p	Callback was marked for mammography in a normal case.
p	Callback was marked for mammography in a cancer case, but none of the marked locations corresponded to the cancer.
p	An FP was recorded for mammography plus AWBU in the same two situations as above when callback was marked for AWBU. A false negative (FN) was recorded for mammography when callback was not marked in a cancer case in the mammography portion of the form. Similarly, an FN was recorded for mammography plus AWBU when callback was not marked in a cancer case in either portion of the form.
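The per-case scoring rules for TP, TN, FP, and FN can be summarized in a short sketch. The function and argument names are hypothetical. One branch is an assumption rather than something the text states: a cancer case called back on mammography with a wrong location and then not called back on AWBU is scored here as FP.

```python
def classify_mammography(is_cancer: bool, callback: bool,
                         location_correct: bool) -> str:
    """Score one reading of the mammography-only (upper) half of the form."""
    if is_cancer:
        if not callback:
            return "FN"
        # A callback in a cancer case counts only if a marked location is correct.
        return "TP" if location_correct else "FP"
    return "FP" if callback else "TN"


def classify_combined(is_cancer: bool,
                      mammo_callback: bool, mammo_location_correct: bool,
                      awbu_callback: bool, awbu_location_correct: bool) -> str:
    """Score one reading for mammography plus AWBU (both halves of the form)."""
    if is_cancer:
        found = ((mammo_callback and mammo_location_correct)
                 or (awbu_callback and awbu_location_correct))
        if found:
            return "TP"   # a TP on either half stays a TP
        if not mammo_callback and not awbu_callback:
            return "FN"   # no callback on either half
        return "FP"       # callback without a correct location (partly assumed)
    # Normal case: only the second half decides, so AWBU can reverse a callback.
    return "FP" if awbu_callback else "TN"
```

Keeping the normal-case decision on the second half alone mirrors the statement that a mammographic callback could be reversed once AWBU cleared the finding.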
p	The 102 AWBUs were reviewed by readers on a review station brought by the research assistant acting as a monitor. Readers worked approximately 8 h daily for 3 days, with breaks at the readers’ choosing. The readers were given the corresponding mammograms with each AWBU and completed the second half of the data sheet with the knowledge from the mammogram-only evaluation available. The same questions were answered for AWBU, and the reading time of each AWBU was recorded.
title	Statistical analysis
p	Analyses were conducted in a multi-reader multi-case (MRMC) framework where each reader screened all cases and each case contained both screening techniques. The MRMC design efficiently reduces the number of readers and cases needed to detect improvements across techniques [14]. Analyses appropriate for an MRMC design were chosen both to model correlations between readings on the same case across readers correctly and to estimate standard errors correctly. Unless specified otherwise, analyses were conducted in SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA). We present F statistics, shown as F(numerator degrees of freedom, denominator degrees of freedom), and p values for comparisons between mammography plus AWBU and mammography alone.
p	Cases identified for further imaging were assessed by four binary measures: sensitivity = number of TP/number of cancer cases; specificity = number of TN/number of non-cancer cases; positive predictive value (PPV) = number of TP/(number of TP + FP cases); and negative predictive value (NPV) = number of TN/(number of FN + TN cases). Random-effect logistic regression models were used to test whether each binary measure differed significantly between mammography plus AWBU and mammography alone. To account for the MRMC framework, we included random effects for readers and cases similar to the DBM model [15].
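A minimal sketch of the four binary measures, using the conventional definitions PPV = TP/(TP + FP) and NPV = TN/(TN + FN). The function name is hypothetical. The example counts are the mammography-alone counts in the first reader row of Table 2. Sensitivity is divided by the number of cancer cases rather than by TP + FN, because a callback with a wrong location in a cancer case is scored as FP.

```python
def screening_metrics(tp: int, tn: int, fp: int, fn: int,
                      n_cancer: int, n_normal: int) -> dict:
    """Per-reader sensitivity, specificity, PPV, and NPV from case counts."""
    return {
        "sensitivity": tp / n_cancer,   # TP / number of cancer cases
        "specificity": tn / n_normal,   # TN / number of non-cancer cases
        "ppv": tp / (tp + fp),          # share of callbacks that were correct
        "npv": tn / (tn + fn),          # share of non-callbacks that were normal
    }


# Mammography-alone counts from the first reader row of Table 2.
reader1_mammo = screening_metrics(tp=28, tn=32, fp=27, fn=15,
                                  n_cancer=51, n_normal=51)
for name, value in reader1_mammo.items():
    print(f"{name}: {value:.3f}")
```

These are descriptive per-reader values only; the significance tests reported in the paper come from the random-effect logistic regression models, which this sketch does not reproduce.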
p	Accuracy was assessed through BIRADS ratings and DMIST likelihood scores by comparing two commonly used indicators of accuracy for mammography plus AWBU versus mammography alone: areas under the curve (AUC) and figures of merit (FOM). The FOM combines each reader's information on the region of suspected malignancy with the reader's confidence level in the finding into an AUC-like summary. Because it includes both confidence level and location accuracy, the FOM is statistically more powerful than the AUC in detecting differences between techniques. We include both analyses, as described below:
p	Areas under the curve (AUC) were estimated in DBM MRMC 2.1 [15] (available from http://perception.radiology.uiowa.edu) using the trapezoidal/Wilcoxon method. Readers and patients were treated as random factors. We also present reader-averaged receiver operating characteristic (ROC) curves; average values were calculated from separate ROC analyses conducted for each reader with the SAS LOGISTIC procedure (PROC LOGISTIC).
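For a single reader, the trapezoidal/Wilcoxon AUC equals the probability that a randomly chosen cancer case receives a higher rating than a randomly chosen normal case, with ties counted as one half. The sketch below computes only that point estimate. It is not the DBM MRMC software and does not reproduce its treatment of readers and cases as random factors. The ratings are made-up values for illustration.

```python
from typing import Sequence


def wilcoxon_auc(cancer_scores: Sequence[float],
                 normal_scores: Sequence[float]) -> float:
    """Empirical (trapezoidal) AUC via the Wilcoxon/Mann-Whitney statistic."""
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0   # cancer case rated higher than normal case
            elif c == n:
                wins += 0.5   # ties count as one half
    return wins / (len(cancer_scores) * len(normal_scores))


# Hypothetical ratings for one reader (not study data).
auc = wilcoxon_auc(cancer_scores=[5, 4, 4, 2], normal_scores=[1, 2, 3, 4])
print(auc)  # 0.78125
```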
p	Figures of merit (FOM) were estimated by using jackknife alternative free-response receiver operating characteristic methodology as implemented in JAFROC Version 1.0 [16] (available from http://www.devchakraborty.com). The FOM is defined as the probability that a cancer on an abnormal image is scored higher than a falsely marked location on a normal image and is analogous to the ROC curve; a higher FOM indicates improvement in reader performance.
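The FOM definition above can be illustrated with a simplified sketch. It compares each cancer's rating with the highest-rated false mark on each normal image and counts ties as one half. Two conventions are assumptions of this example rather than statements from the paper: an unmarked cancer and a normal image with no marks are both given a rating of negative infinity. The code is not the JAFROC program and omits its jackknife significance testing. The ratings are made-up values.

```python
from typing import Optional, Sequence

UNMARKED = float("-inf")  # assumed rating for an unmarked cancer or normal image


def jafroc_style_fom(lesion_ratings: Sequence[Optional[float]],
                     normal_max_ratings: Sequence[Optional[float]]) -> float:
    """Probability that a cancer outscores the top false mark on a normal image.

    lesion_ratings: one rating per cancer, None if the reader did not mark it.
    normal_max_ratings: highest false-mark rating per normal image, None if
    the reader marked nothing on that image.
    """
    lesions = [UNMARKED if r is None else r for r in lesion_ratings]
    normals = [UNMARKED if r is None else r for r in normal_max_ratings]
    total = 0.0
    for lesion in lesions:
        for normal in normals:
            if lesion > normal:
                total += 1.0
            elif lesion == normal:
                total += 0.5   # ties, including unmarked versus unmarked
    return total / (len(lesions) * len(normals))


# Hypothetical ratings for one reader (not study data).
fom = jafroc_style_fom(lesion_ratings=[5, 3, None],
                       normal_max_ratings=[2, None, 4])
print(round(fom, 3))  # 0.611
```

Unlike the AUC, a confident callback at the wrong location in a cancer case earns no credit here, because only marks that hit the cancer enter `lesion_ratings`.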
title	Confidence in identification of cases for further imaging
p	We used linear regression to compare BIRADS ratings and DMIST likelihood scores across the two screening techniques among TP cases; mean ratings and scores were estimated by the regression for each screening technique. To account for the MRMC framework, we included random effects similar to the DBM model [15]; the model included a fixed effect for technique, classified as mammography plus AWBU or mammography alone, and random effects for readers and cases.
sec	Results Sample Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well-matched with an insignificant mean difference of 31.0 days in age between abnormal and normal cases (t test = 1.47, df = 50, p = 0.15). Table 1 lists the types and size of cancers in the test set. Identification of cases for further imaging Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60–58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47–67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65–75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU. Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases) Reader #a True positives True negatives False positives False negatives M M+A M M+A M M+A M M+A 1 28 45 32 21 27 30 15 6 2 28 45 25 21 33 30 16 6 3 25 44 30 20 32 31 15 7 4 26 43 20 28 43 23 13 8 5 26 43 32 30 28 21 16 8 6 32 43 20 37 43 14 7 8 7 26 41 25 27 33 24 18 10 8 23 40 35 31 17 20 27 11 9 16 40 43 25 21 26 22 11 10 27 39 34 36 27 15 14 12 11 26 37 35 41 24 10 17 14 12 22 37 37 34 20 17 23 14 Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6 % of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8% Mean # of added cases 16.0 −1.4 −7.2 −7.3 Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3% % improvement compared with M alone 63% −4% −25% −43% M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU) aReader # presented by best to worst performance based on sensitivity on M+A Individual success varied from 11 to 24 more cancer cases detected by AWBU. As a percentage of the cancers detected with mammography the range in improvement was 42–150%. 
Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best-performing mammography reader, the number of cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography. For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing that reader’s overall cancer detection rate near the average for the group. Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. By adding AWBU, the detection of these small cancers was increased to 65%. Table 3 Reader performance with 45 invasive cases ≤1 cm >1 to ≤2 cm >2 cm Total # % # % # % # % # of cancers 17 100 22 100 6 100 45 100 Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46 Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34 Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80 % improvement compared to mammography alone 151% 49% 67% 73% For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis Accuracy The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader.
For both BIRADS and likelihood scores, the ROC curves for mammography and mammography plus AWBU almost superimpose when confidence in malignancy by mammography is high, but diverge markedly when confidence in malignancy by mammography is low, as in the lower portions of the graphs. In both cases the mammography plus AWBU curve lies closer to the y-axis, indicating better cancer recognition. Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line) Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2. Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles) Similar to ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU across all readers, compared with mammography alone using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices. Confidence in identification of cases for further imaging Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging, i.e., TP reading, using mammography plus AWBU compared with mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher. Interpretation times Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58), ranging from 5:54 to 12:51. Review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96).
title	Results
sec	Sample Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well matched, with a non-significant mean difference of 31.0 days in age between abnormal and normal cases (t = 1.47, df = 50, p = 0.15). Table 1 lists the types and sizes of cancers in the test set.
title	Sample
p	Subjects averaged 59.4 years of age (SD = 10.2; range = 41–83). The 51 cancer patients and 51 normal subjects were well matched, with a non-significant mean difference of 31.0 days in age between abnormal and normal cases (t = 1.47, df = 50, p = 0.15). Table 1 lists the types and sizes of cancers in the test set.
sec	Identification of cases for further imaging Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60% vs. 58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47% vs. 67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65% vs. 75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU. Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases) Reader #a True positives True negatives False positives False negatives M M+A M M+A M M+A M M+A 1 28 45 32 21 27 30 15 6 2 28 45 25 21 33 30 16 6 3 25 44 30 20 32 31 15 7 4 26 43 20 28 43 23 13 8 5 26 43 32 30 28 21 16 8 6 32 43 20 37 43 14 7 8 7 26 41 25 27 33 24 18 10 8 23 40 35 31 17 20 27 11 9 16 40 43 25 21 26 22 11 10 27 39 34 36 27 15 14 12 11 26 37 35 41 24 10 17 14 12 22 37 37 34 20 17 23 14 Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6 % of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8% Mean # of added cases 16.0 −1.4 −7.2 −7.3 Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3% % improvement compared with M alone 63% −4% −25% −43% M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU) aReader # presented by best to worst performance based on sensitivity on M+A The individual gain ranged from 11 to 24 additional cancer cases detected with AWBU. As a percentage of the cancers detected with mammography, the range in improvement was 42–150%. Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best-performing mammography reader, the number of cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography.
For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing that reader’s overall cancer detection rate near the average for the group. Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. By adding AWBU, the detection of these small cancers was increased to 65%. Table 3 Reader performance with 45 invasive cases ≤1 cm >1 to ≤2 cm >2 cm Total # % # % # % # % # of cancers 17 100 22 100 6 100 45 100 Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46 Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34 Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80 % improvement compared to mammography alone 151% 49% 67% 73% For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis
title	Identification of cases for further imaging
p	Table 2 details individual performance in the identification of cancer cases for further imaging. Mean sensitivity increased from 50% to 81%, an improvement of 63% in the number of cancer cases identified (25.4 vs. 41.4, F(1, 1,161) = 165.95, p < 0.001). Specificity (60% vs. 58%; 30.7 vs. 29.1, F(1, 1,161) = 1.11, p = 0.29), PPV (mean = 47% vs. 67%; F(1, 1,297) = 0.02, p = 0.89), and NPV (mean = 65% vs. 75%; F(1, 933) = 0.61, p = 0.44) did not change significantly with the addition of AWBU. Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases) Reader #a True positives True negatives False positives False negatives M M+A M M+A M M+A M M+A 1 28 45 32 21 27 30 15 6 2 28 45 25 21 33 30 16 6 3 25 44 30 20 32 31 15 7 4 26 43 20 28 43 23 13 8 5 26 43 32 30 28 21 16 8 6 32 43 20 37 43 14 7 8 7 26 41 25 27 33 24 18 10 8 23 40 35 31 17 20 27 11 9 16 40 43 25 21 26 22 11 10 27 39 34 36 27 15 14 12 11 26 37 35 41 24 10 17 14 12 22 37 37 34 20 17 23 14 Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6 % of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8% Mean # of added cases 16.0 −1.4 −7.2 −7.3 Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3% % improvement compared with M alone 63% −4% −25% −43% M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU) aReader # presented by best to worst performance based on sensitivity on M+A
table-wrap	Table 2 Reader performance categorized by imaging technique (n = 102, 51 positive cases) Reader #a True positives True negatives False positives False negatives M M+A M M+A M M+A M M+A 1 28 45 32 21 27 30 15 6 2 28 45 25 21 33 30 16 6 3 25 44 30 20 32 31 15 7 4 26 43 20 28 43 23 13 8 5 26 43 32 30 28 21 16 8 6 32 43 20 37 43 14 7 8 7 26 41 25 27 33 24 18 10 8 23 40 35 31 17 20 27 11 9 16 40 43 25 21 26 22 11 10 27 39 34 36 27 15 14 12 11 26 37 35 41 24 10 17 14 12 22 37 37 34 20 17 23 14 Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6 % of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8% Mean # of added cases 16.0 −1.4 −7.2 −7.3 Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3% % improvement compared with M alone 63% −4% −25% −43% M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU) aReader # presented by best to worst performance based on sensitivity on M+A
label	Table 2
caption	Reader performance categorized by imaging technique (n = 102, 51 positive cases)
p	Reader performance categorized by imaging technique (n = 102, 51 positive cases)
table	Reader #a True positives True negatives False positives False negatives M M+A M M+A M M+A M M+A 1 28 45 32 21 27 30 15 6 2 28 45 25 21 33 30 16 6 3 25 44 30 20 32 31 15 7 4 26 43 20 28 43 23 13 8 5 26 43 32 30 28 21 16 8 6 32 43 20 37 43 14 7 8 7 26 41 25 27 33 24 18 10 8 23 40 35 31 17 20 27 11 9 16 40 43 25 21 26 22 11 10 27 39 34 36 27 15 14 12 11 26 37 35 41 24 10 17 14 12 22 37 37 34 20 17 23 14 Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6 % of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8% Mean # of added cases 16.0 −1.4 −7.2 −7.3 Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3% % improvement compared with M alone 63% −4% −25% −43%
tr	Reader #a True positives True negatives False positives False negatives
th	Reader #a
th	True positives
th	True negatives
th	False positives
th	False negatives
tr	M M+A M M+A M M+A M M+A
th	M
th	M+A
th	M
th	M+A
th	M
th	M+A
th	M
th	M+A
tr	1 28 45 32 21 27 30 15 6
td	1
td	28
td	45
td	32
td	21
td	27
td	30
td	15
td	6
tr	2 28 45 25 21 33 30 16 6
td	2
td	28
td	45
td	25
td	21
td	33
td	30
td	16
td	6
tr	3 25 44 30 20 32 31 15 7
td	3
td	25
td	44
td	30
td	20
td	32
td	31
td	15
td	7
tr	4 26 43 20 28 43 23 13 8
td	4
td	26
td	43
td	20
td	28
td	43
td	23
td	13
td	8
tr	5 26 43 32 30 28 21 16 8
td	5
td	26
td	43
td	32
td	30
td	28
td	21
td	16
td	8
tr	6 32 43 20 37 43 14 7 8
td	6
td	32
td	43
td	20
td	37
td	43
td	14
td	7
td	8
tr	7 26 41 25 27 33 24 18 10
td	7
td	26
td	41
td	25
td	27
td	33
td	24
td	18
td	10
tr	8 23 40 35 31 17 20 27 11
td	8
td	23
td	40
td	35
td	31
td	17
td	20
td	27
td	11
tr	9 16 40 43 25 21 26 22 11
td	9
td	16
td	40
td	43
td	25
td	21
td	26
td	22
td	11
tr	10 27 39 34 36 27 15 14 12
td	10
td	27
td	39
td	34
td	36
td	27
td	15
td	14
td	12
tr	11 26 37 35 41 24 10 17 14
td	11
td	26
td	37
td	35
td	41
td	24
td	10
td	17
td	14
tr	12 22 37 37 34 20 17 23 14
td	12
td	22
td	37
td	37
td	34
td	20
td	17
td	23
td	14
tr	Mean # of cases 25.4 41.4 30.7 29.3 29.0 21.8 16.9 9.6
td	Mean # of cases
td	25.4
td	41.4
td	30.7
td	29.3
td	29.0
td	21.8
td	16.9
td	9.6
tr	% of 51 cases 49.8% 81.2% 60.2% 57.5% 56.9% 42.7% 33.1% 18.8%
td	% of 51 cases
td	49.8%
td	81.2%
td	60.2%
td	57.5%
td	56.9%
td	42.7%
td	33.1%
td	18.8%
tr	Mean # of added cases 16.0 −1.4 −7.2 −7.3
td	Mean # of added cases
td	16.0
td	−1.4
td	−7.2
td	−7.3
tr	Mean % of 51 cases added 31.4% −2.7% −14.1% −14.3%
td	Mean % of 51 cases added
td	31.4%
td	−2.7%
td	−14.1%
td	−14.3%
tr	% improvement compared with M alone 63% −4% −25% −43%
td	% improvement compared with M alone
td	63%
td	−4%
td	−25%
td	−43%
table-wrap-foot	M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU) aReader # presented by best to worst performance based on sensitivity on M+A
p	M mammography, M+A mammography plus automated whole-breast ultrasound (AWBU)
p	aReader # presented by best to worst performance based on sensitivity on M+A
p	The individual gain ranged from 11 to 24 additional cancer cases detected with AWBU. As a percentage of the cancers detected with mammography, the range in improvement was 42–150%. Not only did all readers find more cancers individually, but all found 16–29% more cancers than the best mammography reader did with mammography alone. For the best-performing mammography reader, the number of cancer detections added by AWBU was predictably lower, as more cancers had already been identified with mammography. For the poorest performer on mammography, the addition of AWBU resulted in a 150% improvement, bringing that reader’s overall cancer detection rate near the average for the group.
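p	The reader-averaged sensitivities and the 63% relative gain reported for Table 2 can be re-derived from its true-positive columns. The Python sketch below is illustrative only: the names TP_COUNTS, added_detections, and mean_sensitivity are hypothetical, and it works with simple averages of the tabulated counts rather than the mixed-model estimates behind the F tests.

```python
# True-positive counts per reader, transcribed from Table 2:
# reader number -> (mammography alone, mammography plus AWBU).
TP_COUNTS = {
    1: (28, 45), 2: (28, 45), 3: (25, 44), 4: (26, 43),
    5: (26, 43), 6: (32, 43), 7: (26, 41), 8: (23, 40),
    9: (16, 40), 10: (27, 39), 11: (26, 37), 12: (22, 37),
}
N_CANCER_CASES = 51  # cancer cases in the 102-case test set


def added_detections(counts):
    """Extra cancer cases each reader called back once AWBU was added."""
    return {reader: with_awbu - alone
            for reader, (alone, with_awbu) in counts.items()}


def mean_sensitivity(counts, column):
    """Reader-averaged sensitivity; column 0 = M alone, column 1 = M+A."""
    total_true_positives = sum(pair[column] for pair in counts.values())
    return total_true_positives / (len(counts) * N_CANCER_CASES)


added = added_detections(TP_COUNTS)
sens_m = mean_sensitivity(TP_COUNTS, 0)
sens_ma = mean_sensitivity(TP_COUNTS, 1)
relative_gain = (sens_ma - sens_m) / sens_m

print(f"added cases per reader: {min(added.values())} to {max(added.values())}")
print(f"mean sensitivity: {sens_m:.1%} -> {sens_ma:.1%}")
print(f"relative gain in detected cancers: {relative_gain:.0%}")
```

p	Because every reader saw the same 51 cancer cases, averaging the counts and averaging the per-reader sensitivities give the same result here.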
p	Table 3 shows the average reader performance by tumor size for the 45 image sets of patients with invasive cancer. The greatest percentage increase was for cancers 1 cm and under. This is due largely to the relatively poor performance at detecting these cancers with mammography, where only 26% of cases were correctly identified. By adding AWBU, the detection of these small cancers was increased to 65%. Table 3 Reader performance with 45 invasive cases ≤1 cm >1 to ≤2 cm >2 cm Total # % # % # % # % # of cancers 17 100 22 100 6 100 45 100 Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46 Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34 Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80 % improvement compared to mammography alone 151% 49% 67% 73% For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis
table-wrap	Table 3 Reader performance with 45 invasive cases ≤1 cm >1 to ≤2 cm >2 cm Total # % # % # % # % # of cancers 17 100 22 100 6 100 45 100 Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46 Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34 Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80 % improvement compared to mammography alone 151% 49% 67% 73% For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis
label	Table 3
caption	Reader performance with 45 invasive cases
p	Reader performance with 45 invasive cases
table	≤1 cm >1 to ≤2 cm >2 cm Total # % # % # % # % # of cancers 17 100 22 100 6 100 45 100 Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46 Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34 Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80 % improvement compared to mammography alone 151% 49% 67% 73%
tr	≤1 cm >1 to ≤2 cm >2 cm Total
th	≤1 cm
th	>1 to ≤2 cm
th	>2 cm
th	Total
tr	# % # % # % # %
th	#
th	%
th	#
th	%
th	#
th	%
th	#
th	%
tr	# of cancers 17 100 22 100 6 100 45 100
td	# of cancers
td	17
td	100
td	22
td	100
td	6
td	100
td	45
td	100
tr	Mean cancers by mammography 4.4 26 13.5 61 3.0 50 20.9 46
td	Mean cancers by mammography
td	4.4
td	26
td	13.5
td	61
td	3.0
td	50
td	20.9
td	46
tr	Mean additional cancers by AWBU 6.7 39 6.6 30 2.0 33 15.3 34
td	Mean additional cancers by AWBU
td	6.7
td	39
td	6.6
td	30
td	2.0
td	33
td	15.3
td	34
tr	Mean total cases detected 11.1 65 20.1 91 5.0 83 36.2 80
td	Mean total cases detected
td	11.1
td	65
td	20.1
td	91
td	5.0
td	83
td	36.2
td	80
tr	% improvement compared to mammography alone 151% 49% 67% 73%
td	% improvement compared to mammography alone
td	151%
td	49%
td	67%
td	73%
table-wrap-foot	For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis
p	For cases with more than one invasive tumor, the larger of the two was used. For interval cancers after imaging, size is the greatest diameter of the tumor seen retrospectively on the AWBU or mammogram, otherwise the diameter is that reported by pathological diagnosis
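p	The percentage rows of Table 3 follow from its count rows: detection rates divide by the number of cancers in each size group, and the improvement divides the cancers added by AWBU by those found with mammography alone. The Python sketch below illustrates this arithmetic; the names TABLE_3 and summarize are hypothetical. It starts from the rounded means printed in the table, so the ≤1 cm improvement comes out at about 152% rather than the published 151%, which was presumably computed from unrounded means.

```python
# Rows transcribed from Table 3:
# size group -> (number of cancers, mean found by mammography, mean added by AWBU).
TABLE_3 = {
    "<=1 cm": (17, 4.4, 6.7),
    ">1 to <=2 cm": (22, 13.5, 6.6),
    ">2 cm": (6, 3.0, 2.0),
    "Total": (45, 20.9, 15.3),
}


def summarize(n_cancers, by_mammography, added_by_awbu):
    """Percentages for one size group, all derived from mean counts."""
    combined = by_mammography + added_by_awbu
    return {
        "mammography_pct": 100 * by_mammography / n_cancers,
        "combined_pct": 100 * combined / n_cancers,
        "improvement_pct": 100 * added_by_awbu / by_mammography,
    }


SUMMARY = {size: summarize(*row) for size, row in TABLE_3.items()}

for size, stats in SUMMARY.items():
    print(f"{size}: {stats['mammography_pct']:.0f}% -> "
          f"{stats['combined_pct']:.0f}% detected, "
          f"improvement {stats['improvement_pct']:.0f}%")
```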
sec	Accuracy The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader. For both BIRADS and likelihood scores, the ROC curves for mammography and mammography plus AWBU almost superimpose when confidence in malignancy by mammography is high, but diverge markedly when confidence in malignancy by mammography is low, as in the lower portions of the graphs. In both cases the mammography plus AWBU curve lies closer to the y-axis, indicating better cancer recognition. Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line) Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2. Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles) Similar to ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU across all readers, compared with mammography alone using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices.
title	Accuracy
p	The ROC area was greater for mammography plus AWBU for both BIRADS (0.808 versus 0.701; F(1, 123) = 14.79, p < 0.001) and likelihood scores (0.810 versus 0.703; F(1, 85) = 17.88, p < 0.001) as estimated by multi-reader multi-case analyses. This is highlighted in Fig. 1 by ROC curves that are generated by averaging the results of separate ROC analyses for each reader. For both BIRADS and likelihood scores, the ROC curves for mammography and mammography plus AWBU almost superimpose when confidence in malignancy by mammography is high, but diverge markedly when confidence in malignancy by mammography is low, as in the lower portions of the graphs. In both cases the mammography plus AWBU curve lies closer to the y-axis, indicating better cancer recognition. Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)
figure	Fig. 1 Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)
label	Fig. 1
caption	Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)
p	Receiver operating characteristic curves averaged across 12 readers for mammography alone (circles and dashed line) and mammography plus AWBU (triangles and solid line)
p	Figure 2 shows the areas under the ROC curves for each reader and for the average of all readers as estimated by multi-reader multi-case analyses. These individual line graphs mirror the improvement in reader performance shown in Table 2. Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)
figure	Fig. 2 Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)
label	Fig. 2
caption	Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)
p	Changes in areas under the receiver operating characteristic curve(s) for each reader (hollow circles) and averaged across 12 readers (solid circles)
p	Similar to ROC areas, the figures of merit (FOM) were higher for mammography plus AWBU across all readers, compared with mammography alone using both the BIRADS scores (0.786 versus 0.613; F(1, 270) = 34.1, p < 0.001) and DMIST likelihood scores (0.791 versus 0.614; F(1, 238) = 37.9, p < 0.001) as accuracy indices.
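p	The ROC areas in this section were estimated with multi-reader multi-case methods. As a much simpler illustration of what an area under the ROC curve measures for ordinal ratings such as BIRADS, the Python sketch below computes the empirical (Mann-Whitney) AUC for a single reader: the fraction of cancer/normal case pairs in which the cancer case receives the higher rating, with ties counted as one half. The function name empirical_auc is hypothetical and the ratings are invented toy values, not study data; this is not the analysis procedure used in the study.

```python
def empirical_auc(cancer_ratings, normal_ratings):
    """Nonparametric AUC for one reader from ordinal suspicion ratings.

    Counts, over all (cancer, normal) pairs, how often the cancer case
    is rated higher; a tied pair contributes one half.
    """
    if not cancer_ratings or not normal_ratings:
        raise ValueError("need at least one cancer and one normal rating")
    score = 0.0
    for cancer in cancer_ratings:
        for normal in normal_ratings:
            if cancer > normal:
                score += 1.0
            elif cancer == normal:
                score += 0.5
    return score / (len(cancer_ratings) * len(normal_ratings))


# Invented toy ratings for four cancer and four normal cases.
toy_cancer = [4, 5, 3, 2]
toy_normal = [1, 2, 3, 1]
toy_auc = empirical_auc(toy_cancer, toy_normal)
print(f"toy AUC = {toy_auc:.3f}")
```

p	An AUC of 0.5 corresponds to ratings that do not separate cancer from normal cases at all, and 1.0 to perfect separation.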
sec	Confidence in identification of cases for further imaging Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging, i.e., TP reading, using mammography plus AWBU compared with mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher.
title	Confidence in identification of cases for further imaging
p	Readers reviewing cancer cases were more confident in correctly identifying cases for further imaging, i.e., TP reading, using mammography plus AWBU compared with mammography alone. On average, both BIRADS scores (mean = 4.8 versus 4.2, F(1, 740) = 81.91, p < 0.001) and DMIST likelihood scores (mean = 4.8 versus 4.1, F(1, 740) = 82.21, p < 0.001) were higher.
sec	Interpretation times Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58), ranging from 5:54 to 12:51. Review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96).
title	Interpretation times
p	Average reading time per study for the 102 AWBUs was 7 min 58 s (7:58), ranging from 5:54 to 12:51. Review time was unrelated to the number of cancers identified by each reader (correlation = 0.02, p = 0.96).
sec	Discussion Significant improvement in identification of asymptomatic cancers occurred for all readers in this study. This is shown by a 63% increase in callbacks of cancer cases with only a 4% decrease in correct identification of the true negative cases. The confidence of the diagnoses of the 102 cases with predictive BIRADS and DMIST likelihood scales was confirmed by using AUC and FOM methodology. With a short training period, experienced radiologists using 2D AWBU significantly improve their ability to diagnose cancer in dense-breasted women. This type of AWBU is similar in appearance to real-time ultrasound images. The slower transducer speed enforced by the AWBU decreases inter-image distance, allowing the reader more time to identify small masses. At a review speed of 10 images per second, the observer has 0.5 s to identify a 5-mm mass. A high-resolution computer screen, along with a post-processing technique to expand the grayscale at the black end of the spectrum, results in visually sharper margins and more contrast of masses against the background tissue. These factors are designed to make recognition of invasive cancers easier and more reliable. This automated process for breast ultrasound eliminates operator variability, provides greater consistency, and ensures reproducibility of quality images. Study radiologists increased discovery of T1a and T1b invasive cancers by 150% over mammography alone (Table 3). The average review time per AWBU study was about 11 min shorter than the 19 min for radiologists in the ACRIN 6666 trial of handheld screening ultrasound [8]. As half of our test set subjects had cancers, it would be expected that the average review time we observed for AWBU would be significantly longer than in a typical screening population with mostly normal studies. Our study had a number of inherent weaknesses. Although the test set was confidential, the readers probably quickly realized that it was enriched.
They may have been extraordinarily vigilant, resulting in increases in both TPs and FPs. A false increase in TPs would occur if not all of the correctly identified cancers were subsequently confirmed with biopsy. Also, analysis was performed on a case basis in the three patients in whom cancers were present in both breasts; it was assumed that if one of the cancers was identified, the cancer in the other breast would be found by the subsequent workup. This assumption might have falsely raised the TPs and reduced the FNs. In addition, we did not have a comparison with handheld ultrasound. Any of the following factors could have decreased the readers’ accuracy with AWBU (decreased true positives and negatives, and increased false positives and negatives) compared with a normal screening situation. Fatigue—Readers reviewed an average of 34 AWBUs daily. Inexperience with ultrasound screening—Some readers do not perform screening ultrasound. Limited experience with AWBU—This was the first exposure to AWBU for 11 of the readers. Unfamiliarity with some ultrasound formats—Images from many different manufacturers were used. In spite of these hindrances, our observations clearly show that radiologists improve detection of cancers, especially small invasive ones, by adding AWBU to mammography findings.
title	Discussion
p	Significant improvement in identification of asymptomatic cancers occurred for all readers in this study. This is shown by a 63% increase in callbacks of cancer cases with only a 4% decrease in correct identification of the true negative cases. The confidence of the diagnoses of the 102 cases with predictive BIRADS and DMIST likelihood scales was confirmed by using AUC and FOM methodology.
p	With a short training period, experienced radiologists using 2D AWBU significantly improve their ability to diagnose cancer in dense-breasted women. This type of AWBU is similar in appearance to real-time ultrasound images. The slower transducer speed enforced by the AWBU decreases inter-image distance, allowing the reader more time to identify small masses. At a review speed of 10 images per second, the observer has 0.5 s to identify a 5-mm mass. A high-resolution computer screen, along with a post-processing technique to expand the grayscale at the black end of the spectrum, results in visually sharper margins and more contrast of masses against the background tissue. These factors are designed to make recognition of invasive cancers easier and more reliable. This automated process for breast ultrasound eliminates operator variability, provides greater consistency, and ensures reproducibility of quality images. Study radiologists increased discovery of T1a and T1b invasive cancers by 150% over mammography alone (Table 3).
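p	The statement that a 5-mm mass stays in view for 0.5 s at 10 images per second ties viewing time to inter-image distance. The Python sketch below makes that relation explicit. The 1-mm spacing is an assumption inferred from those two figures (five images spanning 5 mm) and is not stated in the text; the function name viewing_time_s is hypothetical.

```python
def viewing_time_s(mass_diameter_mm, image_spacing_mm, images_per_second):
    """Seconds a mass remains visible while the cine loop plays.

    A mass of the given diameter appears on roughly
    diameter / spacing consecutive images.
    """
    images_showing_mass = mass_diameter_mm / image_spacing_mm
    return images_showing_mass / images_per_second


# Assumed 1-mm inter-image spacing (inferred, not reported in the text).
ASSUMED_SPACING_MM = 1.0
time_for_5mm_mass = viewing_time_s(5.0, ASSUMED_SPACING_MM, 10.0)
print(f"5-mm mass visible for {time_for_5mm_mass} s")
```

p	Under the same assumption, doubling the inter-image distance would halve the time available to recognize a mass of a given size, which is why a slower transducer speed helps with small lesions.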
p	The average review time per AWBU study was about 11 min shorter than the 19 min for radiologists in the ACRIN 6666 trial of handheld screening ultrasound [8]. As half of our test set subjects had cancers, it would be expected that the average review time we observed for AWBU would be significantly longer than in a typical screening population with mostly normal studies.
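p	The "about 11 min shorter" comparison follows from the 7:58 mean AWBU reading time and the 19 min cited for handheld ultrasound. The Python sketch below shows the arithmetic; the helper names to_seconds and to_clock are hypothetical.

```python
def to_seconds(clock):
    """Convert an 'm:ss' reading time such as '7:58' to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def to_clock(total_seconds):
    """Convert whole seconds back to 'm:ss'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


AWBU_MEAN_S = to_seconds("7:58")   # mean AWBU reading time in this study
HANDHELD_MEAN_S = 19 * 60          # handheld review time cited from [8]
difference_s = HANDHELD_MEAN_S - AWBU_MEAN_S

print(f"AWBU review was shorter by {to_clock(difference_s)} (m:ss)")
```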
p	Our study had a number of inherent weaknesses. Although the test set was confidential, the readers probably quickly realized that it was enriched. They may have been extraordinarily vigilant, resulting in increases in both TPs and FPs.
p	A false increase in TPs would occur if not all of the correctly identified cancers were subsequently confirmed with biopsy. Also, analysis was performed on a case basis in the three patients in whom cancers were present in both breasts; it was assumed that if one of the cancers was identified, the cancer in the other breast would be found by the subsequent workup. This assumption might have falsely raised the TPs and reduced the FNs. In addition, we did not have a comparison with handheld ultrasound.
p	Any of the following factors could have decreased the readers’ accuracy with AWBU (decreased true positives and negatives, and increased false positives and negatives) compared with a normal screening situation. Fatigue—Readers reviewed an average of 34 AWBUs daily. Inexperience with ultrasound screening—Some readers do not perform screening ultrasound. Limited experience with AWBU—This was the first exposure to AWBU for 11 of the readers. Unfamiliarity with some ultrasound formats—Images from many different manufacturers were used.
p	Fatigue—Readers reviewed an average of 34 AWBUs daily.
p	Inexperience with ultrasound screening—Some readers do not perform screening ultrasound.
p	Limited experience with AWBU—This was the first exposure to AWBU for 11 of the readers.
p	Unfamiliarity with some ultrasound formats—Images from many different manufacturers were used.
p	In spite of these hindrances, our observations clearly show that radiologists improve detection of cancers, especially small invasive ones, by adding AWBU to mammography findings.
sec	Conclusion This article demonstrates that experienced breast radiologists can learn to interpret 2D AWBU quickly. Radiologists will significantly improve their cancer detection rates in dense-breasted women by adding AWBU to mammography. This procedure has the potential for both standardizing the performance of whole-breast ultrasound and shortening the time required for radiologists.
title	Conclusion
p	This article demonstrates that experienced breast radiologists can learn to interpret 2D AWBU quickly. Radiologists will significantly improve their cancer detection rates in dense-breasted women by adding AWBU to mammography. This procedure has the potential for both standardizing the performance of whole-breast ultrasound and shortening the time required for radiologists.
back	Appendix A We thank the radiologists participating in this trial: Catherine Babcook, Debra Copit, Ruth English, Jon Fish, Thalia B. Forte, Michael N. Linver, Laleh Lourie, Carrie Morrisson, Susan Roux, Thomas Stavros, Lakshmi Tegulapalle, and Richard Vanesian. Financial disclosures Dr. Kelly is the Majority Stockholder of Sonocine, Inc. Dr. Dean owns stock in Sonocine, Inc. Neither author has received any form of payment from the company. Drs. Lee and Comulada served as statistical consultants for the study and have no conflict of interest relevant to Sonocine, Inc. The study was funded by Sonocine, Inc. Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
appendix	Appendix A
sec	Appendix A
title	Appendix A
ack	We thank the radiologists participating in this trial: Catherine Babcook, Debra Copit, Ruth English, Jon Fish, Thalia B. Forte, Michael N. Linver, Laleh Lourie, Carrie Morrisson, Susan Roux, Thomas Stavros, Lakshmi Tegulapalle, and Richard Vanesian. Financial disclosures Dr. Kelly is the Majority Stockholder of Sonocine, Inc. Dr. Dean owns stock in Sonocine, Inc. Neither author has received any form of payment from the company. Drs. Lee and Comulada served as statistical consultants for the study and have no conflict of interest relevant to Sonocine, Inc. The study was funded by Sonocine, Inc. Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
p	We thank the radiologists participating in this trial: Catherine Babcook, Debra Copit, Ruth English, Jon Fish, Thalia B. Forte, Michael N. Linver, Laleh Lourie, Carrie Morrisson, Susan Roux, Thomas Stavros, Lakshmi Tegulapalle, and Richard Vanesian.
sec	Financial disclosures Dr. Kelly is the Majority Stockholder of Sonocine, Inc. Dr. Dean owns stock in Sonocine, Inc. Neither author has received any form of payment from the company. Drs. Lee and Comulada served as statistical consultants for the study and have no conflict of interest relevant to Sonocine, Inc. The study was funded by Sonocine, Inc.
title	Financial disclosures
p	Dr. Kelly is the Majority Stockholder of Sonocine, Inc. Dr. Dean owns stock in Sonocine, Inc. Neither author has received any form of payment from the company. Drs. Lee and Comulada served as statistical consultants for the study and have no conflict of interest relevant to Sonocine, Inc. The study was funded by Sonocine, Inc.
sec	Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title	Open Access
p	This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
