Discussion and conclusion

The above summary of the published results reveals several debatable issues concerning the CNBSS:

It has to be emphasized that, by definition, screening should address asymptomatic women [40]. Current screening is population-based and aims to invite asymptomatic women. The prevalent round of the Canadian trial had a very high proportion of palpable cancers (partly skewed by the nature of its recruitment strategy). Thus, its results are not applicable to current practice, where there is uniform access to high-quality symptomatic services for women with symptoms. By accepting palpable tumours (most probably advanced and with worse prognosis), the results are skewed: the overall number of cancers is artificially high, while the palpable cancers cannot contribute to a reduction in mortality. Thus, the screening effect will be considerably diluted and underestimated. This also leads to underpowered statistics [28, 31].

The documented process of randomisation did not ensure blinding. Thus, any person involved in the study could have subverted the randomisation. Also, the probability of subversion was enhanced since mammography was not necessarily offered to women in the control group. We do not assume that the principal investigators committed any fraud. However, they could not have prevented subversion with the chosen protocol.

The disproportionate distribution of far advanced stages (cancers with > 4 involved lymph nodes) in the prevalence screen of women under 50 years of age is highly significant and supports the doubts concerning correct randomisation. An even distribution of demographic and risk factors cannot exclude a bias toward late-stage cancers, which may severely affect the assessment of mortality reduction and the calculation of overdiagnosis.

Long-term mortality reduction was calculated from a maximum of five annual rounds.
Because of the short overall duration and the continuing entry of first-round screenees, the maximum screening effect could not be reached for many of the participants. Mortality reduction was calculated from cumulative rates based on mixed trial participation of one to five rounds over up to 5 years. This might lead to a substantial underestimation of the true screening effect compared with a screening programme following approved guidelines (in which participants undergo approximately 10 complete rounds in 20 years) [41].

The higher evidence classification of individually randomised over cluster-randomised studies is correct in principle. But in screening trials, where non-invited and invited women cannot be blinded, individual randomisation may lead to much higher contamination of the control group than a cluster-randomised setting. Thus, in the CNBSS, as in other individually randomised screening trials, a substantial underestimation of the screening effect through contamination cannot be ruled out.

The fact that none of the radiological reviewers, including the responsible physicist, considered the quality sufficient is highly concerning, as are the described lack of technologist and reader training and the high rate of interval cancers. Two reviewers resigned during the study. How can a method be tested if it is not properly performed and interpreted? What is the value of the results?

The fact that recommended biopsies of mammographically detected abnormalities were not systematically performed is likely to have distorted the results. What effect can be expected from early detection if suspicious findings are not followed by adequate assessment and therapy?

Both obvious and probable protocol deficiencies are likely to have affected the results, counteracting a possible effect of mammography screening on breast cancer mortality and distorting estimates of overdiagnosis.
As appropriate randomisation is one of the key validity criteria for RCTs, a study with that kind of violation should not be rated as a high-quality study. Whether evidence from such a trial can be used at all must be questioned.

Even if the concerns raised were insignificant and the results were valid, the question remains whether the results and conclusions from a screening trial performed in 1980–85 are applicable to, or useful for, the assessment of present screening programmes. The most appropriate answer is obviously “no”.

First, the age range in the CNBSS was 40–59 years, which does not match the age range of most mammography screening programmes today (50–69 or 50–74 years), as recommended in national and European guidelines. Because of the lower incidence of breast cancer at younger ages, the absolute effect is lower. Today we know that mammography quality is even more important in the younger age group because detection is more difficult within dense breast tissue.

Secondly, the mammographic technique and the quality assurance of the complete chain from screening to screen reading, assessment, and treatment in modern population-based mammography screening programmes are almost completely different from those in the CNBSS [42]. Abnormalities are routinely assessed using state-of-the-art minimally invasive methods, and treatment is increasingly standardised and adapted to the stage at detection and the aggressiveness of the cancer.

Finally, the improved quality today has clearly increased the sensitivity and specificity of mammography screening. The result of 68 % of palpable breast cancers in the mammography arm (average size 1.9 cm) would today be unacceptable for annual (!) screening.

What can we conclude from this short review? Probable deficits of randomisation and of proper application of the test cannot be repaired by performing a follow-up study. The results of such a study remain biased.
We do not want to discount the CNBSS, which represented an enormous and exceptional effort in the 1980s, and we acknowledge its size and the time and circumstances in which it was conducted. Also, the important question asked in this trial was different from that of all other screening trials. However, the chosen methodology led to obvious biases with significant impact on the results, especially when long-term results are considered. Furthermore, it is more than obvious that the setting of mammography in the CNBSS is not comparable to present mammography screening programmes. Therefore, using the CNBSS as “highest evidence” to assess the effects of modern mammography screening programmes of the new millennium is not scientifically justified.

Considering that properly performed cohort studies and nested case-control studies with appropriate consideration of length time bias demonstrate a much greater mortality reduction, the assumption that the null effect of the CNBSS is “due to availability of chemotherapy” [20] is unproven and highly speculative.

It remains an unanswered question why high-ranking journals, like the BMJ, and representatives of evidence-based medicine close their eyes to these arguments [43] and still refer to the CNBSS as “superior” evidence (against mammography screening). Taking the CNBSS as an example, the authors want to point out how evidence in the field of breast cancer screening has systematically been omitted, distorted, or inappropriately used over the last decades. When using CNBSS data, opponents of screening mammography [6, 19, 44, 45] ignore or misinterpret an important part of the existing evidence. The consequence of their recommendation is “waiting until a cancer becomes palpable”. This means that, contrary to early detection, women would present at a stage that usually requires aggressive treatment, including chemotherapy and, more often, axillary dissection.
There is no doubt that, according to the evidence, the earlier the stage of breast cancer at diagnosis, the better the prognosis.

In conclusion, comparing the setting of the CNBSS with that of modern mammography screening is akin to comparing apples with pears. Drawing conclusions from the CNBSS for today’s quality-assured, population-based screening programmes is an act of negligence. What we need today is the continuous evaluation of the ongoing mammography screening programmes, including, but not limited to, breast cancer mortality as an outcome.