Summary of main results

Individual signs and symptoms appear to have very poor diagnostic properties for COVID‐19. This finding has to be interpreted with caution: the number of studies was limited, heterogeneity between the studies precluded any firm conclusions, and the included studies were subject to selection bias. Most features had very low sensitivity, while specificity was moderate to high. We identified four possible red flags, that is, symptoms that increased the probability of COVID‐19 when present, defined by a positive likelihood ratio of at least 5 in at least one study: fever, myalgia or arthralgia, fatigue, and headache. Applying the sensitivity and specificity of these systemic features to disease probabilities shows their value for ruling disease in and out, as presented in the dumbbell plots (Figure 16). These plots clearly show that the signs and symptoms have a limited effect on disease probability. Importantly, we did not find any studies investigating the diagnostic accuracy of combinations of signs and symptoms. There were also no studies from community primary care settings.

Some of our findings are counterintuitive. For example, the majority of the studies that investigated cough found that cough decreases the probability of COVID‐19, despite cough being part of the case definition of COVID‐19 in most countries. The same was found for fever in two studies and myalgia in one study, even though these features were red flags in at least one other study. We believe this may be caused by selection bias. Selection bias is present when participants are included and excluded selectively and non‐randomly, so that the association between exposure and outcome (here, the accuracy of the test) in the selected study population differs from that in the eligible study population. It has been shown that this may decrease estimates of diagnostic accuracy (Rutjes 2006).
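The red-flag criterion and the dumbbell plots discussed in this summary rest on the standard relationship between sensitivity, specificity, likelihood ratios, and pre-test and post-test probability. The sketch below illustrates that arithmetic only. The function names and the input values (sensitivity 0.20, specificity 0.96, pre-test probability 0.10) are assumptions chosen for illustration and are not results from the included studies.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary sign or symptom."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a disease probability via odds: post-odds = pre-odds * LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical values for illustration only, not taken from the review.
sens, spec, prevalence = 0.20, 0.96, 0.10

lr_pos, lr_neg = likelihood_ratios(sens, spec)
p_present = post_test_probability(prevalence, lr_pos)  # symptom present
p_absent = post_test_probability(prevalence, lr_neg)   # symptom absent

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"P(disease | symptom present) = {p_present:.3f}")
print(f"P(disease | symptom absent)  = {p_absent:.3f}")
```

With these assumed inputs, a symptom that just reaches a positive likelihood ratio of 5 raises a 10% pre-test probability to roughly 36% when present, while its absence lowers the probability only to about 8%. This is the pattern expected for a feature with low sensitivity and high specificity.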
For the diagnosis of COVID‐19, rapidly and constantly changing and widely variable test criteria have influenced who was referred for testing and who was not. Including only a selective fraction of eligible patients in a study can give a biased estimate of the real accuracy of the index test when measured against the reference standard and real disease status. Griffith 2020 reported on the problematic presence of collider stratification bias in the published studies on COVID‐19. Appropriate sampling strategies need to be applied to avoid conclusions of spurious relationships, in our case, biased accuracy estimates of signs and symptoms for the diagnosis of COVID‐19 disease. Selecting patients based on the presence of specific pre‐set symptoms, such as fever and cough, leads to biased associations between these symptoms and disease, and to sensitivity and specificity estimates that differ from their true values. The example of collider bias for cough is illustrated in Figure 17. Grouping studies by the diagnostic criteria used for selection might clarify this issue, but studies do not clearly describe these criteria; study authors instead refer in general terms to the guidelines that were applicable at the time.

Figure 17. Directed acyclic graph on cough

Another form of selection bias is spectrum bias, where the patients included in the studies do not reflect the patient spectrum to which the index test will be applied. The inclusion of hospitalised patients can lead to such a bias when, in these patients, both the distribution of signs and symptoms differs and assessment with the reference standard is differential. In addition, the distribution and severity of alternative diagnoses may be different in hospitalised populations than in patients presenting to ambulatory care settings.
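The collider mechanism described above for cough can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a reconstruction of any included study: the prevalence (5%), the symptom probabilities, the conditional independence of cough and fever given disease status, and the testing rule "cough or fever" are all hypothetical, as are the function names.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate (disease, cough, fever) triples under assumed probabilities."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        disease = rng.random() < 0.05
        # Assumed symptom probabilities; independent given disease status.
        cough = rng.random() < (0.60 if disease else 0.20)
        fever = rng.random() < (0.70 if disease else 0.05)
        people.append((disease, cough, fever))
    return people


def positive_lr_cough(people):
    """LR+ of cough = sensitivity / (1 - specificity)."""
    diseased = [p for p in people if p[0]]
    non_diseased = [p for p in people if not p[0]]
    sensitivity = sum(p[1] for p in diseased) / len(diseased)
    false_positive_rate = sum(p[1] for p in non_diseased) / len(non_diseased)
    return sensitivity / false_positive_rate


population = simulate()
# Hypothetical testing criterion: only people with cough or fever are tested.
tested = [p for p in population if p[1] or p[2]]

lr_population = positive_lr_cough(population)
lr_tested = positive_lr_cough(tested)
print(f"LR+ of cough, whole population: {lr_population:.2f}")
print(f"LR+ of cough, tested subset:    {lr_tested:.2f}")
```

Under these assumptions cough raises the probability of disease in the whole population, yet its positive likelihood ratio falls below 1 in the tested subset. Non-diseased people enter that subset mostly because they cough, whereas diseased people often enter through fever alone, so conditioning on being tested induces a spurious negative association between cough and disease.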