Methods

Criteria for considering studies for this review

Types of studies
We kept the eligibility criteria purposely broad to include all patient groups and all variations of a test at this initial stage of reviewing the evidence (that is, if the patient population was unclear, we included the study). We included studies of all designs that produce estimates of test accuracy or provide data from which estimates can be computed. We included both single‐gate designs (studies that recruit from a patient pathway before disease status has been ascertained) and multi‐gate designs (where people with and without the target condition are recruited separately). This means that we included both cross‐sectional studies and diagnostic case‐control type studies. When interpreting the results, we ensured that the limitations of the different study designs were carefully considered, using quality assessment and analysis.

Participants
Studies recruiting people presenting to primary care or outpatient hospital settings with suspicion of COVID‐19 disease were eligible. For the initial version of this review, we included studies that recruited symptomatic people either known to have SARS‐CoV‐2 infection or known not to have SARS‐CoV‐2 infection. Studies had to have a sample size of at least 10 participants.

Index tests
All signs and symptoms, including: signs, such as oxygen saturation measured by oximetry, or blood pressure; and classic symptoms, such as fever or cough. We included combinations of signs and symptoms, but not when they were combined with laboratory, imaging, or other types of index tests, as these are covered in the other reviews.

Target conditions
To be eligible, studies had to identify at least one of: mild or moderate COVID‐19 disease; COVID‐19 pneumonia. Asymptomatic SARS‐CoV‐2 infection is out of scope for this review, as it is by definition not detectable on the basis of signs and symptoms.

Reference standards
We anticipated that studies would use a range of reference standards. Although RT‐PCR is considered the best available test, rapidly evolving knowledge about the target conditions means that multiple reference standards, on their own as well as in combination, have emerged. We expected to encounter cases defined by: RT‐PCR alone; RT‐PCR, clinical expertise, and imaging (for example, CT thorax); repeated RT‐PCR several days apart or from different samples; plaque reduction neutralisation test (PRNT) or enzyme‐linked immunosorbent assay (ELISA) tests; information available at a subsequent time point; World Health Organization (WHO) and other case definitions (see Appendix 1). This list is not exhaustive, and we recorded all reference standards encountered. With a group of methodological and clinical experts, we are using a consensus process to produce a ranking of reference standards according to their ability to correctly classify participants. We will use this ranking to inform the assessment of methodological quality in the next update of this review.

Search methods for identification of studies
The final search date for this version of the review is 27 April 2020.

Electronic searches
We conducted a single literature search to cover our suite of Cochrane COVID‐19 diagnostic test accuracy (DTA) reviews (Deeks 2020; McInnes 2020). We conducted electronic searches using two primary sources. Both of these searches aimed to identify all published articles and preprints related to COVID‐19, and were not restricted to those evaluating biomarkers or tests.
Thus, there are no test terms, diagnosis terms, or methodological terms in the searches. Searches were limited to 2019 and 2020, and for this version of the review were conducted to 27 April 2020.

Cochrane COVID‐19 Study Register searches
We used the Cochrane COVID‐19 Study Register (covid-19.cochrane.org/) for searches conducted to 28 March 2020. At that time, the register was populated by searches of PubMed, as well as the trials registers at ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform (ICTRP). Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID‐19, with no language limits. See Appendix 2.

COVID‐19 Living Evidence Database from the University of Bern
From 28 March 2020, we used the COVID‐19 Living Evidence database from the Institute of Social and Preventive Medicine (ISPM) at the University of Bern (www.ispm.unibe.ch) as the primary source of records for the Cochrane COVID‐19 DTA reviews. This search includes PubMed, Embase, and preprints indexed in the bioRxiv and medRxiv databases. The search strategies are described on the ISPM website (ispmbern.github.io/covid-19/). See Appendix 3. The decision to focus primarily on the 'Bern' feed was due to the exceptionally large number of COVID‐19 studies available only as preprints. The Cochrane COVID‐19 Study Register has undergone a number of iterations since the end of March 2020, and we anticipate moving back to it as the primary source of records for subsequent review updates.

Searching other resources
We obtained Embase records through Martha Knuth from the Centers for Disease Control and Prevention (CDC) Stephen B Thacker CDC Library COVID‐19 Research Articles Downloadable Database, and de‐duplicated them against the Cochrane COVID‐19 Study Register up to 1 April 2020. See Appendix 4. We also checked our search results against two additional repositories of COVID‐19 publications: the Evidence for Policy and Practice Information and Co‐ordinating Centre (EPPI‐Centre) 'COVID‐19: Living map of the evidence' (eppi.ioe.ac.uk/COVID19_MAP/covid_map_v4.html); and the Norwegian Institute of Public Health 'NIPH systematic and living map on COVID‐19 evidence' (www.nornesk.no/forskningskart/NIPH_diagnosisMap.html). Both of these repositories allow their contents to be filtered to studies potentially relating to diagnosis, and both have agreed to provide us with updates of new diagnosis studies added. For this iteration of the review, we examined all diagnosis studies from both sources up to 16 April 2020. We did not apply any language restrictions.

Data collection and analysis

Selection of studies
Pairs of review authors independently screened studies. For initial title and abstract screening, we resolved disagreements by discussion with a third, experienced review author; for eligibility assessments, we resolved disagreements through discussion between three review authors.

Data extraction and management
Pairs of review authors independently performed data extraction. We resolved disagreements by discussion between three review authors. We intended to contact study authors where we needed to clarify details or obtain missing information.
Assessment of methodological quality
Pairs of review authors independently assessed risk of bias and applicability concerns using the QUADAS‐2 (Quality Assessment of Diagnostic Accuracy Studies) checklist, which was common to the suite of reviews but tailored to each particular review (Whiting 2011; Table 2). For this review, we excluded the questions on the nature of the samples, as these were not relevant, and we added a question on who assessed the signs. We resolved disagreements by discussion between three review authors.

Table 2. QUADAS‐2 checklist

Index test(s): signs and symptoms.
Patients (setting, intended use of index test, presentation, prior testing): primary care and hospital outpatient settings, including emergency departments; patients presenting with suspected COVID‐19; no prior testing; signs and symptoms often used for triage or referral.
Reference standard and target condition: the focus will be on the diagnosis of COVID‐19 disease and COVID‐19 pneumonia; for this review, the focus will not be on prognosis.

Participant selection

Was a consecutive or random sample of patients enrolled?
This will be similar for all index tests, target conditions, and populations.
YES: if a study explicitly stated that all participants within a certain time frame were included, that this was done consecutively, or that a random selection was done.
NO: if it was clear that a different selection procedure was employed; for example, selection based on clinician's preference, or based on institutions.
UNCLEAR: if the selection procedure was not clear or not reported.

Was a case‐control design avoided?
This will be similar for all index tests, target conditions, and populations.
YES: if a study explicitly stated that all participants came from the same group of (suspected) patients.
NO: if it was clear that a different selection procedure was employed for the participants depending on their COVID‐19 (pneumonia) status or SARS‐CoV‐2 infection status.
UNCLEAR: if the selection procedure was not clear or not reported.

Did the study avoid inappropriate exclusions?
Studies may have excluded participants, or selected participants in such a way that they avoided including those who were difficult to diagnose or likely to be borderline. Although the inclusion and exclusion criteria will differ between index tests, inappropriate exclusions and inclusions will be similar for all index tests: for example, only elderly patients excluded, or children (as sampling may be more difficult). This needs to be addressed on a case‐by‐case basis.
YES: if a high proportion of eligible patients was included without clear selection.
NO: if a high proportion of eligible patients was excluded without a reason being provided; if, in a retrospective study, participants without index test or reference standard results were excluded; or if exclusion was based on severity assessment after the fact or on comorbidities (cardiovascular disease, diabetes, immunosuppression).
UNCLEAR: if the exclusion criteria were not reported.

Did the study avoid inappropriate inclusions?
YES: if the samples included were likely to be representative of the spectrum of disease.
NO: if the study oversampled patients with particular characteristics likely to affect estimates of accuracy.
UNCLEAR: if the exclusion criteria were not reported.

Could the selection of patients have introduced bias?
HIGH: if one or more signalling questions were answered with NO, as any deviation from the selection process may lead to bias.
LOW: if all signalling questions were answered with YES.
UNCLEAR: all other instances.

Is there concern that the included patients do not match the review question?
HIGH: if the accuracy of signs and symptoms was assessed in a case‐control design, or in an already highly selected group of participants, or if the study was only able to estimate sensitivity or specificity.
LOW: any situation where signs and symptoms were the first assessment/test to be done on the included participants.
UNCLEAR: if a description of the participants was lacking.

Index tests

Were the index test results interpreted without knowledge of the results of the reference standard?
This will be similar for all index tests, target conditions, and populations.
YES: if blinding was explicitly stated, or if the index test was recorded before the results from the reference standard were available.
NO: if it was explicitly stated that the index test results were interpreted with knowledge of the results of the reference standard.
UNCLEAR: if blinding was unclearly reported.

If a threshold was used, was it prespecified?
This will be similar for all index tests, target conditions, and populations.
YES: if the test was dichotomous by nature, if the threshold was stated in the methods section, or if the authors stated that the threshold recommended by the manufacturer was used.
NO: if a receiver operating characteristic curve was drawn or multiple thresholds were reported in the results section and the final result was based on one of these thresholds; or if fever was not defined beforehand.
UNCLEAR: if threshold selection was not clearly reported.

Could the conduct or interpretation of the index test have introduced bias?
HIGH: if one or more signalling questions were answered with NO, as even in a laboratory situation knowledge of the reference standard may lead to bias.
LOW: if all signalling questions were answered with YES.
UNCLEAR: all other instances.

Is there concern that the index test, its conduct, or its interpretation differ from the review question?
This will probably be answered LOW in all cases, except when assessments were made in a different setting or using personnel not available in practice.

Reference standard

Is the reference standard likely to correctly classify the target condition?
We will define acceptable reference standards using a consensus process once the list of reference standards used has been obtained from the eligible studies. For severe pneumonia, we will consider how well processes adhered to the WHO case definition in Appendix 1.

Were the reference standard results interpreted without knowledge of the results of the index test?
YES: if it was explicitly stated that the reference standard results were interpreted without knowledge of the results of the index test, or if the result of the index test was obtained after the reference standard.
NO: if it was explicitly stated that the reference standard results were interpreted with knowledge of the results of the index test, or if the index test was used to make the final diagnosis.
UNCLEAR: if blinding was unclearly reported.

Did the definition of the reference standard incorporate results from the index test(s)?
YES: if results from the index test were a component of the reference standard definition.
NO: if the reference standard did not incorporate the index test.
UNCLEAR: if it was unclear whether the results of the index test formed part of the reference standard.

Could the conduct or interpretation of the reference standard have introduced bias?
HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.
UNCLEAR: all other instances.

Is there concern that the target condition as defined by the reference standard does not match the review question?
HIGH: if the target condition was COVID‐19 pneumonia but only RT‐PCR was used; if an alternative diagnosis was highly likely and not excluded (this will happen in paediatric cases, where exclusion of other respiratory pathogens is also necessary); or if tests were used to follow up viral load in known test‐positives.
LOW: if the above situations were not present.
UNCLEAR: if the intention for testing was not reported in the study.

Flow and timing

Was there an appropriate interval between index test(s) and reference standard?
YES: this will be similar for all index tests and populations for the current infection target conditions; as a patient's situation, including clinical presentation and disease progression, evolves rapidly, and new or ongoing exposure can result in a change of case status, an appropriate time interval is within 24 hours.
NO: if there was more than 24 hours between the index test and the reference standard, or if participants were otherwise reported to have been assessed with the index test and the reference standard at moments of different severity.
UNCLEAR: if the time interval was not reported.

Did all patients receive a reference standard?
YES: if all participants received a reference standard (clearly no partial verification).
NO: if only (part of) the index test‐positives or index test‐negatives received the complete reference standard.
UNCLEAR: if this was not reported.

Did all patients receive the same reference standard?
YES: if all participants received the same reference standard (clearly no differential verification).
NO: if (part of) the index test‐positives or index test‐negatives received a different reference standard.
UNCLEAR: if this was not reported.

Were all patients included in the analysis?
YES: if all included participants were included in the analyses.
NO: if, after the inclusion/exclusion process, participants were removed from the analyses for reasons such as: no reference standard done, no index test done, intermediate or indeterminate results of the index test or reference standard, or samples unusable.
UNCLEAR: if this was not clear from the reported numbers.

Could the patient flow have introduced bias?
HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.
UNCLEAR: all other instances.

ICU: intensive care unit; RT‐PCR: reverse transcription polymerase chain reaction; SARS‐CoV‐2: severe acute respiratory syndrome coronavirus 2; WHO: World Health Organization.

Statistical analysis and data synthesis
We present results of estimated sensitivity and specificity using paired forest plots and summarise them in tables as appropriate.
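To make these quantities concrete, the sketch below shows how sensitivity, specificity, and the post‐test probabilities displayed in the dumbbell plots described in the next paragraph are derived from a single study's 2 × 2 table. The counts are hypothetical and the Python code is purely illustrative; it is not part of the review's analysis software (Review Manager, Stata, or SAS).

```python
# Illustrative only: hypothetical 2 x 2 counts for one study (not data from
# any study included in this review).
tp, fp, fn, tn = 32, 12, 8, 148  # true positives, false positives, false negatives, true negatives

sensitivity = tp / (tp + fn)  # proportion of people with COVID-19 who have the sign/symptom
specificity = tn / (tn + fp)  # proportion of people without COVID-19 who lack the sign/symptom

# Pre-test probability (here taken as the study prevalence) and likelihood ratios.
pre_test = (tp + fn) / (tp + fp + fn + tn)
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    post_odds = pre_test_prob / (1 - pre_test_prob) * likelihood_ratio
    return post_odds / (1 + post_odds)

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"Disease probability: {pre_test:.2f} pre-test, "
      f"{post_test_probability(pre_test, lr_positive):.2f} after a positive result, "
      f"{post_test_probability(pre_test, lr_negative):.2f} after a negative result")
```

The dumbbell plots referred to below display this change from pre‐test to post‐test probability for positive and negative results.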
We present the results without meta‐analysis because of the small number of studies currently available, the considerable heterogeneity across studies, and the high risk of bias that we identified; we felt that pooling would produce a seemingly more precise estimate than the underlying evidence can currently support. We present the estimates of sensitivity and specificity in paired forest plots produced in Review Manager 2014, together with dumbbell plots displaying the change in disease probability after a positive or negative result. We disaggregated the data by study design and organised them by target condition, reporting results from cross‐sectional studies separately from studies that used a diagnostic case‐control or other design that we assessed as prone to a high risk of bias. When pooling becomes possible in a future update of this review, we will estimate mean sensitivity and specificity using hierarchical models, for tests that report binary results or results at commonly reported thresholds. Where data are sparse, we will use the methods described by Takwoingi 2017 for obtaining estimates from simplified models. We anticipate that over time sufficient data will accumulate to provide clear estimates of test accuracy for some tests. We will undertake meta‐analyses in Stata version 16.0 (STATA) or SAS (SAS 2015), as detailed in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Chapter 10; Macaskill 2013).

Investigations of heterogeneity
We have listed the sources of heterogeneity that we would investigate, if adequate data were available, in the Secondary objectives. In this version of the review, we used stratification to investigate heterogeneity, as we considered it inappropriate to combine studies. In future updates, if meta‐analysis becomes possible, we will investigate heterogeneity through meta‐regression, stratifying by reference standard and study design. In this version of the review we have stratified by study design only, as stratification by reference standard was not yet possible.

Sensitivity analyses
We aimed to undertake sensitivity analyses considering the impact of: unpublished studies; and studies with inadequate reference standards. However, neither was possible in this version of the review.

Assessment of reporting bias
We aimed to publish lists of studies that we know exist but for which we have not managed to locate reports, and to request information to include in updates of these reviews. However, at the time of writing this version of the review, we are unaware of any unpublished studies.

Summary of findings
We have listed our key findings in a 'Summary of findings' table to determine the strength of evidence for each test and finding, and to highlight important gaps in the evidence.

Updating
We will undertake searches of published literature and preprints bi‐weekly and, depending on the number of new and important studies found, we will consider updating each review with each search if resources allow.