PMC:7116472 / 5135-17637

Id Subject Object Predicate Lexical cue
T36 0-7 Sentence denotes Methods
T37 9-33 Sentence denotes Study design and setting
T38 34-245 Sentence denotes The International Severe Acute Respiratory and emerging Infections Consortium (ISARIC) World Health Organization (WHO) Clinical Characterisation Protocol UK (CCP-UK) study is an ongoing prospective cohort study.
T39 246-515 Sentence denotes The study is being performed by the ISARIC Coronavirus Clinical Characterisation Consortium (ISARIC-4C) in 260 hospitals across England, Scotland, and Wales (National Institute for Health Research Clinical Research Network Central Portfolio Management System ID 14152).
T40 516-819 Sentence denotes The protocol and further study details are available online.8 Model development and reporting followed the TRIPOD (transparent reporting of a multivariable prediction model for individual prediction or diagnosis) guidelines.9 The study is being conducted according to a predefined protocol (appendix 1).
T41 821-833 Sentence denotes Participants
T42 834-1103 Sentence denotes The study recruited consecutive patients aged 18 years and older with a completed index admission to one of 260 hospitals in England, Scotland, or Wales.8 Reverse transcriptase polymerase chain reaction was the only mode of testing available during the period of study.
T43 1104-1215 Sentence denotes The decision to test was at the discretion of the clinician attending the patient, and not defined by protocol.
T44 1216-1385 Sentence denotes The enrolment criterion “high likelihood of infection” reflected that a preparedness protocol cannot assume a diagnostic test will be available for an emergent pathogen.
T45 1386-1478 Sentence denotes In this activation, site training emphasised the importance of only recruiting proven cases.
T46 1480-1495 Sentence denotes Data collection
T47 1496-1592 Sentence denotes Demographic, clinical, and outcome data were collected by using a prespecified case report form.
T48 1593-2027 Sentence denotes Comorbidities were defined according to a modified Charlson comorbidity index.10 Comorbidities collected were chronic cardiac disease, chronic respiratory disease (excluding asthma), chronic renal disease (estimated glomerular filtration rate ≤30), mild to severe liver disease, dementia, chronic neurological conditions, connective tissue disease, diabetes mellitus (diet, tablet, or insulin controlled), HIV or AIDS, and malignancy.
T49 2028-2268 Sentence denotes These conditions were selected a priori by a global consortium to provide rapid, coordinated clinical investigation of patients presenting with any severe or potentially severe acute infection of public interest and enabled standardisation.
T50 2269-2676 Sentence denotes Clinician defined obesity was also included as a comorbidity owing to its probable association with adverse outcomes in patients with covid-19.1112 The clinical information used to calculate prognostic scores was taken from the day of admission to hospital.13 A practical approach was taken to sample size requirements.14 We used all available data to maximise the power and generalisability of our results.
T51 2677-2822 Sentence denotes Model reliability was assessed by using a temporally distinct validation cohort with geographical subsetting, together with sensitivity analyses.
T52 2824-2832 Sentence denotes Outcomes
T53 2833-2879 Sentence denotes The primary outcome was in-hospital mortality.
T54 2880-3048 Sentence denotes This outcome was selected because of the importance of the early identification of patients likely to develop severe illness from SARS-CoV-2 infection (a rule in test).
T55 3049-3247 Sentence denotes We chose to restrict analysis of outcomes to patients who were admitted more than four weeks before final data extraction (29 June 2020) to enable most patients to complete their hospital admission.
T56 3249-3280 Sentence denotes Independent predictor variables
T57 3281-3949 Sentence denotes A reduced set of potential predictor variables was selected a priori, including patient demographic information, common clinical investigations, and parameters consistently identified as clinically important in covid-19 cohorts following the methods described by Wynants and colleagues (appendix 2).5 Candidate predictor variables were selected based on three common criteria15: patient and clinical variables known to influence outcome in pneumonia and flulike illness; clinical biomarkers previously identified within the literature as potential predictors in patients with covid-19; values available for at least two thirds of patients within the derivation cohort.
T58 3950-4209 Sentence denotes Because our overall aim was to develop an easy-to-use risk stratification score, we made the decision to include an overall comorbidity count for each patient within model development giving each comorbidity equal weight, rather than individual comorbidities.
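The equal-weight comorbidity count described above can be sketched as follows. This is a minimal illustration, not the study's code: the field names are hypothetical (the source lists the conditions but no data schema), and clinician defined obesity is counted alongside the modified Charlson conditions as the text describes.

```python
# Hypothetical field names: the source names the conditions, not a schema.
COMORBIDITIES = (
    "chronic_cardiac_disease",
    "chronic_respiratory_disease",
    "chronic_renal_disease",
    "liver_disease",
    "dementia",
    "chronic_neurological_condition",
    "connective_tissue_disease",
    "diabetes_mellitus",
    "hiv_or_aids",
    "malignancy",
    "obesity",
)


def comorbidity_count(patient: dict) -> int:
    """Count recorded comorbidities, giving each one equal weight."""
    # Missing or False entries contribute nothing to the count.
    return sum(1 for name in COMORBIDITIES if patient.get(name) is True)
```

For example, `comorbidity_count({"dementia": True, "obesity": True})` counts two conditions regardless of which two they are.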
T59 4210-4370 Sentence denotes Recent evidence suggests an additive effect of comorbidity in patients with covid-19, with increasing number of comorbidities associated with poorer outcomes.16
T60 4372-4389 Sentence denotes Model development
T61 4390-4557 Sentence denotes Missing values for potential candidate variables were handled by using multiple imputation with chained equations, under the missing at random assumption (appendix 6).
T62 4558-4689 Sentence denotes Ten sets, each with 10 iterations, were imputed using available explanatory variables for both cohorts (derivation and validation).
T63 4690-4796 Sentence denotes The outcome variable was included as a predictor in the derivation dataset but not the validation dataset.
T64 4797-4913 Sentence denotes All model derivation and validation was performed in imputed datasets, with Rubin’s rules17 used to combine results.
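As an illustration of how Rubin's rules combine results across imputed datasets, the sketch below pools one scalar estimate (for example a single regression coefficient) and its variance. Which quantities the authors pooled is not stated here, so the function and its inputs are assumptions for illustration only.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one estimate across m imputed datasets using Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)          # average of the per-dataset estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

With ten imputed sets, as in the study, `estimates` and `variances` would each hold ten values.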
T65 4914-4980 Sentence denotes Models were trained by using all available data up to 20 May 2020.
T66 4981-5127 Sentence denotes The primary intention was to create a pragmatic model for bedside use not requiring complex equations, online calculators, or mobile applications.
T67 5128-5233 Sentence denotes An a priori decision was therefore made to categorise continuous variables in the final prognostic score.
T68 5234-5287 Sentence denotes We used a three stage model building process (fig 1).
T69 5288-5476 Sentence denotes Firstly, generalised additive models were built incorporating continuous smoothed predictors (penalised thin plate splines) in combination with categorical predictors as linear components.
T70 5477-5661 Sentence denotes A criterion based approach to variable selection was taken based on the deviance explained, the unbiased risk estimator, and the area under the receiver operating characteristic curve.
T71 5662-5843 Sentence denotes Secondly, we visually inspected plots of component smoothed continuous predictors for linearity, and selected optimal cut-off values by using the methods of Barrio and colleagues.18
T72 5844-5981 Sentence denotes Lastly, final models using categorised variables were specified with least absolute shrinkage and selection operator logistic regression.
T73 5982-6135 Sentence denotes L1 penalised coefficients were derived using 10-fold cross validation to select the value of lambda (minimised cross validated sum of squared residuals).
T74 6136-6330 Sentence denotes We converted shrunk coefficients to a prognostic index with appropriate scaling to create the pragmatic 4C Mortality Score (where 4C stands for Coronavirus Clinical Characterisation Consortium).
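One way to turn shrunk coefficients into an integer prognostic index is sketched below. The source says only that "appropriate scaling" was used, so the scaling rule here (divide by the smallest non-zero absolute coefficient, then round) and the example coefficient names and values are assumptions, not the published 4C Mortality Score weights.

```python
def coefficients_to_points(coefficients: dict) -> dict:
    """Convert penalised log-odds coefficients to integer points.

    Assumed scaling: divide by the smallest non-zero absolute coefficient
    and round to the nearest integer. Coefficients shrunk to exactly zero
    by the L1 penalty receive zero points.
    """
    nonzero = [abs(c) for c in coefficients.values() if c != 0]
    if not nonzero:
        raise ValueError("all coefficients are zero")
    scale = min(nonzero)
    return {name: round(c / scale) for name, c in coefficients.items()}


def total_score(points: dict, present: set) -> int:
    """Sum the points for the categories that apply to one patient."""
    return sum(value for name, value in points.items() if name in present)
```

A bedside score built this way needs only addition of small integers, which matches the stated aim of avoiding equations and calculators.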
T75 6331-6420 Sentence denotes We used machine learning approaches in parallel for comparison of predictive performance.
T76 6421-6595 Sentence denotes Given issues with interpretability, this was intended to provide a best-in-class comparison of predictive performance when accounting for any complex underlying interactions.
T77 6596-6649 Sentence denotes Gradient boosting decision trees were used (XGBoost).
T78 6650-6776 Sentence denotes All candidate predictor variables identified were included within the model, except for those with high missing values (>33%).
T79 6777-6908 Sentence denotes We retained individual major comorbidity variables within the model to determine whether inclusion improved predictive performance.
T80 6909-6998 Sentence denotes An 80%/20% random split of the derivation dataset was used to define train and test sets.
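A random 80%/20% split of this kind can be sketched as below. The seed and the function name are hypothetical, and the source does not say whether the split was stratified by outcome; this version is a plain shuffle.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seed is an arbitrary assumption
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```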
T81 6999-7075 Sentence denotes The validation datasets were held back and not used in the training process.
T82 7076-7364 Sentence denotes We used a mortality label and design matrix of centred or standardised continuous and categorical variables including all candidate variables to train gradient boosted trees minimising the binary classification error rate (defined as number of wrong cases divided by number of all cases).
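The error rate defined in the sentence above (wrong cases divided by all cases) can be written out directly. The sketch assumes predicted probabilities are turned into class labels at a 0.5 threshold, which the source does not state.

```python
def classification_error(labels, predicted_probs, threshold=0.5):
    """Binary classification error rate: wrong cases / all cases."""
    if not labels or len(labels) != len(predicted_probs):
        raise ValueError("labels and predictions must be non-empty and aligned")
    # A case is wrong when the thresholded prediction disagrees with the label.
    wrong = sum(
        1 for y, p in zip(labels, predicted_probs) if (p > threshold) != bool(y)
    )
    return wrong / len(labels)
```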
T83 7365-7532 Sentence denotes Hyperparameters were tuned, including the learning rate and maximum tree depth, to maximise the area under the receiver operating characteristic curve in the test set.
T84 7533-7731 Sentence denotes This approach affords flexibility in the handling of missing data; therefore, two models were trained and optimised, one using imputed data and the other modelling missingness in complete case data.
T85 7732-7951 Sentence denotes We assessed discrimination for all models by using the area under the receiver operating characteristic curve in the derivation cohort, with 95% confidence intervals calculated by bootstrapped resampling (2000 samples).
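To make the discrimination measure concrete, the sketch below computes the area under the receiver operating characteristic curve as the probability that a randomly chosen patient who died has a higher predicted risk than one who survived, and derives a percentile bootstrap interval from 2000 resamples. The study used R packages; this plain, unstratified percentile bootstrap is an assumption and may differ from the method actually applied.

```python
import random


def auc(labels, scores):
    """AUC by pairwise comparison; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The pairwise computation is quadratic in cohort size, so it is suitable for illustration rather than for a cohort of tens of thousands of patients.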
T86 7952-8167 Sentence denotes A value of 0.5 indicates no predictive ability, 0.8 is considered good, and 1.0 is perfect.19 We assessed overall goodness of fit with the Brier score,20 a measure to quantify how close predictions are to the truth.
T87 8168-8259 Sentence denotes The score ranges between 0 and 1, where smaller values indicate superior model performance.
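The Brier score is the mean squared difference between the predicted probability and the observed outcome (coded 0 or 1). A minimal sketch:

```python
def brier_score(labels, predicted_probs):
    """Mean squared difference between predicted risk and observed outcome."""
    if not labels or len(labels) != len(predicted_probs):
        raise ValueError("labels and predictions must be non-empty and aligned")
    return sum((p - y) ** 2 for y, p in zip(labels, predicted_probs)) / len(labels)
```

A model that predicts the outcome perfectly scores 0, and one that is always confidently wrong scores 1.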
T88 8260-8440 Sentence denotes We plotted model calibration curves to examine agreement between predicted and observed risk across deciles of mortality risk to determine the presence of over or under prediction.
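The points of a decile calibration curve can be produced as sketched below: patients are sorted by predicted risk, divided into ten groups of roughly equal size, and the mean predicted risk in each group is paired with the observed mortality proportion. The grouping rule is an assumption; the source does not describe how deciles were formed.

```python
def calibration_by_decile(labels, predicted_probs, n_groups=10):
    """Return (mean predicted risk, observed event rate) for each risk group."""
    pairs = sorted(zip(predicted_probs, labels))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows
```

Points lying above the line of identity indicate under prediction in that group, and points below it indicate over prediction.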
T89 8441-8680 Sentence denotes Risk cut-off values were defined by the total point score for an individual, which represented low (<2% mortality rate), intermediate (2-14.9%), or high risk (≥15%) groups, similar to commonly used pneumonia risk stratification scores.2122
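The risk bands above can be expressed as a simple mapping from the mortality rate observed at a given point score to a risk group. The function name is hypothetical, and rates between 14.9% and 15% are treated as intermediate, which is an assumption about how the published boundaries round.

```python
def risk_group(mortality_rate_percent: float) -> str:
    """Map an observed mortality rate (in %) to the risk group in the text."""
    if not 0 <= mortality_rate_percent <= 100:
        raise ValueError("mortality rate must be a percentage between 0 and 100")
    if mortality_rate_percent < 2:
        return "low"
    if mortality_rate_percent < 15:
        return "intermediate"
    return "high"
```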
T90 8681-8743 Sentence denotes We performed sensitivity analyses by using complete case data.
T91 8744-8833 Sentence denotes Model discrimination was also checked in ethnic groups and by sex using imputed datasets.
T92 8835-8851 Sentence denotes Model validation
T93 8852-8974 Sentence denotes Patients entered into the ISARIC WHO CCP-UK study after 20 May 2020 were included in a separate validation cohort (fig 1).
T94 8975-9080 Sentence denotes We determined discrimination, calibration, and performance across a range of clinically relevant metrics.
T95 9081-9220 Sentence denotes To avoid bias in the assessment of outcomes, patients who were admitted within four weeks of data extraction on 29 June 2020 were excluded.
T96 9221-9314 Sentence denotes We included patients without a recorded outcome after four weeks and considered them to have had no event.
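The validation cohort rules stated above can be sketched as a date filter. The record layout (`admitted`, `died`) is hypothetical, and treating the study entry date as the admission date is an assumption; the dates themselves come from the text.

```python
from datetime import date, timedelta

TRAINING_CUTOFF = date(2020, 5, 20)   # models were trained on data up to this date
EXTRACTION = date(2020, 6, 29)        # final data extraction
FOLLOW_UP = timedelta(weeks=4)        # minimum time allowed to complete admission


def validation_cohort(patients):
    """Keep patients admitted after the training cutoff and more than four
    weeks before extraction; a missing outcome is treated as no event."""
    latest_admission = EXTRACTION - FOLLOW_UP
    cohort = []
    for patient in patients:
        if TRAINING_CUTOFF < patient["admitted"] < latest_admission:
            cohort.append({**patient, "died": bool(patient.get("died"))})
    return cohort
```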
T97 9315-9428 Sentence denotes A sensitivity analysis was also performed, with stratification of the validation cohort by geographical location.
T98 9429-9798 Sentence denotes We selected this geographical categorisation based on well described economic and health inequalities between the north and south of the United Kingdom.2324 Recent analysis has shown the impact of deprivation on risk of dying with covid-19.25 As a result, population differences between regions could change the discriminatory performance of risk stratification scores.
T99 9799-10091 Sentence denotes Two geographical cohorts were created, based on north-south geographical locations across the UK as defined by Hacking and colleagues.23 We performed a further sensitivity analysis to determine model performance in ethnic minority groups given the reported differences in covid-19 outcomes.26
T100 10092-10188 Sentence denotes All tests were two tailed and P values less than 0.05 were considered statistically significant.
T101 10189-10330 Sentence denotes We used R (version 3.6.3) with the finalfit, mice, glmnet, pROC, recipes, xgboost, rmda, and tidyverse packages for all statistical analysis.
T102 10332-10383 Sentence denotes Comparison with existing risk stratification scores
T103 10384-10493 Sentence denotes All models developed in the derivation dataset were compared with existing scores within the validation cohort.
T104 10494-10686 Sentence denotes We assessed model performance by using the area under the receiver operating characteristic curve statistic, sensitivity, specificity, positive predictive value, and negative predictive value.
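For reference, the four threshold-based metrics listed above follow from the counts of true and false positives and negatives once a score cut-off has classified each patient as high or low risk. A minimal sketch, with a hypothetical function name:

```python
def diagnostic_metrics(labels, predicted_high_risk):
    """Sensitivity, specificity, PPV, and NPV from binary labels and predictions."""
    pairs = list(zip(labels, predicted_high_risk))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```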
T105 10687-10831 Sentence denotes Existing risk stratification scores were identified through a systematic literature search of Embase, WHO Medicus, and Google Scholar databases.
T106 10832-11016 Sentence denotes We used the search terms “pneumonia,” “sepsis,” “influenza,” “COVID-19,” “SARS-CoV-2,” “coronavirus” combined with “score” and “prognosis.” We applied no language or date restrictions.
T107 11017-11062 Sentence denotes The last search was performed on 1 July 2020.
T108 11063-11197 Sentence denotes Risk stratification tools were included if their variables were available within the database and their methods of calculation were accessible.
T109 11198-11404 Sentence denotes We calculated performance characteristics according to original publications, and selected score cut-off values for adverse outcomes based on the most commonly used criteria identified within the literature.
T110 11405-11550 Sentence denotes Cut-off values were the score value for which the patient was considered at low or high risk of adverse outcome, as defined by the study authors.
T111 11551-11640 Sentence denotes Patients with one or more missing input variables were omitted for that particular score.
T112 11641-11802 Sentence denotes We also performed a decision curve analysis.27 Briefly, assessment of the adequacy of clinical prediction models can be extended by determining clinical utility.
T113 11803-12014 Sentence denotes By using decision curve analysis, we can make a clinical judgment about the relative value of benefits (treating a true positive) and harms (treating a false positive) associated with a clinical prediction tool.
T114 12015-12244 Sentence denotes The standardised net benefit was plotted against the threshold probability for considering a patient high risk for age alone and for the best discriminating models applicable to more than 50% of patients in the validation cohort.
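The quantity plotted in a decision curve can be sketched as follows. At a threshold probability p_t, net benefit is the true positive rate per patient minus the false positive rate per patient weighted by the odds p_t / (1 - p_t); dividing by the outcome prevalence gives the standardised net benefit. This is a generic illustration of the calculation, not the study's code.

```python
def standardised_net_benefit(labels, predicted_probs, threshold):
    """Standardised net benefit of treating patients at or above the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    prevalence = sum(labels) / n
    if prevalence == 0:
        raise ValueError("no events in the cohort")
    high_risk = [p >= threshold for p in predicted_probs]
    tp = sum(1 for y, h in zip(labels, high_risk) if h and y == 1)
    fp = sum(1 for y, h in zip(labels, high_risk) if h and y == 0)
    # Benefit of true positives minus harm of false positives, per patient.
    net_benefit = tp / n - (fp / n) * threshold / (1 - threshold)
    return net_benefit / prevalence
```

Evaluating this over a grid of thresholds for each model, and for age alone, yields curves comparable to those described above.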
T115 12246-12276 Sentence denotes Patient and public involvement
T116 12277-12391 Sentence denotes This was an urgent public health research study in response to a Public Health Emergency of International Concern.
T117 12392-12502 Sentence denotes Patients or the public were not involved in the design, conduct, or reporting of this rapid response research.