PubAnnotation

Id	Subject	Object	Predicate	Lexical cue
T1	0-92	Sentence	denotes	Fast automated detection of COVID-19 from medical images using convolutional neural networks
T2	94-102	Sentence	denotes	Abstract
T3	103-192	Sentence	denotes	Coronavirus disease 2019 (COVID-19) is a global pandemic posing significant health risks.
T4	193-291	Sentence	denotes	The diagnostic test sensitivity of COVID-19 is limited due to irregularities in specimen handling.
T5	292-439	Sentence	denotes	We propose a deep learning framework that identifies COVID-19 from medical images as an auxiliary testing method to improve diagnostic sensitivity.
T6	440-774	Sentence	denotes	We use pseudo-coloring methods and a platform for annotating X-ray and computed tomography images to train the convolutional neural network, which achieves a performance similar to that of experts and provides high scores for multiple statistical indices (F1 scores > 96.72% (0.9307, 0.9890) and specificity >99.33% (0.9792, 1.0000)).
T7	775-859	Sentence	denotes	Heatmaps are used to visualize the salient features extracted by the neural network.
T8	860-1053	Sentence	denotes	The neural network-based regression provides strong correlations between the lesion areas in the images and five clinical indicators, resulting in high accuracy of the classification framework.
T9	1054-1163	Sentence	denotes	The proposed method represents a potential computer-aided diagnosis method for COVID-19 in clinical practice.
T10	1165-1389	Sentence	denotes	Liang, Gu and colleagues develop a convolutional neural network (CNN)-based framework to diagnose COVID-19 infection from chest X-ray and computed tomography images and to compare it with other upper respiratory infections.
T11	1390-1578	Sentence	denotes	Compared to expert evaluation of the images, the neural network achieved upwards of 99% specificity, showing promise for the automated detection of COVID-19 infection in clinical settings.
T12	1580-1592	Sentence	denotes	Introduction
T13	1593-1918	Sentence	denotes	Coronavirus disease 2019 (COVID-19), a highly infectious disease with the basic reproductive number (R0) of 5.7 (reported by the US Centers for Disease Control and Prevention), is caused by the most recently discovered coronavirus1 and was declared a global pandemic by the World Health Organization (WHO) on March 11, 20202.
T14	1919-2028	Sentence	denotes	It poses a serious threat to human health worldwide, as well as substantial economic losses to all countries.
T15	2029-2191	Sentence	denotes	As of 7 September 2020, 27,032,617 people have been infected by COVID-19 after testing, and 881,464 deaths have occurred, according to the statistics of the WHO3.
T16	2192-2335	Sentence	denotes	The Wall Street banks have estimated that the COVID-19 pandemic may cause losses of $5.5 trillion to the global economy over the next 2 years4.
T17	2336-2562	Sentence	denotes	The WHO recommends using real-time reverse transcriptase-polymerase chain reaction (rRT-PCR) for laboratory confirmation of the COVID-19 virus in respiratory specimens obtained by the preferred method of nasopharyngeal swabs5.
T18	2563-2688	Sentence	denotes	Laboratories performing diagnostic testing for COVID-19 should strictly comply with the WHO biosafety guidance for COVID-196.
T19	2689-2977	Sentence	denotes	It is also necessary to follow the standard operating procedures (SOPs) for specimen collection, storage, packaging, and transport because the specimens should be regarded as potentially infectious, and the testing process can only be performed in a Biosafety Level 3 (BSL-3) laboratory7.
T20	2978-3075	Sentence	denotes	Not all cities worldwide have adequate medical facilities to follow the WHO biosafety guidelines.
T21	3076-3335	Sentence	denotes	According to an early report (Feb 17, 2020), the sensitivity of tests for the detection of COVID-19 using rRT-PCR analysis of nasopharyngeal swab specimens is around 30–60% due to irregularities during the collection and transportation of COVID-19 specimens8.
T22	3336-3437	Sentence	denotes	Recent studies reported a higher sensitivity range from 71% (Feb 19, 2020) to 91% (Mar 27, 2020)9,10.
T23	3438-3673	Sentence	denotes	A recent systematic review reported that the sensitivity of the PCR test for COVID-19 might be in the range of 71–98% (Apr 21, 2020), whereas the specificity of tests for the detection of COVID-19 using rRT-PCR analysis is about 95%11.
T24	3674-3905	Sentence	denotes	Yang et al.8 discovered that although no viral ribonucleic acid (RNA) was detected by rRT-PCR in the first three or all nasopharyngeal swab specimens in mild cases, the patient was eventually diagnosed with COVID-19 (Feb 17, 2020).
T25	3906-4026	Sentence	denotes	Therefore, the WHO has stated that one or more negative results do not rule out the possibility of COVID-19 infection12.
T26	4027-4123	Sentence	denotes	Additional auxiliary tests with relatively higher sensitivity to COVID-19 are urgently required.
T27	4124-4273	Sentence	denotes	The clinical symptoms associated with COVID-19 include fever, dry cough, dyspnea, and pneumonia, as described in the guideline released by the WHO13.
T28	4274-4436	Sentence	denotes	It has been recommended to use the WHO’s case definition for influenza-like illness (ILI) and severe acute respiratory infection (SARI) for monitoring COVID-1913.
T29	4437-4592	Sentence	denotes	As reported by the CHINA-WHO COVID-19 joint investigation group (February 28, 2020)14, autopsies showed the presence of lung infection in COVID-19 victims.
T30	4593-4771	Sentence	denotes	Therefore, medical imaging of the lungs might be a suitable auxiliary diagnostic testing method for COVID-19 since it uses available medical technology and clinical examinations.
T31	4772-4942	Sentence	denotes	Chest radiography (CXR) and chest computed tomography (CT) are the most common medical imaging examinations for the lungs and are available in most hospitals worldwide15.
T32	4943-5119	Sentence	denotes	Different tissues of the body absorb X-rays to different degrees16, resulting in grayscale images that allow for the detection of anomalies based on the contrast in the images.
T33	5120-5240	Sentence	denotes	CT differs from normal CXR in that it has superior tissue contrast with different shades of gray (about 32–64 levels)17.
T34	5241-5329	Sentence	denotes	The CT images are digitally processed18 to create a three-dimensional image of the body.
T35	5330-5398	Sentence	denotes	However, CT examinations are more expensive than CXR examinations19.
T36	5399-5536	Sentence	denotes	Recent studies reported that the use of CXR and CT images resulted in improved diagnostic sensitivity for the detection of COVID-1920,21.
T37	5537-5631	Sentence	denotes	The interpretation of medical images is time-consuming, labor-intensive, and often subjective.
T38	5632-5731	Sentence	denotes	The medical images are first annotated by experts to generate a report of the radiography findings.
T39	5732-5845	Sentence	denotes	Subsequently, the radiography findings are analyzed, and clinical factors are considered to obtain a diagnosis15.
T40	5846-6084	Sentence	denotes	However, during the current pandemic, the frontline expert physicians are faced with a massive workload and lack of time, which increases the physical and psychological burden on staff and might adversely affect the diagnostic efficiency.
T41	6085-6286	Sentence	denotes	Since modern hospitals have advanced digital imaging technology, medical image processing methods may have the potential for fast and accurate diagnosis of COVID-19 to reduce the burden on the experts.
T42	6287-6626	Sentence	denotes	Deep learning (DL) methods, especially convolutional neural networks (CNNs), are effective approaches for representation learning using multilayer neural networks22 and have provided excellent performance solutions to many problems in image classification23,24, object detection25, games and decisions26, and natural language processing27.
T43	6627-6757	Sentence	denotes	A deep residual network28 is a type of CNN architecture that uses the strategy of skip connections to avoid degradation of models.
T44	6758-6929	Sentence	denotes	However, the applications of DL for clinical diagnoses remain limited due to the lack of interpretability of the DL model and the multi-modal properties of clinical data.
T45	6930-7136	Sentence	denotes	Some studies have demonstrated excellent performance of DL methods for the detection of lung cancer with CT images29, pneumonia with CXR images30, and diabetic retinopathy with retinal fundus photographs31.
T46	7137-7294	Sentence	denotes	To the best of our knowledge, the DL method has been validated only on single modal data, and no correlation analysis with clinical indicators was performed.
T47	7295-7443	Sentence	denotes	Traditional machine learning methods are more constrained and better suited than DL methods to specific, practical computing tasks using features32.
T48	7444-7678	Sentence	denotes	As demonstrated by Jin et al., the traditional machine learning algorithm using the scale-invariant feature transform (SIFT)33 and random sample consensus (RANSAC)34 may outperform the state-of-the-art DL methods for image matching35.
T49	7679-7863	Sentence	denotes	We designed a general end-to-end DL framework for information extraction from CXR images (X-data) and CT images (CT-data) that can be considered a cross-domain transfer learning model.
T50	7864-8111	Sentence	denotes	In this study, we developed a custom platform for rapid expert annotation and proposed the modular CNN-based multi-stage framework (classification framework and regression framework) consisting of basic component units and special component units.
T51	8112-8224	Sentence	denotes	The framework represents an auxiliary examination method for high precision and automated detection of COVID-19.
T52	8225-8270	Sentence	denotes	This study makes the following contributions:
T53	8271-8494	Sentence	denotes	First, a multi-stage CNN-based classification framework consisting of two basic units (ResBlock-A and ResBlock-B) and a special unit (control gate block) was established for use with multi-modal images (X-data and CT-data).
T54	8495-8600	Sentence	denotes	The classification results were compared with evaluations by experts with different levels of experience.
T55	8601-8777	Sentence	denotes	Different optimization goals were established for the different stages in the framework to obtain good performances, which were evaluated using multiple statistical indicators.
T56	8778-8947	Sentence	denotes	Second, principal component analysis (PCA) was used to determine the characteristics of the X-data and CT-data of different categories (normal, COVID-19, and influenza).
T57	8948-9113	Sentence	denotes	Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the salient features in the images and extract the lesion areas associated with COVID-19.
T58	9114-9356	Sentence	denotes	Third, data preprocessing methods, including pseudo-coloring and dimension normalization, were developed to facilitate the interpretability of the medical images and adapt the proposed framework to the multi-modal images (X-data and CT-data).
T59	9357-9535	Sentence	denotes	Fourth, a knowledge distillation method was adopted as a training strategy to obtain high performance with low computational requirements and improve the usability of the method.
T60	9536-9691	Sentence	denotes	Last, the CNN-based regression framework was used to describe the relationships between the radiography findings and the clinical symptoms of the patients.
T61	9692-9821	Sentence	denotes	Multiple evaluation indicators were used to assess the correlations between the radiography findings and the clinical indicators.
T62	9823-9830	Sentence	denotes	Results
T63	9832-9851	Sentence	denotes	Data set properties
T64	9852-9915	Sentence	denotes	Multi-modal data from multiple sources were used in this study.
T65	9916-10063	Sentence	denotes	The X-data, CT-data, and clinical data used in our research were collected from four public data sets and one frontline hospital data set (Youan hospital).
T66	10064-10230	Sentence	denotes	Each data set was divided into two parts, a train-val part and a test part, using the train-test-split function (TTSF) of the scikit-learn library; the resulting numbers of cases are shown in Table 1.
T67	10231-10348	Sentence	denotes	The details of the multi-modal data types are described in the “Methods” section (see “Data sets splitting” section).
T68	10349-10466	Sentence	denotes	Table 1 Number of cases from four public data sets and the Youan hospital (X-data, CT-data, clinical indicator data).
T69	10467-10501	Sentence	denotes	Study X-data CT-data Clinical data
T70	10502-10552	Sentence	denotes	Train + Val Test Train + Val Test Train + Val Test
T71	10553-10596	Sentence	denotes	*Normal (RSNA + LUNA16) 5000 100 100 20 – –
T72	10597-10639	Sentence	denotes	Pneumonia (RSNA + ICNP) 3000 100 83 20 – –
T73	10640-10669	Sentence	denotes	COVID-19 (CCD) 150 62 – – – –
T74	10670-10713	Sentence	denotes	Influenza (Youan Hospital) 100 45 35 15 – –
T75	10714-10756	Sentence	denotes	*Normal (Youan Hospital) 478 25 139 20 – –
T76	10757-10801	Sentence	denotes	Pneumonia (Youan Hospital) 380 55 180 35 – –
T77	10802-10845	Sentence	denotes	COVID-19 (Youan Hospital) 35 10 75 20 75 20
T78	10846-10874	Sentence	denotes	Total 9143 397 612 130 75 20
T79	10875-11062	Sentence	denotes	The term *Normal in this work refers to cases in which the lungs show no manifest evidence of COVID-19, influenza, or pneumonia on imaging and the RT-PCR test for COVID-19 is negative.
T80	11064-11164	Sentence	denotes	A platform was developed for annotating lesion areas of COVID-19 in medical images (X-data, CT-data)
T81	11165-11328	Sentence	denotes	Medical imaging uses images of internal tissues of the human body or a part of the human body in a non-invasive manner for clinical diagnoses or treatment plans36.
T82	11329-11515	Sentence	denotes	Medical images (e.g., X-data and CT-data) are usually acquired using computed radiography and are typically stored in the Digital Imaging and Communications in Medicine (DICOM) format37.
T83	11516-11695	Sentence	denotes	X-data are two-dimensional grayscale images, and CT-data are three-dimensional data consisting of two-dimensional grayscale slices stacked along the z axis.
T84	11696-11811	Sentence	denotes	Machine learning methods are playing increasingly important roles in medical image analysis, especially DL methods.
T85	11812-11932	Sentence	denotes	DL uses multiple non-linear transformations to create a mapping relationship between the input data and output labels38.
T86	11933-12027	Sentence	denotes	The objective of this study was to annotate lesion areas in medical images with high accuracy.
T87	12028-12216	Sentence	denotes	Therefore, we developed a pseudo-coloring method, a technique that enhances medical images so that physicians can isolate relevant tissues and group different tissues together39.
T88	12217-12377	Sentence	denotes	We converted the original grayscale images to color images using the open-source image processing tools Open Source Computer Vision Library (OpenCV) and Pillow.
T89	12378-12435	Sentence	denotes	Examples of the pseudo-color images are shown in Fig. 1a.
T90	12436-12575	Sentence	denotes	We developed a platform that uses a client-server architecture to annotate the potential lesion areas of COVID-19 on the CXR and CT images.
T91	12576-12655	Sentence	denotes	The platform can be deployed on a private cloud for security and local sharing.
T92	12656-12814	Sentence	denotes	All the images were annotated by two experienced radiologists (one was a 5th-year radiologist and the other was a 3rd-year radiologist) in the Youan Hospital.
T93	12815-12977	Sentence	denotes	If there was disagreement about a result, a senior radiologist and a respiratory doctor made the final decision to ensure the precision of the annotation process.
T94	12978-13051	Sentence	denotes	The details of the annotation pipeline are shown in Supplementary Fig. 1.
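The pseudo-coloring step described above can be illustrated with a minimal sketch. The text states only that OpenCV and Pillow were used and does not name a colormap, so the blue-to-red ramp below is an assumption, and the helper names `jet_like` and `pseudo_color` are hypothetical. The sketch is pure Python so that it runs without dependencies; with OpenCV, a call such as `cv2.applyColorMap(gray, cv2.COLORMAP_JET)` would play the same role.

```python
def jet_like(value: int) -> tuple[int, int, int]:
    """Map an 8-bit gray level (0-255) to an (R, G, B) triple on a blue-to-red ramp.

    Hypothetical colormap: the source does not say which one was used.
    """
    if not 0 <= value <= 255:
        raise ValueError("gray level must be in 0..255")
    t = value / 255.0
    # Piecewise-linear ramp: each channel peaks in a different part of the range.
    r = min(1.0, max(0.0, 1.5 - abs(4.0 * t - 3.0)))
    g = min(1.0, max(0.0, 1.5 - abs(4.0 * t - 2.0)))
    b = min(1.0, max(0.0, 1.5 - abs(4.0 * t - 1.0)))
    return round(r * 255), round(g * 255), round(b * 255)


def pseudo_color(gray: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Convert a 2-D grayscale image (rows of 0-255 ints) to rows of RGB triples."""
    return [[jet_like(pixel) for pixel in row] for row in gray]
```

Dark pixels map to blue and bright pixels to red, so tissues with similar gray levels end up in the same color band, which is the grouping effect the text describes.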
T95	13052-13158	Sentence	denotes	Fig. 1 Demonstrations of data preprocessing methods including pseudo-coloring and dimension normalization.
T96	13159-13224	Sentence	denotes	a Pseudo-coloring for abnormal examples in the CXR and CT images.
T97	13225-13357	Sentence	denotes	The original grayscale images were transformed into color images using the pseudo-coloring method and were annotated by the experts.
T98	13358-13501	Sentence	denotes	The scale bar on the right is the range of pixel values of the image data. b Dimension normalization to reduce the dimensions in the CT images.
T99	13502-13605	Sentence	denotes	The number of CT images was first resampled to a multiple of three, and the images were then divided into three groups.
T100	13606-13684	Sentence	denotes	This was followed by 1 × 1 convolution layers to reduce the dimensions of the data.
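The dimension normalization in Fig. 1b can be sketched as follows. The caption gives only the outline (resample to a multiple of three, split into three groups, apply 1 × 1 convolutions), so several details here are assumptions: rounding the slice count down to a multiple of three, nearest-neighbour index selection, contiguous groups, and fixed illustrative weights in place of learned ones. All function names are hypothetical.

```python
def resample_indices(n_slices: int, target: int) -> list[int]:
    """Pick `target` slice indices spread evenly over 0..n_slices-1 (nearest neighbour)."""
    if n_slices < 1 or target < 1:
        raise ValueError("need at least one slice and a positive target")
    if target == 1:
        return [0]
    step = (n_slices - 1) / (target - 1)
    return [round(i * step) for i in range(target)]


def split_into_three(slices: list) -> list[list]:
    """Resample to a multiple of three (assumed: rounded down, minimum 3) and split."""
    n = len(slices)
    target = max(3, n - n % 3)
    picked = [slices[i] for i in resample_indices(n, target)]
    size = target // 3
    return [picked[k * size:(k + 1) * size] for k in range(3)]


def conv1x1(group: list, weights: list[float]) -> list[list[float]]:
    """1 x 1 convolution across the slice axis: a per-pixel weighted sum of the slices.

    In a trained network the weights are learned; here they are supplied by hand.
    """
    rows, cols = len(group[0]), len(group[0][0])
    return [
        [sum(w * s[r][c] for w, s in zip(weights, group)) for c in range(cols)]
        for r in range(rows)
    ]
```

Each group of slices is reduced to a single two-dimensional map, so a CT volume of any depth becomes a fixed three-channel input.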
T101	13686-13799	Sentence	denotes	PCA was used to determine the characteristics of the medical images for the COVID-19, influenza, and normal cases
T102	13800-13961	Sentence	denotes	PCA was used to visually compare the characteristics of the medical images (X-data, CT-data) for the COVID-19 cases with those of the normal and influenza cases.
T103	13962-14118	Sentence	denotes	Figure 2a shows the mean image of each category and the five eigenvectors that represent the principal components of PCA in the corresponding feature space.
T104	14119-14310	Sentence	denotes	Significant differences are observed between the COVID-19, influenza, and normal cases, indicating the possibility of being able to distinguish COVID-19 cases from normal and influenza cases.
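The mean image and eigenvectors discussed above can be made concrete with a small sketch. It is not the authors' implementation: it assumes flattened images of equal size and uses power iteration to recover only the first principal component, whereas the text reports five. The function names are hypothetical.

```python
import math


def mean_image(images: list[list[float]]) -> list[float]:
    """Pixel-wise mean of equally sized, flattened images."""
    n = len(images)
    return [sum(img[j] for img in images) / n for j in range(len(images[0]))]


def first_principal_component(images: list[list[float]], iterations: int = 100):
    """Return (mean image, unit eigenvector) for the dominant direction of variation."""
    mean = mean_image(images)
    centered = [[p - m for p, m in zip(img, mean)] for img in images]
    d = len(mean)
    # Asymmetric start vector; power iteration fails only if the start is
    # exactly orthogonal to the dominant eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        # Multiply v by X^T X (the unnormalised covariance) without forming the matrix.
        proj = [sum(c * x for c, x in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v
```

Further components could be obtained by subtracting each found direction from the centered data and repeating; in practice a library routine for singular value decomposition would normally be used instead.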
T105	14311-14385	Sentence	denotes	Fig. 2 PCA visualizations and example heatmaps of both X-data and CT-data.
T106	14386-14448	Sentence	denotes	a Mean image and eigenvectors of five different sub-data sets.
T107	14449-14531	Sentence	denotes	The first column shows the mean image and the other columns show the eigenvectors.
T108	14532-14739	Sentence	denotes	The first row shows the mean image and five eigenvectors of the normal CXR images; second row: COVID-19 CXR images, third row: normal CT images, fourth row: influenza CT images, last row: COVID-19 CT images.
T109	14740-14926	Sentence	denotes	The scale bar on the right is the range of pixel values of the image data. b Heatmaps of both X-data and CT-data were demonstrated for better interpretability of the proposed frameworks.
T110	14927-15016	Sentence	denotes	The scale bar on the right is the probability of the areas being suspected as infections.
T111	15018-15187	Sentence	denotes	The CNN-based classification framework exhibited excellent performance based on the validation by experts using multi-modal data from public data sets and Youan hospital
T112	15188-15406	Sentence	denotes	The structure of the proposed framework, consisting of the stage I sub-framework and the stage II sub-framework is shown in Fig. 3a, where Q, L, M, and N are the hyper-parameters of the framework for general use cases.
T113	15407-15536	Sentence	denotes	The values of Q, L, M, and N were 1, 1, 2, and 2, respectively, in this study; this framework is referred to as the CNNCF.
T114	15537-15695	Sentence	denotes	The stage I and stage II sub-frameworks were designed to extract features corresponding to different optimization goals in the analysis of the medical images.
T115	15696-15941	Sentence	denotes	The performance of the CNNCF was evaluated using multi-modal data sets (X-data and CT-data) to ensure the generalization and transferability of the model, and five evaluation indicators were used (sensitivity, precision, specificity, F1, kappa).
T116	15942-16066	Sentence	denotes	The salient features of the images extracted by the CNNCF were visualized in a heatmap (four examples are shown in Fig. 2b).
T117	16067-16315	Sentence	denotes	In this study, multiple experiments were conducted (including experiments that included data from the same source and from different sources) to validate the generalization ability of the framework while avoiding the possible sample selection bias.
T118	16316-16552	Sentence	denotes	Five experts evaluated the images, i.e., a 7th-year respiratory resident (Respira.), a 3rd-year emergency resident (Emerg.), a 1st-year respiratory intern (Intern), a 5th-year radiologist (Rad-5th), and a 3rd-year radiologist (Rad-3rd).
T119	16553-16625	Sentence	denotes	The definition of the expert group can be found in Supplementary Note 1.
T120	16626-16908	Sentence	denotes	The abbreviations of all the data sets used in the following experiments (XPDS, XPTS, XPVS, XHDS, XHTS, XHVS, CTPDS, CTPTS, CTPVS, CTHDS, CTHTS, CTHVS, CADS, CATS, CAVS, XMTS, XMVS, CTMTS, and CTMVS) are defined in the “Methods” section (see “Data sets splitting” section).
T121	16909-16945	Sentence	denotes	The following results were obtained.
T122	16946-16974	Sentence	denotes	Fig. 3 CNN-based frameworks.
T123	16975-17237	Sentence	denotes	a The classification framework for the identification of COVID-19. b The regression framework for the correlation analysis between the lesion areas and the clinical indicators. c The workflow of the classification framework for the identification of COVID-19.
T124	17239-17251	Sentence	denotes	Experiment-A
T125	17252-17425	Sentence	denotes	In this experiment, we used the X-data of the XPVS, where the normal cases were from the RSNA data set and the COVID-19 cases were from the COVID CXR data set (CCD).
T126	17426-17563	Sentence	denotes	The results of the five evaluation indicators for the comparison of the COVID-19 cases and normal cases of the XPVS are shown in Table 2.
T127	17564-17674	Sentence	denotes	Excellent performance was obtained, with a specificity of 99.33% and a precision of 98.33%.
T128	17675-17864	Sentence	denotes	The F1 score was 96.72%, which was higher than that of the Respira. (96.12%), the Emerg. (93.94%), the Intern (84.67%), and the Rad-3rd (85.93%) and lower than that of the Rad-5th (98.41%).
T129	17865-18058	Sentence	denotes	The kappa index was 95.40%, which was higher than that of the Respira. (94.43%), the Emerg. (91.21%), the Intern (77.45%), and the Rad-3rd (79.42%), and lower than that of the Rad-5th (97.74%).
T130	18059-18249	Sentence	denotes	The sensitivity index was 95.16%, which was higher than that of the Intern (93.55%) and the Rad-3rd (93.55%) and lower than that of the Respira. (100%), the Emerg. (100%), and the Rad-5th (100%).
T131	18250-18415	Sentence	denotes	The receiver operating characteristic (ROC) scores for the CNNCF and the experts are plotted in Fig. 4a; the area under the ROC curve (AUROC) of the CNNCF is 0.9961.
T132	18416-18571	Sentence	denotes	The precision-recall scores for the CNNCF and the experts are plotted in Fig. 4d; the area under the precision-recall curve (AUPRC) of the CNNCF is 0.9910.
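The five evaluation indicators reported above all follow from a 2 × 2 confusion matrix, as the sketch below shows. The counts in the usage example (59 true positives, 1 false positive, 3 false negatives, 149 true negatives) are an assumption: they are not given in the text and were chosen only because they are consistent with the CNNCF values quoted for experiment A. The function name is hypothetical.

```python
def binary_indices(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, precision, F1, and Cohen's kappa from a 2x2 table.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts, chosen to be consistent with the reported CNNCF indices.
scores = binary_indices(tp=59, fp=1, fn=3, tn=149)
print({name: round(value, 4) for name, value in scores.items()})
```

Because kappa discounts agreement expected by chance, it sits below the raw accuracy whenever the two classes are imbalanced, which is why it is reported alongside F1.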
T133	18572-18892	Sentence	denotes	Table 2 Performance indices of the classification framework (CNNCF) of experiment A and the average performance of the 7th-year respiratory resident (Respira.), the 3rd-year emergency resident (Emerg.), the 1st-year respiratory intern (Intern), the 5th-year radiologist (Rad-5th), and the 3rd-year radiologist (Rad-3rd).
T134	18893-18980	Sentence	denotes	F1 (95% CI) Kappa (95% CI) Specificity (95% CI) Sensitivity (95% CI) Precision (95% CI)
T135	18981-19108	Sentence	denotes	CNNCF 0.9672 (0.9307, 0.9890) 0.9540 (0.9030, 0.9924) 0.9933 (0.9792, 1.0000) 0.9516 (0.8889, 1.0000) 0.9833 (0.9444, 1.0000)
T138	19109-19237	Sentence	denotes	Respira. 0.9612 (0.9231, 0.9920) 0.9443 (0.8912, 0.9887) 0.9667 (0.9363, 0.9933) 1.0000 (1.0000, 1.0000) 0.9254 (0.8095, 0.9571)
T140	19238-19365	Sentence	denotes	Emerg. 0.9394 (0.8947, 0.9781) 0.9121 (0.8492, 0.9677) 0.9467 (0.9091, 0.9797) 1.0000 (1.0000, 1.0000) 0.8857 (0.8095, 0.9571)
T143	19366-19492	Sentence	denotes	Intern. 0.8467 (0.7692, 0.9041) 0.7745 (0.6730, 0.8592) 0.8867 (0.8333, 0.9343) 0.9355 (0.8596, 0.984) 0.7733 (0.6708, 0.8649)
T145	19493-19620	Sentence	denotes	Rad-5th 0.9841 (0.9593, 1.0000) 0.9774 (0.9433, 1.0000) 0.9867 (0.9662, 1.0000) 1.0000 (1.0000, 1.0000) 0.9688 (0.9219, 1.0000)
T146	19621-19748	Sentence	denotes	Rad-3rd 0.8593 (0.7931, 0.9180) 0.7942 (0.7062, 0.8779) 0.9000 (0.8541, 0.9481) 0.9355 (0.8666, 0.9841) 0.7945 (0.6974, 0.8873)
T147	19749-19812	Sentence	denotes	Fig. 4 ROC and PRC curves for the CNNCF of the experiments A-C.
T148	19813-19902	Sentence	denotes	NC indicates that the positive case is a COVID-19 case, and the negative case is *Normal.
T149	19903-19987	Sentence	denotes	CI indicates that the positive case is COVID-19, and the negative case is influenza.
T150	19988-20074	Sentence	denotes	The points are the results of experts, corresponding to the results in Tables 2 and 3.
T151	20075-20384	Sentence	denotes	The background gray dashed curves in the PRC curve correspond to the iso-F1 curves. a ROC curve for the NC using X-data. b ROC curve for the NC using CT-data. c ROC curve for the CI using CT-data. d PRC curve for the NC using X-data. e PRC curve for the NC using CT-data. f PRC curve for the CI using CT-data.
T152	20386-20398	Sentence	denotes	Experiment-B
T153	20399-20565	Sentence	denotes	In this experiment, we used the CT-data of the CTPVS and CTHVS, where the normal cases were from the LUNA16 data set and the COVID-19 cases were from the Youan hospital.
T154	20566-20799	Sentence	denotes	The results of the five evaluation indicators for the comparison of the COVID-19 cases and normal cases of the CTHVS and the CTPVS are shown in Table 3, where the normal cases are from CTPVS and the COVID-19 cases are from the CTHVS.
T155	20800-20986	Sentence	denotes	The CNNCF exhibits good performance on the five evaluation indices, which are similar to those of the Respira. and the Rad-5th and higher than those of the Intern, the Emerg., and the Rad-3rd.
T156	20987-21056	Sentence	denotes	The ROC scores are plotted in Fig. 4b; the AUROC of the CNNCF is 1.0.
T157	21057-21137	Sentence	denotes	The precision-recall scores are shown in Fig. 4e; the AUPRC of the CNNCF is 1.0.
T158	21138-21470	Sentence	denotes	Table 3 Performance indices of the classification framework (CNNCF) of the experiments B and C, and the average performance of the 7th-year respiratory resident (Respira.), the 3rd-year emergency resident (Emerg.), the 1st-year respiratory intern (Intern), the 5th-year radiologist (Rad-5th), and the 3rd-year radiologist (Rad-3rd).
T159	21471-21502	Sentence	denotes	CT (*Normal and COVID-19 cases)
T160	21503-21548	Sentence	denotes	CNNCF Respira. Emerg. Intern. Rad-5th Rad-3rd
T164	21549-21704	Sentence	denotes	F1 (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8571, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8667, 1.0000)
T165	21705-21863	Sentence	denotes	Kappa (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.7422, 1.0000) 1.0000 (1.0000, 1.0000) 0.9000 (0.7487, 1.0000)
T166	21864-22028	Sentence	denotes	Specificity (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000)
T167	22029-22193	Sentence	denotes	Sensitivity (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8421, 1.0000)
T168	22194-22356	Sentence	denotes	Precision (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8235, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000)
T169	22357-22390	Sentence	denotes	CT (Influenza and COVID-19 cases)
T170	22391-22436	Sentence	denotes	CNNCF Respira. Emerg. Intern. Rad-5th Rad-3rd
T174	22437-22591	Sentence	denotes	F1 (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8966 (0.7332, 1.0000) 0.8000 (0.6207, 0.9412) 0.9677 (0.8889, 1.0000) 0.8667 (0.7199, 0.9744)
T175	22592-22750	Sentence	denotes	Kappa (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8236 (0.5817, 1.0000) 0.6500 (0.3698, 0.8852) 0.9421 (0.8148, 1.0000) 0.7667 (0.5349, 0.9429)
T177	22751-22914	Sentence	denotes	Specificity (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9048 (0.7619, 1.0000) 0.8500 (0.6818, 1.0000) 0.9500 (0.8333, 1.0000) 0.9000 (0.7619, 1.0000)
T178	22915-23078	Sentence	denotes	Sensitivity (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9286 (0.7500, 1.0000) 0.8000 (0.5714, 1.0000) 1.0000 (1.0000, 1.0000) 0.8667 (0.6667, 1.0000)
T179	23079-23240	Sentence	denotes	Precision (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8667 (0.6874, 1.0000) 0.8000 (0.5881, 1.0000) 0.9375 (0.8000, 1.0000) 0.8667 (0.6667, 1.0000)
T180	23242-23254	Sentence	denotes	Experiment-C
T181	23255-23387	Sentence	denotes	In this experiment, we used the CT-data of the CTHVS, where the influenza cases and the COVID-19 cases were all from the Youan hospital.
T182	23388-23601	Sentence	denotes	The results of the five evaluation indicators for the comparison of the COVID-19 cases and influenza cases of the CTHVS are shown in Table 3 where the influenza cases and the COVID-19 cases are all from the CTHVS.
T183	23602-23695	Sentence	denotes	The CNNCF achieved the best score on all five evaluation indices.
T184	23696-23765	Sentence	denotes	The ROC scores are plotted in Fig. 4c; the AUROC of the CNNCF is 1.0.
T185	23766-23850	Sentence	denotes	The precision-recall scores are shown in Fig. 4f, and the AUPRC of the CNNCF is 1.0.
T186	23852-23864	Sentence	denotes	Experiment-D
T187	23865-24126	Sentence	denotes	Boxplots of three of the five evaluation indicators for experiments A–C, namely the F1 score (Fig. 5a, d, g), the kappa coefficient (Fig. 5b, e, h), and the specificity (Fig. 5c, f, i), are shown in Fig. 5; the precision and sensitivity are shown in Supplementary Fig. 2.
T188	24127-24294	Sentence	denotes	A bootstrapping method40 was used to calculate the empirical distributions, and McNemar’s test41 was used to analyze the differences between the CNNCF and the experts.
T189	24295-24510	Sentence	denotes	The p-values of the McNemar’s test (Supplementary Tables 1–3) for the five evaluation indicators were all 1.0, indicating no statistically significant difference between the CNNCF results and the expert evaluations.
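The two statistical procedures named above can be sketched in a few lines. The text does not state which bootstrap interval or which variant of McNemar's test was used, so the percentile interval and the exact (binomial) two-sided test below are assumptions, as are the function names. Inputs are (true label, predicted label) pairs with 1 for COVID-19 and 0 otherwise.

```python
import math
import random


def f1_score(pairs: list[tuple[int, int]]) -> float:
    """F1 for (true, predicted) pairs; returns 0.0 if there are no positives at all."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(pairs, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement, recompute stat."""
    rng = random.Random(seed)
    n = len(pairs)
    values = sorted(
        stat([pairs[rng.randrange(n)] for _ in range(n)]) for _ in range(n_resamples)
    )
    lo = values[int((alpha / 2) * n_resamples)]
    hi = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the discordant counts: cases one rater got right and the other
    got wrong, in each direction. Concordant cases do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

When the two raters disagree equally often in both directions (b equal to c), the exact test returns a p-value of 1.0, which is one way a p-value of exactly 1.0 such as those reported here can arise.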
T190	24511-24634	Sentence	denotes	Fig. 5 Boxplots of the F1 score, kappa score, and specificity for the CNNCF and expert results for COVID-19 identification.
T191	24635-24724	Sentence	denotes	NC indicates that the positive case is a COVID-19 case, and the negative case is *Normal.
T192	24725-24809	Sentence	denotes	CI indicates that the positive case is COVID-19, and the negative case is influenza.
T193	24810-25264	Sentence	denotes	Bootstrapping is used to generate n = 1000 resampled independent validation sets for the XVS and the CTVS. a F1 score for the NC using X-data. b Kappa score for the NC using X-data. c Specificity for the NC using X-data. d F1 score for the NC using CT-data. e Kappa score for the NC using CT-data. f Specificity for the NC using CT-data. g F1 score for the CI using CT-data. h Kappa score for the CI using CT-data. i Specificity for the CI using CT-data.
T194	25265-25526	Sentence	denotes	We also conducted extra experiments with both configurations of the same data source and different data sources: the descriptions and graph charts can be found in the Supplementary Experiments and Tables (Supplementary Tables 4–19 and Supplementary Figs. 3–18).
T195	25527-25617	Sentence	denotes	The data used in experiments E–G were CTHVS and the data were all from the Youan hospital.
T196	25618-25707	Sentence	denotes	The data used in experiments H–K were XHVS and the data were all from the Youan hospital.
T197	25708-25761	Sentence	denotes	The data used in experiments L–N were XPVS and CTPVS.
T198	25762-25978	Sentence	denotes	The data used in experiment L were from a single data set (RSNA), whereas the data used in experiment M were from different data sets: the pneumonia cases were from the ICNP, and the normal cases were from LUNA16.
T199	25979-26173	Sentence	denotes	The data used in the experiments O–R, from the four public data sets and one hospital (Youan hospital) data set (including normal cases, pneumonia cases and COVID-19 cases), were XMVS and CTMVS.
T200	26174-26252	Sentence	denotes	In all the experiments (experiments A–R), the CNNCF achieved good performance.
T201	26253-26413	Sentence	denotes	Notably, in order to obtain a more comprehensive evaluation of the CNNCF while further improving the usability in clinical practice, experiment-R was performed.
T202	26414-26586	Sentence	denotes	In experiment-R, the CNNCF was used to distinguish three types of cases simultaneously (COVID-19, pneumonia, and normal cases) on both the XMVS and CTMVS.
T203	26587-26784	Sentence	denotes	Good performance was obtained on the XMVS, with best scores of 91.89% for the F1 score, 89.74% for the kappa score, 97.14% for specificity, 94.44% for sensitivity, and 89.47% for precision.
T204	26785-26907	Sentence	denotes	Excellent performance was obtained on the CTMVS, where the best scores of all five evaluation indicators were 100.00%.
T205	26908-27021	Sentence	denotes	The ROC and PRC scores in experiment-R were also satisfactory and are shown in Supplementary Fig. 18.
T206	27022-27130	Sentence	denotes	The results of the experiment-R further demonstrated the effectiveness and robustness of the proposed CNNCF.
T207	27132-27186	Sentence	denotes	Image analysis identifies salient features of COVID-19
T208	27187-27326	Sentence	denotes	In clinical practice, the diagnostic decision of a clinician relies on the identification of the SAs in the medical images by radiologists.
T209	27327-27459	Sentence	denotes	The statistical results show that the performance of the CNNCF for the identification of COVID-19 is as good as that of the experts.
T210	27460-27563	Sentence	denotes	A comparison consisting of two parts was performed to evaluate the discriminatory ability of the CNNCF.
T211	27564-27724	Sentence	denotes	In the first part, we used Grad-CAM, which is a non-intrusive method to extract the salient features in medical images, to create a heatmap of the CNNCF result.
T212	27725-27815	Sentence	denotes	Figure 2b shows the heatmaps of four examples of COVID-19 cases in the X-data and CT-data.
T213	27816-28011	Sentence	denotes	In the second part, we used density-based spatial clustering of applications with noise (DBSCAN) to calculate the center pixel coordinates (CPC) of the salient features corresponding to COVID-19.
T214	28012-28058	Sentence	denotes	All CPCs were normalized to a range of 0 to 1.
T215	28059-28209	Sentence	denotes	Subsequently, we used a significance test (ST)42 to analyze the relationship between the CPC of the CNNCF output and the CPC annotated by the experts.
T216	28210-28459	Sentence	denotes	A good performance was obtained, with a mean square error (MSE) of 0.0108, a mean absolute error (MAE) of 0.0722, a root mean squared error (RMSE) of 0.1040, a correlation coefficient (r) of 0.9761, and a coefficient of determination (R2) of 0.8801.
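The five error and correlation measures reported above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `regression_metrics` is hypothetical, and the definition of R2 as 1 - SS_res / SS_tot is an assumption, since the text does not state which definition was used. Under that definition R2 is generally not equal to the square of r, which may be why the two reported values differ.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, Pearson r, and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)

    # Pearson correlation coefficient between targets and predictions.
    r = cov / math.sqrt(ss_t * ss_p)
    # Assumed definition: R2 = 1 - SS_res / SS_tot.
    r2 = 1.0 - sum(e * e for e in errors) / ss_t

    return {"mse": mse, "mae": mae, "rmse": rmse, "r": r, "r2": r2}


# Toy coordinates already normalized to the range 0 to 1.
metrics = regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
```

Because RMSE is the square root of MSE, the reported pair (MSE = 0.0108, RMSE = 0.1040) can be checked directly: the square root of 0.0108 is approximately 0.1039.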
T217	28461-28582	Sentence	denotes	A strong correlation was observed between the lesion areas detected by the proposed framework and the clinical indicators
T218	28583-28734	Sentence	denotes	In clinical practice, multiple clinical indicators are analyzed to determine whether further examinations (i.e., medical image examination) are needed.
T219	28735-28810	Sentence	denotes	These indicators can be used to assess the predictive ability of the model.
T220	28811-28912	Sentence	denotes	In addition, various examinations are required to perform an accurate diagnosis in clinical practice.
T221	28913-29003	Sentence	denotes	However, the correlations between the results of various examinations are often not clear.
T222	29004-29323	Sentence	denotes	We used the stage II sub-framework and the regressor block of the CNNRF to conduct a correlation analysis between the lesion areas detected by the framework and five clinical indicators (white blood cell count, neutrophil percentage, lymphocyte percentage, procalcitonin, C-reactive protein) of COVID-19 using the CADS.
T223	29324-29517	Sentence	denotes	The inputs of the CNNRF were the lesion area images of each case, and the output was a 5-dimensional vector describing the correlation between the lesion areas and the five clinical indicators.
T224	29518-29582	Sentence	denotes	The MAE, MSE, RMSE, r, and R2 were used to evaluate the results.
T225	29583-29730	Sentence	denotes	The ST and the Pearson correlation coefficient (PCC)43 were used to determine the correlation between the lesion areas and the clinical indicators.
T226	29731-29842	Sentence	denotes	A strong correlation was obtained, with MSE = 0.0163, MAE = 0.0941, RMSE = 0.1172, r = 0.8274, and R2 = 0.6465.
T227	29843-29936	Sentence	denotes	At a significance level of 0.001, the value of r was 1.27 times the critical value of 0.6524.
T228	29937-30047	Sentence	denotes	This result indicates a high and significant correlation between the lesion areas and the clinical indicators.
T229	30048-30119	Sentence	denotes	The PCC was 0.8274 (range of 0.8–1.0), indicating a strong correlation.
T230	30120-30183	Sentence	denotes	The CNNRF was trained on the CATS and evaluated using the CAVS.
T231	30184-30301	Sentence	denotes	The initial learning rate was 0.01, and the optimization function was the stochastic gradient descent (SGD) method44.
T232	30302-30388	Sentence	denotes	The parameters of the CNNRF were initialized using the Xavier initialization method45.
T233	30390-30400	Sentence	denotes	Discussion
T234	30401-30521	Sentence	denotes	We developed a computer-aided diagnosis method for the identification of COVID-19 in medical images using DL techniques.
T235	30522-30647	Sentence	denotes	Strong correlations were obtained between the lesion areas identified by the proposed CNNRF and the five clinical indicators.
T236	30648-30729	Sentence	denotes	An excellent agreement was observed between the model results and expert opinion.
T237	30730-31014	Sentence	denotes	Popular image annotation tools (e.g., Labelme46 and VOTT47) are used to annotate various images and support common formats, such as Joint Photographic Experts Group (JPG), Portable Network Graphics (PNG), and Tag Image File Format (TIFF); these formats are not used in the DICOM data.
T238	31015-31192	Sentence	denotes	Therefore, we developed an annotation platform that does not require much storage space or transformations and can be deployed on a private cloud for security and local sharing.
T239	31193-31342	Sentence	denotes	Our eyes are not highly sensitive to grayscale images in regions with high average brightness48, resulting in relatively low identification accuracy.
T240	31343-31482	Sentence	denotes	The proposed pseudo-color method increased the information content of the medical images and facilitated the identification of the details.
T241	31483-31582	Sentence	denotes	PCA has been widely used for feature extraction and dimensionality reduction in image processing49.
T242	31583-31647	Sentence	denotes	We used PCA to determine the feature space of the sub-data sets.
T243	31648-31747	Sentence	denotes	Each image in a specified sub-data set was represented as a linear combination of the eigenvectors.
T244	31748-31865	Sentence	denotes	Since the eigenvectors describe the most informative regions in the medical images, they represent each sub-data set.
T245	31866-31953	Sentence	denotes	We visualized the top-five eigenvectors of each sub-data set using an intuitive method.
T246	31954-32105	Sentence	denotes	The CNNCF is a modular framework consisting of two stages that were trained with different optimization goals and controlled by the control gate block.
T247	32106-32283	Sentence	denotes	Each stage consisted of multiple residual blocks (ResBlock-A and ResBlock-B) that retained the features in the different layers, thereby preventing the degradation of the model.
T248	32284-32391	Sentence	denotes	The design of the control gate block was inspired by the synaptic frontend structure in the nervous system.
T249	32392-32500	Sentence	denotes	We calculated the score of the optimization target, and a score above a predefined threshold was acceptable.
T250	32501-32644	Sentence	denotes	If the number of neurotransmitter releases was above another predefined threshold, the control gate was opened to let the feature information pass.
T251	32645-32696	Sentence	denotes	The framework was trained in a step-by-step manner.
T252	32697-32906	Sentence	denotes	Training occurred at each stage for a specified goal, and the second stage used the features extracted by the first stage, thereby reusing the features and increasing the convergence speed of the second stage.
T253	32907-33024	Sentence	denotes	The CNNCF exhibited excellent performance for identifying the COVID-19 cases automatically in the X-data and CT-data.
T254	33025-33212	Sentence	denotes	Unlike traditional machine learning methods, the CNNCF was trained in an end-to-end manner, which ensured the flexibility of the framework for different data sets without much adjustment.
T255	33213-33446	Sentence	denotes	We adopted a knowledge distillation method in the training phase; a small model (called a student network) was trained to mimic the ensemble of multiple models (called teacher networks) to obtain a small model with high performance.
T256	33447-33578	Sentence	denotes	In the distillation process, knowledge was transferred from the teacher networks to the student network to minimize knowledge loss.
T257	33579-33668	Sentence	denotes	The target was the output of the teacher networks; these outputs were called soft labels.
T258	33669-33858	Sentence	denotes	The student network also learned from the ground-truth labels (also called hard labels), thereby minimizing the knowledge loss from the student networks, whose targets were the hard labels.
T259	33859-34005	Sentence	denotes	Therefore, the overall loss function of the student network incorporated both knowledge distillation and knowledge loss from the student networks.
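The combined loss described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the weighting factor `alpha`, the softmax `temperature`, and the averaging of the teacher outputs into one soft label are all assumptions, because the text does not give these details. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits_list, hard_label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a soft-label term and a hard-label term."""
    n_classes = len(student_logits)

    # Soft labels: assumed here to be the mean of the teachers' softened outputs.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    soft_labels = [sum(p[k] for p in teacher_probs) / len(teacher_probs)
                   for k in range(n_classes)]
    soft_loss = cross_entropy(soft_labels, softmax(student_logits, temperature))

    # Hard labels: one-hot encoding of the ground-truth class.
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(n_classes)]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# A student that agrees with both the teachers and the label has a lower loss.
good = distillation_loss([2.0, 0.0], [[2.0, 0.0], [1.5, 0.0]], hard_label=0)
bad = distillation_loss([0.0, 2.0], [[2.0, 0.0], [1.5, 0.0]], hard_label=0)
```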
T260	34006-34249	Sentence	denotes	After the student network had been well-trained, the task of the teacher networks was complete, and the student model could be used on a regular computer with a fast speed, which is suitable for hospitals without extensive computing resources.
T261	34250-34381	Sentence	denotes	As a result of the knowledge distillation method, the CNNCF achieved high performance with only a few parameters in the student network.
T262	34382-34510	Sentence	denotes	The CNNRF is a modular framework consisting of one stage II sub-framework and one regressor block to handle the regression task.
T263	34511-34782	Sentence	denotes	In the regressor block, we used skip connections that consisted of a convolution layer with multiple 1 × 1 convolution kernels for retaining the features extracted by the stage II sub-framework while improving the non-linear representation ability of the regressor block.
T264	34783-34982	Sentence	denotes	We made use of flexible blocks to achieve good performance for the classification and regression tasks, unlike traditional machine learning methods, which are commonly used for either of these tasks.
T265	34983-35133	Sentence	denotes	Five statistical indices, including sensitivity, specificity, precision, kappa coefficient, and F1 were used to evaluate the performance of the CNNCF.
T266	35134-35259	Sentence	denotes	The sensitivity is related to the positive detection rate and is of great significance in the diagnostic testing of COVID-19.
T267	35260-35359	Sentence	denotes	The specificity refers to the ability of the model to correctly identify patients without the disease.
T268	35360-35442	Sentence	denotes	The precision indicates the proportion of the model’s positive predictions that are correct.
T269	35443-35506	Sentence	denotes	The kappa demonstrates the stability of the model’s prediction.
T270	35507-35564	Sentence	denotes	The F1 is the harmonic mean of precision and sensitivity.
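The five indices can be computed from the counts of a binary confusion matrix, as in the following minimal sketch. The function name `classification_metrics` is hypothetical, and treating the kappa coefficient as Cohen's kappa between the predictions and the reference labels is an assumption, since the text does not name the variant.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, F1, and Cohen's kappa from counts."""
    total = tp + fp + tn + fn

    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / total
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Toy counts, not taken from the study.
scores = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```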
T271	35565-35703	Sentence	denotes	Good performance was achieved by the CNNCF based on the five statistical indices for the multi-modal image data sets (X-data and CT-data).
T272	35704-35808	Sentence	denotes	The consistency between the model results and the expert evaluation was determined using McNemar’s test.
T273	35809-35984	Sentence	denotes	The good performance demonstrated the model’s capacity of learning from the experts using the labels of the image data and mimicking the experts in diagnostic decision-making.
T274	35985-36082	Sentence	denotes	The ROC and PRC of the CNNCF were used to evaluate the performance of the classification model50.
T275	36083-36241	Sentence	denotes	The ROC is a probability curve that shows the trade-off between the true positive rate (TPR) and false-positive rate (FPR) using different threshold settings.
T276	36242-36360	Sentence	denotes	The AUROC provides a measure of separability and demonstrates the discriminative capacity of the classification model.
T277	36361-36493	Sentence	denotes	The larger the AUROC, the better the performance of the model is for predicting the true positive (TP) and true negative (TN) cases.
T278	36494-36613	Sentence	denotes	The PRC shows the trade-off between the TPR and the positive predictive value (PPV) using different threshold settings.
T279	36614-36700	Sentence	denotes	The larger the AUPRC, the higher the capacity of the model is to predict the TP cases.
T280	36701-36815	Sentence	denotes	In our experiments, the CNNCF achieved high scores for both the AUPRC and AUROC (>99%) for the X-data and CT-data.
T281	36816-36942	Sentence	denotes	DL has made significant progress in numerous areas in recent years and has provided best-performance solutions for many tasks.
T282	36943-37126	Sentence	denotes	In areas that require high interpretability, such as autonomous driving and medical diagnosis, DL has disadvantages because it is a black-box approach and lacks good interpretability.
T283	37127-37314	Sentence	denotes	The strong correlation obtained between the CNNCF output and the experts’ evaluation suggested that the mechanism of the proposed CNNCF is similar to that used by humans analyzing images.
T284	37315-37479	Sentence	denotes	The combination of the visual interpretation and the correlation analysis enhanced the ability of the framework to interpret the results, making it highly reliable.
T285	37480-37606	Sentence	denotes	The CNNCF has a promising potential for clinical diagnosis considering its high performance and hybrid interpretation ability.
T286	37607-38039	Sentence	denotes	We have explored the potential use of the CNNCF for clinical diagnosis with the support of the Beijing Youan hospital (which is an authoritative hospital for the study of infectious diseases and one of the designated hospitals for COVID-19 treatment) using both real data after privacy masking and input from experts under experimental conditions and provided a suitable schedule for assisting experts with the radiography analysis.
T287	38040-38125	Sentence	denotes	However, medical diagnosis in a real situation is more complex than in an experiment.
T288	38126-38325	Sentence	denotes	Therefore, further studies will be conducted in different hospitals with different complexities and uncertainties to obtain more experience in multiple clinical use cases with the proposed framework.
T289	38326-38520	Sentence	denotes	The objective of this study was to use statistical methods to analyze the relationship between salient features in images and expert evaluations and test the discriminative ability of the model.
T290	38521-38716	Sentence	denotes	The CNNRF can be considered a cross-modal prediction model, which is a challenging research area that requires more attention because it is closely related to associative thinking and creativity.
T291	38717-38879	Sentence	denotes	In addition, the correlation analysis might be a possible optimization direction to improve the interpretability performance of the classification model using DL.
T292	38880-39104	Sentence	denotes	In conclusion, we proposed a complete framework for the computer-aided diagnosis of COVID-19, including data annotation, data preprocessing, model design, correlation analysis, and assessment of the model’s interpretability.
T293	39105-39244	Sentence	denotes	We developed a pseudo-color tool to convert the grayscale medical images to color images to facilitate image interpretation by the experts.
T294	39245-39371	Sentence	denotes	We developed a platform for the annotation of medical images characterized by high security, local sharing, and expandability.
T295	39372-39507	Sentence	denotes	We designed a simple data preprocessing method for converting multiple types of images (X-data, CT-data) to three-channel color images.
T296	39508-39675	Sentence	denotes	We established a modular CNN-based classification framework with high flexibility and wide use cases, consisting of the ResBlock-A, ResBlock-B, and Control Gate Block.
T297	39676-39835	Sentence	denotes	A knowledge distillation method was used as a training strategy for the proposed classification framework to ensure high performance with fast inference speed.
T298	39836-40083	Sentence	denotes	A CNN-based regression framework that required minimal changes to the architecture of the classification framework was employed to determine the correlation between the lesion area images of patients with COVID-19 and the five clinical indicators.
T299	40084-40303	Sentence	denotes	The three evaluation indices (F1, kappa, specificity) of the classification framework were similar to those of the respiratory resident and the emergency resident and slightly higher than that of the respiratory intern.
T300	40304-40433	Sentence	denotes	We visualized the salient features that contributed most to the CNNCF output in a heatmap for easy interpretability of the CNNCF.
T301	40434-40613	Sentence	denotes	The proposed CNNCF computer-aided diagnosis method showed relatively high precision and has a potential for the automatic diagnosis of COVID-19 in clinical practice in the future.
T302	40614-40723	Sentence	denotes	The outbreak of the COVID-19 epidemic poses serious threats to the safety and health of the human population.
T303	40724-40867	Sentence	denotes	At present, popular methods for the diagnosis and monitoring of viruses include the detection of viral RNAs using PCR or a test for antibodies.
T304	40868-41037	Sentence	denotes	However, one negative result of the RT-PCR test (especially in the areas of high infection risk) might not be enough to rule out the possibility of a COVID-19 infection.
T305	41038-41158	Sentence	denotes	On June 14, 2020, the Beijing Municipal Health Commission declared that strict management of fever clinics was required.
T306	41159-41425	Sentence	denotes	All medical institutions in Beijing were required to conduct tests to detect COVID-19 nucleic acids and antibodies, CT examinations, and a routine blood test (also referred to as “1 + 3 tests”) for patients with fever who live in areas with high infection risk51.
T307	41426-41627	Sentence	denotes	Therefore, the proposed computer-aided diagnosis using medical imaging could be used as an auxiliary diagnosis tool to help physicians identify people with high infection risk in the clinical workflow.
T308	41628-41703	Sentence	denotes	There is also a potential for broader applicability of the proposed method.
T309	41704-41857	Sentence	denotes	Once the method has been improved, it might be used in other diagnostic decision-making scenarios (lung cancer, liver cancer, etc.) using medical images.
T310	41858-41943	Sentence	denotes	The expertise of a specialist will be required in clinical cases in future scenarios.
T311	41944-42105	Sentence	denotes	However, we are optimistic about the potential of using DL methods in intelligent medicine and expect that many people will benefit from the advanced technology.
T312	42107-42114	Sentence	denotes	Methods
T313	42116-42135	Sentence	denotes	Data sets splitting
T314	42136-42312	Sentence	denotes	We used the multi-modal data sets from four public data sets and one hospital (Youan hospital) in our research and split the hybrid data set in the following manner. For X-data:
T315	42313-42455	Sentence	denotes	The CXR images of COVID-19 cases collected from the public CCD52 contained 212 patients diagnosed with COVID-19 and were resized to 512 × 512.
T316	42456-42529	Sentence	denotes	Each image contained 1–2 suspected areas with inflammatory lesions (SAs).
T317	42530-42629	Sentence	denotes	We also collected 5100 normal cases and 3100 pneumonia cases from another public data set (RSNA)53.
T318	42630-42851	Sentence	denotes	In addition, the CXR images collected from the Youan hospital contained 45 cases diagnosed with COVID-19, 503 normal cases, 435 cases diagnosed with pneumonia (not COVID-19 patients), and 145 cases diagnosed as influenza.
T319	42852-42958	Sentence	denotes	The CXR images collected from the Youan hospital were obtained using the Carestream DRX-Revolution system.
T320	42959-43076	Sentence	denotes	All the CXR images of COVID-19 cases were analyzed by the two experienced radiologists to determine the lesion areas.
T321	43077-43256	Sentence	denotes	The X-data of the normal cases (XNPDS), that of the pneumonia cases (XPPDS), and that of the COVID-19 cases (XCPDS) from public data sets constituted the X public data set (XPDS).
T322	43257-43440	Sentence	denotes	The X-data of the normal cases (XNHDS), that of the pneumonia cases (XPHDS), and that of the COVID-19 cases (XCHDS) from the Youan hospital constituted the X hospital data set (XHDS).
T323	43441-43453	Sentence	denotes	For CT-data:
T324	43454-43936	Sentence	denotes	We collected CT-data of 120 normal cases from a public lung CT-data set (LUNA16, a large data set for automatic nodule detection in the lungs54), which is a subset of LIDC-IDRI (the LIDC-IDRI contains a total of 1018 helical thoracic CT scans collected in collaboration with eight medical imaging companies: AGFA Healthcare, Carestream Health, Inc., Fuji Photo Film Co., GE Healthcare, iCAD, Inc., Philips Healthcare, Riverain Medical, and Siemens Medical Solutions)55.
T325	43937-44102	Sentence	denotes	It was confirmed by the two experienced radiologists from the Youan Hospital that no lesion areas of COVID-19, pneumonia, or influenza were present in the 120 cases.
T326	44103-44245	Sentence	denotes	We also collected the CT-data of pneumonia cases from a public data set (images of COVID-19 positive and negative pneumonia patients: ICNP)56.
T327	44246-44418	Sentence	denotes	The CT-data collected from the Youan hospital contained 95 patients diagnosed with COVID-19, 50 patients diagnosed with influenza and 215 patients diagnosed with pneumonia.
T328	44419-44587	Sentence	denotes	The images of the CT scans collected from the Youan hospital were obtained using the PHILIPS Brilliance iCT 256 system (which was also used for the LIDC-IDRI data set).
T329	44588-44701	Sentence	denotes	The slice thickness of the CT scans was 5 mm, and the CT-data images were grayscale images with 512 × 512 pixels.
T330	44702-44892	Sentence	denotes	Areas with 2–5 SAs were annotated by the two experienced radiologists using a rapid keystroke-entry format in the images for each case, and these areas ranged from 16 × 16 to 64 × 64 pixels.
T331	44893-45044	Sentence	denotes	The CT-data of the normal cases (CTNPDS) and that of the pneumonia cases (CTPPDS) from the public data sets constituted the CT public data set (CTPDS).
T332	45045-45289	Sentence	denotes	The CT-data of the COVID-19 cases from the Youan hospital (CTCHDS), the influenza cases from the Youan hospital (CTIHDS), and the normal cases from the Youan hospital (CTNHDS) constituted the CT hospital (clinically-diagnosed) data set (CTHDS).
T333	45290-45318	Sentence	denotes	For clinical indicator data:
T334	45319-45545	Sentence	denotes	Five clinical indicators (white blood cell count, neutrophil percentage, lymphocyte percentage, procalcitonin, C-reactive protein) of 95 COVID-19 cases were obtained from the Youan hospital, as shown in Supplementary Table 20.
T335	45546-45802	Sentence	denotes	A total of 95 data pairs from the 95 COVID-19 cases (369 images of the lesion area and the 95 × 5 clinical indicators) were collected from the Youan hospital for the correlation analysis of the lesion areas of the COVID-19 and the five clinical indicators.
T336	45803-45910	Sentence	denotes	The images of the SAs and the clinical indicator data constituted the correlation analysis data set (CADS).
T337	45911-46030	Sentence	denotes	We split the XPDS, XHDS, CTPDS, CTHDS, and CADS into the training-validation (train-val) and test data sets using TTSF.
T338	46031-46137	Sentence	denotes	The details of the hybrid data sets for the public data sets and Youan hospital data are shown in Table 1.
T339	46138-46225	Sentence	denotes	The train-val part of CTHDS is referred to as CTHTS, and the test part is called CTHVS.
T340	46226-46367	Sentence	denotes	The same naming scheme was adopted for XPDS, XHDS, CTPDS, and CADS, i.e., XPTS, XPVS, XHTS, XHVS, CTPTS, CTPVS, CATS, and CAVS, respectively.
T341	46368-46552	Sentence	denotes	The training-validation parts of the four public data sets and the hospital (Youan hospital) data set were mixed for the X-data and CT-data and named XMTS and CTMTS, respectively.
T342	46553-46626	Sentence	denotes	The test parts were mixed in the same way and named XMVS and CTMVS.
T343	46628-46647	Sentence	denotes	Image preprocessing
T344	46648-46830	Sentence	denotes	All image data (X-data and CT-data) in the DICOM format were loaded using the Pydicom library (version 1.4.0) and processed as arrays using the Numpy library (version 1.16.0). X-data:
T345	46831-47014	Sentence	denotes	The two-dimensional array (x axis and y axis) of the image of the X-data (size of 512 × 512) was normalized to pixel values of 0–255 and stored in png format using the OpenCV library.
T346	47015-47083	Sentence	denotes	Each preprocessed image was resized to 512 × 512 and had 3 channels.
T347	47084-47092	Sentence	denotes	CT-data:
T348	47093-47254	Sentence	denotes	The array of the CT-data was three-dimensional (x axis, y axis, and z axis), and the length of the z axis was ~300, which represented the number of image slices.
T349	47255-47331	Sentence	denotes	Each image slice was two-dimensional (x axis and y axis, size of 512 × 512).
T350	47332-47519	Sentence	denotes	As shown in Fig. 1b, the array of the image was divided into three groups in the z axis direction, and each group contained 100 image slices (each case was resampled to 300 image slices).
T351	47520-47650	Sentence	denotes	The image slices in each group were processed using a window center of −600 and a window width of 2000 to extract the lung tissue.
T352	47651-47789	Sentence	denotes	The images of the CT-data with 300 image slices were normalized to pixel values of 0–255 and stored in npy format using the Numpy library.
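The windowing and rescaling steps above can be sketched for a list of Hounsfield-unit values as follows. This is a simplified illustration: the function name `apply_window` is hypothetical, and mapping the window bounds linearly onto 0–255 is an assumption, since the text does not state how the normalization was done. With array data, the same arithmetic would normally be vectorized.

```python
def apply_window(hu_values, center=-600.0, width=2000.0):
    """Clip values to a CT window and rescale them linearly to 0-255."""
    low = center - width / 2.0    # -1600 for the window described in the text
    high = center + width / 2.0   # 400 for the window described in the text

    pixels = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)
        # Assumed normalization: window bounds map to 0 and 255.
        pixels.append(round((clipped - low) / (high - low) * 255.0))
    return pixels


# Values below or above the window saturate at 0 or 255.
pixels = apply_window([-2000.0, -1600.0, -1200.0, 400.0, 1000.0])
```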
T353	47790-47999	Sentence	denotes	A convolution layer with three 1 × 1 convolution kernels, which is a trainable layer intended to normalize the input, was applied to preprocess the CT-data; the resulting image size was 512 × 512, with 3 channels.
T354	48001-48035	Sentence	denotes	Annotation tool for medical images
T355	48036-48161	Sentence	denotes	The server program of the annotation tool was deployed in a computer with large network bandwidth and abundant storage space.
T356	48162-48297	Sentence	denotes	The client program of the annotation tool was deployed in the office computer of the experts, who were given unique user IDs for login.
T357	48298-48458	Sentence	denotes	The interface of the client program had a built-in image viewer with a window size of 512 × 512 and an export tool for obtaining the annotations in text format.
T358	48459-48682	Sentence	denotes	Multiple drawing tools were provided to annotate the lesion area in the images, including a rectangle tool for drawing a bounding box around the target, a polygon tool for outlining the target, and a circle tool for circling the target.
T359	48683-48753	Sentence	denotes	Multiple categories could be defined and assigned to the target areas.
T360	48754-48980	Sentence	denotes	All annotations were stored in a structured query language (SQL) database, and the export tool was used to export the annotations to two common file formats (comma-separated values (csv) and JavaScript object notation (json)).
T361	48981-49028	Sentence	denotes	The experts could share the annotation results.
T362	49029-49166	Sentence	denotes	Since the size of the X-data and the CT slice-data were identical, the annotations for both data were performed with the annotation tool.
T363	49167-49262	Sentence	denotes	Here we use one image slice of the CT-data as an example to demonstrate the annotation process.
T364	49263-49332	Sentence	denotes	In this study, two experts were asked to annotate the medical images.
T365	49333-49393	Sentence	denotes	The normal cases were reviewed and confirmed by the experts.
T366	49394-49488	Sentence	denotes	The abnormal cases, including the COVID-19 and influenza cases, were annotated by the experts.
T367	49489-49579	Sentence	denotes	Bounding boxes of the lesion areas in the images were annotated using the annotation tool.
T368	49580-49640	Sentence	denotes	In general, each case contained 2–5 slices with annotations.
T369	49641-49784	Sentence	denotes	The cases with the annotated slices were considered positive cases, and each case was assigned to a category (COVID-19 case or influenza case).
T370	49785-49850	Sentence	denotes	The pipeline of the annotation is shown in Supplementary Fig. 1.
T371	49852-49883	Sentence	denotes	Model architecture and training
T372	49884-50115	Sentence	denotes	In this study, we proposed a modular CNNCF to identify the COVID-19 cases in the medical images and a CNNRF to determine the relationships between the lesion areas in the medical images and the five clinical indicators of COVID-19.
T373	50116-50192	Sentence	denotes	Both proposed frameworks shared two units (ResBlock-A and ResBlock-B).
T374	50193-50295	Sentence	denotes	The CNNCF and CNNRF had unique units, namely the control gate block and regressor block, respectively.
T375	50296-50421	Sentence	denotes	Both frameworks were implemented using two NVIDIA GTX 1080TI graphics cards and the open-source PyTorch framework. ResBlock-A:
T376	50422-50584	Sentence	denotes	As discussed in ref. 57, the residual block is a CNN-based block that allows the CNN models to reuse features, thus accelerating the training speed of the models.
T378	50585-50745	Sentence	denotes	In this study, we developed a residual block (ResBlock-A) that utilized a skip-connection for retaining features in different layers in the forward propagation.
T379	50746-50958	Sentence	denotes	This block (Fig. 6a) consisted of a multiple-input multiple-output structure with two branches (an upper branch and a bottom branch), where input 1 and input 2 had the same size, although their values could differ.
T380	50959-51052	Sentence	denotes	Likewise, output 1 and output 2 had the same size, but output 1 was taken before the ReLu layer.
T381	51053-51180	Sentence	denotes	The upper branch consisted of a max-pooling layer (Max-Pooling), a convolution layer (Conv 1 × 1), and a batch norm layer (BN).
T382	51181-51415	Sentence	denotes	The max-pooling layer had a kernel size of 3 × 3 and a stride of 2; it downsampled input 1 while retaining its features, so that the result had the same size as the bottom-branch output before the element-wise add operation.
T383	51416-51594	Sentence	denotes	The Conv 1 × 1 layer had as many 1 × 1 convolution kernels as the second convolution layer in the bottom branch, so that the number of channels in the two branches matched.
T384	51595-51744	Sentence	denotes	The BN layer normalized the input to each layer of the model so that it had a mean of 0 and a variance of 1.
T385	51745-51835	Sentence	denotes	The bottom branch consisted of two convolution layers, two BN layers, and two ReLu layers.
T386	51836-52044	Sentence	denotes	The first convolution layer in the bottom branch consisted of multiple 3 × 3 convolution kernels with a stride of 2 and a padding of 1, which reduced the size of the feature maps while extracting local features.
T387	52045-52181	Sentence	denotes	The second convolution layer in the bottom branch consisted of multiple 3 × 3 convolution kernels with a stride of 1 and a padding of 1.
T388	52182-52301	Sentence	denotes	The ReLu function was used as the activation function to ensure a non-linear relationship between the different layers.
T389	52302-52436	Sentence	denotes	The output of the upper branch and the output of the bottom branch after the second BN were fused using an element-wise add operation.
T390	52437-52523	Sentence	denotes	The fused result was output 1, and the fused result after the ReLu layer was output 2.
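The size bookkeeping of the two ResBlock-A branches can be checked with the standard output-size formula for convolution and pooling layers. The sketch below is illustrative only: the function name `out_size` and the 224-pixel input are assumptions, and the padding of the max-pooling layer is not stated in the text, so a padding of 1 is assumed here because it is the value that makes the two branches agree before the element-wise add.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


n = 224  # hypothetical input height/width, not taken from the paper

# Bottom branch: 3x3 conv (stride 2, padding 1), then 3x3 conv (stride 1, padding 1).
bottom = out_size(out_size(n, 3, 2, 1), 3, 1, 1)

# Upper branch: 3x3 max-pooling with stride 2, then a 1x1 conv that keeps the size.
upper_padded = out_size(out_size(n, 3, 2, 1), 1, 1, 0)    # assumed pooling padding of 1
upper_unpadded = out_size(out_size(n, 3, 2, 0), 1, 1, 0)  # pooling without padding

print(bottom, upper_padded, upper_unpadded)
```

With these assumptions, the padded pooling branch matches the bottom branch, whereas the unpadded variant is one pixel smaller and could not be added element-wise.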
T391	52524-52572	Sentence	denotes	Fig. 6 The four units of the proposed framework.
T392	52573-53253	Sentence	denotes	a ResBlock-A architecture, containing two convolution layers with 3 × 3 kernels, one convolution layer with a 1 × 1 kernel, three batch normalization layers, two ReLu layers, and one max-pooling layer with a 3 × 3 kernel. b ResBlock-B architecture; the basic unit is the same as the ResBlock-A, except for output 1. c The Control Gate Block has a synaptic-based frontend architecture that controls the direction of the feature map flow and the overall optimization direction of the framework. d The Regressor architecture is a skip-connection architecture containing one convolution layer with 3 × 3 kernels, one batch normalization layer, one ReLu layer, and three linear layers.
T393	53254-53265	Sentence	denotes	ResBlock-B:
T394	53266-53402	Sentence	denotes	The ResBlock-B (Fig. 6b) was a multiple-input single-output block that was similar to the ResBlock-A, except that there was no output 1.
T395	53403-53553	Sentence	denotes	The value of the stride and padding in each layer of the ResBlock-A and ResBlock-B could be adjusted using hyper-parameters based on the requirements.
T396	53554-53573	Sentence	denotes	Control Gate Block:
T397	53574-53827	Sentence	denotes	As shown in Fig. 6c, the Control Gate Block was a multiple-input single-output block consisting of a predictor module, a counter module, and a synapses module to control the optimization direction while controlling the information flow in the framework.
T398	53828-53952	Sentence	denotes	The pipeline of the predictor module is shown in Supplementary Fig. 19a, where the Input S1 is the output of the ResBlock-B.
T399	53953-54054	Sentence	denotes	The Input S1 was then flattened to a one-dimensional feature vector as the input of the linear layer.
T400	54055-54161	Sentence	denotes	The output of the linear layer was converted to a probability of each category using the softmax function.
T401	54162-54310	Sentence	denotes	A sensitivity calculator took Vpred and Vtrue as inputs, computed the TP, TN, FP, and false-negative (FN) values, and derived the sensitivity from them.
T402	54311-54410	Sentence	denotes	The sensitivity calculation was followed by a step function to control the output of the predictor.
T403	54411-54557	Sentence	denotes	Here, ths was a threshold value; if the calculated sensitivity was greater than or equal to ths, the step function output 1; otherwise, the output was 0.
T404	54558-54639	Sentence	denotes	The counter module was a conditional counter, as shown in Supplementary Fig. 19b.
T405	54640-54705	Sentence	denotes	If the input n was zero, the counter was cleared and set to zero.
T406	54706-54744	Sentence	denotes	Otherwise, the counter increased by 1.
T407	54745-54779	Sentence	denotes	The output of the counter was num.
T408	54780-54929	Sentence	denotes	The synapses block mimicked the synaptic structure, and the input variable num was similar to a neurotransmitter, as shown in Supplementary Fig. 19c.
T409	54930-54989	Sentence	denotes	The input num was the input parameter of the step function.
T410	54990-55118	Sentence	denotes	Here, ths was a threshold value; if the input num was greater than or equal to ths, the step function output 1; otherwise, it output 0.
T411	55119-55222	Sentence	denotes	An element-wise multiplication was performed between the input S1 and the output of the synapses block.
T412	55223-55278	Sentence	denotes	The multiplied result was passed on to a discriminator.
T413	55279-55379	Sentence	denotes	If the sum of the elements of the result was not zero, the Input S1 was passed on to the next layer.
T414	55380-55434	Sentence	denotes	Otherwise, the input S1 information was not passed on.
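The predictor, counter, and synapses logic described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ControlGate`, the binary labelling with 1 as the positive class, and the two threshold values in the usage note are hypothetical, and the feature map S1 is represented as a flat list.

```python
def step(value, threshold):
    """Step function: 1 if value >= threshold, otherwise 0."""
    return 1 if value >= threshold else 0


def sensitivity(v_pred, v_true, positive=1):
    """Sensitivity = TP / (TP + FN) for the class labelled `positive`."""
    tp = sum(1 for p, t in zip(v_pred, v_true) if t == positive and p == positive)
    fn = sum(1 for p, t in zip(v_pred, v_true) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


class ControlGate:
    """Hypothetical container for the predictor, counter, and synapses modules."""

    def __init__(self, sens_threshold, count_threshold):
        self.sens_threshold = sens_threshold    # ths of the predictor module
        self.count_threshold = count_threshold  # ths of the synapses module
        self.num = 0                            # counter state

    def update(self, v_pred, v_true):
        # Predictor: 1 when the sensitivity reaches the threshold, else 0.
        n = step(sensitivity(v_pred, v_true), self.sens_threshold)
        # Counter: cleared when n is zero, otherwise increased by 1.
        self.num = 0 if n == 0 else self.num + 1
        # Synapses: the gate opens once the counter reaches its threshold.
        return step(self.num, self.count_threshold)

    def forward(self, s1, gate):
        # Element-wise multiplication, then the discriminator on the sum.
        gated = [x * gate for x in s1]
        return s1 if sum(gated) != 0 else None
```

For example, with `ControlGate(0.9, 2)` the gate stays closed after the first batch that reaches the sensitivity threshold, opens after the second consecutive one, and closes again as soon as a batch falls below the threshold.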
T415	55435-55451	Sentence	denotes	Regressor block:
T416	55452-55580	Sentence	denotes	The regressor block consisted of multiple linear layers, a convolution layer, a BN layer, and a ReLu layer, as shown in Fig. 6d.
T417	55581-55723	Sentence	denotes	A skip-connection architecture was adopted to retain the features and increase the ability of the block to represent non-linear relationships.
T418	55724-55854	Sentence	denotes	The convolution block in the skip-connection structure was a convolution layer with multiple 1 × 1 convolution kernels.
T419	55855-56010	Sentence	denotes	The number of convolution kernels equaled the output size of the second linear layer, which kept the vector dimensions consistent.
T420	56011-56112	Sentence	denotes	The input size and output size of each linear layer were adjustable to be applicable to actual cases.
T421	56113-56255	Sentence	denotes	Based on the four blocks, two frameworks were designed for the classification task and regression task, respectively. Classification framework:
T422	56256-56321	Sentence	denotes	The CNNCF consisted of stage I and stage II, as shown in Fig. 3a.
T423	56322-56393	Sentence	denotes	Stage I was duplicated Q times in the framework (in this study, Q = 1).
T424	56394-56516	Sentence	denotes	It consisted of M ResBlock-A units (in this study, M = 2), one ResBlock-B, and one Control Gate Block.
T425	56517-56620	Sentence	denotes	Stage II consisted of N ResBlock-A units (in this study, N = 2) and one ResBlock-B.
T426	56621-56767	Sentence	denotes	The weighted cross-entropy loss function was used and was minimized using the SGD optimizer with a learning rate of a1 (in this study, a1 = 0.01).
T427	56768-57018	Sentence	denotes	A warm-up strategy58 was used in the initialization of the learning rate for a smooth training start, and a reduction factor of b1 (in this study, b1 = 0.1) was used to reduce the learning rate after every c1 (in this study, c1 = 10) training epochs.
T428	57019-57157	Sentence	denotes	The model was trained for d1 (in this study, d1 = 40) epochs, and the model parameters saved in the last epoch were used in the test phase.
T429	57158-57179	Sentence	denotes	Regression framework:
T430	57180-57252	Sentence	denotes	The CNNRF (Fig. 3b) consisted of two parts (stage II and the regressor).
T431	57253-57497	Sentence	denotes	The inputs to the regression framework were the images of the lesion areas, and the output was the corresponding vector with five dimensions, representing the five clinical indicators (all clinical indicators were normalized to a range of 0–1).
T432	57498-57602	Sentence	denotes	The stage II structure was the same as that in the classification framework, except for some parameters.
T433	57603-57746	Sentence	denotes	The loss function was the MSE loss function, which was minimized using the SGD optimizer with a learning rate of a2 (in this study, a2 = 0.01).
T434	57747-57995	Sentence	denotes	A warm-up strategy was used in the initialization of the learning rate for a smooth training start, and a reduction factor of b2 (in this study, b2 = 0.1) was used to reduce the learning rate after every c2 (in this study, c2 = 50) training epochs.
T435	57996-58140	Sentence	denotes	The framework was trained for d2 (in this study, d2 = 200) epochs, and the model parameters saved in the last epoch were used in the test phase.
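The two learning-rate schedules described above (warm-up, then a reduction by a fixed factor every fixed number of epochs) can be written as a single function. This is a sketch under assumptions: the text does not give the length or the shape of the warm-up, so a linear warm-up over a hypothetical `warmup_epochs = 5` is used here, and the function name `learning_rate` is illustrative.

```python
def learning_rate(epoch, base_lr=0.01, factor=0.1, step_size=10, warmup_epochs=5):
    """Learning rate for a 0-indexed epoch: linear warm-up, then step decay.

    warmup_epochs and the linear ramp are assumptions; base_lr, factor, and
    step_size correspond to a, b, and c in the text.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * factor ** (epoch // step_size)


# Classification framework: a1 = 0.01, b1 = 0.1, c1 = 10, d1 = 40 epochs.
classification = [learning_rate(e) for e in range(40)]

# Regression framework: a2 = 0.01, b2 = 0.1, c2 = 50, d2 = 200 epochs.
regression = [learning_rate(e, step_size=50) for e in range(200)]
```

Under this reading, both schedules pass through three reductions, so the final epochs of either run use a learning rate of roughly 1e-5.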
T436	58141-58186	Sentence	denotes	The workflow of the classification framework.
T437	58187-58260	Sentence	denotes	The workflow of the classification framework is shown in Fig. 3c.
T438	58261-58389	Sentence	denotes	The preprocessed images are sent to the first convolution block to expand the channels and processed as the input for the CNNCF.
T439	58390-58532	Sentence	denotes	Given the input Fi with a size of M × N × 64, stage I outputs the feature maps F′i with a size of M/8 × N/8 × 256 in the default configuration.
T440	58533-58672	Sentence	denotes	As we introduced above, the Control Gate Block controls the optimization direction while controlling the information flow in the framework.
T441	58673-58755	Sentence	denotes	If the Control Gate Block is open, the feature maps F′i are passed on to stage II.
T442	58756-59061	Sentence	denotes	Given the input F′i, stage II outputs the feature maps F″i with a size of M/64 × N/64 × 512, which are defined as follows: $F'_i = S_1(F_i)$, $F''_i = S_2(F'_i) \otimes \mathrm{CGB}(F'_i)$ (1), where S1 denotes the stage I block, S2 denotes the stage II block, CGB is the Control Gate Block, and ⊗ is the element-wise multiplication operation.
T443	59062-59218	Sentence	denotes	Stage II is followed by a global average pooling layer (GAP) and a fully connected layer (FC layer) with a softmax function to generate the final predictions.
T444	59219-59309	Sentence	denotes	Given F″i as input, the GAP is adopted to generate a vector Vf with a size of 1 × 1 × 512.
T445	59310-59677	Sentence	denotes	Given Vf as input, the FC layer with the softmax function outputs a vector Vc with a size of 1 × 1 × C: $V_f = \mathrm{GAP}(F''_i)$, $V_c = \mathrm{SMax}(\mathrm{FC}(V_f))$ (2), where GAP is the global average pooling layer, FC is the fully connected layer, SMax is the softmax function, Vf is the feature vector generated by the GAP, Vc is the prediction vector, and C is the number of case types used in this study.
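The sizes quoted in this workflow can be traced with a short shape calculation, followed by the GAP and softmax steps of Eq. (2). The sketch rests on one reading that is consistent with the stated sizes but not spelled out in the text: each stage contains three residual blocks (two ResBlock-A and one ResBlock-B), and each block halves the height and width. The function names and the 512 × 512 input are hypothetical, and the FC layer is omitted.

```python
import math


def stage_shape(height, width, channels_out, n_blocks=3):
    """Shape after a stage whose n_blocks residual blocks each halve H and W."""
    return height // 2 ** n_blocks, width // 2 ** n_blocks, channels_out


def global_average_pool(feature_maps):
    """Average every channel (an H x W nested list) down to a single number."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


stage1 = stage_shape(512, 512, 256)              # M/8 x N/8 x 256
stage2 = stage_shape(stage1[0], stage1[1], 512)  # M/64 x N/64 x 512

# Two toy 2 x 2 channels standing in for the stage II output.
v_f = global_average_pool([[[1, 3], [5, 7]], [[2, 2], [2, 2]]])
v_c = softmax(v_f)  # in the framework an FC layer sits between GAP and softmax
```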
T446	59679-59756	Sentence	denotes	Training strategies and evaluation indicators of the classification framework
T447	59757-59850	Sentence	denotes	The training strategies and hyper-parameters of the classification framework were as follows.
T448	59851-60029	Sentence	denotes	We adopted a knowledge distillation method (Fig. 7) to train the CNNCF as a student network with one stage I block and one stage II block, each of which contained two ResBlock-A.
T449	60030-60234	Sentence	denotes	Four teacher networks (the hyper-parameters are provided in Supplementary Table 21) with the proposed blocks were trained on the train-val part of each sub-data set using a 5-fold cross-validation method.
T450	60235-60304	Sentence	denotes	All networks were initialized using the Xavier initialization method.
T451	60305-60383	Sentence	denotes	The initial learning rate was 0.01, and the optimization function was the SGD.
T452	60384-60494	Sentence	denotes	The CNNCF was trained using the image data and the label, as well as the fused output of the teacher networks.
T453	60495-60617	Sentence	denotes	A comparison of the RT-PCR test results from throat specimens with the CNNCF results is provided in Supplementary Table 22.
T454	60618-60695	Sentence	denotes	Supplementary Fig. 20 shows the details of the knowledge distillation method.
T455	60696-60812	Sentence	denotes	The definitions and details of the five evaluation indicators used in this study were given in Supplementary Note 2.
T456	60813-60912	Sentence	denotes	Fig. 7 Knowledge distillation consisting of multiple teacher networks and a target student network.
T457	60913-61013	Sentence	denotes	The knowledge is transferred from the teacher networks to the student network using a loss function.
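A distillation loss of the kind summarized in Fig. 7 can be sketched as a weighted sum of a label term and a teacher term. The details here are assumptions rather than the method reported in the paper: the teacher outputs are fused by simple averaging, the mixing weight `alpha` is hypothetical, and the class weights of the weighted cross-entropy loss are left out for brevity.

```python
import math


def fuse_teachers(teacher_probs):
    """Average the class-probability vectors of several teachers (assumed fusion)."""
    n = len(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [sum(p[c] for p in teacher_probs) / n for c in range(n_classes)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(student_probs, label, teacher_probs, alpha=0.5):
    """alpha * CE(label, student) + (1 - alpha) * CE(fused teachers, student)."""
    one_hot = [1.0 if c == label else 0.0 for c in range(len(student_probs))]
    hard = cross_entropy(one_hot, student_probs)
    soft = cross_entropy(fuse_teachers(teacher_probs), student_probs)
    return alpha * hard + (1 - alpha) * soft
```

A student whose output agrees with both the label and the fused teachers receives a lower loss than one that disagrees with them, which is the signal that transfers knowledge from the teachers to the student.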
T458	61015-61054	Sentence	denotes	Gradient-weighted class activation maps
T459	61055-61198	Sentence	denotes	Grad-CAM59 in the PyTorch framework was used to visualize the salient features that contributed the most to the prediction output of the model.
T460	61199-61445	Sentence	denotes	Given a target category, the Grad-CAM performed back-propagation to obtain the final CNN feature maps and the gradient of the feature maps; only pixels with positive contributions to the specified category were retained through the ReLU function.
T461	61446-61645	Sentence	denotes	The Grad-CAM method was used for all test data sets (X-data and CT-data) in the CNNCF without changing the framework structure to obtain a visual output of the framework’s high discriminatory ability.
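The Grad-CAM computation described above reduces to a gradient-weighted sum of feature maps followed by a ReLU. The sketch below works on plain nested lists so that it is self-contained; the function name `grad_cam` is illustrative, and in practice the activations and gradients would be obtained from the last convolutional layer of the trained network during back-propagation.

```python
def grad_cam(activations, gradients):
    """Gradient-weighted class activation map.

    activations, gradients: lists of K feature maps, each an H x W nested list,
    where gradients holds the derivative of the target class score with respect
    to the corresponding activation.
    """
    heatmap = [[0.0] * len(activations[0][0]) for _ in activations[0]]
    for act, grad in zip(activations, gradients):
        # Channel weight: spatial average of the gradient for this feature map.
        weight = sum(map(sum, grad)) / (len(grad) * len(grad[0]))
        for i, row in enumerate(act):
            for j, value in enumerate(row):
                heatmap[i][j] += weight * value
    # ReLU keeps only locations with a positive contribution to the target class.
    return [[max(0.0, v) for v in row] for row in heatmap]
```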
T462	61647-61677	Sentence	denotes	Statistics and reproducibility
T463	61678-61796	Sentence	denotes	We used multiple statistical indices and empirical distributions to assess the performance of the proposed frameworks.
T464	61797-61956	Sentence	denotes	The equations of the statistical indices are shown in Supplementary Fig. 21 and all the abbreviations used in this study are defined in Supplementary Table 23.
T465	61957-62085	Sentence	denotes	All participants whose data were used in this study met the following criteria: (1) signed informed consent prior to enrollment and (2) at least 18 years old.
T466	62086-62217	Sentence	denotes	This study was conducted following the Declaration of Helsinki and was approved by the Capital Medical University Ethics Committee.
T467	62218-62419	Sentence	denotes	The following statistical analyses were conducted to evaluate both the classification framework and the regression framework. Statistical indices to evaluate the classification framework.
T468	62420-62647	Sentence	denotes	Multiple evaluation indicators (PRC, ROC, AUPRC, AUROC, sensitivity, specificity, precision, kappa index, and F1 with a fixed threshold) were computed for a comprehensive and accurate assessment of the classification framework.
T469	62648-62764	Sentence	denotes	Threshold values from 0 to 1 in steps of 0.005 were used to obtain the ROC and PRC curves.
T470	62765-62931	Sentence	denotes	The PRC showed the relationship between the precision and the sensitivity (or recall), and the ROC indicated the relationship between the sensitivity and specificity.
T471	62932-63019	Sentence	denotes	The two curves reflected the comprehensive performance of the classification framework.
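The threshold sweep behind the two curves can be sketched as follows. This is an illustrative reading, not the evaluation code of the study: it treats the problem as binary (label 1 is the positive class, scored one-vs-rest), the helper names are hypothetical, and the conventions for empty denominators are assumptions.

```python
def confusion(scores, labels, threshold):
    """TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, tn, fn


def curves(scores, labels, step=0.005):
    """ROC points (1 - specificity, sensitivity) and PRC points (recall, precision)."""
    n_steps = round(1 / step)
    roc, prc = [], []
    for k in range(n_steps + 1):  # thresholds 0, 0.005, ..., 1
        tp, fp, tn, fn = confusion(scores, labels, k * step)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
        roc.append((1 - spec, sens))
        prc.append((sens, prec))
    return roc, prc
```

A step of 0.005 yields 201 thresholds, so each curve is drawn from 201 operating points.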
T472	63020-63124	Sentence	denotes	The kappa index is a statistical method for assessing the degree of agreement between different methods.
T473	63125-63204	Sentence	denotes	In our use case, the indicator was used to measure the stability of the method.
T474	63205-63297	Sentence	denotes	The F1 score is the harmonic mean of precision and sensitivity and therefore accounts for both FP and FN.
T475	63298-63390	Sentence	denotes	The bootstrapping method was used to calculate the empirical distribution of each indicator.
T476	63391-63584	Sentence	denotes	The detailed calculation process was as follows: we conducted random sampling with replacement to generate 1000 new test data sets with the same number of samples as the original test data set.
T477	63585-63658	Sentence	denotes	The evaluation indicators were calculated to determine the distributions.
T478	63659-63732	Sentence	denotes	The results were displayed in boxplots (Fig. 5 and Supplementary Fig. 2).
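The bootstrapping procedure described above (1000 resamples with replacement, each the size of the original test set) can be sketched with the standard library alone. The function names, the fixed seed, and the choice of the F1 score as the example metric are assumptions made for illustration; the toy labels in the usage note are not study data.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def bootstrap(y_true, y_pred, metric, n_resamples=1000, seed=0):
    """Empirical distribution of `metric` over resampled test sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Sample n indices with replacement to build one new test set.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return values
```

Calling `bootstrap([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1], f1_score)` returns 1000 F1 values whose spread can then be summarized in a boxplot.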
T479	63733-63790	Sentence	denotes	Statistical indices to evaluate the regression framework.
T480	63791-63938	Sentence	denotes	Multiple evaluation indicators (MSE, RMSE, MAE, R2, and PCC) were computed for a comprehensive and accurate assessment of the regression framework.
T481	63939-64021	Sentence	denotes	The MSE was used to calculate the deviation between the predicted and true values.
T482	64022-64069	Sentence	denotes	The RMSE was the square root of the MSE result.
T483	64070-64131	Sentence	denotes	The two indicators show the accuracy of the model prediction.
T484	64132-64206	Sentence	denotes	The R2 was used to assess the goodness-of-fit of the regression framework.
T485	64207-64298	Sentence	denotes	The PCC (r) was used to assess the correlation between two variables in the regression framework.
T486	64299-64393	Sentence	denotes	The indicators were calculated using the open-source scikit-learn and SciPy libraries.
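The five regression indicators have short closed forms, written out below for reference. The study reports using scikit-learn and SciPy for these calculations; this standard-library version is only a stand-in that makes the definitions explicit, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, R2, and Pearson correlation coefficient (PCC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    covariance = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "R2": 1 - ss_res / ss_tot,
        "PCC": covariance / math.sqrt(ss_tot * ss_pred),
    }
```

The inputs are assumed to be non-constant, since R2 and the PCC are undefined when either vector has zero variance.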
T487	64395-64420	Sentence	denotes	Supplementary information
T488	64422-64438	Sentence	denotes	Peer Review File
T489	64439-64464	Sentence	denotes	Supplementary Information
T490	64466-64601	Sentence	denotes	Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
T491	64603-64628	Sentence	denotes	Supplementary information
T492	64629-64713	Sentence	denotes	Supplementary information is available for this paper at 10.1038/s42003-020-01535-7.
T493	64715-64731	Sentence	denotes	Acknowledgements
T494	64732-64838	Sentence	denotes	We would like to thank the Ministry of Science and Technology of the People’s Republic of China (Grant No.
T495	64839-64918	Sentence	denotes	2017YFB1400100) and the National Natural Science Foundation of China (Grant No.
T496	64919-64947	Sentence	denotes	61876059) for their support.
T497	64949-64969	Sentence	denotes	Author contributions
T498	64970-65041	Sentence	denotes	S.L. and Y.G. contributed significantly to the conception of the study.
T499	65042-65096	Sentence	denotes	S.L. designed the network and conducted the experiments.
T500	65097-65167	Sentence	denotes	S.L. and Y.G. provided, marked, and analyzed the experimental results.

T1

0-92

Sentence

denotes

Fast automated detection of COVID-19 from medical images using convolutional neural networks

T2

94-102

Sentence

denotes

Abstract

T3

103-192

Sentence

denotes

Coronavirus disease 2019 (COVID-19) is a global pandemic posing significant health risks.

T4

193-291

Sentence

denotes

The diagnostic test sensitivity of COVID-19 is limited due to irregularities in specimen handling.

T5

292-439

Sentence

denotes

We propose a deep learning framework that identifies COVID-19 from medical images as an auxiliary testing method to improve diagnostic sensitivity.

T6

440-774

Sentence

denotes

We use pseudo-coloring methods and a platform for annotating X-ray and computed tomography images to train the convolutional neural network, which achieves a performance similar to that of experts and provides high scores for multiple statistical indices (F1 scores > 96.72% (0.9307, 0.9890) and specificity >99.33% (0.9792, 1.0000)).

T7

775-859

Sentence

denotes

Heatmaps are used to visualize the salient features extracted by the neural network.

T8

860-1053

Sentence

denotes

The neural network-based regression provides strong correlations between the lesion areas in the images and five clinical indicators, resulting in high accuracy of the classification framework.

T9

1054-1163

Sentence

denotes

The proposed method represents a potential computer-aided diagnosis method for COVID-19 in clinical practice.

T10

1165-1389

Sentence

denotes

Liang, Gu and other colleagues develop a convoluted neural network (CNN)-based framework to diagnose COVID-19 infection from chest X-ray and computed tomography images, and comparison with other upper respiratory infections.

T11

1390-1578

Sentence

denotes

Compared to expert evaluation of the images, the neural network achieved upwards of 99% specificity, showing promise for the automated detection of COVID-19 infection in clinical settings.

T12

1580-1592

Sentence

denotes

Introduction

T13

1593-1918

Sentence

denotes

Coronavirus disease 2019 (COVID-19), a highly infectious disease with the basic reproductive number (R0) of 5.7 (reported by the US Centers for Disease Control and Prevention), is caused by the most recently discovered coronavirus1 and was declared a global pandemic by the World Health Organization (WHO) on March 11, 20202.

T14

1919-2028

Sentence

denotes

It poses a serious threat to human health worldwide, as well as substantial economic losses to all countries.

T15

2029-2191

Sentence

denotes

As of 7 September 2020, 27,032,617 people have been infected by COVID-19 after testing, and 881,464 deaths have occurred, according to the statistics of the WHO3.

T16

2192-2335

Sentence

denotes

The Wall Street banks have estimated that the COVID-19 pandemic may cause losses of $5.5 trillion to the global economy over the next 2 years4.

T17

2336-2562

Sentence

denotes

The WHO recommends using real-time reverse transcriptase-polymerase chain reaction (rRT-PCR) for laboratory confirmation of the COVID-19 virus in respiratory specimens obtained by the preferred method of nasopharyngeal swabs5.

T18

2563-2688

Sentence

denotes

Laboratories performing diagnostic testing for COVID-19 should strictly comply with the WHO biosafety guidance for COVID-196.

T19

2689-2977

Sentence

denotes

It is also necessary to follow the standard operating procedures (SOPs) for specimen collection, storage, packaging, and transport because the specimens should be regarded as potentially infectious, and the testing process can only be performed in a Biosafety Level 3 (BSL-3) laboratory7.

T20

2978-3075

Sentence

denotes

Not all cities worldwide have adequate medical facilities to follow the WHO biosafety guidelines.

T21

3076-3335

Sentence

denotes

According to an early report (Feb 17, 2020), the sensitivity of tests for the detection of COVID-19 using rRT-PCR analysis of nasopharyngeal swab specimens is around 30–60% due to irregularities during the collection and transportation of COVID-19 specimens8.

T22

3336-3437

Sentence

denotes

Recent studies reported a higher sensitivity range from 71% (Feb 19, 2020) to 91% (Mar 27, 2020)9,10.

T23

3438-3673

Sentence

denotes

A recent systematic review reported that the sensitivity of the PCR test for COVID-19 might be in the range of 71–98% (Apr 21, 2020), whereas the specificity of tests for the detection of COVID-19 using rRT-PCR analysis is about 95%11.

T24

3674-3905

Sentence

denotes

Yang et al.8 discovered that although no viral ribonucleic acid (RNA) was detected by rRT-PCR in the first three or all nasopharyngeal swab specimens in mild cases, the patient was eventually diagnosed with COVID-19 (Feb 17, 2020).

T25

3906-4026

Sentence

denotes

Therefore, the WHO has stated that one or more negative results do not rule out the possibility of COVID-19 infection12.

T26

4027-4123

Sentence

denotes

Additional auxiliary tests with relatively higher sensitivity to COVID-19 are urgently required.

T27

4124-4273

Sentence

denotes

The clinical symptoms associated with COVID-19 include fever, dry cough, dyspnea, and pneumonia, as described in the guideline released by the WHO13.

T28

4274-4436

Sentence

denotes

It has been recommended to use the WHO’s case definition for influenza-like illness (ILI) and severe acute respiratory infection (SARI) for monitoring COVID-1913.

T29

4437-4592

Sentence

denotes

As reported by the CHINA-WHO COVID-19 joint investigation group (February 28, 2020)14, autopsies showed the presence of lung infection in COVID-19 victims.

T30

4593-4771

Sentence

denotes

Therefore, medical imaging of the lungs might be a suitable auxiliary diagnostic testing method for COVID-19 since it uses available medical technology and clinical examinations.

T31

4772-4942

Sentence

denotes

Chest radiography (CXR) and chest computed tomography (CT) are the most common medical imaging examinations for the lungs and are available in most hospitals worldwide15.

T32

4943-5119

Sentence

denotes

Different tissues of the body absorb X-rays to different degrees16, resulting in grayscale images that allow for the detection of anomalies based on the contrast in the images.

T33

5120-5240

Sentence

denotes

CT differs from normal CXR in that it has superior tissue contrast with different shades of gray (about 32–64 levels)17.

T34

5241-5329

Sentence

denotes

The CT images are digitally processed18 to create a three-dimensional image of the body.

T35

5330-5398

Sentence

denotes

However, CT examinations are more expensive than CXR examinations19.

T36

5399-5536

Sentence

denotes

Recent studies reported that the use of CXR and CT images resulted in improved diagnostic sensitivity for the detection of COVID-1920,21.

T37

5537-5631

Sentence

denotes

The interpretation of medical images is time-consuming, labor-intensive, and often subjective.

T38

5632-5731

Sentence

denotes

The medical images are first annotated by experts to generate a report of the radiography findings.

T39

5732-5845

Sentence

denotes

Subsequently, the radiography findings are analyzed, and clinical factors are considered to obtain a diagnosis15.

T40

5846-6084

Sentence

denotes

However, during the current pandemic, the frontline expert physicians are faced with a massive workload and lack of time, which increases the physical and psychological burden on staff and might adversely affect the diagnostic efficiency.

T41

6085-6286

Sentence

denotes

Since modern hospitals have advanced digital imaging technology, medical image processing methods may have the potential for fast and accurate diagnosis of COVID-19 to reduce the burden on the experts.

T42

6287-6626

Sentence

denotes

Deep learning (DL) methods, especially convolutional neural networks (CNNs), are effective approaches for representation learning using multilayer neural networks22 and have provided excellent performance solutions to many problems in image classification23,24, object detection25, games and decisions26, and natural language processing27.

T43

6627-6757

Sentence

denotes

A deep residual network28 is a type of CNN architecture that uses the strategy of skip connections to avoid degradation of models.

T44

6758-6929

Sentence

denotes

However, the applications of DL for clinical diagnoses remains limited due to the lack of interpretability of the DL model and the multi-modal properties of clinical data.

T45

6930-7136

Sentence

denotes

Some studies have demonstrated excellent performance of DL methods for the detection of lung cancer with CT images29, pneumonia with CXR images30, and diabetic retinopathy with retinal fundus photographs31.

T46

7137-7294

Sentence

denotes

To the best of our knowledge, the DL method has been validated only on single modal data, and no correlation analysis with clinical indicators was performed.

T47

7295-7443

Sentence

denotes

Traditional machine learning methods are more constrained and better suited than DL methods to specific, practical computing tasks using features32.

T48

7444-7678

Sentence

denotes

As demonstrated by Jin et al., the traditional machine learning algorithm using the scale-invariant feature transform (SIFT)33 and random sample consensus (RANSAC)34 may outperform the state-of-the-art DL methods for image matching35.

T49

7679-7863

Sentence

denotes

We designed a general end-to-end DL framework for information extraction from CXR images (X-data) and CT images (CT-data) that can be considered a cross-domain transfer learning model.

T50

7864-8111

Sentence

denotes

In this study, we developed a custom platform for rapid expert annotation and proposed the modular CNN-based multi-stage framework (classification framework and regression framework) consisting of basic component units and special component units.

T51

8112-8224

Sentence

denotes

The framework represents an auxiliary examination method for high precision and automated detection of COVID-19.

T52

8225-8270

Sentence

denotes

This study makes the following contributions:

T53

8271-8494

Sentence

denotes

First, a multi-stage CNN-based classification framework consisting of two basic units (ResBlock-A and ResBlock-B) and a special unit (control gate block) was established for use with multi-modal images (X-data and CT-data).

T54

8495-8600

Sentence

denotes

The classification results were compared with evaluations by experts with different levels of experience.

T55

8601-8777

Sentence

denotes

Different optimization goals were established for the different stages in the framework to obtain good performances, which were evaluated using multiple statistical indicators.

T56

8778-8947

Sentence

denotes

Second, principal component analysis (PCA) was used to determine the characteristics of the X-data and CT-data of different categories (normal, COVID-19, and influenza).

T57

8948-9113

Sentence

denotes

Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the salient features in the images and extract the lesion areas associated with COVID-19.

T58

9114-9356

Sentence

denotes

Third, data preprocessing methods, including pseudo-coloring and dimension normalization, were developed to facilitate the interpretability of the medical images and adapt the proposed framework to the multi-modal images (X-data and CT-data).

T59

9357-9535

Sentence

denotes

Fourth, A knowledge distillation method was adopted as a training strategy to obtain high performance with low computational requirements and improve the usability of the method.

T60

9536-9691

Sentence

denotes

Last, The CNN-based regression framework was used to describe the relationships between the radiography findings and the clinical symptoms of the patients.

T61

9692-9821

Sentence

denotes

Multiple evaluation indicators were used to assess the correlations between the radiography findings and the clinical indicators.

T62

9823-9830

Sentence

denotes

Results

T63

9832-9851

Sentence

denotes

Data set properties

T64

9852-9915

Sentence

denotes

Multi-modal data from multiple sources were used in this study.

T65

9916-10063

Sentence

denotes

X-data, CT-data, and clinical data used in our research were collected from four public data sets and one frontline hospital data (Youan hospital).

T66

10064-10230

Sentence

denotes

Each data set was divided into two parts: train-val part and test part using a train-test-split function (TTSF) of the scikit-learn library which is shown in Table 1.

T67

10231-10348

Sentence

denotes

The details of the multi-modal data types are described in the “Methods” section (see “Data sets splitting” section).

T68

10349-10466

Sentence

denotes

Table 1 Number of cases from four public data sets and the Youan hospital (X-data, CT-data, clinical indicator data).

T69

10467-10501

Sentence

denotes

Study X-data CT-data Clinical data

T70

10502-10552

Sentence

denotes

Train + Val Test Train + Val Test Train + Val Test

T71

10553-10596

Sentence

denotes

*Normal (RSNA + LUNA16) 5000 100 100 20 – –

T72

10597-10639

Sentence

denotes

Pneumonia (RSNA + ICNP) 3000 100 83 20 – –

T73

10640-10669

Sentence

denotes

COVID-19 (CCD) 150 62 – – – –

T74

10670-10713

Sentence

denotes

Influenza (Youan Hospital) 100 45 35 15 – –

T75

10714-10756

Sentence

denotes

*Normal (Youan Hospital) 478 25 139 20 – –

T76

10757-10801

Sentence

denotes

Pneumonia (Youan Hospital) 380 55 180 35 – –

T77

10802-10845

Sentence

denotes

COVID-19 (Youan Hospital) 35 10 75 20 75 20

T78

10846-10874

Sentence

denotes

Total 9143 397 612 130 75 20

T79

10875-11062

Sentence

denotes

The term *Normal in this work means the cases where the lungs are not manifest evidence of COVID-19, influenza, or pneumonia on imaging and the RT-PCR testing of the COVID-19 is negative.

T80

11064-11164

Sentence

denotes

A platform was developed for annotating lesion areas of COVID-19 in medical images (X-data, CT-data)

T81

11165-11328

Sentence

denotes

Medical imaging uses images of internal tissues of the human body or a part of the human body in a non-invasive manner for clinical diagnoses or treatment plans36.

T82

11329-11515

Sentence

denotes

Medical images (e.g., X-data and CT-data) are usually acquired using computed radiography and are typically stored in the Digital Imaging and Communications in Medicine (DICOM) format37.

T83

11516-11695

Sentence

denotes

X-data are two-dimensional grayscale images, and CT-data are three-dimensional data consisting of a stack of two-dimensional grayscale slices along the z-axis.

T84

11696-11811

Sentence

denotes

Machine learning methods are playing increasingly important roles in medical image analysis, especially DL methods.

T85

11812-11932

Sentence

denotes

DL uses multiple non-linear transformations to create a mapping relationship between the input data and output labels38.

T86

11933-12027

Sentence

denotes

The objective of this study was to annotate lesion areas in medical images with high accuracy.

T87

12028-12216

Sentence

denotes

Therefore, we developed a pseudo-coloring method, a technique that enhances medical images so that physicians can isolate relevant tissues and group different tissues together39.

T88

12217-12377

Sentence

denotes

We converted the original grayscale images to color images using the open-source image processing tools Open Source Computer Vision Library (OpenCV) and Pillow.

T89

12378-12435

Sentence

denotes

Examples of the pseudo-color images are shown in Fig. 1a.
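
To make the idea of pseudo-coloring concrete, the following minimal sketch maps each 8-bit gray level to an RGB triple along a blue-green-red ramp. The text says OpenCV and Pillow were used but does not state which colormap, so the ramp, the function names `pseudo_color` and `colorize`, and the tiny example image are assumptions chosen only for illustration. In OpenCV, `cv2.applyColorMap` provides built-in colormaps for the same purpose.

```python
def pseudo_color(gray):
    """Map an 8-bit gray level (0-255) to an (R, G, B) tuple: blue -> green -> red."""
    if not 0 <= gray <= 255:
        raise ValueError("gray level must be in 0..255")
    t = gray / 255.0
    if t < 0.5:
        # Lower half of the range: fade from blue to green.
        s = t / 0.5
        r, g, b = 0.0, s, 1.0 - s
    else:
        # Upper half of the range: fade from green to red.
        s = (t - 0.5) / 0.5
        r, g, b = s, 1.0 - s, 0.0
    return tuple(round(255 * c) for c in (r, g, b))


def colorize(image):
    """Apply pseudo_color to every pixel of a 2-D list of gray levels."""
    return [[pseudo_color(p) for p in row] for row in image]


# Tiny hypothetical 2 x 3 grayscale image.
image = [[0, 64, 128], [191, 255, 32]]
colored = colorize(image)
```

Small differences in gray level that are hard to see in the original become differences in hue, which is the effect the annotation step relies on.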

T90

12436-12575

Sentence

denotes

We developed a platform that uses a client-server architecture to annotate the potential lesion areas of COVID-19 on the CXR and CT images.

T91

12576-12655

Sentence

denotes

The platform can be deployed on a private cloud for security and local sharing.

T92

12656-12814

Sentence

denotes

All the images were annotated by two experienced radiologists (one was a 5th-year radiologist and the other was a 3rd-year radiologist) in the Youan Hospital.

T93

12815-12977

Sentence

denotes

If there was disagreement about a result, a senior radiologist and a respiratory doctor made the final decision to ensure the precision of the annotation process.

T94

12978-13051

Sentence

denotes

The details of the annotation pipeline are shown in Supplementary Fig. 1.

T95

13052-13158

Sentence

denotes

Fig. 1 Demonstrations of data preprocessing methods including pseudo-coloring and dimension normalization.

T96

13159-13224

Sentence

denotes

a Pseudo-coloring for abnormal examples in the CXR and CT images.

T97

13225-13357

Sentence

denotes

The original grayscale images were transformed into color images using the pseudo-coloring method and were annotated by the experts.

T98

13358-13501

Sentence

denotes

The scale bar on the right is the range of pixel values of the image data. b Dimension normalization to reduce the dimensions in the CT images.

T99

13502-13605

Sentence

denotes

The CT images were first resampled so that their number was a multiple of three and were then divided into three groups.

T100

13606-13684

Sentence

denotes

This was followed by 1 × 1 convolution layers to reduce the dimensions of the data.
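
The resampling and grouping step in the Fig. 1b caption can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how the target slice count is chosen or how slices are resampled, so rounding to the nearest multiple of three, nearest-neighbour index selection, and the function names are hypothetical. The subsequent 1 × 1 convolution is not shown.

```python
def resample_indices(n_slices, target):
    """Pick `target` slice indices spread evenly over 0..n_slices-1 (nearest neighbour)."""
    if n_slices < 1 or target < 1:
        raise ValueError("n_slices and target must be positive")
    if target == 1:
        return [0]
    step = (n_slices - 1) / (target - 1)
    return [round(i * step) for i in range(target)]


def normalize_depth(n_slices):
    """Resample to a multiple of three slices, then split them into three groups."""
    # Assumption: use the nearest multiple of three, with a minimum of three slices.
    target = max(3, 3 * round(n_slices / 3))
    idx = resample_indices(n_slices, target)
    size = target // 3
    return [idx[k * size:(k + 1) * size] for k in range(3)]


# Hypothetical CT volume with 100 slices.
groups = normalize_depth(100)
print([len(g) for g in groups])
```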

T101

13686-13799

Sentence

denotes

PCA was used to determine the characteristics of the medical images for the COVID-19, influenza, and normal cases

T102

13800-13961

Sentence

denotes

PCA was used to visually compare the characteristics of the medical images (X-data, CT-data) for the COVID-19 cases with those of the normal and influenza cases.

T103

13962-14118

Sentence

denotes

Figure 2a shows the mean image of each category and the five eigenvectors that represent the principal components of PCA in the corresponding feature space.

T104

14119-14310

Sentence

denotes

Significant differences are observed between the COVID-19, influenza, and normal cases, indicating the possibility of being able to distinguish COVID-19 cases from normal and influenza cases.

T105

14311-14385

Sentence

denotes

Fig. 2 PCA visualizations and example heatmaps of both X-data and CT-data.

T106

14386-14448

Sentence

denotes

a Mean image and eigenvectors of five different sub-data sets.

T107

14449-14531

Sentence

denotes

The first column shows the mean image and the other columns show the eigenvectors.

T108

14532-14739

Sentence

denotes

The first row shows the mean image and five eigenvectors of the normal CXR images; second row: COVID-19 CXR images, third row: normal CT images, fourth row: influenza CT images, last row: COVID-19 CT images.

T109

14740-14926

Sentence

denotes

The scale bar on the right is the range of pixel values of the image data. b Heatmaps of both X-data and CT-data were demonstrated for better interpretability of the proposed frameworks.

T110

14927-15016

Sentence

denotes

The scale bar on the right is the probability of the areas being suspected as infections.

T111

15018-15187

Sentence

denotes

The CNN-based classification framework exhibited excellent performance based on the validation by experts using multi-modal data from public data sets and Youan hospital

T112

15188-15406

Sentence

denotes

The structure of the proposed framework, consisting of the stage I sub-framework and the stage II sub-framework, is shown in Fig. 3a, where Q, L, M, and N are the hyper-parameters of the framework for general use cases.

T113

15407-15536

Sentence

denotes

The values of Q, L, M, and N were 1, 1, 2, and 2, respectively, in this study; this configuration is referred to as the CNNCF framework.

T114

15537-15695

Sentence

denotes

The stage I and stage II sub-frameworks were designed to extract features corresponding to different optimization goals in the analysis of the medical images.

T115

15696-15941

Sentence

denotes

The performance of the CNNCF was evaluated using multi-modal data sets (X-data and CT-data) to ensure the generalization and transferability of the model, and five evaluation indicators were used (sensitivity, precision, specificity, F1, kappa).

T116

15942-16066

Sentence

denotes

The salient features of the images extracted by the CNNCF were visualized in a heatmap (four examples are shown in Fig. 2b).

T117

16067-16315

Sentence

denotes

In this study, multiple experiments were conducted (including experiments that included data from the same source and from different sources) to validate the generalization ability of the framework while avoiding the possible sample selection bias.

T118

16316-16552

Sentence

denotes

Five experts evaluated the images, i.e., a 7th-year respiratory resident (Respira.), a 3rd-year emergency resident (Emerg.), a 1st-year respiratory intern (Intern), a 5th-year radiologist (Rad-5th), and a 3rd-year radiologist (Rad-3rd).

T119

16553-16625

Sentence

denotes

The definition of the expert group can be found in Supplementary Note 1.

T120

16626-16908

Sentence

denotes

The abbreviations of all the data sets used in the following experiments including XPDS, XPTS, XPVS, XHDS, XHTS, XHVS, CTPDS, CTPTS, CTPVS, CTHDS, CTHTS, CTHVS, CADS, CATS, CAVS, XMTS, XMVS, CTMTS, and CTMVS were defined in the “Methods” section (see “Data sets splitting” section).

T121

16909-16945

Sentence

denotes

The following results were obtained.

T122

16946-16974

Sentence

denotes

Fig. 3 CNN-based frameworks.

T123

16975-17237

Sentence

denotes

a The classification framework for the identification of COVID-19. b The regression framework for the correlation analysis between the lesion areas and the clinical indicators. c The workflow of the classification framework for the identification of COVID-19.

T124

17239-17251

Sentence

denotes

Experiment-A

T125

17252-17425

Sentence

denotes

In this experiment, we used the X-data of the XPVS, where the normal cases were from the RSNA data set and the COVID-19 cases were from the COVID CXR data set (CCD).

T126

17426-17563

Sentence

denotes

The results of the five evaluation indicators for the comparison of the COVID-19 cases and normal cases of the XPVS are shown in Table 2.

T127

17564-17674

Sentence

denotes

Excellent performance was obtained, with a best specificity of 99.33% and a precision of 98.33%.

T128

17675-17864

Sentence

denotes

The F1 score was 96.72%, which was higher than that of the Respira. (96.12%), the Emerg. (93.94%), the Intern (84.67%), and the Rad-3rd (85.93%) and lower than that of the Rad-5th (98.41%).

T129

17865-18058

Sentence

denotes

The kappa index was 95.40%, which was higher than that of the Respira. (94.43%), the Emerg. (91.21%), the Intern (77.45%), and the Rad-3rd (79.42%) and lower than that of the Rad-5th (97.74%).

T130

18059-18249

Sentence

denotes

The sensitivity index was 95.16%, which was higher than that of the Intern (93.55%) and the Rad-3rd (93.55%) and lower than that of the Respira. (100%), the Emerg. (100%), and the Rad-5th (100%).

T131

18250-18415

Sentence

denotes

The receiver operating characteristic (ROC) scores for the CNNCF and the experts are plotted in Fig. 4a; the area under the ROC curve (AUROC) of the CNNCF is 0.9961.

T132

18416-18571

Sentence

denotes

The precision-recall scores for the CNNCF and the experts are plotted in Fig. 4d; the area under the precision-recall curve (AUPRC) of the CNNCF is 0.9910.

T133

18572-18892

Sentence

denotes

Table 2 Performance indices of the classification framework (CNNCF) of experiment A and the average performance of the 7th-year respiratory resident (Respira.), the 3rd-year emergency resident (Emerg.), the 1st-year respiratory intern (Intern), the 5th-year radiologist (Rad-5th), and the 3rd-year radiologist (Rad-3rd).

T134

18893-18980

Sentence

denotes

F1 (95% CI) Kappa (95% CI) Specificity (95% CI) Sensitivity (95% CI) Precision (95% CI)

T135

18981-19108

Sentence

denotes

CNNCF 0.9672 (0.9307, 0.9890) 0.9540 (0.9030, 0.9924) 0.9933 (0.9792, 1.0000) 0.9516 (0.8889, 1.0000) 0.9833 (0.9444, 1.0000)

T138

19109-19237

Sentence

denotes

Respira. 0.9612 (0.9231, 0.9920) 0.9443 (0.8912, 0.9887) 0.9667 (0.9363, 0.9933) 1.0000 (1.0000, 1.0000) 0.9254 (0.8095, 0.9571)

T140

19238-19365

Sentence

denotes

Emerg. 0.9394 (0.8947, 0.9781) 0.9121 (0.8492, 0.9677) 0.9467 (0.9091, 0.9797) 1.0000 (1.0000, 1.0000) 0.8857 (0.8095, 0.9571)

T143

19366-19492

Sentence

denotes

Intern 0.8467 (0.7692, 0.9041) 0.7745 (0.6730, 0.8592) 0.8867 (0.8333, 0.9343) 0.9355 (0.8596, 0.984) 0.7733 (0.6708, 0.8649)

T145

19493-19620

Sentence

denotes

Rad-5th 0.9841 (0.9593, 1.0000) 0.9774 (0.9433, 1.0000) 0.9867 (0.9662, 1.0000) 1.0000 (1.0000, 1.0000) 0.9688 (0.9219, 1.0000)

T146

19621-19748

Sentence

denotes

Rad-3rd 0.8593 (0.7931, 0.9180) 0.7942 (0.7062, 0.8779) 0.9000 (0.8541, 0.9481) 0.9355 (0.8666, 0.9841) 0.7945 (0.6974, 0.8873)

T147

19749-19812

Sentence

denotes

Fig. 4 ROC and PRC curves for the CNNCF of the experiments A-C.

T148

19813-19902

Sentence

denotes

NC indicates that the positive case is a COVID-19 case, and the negative case is *Normal.

T149

19903-19987

Sentence

denotes

CI indicates that the positive case is COVID-19, and the negative case is influenza.

T150

19988-20074

Sentence

denotes

The points are the results of experts, corresponding to the results in Tables 2 and 3.

T151

20075-20384

Sentence

denotes

The background gray dashed curves in the PRC curve correspond to the iso-F1 curves. a ROC curve for the NC using X-data. b ROC curve for the NC using CT-data. c ROC curve for the CI using CT-data. d PRC curve for the NC using X-data. e PRC curve for the NC using CT-data. f PRC curve for the CI using CT-data.

T152

20386-20398

Sentence

denotes

Experiment-B

T153

20399-20565

Sentence

denotes

In this experiment, we used the CT-data of the CTPVS and CTHVS, where the normal cases were from the LUNA16 data set and the COVID-19 cases were from the Youan hospital.

T154

20566-20799

Sentence

denotes

The results of the five evaluation indicators for the comparison of the COVID-19 cases and normal cases of the CTHVS and the CTPVS are shown in Table 3, where the normal cases are from CTPVS and the COVID-19 cases are from the CTHVS.

T155

20800-20986

Sentence

denotes

The CNNCF exhibits good performance for the five evaluation indices, which is similar to that of the Respira. and the Rad-5th and higher than that of the Intern, the Emerg., and the Rad-3rd.

T156

20987-21056

Sentence

denotes

The ROC scores are plotted in Fig. 4b; the AUROC of the CNNCF is 1.0.

T157

21057-21137

Sentence

denotes

The precision-recall scores are shown in Fig. 4e; the AUPRC of the CNNCF is 1.0.

T158

21138-21470

Sentence

denotes

Table 3 Performance indices of the classification framework (CNNCF) of the experiments B and C, and the average performance of the 7th-year respiratory resident (Respira.), the 3rd-year emergency resident (Emerg.), the 1st-year respiratory intern (Intern), the 5th-year radiologist (Rad-5th), and the 3rd-year radiologist (Rad-3rd).

T159

21471-21502

Sentence

denotes

CT (*Normal and COVID-19 cases)

T160

21503-21548

Sentence

denotes

CNNCF Respira. Emerg. Intern Rad-5th Rad-3rd

T164

21549-21704

Sentence

denotes

F1 (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8571, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8667, 1.0000)

T165

21705-21863

Sentence

denotes

Kappa (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.7422, 1.0000) 1.0000 (1.0000, 1.0000) 0.9000 (0.7487, 1.0000)

T166

21864-22028

Sentence

denotes

Specificity (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000)

T167

22029-22193

Sentence

denotes

Sensitivity (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8421, 1.0000)

T168

22194-22356

Sentence

denotes

Precision (95% CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8235, 1.0000) 1.0000 (1.0000, 1.0000) 0.9500 (0.8333, 1.0000)

T169

22357-22390

Sentence

denotes

CT (Influenza and COVID-19 cases)

T170

22391-22436

Sentence

denotes

CNNCF Respira. Emerg. Intern Rad-5th Rad-3rd

T174

22437-22591

Sentence

denotes

F1 (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8966 (0.7332, 1.0000) 0.8000 (0.6207, 0.9412) 0.9677 (0.8889, 1.0000) 0.8667 (0.7199, 0.9744)

T175

22592-22750

Sentence

denotes

Kappa (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8236 (0.5817, 1.0000) 0.6500 (0.3698, 0.8852) 0.9421 (0.8148, 1.0000) 0.7667 (0.5349, 0.9429)

T177

22751-22914

Sentence

denotes

Specificity (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9048 (0.7619, 1.0000) 0.8500 (0.6818, 1.0000) 0.9500 (0.8333, 1.0000) 0.9000 (0.7619, 1.0000)

T178

22915-23078

Sentence

denotes

Sensitivity (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.9286 (0.7500, 1.0000) 0.8000 (0.5714, 1.0000) 1.0000 (1.0000, 1.0000) 0.8667 (0.6667, 1.0000)

T179

23079-23240

Sentence

denotes

Precision (95%CI) 1.0000 (1.0000, 1.0000) 1.0000 (1.0000, 1.0000) 0.8667 (0.6874, 1.0000) 0.8000 (0.5881, 1.0000) 0.9375 (0.8000, 1.0000) 0.8667 (0.6667, 1.0000)

T180

23242-23254

Sentence

denotes

Experiment-C

T181

23255-23387

Sentence

denotes

In this experiment, we used the CT-data of the CTHVS, where the influenza cases and the COVID-19 cases were all from the Youan hospital.

T182

23388-23601

Sentence

denotes

The results of the five evaluation indicators for the comparison of the COVID-19 cases and influenza cases of the CTHVS are shown in Table 3 where the influenza cases and the COVID-19 cases are all from the CTHVS.

T183

23602-23695

Sentence

denotes

The CNNCF achieved the highest performance and the best score of all five evaluation indices.

T184

23696-23765

Sentence

denotes

The ROC scores are plotted in Fig. 4c; the AUROC of the CNNCF is 1.0.

T185

23766-23850

Sentence

denotes

The precision-recall scores are shown in Fig. 4f, and the AUPRC of the CNNCF is 1.0.

T186

23852-23864

Sentence

denotes

Experiment-D

T187

23865-24126

Sentence

denotes

Boxplots of the F1 score (Fig. 5a, d, g), the kappa coefficient (Fig. 5b, e, h), and the specificity (Fig. 5c, f, i) for experiments A–C are shown in Fig. 5; those of the precision and sensitivity are shown in Supplementary Fig. 2.

T188

24127-24294

Sentence

denotes

A bootstrapping method40 was used to calculate the empirical distributions, and McNemar’s test41 was used to analyze the differences between the CNNCF and the experts.
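
The bootstrapping step can be illustrated with a short percentile-interval sketch. The resample count of 1000 matches the Fig. 5 caption; everything else is an assumption made for illustration, including the percentile method, the fixed seed, the function names, and the example labels, whose counts were chosen to be consistent with the CNNCF row of Table 2 rather than taken from the source.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels where 1 marks the positive (COVID-19) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval of `metric` over resampled validation sets."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Draw n cases with replacement to form one resampled validation set.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Illustrative labels: 62 positives (59 detected) and 150 negatives (1 false alarm).
y_true = [1] * 62 + [0] * 150
y_pred = [1] * 59 + [0] * 3 + [1] * 1 + [0] * 149
point = f1_score(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, f1_score)
print(round(point, 4), lo, hi)
```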

T189

24295-24510

Sentence

denotes

The p-values of the McNemar’s test (Supplementary Tables 1–3) for the five evaluation indicators were all 1.0, indicating no statistically significant difference between the CNNCF results and the expert evaluations.

T190

24511-24634

Sentence

denotes

Fig. 5 Boxplots of the F1 score, kappa score, and specificity for the CNNCF and expert results for COVID-19 identification.

T191

24635-24724

Sentence

denotes

NC indicates that the positive case is a COVID-19 case, and the negative case is *Normal.

T192

24725-24809

Sentence

denotes

CI indicates that the positive case is COVID-19, and the negative case is influenza.

T193

24810-25264

Sentence

denotes

Bootstrapping is used to generate n = 1000 resampled independent validation sets for the XVS and the CTVS. a F1 score for the NC using X-data. b Kappa score for the NC using X-data. c Specificity for the NC using X-data. d F1 score for the NC using CT-data. e Kappa score for the NC using CT-data. f Specificity for the NC using CT-data. g F1 score for the CI using CT-data. h Kappa score for the CI using CT-data. i Specificity for the CI using CT-data.

T194

25265-25526

Sentence

denotes

We also conducted extra experiments with both configurations of the same data source and different data sources: the descriptions and graph charts can be found in the Supplementary Experiments and Tables (Supplementary Tables 4–19 and Supplementary Figs. 3–18).

T195

25527-25617

Sentence

denotes

The data used in experiments E–G were CTHVS and the data were all from the Youan hospital.

T196

25618-25707

Sentence

denotes

The data used in experiments H–K were XHVS and the data were all from the Youan hospital.

T197

25708-25761

Sentence

denotes

The data used in experiments L–N were XPVS and CTPVS.

T198

25762-25978

Sentence

denotes

The data used in experiment L were from a single data set (RSNA), whereas the data used in experiment M were from different data sets: the pneumonia cases were from the ICNP and the normal cases were from LUNA16.

T199

25979-26173

Sentence

denotes

The data used in experiments O–R were the XMVS and CTMVS, which combine the four public data sets and the Youan hospital data set and include normal, pneumonia, and COVID-19 cases.

T200

26174-26252

Sentence

denotes

In all the experiments (experiments A–R), the CNNCF achieved good performance.

T201

26253-26413

Sentence

denotes

Notably, in order to obtain a more comprehensive evaluation of the CNNCF while further improving the usability in clinical practice, experiment-R was performed.

T202

26414-26586

Sentence

denotes

In experiment-R, the CNNCF was used to distinguish three types of cases simultaneously (COVID-19, pneumonia, and normal cases) on both the XMVS and CTMVS.

T203

26587-26784

Sentence

denotes

Good performance was obtained on the XMVS, with best scores of 91.89% for F1, 89.74% for kappa, 97.14% for specificity, 94.44% for sensitivity, and 89.47% for precision.

T204

26785-26907

Sentence

denotes

Excellent performance was obtained on the CTMVS, where the best scores of all five evaluation indicators were 100.00%.

T205

26908-27021

Sentence

denotes

The ROC and PRC scores in experiment-R were also satisfactory and are shown in Supplementary Fig. 18.

T206

27022-27130

Sentence

denotes

The results of the experiment-R further demonstrated the effectiveness and robustness of the proposed CNNCF.

T207

27132-27186

Sentence

denotes

Image analysis identifies salient features of COVID-19

T208

27187-27326

Sentence

denotes

In clinical practice, the diagnostic decision of a clinician relies on the identification of the SAs in the medical images by radiologists.

T209

27327-27459

Sentence

denotes

The statistical results show that the performance of the CNNCF for the identification of COVID-19 is as good as that of the experts.

T210

27460-27563

Sentence

denotes

A comparison consisting of two parts was performed to evaluate the discriminatory ability of the CNNCF.

T211

27564-27724

Sentence

denotes

In the first part, we used Grad-CAM, which is a non-intrusive method to extract the salient features in medical images, to create a heatmap of the CNNCF result.

T212

27725-27815

Sentence

denotes

Figure 2b shows the heatmaps of four examples of COVID-19 cases in the X-data and CT-data.

T213

27816-28011

Sentence

denotes

In the second part, we used density-based spatial clustering of applications with noise (DBSCAN) to calculate the center pixel coordinates (CPC) of the salient features corresponding to COVID-19.

T214

28012-28058

Sentence

denotes

All CPCs were normalized to a range of 0 to 1.

T215

28059-28209

Sentence

denotes

Subsequently, we used a significance test (ST)42 to analyze the relationship between the CPC of the CNNCF output and the CPC annotated by the experts.

T216

28210-28459

Sentence

denotes

A good performance was obtained, with a mean square error (MSE) of 0.0108, a mean absolute error (MAE) of 0.0722, a root mean squared error (RMSE) of 0.1040, a correlation coefficient (r) of 0.9761, and a coefficient of determination (R2) of 0.8801.

T217

28461-28582

Sentence

denotes

A strong correlation was observed between the lesion areas detected by the proposed framework and the clinical indicators

T218

28583-28734

Sentence

denotes

In clinical practice, multiple clinical indicators are analyzed to determine whether further examinations (i.e., medical image examination) are needed.

T219

28735-28810

Sentence

denotes

These indicators can be used to assess the predictive ability of the model.

T220

28811-28912

Sentence

denotes

In addition, various examinations are required to perform an accurate diagnosis in clinical practice.

T221

28913-29003

Sentence

denotes

However, the correlations between the results of various examinations are often not clear.

T222

29004-29323

Sentence

denotes

We used the stage II sub-framework and the regressor block of the CNNRF to conduct a correlation analysis between the lesion areas detected by the framework and five clinical indicators (white blood cell count, neutrophil percentage, lymphocyte percentage, procalcitonin, C-reactive protein) of COVID-19 using the CADS.

T223

29324-29517

Sentence

denotes

The inputs of the CNNRF were the lesion area images of each case, and the output was a 5-dimensional vector describing the correlation between the lesion areas and the five clinical indicators.

T224

29518-29582

Sentence

denotes

The MAE, MSE, RMSE, r, and R2 were used to evaluate the results.
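
The five regression measures named above can be computed as in the following sketch. The definitions are the standard ones (with R2 taken as one minus the ratio of residual to total sum of squares); the source does not spell out its formulas, so treating them this way is an assumption, and the function name and the four-point example data are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, Pearson r, and R^2 for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(ss_t * ss_p)
    # Coefficient of determination: 1 - SS_res / SS_tot.
    r2 = 1.0 - sum(e * e for e in errors) / ss_t
    return {"mse": mse, "mae": mae, "rmse": rmse, "r": r, "r2": r2}


# Hypothetical normalized targets and predictions.
y_true = [0.1, 0.4, 0.5, 0.9]
y_pred = [0.2, 0.3, 0.6, 0.8]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

With these definitions, R2 is not in general equal to the square of r, so the two values carry different information about the fit.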

T225

29583-29730

Sentence

denotes

The ST and the Pearson correlation coefficient (PCC)43 were used to determine the correlation between the lesion areas and the clinical indicators.

T226

29731-29842

Sentence

denotes

A strong correlation was obtained, with MSE = 0.0163, MAE = 0.0941, RMSE = 0.1172, r = 0.8274, and R2 = 0.6465.

T227

29843-29936

Sentence

denotes

At a significance level of 0.001, the value of r was 1.27 times the critical value of 0.6524.

T228

29937-30047

Sentence

denotes

This result indicates a high and significant correlation between the lesion areas and the clinical indicators.

T229

30048-30119

Sentence

denotes

The PCC was 0.8274 (range of 0.8–1.0), indicating a strong correlation.

T230

30120-30183

Sentence

denotes

The CNNRF was trained on the CATS and evaluated using the CAVS.

T231

30184-30301

Sentence

denotes

The initial learning rate was 0.01, and the optimization function was the stochastic gradient descent (SGD) method44.

T232

30302-30388

Sentence

denotes

The parameters of the CNNRF were initialized using the Xavier initialization method45.

T233

30390-30400

Sentence

denotes

Discussion

T234

30401-30521

Sentence

denotes

We developed a computer-aided diagnosis method for the identification of COVID-19 in medical images using DL techniques.

T235

30522-30647

Sentence

denotes

Strong correlations were obtained between the lesion areas identified by the proposed CNNRF and the five clinical indicators.

T236

30648-30729

Sentence

denotes

An excellent agreement was observed between the model results and expert opinion.

T237

30730-31014

Sentence

denotes

Popular image annotation tools (e.g., Labelme46 and VOTT47) are used to annotate various images and support common formats, such as Joint Photographic Experts Group (JPG), Portable Network Graphics (PNG), and Tag Image File Format (TIFF); these formats are not used in the DICOM data.

T238

31015-31192

Sentence

denotes

Therefore, we developed an annotation platform that does not require much storage space or transformations and can be deployed on a private cloud for security and local sharing.

T239

31193-31342

Sentence

denotes

Our eyes are not highly sensitive to grayscale images in regions with high average brightness48, resulting in relatively low identification accuracy.

T240

31343-31482

Sentence

denotes

The proposed pseudo-color method increased the information content of the medical images and facilitated the identification of the details.

T241

31483-31582

Sentence

denotes

PCA has been widely used for feature extraction and dimensionality reduction in image processing49.

T242

31583-31647

Sentence

denotes

We used PCA to determine the feature space of the sub-data sets.

T243

31648-31747

Sentence

denotes

Each image in a specified sub-data set was represented as a linear combination of the eigenvectors.

T244

31748-31865

Sentence

denotes

Since the eigenvectors describe the most informative regions in the medical images, they represent each sub-data set.

T245

31866-31953

Sentence

denotes

We visualized the top-five eigenvectors of each sub-data set using an intuitive method.
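
The PCA description above corresponds to the standard eigen-decomposition of the sample covariance of the flattened images. The notation below is introduced here for illustration and is not taken from the source: $x_i$ is the $i$-th image of a sub-data set written as a vector, $n$ is the number of images, and $K = 5$ matches the five eigenvectors shown in Fig. 2a.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}, \qquad
C\,v_k = \lambda_k v_k, \quad \lambda_1 \ge \lambda_2 \ge \dots

x_i \approx \bar{x} + \sum_{k=1}^{K} a_{ik}\, v_k, \qquad
a_{ik} = v_k^{\top}(x_i - \bar{x})
```

Here $\bar{x}$ is the mean image, the $v_k$ are the eigenvectors ordered by decreasing eigenvalue, and the coefficients $a_{ik}$ express each image as a linear combination of them.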

T246

31954-32105

Sentence

denotes

The CNNCF is a modular framework consisting of two stages that were trained with different optimization goals and controlled by the control gate block.

T247

32106-32283

Sentence

denotes

Each stage consisted of multiple residual blocks (ResBlock-A and ResBlock-B) that retained the features in the different layers, thereby preventing the degradation of the model.

T248

32284-32391

Sentence

denotes

The design of the control gate block was inspired by the synaptic frontend structure in the nervous system.

T249

32392-32500

Sentence

denotes

We calculated the score of the optimization target, and a score above a predefined threshold was acceptable.

T250

32501-32644

Sentence

denotes

If the number of neurotransmitter releases exceeded another predefined threshold, the control gate was opened to let the feature information pass.

T251

32645-32696

Sentence

denotes

The framework was trained in a step-by-step manner.

T252

32697-32906

Sentence

denotes

Training occurred at each stage for a specified goal, and the second stage used the features extracted by the first stage, thereby reusing the features and increasing the convergence speed of the second stage.

T253

32907-33024

Sentence

denotes

The CNNCF exhibited excellent performance for identifying the COVID-19 cases automatically in the X-data and CT-data.

T254

33025-33212

Sentence

denotes

Unlike traditional machine learning methods, the CNNCF was trained in an end-to-end manner, which ensured the flexibility of the framework for different data sets without much adjustment.

T255

33213-33446

Sentence

denotes

We adopted a knowledge distillation method in the training phase; a small model (called a student network) was trained to mimic the ensemble of multiple models (called teacher networks) to obtain a small model with high performance.

T256

33447-33578

Sentence

denotes

In the distillation process, knowledge was transferred from the teacher networks to the student network to minimize knowledge loss.

T257

33579-33668

Sentence

denotes

The target was the output of the teacher networks; these outputs were called soft labels.

T258

33669-33858

Sentence

denotes

The student network also learned from the ground-truth labels (also called hard labels), thereby minimizing the knowledge loss from the student networks, whose targets were the hard labels.

T259

33859-34005

Sentence

denotes

Therefore, the overall loss function of the student network incorporated both knowledge distillation and knowledge loss from the student networks.
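
A combined loss of this kind can be sketched as a weighted sum of a soft-label term and a hard-label term. The source gives no formula, so the temperature, the weight `alpha`, the averaging of teacher outputs, the use of cross-entropy for both terms, and all names below are assumptions for illustration rather than the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits_list, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of the soft-label (teacher) and hard-label (ground-truth) losses."""
    n_classes = len(student_logits)
    # Soft labels: average of the teachers' temperature-softened outputs.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    soft = [sum(tp[c] for tp in teacher_probs) / len(teacher_probs)
            for c in range(n_classes)]
    soft_loss = cross_entropy(soft, softmax(student_logits, temperature))
    # Hard labels: one-hot encoding of the ground-truth class.
    hard = [1.0 if c == hard_label else 0.0 for c in range(n_classes)]
    hard_loss = cross_entropy(hard, softmax(student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Two hypothetical teachers that both favour class 0, which is also the true class.
teachers = [[4.0, 0.0, 0.0], [3.0, 1.0, 0.0]]
aligned = distillation_loss([4.0, 0.0, 0.0], teachers, hard_label=0)
misaligned = distillation_loss([0.0, 4.0, 0.0], teachers, hard_label=0)
print(aligned, misaligned)
```

A student whose outputs agree with both the teachers and the ground truth receives the smaller loss, which is the behaviour the two terms are meant to encourage.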

T260

34006-34249

Sentence

denotes

After the student network had been well-trained, the task of the teacher networks was complete, and the student model could be used on a regular computer with a fast speed, which is suitable for hospitals without extensive computing resources.

T261

34250-34381

Sentence

denotes

As a result of the knowledge distillation method, the CNNCF achieved high performance with only a small number of parameters in the student network.

T262

34382-34510

Sentence

denotes

The CNNRF is a modular framework consisting of one stage II sub-framework and one regressor block to handle the regression task.

T263

34511-34782

Sentence

denotes

In the regressor block, we used skip connections that consisted of a convolution layer with multiple 1 × 1 convolution kernels for retaining the features extracted by the stage II sub-framework while improving the non-linear representation ability of the regressor block.

T264

34783-34982

Sentence

denotes

We made use of flexible blocks to achieve good performance for the classification and regression tasks, unlike traditional machine learning methods, which are commonly used for either of these tasks.

T265

34983-35133

Sentence

denotes

Five statistical indices, including sensitivity, specificity, precision, kappa coefficient, and F1 were used to evaluate the performance of the CNNCF.

T266

35134-35259

Sentence

denotes

The sensitivity is related to the positive detection rate and is of great significance in the diagnostic testing of COVID-19.

T267

35260-35359

Sentence

denotes

The specificity refers to the ability of the model to correctly identify patients without the disease.

T268

35360-35442

Sentence

denotes

The precision indicates the proportion of the model's positive predictions that are correct.

T269

35443-35506

Sentence

denotes

The kappa coefficient measures the agreement between the model's predictions and the ground truth beyond that expected by chance.

T270

35507-35564

Sentence

denotes

The F1 is the harmonic mean of precision and sensitivity.
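
The five indices can be computed from a binary confusion matrix as sketched below, using the standard definitions. The example counts (59 true positives, 1 false positive, 3 false negatives, 149 true negatives) are an assumption: they are one confusion matrix that reproduces the CNNCF values reported in Table 2 and are not stated in the source. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, F1, and Cohen's kappa from counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of diseased cases that are detected
    specificity = tn / (tn + fp)   # share of non-diseased cases that are cleared
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    observed = (tp + tn) / n       # observed agreement with the ground truth
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Illustrative counts consistent with the CNNCF row of Table 2.
m = classification_metrics(tp=59, fp=1, fn=3, tn=149)
print({k: round(v, 4) for k, v in m.items()})
```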

T271

35565-35703

Sentence

denotes

Good performance was achieved by the CNNCF based on the five statistical indices for the multi-modal image data sets (X-data and CT-data).

T272

35704-35808

Sentence

denotes

The consistency between the model results and the expert evaluation was determined using McNemar’s test.
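
McNemar's test compares two raters on the same cases using only the discordant pairs. The sketch below uses the exact binomial form; the source does not say which variant was applied, so this choice, the function name, and the example counts are assumptions. Here `b` is the number of cases the model classified correctly and the expert did not, and `c` is the reverse.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        # No discordant pairs: the two raters never disagree in correctness.
        return 1.0
    k = min(b, c)
    # Probability of a split at least this uneven under a fair coin, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact(3, 3))    # balanced disagreements
print(mcnemar_exact(0, 10))   # one-sided disagreements
```

When the disagreements are evenly balanced, the p-value reaches its maximum of 1.0, which corresponds to no detectable difference between the two raters.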

T273

35809-35984

Sentence

denotes

The good performance demonstrated the model's capacity to learn from the experts using the labels of the image data and to mimic the experts in diagnostic decision-making.

T274

35985-36082

Sentence

denotes

The ROC and PRC of the CNNCF were used to evaluate the performance of the classification model50.

T275

36083-36241

Sentence

denotes

The ROC is a probability curve that shows the trade-off between the true positive rate (TPR) and false-positive rate (FPR) using different threshold settings.

T276

36242-36360

Sentence

denotes

The AUROC provides a measure of separability and demonstrated the discriminative capacity of the classification model.

T277

36361-36493

Sentence

denotes

The larger the AUROC, the better the performance of the model is for predicting the true positive (TP) and true negative (TN) cases.
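The AUROC can equivalently be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that rank-based definition rather than integrating the curve; the function name, scores, and labels are hypothetical.

```python
def auroc(scores, labels):
    """Rank-based AUROC: fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and ground-truth labels.
area = auroc([0.9, 0.8, 0.35, 0.6, 0.2], [1, 1, 1, 0, 0])
```

Here five of the six positive/negative pairs are ranked correctly, so the area is 5/6.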

T278

36494-36613

Sentence

denotes

The PRC shows the trade-off between the TPR and the positive predictive value (PPV) using different threshold settings.

T279

36614-36700

Sentence

denotes

The larger the AUPRC, the higher the capacity of the model is to predict the TP cases.

T280

36701-36815

Sentence

denotes

In our experiments, the CNNCF achieved high scores for both the AUPRC and AUROC (>99%) for the X-data and CT-data.

T281

36816-36942

Sentence

denotes

DL has made significant progress in numerous areas in recent years and has provided best-performance solutions for many tasks.

T282

36943-37126

Sentence

denotes

In areas that require high interpretability, such as autonomous driving and medical diagnosis, DL has disadvantages because it is a black-box approach and lacks good interpretability.

T283

37127-37314

Sentence

denotes

The strong correlation obtained between the CNNCF output and the experts’ evaluation suggested that the mechanism of the proposed CNNCF is similar to that used by humans analyzing images.

T284

37315-37479

Sentence

denotes

The combination of the visual interpretation and the correlation analysis enhanced the ability of the framework to interpret the results, making it highly reliable.

T285

37480-37606

Sentence

denotes

The CNNCF has a promising potential for clinical diagnosis considering its high performance and hybrid interpretation ability.

T286

37607-38039

Sentence

denotes

We have explored the potential use of the CNNCF for clinical diagnosis with the support of the Beijing Youan hospital (which is an authoritative hospital for the study of infectious diseases and one of the designated hospitals for COVID-19 treatment) using both real data after privacy masking and input from experts under experimental conditions and provided a suitable schedule for assisting experts with the radiography analysis.

T287

38040-38125

Sentence

denotes

However, medical diagnosis in a real situation is more complex than in an experiment.

T288

38126-38325

Sentence

denotes

Therefore, further studies will be conducted in different hospitals with different complexities and uncertainties to obtain more experience in multiple clinical use cases with the proposed framework.

T289

38326-38520

Sentence

denotes

The objective of this study was to use statistical methods to analyze the relationship between salient features in images and expert evaluations and test the discriminative ability of the model.

T290

38521-38716

Sentence

denotes

The CNNRF can be considered a cross-modal prediction model, which is a challenging research area that requires more attention because it is closely related to associative thinking and creativity.

T291

38717-38879

Sentence

denotes

In addition, the correlation analysis might be a possible optimization direction to improve the interpretability performance of the classification model using DL.

T292

38880-39104

Sentence

denotes

In conclusion, we proposed a complete framework for the computer-aided diagnosis of COVID-19, including data annotation, data preprocessing, model design, correlation analysis, and assessment of the model’s interpretability.

T293

39105-39244

Sentence

denotes

We developed a pseudo-color tool to convert the grayscale medical images to color images to facilitate image interpretation by the experts.

T294

39245-39371

Sentence

denotes

We developed a platform for the annotation of medical images characterized by high security, local sharing, and expandability.

T295

39372-39507

Sentence

denotes

We designed a simple data preprocessing method for converting multiple types of images (X-data, CT-data) to three-channel color images.

T296

39508-39675

Sentence

denotes

We established a modular CNN-based classification framework with high flexibility and wide use cases, consisting of the ResBlock-A, ResBlock-B, and Control Gate Block.

T297

39676-39835

Sentence

denotes

A knowledge distillation method was used as a training strategy for the proposed classification framework to ensure high performance with fast inference speed.

T298

39836-40083

Sentence

denotes

A CNN-based regression framework that required minimal changes to the architecture of the classification framework was employed to determine the correlation between the lesion area images of patients with COVID-19 and the five clinical indicators.

T299

40084-40303

Sentence

denotes

The three evaluation indices (F1, kappa, specificity) of the classification framework were similar to those of the respiratory resident and the emergency resident and slightly higher than that of the respiratory intern.

T300

40304-40433

Sentence

denotes

We visualized the salient features that contributed most to the CNNCF output in a heatmap for easy interpretability of the CNNCF.

T301

40434-40613

Sentence

denotes

The proposed CNNCF computer-aided diagnosis method showed relatively high precision and has a potential for the automatic diagnosis of COVID-19 in clinical practice in the future.

T302

40614-40723

Sentence

denotes

The outbreak of the COVID-19 epidemic poses serious threats to the safety and health of the human population.

T303

40724-40867

Sentence

denotes

At present, popular methods for the diagnosis and monitoring of viruses include the detection of viral RNAs using PCR or a test for antibodies.

T304

40868-41037

Sentence

denotes

However, one negative result of the RT-PCR test (especially in the areas of high infection risk) might not be enough to rule out the possibility of a COVID-19 infection.

T305

41038-41158

Sentence

denotes

On June 14, 2020, the Beijing Municipal Health Commission declared that strict management of fever clinics was required.

T306

41159-41425

Sentence

denotes

All medical institutions in Beijing were required to conduct tests to detect COVID-19 nucleic acids and antibodies, CT examinations, and the routine blood test (also referred to as “1 + 3 tests”) for patients with fever that live in areas with high infection risk51.

T307

41426-41627

Sentence

denotes

Therefore, the proposed computer-aided diagnosis using medical imaging could be used as an auxiliary diagnosis tool to help physicians identify people with high infection risk in the clinical workflow.

T308

41628-41703

Sentence

denotes

There is also a potential for broader applicability of the proposed method.

T309

41704-41857

Sentence

denotes

Once the method has been improved, it might be used in other diagnostic decision-making scenarios (lung cancer, liver cancer, etc.) using medical images.

T310

41858-41943

Sentence

denotes

The expertise of a specialist will be required in clinical cases in future scenarios.

T311

41944-42105

Sentence

denotes

However, we are optimistic about the potential of using DL methods in intelligent medicine and expect that many people will benefit from the advanced technology.

T312

42107-42114

Sentence

denotes

Methods

T313

42116-42135

Sentence

denotes

Data sets splitting

T314

42136-42312

Sentence

denotes

We used the multi-modal data sets from four public data sets and one hospital (Youan hospital) in our research and split the hybrid data set in the following manner. For X-data:

T315

42313-42455

Sentence

denotes

The CXR images of COVID-19 cases collected from the public CCD52 contained 212 patients diagnosed with COVID-19 and were resized to 512 × 512.

T316

42456-42529

Sentence

denotes

Each image contained 1–2 suspected areas with inflammatory lesions (SAs).

T317

42530-42629

Sentence

denotes

We also collected 5100 normal cases and 3100 pneumonia cases from another public data set (RSNA)53.

T318

42630-42851

Sentence

denotes

In addition, the CXR images collected from the Youan hospital contained 45 cases diagnosed with COVID-19, 503 normal cases, 435 cases diagnosed with pneumonia (not COVID-19 patients), and 145 cases diagnosed as influenza.

T319

42852-42958

Sentence

denotes

The CXR images collected from the Youan hospital were obtained using the Carestream DRX-Revolution system.

T320

42959-43076

Sentence

denotes

All the CXR images of COVID-19 cases were analyzed by the two experienced radiologists to determine the lesion areas.

T321

43077-43256

Sentence

denotes

The X-data of the normal cases (XNPDS), that of the pneumonia cases (XPPDS), and that of the COVID-19 cases (XCPDS) from public data sets constituted the X public data set (XPDS).

T322

43257-43440

Sentence

denotes

The X-data of the normal cases (XNHDS), that of the pneumonia cases (XPHDS), and that of the COVID-19 cases (XCHDS) from the Youan hospital constituted the X hospital data set (XHDS).

T323

43441-43453

Sentence

denotes

For CT-data:

T324

43454-43936

Sentence

denotes

We collected CT-data of 120 normal cases from a public lung CT-data set (LUNA16, a large data set for automatic nodule detection in the lungs54), which was a subset of LIDC-IDRI (The LIDC-IDRI contains a total of 1018 helical thoracic CT scans collected using manufacturers from eight medical imaging companies including AGFA Healthcare, Carestream Health, Inc., Fuji Photo Film Co., GE Healthcare, iCAD, Inc., Philips Healthcare, Riverain Medical, and Siemens Medical Solutions)55.

T325

43937-44102

Sentence

denotes

It was confirmed by the two experienced radiologists from the Youan Hospital that no lesion areas of COVID-19, pneumonia, or influenza were present in the 120 cases.

T326

44103-44245

Sentence

denotes

We also collected the CT-data of pneumonia cases from a public data set (images of COVID-19 positive and negative pneumonia patients: ICNP)56.

T327

44246-44418

Sentence

denotes

The CT-data collected from the Youan hospital contained 95 patients diagnosed with COVID-19, 50 patients diagnosed with influenza and 215 patients diagnosed with pneumonia.

T328

44419-44587

Sentence

denotes

The images of the CT scans collected from the Youan hospital were obtained using the PHILIPS Brilliance iCT 256 system (which was also used for the LIDC-IDRI data set).

T329

44588-44701

Sentence

denotes

The slice thickness of the CT scans was 5 mm, and the CT-data images were grayscale images with 512 × 512 pixels.

T330

44702-44892

Sentence

denotes

Areas with 2–5 SAs were annotated by the two experienced radiologists using a rapid keystroke-entry format in the images for each case, and these areas ranged from 16 × 16 to 64 × 64 pixels.

T331

44893-45044

Sentence

denotes

The CT-data of the normal cases (CTNPDS) and that of the pneumonia cases (CTPPDS) from the public data sets constituted the CT public data set (CTPDS).

T332

45045-45289

Sentence

denotes

The CT-data of the COVID-19 cases from the Youan hospital (CTCHDS), the influenza cases from the Youan hospital (CTIHDS), and the normal cases from the Youan hospital (CTNHDS) constituted the CT hospital (clinically-diagnosed) data set (CTHDS).

T333

45290-45318

Sentence

denotes

For clinical indicator data:

T334

45319-45545

Sentence

denotes

Five clinical indicators (white blood cell count, neutrophil percentage, lymphocyte percentage, procalcitonin, C-reactive protein) of 95 COVID-19 cases were obtained from the Youan hospital, as shown in Supplementary Table 20.

T335

45546-45802

Sentence

denotes

A total of 95 data pairs from the 95 COVID-19 cases (369 images of the lesion area and the 95 × 5 clinical indicators) were collected from the Youan hospital for the correlation analysis of the lesion areas of the COVID-19 and the five clinical indicators.

T336

45803-45910

Sentence

denotes

The images of the SAs and the clinical indicator data constituted the correlation analysis data set (CADS).

T337

45911-46030

Sentence

denotes

We split the XPDS, XHDS, CTPDS, CTHDS, and CADS into the training-validation (train-val) and test data sets using TTSF.

T338

46031-46137

Sentence

denotes

The details of the hybrid data sets for the public data sets and Youan hospital data are shown in Table 1.

T339

46138-46225

Sentence

denotes

The train-val part of CTHDS is referred to as CTHTS, and the test part is called CTHVS.

T340

46226-46367

Sentence

denotes

The same naming scheme was adopted for XPDS, XHDS, CTPDS, and CADS, i.e., XPTS, XPVS, XHTS, XHVS, CTPTS, CTPVS, CATS, and CAVS, respectively.

T341

46368-46552

Sentence

denotes

The training-validation parts of the four public data sets and the hospital (Youan Hospital) data set were mixed for the X-data and CT-data and named XMTS and CTMTS, respectively.

T342

46553-46626

Sentence

denotes

The test parts were mixed in the same way and named XMVS and CTMVS.

T343

46628-46647

Sentence

denotes

Image preprocessing

T344

46648-46830

Sentence

denotes

All image data (X-data and CT-data) in the DICOM format were loaded using the Pydicom library (version 1.4.0) and processed as arrays using the Numpy library (version 1.16.0). X-data:

T345

46831-47014

Sentence

denotes

The two-dimensional array (x axis and y axis) of the image of the X-data (size of 512 × 512) was normalized to pixel values of 0–255 and stored in png format using the OpenCV library.

T346

47015-47083

Sentence

denotes

Each preprocessed image was resized to 512 × 512 and had 3 channels.

T347

47084-47092

Sentence

denotes

CT-data:

T348

47093-47254

Sentence

denotes

The array of the CT-data was three-dimensional (x axis, y axis, and z axis), and the length of the z axis was ~300, which represented the number of image slices.

T349

47255-47331

Sentence

denotes

Each image slice was two-dimensional (x axis and y axis, size of 512 × 512).

T350

47332-47519

Sentence

denotes

As shown in Fig. 1b, the array of the image was divided into three groups in the z axis direction, and each group contained 100 image slices (each case was resampled to 300 image slices).

T351

47520-47650

Sentence

denotes

The image slices in each group were processed using a window center of −600 and a window width of 2000 to extract the lung tissue.

T352

47651-47789

Sentence

denotes

The images of the CT-data with 300 image slices were normalized to pixel values of 0–255 and stored in npy format using the Numpy library.
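The windowing and normalization steps can be sketched as follows, assuming the slice values are in Hounsfield units and that the window is mapped linearly onto 0–255. Both are assumptions; the text gives only the window center, the window width, and the target range. The function name is hypothetical.

```python
def apply_window(hu_values, center=-600.0, width=2000.0):
    """Clip values to the window and rescale them to integers in 0-255."""
    low = center - width / 2.0    # -1600 for the stated window
    high = center + width / 2.0   # 400 for the stated window
    scaled = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)
        scaled.append(round((clipped - low) / (high - low) * 255))
    return scaled


# Values below, inside, and above the window.
pixels = apply_window([-2000, -1600, -600, 400, 1000])
```

Values outside the window saturate at 0 or 255, which suppresses bone and other dense structures and keeps the contrast within the lung range.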

T353

47790-47999

Sentence

denotes

A convolution filter was applied with three 1 × 1 convolution kernels to preprocess the CT-data, which is a trainable layer with the aim of normalizing the input; the image size was 512 × 512, with 3 channels.

T354

48001-48035

Sentence

denotes

Annotation tool for medical images

T355

48036-48161

Sentence

denotes

The server program of the annotation tool was deployed in a computer with large network bandwidth and abundant storage space.

T356

48162-48297

Sentence

denotes

The client program of the annotation tool was deployed in the office computer of the experts, who were given unique user IDs for login.

T357

48298-48458

Sentence

denotes

The interface of the client program had a built-in image viewer with a window size of 512 × 512 and an export tool for obtaining the annotations in text format.

T358

48459-48682

Sentence

denotes

Multiple drawing tools were provided to annotate the lesion area in the images, including a rectangle tool for drawing a bounding box around the target, a polygon tool for outlining the target, and a circle tool for circling the target.

T359

48683-48753

Sentence

denotes

Multiple categories could be defined and assigned to the target areas.

T360

48754-48980

Sentence

denotes

All annotations were stored in a structured query language (SQL) database, and the export tool was used to export the annotations to two common file formats (comma-separated values (csv) and JavaScript object notation (json)).

T361

48981-49028

Sentence

denotes

The experts could share the annotation results.

T362

49029-49166

Sentence

denotes

Since the sizes of the X-data images and the CT slices were identical, both types of data were annotated with the same annotation tool.

T363

49167-49262

Sentence

denotes

Here we use one image slice of the CT-data as an example to demonstrate the annotation process.

T364

49263-49332

Sentence

denotes

In this study, two experts were asked to annotate the medical images.

T365

49333-49393

Sentence

denotes

The normal cases were reviewed and confirmed by the experts.

T366

49394-49488

Sentence

denotes

The abnormal cases, including the COVID-19 and influenza cases, were annotated by the experts.

T367

49489-49579

Sentence

denotes

Bounding boxes of the lesion areas in the images were annotated using the annotation tool.

T368

49580-49640

Sentence

denotes

In general, each case contained 2–5 slices with annotations.

T369

49641-49784

Sentence

denotes

The cases with the annotated slices were considered positive cases, and each case was assigned to a category (COVID-19 case or influenza case).

T370

49785-49850

Sentence

denotes

The pipeline of the annotation is shown in Supplementary Fig. 1.

T371

49852-49883

Sentence

denotes

Model architecture and training

T372

49884-50115

Sentence

denotes

In this study, we proposed a modular CNNCF to identify the COVID-19 cases in the medical images and a CNNRF to determine the relationships between the lesion areas in the medical images and the five clinical indicators of COVID-19.

T373

50116-50192

Sentence

denotes

Both proposed frameworks consisted of two units (ResBlock-A and ResBlock-B).

T374

50193-50295

Sentence

denotes

The CNNCF and CNNRF had unique units, namely the control gate block and regressor block, respectively.

T375

50296-50421

Sentence

denotes

Both frameworks were implemented using two NVIDIA GTX 1080TI graphics cards and the open-source PyTorch framework. ResBlock-A:

T376

50422-50442

Sentence

denotes

As discussed in ref.

T377

50443-50584

Sentence

denotes

57, the residual block is a CNN-based block that allows the CNN models to reuse features, thus accelerating the training speed of the models.

T378

50585-50745

Sentence

denotes

In this study, we developed a residual block (ResBlock-A) that utilized a skip-connection for retaining features in different layers in the forward propagation.

T379

50746-50958

Sentence

denotes

This block (Fig. 6a) consisted of a multiple-input multiple-output structure with two branches (an upper branch and a bottom branch), where input 1 and input 2 have the same size, but the values may be different.

T380

50959-51052

Sentence

denotes

Output 1 and output 2 also had the same size, but output 1 did not pass through a ReLu layer.

T381

51053-51180

Sentence

denotes

The upper branch consisted of a max-pooling layer (Max-Pooling), a convolution layer (Conv 1 × 1), and a batch norm layer (BN).

T382

51181-51415

Sentence

denotes

The Max-Pooling had a kernel size of 3 × 3 and a stride of 2 to downsample the input 1 for retaining the features and ensuring the same size as the output layer before the element-wise add operation was conducted in the bottom branch.

T383

51416-51594

Sentence

denotes

The Conv 1 × 1 consisted of multiple 1 × 1 convolution kernels with the same number as that in the second convolution layer in the bottom branch to adjust the number of channels.

T384

51595-51744

Sentence

denotes

The BN used a regulation function to ensure the input in each layer of the model followed a normal distribution with a mean of 0 and a variance of 1.

T385

51745-51835

Sentence

denotes

The bottom branch consisted of two convolution layers, two BN layers, and two ReLu layers.

T386

51836-52044

Sentence

denotes

The first convolution layer in the bottom branch consisted of multiple 3 × 3 convolution kernels with a stride of 2 and a padding of 1 to reduce the size of the feature maps when local features were obtained.

T387

52045-52181

Sentence

denotes

The second convolution layer in the bottom branch consisted of multiple 3 × 3 convolution kernels with a stride of 1 and a padding of 1.

T388

52182-52301

Sentence

denotes

The ReLu function was used as the activation function to ensure a non-linear relationship between the different layers.

T389

52302-52436

Sentence

denotes

The output of the upper branch and the output of the bottom branch after the second BN were fused using an element-wise add operation.

T390

52437-52523

Sentence

denotes

The fused result was output 1, and the fused result after the ReLu layer was output 2.
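The element-wise add requires both branches of ResBlock-A to produce feature maps of the same spatial size. This can be checked with the standard output-size formula for convolution and pooling layers. The max-pooling padding is not stated in the text; a padding of 1 is assumed here because it makes the two branches agree for a 512 × 512 input. The function name is hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = 512

# Bottom branch: 3x3 conv (stride 2, padding 1), then 3x3 conv (stride 1, padding 1).
bottom = conv_output_size(side, kernel=3, stride=2, padding=1)
bottom = conv_output_size(bottom, kernel=3, stride=1, padding=1)

# Upper branch: 3x3 max-pooling (stride 2, padding 1 assumed), then 1x1 conv.
upper = conv_output_size(side, kernel=3, stride=2, padding=1)
upper = conv_output_size(upper, kernel=1, stride=1, padding=0)

# Without padding, the pooled map would be one pixel short of the bottom branch.
upper_no_padding = conv_output_size(side, kernel=3, stride=2, padding=0)
```

With these settings both branches halve the input to 256 × 256, whereas an unpadded pooling layer would give 255 × 255 and the addition would fail.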

T391

52524-52572

Sentence

denotes

Fig. 6 The four units of the proposed framework.

T392

52573-53253

Sentence

denotes

a ResBlock-A architecture, containing two convolution layers with 3 × 3 kernels, one convolution layer with a 1 × 1 kernel, three batch normalization layers, two ReLu layers, and one max-pooling layer with a 3 × 3 kernel. b ResBlock-B architecture; the basic unit is the same as the ResBlock-A, except for output 1. c The Control Gate Block has a synaptic-based frontend architecture that controls the direction of the feature map flow and the overall optimization direction of the framework. d The Regressor architecture is a skip-connection architecture containing one convolution layer with 3 × 3 kernels, one batch normalization layer, one ReLu layer, and three linear layers.

T393

53254-53265

Sentence

denotes

ResBlock-B:

T394

53266-53402

Sentence

denotes

The ResBlock-B (Fig. 6b) was a multiple-input single-output block that was similar to the ResBlock-A, except that there was no output 1.

T395

53403-53553

Sentence

denotes

The value of the stride and padding in each layer of the ResBlock-A and ResBlock-B could be adjusted using hyper-parameters based on the requirements.

T396

53554-53573

Sentence

denotes

Control Gate Block:

T397

53574-53827

Sentence

denotes

As shown in Fig. 6c, the Control Gate Block was a multiple-input single-output block consisting of a predictor module, a counter module, and a synapses module to control the optimization direction while controlling the information flow in the framework.

T398

53828-53952

Sentence

denotes

The pipeline of the predictor module is shown in Supplementary Fig. 19a, where the Input S1 is the output of the ResBlock-B.

T399

53953-54054

Sentence

denotes

The Input S1 was then flattened to a one-dimensional feature vector as the input of the linear layer.

T400

54055-54161

Sentence

denotes

The output of the linear layer was converted to a probability of each category using the softmax function.

T401

54162-54310

Sentence

denotes

A sensitivity calculator used Vpred and Vtrue as inputs to compute the TP, TN, FP, and false-negative (FN) rates, from which the sensitivity was calculated.

T402

54311-54410

Sentence

denotes

The sensitivity calculation was followed by a step function to control the output of the predictor.

T403

54411-54557

Sentence

denotes

The ths was a threshold value; if the calculated sensitivity was greater than or equal to ths, the step function output 1; otherwise, the output was 0.

T404

54558-54639

Sentence

denotes

The counter module was a conditional counter, as shown in Supplementary Fig. 19b.

T405

54640-54705

Sentence

denotes

If the input n was zero, the counter was cleared and set to zero.

T406

54706-54744

Sentence

denotes

Otherwise, the counter increased by 1.

T407

54745-54779

Sentence

denotes

The output of the counter was num.

T408

54780-54929

Sentence

denotes

The synapses block mimicked the synaptic structure, and the input variable num was similar to a neurotransmitter, as shown in Supplementary Fig. 19c.

T409

54930-54989

Sentence

denotes

The input num was the input parameter of the step function.

T410

54990-55118

Sentence

denotes

The ths was a threshold value; if the input num was greater than or equal to ths, the step function output 1; otherwise, it output 0.

T411

55119-55222

Sentence

denotes

An element-wise multiplication was performed between the input S1 and the output of the synapses block.

T412

55223-55278

Sentence

denotes

The multiplied result was passed on to a discriminator.

T413

55279-55379

Sentence

denotes

If the sum of the elements in the result was not zero, the Input S1 was passed on to the next layer.

T414

55380-55434

Sentence

denotes

Otherwise, the input S1 information was not passed on.
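The gating logic described above (predictor step function, conditional counter, synapse step function, and discriminator) can be summarized in a small sketch. The class and parameter names are hypothetical, the two thresholds are kept as separate parameters because the text calls both of them ths without stating whether they share a value, and the feature map is reduced to a flat list for brevity.

```python
class ControlGate:
    """Minimal sketch of the Control Gate Block logic."""

    def __init__(self, sensitivity_threshold, count_threshold):
        self.sensitivity_threshold = sensitivity_threshold
        self.count_threshold = count_threshold
        self.count = 0

    def update(self, sensitivity):
        """Predictor step function followed by the conditional counter."""
        fired = 1 if sensitivity >= self.sensitivity_threshold else 0
        # The counter is cleared on a zero input and incremented otherwise.
        self.count = self.count + 1 if fired else 0
        return self.count

    def forward(self, features):
        """Return the features if the gate is open, otherwise None."""
        synapse = 1 if self.count >= self.count_threshold else 0
        gated = [value * synapse for value in features]
        # Discriminator: pass the input on only if the gated sum is non-zero.
        return features if sum(gated) != 0 else None


gate = ControlGate(sensitivity_threshold=0.9, count_threshold=2)
features = [0.2, 0.5]

gate.update(0.95)
after_one = gate.forward(features)    # counter is 1, gate closed
gate.update(0.92)
after_two = gate.forward(features)    # counter is 2, gate open
gate.update(0.50)
after_drop = gate.forward(features)   # counter reset, gate closed
```

Under this reading, the gate opens only after the sensitivity has stayed at or above its threshold for a run of consecutive evaluations.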

T415

55435-55451

Sentence

denotes

Regressor block:

T416

55452-55580

Sentence

denotes

The regressor block consisted of multiple linear layers, a convolution layer, a BN layer, and a ReLu layer, as shown in Fig. 6d.

T417

55581-55723

Sentence

denotes

A skip-connection architecture was adopted to retain the features and increase the ability of the block to represent non-linear relationships.

T418

55724-55854

Sentence

denotes

The convolution block in the skip-connection structure was a convolution layer with multiple 1 × 1 convolution kernels.

T419

55855-56010

Sentence

denotes

The number of the convolution kernels was the same as that of the output size of the second linear layer to ensure the consistency of the vector dimension.

T420

56011-56112

Sentence

denotes

The input size and output size of each linear layer were adjustable to be applicable to actual cases.

T421

56113-56255

Sentence

denotes

Based on the four blocks, two frameworks were designed for the classification task and regression task, respectively. Classification framework:

T422

56256-56321

Sentence

denotes

The CNNCF consisted of stage I and stage II, as shown in Fig. 3a.

T423

56322-56393

Sentence

denotes

Stage I was duplicated Q times in the framework (in this study, Q = 1).

T424

56394-56516

Sentence

denotes

It consisted of M ResBlock-A units (in this study, M = 2), one ResBlock-B, and one Control Gate Block.

T425

56517-56620

Sentence

denotes

Stage II consisted of N ResBlock-A units (in this study, N = 2) and one ResBlock-B.

T426

56621-56767

Sentence

denotes

The weighted cross-entropy loss function was used and was minimized using the SGD optimizer with a learning rate of a1 (in this study, a1 = 0.01).

T427

56768-57018

Sentence

denotes

A warm-up strategy58 was used in the initialization of the learning rate for a smooth training start, and a reduction factor of b1 (in this study, b1 = 0.1) was used to reduce the learning rate after every c1 (in this study, c1 = 10) training epochs.

T428

57019-57157

Sentence

denotes

The model was trained for d1 (in this study, d1 = 40) epochs, and the model parameters saved in the last epoch were used in the test phase.
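The learning-rate schedule for the classification framework can be sketched as a step decay with a warm-up phase. The values a1 = 0.01, b1 = 0.1, c1 = 10, and d1 = 40 come from the text; the linear shape of the warm-up and its length of 5 epochs are assumptions, since the text does not specify them, and the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, factor=0.1, step=10, warmup_epochs=5):
    """Step-decayed learning rate with an assumed linear warm-up."""
    # Multiply by the reduction factor once per completed block of `step` epochs.
    decayed = base_lr * factor ** (epoch // step)
    if epoch < warmup_epochs:
        # Ramp up linearly to the base rate during the warm-up epochs.
        return decayed * (epoch + 1) / warmup_epochs
    return decayed


# One value per training epoch (d1 = 40 epochs in the text).
schedule = [learning_rate(epoch) for epoch in range(40)]
```

The rate rises to 0.01 during the warm-up, stays there until epoch 10, and is then divided by 10 after every 10 epochs.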

T429

57158-57179

Sentence

denotes

Regression framework:

T430

57180-57252

Sentence

denotes

The CNNRF (Fig. 3b) consisted of two parts (stage II and the regressor).

T431

57253-57497

Sentence

denotes

The inputs to the regression framework were the images of the lesion areas, and the output was the corresponding vector with five dimensions, representing the five clinical indicators (all clinical indicators were normalized to a range of 0–1).

T432

57498-57602

Sentence

denotes

The stage II structure was the same as that in the classification framework, except for some parameters.

T433

57603-57746

Sentence

denotes

The loss function was the MSE loss function, which was minimized using the SGD optimizer with a learning rate of a2 (in this study, a2 = 0.01).

T434

57747-57995

Sentence

denotes

A warm-up strategy was used in the initialization of the learning rate for a smooth training start, and a reduction factor of b2 (in this study, b2 = 0.1) was used to reduce the learning rate after every c2 (in this study, c2 = 50) training epochs.

T435

57996-58140

Sentence

denotes

The framework was trained for d2 (in this study, d2 = 200) epochs, and the model parameters saved in the last epoch were used in the test phase.

T436

58141-58186

Sentence

denotes

The workflow of the classification framework.

T437

58187-58260

Sentence

denotes

The workflow of the classification framework is shown in Fig. 3c.

T438

58261-58389

Sentence

denotes

The preprocessed images are sent to the first convolution block to expand the channels and processed as the input for the CNNCF.

T439

58390-58532

Sentence

denotes

Given the input Fi with a size of M × N × 64, stage I outputs the feature maps F′i with a size of M/8 × N/8 × 256 in the default configuration.

T440

58533-58672

Sentence

denotes

As we introduced above, the Control Gate Block controls the optimization direction while controlling the information flow in the framework.

T441

58673-58755

Sentence

denotes

If the Control Gate Block is open, the feature maps F′i are passed on to stage II.

T442

58756-59061

Sentence

denotes

Given the input F′i, stage II outputs the feature maps F″i with a size of M/64 × N/64 × 512, which are defined as follows: $$F'_i = S_1(F_i), \qquad F''_i = S_2(F'_i) \otimes \mathrm{CGB}(F'_i), \tag{1}$$ where $S_1$ denotes the stage I block, $S_2$ denotes the stage II block, CGB is the Control Gate Block, and $\otimes$ is the element-wise multiplication operation.

T443

59062-59218

Sentence

denotes

Stage II is followed by a global average pooling layer (GAP) and a fully connected layer (FC layer) with a softmax function to generate the final predictions.

T444

59219-59309

Sentence

denotes

Given F″i as input, the GAP is adopted to generate a vector Vf with a size of 1 × 1 × 512.

T445

59310-59677

Sentence

denotes

Given Vf as input, the FC layer with the softmax function outputs a vector Vc with a size of 1 × 1 × C: $$V_f = \mathrm{GAP}(F''_i), \qquad V_c = \mathrm{SMax}(\mathrm{FC}(V_f)), \tag{2}$$ where GAP is the global average pooling layer, FC is the fully connected layer, SMax is the softmax function, $V_f$ is the feature vector generated by the GAP, $V_c$ is the prediction vector, and C is the number of case types used in this study.
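The classification head (global average pooling, a fully connected layer, and softmax) can be illustrated with a dependency-free sketch. The tiny feature maps, weights, and function names below are hypothetical and stand in for the 512-channel tensors and trained parameters of the actual framework.

```python
import math


def global_average_pool(feature_maps):
    """Average each channel's H x W map to a single value."""
    pooled = []
    for channel in feature_maps:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(vector, weights, bias):
    """One linear layer: a weight row and a bias term per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Two 2 x 2 channels and three output classes, for illustration only.
feature_maps = [[[1, 3], [5, 7]], [[0, 0], [2, 2]]]
v_f = global_average_pool(feature_maps)
logits = fully_connected(v_f, weights=[[1, 0], [0, 1], [1, 1]], bias=[0, 0, 0])
v_c = softmax(logits)
```

The pooled vector has one entry per channel, and the prediction vector has one probability per class, summing to 1.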

T446

59679-59756

Sentence

denotes

Training strategies and evaluation indicators of the classification framework

T447

59757-59850

Sentence

denotes

The training strategies and hyper-parameters of the classification framework were as follows.

T448

59851-60029

Sentence

denotes

We adopted a knowledge distillation method (Fig. 7) to train the CNNCF as a student network with one stage I block and one stage II block, each of which contained two ResBlock-A.

T449

60030-60234

Sentence

denotes

Four teacher networks (the hyper-parameters are provided in Supplementary Table 21) with the proposed blocks were trained on the train-val part of each sub-data set using a 5-fold cross-validation method.

T450

60235-60304

Sentence

denotes

All networks were initialized using the Xavier initialization method.

T451

60305-60383

Sentence

denotes

The initial learning rate was 0.01, and the optimization function was the SGD.

T452

60384-60494

Sentence

denotes

The CNNCF was trained using the image data and the label, as well as the fused output of the teacher networks.

T453

60495-60617

Sentence

denotes

A comparison of the RT-PCR test results using throat specimens and the CNNCF results is provided in Supplementary Table 22.

T454

60618-60695

Sentence

denotes

Supplementary Fig. 20 shows the details of the knowledge distillation method.

T455

60696-60812

Sentence

denotes

The definitions and details of the five evaluation indicators used in this study are given in Supplementary Note 2.

T456

60813-60912

Sentence

denotes

Fig. 7 Knowledge distillation consisting of multiple teacher networks and a target student network.

T457

60913-61013

Sentence

denotes

The knowledge is transferred from the teacher networks to the student network using a loss function.

T458

61015-61054

Sentence

denotes

Gradient-weighted class activation maps

T459

61055-61198

Sentence

denotes

Grad-CAM59 in the PyTorch framework was used to visualize the salient features that contributed the most to the prediction output of the model.

T460

61199-61445

Sentence

denotes

Given a target category, the Grad-CAM performed back-propagation to obtain the final CNN feature maps and the gradient of the feature maps; only pixels with positive contributions to the specified category were retained through the ReLU function.
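The core Grad-CAM computation can be sketched in a few lines of plain Python: each channel's weight is the spatial average of its gradient, the feature maps are summed with those weights, and a ReLU keeps only positive contributions. The toy feature maps and gradients below are hypothetical; in practice both would be obtained from the final convolutional layer during back-propagation.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap.

    feature_maps, gradients: lists of K channels, each an H x W nested list.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: spatial average of each channel's gradient.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps across channels.
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU: retain only pixels with a positive contribution.
    return [[max(0.0, v) for v in row] for row in cam]

# Hypothetical toy input: 2 channels of size 2 x 2.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]],
                [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(feature_maps, gradients)
```

For display, the resulting map is usually upsampled to the input image size and overlaid on the image; that step is omitted here.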

T461

61446-61645

Sentence

denotes

The Grad-CAM method was applied to all test data sets (X-data and CT-data) in the CNNCF without changing the framework structure to obtain a visual output of the framework’s high discriminatory ability.

T462

61647-61677

Sentence

denotes

Statistics and reproducibility

T463

61678-61796

Sentence

denotes

We used multiple statistical indices and empirical distributions to assess the performance of the proposed frameworks.

T464

61797-61956

Sentence

denotes

The equations of the statistical indices are shown in Supplementary Fig. 21 and all the abbreviations used in this study are defined in Supplementary Table 23.

T465

61957-62085

Sentence

denotes

All data used in this study met the following criteria: (1) the participant signed informed consent prior to enrollment and (2) the participant was at least 18 years old.

T466

62086-62217

Sentence

denotes

This study was conducted following the Declaration of Helsinki and was approved by the Capital Medical University Ethics Committee.

T467

62218-62419

Sentence

denotes

The following statistical analyses of the data were conducted to evaluate both the classification framework and the regression framework. Statistical indices to evaluate the classification framework.

T468

62420-62647

Sentence

denotes

Multiple evaluation indicators (PRC, ROC, AUPRC, AUROC, sensitivity, specificity, precision, kappa index, and F1 with a fixed threshold) were computed for a comprehensive and accurate assessment of the classification framework.

T469

62648-62764

Sentence

denotes

Threshold values ranging from 0 to 1 with a step of 0.005 were used to obtain the ROC and PRC curves.

T470

62765-62931

Sentence

denotes

The PRC showed the relationship between the precision and the sensitivity (or recall), and the ROC indicated the relationship between the sensitivity and specificity.

T471

62932-63019

Sentence

denotes

The two curves reflected the comprehensive performance of the classification framework.
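The threshold sweep behind the two curves can be sketched as follows. The labels and scores are hypothetical, the rule that a score at or above the threshold counts as positive is an assumption, and setting precision to 1.0 when no sample is predicted positive is a plotting convention rather than something stated in the text.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sweep(y_true, scores, step=0.005):
    """Return (threshold, sensitivity, specificity, precision) per threshold."""
    n_steps = int(round(1.0 / step))
    points = []
    for k in range(n_steps + 1):
        t = k * step
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention
        points.append((t, sensitivity, specificity, precision))
    return points

# Hypothetical binary labels and predicted scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
points = sweep(y_true, scores)
```

Plotting sensitivity against 1 − specificity over `points` gives the ROC curve, and plotting precision against sensitivity gives the PRC.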

T472

63020-63124

Sentence

denotes

The kappa index is a statistical method for assessing the degree of agreement between different methods.

T473

63125-63204

Sentence

denotes

In our use case, the indicator was used to measure the stability of the method.

T474

63205-63297

Sentence

denotes

The F1 score is the harmonic mean of precision and sensitivity and accounts for both false positives (FP) and false negatives (FN).
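Both indicators can be computed directly from the entries of a binary confusion matrix. The sketch below assumes that the kappa index refers to Cohen's kappa for two raters (here, prediction versus reference); the counts are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)

def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both "raters".
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical confusion-matrix counts for 100 test images.
tp, fp, tn, fn = 40, 10, 45, 5
f1 = f1_score(tp, fp, fn)
kappa = cohen_kappa(tp, fp, tn, fn)
```

With these counts, the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is 0.70; the F1 score is 80/95, or about 0.842.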

T475

63298-63390

Sentence

denotes

The bootstrapping method was used to calculate the empirical distribution of each indicator.

T476

63391-63584

Sentence

denotes

The detailed calculation process was as follows: we conducted random sampling with replacement to generate 1000 new test data sets with the same number of samples as the original test data set.

T477

63585-63658

Sentence

denotes

The evaluation indicators were calculated to determine the distributions.

T478

63659-63732

Sentence

denotes

The results were displayed in boxplots (Fig. 5 and Supplementary Fig. 2).
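The bootstrapping procedure can be sketched as below: resample the test set with replacement 1000 times, keep the original sample count, and recompute the indicator on each resample. Accuracy is used only as a simple stand-in indicator, and the labels, predictions, seed, and function names are hypothetical.

```python
import random

def bootstrap_distribution(y_true, y_pred, metric, n_resamples=1000, seed=0):
    """Empirical distribution of `metric` over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Sample n indices with replacement from the original test set.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    return values

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical reference labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

dist = sorted(bootstrap_distribution(y_true, y_pred, accuracy))
# Approximate quartiles, i.e. the values a boxplot would summarize.
q1, median, q3 = dist[250], dist[500], dist[750]
```

Indicators with a denominator that can become zero in a resample (for example, sensitivity when no positive case is drawn) need an explicit guard, which this sketch avoids by using accuracy.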

T479

63733-63790

Sentence

denotes

Statistical indices to evaluate the regression framework.

T480

63791-63938

Sentence

denotes

Multiple evaluation indicators (MSE, RMSE, MAE, R2, and PCC) were computed for a comprehensive and accurate assessment of the regression framework.

T481

63939-64021

Sentence

denotes

The MSE was used to calculate the deviation between the predicted and true values.

T482

64022-64069

Sentence

denotes

The RMSE was the square root of the MSE result.

T483

64070-64131

Sentence

denotes

The two indicators show the accuracy of the model prediction.

T484

64132-64206

Sentence

denotes

The R2 was used to assess the goodness-of-fit of the regression framework.

T485

64207-64298

Sentence

denotes

The PCC (r) was used to assess the correlation between two variables in the regression framework.

T486

64299-64393

Sentence

denotes

The indicators were calculated using the open-source tools scikit-learn and the scipy library.
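For reference, the five regression indicators can be written out in plain Python as below. This is an illustrative sketch with hypothetical values, not the authors' code; in scikit-learn and scipy the corresponding functions are `mean_squared_error`, `mean_absolute_error`, and `r2_score` in `sklearn.metrics` and `scipy.stats.pearsonr`.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, R2, and the Pearson correlation (PCC)."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = 1.0 - (mse * n) / ss_tot                     # 1 - SS_res / SS_tot
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    pcc = cov / math.sqrt(ss_tot * ss_pred)
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "pcc": pcc}

# Hypothetical true and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
```

Note that R2 measures how much of the variance the predictions explain, whereas the PCC measures only linear association, so the two can differ for the same predictions.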

T487

64395-64420

Sentence

denotes

Supplementary information

T488

64422-64438

Sentence

denotes

Peer Review File

T489

64439-64464

Sentence

denotes

Supplementary Information

T490

64466-64601

Sentence

denotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

T491

64603-64628

Sentence

denotes

Supplementary information

T492

64629-64713

Sentence

denotes

Supplementary information is available for this paper at 10.1038/s42003-020-01535-7.

T493

64715-64731

Sentence

denotes

Acknowledgements

T494

64732-64838

Sentence

denotes

We would like to thank the Ministry of Science and Technology of the People’s Republic of China (Grant No.

T495

64839-64918

Sentence

denotes

2017YFB1400100) and the National Natural Science Foundation of China (Grant No.

T496

64919-64947

Sentence

denotes

61876059) for their support.

T497

64949-64969

Sentence

denotes

Author contributions

T498

64970-65041

Sentence

denotes

S.L. and Y.G. contributed significantly to the conception of the study.

T499

65042-65096

Sentence

denotes

S.L. designed the network and conducted the experiments.

T500

65097-65167

Sentence

denotes

S.L. and Y.G. provided, marked, and analyzed the experimental results.

T501

65168-65249

Sentence

denotes

H.L. contributed valuable discussions and analyzed the experimental results.

T502

65250-65365

Sentence

denotes

Y.G. supported and supervised the work and contributed valuable scientific advice as the corresponding author.

T503

65366-65466

Sentence

denotes

X.G. collected the medical image data from Youan Hospital and contributed valuable discussions.

T504

65467-65538

Sentence

denotes

H.L. and L.L. provided analysis and interpretation of the medical data.

T505

65539-65612

Sentence

denotes

Z.W., M.L., and L.T. contributed valuable discussions and revisions.

T506

65613-65664

Sentence

denotes

All authors contributed to writing this manuscript.

T507

65666-65683

Sentence

denotes

Data availability

T508

65684-65895

Sentence

denotes

The data sets used in this study (named Hybrid Datasets) are composed of public data sets from four public data repositories and a hospital data set provided by the cooperative hospital (Beijing Youan Hospital).

T509

65896-66129

Sentence

denotes

The four public data repositories are Covid-ChestXray-Dataset (CCD), Rsna-pneumonia-detection-challenge (RSNA), Lung Nodule Analysis 2016 (LUNA16), and Images of COVID-19 positive and negative pneumonia patients (ICNP).

T510

66130-66221

Sentence

denotes

Full data of the Hybrid Datasets are available at Figshare (10.6084/m9.figshare.13235009).

T511

66223-66240

Sentence

denotes

Code availability

T512

66241-66314

Sentence

denotes

We used standard software packages as described in the “Methods” section.

T513

66315-66463

Sentence

denotes

The implementation details of the proposed framework can be downloaded from https://github.com/SHERLOCKLS/Detection-of-COVID-19-from-medical-images.

T514

66465-66484

Sentence

denotes

Competing interests

T515

66485-66528

Sentence

denotes

The authors declare no competing interests.

PMC:7782580 JSON TXT 9 Projects
