Introduction

Coronavirus disease 2019 (COVID-19), a highly infectious disease with a basic reproductive number (R0) of 5.7 (as reported by the US Centers for Disease Control and Prevention), is caused by the most recently discovered coronavirus1 and was declared a global pandemic by the World Health Organization (WHO) on March 11, 20202. It poses a serious threat to human health worldwide and has caused substantial economic losses in all countries. As of 7 September 2020, 27,032,617 confirmed cases of COVID-19 and 881,464 deaths had been reported, according to WHO statistics3. Wall Street banks have estimated that the COVID-19 pandemic may cost the global economy $5.5 trillion over the next 2 years4. The WHO recommends real-time reverse transcriptase-polymerase chain reaction (rRT-PCR) for laboratory confirmation of the COVID-19 virus in respiratory specimens, preferably obtained by nasopharyngeal swabs5. Laboratories performing diagnostic testing for COVID-19 should strictly comply with the WHO biosafety guidance for COVID-196. Standard operating procedures (SOPs) for specimen collection, storage, packaging, and transport must also be followed, because the specimens should be regarded as potentially infectious, and the testing process can only be performed in a Biosafety Level 3 (BSL-3) laboratory7. Not all cities worldwide have medical facilities adequate to follow the WHO biosafety guidelines. According to an early report (Feb 17, 2020), the sensitivity of rRT-PCR analysis of nasopharyngeal swab specimens for detecting COVID-19 is only around 30–60%, owing to irregularities in the collection and transportation of specimens8. Later studies reported higher sensitivities, ranging from 71% (Feb 19, 2020) to 91% (Mar 27, 2020)9,10.
A recent systematic review reported that the sensitivity of the PCR test for COVID-19 might be in the range of 71–98% (Apr 21, 2020), whereas the specificity of rRT-PCR analysis for detecting COVID-19 is about 95%11. Yang et al.8 found that some patients with mild disease were eventually diagnosed with COVID-19 even though no viral ribonucleic acid (RNA) had been detected by rRT-PCR in their first three, or in all, nasopharyngeal swab specimens (Feb 17, 2020). Accordingly, the WHO has stated that one or more negative results do not rule out the possibility of COVID-19 infection12. Auxiliary tests with higher sensitivity to COVID-19 are therefore urgently needed. The clinical symptoms associated with COVID-19 include fever, dry cough, dyspnea, and pneumonia, as described in the guideline released by the WHO13, which also recommends using the WHO case definitions for influenza-like illness (ILI) and severe acute respiratory infection (SARI) to monitor COVID-1913. As reported by the China-WHO COVID-19 joint investigation group (February 28, 2020)14, autopsies showed the presence of lung infection in COVID-19 victims. Medical imaging of the lungs might therefore serve as an auxiliary diagnostic test for COVID-19, because it relies on widely available medical technology and routine clinical examinations. Chest radiography (CXR) and chest computed tomography (CT) are the most common imaging examinations of the lungs and are available in most hospitals worldwide15. Different tissues of the body absorb X-rays to different degrees16, producing grayscale images in which anomalies can be detected from differences in contrast. CT differs from conventional CXR in offering superior tissue contrast, with different shades of gray (about 32–64 levels)17. CT images are digitally processed18 to create a three-dimensional image of the body. However, CT examinations are more expensive than CXR examinations19.
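Why a negative rRT-PCR result cannot rule out infection can be made concrete with Bayes' theorem. The sketch below is illustrative only: the pre-test probability (prevalence) is a value chosen for illustration, the function names are hypothetical, and for repeated tests it assumes that results are conditionally independent given infection status, an assumption that repeated swabs from the same patient are unlikely to satisfy in practice.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of infected patients whom the test detects: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of uninfected patients whom the test correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)


def prob_infected_after_negatives(prevalence: float, sens: float, spec: float, k: int = 1) -> float:
    """P(infected | k negative results) via Bayes' theorem.

    Hypothetical assumption: repeated results are conditionally independent
    given infection status.
    """
    missed = prevalence * (1.0 - sens) ** k    # infected, yet all k tests negative
    cleared = (1.0 - prevalence) * spec ** k   # not infected, all k tests negative
    return missed / (missed + cleared)


if __name__ == "__main__":
    for k in (1, 2, 3):
        p = prob_infected_after_negatives(prevalence=0.30, sens=0.71, spec=0.95, k=k)
        print(f"{k} negative result(s): P(infected) = {p:.3f}")
```

With a hypothetical pre-test probability of 30%, a sensitivity of 71%, and a specificity of 95% (the lower ends of the ranges quoted above), the probability of infection after a single negative result is still about 0.116 under these assumptions, which is why further auxiliary evidence is valuable.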
Recent studies reported that using CXR and CT images improved the diagnostic sensitivity for COVID-1920,21. However, the interpretation of medical images is time-consuming, labor-intensive, and often subjective. Experts first annotate the images to generate a report of the radiographic findings; these findings are then analyzed together with clinical factors to reach a diagnosis15. During the current pandemic, frontline expert physicians face a massive workload and a lack of time, which increases the physical and psychological burden on staff and might reduce diagnostic efficiency. Because modern hospitals have advanced digital imaging technology, medical image processing methods have the potential to diagnose COVID-19 quickly and accurately and thereby reduce the burden on experts. Deep learning (DL) methods, especially convolutional neural networks (CNNs), are effective approaches to representation learning using multilayer neural networks22 and have delivered excellent performance on many problems in image classification23,24, object detection25, games and decision-making26, and natural language processing27. A deep residual network28 is a CNN architecture that uses skip connections to avoid the degradation problem, in which accuracy saturates and then drops as more layers are added. However, the application of DL to clinical diagnosis remains limited because DL models lack interpretability and clinical data are multi-modal. Some studies have demonstrated excellent performance of DL methods for detecting lung cancer in CT images29, pneumonia in CXR images30, and diabetic retinopathy in retinal fundus photographs31. To the best of our knowledge, however, these DL methods have been validated only on single-modality data, and no correlation analysis with clinical indicators was performed.
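The skip-connection idea behind residual networks can be illustrated in a few lines. This is a simplified sketch, not the ResBlock-A or ResBlock-B units of this study: it uses NumPy with dense (matrix) layers in place of convolutions and omits batch normalization, and the function names are hypothetical. The block computes relu(F(x) + x), so when the residual branch F is driven to zero, the block passes a non-negative input through unchanged; a deeper stack can therefore always represent a shallower one, which is how skip connections counter degradation.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Simplified residual block: relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x).

    Dense layers stand in for the convolutions of a real ResNet block,
    and batch normalization is omitted.
    """
    f = w2 @ relu(w1 @ x)   # residual branch F(x)
    return relu(f + x)      # skip connection adds the input back before the activation


if __name__ == "__main__":
    x = np.array([1.0, 0.5, 3.0])   # non-negative, like the output of a previous ReLU
    zero = np.zeros((3, 3))
    deep = x
    for _ in range(50):             # 50 stacked blocks whose residual branches are zero
        deep = residual_block(deep, zero, zero)
    print(np.allclose(deep, x))     # the deep stack still reproduces its input
```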
Traditional machine learning methods are more constrained than DL methods but are better suited to specific, practical, feature-based computing tasks32. As demonstrated by Jin et al., traditional machine learning pipelines using the scale-invariant feature transform (SIFT)33 and random sample consensus (RANSAC)34 may outperform state-of-the-art DL methods for image matching35. We designed a general end-to-end DL framework for extracting information from CXR images (X-data) and CT images (CT-data) that can be regarded as a cross-domain transfer learning model. In this study, we developed a custom platform for rapid expert annotation and proposed a modular, CNN-based, multi-stage framework (comprising a classification framework and a regression framework) built from basic component units and special component units. The framework provides an auxiliary examination method for high-precision, automated detection of COVID-19. This study makes the following contributions: First, a multi-stage CNN-based classification framework consisting of two basic units (ResBlock-A and ResBlock-B) and a special unit (control gate block) was established for use with multi-modal images (X-data and CT-data). The classification results were compared with evaluations by experts with different levels of experience. Different optimization goals were set for the different stages of the framework to obtain good performance, which was evaluated using multiple statistical indicators. Second, principal component analysis (PCA) was used to characterize the X-data and CT-data of the different categories (normal, COVID-19, and influenza), and gradient-weighted class activation mapping (Grad-CAM) was used to visualize the salient features in the images and extract the lesion areas associated with COVID-19.
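The two analysis tools named in the second contribution can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypothetical `pca_project` flattens each image and obtains principal components from a singular value decomposition, and the hypothetical `grad_cam` assumes that the feature maps of a chosen convolutional layer, and the gradients of the target class score with respect to those maps, have already been computed by a DL framework (obtaining them is outside this sketch).

```python
import numpy as np


def pca_project(images: np.ndarray, n_components: int = 2):
    """Project flattened images onto their leading principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each retained component.
    """
    x = images.reshape(len(images), -1).astype(float)  # one row per image
    x = x - x.mean(axis=0)                              # center each pixel feature
    _, s, vt = np.linalg.svd(x, full_matrices=False)    # rows of vt are principal axes
    explained = s ** 2 / np.sum(s ** 2)
    return x @ vt[:n_components].T, explained[:n_components]


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heat map from one layer's feature maps A^k and dScore/dA^k.

    Both inputs have shape (K, H, W) for a single image and target class.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k, shape (H, W)
    cam = np.maximum(cam, 0.0)                        # keep regions that raise the class score
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # normalize to [0, 1] for display


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = rng.normal(size=(12, 32, 32))  # stand-in for 12 preprocessed images
    coords, ratio = pca_project(imgs)
    print(coords.shape, ratio)
```

In practice, the Grad-CAM map is upsampled to the input resolution and overlaid on the CXR or CT image so that candidate lesion areas can be inspected.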
Third, data preprocessing methods, including pseudo-coloring and dimension normalization, were developed to improve the interpretability of the medical images and to adapt the proposed framework to multi-modal images (X-data and CT-data). Fourth, a knowledge distillation method was adopted as a training strategy to achieve high performance with low computational requirements and thereby improve the usability of the method. Last, the CNN-based regression framework was used to describe the relationships between the radiographic findings and the clinical symptoms of the patients, and multiple evaluation indicators were used to assess the correlations between the radiographic findings and the clinical indicators.
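Knowledge distillation is commonly implemented with the loss of Hinton et al. (2015): a compact student network is trained to match the temperature-softened class probabilities of a larger teacher while also fitting the ground-truth labels. The NumPy sketch below shows that standard formulation for the three categories used here (normal, COVID-19, and influenza). The temperature, the weight `alpha`, and the function names are illustrative assumptions; the specific distillation settings of this study are not given in this section.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax of temperature-scaled logits (last axis)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, labels).

    Logits have shape (batch, classes); labels are integer class indices.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    # Soft-target term: KL divergence between the softened distributions, scaled
    # by T^2 so its gradient magnitude stays comparable to the hard-label term.
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1).mean()
    soft = kl * temperature ** 2
    # Hard-target term: ordinary cross-entropy of the student at T = 1.
    labels = np.asarray(labels)
    log_p = log_softmax(student_logits)
    hard = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Hypothetical logits for two images over (normal, COVID-19, influenza).
    teacher = np.array([[1.0, 4.0, 0.5], [3.0, 0.2, 1.5]])
    student = np.array([[0.8, 2.5, 0.9], [2.0, 0.5, 1.0]])
    print(distillation_loss(student, teacher, labels=[1, 0]))
```

A higher temperature exposes more of the teacher's information about how similar the wrong classes are (for example, COVID-19 versus influenza), which is what lets a smaller student approach the teacher's accuracy at lower computational cost.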