Result

Simulation and prediction of the COVID-19 disease variations

In this section, the variations of COVID-19 in Guangdong province are simulated and predicted with our SEIRQ model, considering only the input population from the other provinces of China (excluding Hubei province). The simulated period runs from Jan 27, 2020 to Feb 19, 2020. The parameter values and initial values used for the simulation and prediction are provided in Table 1. The performance is evaluated against the data from Feb 20, 2020 to Feb 23, 2020, and R², AE, RE, RMSE, MAPE and DISO are employed to quantify the accuracy. The simulation and prediction results are displayed in Table 2 and Fig. 2.

Table 1 Parameter estimates for COVID-19 in Guangdong province.

| Parameter | Definition | Estimated value | Source |
| β | Transmission incidence rate | 2.45 × 10^−8 | Estimated |
| σ | The fraction of transmission incidence rate for exposed individuals | 0.63 | Estimated |
| α | Disease-induced death rate | 0.00375 | Estimated |
| ν | Transmission rate of exposed individuals to the infected class | 0.183 | [Zhao et al., 2020a], [Zhao et al., 2020b] |
| γ(t) | Recovery rate | 0.008 + 0.19/(1 + e^(5.0126−0.1846t)) | Estimated |
| q1(t) | Quarantined rate of susceptible individuals | 0.28 | Estimated |
| q2(t) | Quarantined rate of exposed individuals | 0.76 | Estimated |
| q3 | Quarantined rate of infected individuals | 0.89 | Estimated |
| A(t) | Input number | 86926 | Data |
| B1 | Output number | 21356 | Data |
| p1 | The fraction of input population into susceptible class | 0.9999927 | Computed |
| p2 | The fraction of input population into exposed class | 0.0000073 | Computed |
| p3 | The fraction of input population into infected class | 0 | Assumed |

| Initial value | Definition | Estimated value | Source |
| N(0) | Initial total population | 113460000 | GSY |
| S(0) | Initial susceptible population | 113346174 | Estimated |
| E(0) | Initial exposed population | 31 | Estimated |
| I(0) | Initial infected population | 19 | Estimated |
| Sq(0) | Initial quarantined susceptible population | 113460 | Estimated |
| Eq(0) | Initial quarantined exposed population | 128 | Data |
| Iq(0) | Initial quarantined infected population | 184 | Data |
| R(0) | Initial recovered population | 4 | Data |

Note: GSY: Guangdong Statistical Yearbook, 2019.

Table 2 Evaluation results of the simulation and prediction in Guangdong province. R², AE, MAPE and DISO refer to the simulation; the RE columns refer to the prediction on the given dates (day/month).

| Different cases | R² | AE | MAPE (%) | DISO | RE (%) 20/2 | RE (%) 21/2 | RE (%) 22/2 | RE (%) 23/2 |
| Cumulative confirmed cases | 0.9973 | −5.33 | 2.54 | 0.06 | −0.38 | −0.45 | −0.37 | −0.37 |
| Confirmed cases | 0.9898 | −2.63 | 3.86 | 0.11 | 2.68 | 1.51 | 0.81 | 7.07 |
| Recovered cases | 0.9934 | −3.38 | 43.32 | 0.17 | −2.09 | −1.38 | −3.75 | −10.41 |

Figure 2 Simulation and prediction of the COVID-19 in Guangdong province. (A) cumulative confirmed cases; (B) daily new confirmed cases and (C) difference of increased confirmed cases. The initial values and parameters are S(0) = 113346174, E(0) = 31, I(0) = 19, R(0) = 4, Sq(0) = 113460, Eq(0) = 128, Iq(0) = 184, A = 86926, B = 21356, p1 = 0.9999927, p2 = 0.0000073, p3 = 0, q1 = 0.28, q2 = 0.76, q3 = 0.89, α = 0.00375, β = 2.45 × 10^−8, ν = 0.183, σ = 0.63, γ(t) = 0.008 + 0.19/(1 + e^(5.0126−0.1846t)).

Our model is able to simulate and predict the COVID-19 variations with high accuracy (Table 2 and Fig. 2). In particular, the coefficients of determination R² of the cumulative confirmed cases, confirmed cases and recovered cases are as high as 0.9973, 0.9898 and 0.9934, respectively (Table 2). The AE values are small: −5.33, −2.63 and −3.38 for the cumulative confirmed cases, confirmed cases and recovered cases. The overall accuracy of the model is measured by DISO, with values of 0.06, 0.11 and 0.17 for the cumulative confirmed cases, confirmed cases and recovered cases. For the validation on Feb 20, Feb 21, Feb 22 and Feb 23, 2020, the small RE values of the cumulative confirmed cases, confirmed cases and recovered cases (at most 10.41% in absolute value) indicate that the model also predicts accurately and can be employed to predict the future variations of the COVID-19 disease (Table 2).
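The recovery rate γ(t) from Table 1 and the accuracy measures named above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: only γ(t) is taken directly from the text, while the metric definitions are common textbook forms that are assumed here (R² as 1 − SS_res/SS_tot, AE as the mean signed error, RE and MAPE in percent, all with the sign convention simulated minus observed). DISO is not implemented because its definition is not given in this section.

```python
import math

def gamma(t):
    """Recovery rate from Table 1: 0.008 + 0.19 / (1 + exp(5.0126 - 0.1846 t))."""
    return 0.008 + 0.19 / (1.0 + math.exp(5.0126 - 0.1846 * t))

def r_squared(obs, sim):
    """Coefficient of determination, assumed form 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mean_error(obs, sim):
    """AE, assumed here to be the mean signed error (simulated minus observed)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)

def relative_error(obs_value, sim_value):
    """RE in percent for a single day (assumed sign: simulated minus observed)."""
    return 100.0 * (sim_value - obs_value) / obs_value

def mape(obs, sim):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((s - o) / o) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root-mean-square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))
```

With these definitions, γ(t) rises from about 0.009 at t = 0 towards its upper limit of 0.198, passing the midpoint 0.103 at t = 5.0126/0.1846 (about day 27).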
Moreover, the largest number of cumulative confirmed cases is 1397 on May 7, 2020, which indicates that the COVID-19 disease will become extinct after 102 days in Guangdong province (Fig. 2A, STable 1). The simulated peak time of daily new confirmed cases is Feb 1, 2020, which agrees closely with the reported time of Jan 31, 2020 (Fig. 2B). For the confirmed cases, the model reproduces both the peak value and its time, with a simulated value of 1002 on Feb 10, 2020 against a reported value of 1007 on Feb 9, 2020 (Fig. 2C). The number of recovered cases will reach about 1400, which is consistent with the future changes of the cumulative confirmed cases (Fig. 2D).

To further explore the forecasting accuracy of our model, we compared the forecasting result with the observed data over a period prolonged by 11 days, from Feb 24, 2020 to Mar 4, 2020. The absolute values of RE (relative error) of the cumulative confirmed cases are smaller than 1% on all days except Feb 24 (−2.30%) (Table 3). The corresponding figures also show that our model can capture the temporal variations over a relatively long period (see SFigure 1 in the supplementary information).

Table 3 Evaluation results of the prediction in Guangdong province. All entries are RE (%); dates are day/month.

| RE (%) | 24/2 | 25/2 | 26/2 | 27/2 | 28/2 | 29/2 | 1/3 | 2/3 | 3/3 | 4/3 |
| Cumulative confirmed cases | −2.30 | −0.41 | 0.12 | 0.20 | 0.25 | 0.37 | 0.40 | 0.49 | 0.58 | 0.66 |
| Confirmed cases | −14.98 | −19.21 | −24.22 | −26.74 | −27.64 | −30.81 | −36.19 | −35.94 | −33.52 | −34.68 |
| Recovered cases | 9.60 | 11.35 | 13.09 | 12.57 | 10.88 | 10.67 | 11.35 | 9.35 | 7.08 | 6.31 |

Effects of input population at different scenarios

The input population variations comprise changes in the percentage p2 of exposed individuals and changes in the number A of the input population, both of which affect the peak value of the cumulative confirmed cases and the disease extinction time (Figure 3, Figure 4). For the first time point t1 = 10 (i.e. Feb 6, 2020), the days of disease extinction (DDE) are shortened to 78 days (i.e. Apr 13, 2020) and 69 days (i.e. Apr 4, 2020) in Sce 1: (p2, A) = (p2*, 1.5A*) and Sce 2: (p2, A) = (p2*, 2A*), and the maximum values of the cumulative confirmed cases (MVCCC) are 1396 and 1397 [Fig. 3A, Supplementary table 1 (STable 1)]. For the confirmed cases, the peak values, at 1003, are close to the baseline value, and the corresponding times are the same as in the baseline (STable 1). Moreover, the confirmed cases of Sce 1 and Sce 2 follow the same variations as the baseline result but with earlier disease extinction, which is consistent with the variations of the cumulative confirmed cases (Figs. 2A and 3A).

For Sce 4, Sce 5, Sce 7 and Sce 8, the DDE are 81 days (i.e. Apr 16, 2020), 59 days (i.e. Mar 25, 2020), 83 days (i.e. Apr 18, 2020) and 73 days (i.e. Apr 8, 2020), respectively, which indicates earlier extinction of COVID-19 than in the baseline (STable 1). The MVCCC of the four scenarios are larger than the baseline result, with the largest value (1448) in Sce 8 (Fig. 3A, STable 1). For the confirmed cases, these scenarios are similar to the baseline results (Fig. 4A, STable 1).

Figure 3 Scenario results of the input population impacting the cumulative confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.

Figure 4 Scenario results of the input population impacting the confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.

For Sce 3: (p2, A) = (1.5p2*, A*) and Sce 6: (p2, A) = (2p2*, A*), the increased percentage of exposed individuals mainly affects the number of cumulative confirmed cases, with values of 1422 and 1447, while the corresponding DDE change only slightly, to 105 days for Sce 3 and 107 days for Sce 6 (Fig. 3A, STable 1).
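The input-population scenarios can be read as a 3 × 3 grid of multipliers applied to the baseline values p2* and A*. The sketch below enumerates that grid; the function name is hypothetical, and the numbering of Sce 4, Sce 5, Sce 7 and Sce 8 is an assumption inferred from the four scenarios that the text defines explicitly (Sce 1, Sce 2, Sce 3 and Sce 6), which fit a row-major ordering with the baseline at index 0.

```python
from itertools import product

P2_STAR = 0.0000073  # baseline fraction of input population entering the exposed class (Table 1)
A_STAR = 86926       # baseline input number (Table 1)
MULTIPLIERS = (1.0, 1.5, 2.0)

def input_scenarios(p2_star=P2_STAR, a_star=A_STAR):
    """Return {index: (p2, A)}; index 0 is the baseline, 1..8 are Sce 1..Sce 8.

    The ordering (p2 multiplier varies slowest) is inferred from the scenarios
    stated in the text and is an assumption for the remaining ones.
    """
    return {
        idx: (m_p2 * p2_star, m_a * a_star)
        for idx, (m_p2, m_a) in enumerate(product(MULTIPLIERS, MULTIPLIERS))
    }

scenarios = input_scenarios()
for idx, (p2, a) in scenarios.items():
    label = "baseline" if idx == 0 else f"Sce {idx}"
    print(f"{label}: p2 = {p2:.7f}, A = {a:.0f}")
```

Under this ordering, Sce 1 and Sce 2 change only A, Sce 3 and Sce 6 change only p2, and the remaining four scenarios change both.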
For the confirmed cases, Sce 3 and Sce 6 vary very similarly to the baseline result in both the peak value and the peak time (Fig. 4A, STable 1). For the other three time points t1 = 20, t1 = 28 and t1 = 38, the differences among the scenario results are similar to those at t1 = 10. Moreover, for each scenario, the changes in the input population have nearly the same impact on the disease variations at all four time points, which shows that applying the same input population strategy at different time points makes no significant difference to the disease. From the above analysis, it can be concluded that an increased input population mainly shortens the disease extinction time, and an increased percentage of exposed individuals in the input population raises the number of cumulative confirmed cases by a small percentage. Neither the increased input population nor the increased share of exposed individuals affects the peak values or peak times of the confirmed cases.

Effects of quarantine rates at different scenarios

In this section, the effects of quarantine rates in six scenarios on the COVID-19 variations are displayed in Figure 5 and Figure 6. For the first time point t1 = 10 (Feb 6, 2020), Sce 1: (q1, q2) = (0q1*, 0q2*) has a strongly negative impact on the COVID-19 variations: the disease breaks out again, which suggests that the quarantine strategy of Sce 1 carries very high risk (Fig. 5, Fig. 6A). Specifically, the confirmed cases reach their first peak on Feb 10, 2020, as in the baseline result, and then decrease to about 97 on Mar 14, 2020. A sharp increase then leads to a second peak of the confirmed cases, with a value of 1016704 at 165 days (Fig. 6A). The disease becomes extinct after 361 days, with the MVCCC rising dramatically to more than 9 million (Fig. 5A and STable 2).
Sce 2: (q1, q2) = (0q1*, 0.5q2*) and Sce 3: (q1, q2) = (0q1*, q2*) have similar impacts on the disease variations, with largest cumulative confirmed values of 1444 at 110 days (i.e. May 15, 2020) and 1416 at 105 days (i.e. May 10, 2020), respectively. The DDE and MVCCC of Sce 4: (q1, q2) = (0.5q1*, 0.5q2*), Sce 5: (q1, q2) = (0.5q1*, q2*) and Sce 6: (q1, q2) = (q1*, 0.5q2*) agree with the baseline results (STable 2). These three scenarios have only a very weak influence on the confirmed case variations compared with the baseline result (Fig. 6A, STable 2).

Figure 5 Scenario results of quarantine rates impacting the cumulative confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.

Figure 6 Scenario results of quarantine rates impacting the confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.

For the other three time points, Sce 1: (q1, q2) = (0q1*, 0q2*) increases the MVCCC and prolongs the DDE, with values of 1430 at 123 days (i.e. May 28, 2020), 1416 at 115 days (i.e. May 20, 2020) and 1409 at 112 days (i.e. May 17, 2020) (STable 2). The disease variations of the other scenarios agree with the baseline results, which indicates the weak impact of these scenarios (Fig. 5A, STable 2). Moreover, we also found that the second outbreak of the disease appears when both q1 and q2 are close to zero, for example (q1, q2) = (0.01q1*, 0.01q2*) and (0q1*, 0.05q2*) at t1 = 10, and (q1, q2) = (0q1*, 0q2*) at t1 = 11 (Fig. 7, STable 3).
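One way to read the quarantine scenarios is as a step change in q1(t) and q2(t): baseline rates before the time point t1 and scaled rates from t1 onward. The sketch below encodes the six multiplier pairs listed in the text under that reading. The step-change form and the function name are assumptions made for illustration, since this section does not state exactly how the scenario rates enter the model.

```python
Q1_STAR = 0.28  # baseline quarantine rate of susceptible individuals (Table 1)
Q2_STAR = 0.76  # baseline quarantine rate of exposed individuals (Table 1)

# (q1 multiplier, q2 multiplier) for Sce 1..Sce 6, as listed in the text.
QUARANTINE_SCENARIOS = {
    1: (0.0, 0.0),
    2: (0.0, 0.5),
    3: (0.0, 1.0),
    4: (0.5, 0.5),
    5: (0.5, 1.0),
    6: (1.0, 0.5),
}

def quarantine_rates(t, scenario, t1):
    """Return (q1, q2) on day t.

    Assumed behaviour: baseline rates for t < t1, scenario-scaled rates for t >= t1.
    """
    if t < t1:
        return Q1_STAR, Q2_STAR
    m1, m2 = QUARANTINE_SCENARIOS[scenario]
    return m1 * Q1_STAR, m2 * Q2_STAR

# Example: Sce 1 switched on at t1 = 10 removes both quarantine rates from day 10.
print(quarantine_rates(9, scenario=1, t1=10))   # baseline rates
print(quarantine_rates(10, scenario=1, t1=10))  # both rates set to zero
```

A function of this shape could be passed to an ODE right-hand side so that the same solver run covers both the pre-t1 baseline phase and the scenario phase.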
The second outbreaks obtained with near-zero q1 and q2 suggest that no quarantine, or very weak quarantine, of the susceptible and exposed individuals before the peak of the confirmed cases may lead to the disease breaking out again.

Figure 7 Cumulative confirmed COVID-19 cases (A) and confirmed COVID-19 cases (B) at the scenarios of aspect 2 with (q1, q2) = (0.01q1*, 0.01q2*), (0q1*, 0.05q2*) at t1 = 10, and (q1, q2) = (0q1*, 0q2*) at t1 = 11, and the other parameters at the baseline values.

Effects of both input population and quarantine rates at different scenarios

The impacts of both the input population and the quarantine rates on the COVID-19 disease are displayed in Figs. 8 and 9 and STable 3. According to the results in the “Effects of input population at different scenarios” and “Effects of quarantine rates at different scenarios” sections, the second outbreak of the disease is obtained in the scenarios with no or very weak quarantine strategy. Therefore, Figs. 8 and 9 only show the COVID-19 disease variations of the scenarios with a second outbreak; the disease variations in the other scenarios are not shown. STable 4 provides the results of all the scenarios.

Figure 8 Scenario results of both input population and quarantine rates impacting the cumulative confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.

Figure 9 Scenario results of both input population and quarantine rates impacting the cumulative confirmed COVID-19 cases at four time points: (A) t1 = 10, (B) t1 = 20, (C) t1 = 28 and (D) t1 = 38, corresponding to Feb 6, 2020, Feb 16, 2020, Feb 24, 2020 and Mar 5, 2020.
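The correspondence between the time points t1 and calendar dates in the figure captions follows from taking Jan 27, 2020, the first simulated day, as t = 0. A minimal sketch of that conversion is given below; the function name is hypothetical, and the mapping is checked here only against the four time points quoted in the captions.

```python
from datetime import date, timedelta

START = date(2020, 1, 27)  # first simulated day, taken as t = 0

def time_point_to_date(t1, start=START):
    """Convert a time point t1 (days since the start of the simulation) to a date."""
    return start + timedelta(days=t1)

for t1 in (10, 20, 28, 38):
    print(t1, time_point_to_date(t1).strftime("%b %d, %Y"))
```

Because 2020 is a leap year, February has 29 days, which is why t1 = 38 falls on Mar 5 rather than Mar 6.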
For time point t1 = 10, Sce 1: (p2, A, q1, q2) = (1.5p2*, 1.5A*, 0q1*, 0q2*), Sce 2: (p2, A, q1, q2) = (1.5p2*, 2A*, 0q1*, 0q2*), Sce 7: (p2, A, q1, q2) = (2p2*, 1.5A*, 0q1*, 0q2*) and Sce 8: (p2, A, q1, q2) = (2p2*, 2A*, 0q1*, 0q2*) have MVCCC larger than 10 million at 328, 313, 327 and 312 days, respectively (Fig. 8A, STable 3). In fact, they show two outbreaks of the disease: the confirmed cases have a first peak on Feb 10, 2020, as in the baseline result, and second peaks larger than 1 million at 142 days, 132 days, 141 days and 130 days for Sce 1, Sce 2, Sce 7 and Sce 8, respectively (Fig. 9A, STable 3). The magnified figure for the period from Jan 27, 2020 to Apr 26, 2020 clearly displays the second outbreak of the disease (Fig. 9A). Moreover, slight changes to the quarantine rates of the four scenarios, or to the time point around t1 = 10, also result in a second outbreak of the disease. If the control measures of the four scenarios are applied after the other three time points t1 = 20, t1 = 28 and t1 = 38, the MVCCC decrease rapidly but remain larger than the baseline results, and the DDE are prolonged, except for Sce 2 and Sce 8 at t1 = 28 and t1 = 38 (STable 4). For the other scenarios, Sce 3 to Sce 6 and Sce 9 to Sce 12 at the four time points, the DDE become smaller than the baseline result due to the larger input population and the larger number of exposed individuals. Moreover, weaker quarantine rates together with a larger input population result in more infected individuals and increase the MVCCC (STable 4).