Estimation of the final size of the second phase of the coronavirus COVID 19 epidemic by the logistic model
Abstract
In this note, the logistic growth regression model is used to estimate the final size and the peak time of the coronavirus epidemic in China, South Korea, and the rest of the World.
In the previous article [1], we tried to estimate the final size of the epidemic for the whole World using the logistic model and the SIR model. The estimate was about 83 000 cases.
Both models showed that the outbreak was moderating; however, new data showed a linear upward trend. It turns out that the epidemic in China was slowing but was beginning to spread elsewhere in the World.
In this note, we give forecasts of the epidemic size for China, South Korea, and the rest of the World, together with daily predictions, using the logistic model. Full daily reports, including those generated for countries outside of China, are available at https://www.researchgate.net/publication/339912313_Forecasting_of_final_COVID-19_epidemic_size
The MATLAB program fitVirus used for the calculations is freely available from https://www.mathworks.com/matlabcentral/fileexchange/74411-fitvirus
We note that the logistic model gives results similar to those of the SIR model (at least for China and South Korea). However, the logistic model is given by an explicit formula and is thus much simpler for regression analysis than the SIR model, where a system of ordinary differential equations must be solved at each optimization step. (One may, however, use an approximate solution and thus obtain a four-parameter problem, which can be very sensitive to the initial guess.) Yet the logistic model has its drawbacks: as the epidemic approaches its final stage, the actual number of cases may be slightly larger than that predicted by the model. If the actual number of cases begins to exceed the predicted final size systematically, then a second phase of the epidemic is likely to occur, and the model will no longer be applicable.
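The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2020.03.11.20024901
In mathematical epidemiology, when one uses a phenomenological approach, the epidemic dynamics can be described by the following variant of the logistic growth model [2] [3] [4] [5]
$$\frac{dC}{dt} = rC\left(1 - \frac{C}{K}\right) \qquad (1)$$
where $C$ is the accumulated number of cases, $r > 0$ is the infection rate, and $K > 0$ is the final epidemic size. If $C(0) = C_0 > 0$ is the initial number of cases, then the solution of (1) is
$$C = \frac{K}{1 + A e^{-rt}} \qquad (2)$$
where $A = (K - C_0)/C_0$. In the early stage, when $A e^{-rt} \gg 1$ and $K \gg C_0$ (so that $A \approx K/C_0$), the number of cases grows exponentially
$$C \approx C_0 e^{rt} \qquad (3)$$
When $t \to \infty$, the number of cases follows the Weibull function
$$C \approx K\left(1 - A e^{-rt}\right) \qquad (4)$$
The growth rate $dC/dt$ reaches its maximum when $d^2C/dt^2 = 0$. From this condition, we obtain that the growth rate peak occurs at time
$$t_p = \frac{\ln A}{r} \qquad (5)$$
At this time the number of cases is
$$C_p = \frac{K}{2} \qquad (6)$$
and the growth rate is
$$\left(\frac{dC}{dt}\right)_p = \frac{rK}{4} \qquad (7)$$
To answer the question about the doubling time $\Delta t$, i.e., the time it takes to double the number of cases, we solve $2C(t) = C(t + \Delta t)$ for $\Delta t$, which gives
$$\Delta t = \frac{\ln 2}{r} + \frac{1}{r}\ln\frac{A e^{-rt}}{A e^{-rt} - 1} \qquad (8)$$
The first term represents initial exponential growth; then $\Delta t$ increases with $t$. When $t \to t_p$, the number of cases approaches $K/2$ and the doubling time becomes infinite.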
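The relations of this section can be checked numerically. The following is a minimal Python sketch, not part of the fitVirus program; the function names and the parameter values are hypothetical and chosen only for illustration. It evaluates the logistic solution, the peak time, the growth rate, and the doubling time obtained by solving 2C(t) = C(t + Δt).

```python
import math


def logistic_cases(t, K, r, A):
    """Cumulative cases C(t) = K / (1 + A*exp(-r*t)), the logistic solution."""
    return K / (1.0 + A * math.exp(-r * t))


def peak_time(r, A):
    """Time of maximum growth rate, t_p = ln(A) / r."""
    return math.log(A) / r


def growth_rate(t, K, r, A):
    """dC/dt = r*C*(1 - C/K), the right-hand side of the logistic equation."""
    C = logistic_cases(t, K, r, A)
    return r * C * (1.0 - C / K)


def doubling_time(t, r, A):
    """Solve 2*C(t) = C(t + dt) for dt; finite only while C(t) < K/2."""
    x = A * math.exp(-r * t)  # equals (K - C)/C at time t
    if x <= 1.0:
        return math.inf  # C(t) >= K/2 cannot double without exceeding K
    return math.log(2.0 * x / (x - 1.0)) / r


# Hypothetical parameters, chosen only for illustration.
K, r, A = 80000.0, 0.25, 400.0
t_p = peak_time(r, A)
print(t_p)                              # peak time in days
print(logistic_cases(t_p, K, r, A))     # close to K/2
print(growth_rate(t_p, K, r, A))        # close to r*K/4
print(doubling_time(0.0, r, A))         # slightly above ln(2)/r
```

Returning infinity once C(t) reaches K/2 is a design choice of this sketch: past that point the curve can no longer double before saturating at K.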
Now, if $C_1, C_2, \ldots, C_n$ are the numbers of cases at times $t_1, t_2, \ldots, t_n$, then the final size predictions of the epidemic based on these data are $K_1, K_2, \ldots, K_n$. When convergence is achieved, one may try to predict the final epidemic size by the iterated Shanks transformation
$$\tilde{K}_n = \frac{K_{n+1} K_{n-1} - K_n^2}{K_{n+1} - 2K_n + K_{n-1}} \qquad (9)$$
There is no natural law or process behind this transformation; therefore, it must be used with some care. In particular, the calculated limit $\tilde{K}$ is useless if $\tilde{K} < C_n$, i.e., if it is below the current data.
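As an illustration of the Shanks transformation, here is a minimal Python sketch. It is an assumption about how the iteration could be organised, not the implementation used in fitVirus; the function names and the sample sequence are hypothetical.

```python
def shanks(seq):
    """One Shanks pass: maps n values to n - 2 accelerated estimates."""
    out = []
    for prev, cur, nxt in zip(seq, seq[1:], seq[2:]):
        denom = nxt - 2.0 * cur + prev
        if denom == 0.0:
            out.append(nxt)  # second difference vanishes; keep the latest value
        else:
            out.append((nxt * prev - cur * cur) / denom)
    return out


def iterated_shanks(seq):
    """Repeat Shanks passes while at least three values remain.

    Expects a non-empty sequence and returns the last remaining value.
    """
    seq = list(seq)
    while len(seq) >= 3:
        seq = shanks(seq)
    return seq[-1]


def usable_limit(limit, latest_cases):
    """Sanity check from the text: a limit below the current data is useless."""
    return limit >= latest_cases


# Hypothetical daily final-size estimates converging geometrically to 100.
estimates = [90.0, 95.0, 97.5, 98.75, 99.375]
limit = iterated_shanks(estimates)
print(limit, usable_limit(limit, 99.0))
```

For a sequence whose error shrinks by a constant factor, as in this made-up example, a single Shanks pass already recovers the limit, which is why the transformation is attractive once the daily estimates start to converge.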
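The logistic model (2) contains three parameters: K, r, and A, which should be determined by regression analysis. Because the model is nonlinear, some care should be taken in choosing the initial guess. First of all, in the early stage the logistic curve follows the exponential growth curve (3), so the estimation of K is practically impossible. With enough data, the initial guess can be obtained in the following way. Expressing t from (2) and using three equidistant data points, $t_k - t_{k-m} = t_{k-m} - t_{k-2m} = m$, yields the following system of three equations:
$$t_{k-2m} = \frac{1}{r}\ln\frac{A\,C_{k-2m}}{K - C_{k-2m}}, \qquad t_{k-m} = \frac{1}{r}\ln\frac{A\,C_{k-m}}{K - C_{k-m}}, \qquad t_k = \frac{1}{r}\ln\frac{A\,C_k}{K - C_k}$$
This system has a solution [7]
$$K = \frac{C_{k-m}\left(C_{k-2m}C_{k-m} - 2C_{k-2m}C_k + C_{k-m}C_k\right)}{C_{k-m}^2 - C_k C_{k-2m}} \qquad (10)$$
$$r = \frac{1}{m}\ln\frac{C_k\left(C_{k-m} - C_{k-2m}\right)}{C_{k-2m}\left(C_k - C_{k-m}\right)} \qquad (11)$$
$$A = \frac{\left(C_k - C_{k-m}\right)\left(C_{k-m} - C_{k-2m}\right)}{C_{k-m}^2 - C_k C_{k-2m}}\left[\frac{C_k\left(C_{k-m} - C_{k-2m}\right)}{C_{k-2m}\left(C_k - C_{k-m}\right)}\right]^{t_{k-m}/m} \qquad (12)$$
The solution is acceptable when all the unknowns are positive.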
Formulas (10), (11), and (12) are used to calculate the initial approximation in the fitVirus program. For practical calculation, we take the first, the middle, and the last data point. If this calculation fails, we consider the regression analysis questionable. The final values of the parameters K, r, and A are then calculated by a least-squares fit using the MATLAB functions lsqcurvefit and fitnlm.
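The initial-guess step can be sketched in Python as follows. This is an illustration obtained by solving the logistic solution (2) for three equidistant points, not the fitVirus code; the function name, its argument order, and the synthetic data are assumptions made for the example.

```python
import math


def initial_guess(t_mid, m, c1, c2, c3):
    """Closed-form K, r, A from three equidistant data points.

    c1, c2, c3 are cumulative cases at times t_mid - m, t_mid, t_mid + m.
    Returns None when the solution is not acceptable, i.e. when it is
    undefined or some unknown is not positive.
    """
    denom = c2 * c2 - c1 * c3
    if denom == 0 or c1 == 0 or c3 == c2:
        return None
    ratio = c3 * (c2 - c1) / (c1 * (c3 - c2))
    if ratio <= 0:
        return None
    K = c2 * (c1 * c2 - 2.0 * c1 * c3 + c2 * c3) / denom
    r = math.log(ratio) / m
    A = (c3 - c2) * (c2 - c1) / denom * ratio ** (t_mid / m)
    if K <= 0 or r <= 0 or A <= 0:
        return None
    return K, r, A


# Synthetic daily data from a known logistic curve (hypothetical parameters).
days = list(range(11))
cases = [1000.0 / (1.0 + 50.0 * math.exp(-0.6 * t)) for t in days]
mid = len(days) // 2
m = days[mid] - days[0]  # equal spacing requires an odd number of points
print(initial_guess(days[mid], m, cases[0], cases[mid], cases[-1]))
# approximately (1000.0, 0.6, 50.0)
```

Taking the first, the middle, and the last point, as the text describes, uses the widest available spacing. Purely exponential data, for example 10, 20, 40, make the denominator vanish, and the function then returns None, in line with the remark that K cannot be estimated in the early stage.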
Before we proceed, we introduce, for convenience, the epidemic phases shown in Fig 2.
The duration of the fast-growing period is equal to $4/r$. We note that the names of the phases are not standard and are arbitrarily chosen.
Based on the available data, the logistic model predicts that the final size of the coronavirus epidemic in China will be approximately 81 000 ± 500 cases (Table 1) and that the epidemic peaked on 8 Feb 2020 (Table 2). The epidemic in China appears to be in its ending stage (Fig 3, Fig 4).
The short-term forecast is given in Table 3, where we see that the discrepancy between the actual and the forecasted number of cases is within 2%. However, the actual and predicted daily new cases are scattered, with discrepancies ranging from 13% to 300%. On 7 Mar 2020, the actual number of cases was 80 695, and the daily number of cases was 44. The prediction in Table 3 is 80 588 cumulative cases and 39 daily cases. The errors are 0.1% and 11%, respectively.
Based on the available data, the logistic model predicts that the final size of the coronavirus epidemic in South Korea will be approximately 8050 ± 70 cases (Fig 5, Table 4) and that the epidemic peaked on 1 Mar 2020. The epidemic in South Korea appears to be in the steady-state transition phase. These figures were already predicted on 4 Mar 2020 (Table 5), i.e., the prediction was approximately 7500 to 8500 cases, with the peak around 2 Mar.
On 7 Mar 2020, the actual number of cases was 7134, and the daily number of cases was 367. The prediction in Table 5 is 6572 cumulative cases and 259 daily cases. The errors are 8% and 30%, respectively.
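The quoted forecast errors can be reproduced with a few lines of Python. The sketch below assumes that the error is the absolute difference relative to the actual value; the function name is hypothetical, and the numbers are the 7 Mar 2020 figures quoted in the text.

```python
def relative_error_percent(actual, predicted):
    """Absolute forecast error as a percentage of the actual value."""
    return 100.0 * abs(actual - predicted) / actual


# (actual, predicted) pairs for 7 Mar 2020 as quoted in the text.
checks = {
    "China, cumulative": (80695, 80588),
    "China, daily": (44, 39),
    "South Korea, cumulative": (7134, 6572),
    "South Korea, daily": (367, 259),
}
for label, (actual, predicted) in checks.items():
    print(f"{label}: {relative_error_percent(actual, predicted):.1f}%")
```

Under this definition, the results round to the 0.1%, 11%, 8%, and 30% reported above.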
The comparison of the predicted final sizes is shown in Fig 6. Based on the data from 11 Mar 2020, a very rough estimate indicates that the number of cases will be about 90 000 (Fig 7); data from 13 Mar 2020 raise this number to 380 000.
However, the epidemic is at an early stage, so these estimates are very questionable and will change with new data.
Based on the available data, one can predict that the final size of the coronavirus epidemic in China will be around 81 000 cases. For South Korea, the prediction is about 8000 cases. For the rest of the World, the forecast is still very unreliable and is now approximately 380 000 cases.
We emphasize that the logistic model is a phenomenological, data-driven model. Thus, its forecasts are only as reliable as the data are good and as the model is able to capture the dynamics of the epidemic. As the daily forecasts of the epidemic size begin to converge, it can be said that the outbreak is under control. However, any systematic deviation from the forecast curve may indicate that the epidemic is escaping control. An example is China.
Until 25 Feb, the data followed a logistic curve and then began to deviate from it. We now know that this was the beginning of the second stage of the epidemic, which is now spreading around the World. A similar linear trend can now be observed for South Korea (Fig 5); we hope this does not mark the beginning of the second stage of the epidemic in this country.