Results

Our sample starts from January 19, when the first COVID-19 case was reported outside Wuhan. The sample spans 6 weeks in total and ends on February 29. We divide the whole sample into two sub-samples (January 19 to February 1, and February 2 to February 29) and estimate the model using the whole sample and two sub-samples, respectively. In the first 2 weeks, COVID-19 infections quickly spread throughout China with every province reporting at least one confirmed case, and the number of cases also increased at an increasing speed (Fig. 2). It is also during these 2 weeks that the Chinese government took actions swiftly to curtail the virus transmission. On January 20, COVID-19 was classified as a class B statutory infectious disease and treated as a class A statutory infectious disease. The city of Wuhan was placed under lockdown on January 23; roads were closed, and residents were not allowed to leave the city. Many other cities also imposed public policies ranging from canceling public events and stopping public transportation to limiting how often residents could leave home. By comparing the dynamics of virus transmissions in these two sub-samples, we can infer the effectiveness of these public health measures.

In this section, we will mostly rely on model A to interpret the results, which estimates the effects of the average number of new cases in the preceding first and second week, respectively, and therefore enables us to examine the transmission dynamics at different time lags. As a robustness check, we also consider a simpler lag structure to describe the transmission dynamics. In model B, we estimate the effects of the average number of new cases in the past 14 days instead of using two separate lag variables.

Within-city transmission

Table 3 reports the estimation results of the OLS and IV regressions of Eq. 2, in which only within-city transmission is considered.
After controlling for time-invariant city fixed effects and time effects that are common to all cities, on average, one new infection leads to 1.142 more cases in the next week, but 0.824 fewer cases 1 week later. The negative effect can be attributed to the fact that both local authorities and residents would have taken more protective measures in response to a higher perceived risk of contracting the virus given more time. Information disclosure on newly confirmed cases at the daily level by official media and information dissemination on social media throughout China may have promoted more timely actions by the public, resulting in slower virus transmissions. We then compare the transmission rates in different time windows. In the first sub-sample, one new infection leads to 2.135 more cases within a week, implying a fast growth in the number of cases. However, in the second sub-sample, the effect decreases to 1.077, suggesting that public health measures imposed in late January were effective in limiting a further spread of the virus. Similar patterns are also observed in model B. 
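The lag regressors that distinguish models A and B can be illustrated with a minimal sketch. It assumes that the "preceding first week" covers days t-7 to t-1 and the "preceding second week" covers days t-14 to t-8; the function name, the dictionary keys, and the daily counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def lagged_averages(new_cases, t):
    """Build the lag regressors for day index t from one city's daily counts.

    new_cases[i] is the number of new confirmed cases on day i.
    Assumption: "preceding first week" means days t-7..t-1 and
    "preceding second week" means days t-14..t-8.
    """
    if t < 14:
        raise ValueError("need 14 preceding days of data")
    week1 = mean(new_cases[t - 7:t])        # model A, 1-week lag
    week2 = mean(new_cases[t - 14:t - 7])   # model A, 2-week lag
    past14 = mean(new_cases[t - 14:t])      # model B, previous 14 days
    return {"week1": week1, "week2": week2, "past14": past14}


# Hypothetical daily counts for a single city (15 days, indices 0..14).
cases = [0, 1, 1, 2, 3, 5, 8, 10, 9, 7, 6, 4, 3, 2, 1]
lags = lagged_averages(cases, 14)
```

Under these definitions the model B regressor is the simple average of the two model A regressors, which is why model B can be read as a restricted version of model A.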
Table 3 Within-city transmission of COVID-19

| | Jan 19–Feb 29 | Jan 19–Feb 29 | Jan 19–Feb 1 | Jan 19–Feb 1 | Feb 2–Feb 29 | Feb 2–Feb 29 |
|---|---|---|---|---|---|---|
| | (1) OLS | (2) IV | (3) OLS | (4) IV | (5) OLS | (6) IV |
| All cities excluding Wuhan | | | | | | |
| Model A: lagged variables are averages over the preceding first and second week separately | | | | | | |
| Average # of new cases, 1-week lag | 0.873*** | 1.142*** | 1.692*** | 2.135*** | 0.768*** | 1.077*** |
| | (0.00949) | (0.0345) | (0.0312) | (0.0549) | (0.0120) | (0.0203) |
| Average # of new cases, 2-week lag | −0.415*** | −0.824*** | 0.860 | −6.050*** | −0.408*** | −0.796*** |
| | (0.00993) | (0.0432) | (2.131) | (2.314) | (0.00695) | (0.0546) |
| Model B: lagged variables are averages over the preceding 2 weeks | | | | | | |
| Average # of new cases, previous 14 days | 0.474*** | 0.720*** | 3.310*** | 3.860*** | 0.494*** | 1.284*** |
| | (0.0327) | (0.143) | (0.223) | (0.114) | (0.00859) | (0.107) |
| Observations | 12,768 | 12,768 | 4256 | 4256 | 8512 | 8512 |
| Number of cities | 304 | 304 | 304 | 304 | 304 | 304 |
| Weather controls | Yes | Yes | Yes | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Date FE | Yes | Yes | Yes | Yes | Yes | Yes |
| All cities excluding cities in Hubei Province | | | | | | |
| Model A: lagged variables are averages over the preceding first and second week separately | | | | | | |
| Average # of new cases, 1-week lag | 0.725*** | 1.113*** | 1.050*** | 1.483*** | 0.620*** | 0.903*** |
| | (0.141) | (0.0802) | (0.0828) | (0.205) | (0.166) | (0.0349) |
| Average # of new cases, 2-week lag | −0.394*** | −0.572*** | 0.108 | −3.664 | −0.228*** | −0.341*** |
| | (0.0628) | (0.107) | (0.675) | (2.481) | (0.0456) | (0.121) |
| Model B: lagged variables are averages over the preceding 2 weeks | | | | | | |
| Average # of new cases, previous 14 days | 0.357*** | 0.631*** | 1.899*** | 2.376*** | 0.493*** | 0.745*** |
| | (0.0479) | (0.208) | (0.250) | (0.346) | (0.122) | (0.147) |
| Observations | 12,096 | 12,096 | 4032 | 4032 | 8064 | 8064 |
| Number of cities | 288 | 288 | 288 | 288 | 288 | 288 |
| Weather controls | Yes | Yes | Yes | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Date FE | Yes | Yes | Yes | Yes | Yes | Yes |

The dependent variable is the number of daily new cases.
The endogenous explanatory variables include the average numbers of new confirmed cases in the own city in the preceding first and second weeks (model A) and the average number in the preceding 14 days (model B). Weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of each of these variables in other cities, during the preceding third and fourth weeks, are used as instrumental variables in the IV regressions. Weather controls include contemporaneous weather variables in the preceding first and second weeks. Standard errors in parentheses are clustered by provinces. *** p < 0.01, ** p < 0.05, * p < 0.1

Many cases were also reported in other cities in Hubei province apart from Wuhan, six of which reported over 1000 cumulative cases by February 15 (footnote 13). Their overstretched health care systems exacerbate the concern over delayed reporting of confirmed cases in these cities. To mitigate the effect of such potential measurement errors on our estimates, we re-estimate Eq. 2 excluding all cities in Hubei province. The bottom panel of Table 3 reports these estimates. Comparing the IV estimates in columns (4) and (6) between the upper and lower panels, we find that the transmission rates are lower in cities outside Hubei. In the January 19–February 1 sub-sample, one new case leads to 1.483 more cases in the following week, and this is reduced to 0.903 in the February 2–February 29 sub-sample. We also find a similar pattern when comparing the estimates from model B.

Between-city transmission

People may contract the virus from interaction with infected people who live in the same city or in other cities. In Eq. 1, we consider the effects of the number of new infections in other cities and in the epicenter of the epidemic (Wuhan), respectively, using inverse log distance as weights.
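The inverse log distance weighting mentioned above can be sketched as follows. This is only an illustration: it assumes distances measured in kilometres, a natural logarithm, and pairwise distances above 1 km so that every weight is positive. The function name, the city labels, and all numbers are hypothetical.

```python
import math


def inv_log_distance_sum(city, new_cases, distance_km):
    """Sum other cities' new cases, weighting each by 1 / log(distance).

    Assumptions: distances are in kilometres and exceed 1 km, so that the
    natural log is positive; the paper may use a different unit or log base.
    """
    total = 0.0
    for other, cases in new_cases.items():
        if other == city:
            continue  # a city is not its own neighbour
        d = distance_km[frozenset((city, other))]
        total += cases / math.log(d)
    return total


# Hypothetical inputs: three cities with pairwise distances in km.
new_cases = {"A": 5.0, "B": 10.0, "C": 4.0}
distance_km = {
    frozenset(("A", "B")): 100.0,
    frozenset(("A", "C")): 1000.0,
    frozenset(("B", "C")): 950.0,
}
exposure_a = inv_log_distance_sum("A", new_cases, distance_km)
```

Because the weight falls only with the logarithm of distance, distant cities are down-weighted gently: in this sketch a city ten times farther away still receives two thirds of the weight of the nearer one.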
In addition, geographic proximity may not fully describe the level of social interactions between residents in Wuhan and other cities since the lockdown in Wuhan on January 23 significantly reduced the population flow from Wuhan to other cities. To alleviate this concern, we also use a measure of the size of population flow from Wuhan to a destination city, which is constructed by multiplying the daily migration index on the population flow out of Wuhan (Fig. 3) with the share of the flow that a destination city receives provided by Baidu (Fig. 4). For days before January 25, we use the average destination shares between January 10 and January 24. For days on or after January 25, we use the average destination shares between January 25 and February 23 (footnote 14).

Table 4 reports the estimates from IV regressions of Eq. 1, and Table 5 reports the results from the same regressions excluding Hubei province. Column (4) of Table 4 indicates that in the first sub-sample, one new case leads to 2.456 more cases within 1 week, and the effect is not statistically significant between 1 and 2 weeks. Column (6) suggests that in the second sub-sample, one new case leads to 1.127 more cases within 1 week, and the effect is not statistically significant between 1 and 2 weeks. The comparison of the coefficients on own city between different sub-samples indicates that the responses of the government and the public have effectively decreased the risk of additional infections. Comparing Table 4 with Table 3, we find that although the coefficient on the number of new cases in the preceding second week turns insignificant and smaller in magnitude, coefficients on the number of new cases in the preceding first week are not sensitive to the inclusion of terms on between-city transmissions.
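The population-flow measure described above (daily migration index multiplied by a destination share that switches between two averaging windows) can be sketched as below. The cutoff of January 25 is an assumption based on the averaging windows quoted in the text, and the function name, city label, index values, and shares are all hypothetical.

```python
from datetime import date

# Assumed switch date between the two destination-share averages.
CUTOFF = date(2020, 1, 25)


def wuhan_flow(city, day, migration_index, share_early, share_late):
    """Population flow from Wuhan to `city` on `day`.

    migration_index: daily index of total outflow from Wuhan.
    share_early / share_late: each city's average share of that outflow
    before and from the cutoff date, respectively.
    """
    share = share_early[city] if day < CUTOFF else share_late[city]
    return migration_index[day] * share


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
migration_index = {date(2020, 1, 22): 8.0, date(2020, 1, 28): 1.0}
share_early = {"city_x": 0.25}
share_late = {"city_x": 0.5}

before = wuhan_flow("city_x", date(2020, 1, 22), migration_index,
                    share_early, share_late)
after = wuhan_flow("city_x", date(2020, 1, 28), migration_index,
                   share_early, share_late)
```

Even if a city's share of the outflow rises after the lockdown, its measured inflow can fall sharply because the overall migration index collapses, which is the pattern the hypothetical numbers mimic.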
Table 4 Within- and between-city transmission of COVID-19

| | Jan 19–Feb 29 | Jan 19–Feb 29 | Jan 19–Feb 1 | Jan 19–Feb 1 | Feb 2–Feb 29 | Feb 2–Feb 29 |
|---|---|---|---|---|---|---|
| | (1) OLS | (2) IV | (3) OLS | (4) IV | (5) OLS | (6) IV |
| Model A: lagged variables are averages over the preceding first and second week separately | | | | | | |
| Average # of new cases, 1-week lag | | | | | | |
| Own city | 0.862*** | 1.387*** | 0.939*** | 2.456*** | 0.786*** | 1.127*** |
| | (0.0123) | (0.122) | (0.102) | (0.638) | (0.0196) | (0.0686) |
| Other cities, wt. = inv. dist. | 0.00266 | −0.0248 | 0.0889 | 0.0412 | −0.00316 | −0.0212 |
| | (0.00172) | (0.0208) | (0.0714) | (0.0787) | (0.00227) | (0.0137) |
| Wuhan, wt. = inv. dist. | −0.0141 | 0.0303 | −0.879 | −0.957 | −0.00788 | 0.0236 |
| | (0.0115) | (0.0318) | (0.745) | (0.955) | (0.00782) | (0.0200) |
| Wuhan, wt. = pop. flow | 3.74e-05 | 0.00151*** | 0.00462*** | 0.00471*** | −0.00211*** | −0.00238** |
| | (0.000163) | (0.000391) | (0.000326) | (0.000696) | (4.01e-05) | (0.00113) |
| Average # of new cases, 2-week lag | | | | | | |
| Own city | −0.425*** | −0.795*** | 2.558 | −1.633 | −0.205*** | −0.171 |
| | (0.0318) | (0.0643) | (2.350) | (2.951) | (0.0491) | (0.224) |
| Other cities, wt. = inv. dist. | −0.00451** | −0.00766 | −0.361 | −0.0404 | −0.00912** | −0.0230 |
| | (0.00213) | (0.00814) | (0.371) | (0.496) | (0.00426) | (0.0194) |
| Wuhan, wt. = inv. dist. | −0.0410* | 0.0438 | 3.053 | 3.031 | −0.0603 | −0.00725 |
| | (0.0240) | (0.0286) | (2.834) | (3.559) | (0.0384) | (0.0137) |
| Wuhan, wt. = pop. flow | 0.00261*** | 0.00333*** | 0.00711*** | −0.00632 | 0.00167** | 0.00368*** |
| | (0.000290) | (0.000165) | (0.00213) | (0.00741) | (0.000626) | (0.000576) |
| Model B: lagged variables are averages over the preceding 2 weeks | | | | | | |
| Own city | 0.425*** | 1.195*** | 1.564*** | 2.992*** | 0.615*** | 1.243*** |
| | (0.0771) | (0.160) | (0.174) | (0.892) | (0.0544) | (0.115) |
| Other cities, wt. = inv. dist. | −0.00901 | −0.0958** | 0.0414 | 0.0704 | −0.0286*** | −0.0821*** |
| | (0.00641) | (0.0428) | (0.0305) | (0.0523) | (0.0101) | (0.0246) |
| Wuhan, wt. = inv. dist. | −0.198* | −0.0687** | −0.309 | −0.608 | −0.234* | −0.144 |
| | (0.104) | (0.0268) | (0.251) | (0.460) | (0.121) | (0.0994) |
| Wuhan, wt. = pop. flow | 0.00770*** | 0.00487*** | 0.00779*** | 0.00316 | 0.00829*** | 0.00772*** |
| | (0.000121) | (0.000706) | (0.000518) | (0.00276) | (0.000367) | (0.000517) |
| Observations | 12,768 | 12,768 | 4256 | 4256 | 8512 | 8512 |
| Number of cities | 304 | 304 | 304 | 304 | 304 | 304 |
| Weather controls | Yes | Yes | Yes | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Date FE | Yes | Yes | Yes | Yes | Yes | Yes |

The dependent variable is the number of daily new cases. The endogenous explanatory variables include the average numbers of new confirmed cases in the own city and nearby cities in the preceding first and second weeks (model A) and averages in the preceding 14 days (model B). Weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of these variables in other cities, during the preceding third and fourth weeks, are used as instrumental variables in the IV regressions. Weather controls include contemporaneous weather variables in the preceding first and second weeks. Standard errors in parentheses are clustered by provinces. *** p < 0.01, ** p < 0.05, * p < 0.1

Table 5 Within- and between-city transmission of COVID-19, excluding cities in Hubei Province

| | Jan 19–Feb 29 | Jan 19–Feb 29 | Jan 19–Feb 1 | Jan 19–Feb 1 | Feb 2–Feb 29 | Feb 2–Feb 29 |
|---|---|---|---|---|---|---|
| | (1) OLS | (2) IV | (3) OLS | (4) IV | (5) OLS | (6) IV |
| Model A: lagged variables are averages over the preceding first and second week separately | | | | | | |
| Average # of new cases, 1-week lag | | | | | | |
| Own city | 0.656*** | 1.117*** | 0.792*** | 1.194*** | 0.567*** | 0.899*** |
| | (0.153) | (0.112) | (0.0862) | (0.302) | (0.172) | (0.0924) |
| Other cities, wt. = inv. dist. | 0.00114 | −0.00213 | −0.0160 | −0.0734 | 0.000221 | −0.00526** |
| | (0.000741) | (0.00367) | (0.0212) | (0.0803) | (0.000626) | (0.00244) |
| Wuhan, wt. = inv. dist. | −0.000482 | 0.00420 | 0.104 | 0.233 | 5.89e-05 | 0.00769** |
| | (0.00173) | (0.00649) | (0.128) | (0.156) | (0.00194) | (0.00379) |
| Wuhan, wt. = pop. flow | 0.00668*** | 0.00616*** | 0.00641*** | 0.00375 | −0.000251 | 0.00390 |
| | (0.00159) | (0.00194) | (0.00202) | (0.00256) | (0.00245) | (0.00393) |
| Average # of new cases, 2-week lag | | | | | | |
| Own city | −0.350*** | −0.580*** | 0.230 | −1.541 | −0.157** | −0.250** |
| | (0.0667) | (0.109) | (0.572) | (1.448) | (0.0636) | (0.119) |
| Other cities, wt. = inv. dist. | −0.000869 | 0.00139 | 0.172 | 0.584 | −0.00266* | −0.00399 |
| | (0.00102) | (0.00311) | (0.122) | (0.595) | (0.00154) | (0.00276) |
| Wuhan, wt. = inv. dist. | −0.00461 | 0.000894 | −0.447 | −0.970 | −0.00456 | 0.00478* |
| | (0.00304) | (0.00592) | (0.829) | (0.808) | (0.00368) | (0.00280) |
| Wuhan, wt. = pop. flow | 0.00803*** | 0.00203 | 0.00973*** | 0.00734 | 0.00759*** | 0.00466*** |
| | (0.00201) | (0.00192) | (0.00317) | (0.00680) | (0.00177) | (0.00140) |
| Model B: lagged variables are averages over the preceding 2 weeks | | | | | | |
| Own city | 0.242*** | 0.654*** | 1.407*** | 1.876*** | 0.406*** | 0.614*** |
| | (0.0535) | (0.195) | (0.215) | (0.376) | (0.118) | (0.129) |
| Other cities, wt. = inv. dist. | 0.000309 | −0.00315 | 0.00608 | 0.0194 | −0.00224 | −0.00568 |
| | (0.00142) | (0.00745) | (0.0188) | (0.0300) | (0.00204) | (0.00529) |
| Wuhan, wt. = inv. dist. | −0.0133** | −0.0167 | −0.0146 | −0.0362 | −0.0138** | −0.00847 |
| | (0.00535) | (0.0140) | (0.0902) | (0.0741) | (0.00563) | (0.00787) |
| Wuhan, wt. = pop. flow | 0.0153*** | 0.0133*** | 0.00826*** | 0.00404 | 0.0132*** | 0.0123*** |
| | (0.00273) | (0.00273) | (0.00241) | (0.00423) | (0.00222) | (0.00205) |
| Observations | 12,096 | 12,096 | 4032 | 4032 | 8064 | 8064 |
| Number of cities | 288 | 288 | 288 | 288 | 288 | 288 |
| Weather controls | Yes | Yes | Yes | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Date FE | Yes | Yes | Yes | Yes | Yes | Yes |

The dependent variable is the number of daily new cases. The endogenous explanatory variables include the average numbers of new confirmed cases in the own city and nearby cities in the preceding first and second weeks (model A) and averages in the preceding 14 days (model B).
Weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of these variables in other cities, during the preceding third and fourth weeks, are used as instrumental variables in the IV regressions. Weather controls include contemporaneous weather variables in the preceding first and second weeks. Standard errors in parentheses are clustered by provinces. *** p < 0.01, ** p < 0.05, * p < 0.1

As a robustness test, Table 5 reports the estimation results excluding the cities in Hubei province. Column (4) of Table 5 indicates that in the first sub-sample, one new case leads to 1.194 more cases within a week, while column (6) indicates that in the second sub-sample, one new case only leads to 0.899 more cases within a week. Moreover, in the second sub-sample, one new case results in 0.250 fewer new infections between 1 and 2 weeks, which is larger in magnitude and more significant than the estimate (−0.171) when cities in Hubei province are included for estimation (column (6) of Table 4).

The time-varying patterns in local transmissions are evident in the rolling window analysis (Fig. 5). The upper left panel displays the estimated coefficients on local transmissions for various 14-day sub-samples with the starting date labelled on the horizontal axis. After a slight increase in the local transmission rates, one case generally leads to fewer and fewer additional cases a few days after January 19. In addition, the transmission rate displays a slight increase beginning around February 4, which corresponds to the return travels and work resumption after the Chinese Spring Festival, but eventually decreases at around February 12. This decrease may be partly attributed to the social distancing strategies at the city level, so we examine the impacts of relevant policies in Section 5.
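The 14-day estimation windows used in the rolling window analysis can be enumerated with a short sketch. It assumes that a window is formed for every start date on which a full 14 days still fit inside the January 19 to February 29 sample; the text does not state which start dates were actually used, and the function name is hypothetical.

```python
from datetime import date, timedelta


def rolling_windows(first_day, last_day, length=14):
    """List (start, end) pairs of consecutive windows of `length` days."""
    windows = []
    start = first_day
    # Keep a window only if its last day is still inside the sample.
    while start + timedelta(days=length - 1) <= last_day:
        windows.append((start, start + timedelta(days=length - 1)))
        start += timedelta(days=1)
    return windows


# Sample period from the paper: January 19 to February 29, 2020.
windows = rolling_windows(date(2020, 1, 19), date(2020, 2, 29))
```

Under this assumption the first window coincides with the first sub-sample (January 19 to February 1), and each window would be passed to the same IV specification to trace how the coefficients evolve.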
Moreover, the transmission rates in cities outside Hubei province have been kept at low levels throughout the whole sample period (columns (4) and (6) of Table 5). These results suggest that the policies adopted at the national and provincial levels soon after January 19 prevented cities outside Hubei from becoming new hotspots of infections. Overall, the spread of the virus had been effectively contained by mid-February, particularly for cities outside Hubei province.

Fig. 5 Rolling window analysis of within- and between-city transmission of COVID-19. This figure shows the estimated coefficients and 95% CIs from the instrumental variable regressions. The specification is the same as the IV regression models in Table 4. Each estimation sample contains 14 days with the starting date indicated on the horizontal axis.

In the epidemiology literature, the estimates of the basic reproduction number of COVID-19 fall approximately within the wide range of 1.4 to 6.5 (Liu et al. 2020). Its value depends on the estimation method used, underlying assumptions of modeling, time period covered, geographic regions (with varying preparedness of health care systems), and factors considered in the models that affect disease transmissions (such as the behavior of the susceptible and infected population). Intuitively, it can be interpreted as measuring the expected number of new cases that are generated by one existing case. It is of interest to note that our estimates are within this range. Based on the results from model B in Tables 4 and 5, one case leads to 2.992 more cases in the same city in the next 14 days (1.876 if cities in Hubei province are excluded). In the second sub-sample (February 2–February 29), these numbers are reduced to 1.243 and 0.614, respectively, suggesting that factors such as public health measures and people's behavior may play an important role in containing the transmission of COVID-19.
While our basic reproduction number estimate (R0) is within the range of estimates in the literature and is close to its median, five features may distinguish our estimates from some of the existing epidemiological estimates. First, our instrumental variable approach helps isolate the causal effect of virus transmissions from other confounding factors; second, our estimate is based on an extended time period of the COVID-19 pandemic (until the end of February 2020) that may mitigate potential biases in the literature that relies on a shorter sampling period within 1–28 January 2020; third, our modeling makes minimal assumptions about virus transmissions, such as imposing fewer restrictions on the relationship between the unobserved determinants of new cases and the number of cases in the past; fourth, our model simultaneously considers comprehensive factors that may affect virus transmissions, including multiple policy instruments (such as closed management of communities and shelter-at-home orders), population flow, within- and between-city transmissions, economic and demographic conditions, weather patterns, and preparedness of the health care system; and fifth, our study uses spatially disaggregated data that cover China (except its Hubei province), while some other studies examine Wuhan city, Hubei province, China as a whole, or overseas.

Regarding the between-city transmission from Wuhan, we observe that the population flow better explains the contagion effect than geographic proximity (Table 4). In the first sub-sample, one new case in Wuhan leads to more cases in other cities receiving more population flows from Wuhan within 1 week. Interestingly, in the second sub-sample, population flow from Wuhan significantly decreases the transmission rate within 1 week, suggesting that people have been taking more cautious measures from high COVID-19 risk areas; however, more arrivals from Wuhan in the preceding second week can still be a risk.
A back-of-the-envelope calculation indicates that one new case in Wuhan leads to 0.064 (0.050) more cases in the destination city per 10,000 travelers from Wuhan within 1 (2) week between January 19 and February 1 (February 2 and February 29) (footnote 15). Note that while the effect is statistically significant, it should be interpreted in context. It was estimated that 15,000,000 people would travel out of Wuhan during the Lunar New Year holiday (footnote 16). If all had gone to one city, this would have directly generated about 171 cases within 2 weeks, that is, 1,500 groups of 10,000 travelers × (0.064 + 0.050) cases per group. The risk of infection is likely very low for most travelers except for the few who have had previous contacts with sources of infection, and person-specific history of past contacts may be an essential predictor for infection risk, in addition to the total number of population flows (footnote 17).

A city may also be affected by infections in nearby cities apart from spillovers from Wuhan. We find that the coefficients that represent the infectious effects from nearby cities are generally small and not statistically significant (Table 4), implying that few cities outside Wuhan are themselves exporting infections. This is consistent with the findings in the World Health Organization (2020b) that other than cases that are imported from Hubei, additional human-to-human transmissions are limited for cities outside Hubei. Restricting to cities outside Hubei province, the results are similar (Table 5), except that the transmission from Wuhan is not significant in the first sub-sample.

Social and economic mediating factors

We also investigate the mediating impacts of some socioeconomic and environmental characteristics on the transmission rates (Eq. 3). To ease the comparison between different moderators, we consider the mediating impacts on the influence of the average number of new cases in the past 2 weeks.
Regarding own-city transmissions, we examine the mediating effects of population density, GDP per capita, number of doctors, average temperature, wind speed, precipitation, and a dummy variable for adverse weather conditions. Regarding between-city transmissions, we consider the mediating effects of distance, difference in population density, and difference in GDP per capita since cities that are similar in density or economic development level may be more closely linked. We also include a measure of population flows from Wuhan. Table 6 reports the estimation results of the IV regressions. To ease the comparison across various moderators, for the mediating variables of within-city transmissions that are significant at 10%, we compute the changes in the variables so that the effect of new confirmed infections in the past 14 days on current new confirmed cases is reduced by 1 (columns (2) and (4)).

Table 6 Social and economic factors mediating the transmission of COVID-19

| | Jan 19–Feb 1 | Jan 19–Feb 1 | Feb 2–Feb 29 | Feb 2–Feb 29 |
|---|---|---|---|---|
| | (1) IV Coeff. | (2) Required change | (3) IV Coeff. | (4) Required change |
| Average # of new cases, previous 14 days | | | | |
| Own city | −0.251 | | 0.672*** | |
| | (0.977) | | (0.219) | |
| × population density | 0.000164 | | −0.000202** | +495 per km² |
| | (0.000171) | | (8.91e-05) | |
| × per capita GDP | 0.150*** | −66,667 RMB | 0.0102 | |
| | (0.0422) | | (0.0196) | |
| × # of doctors | −0.108* | +92,593 | 0.0179 | |
| | (0.0622) | | (0.0236) | |
| × temperature | 0.0849* | −11.78 °C | −0.00945 | |
| | (0.0438) | | (0.0126) | |
| × wind speed | −0.109 | | 0.128 | |
| | (0.131) | | (0.114) | |
| × precipitation | 0.965* | −1.04 mm | 0.433* | −2.31 mm |
| | (0.555) | | (0.229) | |
| × adverse weather | 0.0846 | | −0.614*** | +163% |
| | (0.801) | | (0.208) | |
| Other cities, wt. = inv. distance | 0.0356 | | −0.00429 | |
| | (0.0375) | | (0.00343) | |
| Other cities, wt. = inv. density ratio | 0.00222 | | 0.000192 | |
| | (0.00147) | | (0.000891) | |
| Other cities, wt. = inv. per capita GDP ratio | 0.00232 | | 0.00107 | |
| | (0.00497) | | (0.00165) | |
| Wuhan, wt. = inv. distance | −0.165 | | −0.00377 | |
| | (0.150) | | (0.00981) | |
| Wuhan, wt. = inv. density ratio | −0.00336 | | −0.000849 | |
| | (0.00435) | | (0.00111) | |
| Wuhan, wt. = inv. per capita GDP ratio | −0.440 | | −0.0696 | |
| | (0.318) | | (0.0699) | |
| Wuhan, wt. = population flow | 0.00729*** | | 0.0125*** | |
| | (0.00202) | | (0.00187) | |
| Observations | 4032 | | 8064 | |
| Number of cities | 288 | | 288 | |
| Weather controls | Yes | | Yes | |
| City FE | Yes | | Yes | |
| Date FE | Yes | | Yes | |

The dependent variable is the number of daily new confirmed cases. The sample excludes cities in Hubei province. Columns (2) and (4) report the changes in the mediating variables that are needed to reduce the impact of new confirmed cases in the preceding 2 weeks by 1, using estimates with significance levels of at least 0.1 in columns (1) and (3), respectively. The endogenous variables include the average numbers of new cases in the own city and nearby cities in the preceding 14 days and their interactions with the mediating variables. Weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of these variables in neighboring cities, during the preceding third and fourth weeks, are used as instrumental variables in the IV regressions. Additional instrumental variables are constructed by interacting them with the mediating variables. Weather controls include these variables in the preceding first and second weeks. Standard errors in parentheses are clustered by provinces. *** p < 0.01, ** p < 0.05, * p < 0.1

In the early phase of the epidemic (January 19 to February 1), cities with more medical resources, which are measured by the number of doctors, have lower transmission rates. A one standard deviation increase in the number of doctors reduces the transmission rate by 0.12. Cities with higher GDP per capita have higher transmission rates, which can be ascribed to the increased social interactions as economic activities increase (footnote 18). In the second sub-sample, these effects become insignificant, probably because public health measures and inter-city resource sharing took effect.
In fact, cities with higher population density have lower transmission rates in the second sub-sample. Regarding the environmental factors, we notice different significant mediating variables across the first and second sub-samples. The transmission rates are lower with adverse weather conditions, lower temperature, or less rain. Further research is needed to identify clear mechanisms. In addition, population flow from Wuhan still poses a risk of new infections for other cities even after we account for the above mediating effects on own-city transmission. This effect is robust to the inclusion of the proximity measures based on economic similarity and geographic proximity between Wuhan and other cities. Nevertheless, we do not find much evidence on between-city transmissions among cities other than Wuhan.
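The required changes reported in columns (2) and (4) of Table 6 appear to follow from dividing −1 by the corresponding interaction coefficient. The sketch below checks this reading against the weather rows, whose units need no rescaling; the function name is hypothetical, and the rows for GDP, doctors, and population density are left out because their unit scaling is not stated in this section.

```python
def required_change(interaction_coef, target=-1.0):
    """Change in a mediating variable that shifts the transmission effect by `target`."""
    if interaction_coef == 0:
        raise ValueError("interaction coefficient must be non-zero")
    return target / interaction_coef


# Interaction coefficients copied from Table 6.
temperature = required_change(0.0849)       # Jan 19 to Feb 1, degrees C
precip_early = required_change(0.965)       # Jan 19 to Feb 1, mm
precip_late = required_change(0.433)        # Feb 2 to Feb 29, mm
adverse_weather = required_change(-0.614)   # Feb 2 to Feb 29, as a fraction
```

Rounded to two decimals, these values agree with the −11.78 °C, −1.04 mm, −2.31 mm, and +163% entries in the table, which supports the interpretation without confirming that it is exactly how the authors computed them.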