Id |
Subject |
Object |
Predicate |
Lexical cue |
T44 |
0-20 |
Sentence |
denotes |
Statistical analyses |
T45 |
21-238 |
Sentence |
denotes |
The monthly pattern for the cumulative number of COVID-19 cases in each country/region was visualized in relation to the geography, biome type, and climate (mean temperature and annual precipitation) of that location. |
T46 |
239-476 |
Sentence |
denotes |
In addition, the pattern of increasing COVID-19 case numbers was evaluated based on country type, with individual countries being classified into four types defined by the number of COVID-19 cases per week and the date of outbreak onset. |
T47 |
477-885 |
Sentence |
denotes |
To ensure the robustness of our results, we investigated the relationship between various environmental variables (climate, host susceptibility to COVID-19, international human mobility, and socioeconomic factors) and the number of COVID-19 cases (per 1 million population) using the two different approaches: conventional multiple linear regression and random forest, which is a machine-learning model [15]. |
T48 |
886-1033 |
Sentence |
denotes |
We separately modeled the cumulative number of COVID-19 cases (per 1 million population) in successive periods from December 2019 to June 30, 2020. |
T49 |
1034-1632 |
Sentence |
denotes |
In the multiple regression analysis, we set the log-scaled cumulative number of COVID-19 cases within a period as the response variable and the climatic factors (mean temperature, squared mean temperature, and log-scaled monthly precipitation), socioeconomic conditions (log-scaled population density and GDP per person), international human mobility (the relative amount of foreign visitors per population) and region-specific COVID-19 susceptibility (the percentage of people aged ≥ 65 years, the log-scaled relative incidence of malaria, and the BCG vaccination effect) as explanatory variables. |
T50 |
1633-1896 |
Sentence |
denotes |
To control for country/region-specific observation biases, we included the length of time (measured in days) since the first confirmed COVID-19 case in each country/region and the number of COVID-19 tests conducted (as a measure of sampling effort) as covariates. |
T51 |
1897-2182 |
Sentence |
denotes |
In addition, we applied the trend surface method to take spatial autocorrelation into account as a covariate; we added the first eigenvector of the geo-distance matrix among the countries or regions, which was computed using the geocoordinates of the largest city, as a covariate [16]. |
T52 |
2183-2282 |
Sentence |
denotes |
The explanatory power of the model was evaluated by the adjusted coefficient of determination (R²).
T53 |
2283-2526 |
Sentence |
denotes |
We also calculated the relative importance of each explanatory variable in a regression model according to its partial coefficient of determination and determined the predominant variables that explained the variance in the response variables. |
T54 |
2527-2609 |
Sentence |
denotes |
The statistical significance of each variable was determined by conducting an F-test.
T55 |
2610-2725 |
Sentence |
denotes |
All the explanatory variables were standardized to have a mean of zero and a variance of one before these analyses. |
T56 |
2726-2819 |
Sentence |
denotes |
The explanatory factors of the regression model were compared between the four country types. |
T57 |
2820-2939 |
Sentence |
denotes |
In the random forest model, we used the same set of response and explanatory variables, as well as the same covariates. |
T58 |
2940-3019 |
Sentence |
denotes |
In each run of the random forest analysis, we generated 1,000 regression trees. |
T59 |
3020-3109 |
Sentence |
denotes |
The model performance was evaluated by the proportion of variance explained by the model. |
T60 |
3110-3257 |
Sentence |
denotes |
We evaluated the relative importance of each explanatory variable based on the increase in the mean squared error when the variable was permuted.
T61 |
3258-3393 |
Sentence |
denotes |
Before these analyses, we tested the collinearity between the explanatory variables by calculating the variance inflation factor (VIF). |
T62 |
3394-3549 |
Sentence |
denotes |
For the study period, the largest VIF value was 8.56, and the VIF at June 30, 2020 was 8.56, indicating the absence of multicollinearity in the regression. |
T63 |
3550-3790 |
Sentence |
denotes |
To confirm the testing effort bias on the number of confirmed cases, we conducted an additional analysis that accounted for the number of conducted tests (i.e., sampling efforts) in individual countries/regions, as a covariate in the model. |
T64 |
3791-3979 |
Sentence |
denotes |
Note that this analysis was applied to the data from 128/828 countries/regions, because testing data for many countries is currently unavailable (https://ourworldindata.org/covid-testing). |
T65 |
3980-4192 |
Sentence |
denotes |
All analyses were performed with the R environment for statistical computing [17]; the ‘sf’ package was used to produce the graphics [18] and the ‘randomForest’ package was used for the random forest analysis [19].
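The collinearity check quoted above (sentences T61 and T62) relies on the variance inflation factor. As a rough illustration only, the sketch below computes VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. It is written in plain Python rather than R, it is not the authors' code, and the function names and toy data are hypothetical.

```python
from typing import List

Vector = List[float]


def solve_linear_system(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y: Vector, predictors: List[Vector]) -> float:
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors  # design-matrix columns, intercept first
    k = len(cols)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns: List[Vector]) -> List[float]:
    """VIF of each predictor column: 1 / (1 - R^2 against the remaining columns)."""
    vifs = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1.0 / (1.0 - r_squared(target, others)))
    return vifs


if __name__ == "__main__":
    # Hypothetical toy predictors, not data from the study.
    temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
    precipitation = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(variance_inflation_factors([temperature, precipitation]))
```

With only two predictors, both VIF values equal 1 / (1 - r²), where r is their correlation; the toy columns have r = 0.8, giving a VIF of about 2.78. The sketch assumes no predictor is constant and no predictor is an exact linear combination of the others.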