PubAnnotation

Id	Subject	Object	Predicate	Lexical cue
T44	0-20	Sentence	denotes	Statistical analyses
T45	21-238	Sentence	denotes	The monthly pattern for the cumulative number of COVID-19 cases in each country/region was visualized in relation to the geography, biome type, and climate (mean temperature and annual precipitation) of that location.
T46	239-476	Sentence	denotes	In addition, the pattern of increasing COVID-19 case numbers was evaluated based on country type, with individual countries being classified into four types defined by the number of COVID-19 cases per week and the date of outbreak onset.
T47	477-885	Sentence	denotes	To ensure the robustness of our results, we investigated the relationship between various environmental variables (climate, host susceptibility to COVID-19, international human mobility, and socioeconomic factors) and the number of COVID-19 cases (per 1 million population) using the two different approaches: conventional multiple linear regression and random forest, which is a machine-learning model [15].
T48	886-1033	Sentence	denotes	We separately modeled the cumulative number of COVID-19 cases (per 1 million population) in successive periods from December 2019 to June 30, 2020.
T49	1034-1632	Sentence	denotes	In the multiple regression analysis, we set the log-scaled cumulative number of COVID-19 cases within a period as the response variable and the climatic factors (mean temperature, squared mean temperature, and log-scaled monthly precipitation), socioeconomic conditions (log-scaled population density and GDP per person), international human mobility (the relative amount of foreign visitors per population) and region-specific COVID-19 susceptibility (the percentage of people aged ≥ 65 years, the log-scaled relative incidence of malaria, and the BCG vaccination effect) as explanatory variables.
T50	1633-1896	Sentence	denotes	To control for country/region-specific observation biases, we included the length of time (measured in days) since the first confirmed COVID-19 case in each country/region and the number of COVID-19 tests conducted (as a measure of sampling effort) as covariates.
T51	1897-2182	Sentence	denotes	In addition, we applied the trend surface method to take spatial autocorrelation into account as a covariate; we added the first eigenvector of the geo-distance matrix among the countries or regions, which was computed using the geocoordinates of the largest city, as a covariate [16].
T52	2183-2282	Sentence	denotes	The explanatory power of the model was evaluated by the adjusted coefficient of determination (R2).
T53	2283-2526	Sentence	denotes	We also calculated the relative importance of each explanatory variable in a regression model according to its partial coefficient of determination and determined the predominant variables that explained the variance in the response variables.
T54	2527-2609	Sentence	denotes	The statistical significance of each variable was determined by conducting F-test.
T55	2610-2725	Sentence	denotes	All the explanatory variables were standardized to have a mean of zero and a variance of one before these analyses.
T56	2726-2819	Sentence	denotes	The explanatory factors of the regression model were compared between the four country types.
T57	2820-2939	Sentence	denotes	In the random forest model, we used the same set of response and explanatory variables, as well as the same covariates.
T58	2940-3019	Sentence	denotes	In each run of the random forest analysis, we generated 1,000 regression trees.
T59	3020-3109	Sentence	denotes	The model performance was evaluated by the proportion of variance explained by the model.
T60	3110-3257	Sentence	denotes	We evaluated the relative importance of each explanatory variable based on the increase in the mean squared error when the variable was permutated.
T61	3258-3393	Sentence	denotes	Before these analyses, we tested the collinearity between the explanatory variables by calculating the variance inflation factor (VIF).
T62	3394-3549	Sentence	denotes	For the study period, the largest VIF value was 8.56, and the VIF at June 30, 2020 was 8.56, indicating the absence of multicollinearity in the regression.
T63	3550-3790	Sentence	denotes	To confirm the testing effort bias on the number of confirmed cases, we conducted an additional analysis that accounted for the number of conducted tests (i.e., sampling efforts) in individual countries/regions, as a covariate in the model.
T64	3791-3979	Sentence	denotes	Note that this analysis was applied to the data from 128/828 countries/regions, because testing data for many countries is currently unavailable (https://ourworldindata.org/covid-testing).
T65	3980-4192	Sentence	denotes	All analyses were performed with the R environment for statistical computing [17]; the ‘sf’ package was used for graphics artworks [18] and the ‘randomForest’ package was used for the random forest analysis [19].
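
The annotated sentences describe standardizing the explanatory variables, checking collinearity with the variance inflation factor (VIF), and evaluating a multiple regression by its adjusted R2. The following is a minimal sketch of those three steps, not the authors' code: the source states that the analyses were run in R, whereas this uses Python with NumPy, and the data, the variable names (`temp`, `precip`, `density`), and the helper functions are all hypothetical and synthetic.

```python
import numpy as np


def standardize(X):
    """Scale each column to a mean of zero and a variance of one."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ols_r2(X, y):
    """Fit ordinary least squares with an intercept.

    Returns (R2, adjusted R2) for n observations and p predictors.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2


def vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    column j on all remaining columns."""
    values = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j, _ = ols_r2(others, X[:, j])
        values.append(1.0 / (1.0 - r2_j))
    return np.array(values)


# Synthetic stand-ins for explanatory variables (assumption: not real data).
rng = np.random.default_rng(0)
n = 200
temp = rng.normal(size=n)
precip = rng.normal(size=n)
density = 0.8 * temp + 0.6 * rng.normal(size=n)  # deliberately correlated with temp

X = standardize(np.column_stack([temp, precip, density]))
y = 1.5 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

r2, adj_r2 = ols_r2(X, y)
vifs = vif(X)

print("adjusted R2:", round(adj_r2, 3))
print("VIF per variable:", np.round(vifs, 2))
```

In this sketch the correlated pair (`temp`, `density`) receives a higher VIF than the independent `precip` column. A VIF below 10 is a commonly cited rule of thumb for acceptable collinearity; the source does not state which threshold it applied to its reported maximum of 8.56.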

PMC:7510993 / 9012-13204
