Id |
Subject |
Object |
Predicate |
Lexical cue |
T26 |
0-21 |
Sentence |
denotes |
Materials and methods |
T27 |
23-35 |
Sentence |
denotes |
Data sources |
T28 |
36-149 |
Sentence |
denotes |
We compiled geographic data on the number of reported COVID-19 cases per day from December 2019 to June 30, 2020. |
T29 |
150-310 |
Sentence |
denotes |
We collected the numbers of COVID-19 cases for 1,020 countries/regions from various sources (see S1 Appendix for a list of data sources for the COVID-19 cases). |
T30 |
311-471 |
Sentence |
denotes |
We then calculated the length of time (in days) since the onset of COVID-19 spread as defined by the date of the first confirmed case in each country or region. |
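The elapsed-time covariate described above can be illustrated with a minimal sketch. The region names and first-case dates below are hypothetical placeholders, not values from the study, and the function name `days_since_onset` is ours.

```python
from datetime import date


def days_since_onset(first_case: date, as_of: date) -> int:
    """Days elapsed between the first confirmed case and a reference date."""
    if as_of < first_case:
        raise ValueError("reference date precedes the first confirmed case")
    return (as_of - first_case).days


# Hypothetical first-case dates, for illustration only.
first_cases = {"Region A": date(2020, 1, 15), "Region B": date(2020, 3, 1)}
study_end = date(2020, 6, 30)  # end of the study period stated in the text

elapsed = {name: days_since_onset(d, study_end) for name, d in first_cases.items()}
```

Using the end of the study period as the reference date is an assumption; the same function applies to any intermediate period end.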
T31 |
472-728 |
Sentence |
denotes |
We also examined the number of SARS-CoV-2 PCR tests conducted based on data published by the World Health Organization (WHO) (https://ourworldindata.org/covid-testing) to assess the influence of sampling effort on the number of confirmed cases of COVID-19. |
T32 |
729-801 |
Sentence |
denotes |
For each country or region, we compiled several environmental variables. |
T33 |
802-985 |
Sentence |
denotes |
For mapping cases of COVID-19, the longitude and latitude of the largest city and area for each country or region were extracted from GADM maps and data (https://gadm.org/index.html). |
T34 |
986-1321 |
Sentence |
denotes |
Based on the geocoordinates of these cities, we extracted the mean precipitation (mm month⁻¹) and temperature (°C) from January to June from WorldClim version 2.1 climate data (https://www.worldclim.org/data/worldclim21.html) at a resolution of 2.5 arc-minutes, for the grid cells corresponding to each country or region. |
T35 |
1322-1592 |
Sentence |
denotes |
Regarding international travel linked to disease transmission, we compiled the average annual number of foreign visitors for individual countries/regions from data published by the World Tourism Organization (https://www.e-unwto.org/toc/unwtotfb/current). |
T36 |
1593-1716 |
Sentence |
denotes |
We then calculated the relative amount of foreign visitors per population of each country or region to use in the analysis. |
T37 |
1717-1999 |
Sentence |
denotes |
Regarding region-specific host susceptibility to COVID-19, we collected data on the following three epidemiologic properties: the proportion of the population aged over 65 years, the malaria incidence (per year), and information regarding bacillus Calmette–Guérin (BCG) vaccination. |
T38 |
2000-2226 |
Sentence |
denotes |
We included these attributes in our analyses on the assumption that BCG vaccination and/or recurrent treatment with anti-malarial medications might provide some protection against COVID-19 [13, 14]. |
T39 |
2227-3152 |
Sentence |
denotes |
We compiled BCG data from the WHO (https://www.who.int/malaria/data/en/; https://apps.who.int/gho/data/view.main.80500?lang=en) and the BCG Atlas Team (http://www.bcgatlas.org/) on the following five attributes: i) the number of years since BCG vaccination was started (BCG_year); ii) the present status of BCG vaccination (BCG_type), classified as all vaccinated, partly vaccinated, vaccinated once in the past, or never vaccinated; iii) the relative frequency of post-1980 (i.e., the past 40 years) BCG vaccination among children aged less than 1 year (BCG_rate); iv) the number of BCG vaccinations (MultipleBCG), classified as never vaccinated, vaccinated only once, vaccinated multiple times in the past, or currently vaccinated multiple times; and v) tuberculosis cases per 1 million people (TB). |
T40 |
3153-3210 |
Sentence |
denotes |
These BCG-related variables are strongly intercorrelated. |
T41 |
3211-3526 |
Sentence |
denotes |
Therefore, we reduced the dimensionality of these variables (BCG_year, BCG_type, BCG_rate, MultipleBCG, and TB) by extracting the first axis of a principal component analysis (PCA); because the score on the first axis (PC1) was negatively correlated with all five variables, the PC1 score multiplied by –1 was defined as the BCG vaccination effect. |
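The dimension reduction described above can be sketched as follows. This is an illustrative reimplementation in plain Python, not the authors' R code: it finds the first principal axis of the correlation matrix by power iteration and then orients the scores so that they correlate positively with the inputs, which is the role of the multiplication by –1 in the text. The three toy columns are hypothetical numeric stand-ins for the five BCG-related variables, and all function names are ours.

```python
import math


def standardize(col):
    """Rescale a column to zero mean and unit (sample) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def first_pc_scores(columns, iters=500):
    """Scores on the first principal axis of the correlation matrix."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]
    vec = [1.0 / math.sqrt(p)] * p  # starting vector for power iteration
    for _ in range(iters):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return [sum(vec[j] * z[j][k] for j in range(p)) for k in range(n)]


def orient_positive(scores, columns):
    """Flip the axis if it correlates negatively with the inputs on average."""
    mean_r = sum(pearson(scores, c) for c in columns) / len(columns)
    return [-s for s in scores] if mean_r < 0 else list(scores)


# Hypothetical, positively intercorrelated toy variables (six regions).
bcg_columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
]
raw_scores = first_pc_scores(bcg_columns)
bcg_effect = orient_positive(raw_scores, bcg_columns)
```

The sign of a principal axis is arbitrary, so the orientation step makes the resulting index interpretable regardless of which sign the decomposition returns.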
T42 |
3527-3590 |
Sentence |
denotes |
We also compiled socioeconomic data for each country or region. |
T43 |
3591-3949 |
Sentence |
denotes |
The population size, population density (per km²) (Gridded Population of the World, GPW v4; https://sedac.ciesin.columbia.edu/data/collection/gpw-v4), gross domestic product (GDP in US dollars), and GDP per person were obtained from national census data (World Development Indicators; https://datacatalog.worldbank.org/dataset/world-development-indicators). |
T44 |
3951-3971 |
Sentence |
denotes |
Statistical analyses |
T45 |
3972-4189 |
Sentence |
denotes |
The monthly pattern for the cumulative number of COVID-19 cases in each country/region was visualized in relation to the geography, biome type, and climate (mean temperature and annual precipitation) of that location. |
T46 |
4190-4427 |
Sentence |
denotes |
In addition, the pattern of increasing COVID-19 case numbers was evaluated based on country type, with individual countries being classified into four types defined by the number of COVID-19 cases per week and the date of outbreak onset. |
T47 |
4428-4836 |
Sentence |
denotes |
To ensure the robustness of our results, we investigated the relationship between various environmental variables (climate, host susceptibility to COVID-19, international human mobility, and socioeconomic factors) and the number of COVID-19 cases (per 1 million population) using two different approaches: conventional multiple linear regression and random forest, which is a machine-learning model [15]. |
T48 |
4837-4984 |
Sentence |
denotes |
We separately modeled the cumulative number of COVID-19 cases (per 1 million population) in successive periods from December 2019 to June 30, 2020. |
T49 |
4985-5583 |
Sentence |
denotes |
In the multiple regression analysis, we set the log-scaled cumulative number of COVID-19 cases within a period as the response variable and the climatic factors (mean temperature, squared mean temperature, and log-scaled monthly precipitation), socioeconomic conditions (log-scaled population density and GDP per person), international human mobility (the relative amount of foreign visitors per population) and region-specific COVID-19 susceptibility (the percentage of people aged ≥ 65 years, the log-scaled relative incidence of malaria, and the BCG vaccination effect) as explanatory variables. |
T50 |
5584-5847 |
Sentence |
denotes |
To control for country/region-specific observation biases, we included the length of time (measured in days) since the first confirmed COVID-19 case in each country/region and the number of COVID-19 tests conducted (as a measure of sampling effort) as covariates. |
T51 |
5848-6133 |
Sentence |
denotes |
In addition, we applied the trend surface method to account for spatial autocorrelation: we added the first eigenvector of the geo-distance matrix among the countries or regions, which was computed using the geocoordinates of the largest city, as a covariate [16]. |
T52 |
6134-6233 |
Sentence |
denotes |
The explanatory power of the model was evaluated by the adjusted coefficient of determination (R²). |
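For reference, the adjusted coefficient of determination penalizes R² for the number of predictors. A minimal sketch of the standard formula follows; the function name and the example numbers are illustrative, not values from the study.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Illustrative numbers only: R2 = 0.5 with 101 observations and 10 predictors.
example = adjusted_r2(0.5, 101, 10)
```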
T53 |
6234-6477 |
Sentence |
denotes |
We also calculated the relative importance of each explanatory variable in a regression model according to its partial coefficient of determination and determined the predominant variables that explained the variance in the response variables. |
T54 |
6478-6560 |
Sentence |
denotes |
The statistical significance of each variable was determined using an F-test. |
T55 |
6561-6676 |
Sentence |
denotes |
All the explanatory variables were standardized to have a mean of zero and a variance of one before these analyses. |
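The standardization step amounts to a z-score transformation of each explanatory variable. A minimal sketch, assuming the sample standard deviation is used (the text does not state which estimator was applied) and using made-up values:

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to a mean of zero and a (sample) variance of one."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Made-up values for one explanatory variable.
z = standardize([10.0, 20.0, 30.0, 40.0])
```

Placing all predictors on a common scale makes their regression coefficients directly comparable in magnitude.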
T56 |
6677-6770 |
Sentence |
denotes |
The explanatory factors of the regression model were compared between the four country types. |
T57 |
6771-6890 |
Sentence |
denotes |
In the random forest model, we used the same set of response and explanatory variables, as well as the same covariates. |
T58 |
6891-6970 |
Sentence |
denotes |
In each run of the random forest analysis, we generated 1,000 regression trees. |
T59 |
6971-7060 |
Sentence |
denotes |
The model performance was evaluated by the proportion of variance explained by the model. |
T60 |
7061-7208 |
Sentence |
denotes |
We evaluated the relative importance of each explanatory variable based on the increase in the mean squared error when the variable was permuted. |
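The idea behind this importance measure can be sketched in a model-agnostic way: shuffle one predictor, re-evaluate the mean squared error, and record the increase. The sketch below is a simplification, since the 'randomForest' package computes the measure per tree on out-of-bag samples; the toy data and the stand-in model are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of a prediction function over a data set."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, column, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(mse(predict, permuted, targets) - baseline)
    return sum(increases) / n_repeats


# Toy data: the response depends only on the first predictor.
rows = [[float(i), float((i * 7) % 5)] for i in range(30)]
targets = [2.0 * r[0] for r in rows]


def toy_model(row):
    """Stand-in for a fitted model; it ignores the second predictor."""
    return 2.0 * row[0]


importance_informative = permutation_importance(toy_model, rows, targets, column=0)
importance_ignored = permutation_importance(toy_model, rows, targets, column=1)
```

A predictor that the model relies on yields a positive increase in error, whereas one that the model ignores yields none.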
T61 |
7209-7344 |
Sentence |
denotes |
Before these analyses, we tested the collinearity between the explanatory variables by calculating the variance inflation factor (VIF). |
T62 |
7345-7500 |
Sentence |
denotes |
For the study period, the largest VIF value was 8.56, which occurred at June 30, 2020, indicating no severe multicollinearity in the regression. |
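For readers unfamiliar with the VIF, a minimal sketch follows. It relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix of the predictors, and it uses made-up toy columns. A common rule of thumb treats values above 10 as a sign of problematic collinearity; that threshold is our assumption and is not stated in the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length predictor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        z.append([(v - m) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(p):
            if r != c:
                factor = aug[r][c]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Made-up predictors: one correlated pair and one uncorrelated pair.
correlated = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]]
uncorrelated = [[-1.0, 1.0, -1.0, 1.0], [-1.0, -1.0, 1.0, 1.0]]
vif_correlated = vif(correlated)
vif_uncorrelated = vif(uncorrelated)
```

With only two predictors the VIF reduces to 1 / (1 - r²), which gives a simple check on the general routine.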
T63 |
7501-7741 |
Sentence |
denotes |
To assess the effect of testing effort on the number of confirmed cases, we conducted an additional analysis that included the number of tests conducted (i.e., sampling effort) in individual countries/regions as a covariate in the model. |
T64 |
7742-7930 |
Sentence |
denotes |
Note that this analysis was applied to data from 128 of 828 countries/regions because testing data for many countries are currently unavailable (https://ourworldindata.org/covid-testing). |
T65 |
7931-8143 |
Sentence |
denotes |
All analyses were performed in the R environment for statistical computing [17]; the ‘sf’ package was used for graphics [18] and the ‘randomForest’ package was used for the random forest analysis [19]. |