PMC:7510993 / 5061-13204


Id Subject Object Predicate Lexical cue
T26 0-21 Sentence denotes Materials and methods
T27 23-35 Sentence denotes Data sources
T28 36-149 Sentence denotes We compiled geographic data on the number of reported COVID-19 cases per day from December 2019 to June 30, 2020.
T29 150-310 Sentence denotes We collected the numbers of COVID-19 cases for 1,020 countries/regions from various sources (see S1 Appendix for a list of data sources for the COVID-19 cases).
T30 311-471 Sentence denotes We then calculated the length of time (in days) since the onset of COVID-19 spread in each country or region, defined as the date of the first confirmed case there.
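The elapsed-time variable can be illustrated with a minimal Python sketch. It assumes daily counts are stored in a dictionary keyed by date; the function name `days_since_first_case` and the example series are hypothetical and are not taken from the study's data or code.

```python
from datetime import date


def days_since_first_case(daily_cases, end_date):
    """Days from the first date with a confirmed case up to end_date.

    daily_cases: dict mapping datetime.date -> new confirmed cases that day
    end_date: date at which elapsed time is measured (e.g. end of a period)
    Returns None if no case was reported on or before end_date.
    """
    case_dates = [d for d, n in daily_cases.items() if n > 0 and d <= end_date]
    if not case_dates:
        return None
    return (end_date - min(case_dates)).days


# Hypothetical series: first confirmed case on March 4, 2020.
cases = {date(2020, 3, 1): 0, date(2020, 3, 4): 2, date(2020, 3, 5): 7}
print(days_since_first_case(cases, date(2020, 6, 30)))  # 118
```

Measuring the elapsed time up to the end of each modelled period keeps the covariate consistent with the period-by-period models described below.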
T31 472-728 Sentence denotes We also compiled the number of SARS-CoV-2 PCR tests conducted, based on data published by Our World in Data (https://ourworldindata.org/covid-testing), to assess the influence of sampling effort on the number of confirmed COVID-19 cases.
T32 729-801 Sentence denotes For each country or region, we compiled several environmental variables.
T33 802-985 Sentence denotes To map COVID-19 cases, we extracted the longitude and latitude of the largest city and the area of each country or region from GADM maps and data (https://gadm.org/index.html).
T34 986-1321 Sentence denotes Based on the geocoordinates of these cities, we extracted the mean monthly precipitation (mm month⁻¹) and mean temperature (°C) for January to June from WorldClim version 2.1 climate data (https://www.worldclim.org/data/worldclim21.html) at a resolution of 2.5 arc-minutes, using the grid cell in which each city is located.
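As a sketch of how a city's coordinates map onto a 2.5-arc-minute grid, the Python code below computes the cell index and averages the January to June values. It assumes a global raster spanning 180°W to 180°E and 90°S to 90°N with row 0 at the northern edge (stated here as an assumption about the WorldClim 2.1 global layout); the layer structure and the function names are hypothetical.

```python
import math

# Assumed geometry of a global 2.5-arc-minute raster:
# longitude -180..180, latitude -90..90, row 0 at the northern edge.
CELLS_PER_DEGREE = 24              # 60 arc-minutes / 2.5 arc-minutes
N_COLS = 360 * CELLS_PER_DEGREE    # 8640
N_ROWS = 180 * CELLS_PER_DEGREE    # 4320


def cell_index(lon, lat):
    """(row, col) of the grid cell containing the point (lon, lat) in degrees."""
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    # Points on the eastern (180°) or southern (-90°) edge fall in the last cell.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)


def mean_jan_to_jun(monthly_layers, lon, lat):
    """Mean of the January-June values in the cell containing a city.

    monthly_layers: 12 layers indexed [month][row][col], month 0 = January
    (e.g. monthly mean temperature or precipitation grids).
    """
    row, col = cell_index(lon, lat)
    values = [monthly_layers[m][row][col] for m in range(6)]
    return sum(values) / len(values)
```

In practice, a raster package such as R's 'terra' (e.g., `terra::extract`) performs this lookup directly on the GeoTIFF files; the explicit index arithmetic only makes the cell-selection rule visible.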
T35 1322-1592 Sentence denotes To represent international travel, which is linked to disease transmission, we compiled the average annual number of foreign visitors for individual countries/regions from data published by the World Tourism Organization (https://www.e-unwto.org/toc/unwtotfb/current).
T36 1593-1716 Sentence denotes We then divided this number by the population of each country or region to obtain the relative amount of foreign visitors per population used in the analysis.
T37 1717-1999 Sentence denotes Regarding region-specific host susceptibility to COVID-19, we collected data on the following three epidemiologic properties: the proportion of the population aged over 65 years, the malaria incidence (per year), and information regarding bacillus Calmette–Guérin (BCG) vaccination.
T38 2000-2226 Sentence denotes We included these attributes in our analyses based on the hypotheses that BCG vaccination and/or repeated treatment with antimalarial medications might provide some protection against COVID-19 [13, 14].
T39 2227-3152 Sentence denotes We compiled the malaria and BCG data from the World Health Organization (WHO) (https://www.who.int/malaria/data/en/ and https://apps.who.int/gho/data/view.main.80500?lang=en) and from the BCG Atlas Team (http://www.bcgatlas.org/); for BCG, we compiled the following five attributes: i) the number of years since BCG vaccination began (BCG_year); ii) the current BCG vaccination status (BCG_type), categorized as all vaccinated, partly vaccinated, vaccinated once in the past, or never vaccinated; iii) the relative frequency of BCG vaccination among infants under 1 year old since 1980 (i.e., over the past 40 years) (BCG_rate); iv) the number of BCG vaccinations (MultipleBCG), classifying countries as having never vaccinated their citizens with BCG, having vaccinated them only once, having vaccinated them multiple times in the past, or currently vaccinating them multiple times; and v) tuberculosis cases per 1 million people (TB).
T40 3153-3210 Sentence denotes These BCG-related variables are strongly intercorrelated.
T41 3211-3526 Sentence denotes Therefore, we reduced these five variables (BCG_year, BCG_type, BCG_rate, MultipleBCG, and TB) to a single dimension by extracting the first axis of a principal component analysis (PCA); because the scores on PCA axis 1 were negatively correlated with all five variables, we multiplied them by −1 and defined the result as the BCG vaccination effect.
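The dimension reduction and sign convention can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' code: the categorical BCG variables are assumed to be integer-coded, the variables are standardized before the PCA (the text does not state whether a correlation or covariance matrix was used), and the sign of PC1 is oriented so that it correlates positively with the standardized variables, which mirrors the multiplication by −1. The function name `bcg_effect` is hypothetical.

```python
import numpy as np


def bcg_effect(X):
    """First-principal-component score with a fixed sign orientation.

    X: (n_countries, 5) array with columns BCG_year, BCG_type, BCG_rate,
    MultipleBCG, TB; categorical columns are assumed to be integer-coded.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    # Principal axes are the right singular vectors of the standardized data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Z @ vt[0]
    # The sign of a principal axis is arbitrary: if PC1 is negatively
    # correlated with the variables (here, their standardized mean),
    # multiply it by -1 so the score rises with the input variables.
    if np.corrcoef(pc1, Z.mean(axis=1))[0, 1] < 0:
        pc1 = -pc1
    return pc1
```

Because the sign of a principal component can differ between software implementations, fixing the orientation explicitly keeps the score positively associated with the five input variables whatever the PCA routine returns.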
T42 3527-3590 Sentence denotes We also compiled socioeconomic data for each country or region.
T43 3591-3949 Sentence denotes Population size, population density (per km²; Gridded Population of the World, version 4 [GPWv4]; https://sedac.ciesin.columbia.edu/data/collection/gpw-v4), gross domestic product (GDP, in US dollars), and GDP per person were obtained from national census data (World Development Indicators; https://datacatalog.worldbank.org/dataset/world-development-indicators).
T44 3951-3971 Sentence denotes Statistical analyses
T45 3972-4189 Sentence denotes The monthly pattern of the cumulative number of COVID-19 cases in each country/region was visualized in relation to its geography, biome type, and climate (mean temperature and annual precipitation).
T46 4190-4427 Sentence denotes In addition, we evaluated the pattern of increase in COVID-19 case numbers by country type, classifying individual countries into four types according to the number of COVID-19 cases per week and the date of outbreak onset.
T47 4428-4836 Sentence denotes To ensure the robustness of our results, we investigated the relationships between various environmental variables (climate, host susceptibility to COVID-19, international human mobility, and socioeconomic factors) and the number of COVID-19 cases (per 1 million population) using two different approaches: conventional multiple linear regression and random forest, a machine-learning method [15].
T48 4837-4984 Sentence denotes We modeled the cumulative number of COVID-19 cases (per 1 million population) separately for successive periods from December 2019 to June 30, 2020.
T49 4985-5583 Sentence denotes In the multiple regression analysis, we set the log-scaled cumulative number of COVID-19 cases within a period as the response variable and the climatic factors (mean temperature, squared mean temperature, and log-scaled monthly precipitation), socioeconomic conditions (log-scaled population density and GDP per person), international human mobility (the relative amount of foreign visitors per population) and region-specific COVID-19 susceptibility (the percentage of people aged ≥ 65 years, the log-scaled relative incidence of malaria, and the BCG vaccination effect) as explanatory variables.
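A sketch of how the explanatory variables for one period might be assembled, standardized (mean 0, variance 1) and paired with the log-scaled response. Several details are assumptions: "log-scaled" is implemented as log10(x + 1) so that zero counts stay finite, "log-scaled population density and GDP per person" is read as transforming both variables, the dictionary keys and function names are hypothetical, and the covariates (days since the first case, number of tests, spatial eigenvector) are omitted but would be appended in the same way.

```python
import numpy as np


def log_scale(v):
    # log10(x + 1) keeps zero counts finite; the +1 offset is an assumption.
    return np.log10(np.asarray(v, dtype=float) + 1.0)


def standardize(v):
    # Rescale to mean 0 and (sample) variance 1.
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


def build_model_data(d):
    """Return (y, X, names) for one period from a dict of equal-length arrays.

    All keys are hypothetical. Squared temperature is computed before
    standardization, so it remains a quadratic term in temperature.
    """
    temp = np.asarray(d["temp"], dtype=float)
    raw = {
        "temp": temp,
        "temp_sq": temp ** 2,
        "log_precip": log_scale(d["precip"]),
        "log_pop_density": log_scale(d["pop_density"]),
        "log_gdp_pc": log_scale(d["gdp_pc"]),
        "visitors_pc": np.asarray(d["visitors_pc"], dtype=float),
        "age65_pct": np.asarray(d["age65_pct"], dtype=float),
        "log_malaria": log_scale(d["malaria"]),
        "bcg_effect": np.asarray(d["bcg_effect"], dtype=float),
    }
    names = list(raw)
    X = np.column_stack([standardize(raw[k]) for k in names])
    y = log_scale(d["cases_per_million"])
    return y, X, names
```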
T50 5584-5847 Sentence denotes To control for country/region-specific observation biases, we included the length of time (measured in days) since the first confirmed COVID-19 case in each country/region and the number of COVID-19 tests conducted (as a measure of sampling effort) as covariates.
T51 5848-6133 Sentence denotes In addition, to account for spatial autocorrelation, we applied the trend surface method, adding as a covariate the first eigenvector of the geographic distance matrix among the countries or regions, computed from the geocoordinates of their largest cities [16].
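One reading of the spatial covariate is sketched below: great-circle (haversine) distances between the largest cities, followed by the eigenvector of the distance matrix with the largest eigenvalue. The haversine formula, the Earth radius of 6371 km and the absence of any centering are assumptions; some spatial-eigenvector methods double-center or truncate the distance matrix before the eigendecomposition, and the text does not say whether this was done. Function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_matrix(lon, lat):
    """Pairwise great-circle distances (km) between points given in degrees."""
    lon = np.radians(np.asarray(lon, dtype=float))
    lat = np.radians(np.asarray(lat, dtype=float))
    dlon = lon[:, None] - lon[None, :]
    dlat = lat[:, None] - lat[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def first_distance_eigenvector(lon, lat):
    """Unit eigenvector of the distance matrix with the largest eigenvalue."""
    D = haversine_matrix(lon, lat)
    eigvals, eigvecs = np.linalg.eigh(D)  # D is symmetric; eigenvalues ascending
    return eigvecs[:, np.argmax(eigvals)]
```

Each country or region then receives one value (its entry in the eigenvector), which enters the regression as an ordinary covariate.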
T52 6134-6233 Sentence denotes The explanatory power of each model was evaluated using the adjusted coefficient of determination (adjusted R²).
T53 6234-6477 Sentence denotes We also calculated the relative importance of each explanatory variable in a regression model as its partial coefficient of determination (partial R²) and identified the predominant variables explaining the variance in the response variable.
T54 6478-6560 Sentence denotes The statistical significance of each variable was assessed with an F-test.
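The adjusted R², the partial R² of each explanatory variable and the corresponding F statistics can all be obtained by comparing the full ordinary-least-squares model with reduced models that drop one variable at a time. The NumPy sketch below assumes a model with an intercept and one-degree-of-freedom F-tests for single variables; the function names are hypothetical.

```python
import numpy as np


def _sse(y, X):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def regression_summary(y, X):
    """Adjusted R^2, plus partial R^2 and F statistic for each column of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    df_resid = n - p - 1
    sse_full = _sse(y, X)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse_full / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    partial_r2, f_stats = [], []
    for j in range(p):
        sse_reduced = _sse(y, np.delete(X, j, axis=1))
        # Share of the variance left unexplained without x_j that x_j explains.
        partial_r2.append((sse_reduced - sse_full) / sse_reduced)
        # F statistic comparing the reduced and full models (1 numerator df).
        f_stats.append((sse_reduced - sse_full) / (sse_full / df_resid))
    return adj_r2, np.array(partial_r2), np.array(f_stats), df_resid
```

A p-value for each F statistic follows from the F distribution with 1 and `df_resid` degrees of freedom, for example `scipy.stats.f.sf(F, 1, df_resid)` in Python or `pf(F, 1, df_resid, lower.tail = FALSE)` in R.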
T55 6561-6676 Sentence denotes All the explanatory variables were standardized to have a mean of zero and a variance of one before these analyses.
T56 6677-6770 Sentence denotes We compared the explanatory factors of the regression models among the four country types.
T57 6771-6890 Sentence denotes In the random forest model, we used the same set of response and explanatory variables, as well as the same covariates.
T58 6891-6970 Sentence denotes In each run of the random forest analysis, we generated 1,000 regression trees.
T59 6971-7060 Sentence denotes The model performance was evaluated by the proportion of variance explained by the model.
T60 7061-7208 Sentence denotes We evaluated the relative importance of each explanatory variable as the increase in the mean squared error when that variable was permuted.
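To our understanding, the 'randomForest' package computes its "% Var explained" and "%IncMSE" measures from out-of-bag (OOB) samples, averaging the permutation effect over trees. The sketch below is a simplified, model-agnostic version of the same two ideas: it takes the prediction function of any already fitted model and a data set, and it does not reimplement random forest or the package's per-tree OOB averaging and scaling. Function names are hypothetical.

```python
import numpy as np


def variance_explained(y, y_pred):
    """Pseudo-R^2: 1 - MSE / Var(y), using the population variance."""
    y = np.asarray(y, dtype=float)
    mse = np.mean((y - np.asarray(y_pred, dtype=float)) ** 2)
    return 1.0 - mse / np.var(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is randomly permuted.

    predict: prediction function of an already fitted model, X -> y_hat.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base_mse = np.mean((y - predict(X)) ** 2)
    increase = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its association with y
            # while keeping its marginal distribution unchanged.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increase[j] += np.mean((y - predict(X_perm)) ** 2) - base_mse
    return increase / n_repeats
```

A variable the model does not use leaves predictions unchanged when permuted, so its importance is zero; the larger the MSE increase, the more the model relies on that variable.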
T61 7209-7344 Sentence denotes Before these analyses, we assessed collinearity among the explanatory variables by calculating the variance inflation factor (VIF).
T62 7345-7500 Sentence denotes Across the study period, the largest VIF was 8.56, which was also the largest VIF for the June 30, 2020 model; as this is below the commonly used threshold of 10, multicollinearity was not considered a serious problem in the regression.
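A minimal NumPy sketch of the VIF: each explanatory variable is regressed on all the others (with an intercept), and VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination of that auxiliary regression. The function name is hypothetical; in R, `car::vif` provides the same quantity for fitted models.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Auxiliary regression of column j on the remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A common rule of thumb treats VIF values above 10 as a sign of serious multicollinearity.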
T63 7501-7741 Sentence denotes To assess the effect of testing effort on the number of confirmed cases, we conducted an additional analysis that included the number of tests conducted in each country/region (i.e., sampling effort) as a covariate in the model.
T64 7742-7930 Sentence denotes Note that this analysis was restricted to 128 of the 828 countries/regions because testing data were unavailable for many countries at the time of the study (https://ourworldindata.org/covid-testing).
T65 7931-8143 Sentence denotes All analyses were performed in the R environment for statistical computing [17]; the ‘sf’ package was used to produce the graphics [18], and the ‘randomForest’ package was used for the random forest analysis [19].