PMC:7078825

The coronavirus disease 2019 (COVID-19) outbreak began in Wuhan, China, in late December 2019 and spread to other cities in China within a matter of days [1,2]. On 30 January 2020, the World Health Organization (WHO) declared it a public health emergency of international concern [3]. Predicting the development of the outbreak as early and as reliably as possible is critical for taking action to prevent its spread. Internet search and social media data have been reported to correlate with traditional surveillance data, and can even predict disease epidemics several days or weeks earlier than traditional surveillance [4–9]. In this study, we aimed to evaluate the predictive value of Internet search data from web-based search engines, and of social media data, for the COVID-19 outbreak in China.

Trends in daily laboratory-confirmed and suspected COVID-19 cases and Internet data

The daily numbers of new laboratory-confirmed and suspected COVID-19 cases were collected from data published by the National Health Commission of China (NHC, http://www.nhc.gov.cn/). A laboratory-confirmed case of COVID-19 was defined as a patient with a positive real-time RT-PCR result for SARS-CoV-2. A suspected case was defined as a patient who had travelled to Wuhan City, or had been in contact with COVID-19 cases, in the 14 days before symptom onset. The patient also had clinical manifestations of fever, respiratory illness, pneumonia on computed tomography (CT) scan and/or reduced white blood cell count, but no RT-PCR result. The study period ran from 16 January to 11 February 2020, because the diagnostic criteria were issued on 16 January 2020. Daily new laboratory-confirmed cases peaked at 3,887 on 4 February 2020, and daily new suspected cases peaked at 5,328 on 5 February 2020.

Daily trend data for specific search terms were acquired from Google Trends, Baidu Index and Sina Weibo Index. We set the time parameter to ‘2 January to 12 February 2020’ and the location parameter to ‘China’. We chose a period starting 2 weeks earlier than that of the molecular diagnosis data for COVID-19.

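The 2-week offset can be checked with simple date arithmetic. The Python sketch below is illustrative only. It assumes the offset was chosen to cover the longest lag examined in the Table (14 days), so that every day of case data has matching search data at each lag; the text does not state this rationale. The names `MAX_LAG_DAYS`, `search_window_start` and `inclusive_days` are hypothetical.

```python
from datetime import date, timedelta

# Study period for the NHC case data, as given in the text.
STUDY_START = date(2020, 1, 16)
STUDY_END = date(2020, 2, 11)

# Assumption (hypothetical): the search window reaches back by the longest lag in the Table.
MAX_LAG_DAYS = 14


def search_window_start(study_start: date, max_lag_days: int) -> date:
    """Earliest date of search data needed to pair every case day with all lags."""
    return study_start - timedelta(days=max_lag_days)


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


print(search_window_start(STUDY_START, MAX_LAG_DAYS))  # 2020-01-02
print(inclusive_days(STUDY_START, STUDY_END))          # 27
```

The computed start date, 2 January 2020, matches the time parameter used for the search indices.
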
Two keywords, ‘coronavirus’ and ‘pneumonia’, were used in Google Trends. The respective Chinese terms, ‘冠状病毒’ and ‘肺炎’, were used in two further indices. Baidu Index comes from Baidu, the most popular web search engine in China. Sina Weibo Index comes from Sina Weibo, a social media platform widely used in China. The peak number of search queries in Baidu was 682,888 for ‘coronavirus’ and 760,460 for ‘pneumonia’, both on 25 January 2020. The peak number of posts on Sina Weibo was 26,297,746 for ‘coronavirus’ and 30,704,753 for ‘pneumonia’, both on 21 January 2020. Google Trends does not provide the raw number of search queries, only values normalised to the peak number. The peaks for both keywords on Google Trends were reached on 25 January 2020.

Figure 1 shows the overall trends for the keywords ‘coronavirus’ (‘冠状病毒’) and ‘pneumonia’ (‘肺炎’) in Google Trends, Baidu Index and Sina Weibo Index. It also shows the number of daily new laboratory-confirmed and suspected COVID-19 cases. As for Google Trends, the Baidu Index, Sina Weibo Index and national daily COVID-19 incidence data were normalised to their peak values. As a result, all series fall into the same range (0–100) over the period.

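Normalising to the peak multiplies each value by 100 and divides by the series maximum. The peak therefore becomes 100, and every other day is expressed as a percentage of it. This is a minimal Python sketch, assuming a non-negative daily series with a positive peak. The function name and the example counts are hypothetical, not data from the study.

```python
from typing import List, Sequence


def normalise_to_peak(values: Sequence[float]) -> List[float]:
    """Rescale a non-negative daily series so that its peak equals 100."""
    if not values:
        raise ValueError("series is empty")
    peak = max(values)
    if peak <= 0:
        raise ValueError("series needs a positive peak to normalise")
    return [100.0 * v / peak for v in values]


# Hypothetical daily counts (not study data).
print(normalise_to_peak([970, 1940, 3880, 2910]))  # [25.0, 50.0, 100.0, 75.0]
```

Each series is divided by its own peak. The normalised curves can therefore be compared by shape and timing, but no longer by absolute volume.
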
Figure 1 Searches for the keywords ‘coronavirus’ and ‘pneumonia’, obtained via different indices, and number of daily new COVID-19 cases, China, January–February 2020

Lag correlation between daily laboratory-confirmed/suspected cases and Internet searches

Figure 2 and the Table show the lag Spearman correlations between the Internet search data and the daily new cases of COVID-19. The search data come from Google Trends, Baidu Index and Sina Weibo Index. Laboratory-confirmed cases are shown in the upper panel of Figure 2 and suspected cases in the lower panel. We found high correlations (r > 0.7) with Internet search data from 8–10 days earlier for new laboratory-confirmed cases, and from 5–7 days earlier for new suspected cases.

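For a given lag k, the case count on each day is paired with the search value k days earlier. Spearman's rank correlation is then computed over the overlapping days. The dependency-free Python sketch below illustrates this under stated assumptions: both series are aligned on the same calendar days, and ties receive average ranks. All function names are hypothetical, and the sketch is not the authors' implementation. In practice, a library routine such as `scipy.stats.spearmanr`, which also returns a p value, would normally be used.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    return pearson(average_ranks(a), average_ranks(b))


def lag_spearman(search: Sequence[float], cases: Sequence[float], lag: int) -> float:
    """Correlate cases on day t with search data on day t - lag.

    Assumes search[i] and cases[i] refer to the same calendar day.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return spearman(search, cases)
    return spearman(search[:-lag], cases[lag:])


# Hypothetical toy data: cases echo the search series 3 days later.
search = [1, 4, 2, 8, 5, 7, 3, 9, 6, 10]
cases = [0, 0, 0] + search[:-3]
print(round(lag_spearman(search, cases, 3), 6))  # 1.0
```

Repeating `lag_spearman` for lags 0 to 14 gives one column of coefficients like those in the Table.
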
Figure 2 Lag correlations between new laboratory-confirmed and suspected COVID-19 cases and data from Google Trends, Baidu Index and Sina Weibo Index for the keywords ‘coronavirus’ and ‘pneumonia’, China, January–February 2020

Table Lag correlation coefficients and p values between Internet search data and daily new laboratory-confirmed/suspected COVID-19 cases, China, January–February 2020

Days earlier Google Trends                 Baidu Index                   Sina Weibo Index
             Coronavirus    Pneumonia      Coronavirus    Pneumonia      Coronavirus    Pneumonia
             r      p       r      p       r      p       r      p       r      p       r      p

New laboratory-confirmed cases
0            0.176  0.370   −0.035 0.861   0.021  0.917   0.129  0.513   0.106  0.593   0.109  0.582
1            0.324  0.093   0.122  0.537   0.160  0.416   0.265  0.172   0.202  0.303   0.190  0.332
2            0.455  0.015   0.271  0.164   0.299  0.122   0.411  0.030   0.346  0.072   0.298  0.123
3            0.561  0.002   0.388  0.041   0.406  0.032   0.516  0.005   0.431  0.022   0.408  0.031
4            0.672  <0.001  0.505  0.006   0.529  0.004   0.641  <0.001  0.498  0.007   0.470  0.012
5            0.779  <0.001  0.606  0.001   0.624  <0.001  0.722  <0.001  0.562  0.002   0.553  0.002
6            0.850  <0.001  0.712  <0.001  0.706  <0.001  0.808  <0.001  0.679  <0.001  0.668  <0.001
7            0.902  <0.001  0.777  <0.001  0.750  <0.001  0.861  <0.001  0.751  <0.001  0.754  <0.001
8            0.944  <0.001  0.835  <0.001  0.823  <0.001  0.902* <0.001  0.829  <0.001  0.817  <0.001
9            0.958* <0.001  0.878  <0.001  0.887  <0.001  0.892  <0.001  0.876  <0.001  0.872  <0.001
10           0.953  <0.001  0.893* <0.001  0.928  <0.001  0.873  <0.001  0.921  <0.001  0.899* <0.001
11           0.924  <0.001  0.845  <0.001  0.925  <0.001  0.786  <0.001  0.917  <0.001  0.875  <0.001
12           0.857  <0.001  0.818  <0.001  0.933* <0.001  0.715  <0.001  0.944* <0.001  0.875  <0.001
13           0.815  <0.001  0.762  <0.001  0.908  <0.001  0.609  0.001   0.916  <0.001  0.812  <0.001
14           0.783  <0.001  0.697  <0.001  0.858  <0.001  0.496  0.007   0.885  <0.001  0.733  <0.001

New suspected cases
0            −0.003 0.989   −0.372 0.073   −0.309 0.142   −0.091 0.671   −0.279 0.187   −0.309 0.142
1            0.246  0.246   −0.116 0.590   −0.068 0.753   0.141  0.511   −0.078 0.716   −0.103 0.630
2            0.413  0.045   0.104  0.630   0.125  0.560   0.346  0.098   0.089  0.680   0.050  0.818
3            0.614  0.001   0.312  0.138   0.352  0.091   0.551  0.005   0.253  0.233   0.248  0.243
4            0.768  <0.001  0.514  0.010   0.538  0.007   0.697  <0.001  0.431  0.035   0.383  0.065
5            0.832  <0.001  0.687  <0.001  0.670  <0.001  0.816  <0.001  0.520  0.009   0.501  0.013
6            0.912* <0.001  0.771  <0.001  0.725  <0.001  0.895  <0.001  0.672  <0.001  0.670  <0.001
7            0.933  <0.001  0.850  <0.001  0.830  <0.001  0.914  <0.001  0.813  <0.001  0.872  <0.001
8            0.875  <0.001  0.960* <0.001  0.906* <0.001  0.926* <0.001  0.924* <0.001  0.907* <0.001
9            0.787  <0.001  0.865  <0.001  0.882  <0.001  0.850  <0.001  0.883  <0.001  0.899  <0.001
10           0.744  <0.001  0.827  <0.001  0.841  <0.001  0.766  <0.001  0.818  <0.001  0.832  <0.001
11           0.671  <0.001  0.770  <0.001  0.790  <0.001  0.698  <0.001  0.781  <0.001  0.802  <0.001
12           0.544  0.006   0.693  <0.001  0.686  <0.001  0.559  0.005   0.697  <0.001  0.683  <0.001
13           0.482  0.017   0.578  0.003   0.583  0.003   0.454  0.026   0.622  0.001   0.600  0.002
14           0.497  0.013   0.448  0.028   0.547  0.006   0.288  0.173   0.609  0.002   0.550  0.005

High correlation: r > 0.7.
* Highest correlation for each index and keyword.

For new laboratory-confirmed cases, the highest correlations for the keyword ‘coronavirus’ were found with searches 9, 12 and 12 days earlier in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.958, 0.933 and 0.944, respectively. For the keyword ‘pneumonia’, the highest correlations were found 10, 8 and 10 days earlier in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.893, 0.902 and 0.899, respectively.

The lag correlations for new suspected cases were similar to those for laboratory-confirmed cases, but with shorter lag times. For the keyword ‘coronavirus’, the highest correlations were found 6, 8 and 8 days earlier in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.912, 0.906 and 0.924, respectively. For the keyword ‘pneumonia’, the highest correlations were all found 8 days earlier in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.960, 0.926 and 0.907, respectively.

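Reading the highest correlation off the Table amounts to taking the maximum over lags for each index and keyword. The r > 0.7 criterion is a simple filter on the same column. The short Python sketch below uses two columns copied from the Table for new laboratory-confirmed cases. The function and constant names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Correlation coefficients copied from the Table, indexed by days earlier (0-14),
# for new laboratory-confirmed cases.
GOOGLE_CORONAVIRUS_LAB = [0.176, 0.324, 0.455, 0.561, 0.672, 0.779, 0.850, 0.902,
                          0.944, 0.958, 0.953, 0.924, 0.857, 0.815, 0.783]
BAIDU_PNEUMONIA_LAB = [0.129, 0.265, 0.411, 0.516, 0.641, 0.722, 0.808, 0.861,
                       0.902, 0.892, 0.873, 0.786, 0.715, 0.609, 0.496]


def best_lag(r_by_lag: Sequence[float]) -> Tuple[int, float]:
    """Return (days earlier, r) for the lag with the highest correlation."""
    lag = max(range(len(r_by_lag)), key=lambda k: r_by_lag[k])
    return lag, r_by_lag[lag]


def high_correlation_lags(r_by_lag: Sequence[float], threshold: float = 0.7) -> List[int]:
    """Return the lags whose correlation exceeds the threshold."""
    return [k for k, r in enumerate(r_by_lag) if r > threshold]


print(best_lag(GOOGLE_CORONAVIRUS_LAB))            # (9, 0.958)
print(best_lag(BAIDU_PNEUMONIA_LAB))               # (8, 0.902)
print(high_correlation_lags(BAIDU_PNEUMONIA_LAB))  # [5, 6, 7, 8, 9, 10, 11, 12]
```
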
Discussion

Our study demonstrated that data from Google Trends, Baidu Index and Sina Weibo Index on searches for the keywords ‘coronavirus’ and ‘pneumonia’ correlated with the daily incidence of COVID-19 published by the NHC. This held for both laboratory-confirmed and suspected cases, with maximum correlation coefficients above 0.89. We also found that interest in these keywords on Internet search engines and social media peaked 10–14 days earlier than the incidence of COVID-19 published by the NHC. The lag correlations reached their maximum at 8–12 days for laboratory-confirmed cases and at 6–8 days for suspected cases.

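The 10–14-day lead follows from the peak dates given in the Results, measured against the peak of daily new laboratory-confirmed cases on 4 February 2020. Below is a minimal Python sketch of that arithmetic. The dictionary and function names are hypothetical.

```python
from datetime import date
from typing import Dict

# Peak dates of search/post volume, as reported in the Results.
SEARCH_PEAKS = {
    "Google Trends": date(2020, 1, 25),
    "Baidu Index": date(2020, 1, 25),
    "Sina Weibo Index": date(2020, 1, 21),
}
LAB_CONFIRMED_PEAK = date(2020, 2, 4)


def lead_days(peaks: Dict[str, date], reference: date) -> Dict[str, int]:
    """Days by which each search peak precedes the reference incidence peak."""
    return {source: (reference - peak).days for source, peak in peaks.items()}


print(lead_days(SEARCH_PEAKS, LAB_CONFIRMED_PEAK))
# {'Google Trends': 10, 'Baidu Index': 10, 'Sina Weibo Index': 14}
```

If the suspected-case peak on 5 February 2020 is used as the reference instead, each lead is one day longer.
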
COVID-19 is a rapidly spreading infectious disease. At the time of submission, there were more than 80,000 cases, and the case fatality rate was so far reported to be 3.4% [10]. It is important to predict the development of this outbreak as early and as reliably as possible, in order to take action to prevent its spread. Our data showed that two widely used Internet search engines, Google and Baidu, and the social media platform Sina Weibo were able to predict the outbreak 1–2 weeks earlier than the traditional surveillance systems.

The role of Internet surveillance tools in the early prediction of other epidemics has been reported previously, including for influenza [4], dengue fever [5], H1N1 [6], Zika [7], measles [8] and Middle East respiratory syndrome [9]. The availability of early information about infectious diseases through Internet search engines and social media can support decisions on disease control and prevention.

Internet search data have been shown to enable the monitoring of Middle East respiratory syndrome 3 days before laboratory confirmation [9]. However, our results showed a much longer lag between the digital surveillance data and the reported new laboratory-confirmed and suspected COVID-19 cases. There are several possible explanations.

Firstly, COVID-19 is a newly recognised disease. The first version of the guideline for the diagnosis and management of COVID-19 was issued on 16 January 2020. It took time for medical professionals to learn about the virus and the disease in order to make a correct diagnosis.

Secondly, the diagnosis of COVID-19 requires two independent confirmatory laboratory tests taken at least 1 day apart. Consistent with this, our results showed a shorter lag for suspected cases, which do not require an RT-PCR result, than for laboratory-confirmed cases.

Thirdly, the supply of laboratory testing kits may have been insufficient in the early stages of the outbreak, which would have limited the number of patients who could be confirmed.

Finally, Internet searches and social media mentions are initiated not only by patients and their family members, but also globally by members of the general public who are concerned about this rapidly spreading disease.

In addition, we found that data from the Baidu Index and Sina Weibo Index tracked the number of daily new confirmed and suspected cases reported by the NHC earlier than data from Google Trends. A possible explanation is that Google is not a major search engine in China, where Baidu and Sina Weibo are widely used. The Sina Weibo Index also peaked earlier (21 January 2020) than Google Trends and Baidu Index (both 25 January 2020). This suggests that Sina Weibo, which serves as a social media platform, disseminated information faster than traditional websites.

COVID-19 was first reported as ‘pneumonia of unknown aetiology’ or ‘pneumonia of unknown cause’ in late December 2019. On 8 January 2020, a novel coronavirus was identified as the cause of this disease. The NHC of China first named the disease ‘novel coronavirus pneumonia’ on 8 February. The WHO later named it ‘coronavirus disease 2019’ (abbreviated ‘COVID-19’) on 11 February. Our study period ran from 16 January to 11 February 2020, so the name COVID-19 was adopted only at its very end. Therefore, we consider the two keywords ‘pneumonia’ and ‘coronavirus’ sufficient to capture most Internet content related to COVID-19 in this period. We also tried other keywords, such as ‘新冠’ (novel coronavirus) and ‘新型冠状病毒肺炎’ (novel coronavirus pneumonia). These returned much smaller numbers of queries and posts, so we did not include them in the analysis.

It is also notable that the strength of correlation differed between keywords. In Google Trends, the keyword ‘coronavirus’ had the highest correlation coefficient with daily new laboratory-confirmed cases (r = 0.958). The keyword ‘pneumonia’ had the highest correlation coefficient with daily new suspected cases (r = 0.960).
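We found a similar pattern in the Baidu Index and Sina Weibo Index. The exception was suspected cases in the Sina Weibo Index, where ‘coronavirus’ (r = 0.924) slightly exceeded ‘pneumonia’ (r = 0.907).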
An explanation could be that ‘coronavirus’ refers to the viral pathogen, which is confirmed by laboratory testing. By contrast, ‘pneumonia’ is a clinical term and may therefore be more strongly linked to suspected cases, which are based on clinical and imaging evidence.

A limitation of our study is its retrospective nature. Internet search and social media data could be used in a real-time surveillance system. In that setting, identifying the best lag time would be challenging, because no training data would be available to calibrate the analysis for a new disease.

Conclusion

This study demonstrates the advantages of Internet surveillance using Sina Weibo Index, Google Trends and Baidu Index to monitor a new infectious disease. Reliable data can be obtained early and at low cost. The Internet surveillance data provided an accurate and timely prediction of the outbreak and progression of COVID-19.