PMC:7078825

Id Subject Object Predicate Lexical cue
T1 0-135 Sentence denotes Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020
T2 137-145 Sentence denotes Abstract
T3 146-324 Sentence denotes The peak of Internet searches and social media data about the coronavirus disease 2019 (COVID-19) outbreak occurred 10–14 days earlier than the peak of daily incidences in China.
T4 325-455 Sentence denotes Internet searches and social media data had high correlation with daily incidences, with the maximum r > 0.89 in all correlations.
T5 456-588 Sentence denotes The lag correlations also showed a maximum correlation at 8–12 days for laboratory-confirmed cases and 6–8 days for suspected cases.
T6 590-754 Sentence denotes The coronavirus disease 2019 (COVID-19) outbreak began in Wuhan, China, in late December 2019 and quickly spread to other cities in China in a matter of days [1,2].
T7 755-888 Sentence denotes It was announced as a public health emergency of international concern by the World Health Organization (WHO) on 30 January 2020 [3].
T8 889-1014 Sentence denotes Predicting the development of the outbreak as early and as reliably as possible is critical for action to prevent its spread.
T9 1015-1213 Sentence denotes Internet searches and social media data have been reported to correlate with traditional surveillance data and can even predict the outbreak of disease epidemics several days or weeks earlier [4-9].
T10 1214-1381 Sentence denotes In this study, we aimed to evaluate the prediction value of the Internet search data from web-based search engines and social media for the COVID-19 outbreak in China.
T11 1383-1466 Sentence denotes Trends in daily laboratory-confirmed and suspected COVID-19 cases and Internet data
T12 1467-1663 Sentence denotes The daily numbers of new laboratory-confirmed cases and suspected cases of COVID-19 were collected from the data published by the National Health Commission of China (NHC, http://www.nhc.gov.cn/).
T13 1664-2104 Sentence denotes A laboratory-confirmed case of COVID-19 was defined as a patient with a positive real-time RT-PCR result for SARS-CoV-2, while a suspected case was defined as a patient with a history of travelling to Wuhan City or of contact with COVID-19 cases in the 14 days before onset of symptoms and with clinical manifestation of fever, respiratory illness, pneumonia on computed tomography (CT) scan, and/or reduced white blood cell count, but no RT-PCR results.
T14 2105-2230 Sentence denotes The study period was set between 16 January and 11 February 2020, because the diagnostic criteria were set on 16 January 2020.
T15 2231-2399 Sentence denotes The results showed that the peak of daily new laboratory-confirmed cases was 3,887 on 4 February and the peak of daily new suspected cases was 5,328 on 5 February 2020.
T16 2400-2621 Sentence denotes Daily trend data related to specific search terms were acquired from Google Trends, Baidu Index, and Sina Weibo Index by setting the time parameter to ‘2 January to 12 February 2020’ and the location parameter to ‘China’.
T17 2622-2709 Sentence denotes We chose a period starting 2 weeks earlier than that of the molecular diagnosis data for COVID-19.
T18 2710-2782 Sentence denotes Two keywords, ‘coronavirus’ and ‘pneumonia’, were used in Google Trends.
T19 2783-2967 Sentence denotes The respective Chinese terms, ‘冠状病毒’ and ‘肺炎’, were used in Baidu Index, the most popular web search engine in China, and Sina Weibo Index, a social media platform widely used in China.
T20 2968-3094 Sentence denotes The peak number of search queries in Baidu was 682,888 for ‘coronavirus’ and 760,460 for ‘pneumonia’, both on 25 January 2020.
T21 3095-3223 Sentence denotes The peak number of posts on Sina Weibo was 26,297,746 for ‘coronavirus’ and 30,704,753 for ‘pneumonia’, both on 21 January 2020.
T22 3224-3333 Sentence denotes Google Trends does not provide the raw number of search queries but the number normalised to the peak number.
T23 3334-3411 Sentence denotes The peaks for both keywords on Google Trends were reached on 25 January 2020.
T24 3412-3665 Sentence denotes Figure 1 shows the overall trends of data from the keyword search for ‘coronavirus’ (or ‘冠状病毒’) and ‘pneumonia’ (or ‘肺炎’) via Google Trends, Baidu Index and Sina Weibo Index, and the number of daily new laboratory-confirmed and suspected COVID-19 cases.
T25 3666-3865 Sentence denotes The data from Baidu Index, Sina Weibo Index and national COVID-19 daily incidence data were also normalised to the peak number, so that the values fall into the same range (0–100) during that period.
T26 3866-4029 Sentence denotes Figure 1 Searches for keywords ‘coronavirus’ and ‘pneumonia’, obtained via different indices, and number of daily new COVID-19 cases, China, January–February 2020
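The peak normalisation described above can be illustrated with a minimal Python sketch. The function name `normalise_to_peak` and the example counts are hypothetical and chosen only for illustration; they are not the study's data, and the authors' actual tooling is not stated in the text.

```python
def normalise_to_peak(values):
    """Scale a series so that its maximum value becomes 100."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("series needs a positive peak to be normalised")
    return [100.0 * v / peak for v in values]


# Hypothetical daily counts, made up for illustration only.
example_counts = [120, 480, 960, 720, 240]
scaled = normalise_to_peak(example_counts)
print(scaled)
```

After this step, non-negative series of very different magnitudes (search queries, posts, case counts) share the 0–100 range and can be drawn on one axis.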
T27 4031-4119 Sentence denotes Lag correlation between daily laboratory-confirmed/suspected cases and Internet searches
T28 4120-4371 Sentence denotes Figure 2 and the Table show the lag Spearman correlations between the daily new laboratory-confirmed cases (upper panel) and suspected cases (lower panel) of COVID-19 and the Internet search data from Google Trends, Baidu Index and Sina Weibo Index.
T29 4372-4539 Sentence denotes We found a high correlation with the Internet search data (r > 0.7) 8–10 days earlier for new laboratory-confirmed cases, and 5–7 days earlier for new suspected cases.
T30 4540-4771 Sentence denotes Figure 2 Lag correlations between new laboratory-confirmed cases and suspected cases of COVID-19 and data from Google Trends, Baidu Index and Weibo Index for the keywords ‘coronavirus’ and ‘pneumonia’, China, January–February 2020
T31 4772-4939 Sentence denotes Table Lag correlation coefficients and p values between Internet search data and daily new laboratory-confirmed/suspected COVID-19 cases, China, January–February 2020
T32 4940-4998 Sentence denotes Days earlier Google Trends Baidu Index Sina Weibo Index
T33 4999-5087 Sentence denotes Coronavirus p Pneumonia p Coronavirus p Pneumonia p Coronavirus p Pneumonia p
T34 5088-5206 Sentence denotes New laboratory-confirmed cases 0 0.176 0.370 −0.035 0.861 0.021 0.917 0.129 0.513 0.106 0.593 0.109 0.582
T35 5207-5292 Sentence denotes 1 0.324 0.093 0.122 0.537 0.160 0.416 0.265 0.172 0.202 0.303 0.190 0.332
T36 5293-5378 Sentence denotes 2 0.455 0.015 0.271 0.164 0.299 0.122 0.411 0.030 0.346 0.072 0.298 0.123
T37 5379-5464 Sentence denotes 3 0.561 0.002 0.388 0.041 0.406 0.032 0.516 0.005 0.431 0.022 0.408 0.031
T38 5465-5554 Sentence denotes 4 0.672 < 0.001 0.505 0.006 0.529 0.004 0.641 < 0.001 0.498 0.007 0.470 0.012
T39 5555-5648 Sentence denotes 5 0.779  < 0.001 0.606 0.001 0.624 < 0.001 0.722  < 0.001 0.562 0.002 0.553 0.002
T40 5649-5750 Sentence denotes 6 0.850  < 0.001 0.712  < 0.001 0.706  < 0.001 0.808  < 0.001 0.679 < 0.001 0.668 < 0.001
T41 5751-5854 Sentence denotes 7 0.902  < 0.001 0.777  < 0.001 0.750  < 0.001 0.861  < 0.001 0.751  < 0.001 0.754  < 0.001
T42 5855-5961 Sentence denotes 8 0.944  < 0.001 0.835  < 0.001 0.823  < 0.001 0.902   < 0.001 0.829  < 0.001 0.817  < 0.001
T43 5962-6068 Sentence denotes 9 0.958   < 0.001 0.878  < 0.001 0.887  < 0.001 0.892  < 0.001 0.876  < 0.001 0.872  < 0.001
T44 6069-6179 Sentence denotes 10 0.953  < 0.001 0.893   < 0.001 0.928  < 0.001 0.873  < 0.001 0.921  < 0.001 0.899   < 0.001
T45 6180-6284 Sentence denotes 11 0.924  < 0.001 0.845  < 0.001 0.925  < 0.001 0.786  < 0.001 0.917  < 0.001 0.875  < 0.001
T46 6285-6395 Sentence denotes 12 0.857  < 0.001 0.818  < 0.001 0.933   < 0.001 0.715  < 0.001 0.944   < 0.001 0.875  < 0.001
T47 6396-6497 Sentence denotes 13 0.815  < 0.001 0.762  < 0.001 0.908  < 0.001 0.609 0.001 0.916  < 0.001 0.812  < 0.001
T48 6498-6598 Sentence denotes 14 0.783  < 0.001 0.697 < 0.001 0.858  < 0.001 0.496 0.007 0.885  < 0.001 0.733  < 0.001
T49 6599-6711 Sentence denotes New suspected cases 0 −0.003 0.989 −0.372 0.073 −0.309 0.142 −0.091 0.671 −0.279 0.187 −0.309 0.142
T50 6712-6801 Sentence denotes 1 0.246 0.246 −0.116 0.590 −0.068 0.753 0.141 0.511 −0.078 0.716 −0.103 0.630
T51 6802-6887 Sentence denotes 2 0.413 0.045 0.104 0.630 0.125 0.560 0.346 0.098 0.089 0.680 0.050 0.818
T52 6888-6973 Sentence denotes 3 0.614 0.001 0.312 0.138 0.352 0.091 0.551 0.005 0.253 0.233 0.248 0.243
T53 6974-7064 Sentence denotes 4 0.768  < 0.001 0.514 0.010 0.538 0.007 0.697 < 0.001 0.431 0.035 0.383 0.065
T54 7065-7160 Sentence denotes 5 0.832  < 0.001 0.687 < 0.001 0.670 < 0.001 0.816  < 0.001 0.520 0.009 0.501 0.013
T55 7161-7265 Sentence denotes 6 0.912   < 0.001 0.771  < 0.001 0.725  < 0.001 0.895  < 0.001 0.672 < 0.001 0.670 < 0.001
T56 7266-7369 Sentence denotes 7 0.933  < 0.001 0.850  < 0.001 0.830  < 0.001 0.914  < 0.001 0.813  < 0.001 0.872  < 0.001
T57 7370-7488 Sentence denotes 8 0.875  < 0.001 0.960   < 0.001 0.906   < 0.001 0.926   < 0.001 0.924   < 0.001 0.907   < 0.001
T58 7489-7592 Sentence denotes 9 0.787  < 0.001 0.865  < 0.001 0.882  < 0.001 0.850  < 0.001 0.883  < 0.001 0.899  < 0.001
T59 7593-7697 Sentence denotes 10 0.744  < 0.001 0.827  < 0.001 0.841  < 0.001 0.766  < 0.001 0.818  < 0.001 0.832  < 0.001
T60 7698-7800 Sentence denotes 11 0.671 < 0.001 0.770  < 0.001 0.790  < 0.001 0.698 < 0.001 0.781  < 0.001 0.802  < 0.001
T61 7801-7895 Sentence denotes 12 0.544 0.006 0.693 < 0.001 0.686 < 0.001 0.559 0.005 0.697 < 0.001 0.683 < 0.001
T62 7896-7982 Sentence denotes 13 0.482 0.017 0.578 0.003 0.583 0.003 0.454 0.026 0.622 0.001 0.600 0.002
T63 7983-8069 Sentence denotes 14 0.497 0.013 0.448 0.028 0.547 0.006 0.288 0.173 0.609 0.002 0.550 0.005
T64 8070-8113 Sentence denotes Shaded text: high correlation with r > 0.7.
T65 8114-8151 Sentence denotes Text in italics: highest correlation.
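As a small worked example of reading the Table, the sketch below takes one column (Google Trends, keyword ‘coronavirus’, new laboratory-confirmed cases) and finds the lag with the highest coefficient and the lags that pass the r > 0.7 threshold. The values are copied from the Table rows above; the variable names are illustrative assumptions.

```python
# Google Trends 'coronavirus' vs. new laboratory-confirmed cases,
# copied from the Table (list index = days earlier, 0 to 14).
r_by_lag = [0.176, 0.324, 0.455, 0.561, 0.672, 0.779, 0.850, 0.902,
            0.944, 0.958, 0.953, 0.924, 0.857, 0.815, 0.783]

# Lag (in days) at which the correlation coefficient is largest.
best_lag = max(range(len(r_by_lag)), key=lambda lag: r_by_lag[lag])

# All lags whose coefficient exceeds the "high correlation" threshold.
high_lags = [lag for lag, r in enumerate(r_by_lag) if r > 0.7]

print(best_lag, r_by_lag[best_lag])
print(high_lags)
```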
T66 8152-8389 Sentence denotes For new laboratory-confirmed cases, the highest correlation was found 9, 12 and 12 days earlier for searches for the keyword ‘coronavirus’ in Google Trends, Baidu Index and Sina Weibo Index with, respectively, r = 0.958, 0.933 and 0.944.
T67 8390-8577 Sentence denotes For the keyword ‘pneumonia’, the highest correlation was found 10, 8 and 10 days earlier in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.893, 0.902 and 0.899, respectively.
T68 8578-8692 Sentence denotes The lag correlation of new suspected cases was similar to the laboratory-confirmed cases, with a shorter lag time.
T69 8693-8892 Sentence denotes The highest correlation was found 6, 8 and 8 days earlier for searches for the keyword ‘coronavirus’ in Google Trends, Baidu Index and Sina Weibo Index, with r = 0.912, 0.906 and 0.924, respectively.
T70 8893-9073 Sentence denotes For the keyword ‘pneumonia’, the highest correlation was found 8 days earlier in all three sources (Google Trends, Baidu Index and Sina Weibo Index), with r = 0.960, 0.926 and 0.907, respectively.
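A lag Spearman correlation of the kind reported in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two daily series of equal length aligned on the same dates, it shortens the overlap as the lag grows, it assumes neither series is constant, and all function names and the synthetic data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lag_spearman(search, cases, lag):
    """Correlate cases on day t with search interest on day t - lag."""
    if lag == 0:
        return spearman(search, cases)
    return spearman(search[:-lag], cases[lag:])


# Synthetic example: the case curve is the search curve delayed by 3 days.
search = [2, 5, 9, 20, 35, 50, 42, 30, 18, 10, 6, 3]
cases = [0, 0, 0] + search[:-3]

best = max(range(6), key=lambda lag: lag_spearman(search, cases, lag))
print(best)
```

In the synthetic example the coefficient peaks at the lag that was built into the data, which is the same reading applied to the Table: the lag with the largest r is taken as the lead time of the Internet signal.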
T71 9075-9085 Sentence denotes Discussion
T72 9086-9385 Sentence denotes Our study demonstrated that the data obtained from Google Trends, Baidu Index and Sina Weibo Index on searches for the keywords ‘coronavirus’ and ‘pneumonia’ correlated with the published NHC data on daily incidence of laboratory-confirmed and suspected cases of COVID-19, with the maximum r > 0.89.
T73 9386-9571 Sentence denotes We also found that the peak interest for these keywords in Internet search engines and social media data was 10–14 days earlier than the incidence peak of COVID-19 published by the NHC.
T74 9572-9698 Sentence denotes The lag correlation showed a maximum correlation at 8–12 days for laboratory-confirmed cases and 6–8 days for suspected cases.
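The 10–14 day lead of the Internet peaks can be checked with simple date arithmetic on the peak dates reported earlier in the text. The sketch below compares each search or social media peak with the peak of new laboratory-confirmed cases (4 February 2020); the dictionary and variable names are illustrative assumptions.

```python
from datetime import date

# Peak dates as reported in the text.
search_peaks = {
    "Google Trends": date(2020, 1, 25),
    "Baidu Index": date(2020, 1, 25),
    "Sina Weibo Index": date(2020, 1, 21),
}
confirmed_peak = date(2020, 2, 4)  # peak of new laboratory-confirmed cases

# Days by which each Internet peak preceded the laboratory-confirmed peak.
lead_days = {
    name: (confirmed_peak - peak).days for name, peak in search_peaks.items()
}
print(lead_days)
```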
T75 9699-9851 Sentence denotes COVID-19 is a rapidly spreading infectious disease with, at the time of submission, more than 80,000 cases and a mortality so far known to be 3.4% [10].
T76 9852-9996 Sentence denotes It is important to predict the development of this outbreak as early and as reliably as possible, in order to take action to prevent its spread.
T77 9997-10229 Sentence denotes Our data showed that the two popularly used Internet search engines, Google and Baidu, and the social media platform, Sina Weibo, were able to predict the disease outbreak 1–2 weeks earlier than the traditional surveillance systems.
T78 10230-10459 Sentence denotes The role of Internet surveillance tools in early prediction of other epidemics has been reported previously, including for influenza [4], dengue fever [5], H1N1 [6], Zika [7], measles [8] and Middle East respiratory syndrome [9].
T79 10460-10652 Sentence denotes The availability of early information about infectious diseases through Internet search engines and social media will be helpful for making decisions related to disease control and prevention.
T80 10653-10794 Sentence denotes Internet search data have been shown to enable the monitoring of Middle East respiratory syndrome 3 days before laboratory confirmations [9].
T81 10795-10953 Sentence denotes However, our results showed a much longer lag time for reported new laboratory-confirmed and suspected COVID-19 cases compared with digital surveillance data.
T82 10954-10985 Sentence denotes There are several explanations.
T83 10986-11048 Sentence denotes Firstly, COVID-19 is a novel disease just recently recognised.
T84 11049-11156 Sentence denotes The first version of a guideline for diagnosis and management of COVID-19 was announced on 16 January 2020.
T85 11157-11276 Sentence denotes It took time for medical professionals to learn about the virus and the disease in order to make a correct diagnosis.
T86 11277-11412 Sentence denotes Secondly, the diagnosis of COVID-19 requires two independent confirmatory laboratory tests, which should be taken at least 1 day apart.
T87 11413-11522 Sentence denotes Our results showed that the lag correlation is shorter for suspected than for laboratory-confirmed cases.
T88 11523-11716 Sentence denotes Thirdly, the supply of laboratory testing kits may have been insufficient in the early stages of the coronavirus outbreak, which would have limited the number of patients that could be confirmed.
T89 11717-11934 Sentence denotes Finally, the Internet searches and social media mentions are not only initiated by the patients and their family members, but also globally by the general public who are concerned about this rapidly spreading disease.
T90 11935-12135 Sentence denotes In addition, we found that the data from the Baidu Index and Sina Weibo Index could monitor the number of daily new confirmed and suspected cases from the NHC earlier than the data from Google Trends.
T91 12136-12265 Sentence denotes A possible explanation is that Google is not a major search engine in China, where Baidu and Sina Weibo are widely used.
T92 12266-12357 Sentence denotes The peak in the Sina Weibo Index was reached earlier than in Google Trends and Baidu Index.
T93 12358-12489 Sentence denotes This suggests that Sina Weibo, which also serves as a social medium, disseminated the information faster than traditional websites.
T94 12490-12610 Sentence denotes COVID-19 was first reported as ‘pneumonia of unknown aetiology’ or ‘pneumonia of unknown cause’ in late December 2019.
T95 12611-12694 Sentence denotes On 8 January 2020, a novel coronavirus was identified as the cause of this disease.
T96 12695-12873 Sentence denotes The disease was first named ‘novel coronavirus pneumonia’ by the NHC of China on 8 February and later ‘coronavirus disease 2019’ (abbreviated ‘COVID-19’) by the WHO on 11 February.
T97 12874-12935 Sentence denotes Our search period was defined as 16 January to 11 February 2020.
T98 12936-13092 Sentence denotes Therefore, we think that the two keywords ‘pneumonia’ and ‘coronavirus’ were sufficient to include most Internet content related to COVID-19 in this period.
T99 13093-13319 Sentence denotes We also tried other terms, such as ‘新冠’ (novel coronavirus) and ‘新型冠状病毒肺炎’ (novel coronavirus pneumonia), as keywords, but they returned much smaller numbers of queries and posts and we therefore did not include them in the analysis.
T100 13320-13409 Sentence denotes It is also notable that the strength of correlation was different for different keywords.
T101 13410-13641 Sentence denotes On Google, the keyword ‘coronavirus’ had the highest correlation coefficient (r = 0.958) with daily new laboratory-confirmed cases, and ‘pneumonia’ had the highest correlation coefficient with daily new suspected cases (r = 0.960).
T102 13642-13708 Sentence denotes We found the same pattern in the Baidu Index and Sina Weibo Index.
T103 13709-13968 Sentence denotes An explanation could be that ‘coronavirus’ is linked to the viral pathogen, which has to be investigated by a laboratory test, while ‘pneumonia’ is a clinical term and should be linked more strongly to suspected cases, which are based on clinical and imaging evidence.
T104 13969-14023 Sentence denotes A limitation of our study is its retrospective nature.
T105 14024-14258 Sentence denotes If the Internet search engines and social media data were used in a real-time surveillance system, finding the best lag time would be a challenge because we would not have any training data to calibrate the analysis for a new disease.
T106 14260-14270 Sentence denotes Conclusion
T107 14271-14420 Sentence denotes This study reveals the advantages of Internet surveillance using Sina Weibo Index, Google Trends and Baidu Index to monitor a new infectious disease.
T108 14421-14469 Sentence denotes Reliable data can be obtained early at low cost.
T109 14470-14591 Sentence denotes The Internet surveillance data provided an accurate and timely prediction about the outbreak and progression of COVID-19.
T110 14593-14609 Sentence denotes Acknowledgements
T111 14610-14628 Sentence denotes Funding statement:
T112 14629-14794 Sentence denotes This study was supported by the Grant for Key Disciplinary Project of Clinical Medicine under the Guangdong High-level University Development Program (002-18119101).
T113 14795-14920 Sentence denotes The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
T114 14922-14943 Sentence denotes Conflict of interest:
T115 14944-14958 Sentence denotes None declared.
T116 14959-15059 Sentence denotes Authors’ contributions: CL and LJC collected the data, analysed the data and drafted the manuscript.
T117 15060-15103 Sentence denotes XC, LJC, CPP and HC revised the manuscript.
T118 15104-15155 Sentence denotes MZ and HC conceived the idea and designed the study.