PMC:7078825 / 9075-14258

Id Subject Object Predicate Lexical cue
T71 0-10 Sentence denotes Discussion
T72 11-310 Sentence denotes Our study demonstrated that the data obtained from Google Trends, Baidu Index and Sina Weibo Index on searches for the keywords ‘coronavirus’ and ‘pneumonia’ correlated with the published NHC data on daily incidence of laboratory-confirmed and suspected cases of COVID-19, with the maximum r > 0.89.
T73 311-496 Sentence denotes We also found that the peak interest for these keywords in Internet search engines and social media data was 10–14 days earlier than the incidence peak of COVID-19 published by the NHC.
T74 497-623 Sentence denotes The lag correlation showed a maximum correlation at 8–12 days for laboratory-confirmed cases and 6–8 days for suspected cases.
T75 624-776 Sentence denotes COVID-19 is a rapidly spreading infectious disease with, at the time of submission, more than 80,000 cases and a mortality so far known to be 3.4% [10].
T76 777-921 Sentence denotes It is important to predict the development of this outbreak as early and as reliably as possible, in order to take action to prevent its spread.
T77 922-1154 Sentence denotes Our data showed that the two popularly used Internet search engines, Google and Baidu, and the social media platform, Sina Weibo, were able to predict the disease outbreak 1–2 weeks earlier than the traditional surveillance systems.
T78 1155-1384 Sentence denotes The role of Internet surveillance tools in early prediction of other epidemics has been reported previously, including for influenza [4], dengue fever [5], H1N1 [6], Zika [7], measles [8] and Middle East respiratory syndrome [9].
T79 1385-1577 Sentence denotes The availability of early information about infectious diseases through Internet search engines and social media will be helpful for making decisions related to disease control and prevention.
T80 1578-1719 Sentence denotes Internet search data have been shown to enable the monitoring of Middle East respiratory syndrome 3 days before laboratory confirmations [9].
T81 1720-1878 Sentence denotes However, our results showed a much longer lag time for reported new laboratory-confirmed and suspected COVID-19 cases compared with digital surveillance data.
T82 1879-1910 Sentence denotes There are several explanations.
T83 1911-1973 Sentence denotes Firstly, COVID-19 is a novel disease just recently recognised.
T84 1974-2081 Sentence denotes The first version of a guideline for diagnosis and management of COVID-19 was announced on 16 January 2020.
T85 2082-2201 Sentence denotes It took time for the medical professionals to learn about the virus and the disease in order to make a correct diagnosis.
T86 2202-2337 Sentence denotes Secondly, the diagnosis of COVID-19 requires two independent confirmatory laboratory tests, which should be taken at least 1 day apart.
T87 2338-2447 Sentence denotes Our results showed that the lag correlation is shorter for the suspected than for laboratory-confirmed cases.
T88 2448-2641 Sentence denotes Thirdly, the supply of laboratory testing kits may have been insufficient in the early stages of the coronavirus outbreak, which would have limited the number of patients that can be confirmed.
T89 2642-2859 Sentence denotes Finally, the Internet searches and social media mentions are not only initiated by the patients and their family members, but also globally by the general public who are concerned about this rapidly spreading disease.
T90 2860-3060 Sentence denotes In addition, we found that the data from the Baidu Index and Sina Weibo Index could monitor the number of daily new confirmed and suspected cases from the NHC earlier than the data from Google Trends.
T91 3061-3190 Sentence denotes A possible explanation is that Google is not a major search engine used in China, where Baidu and Sina Weibo are widely used.
T92 3191-3282 Sentence denotes The peak in the Sina Weibo Index was reached earlier than in Google Trends and Baidu Index.
T93 3283-3414 Sentence denotes This suggests that Sina Weibo, which also serves as a social medium, disseminated the information faster than traditional websites.
T94 3415-3535 Sentence denotes COVID-19 was first reported as ‘pneumonia of unknown aetiology’ or ‘pneumonia of unknown cause’ in late December 2019.
T95 3536-3619 Sentence denotes On 8 January 2020, a novel coronavirus was identified as the cause of this disease.
T96 3620-3798 Sentence denotes The disease was first named Novel coronavirus pneumonia by the NHC of China on 8 February and later ‘coronavirus disease 2019’ (abbreviated ‘COVID-19’) on 11 February by the WHO.
T97 3799-3860 Sentence denotes Our search period was defined from January 16 to February 11.
T98 3861-4017 Sentence denotes Therefore, we think that the two keywords ‘pneumonia’ and ‘coronavirus’ were sufficient to include most Internet content related to COVID-19 in this period.
T99 4018-4244 Sentence denotes We also used other terms such as ‘新冠’ (novel coronavirus), ‘新型冠状病毒肺炎’ (novel coronavirus pneumonia) as keywords but they returned much smaller numbers of queries and posts and we therefore did not include them in the analysis.
T100 4245-4334 Sentence denotes It is also notable that the strength of correlation was different for different keywords.
T101 4335-4566 Sentence denotes On Google, the keyword ‘coronavirus’ had the highest correlation coefficient (r = 0.958) with daily new laboratory-confirmed cases, and ‘pneumonia’ had the highest correlation coefficient with daily new suspected cases (r = 0.960).
T102 4567-4633 Sentence denotes We found the same pattern in the Baidu Index and Sina Weibo Index.
T103 4634-4893 Sentence denotes An explanation could be that ‘coronavirus’ is linked to the viral pathogen which should be investigated by a laboratory test, while ‘pneumonia’ is a clinical term and should link more strongly to the suspected cases that are based on clinical and imaging evidence.
T104 4894-4948 Sentence denotes A limitation of our study is its retrospective nature.
T105 4949-5183 Sentence denotes If the Internet search engines and social media data were used in a real-time surveillance system, finding the best lag time would be a challenge because we would not have any training data to calibrate the analysis for a new disease.
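
The lag correlation discussed in sentences T74 and T105 can be illustrated with a small sketch. It is not the authors' analysis: it assumes a Pearson coefficient, the function names (`pearson`, `lag_correlations`, `best_lag`) are hypothetical, and the two series are made-up numbers rather than Google Trends, Baidu Index, Sina Weibo Index or NHC data. The idea is to pair the search index on day t with the case count on day t + lag, compute r for each candidate lag, and pick the lag with the highest r.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lag_correlations(search_index, daily_cases, max_lag):
    """Return {lag: r}, pairing search_index[t] with daily_cases[t + lag]."""
    if len(search_index) != len(daily_cases):
        raise ValueError("series must cover the same days")
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` days of searches and the first `lag` days of cases
        # so that each search value lines up with the cases `lag` days later.
        x = search_index[: len(search_index) - lag]
        y = daily_cases[lag:]
        result[lag] = pearson(x, y)
    return result


def best_lag(correlations):
    """Lag (in days) with the highest correlation coefficient."""
    return max(correlations, key=correlations.get)


# Synthetic illustration only: the case series is the search series delayed by 3 days.
search = [1, 2, 4, 8, 16, 9, 5, 3, 2, 1, 1, 1, 1, 1]
cases = [0, 0, 0] + search[:-3]

correlations = lag_correlations(search, cases, max_lag=6)
print("best lag in days:", best_lag(correlations))
```

Because the synthetic case series is an exact 3-day delay of the search series, the sketch selects a lag of 3 days. As T105 notes, a real-time system would not have such retrospective case data available to calibrate the lag for a new disease.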