CORD-19:e977ca94d7eb88b4b650c25c3543e9d48afd535b


Id Subject Object Predicate Lexical cue
TextSentencer_T1 0-91 Sentence denotes Predicting social response to infectious disease outbreaks from internet-based news streams
TextSentencer_T2 93-101 Sentence denotes Abstract
TextSentencer_T3 102-261 Sentence denotes Infectious disease outbreaks often have consequences beyond human health, including concern among the population, economic instability, and sometimes violence.
TextSentencer_T4 262-420 Sentence denotes A warning system capable of anticipating social disruptions resulting from disease outbreaks is urgently needed to help decision makers prepare appropriately.
TextSentencer_T5 421-514 Sentence denotes We designed a system that operates in near real-time to identify and predict social response.
TextSentencer_T6 515-652 Sentence denotes Over 150,000 Internet-based news articles related to outbreaks of 16 diseases in 72 countries and territories were provided by HealthMap.
TextSentencer_T7 653-758 Sentence denotes These articles were automatically tagged with indicators of the disease activity and population reaction.
TextSentencer_T8 759-900 Sentence denotes An anomaly detection algorithm was implemented on the population reaction indicators to identify periods of unusually severe social response.
TextSentencer_T9 901-1052 Sentence denotes Then a model was developed to predict the probability of these periods of unusually severe social response occurring in the coming 1, 2, and 3 weeks.
TextSentencer_T10 1053-1149 Sentence denotes This model exhibited remarkably strong performance for diseases with substantial media coverage.
TextSentencer_T11 1150-1357 Sentence denotes For country-disease pairs with a median of 20 or more articles per year, the onset of social response in the next week was correctly predicted over 60% of the time, and 87% of weeks were correctly predicted.
TextSentencer_T12 1358-1585 Sentence denotes Performance was weaker for diseases with little media coverage, and, for these diseases, the main utility of our system is in identifying social response when it occurs, rather than predicting when it will happen in the future.
TextSentencer_T13 1586-1773 Sentence denotes Overall, the developed near real-time prediction approach is a promising step toward developing predictive models to inform responders of the likely social consequences of disease spread.
TextSentencer_T14 1774-1874 Sentence denotes This research was funded by the Defense Threat Reduction Agency (www.dtra.mil), contract HDTRA1-12-C-0061.
TextSentencer_T15 1876-2057 Sentence denotes Despite progress in the fight against infectious diseases, they remain a persistent threat to global health, claiming approximately 9.5 million lives annually (Lozano et al. 2012).
TextSentencer_T17 2058-2133 Sentence denotes Moreover, the consequences of disease outbreaks extend beyond human health.
TextSentencer_T18 2134-2316 Sentence denotes Societal strain, ranging from anxiety and economic effects (Cheng 2004) to riots, violence, or flight (Kinsman 2012), frequently accompanies the outbreak of severe infectious disease.
TextSentencer_T19 2317-2569 Sentence denotes These social responses may ultimately impact national security and can limit responders' ability to combat the disease, as recently observed with the Ebola epidemic in West Africa (International Federation of Red Cross and Red Crescent Societies 2015).
TextSentencer_T20 2570-2758 Sentence denotes A warning system capable of anticipating the social consequences of epidemics will benefit decision makers and relief workers, helping them to allocate resources and respond appropriately.
TextSentencer_T21 2759-2903 Sentence denotes In this work, we present such a warning system and demonstrate its utility for predicting social response to disease outbreaks around the world.
TextSentencer_T22 2904-3066 Sentence denotes As social media and Internet news data are becoming increasingly prevalent, forecasting of social phenomena using these data has become an area of great interest.
TextSentencer_T23 3067-3354 Sentence denotes Social media and news data streams have been used to predict targets ranging from election results (Gayo-Avello 2013) and financial markets (Bollen et al. 2011; Schumaker and Chen 2009) to urban crime (Gerber 2014) and civil unrest (Montgomery et al. 2012; D'Orazio and Yonamine 2015).
TextSentencer_T26 3355-3871 Sentence denotes Prominent systems, such as the Integrated Crisis Early Warning System (ICEWS) (O'Brien 2010), the Global Database of Events, Location, and Tone (GDELT) (Racette et al. 2014), Early Model Based Event Recognition using Surrogates (EMBERS) (Doyle et al. 2014), and Recorded Future (Truvé 2013), harvest data streams from international, regional, and local news sources, as well as social media and Internet forums, in order to forecast major political instability events, society-level behavior, and cyber threats.
TextSentencer_T29 3872-4187 Sentence denotes In the field of public health, several systems, including the Global Public Health Intelligence Network (GPHIN) (Mykhalovskiy and Weir 2006), HealthMap (Brownstein et al. 2008), ProMED-mail (Woodall 2001), and Biocaster (Collier et al. 2008), have been developed to facilitate outbreak detection and monitoring.
TextSentencer_T32 4188-4250 Sentence denotes These systems monitor data streams for disease-specific events.
TextSentencer_T33 4251-4483 Sentence denotes Social reactions are frequently discussed in news streams covering disease outbreaks, and predicting the occurrence of social response that might disrupt response efforts is a natural next step for global disease monitoring systems.
TextSentencer_T34 4484-4620 Sentence denotes Social response to disease outbreaks is a relatively new area of interest for research, with studies primarily focusing on local events.
TextSentencer_T35 4621-4893 Sentence denotes Research has been conducted on the type, timing, and cause of social response for specific disease outbreaks (Sherlaw and Raude 2013; Lau et al. 2010), including the 2003 SARS outbreak in Hong Kong (Cheng 2004) and the 2000-2001 Ebola outbreak in Uganda (Kinsman 2012).
TextSentencer_T37 4894-5301 Sentence denotes Analysis of infectious disease outbreaks with and without social response has revealed that severe social response occurs most frequently when pathogens are clinically severe or are novel to local experts (Fast et al. 2015; McGrath 1991), and that countries with low per-capita health expenditure and high levels of armed conflict and child mortality may be particularly susceptible (Vaisman et al. 2014).
TextSentencer_T40 5302-5426 Sentence denotes In the current work, we extend these efforts, laying the groundwork for a near real-time warning system for social response.
TextSentencer_T41 5427-5513 Sentence denotes The method provides forecasts of the social response for the coming 1, 2, and 3 weeks.
TextSentencer_T42 5514-5656 Sentence denotes The model's primary data source was a collection of Internet-based news articles from the HealthMap historical database and daily data stream.
TextSentencer_T43 5657-5920 Sentence denotes HealthMap (Brownstein et al. 2008), in operation since 2006, aggregates epidemic intelligence from multiple data sources, including news, social media, crowdsourced intelligence, and formal reports, to identify health events, often prior to formal investigations.
TextSentencer_T45 5921-6205 Sentence denotes It has been shown that information derived from Internet-based news sources provides early and accurate information for disease detection and analysis of spread (Wilson and Brownstein 2009), but the utility of such information for predicting social response has yet to be determined.
TextSentencer_T46 6206-6426 Sentence denotes In this work, we show that Internet-based news sources can be used as the basis for a near real-time warning system for social response, especially for country-disease pairs with extensive Internet-based news media coverage.
TextSentencer_T47 6427-6607 Sentence denotes We used over 150,000 Internet-based news articles provided by HealthMap, covering 16 diseases in 72 countries and territories around the globe, to validate our models' performance.
TextSentencer_T48 6608-6710 Sentence denotes Our primary objective was to forecast social response to the spread of infectious disease.
TextSentencer_T49 6711-6879 Sentence denotes Our method consisted of three primary steps: (1) data acquisition and indicator extraction, (2) social response target development, and (3) social response forecasting.
TextSentencer_T50 6880-7137 Sentence denotes The data acquisition and indicator extraction step consisted of the collection of Internet-based news articles describing disease outbreaks around the world and automated tagging of these articles with indicators of the disease activity and social response.
TextSentencer_T51 7138-7177 Sentence denotes This process is described in Sect. 2.1.
TextSentencer_T53 7178-7314 Sentence denotes In Sect. 2.2, we explain how the social response indicator counts were translated into a target for prediction of future social response.
TextSentencer_T55 7315-7462 Sentence denotes This target was created by identifying periods of unusually severe social response based on the weekly aggregated social response indicator counts.
TextSentencer_T56 7463-7631 Sentence denotes The volume of Internet-based news reporting varies dramatically between countries and diseases, so the raw counts of the indicators were unsuitable for use as a target.
TextSentencer_T57 7632-7709 Sentence denotes Instead, we needed to create a target by comparing against baseline behavior.
TextSentencer_T58 7710-7849 Sentence denotes For example, in China 42% of weeks had at least one indicator of social response to avian influenza; 27% of weeks had over five indicators.
TextSentencer_T59 7850-7970 Sentence denotes Therefore, in China, a few mentions of social response to avian influenza per week may be considered normal behavior.
TextSentencer_T60 7971-8119 Sentence denotes In Zimbabwe, only 3% of weeks had one or more indicators of social response to cholera, making just one mention of social response an unusual event.
TextSentencer_T61 8120-8238 Sentence denotes We used a Bayesian network to compare each week's social response profile with a baseline for the country and disease.
TextSentencer_T62 8239-8396 Sentence denotes Then, we used a statistical process control algorithm to identify periods of time that were sufficiently unusual to be considered periods of social response.
TextSentencer_T63 8397-8530 Sentence denotes This approach was derived from approaches developed for rapid disease outbreak detection (Buckeridge et al. 2005; Wong et al. 2003).
TextSentencer_T66 8531-8660 Sentence denotes Outbreak detection algorithms take syndromic surveillance data as input and output whether a disease outbreak is taking place.
TextSentencer_T67 8661-8802 Sentence denotes Our algorithm takes the social response indicator time series as an input and outputs whether an outbreak of social response is taking place.
TextSentencer_T68 8803-8897 Sentence denotes Finally, in Sect. 2.3, we describe the method developed for forecasting future social response.
TextSentencer_T70 8898-8941 Sentence denotes The entire approach is outlined in Fig. 1.
TextSentencer_T71 8942-9095 Sentence denotes HealthMap collects a continuous stream of near real-time information on disease outbreaks, including Internet-based news articles and government reports.
TextSentencer_T72 9096-9191 Sentence denotes Over 150,000 such free-text documents, collected between 2006 and 2015, were used for modeling.
TextSentencer_T73 9192-9484 Sentence denotes These documents described breaking news events for 16 diseases¹ and 72 countries and territories.²
TextSentencer_T74 9485-9745 Sentence denotes Fig. 1 Overview of methods. (a) First, news articles were automatically collected and tagged with indicators of disease activity and social response. (b) Next, an anomaly detection approach was used to identify periods of time with unusually severe social response profiles. These periods were used as targets for social response forecasting. (c) Finally, the occurrence of unusually severe social response was forecast for the coming 1, 2, and 3 weeks.
TextSentencer_T75 9746-9834 Sentence denotes The documents were automatically cleaned and, when necessary, translated into English.³
TextSentencer_T76 9835-10330 Sentence denotes We developed a natural language processing approach to automatically tag the documents with indicators describing the spread of the disease (4 indicators), the perceived severity of the disease (3 indicators), the preventative measures taken (7 indicators), and the social response (6 indicators; Affective Social Response: Population Fear, Officials Fear; Economic Social Response: Economy Affected, Tourism Affected; Behavioral Social Response: Violence, Healthcare Worker Protest).
TextSentencer_T80 10331-10478 Sentence denotes These indicators were created by searching within each sentence of the text for combinations of words or phrases describing the events of interest.
TextSentencer_T81 10479-10647 Sentence denotes Eventually, these indicators could be expanded to include not only current events of interest, but also events expected to occur in the future according to the news sources.
TextSentencer_T82 10648-10722 Sentence denotes The indicator counts were aggregated by week for each country and disease.
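The tagging rule described above (combinations of cue words co-occurring within a sentence) and the weekly aggregation can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue lists in `INDICATOR_RULES`, the "one cue from each list in the same sentence" rule, and the helper names `tag_article` and `weekly_counts` are hypothetical. The sketch also assumes that a weekly count is the number of articles carrying an indicator and that ISO calendar weeks are the weekly unit.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical cue lists (the paper's actual word/phrase combinations are not given).
# An indicator fires when a single sentence contains a cue from BOTH lists.
INDICATOR_RULES = {
    "Population Fear": ({"residents", "people", "public", "population"},
                        {"panic", "fear", "afraid", "scared"}),
    "Violence": ({"mob", "crowd", "protesters", "residents"},
                 {"attacked", "riot", "rioted", "clashed"}),
}

def tag_article(text):
    """Return the set of indicators found in at least one sentence of the article."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for indicator, (subjects, events) in INDICATOR_RULES.items():
            if words & subjects and words & events:
                found.add(indicator)
    return found

def weekly_counts(articles):
    """Count tagged articles per (country, disease, ISO year, ISO week)."""
    table = defaultdict(Counter)
    for country, disease, published, text in articles:
        iso_year, iso_week, _ = published.isocalendar()
        table[(country, disease, iso_year, iso_week)].update(tag_article(text))
    return table

articles = [
    ("Zimbabwe", "cholera", date(2008, 12, 1), "Residents fled in panic. A mob attacked the clinic."),
    ("Zimbabwe", "cholera", date(2008, 12, 3), "People are afraid."),
]
for key, counts in weekly_counts(articles).items():
    print(key, dict(counts))
```

Both toy articles fall in the same ISO week, so they are aggregated into a single country-disease-week entry.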
TextSentencer_T83 10723-10828 Sentence denotes Fig. 2 Bayesian network describing relationships between country, disease, and social response indicator counts.
TextSentencer_T84 10829-10914 Sentence denotes All social response indicator counts were dependent upon the country and the disease.
TextSentencer_T85 10915-11108 Sentence denotes We allowed relationships between social response indicator counts (e.g., the count for Violence depends upon the count for Population Fear) to be learned, but did not require such relationships.
TextSentencer_T86 11109-11244 Sentence denotes The pictured network is the Bayesian network in the case where no relationships were learned among the social response indicator counts.
TextSentencer_T87 11245-11440 Sentence denotes We used a Bayesian network to calculate the joint probability of a social response profile (the vector of social response indicator counts), given prior profiles for the same country and disease.
TextSentencer_T88 11441-11629 Sentence denotes Since Bayesian networks allow for aggregation of many types of signals, they are a popular method for anomaly detection (Buckeridge et al. 2005; Mascaro et al. 2014; Rashidi et al. 2011).
TextSentencer_T92 11630-11746 Sentence denotes In the developed Bayesian network, all social response indicator counts were dependent upon the country and disease.
TextSentencer_T93 11747-11927 Sentence denotes Dependencies in the network between social responses (e.g. the count for Violence depends upon the count for Population Fear) could be learned, but were not required to be present.
TextSentencer_T94 11928-12059 Sentence denotes The structure of the network was learned using a hill-climbing greedy search, with the Bayesian Information Criterion as the score.
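For readers unfamiliar with score-based structure learning, the sketch below shows a generic greedy hill-climbing search over directed acyclic graphs with a decomposable BIC score for discrete variables. Required edges (for example country → indicator, as in the network above) are never deleted or reversed, while indicator-to-indicator edges may be learned. The source does not name the software used; the function names and toy data are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter
from itertools import permutations

def bic_node(data, child, parents):
    """BIC contribution of one discrete node: log-likelihood minus (log n / 2) * free parameters."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(len({row[p] for row in data}) for p in parents)  # parent configurations
    r = len({row[child] for row in data})                           # child states
    return loglik - 0.5 * math.log(n) * q * (r - 1)

def total_score(data, nodes, edges):
    return sum(bic_node(data, v, [p for p, c in edges if c == v]) for v in nodes)

def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    indegree = {v: 0 for v in nodes}
    for _, c in edges:
        indegree[c] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for p, c in edges:
            if p == v:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
    return removed == len(nodes)

def hill_climb(data, nodes, required=frozenset()):
    """Steepest-ascent search over single-edge additions, deletions, and reversals."""
    edges = set(required)
    best = total_score(data, nodes, edges)
    while True:
        best_move = None
        for a, b in permutations(nodes, 2):
            if (a, b) in edges:
                if (a, b) in required:
                    continue  # required edges are never deleted or reversed
                moves = [edges - {(a, b)}, (edges - {(a, b)}) | {(b, a)}]
            elif (b, a) in edges:
                continue  # handled as a reversal when visiting (b, a)
            else:
                moves = [edges | {(a, b)}]
            for cand in moves:
                if is_acyclic(nodes, cand):
                    score = total_score(data, nodes, cand)
                    if score > best + 1e-9:
                        best, best_move = score, cand
        if best_move is None:
            return edges, best
        edges = best_move

# Toy data: violence mirrors fear, and both are independent of country.
nodes = ["country", "fear", "violence"]
rows = [{"country": c, "fear": f, "violence": f}
        for c in "AB" for f in (0, 1) for _ in range(50)]
required = frozenset({("country", "fear"), ("country", "violence")})
edges, score = hill_climb(rows, nodes, required)
print(sorted(edges))
```

On this toy data, the search keeps both required edges and adds a single edge between fear and violence, because that dependency raises the log-likelihood far more than the BIC penalty for the extra parameters.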
TextSentencer_T95 12060-12185 Sentence denotes In order to train the network, we required that at least 2 years of news articles be collected for each country-disease pair.
TextSentencer_T96 12186-12256 Sentence denotes Figure 2 depicts the Bayesian network, with all required dependencies.
TextSentencer_T97 12257-12376 Sentence denotes Let $c_{ijkt}$ be the observed indicator count for social response indicator $k$ in country $i$ for disease $j$ during week $t$.⁴
TextSentencer_T98 12377-12438 Sentence denotes Let $x_{ijkt}$ be a discretized version of the indicator counts:
TextSentencer_T99 12439-12449 Sentence denotes otherwise.
TextSentencer_T100 12450-12453 Sentence denotes (1)
TextSentencer_T101 12454-12573 Sentence denotes The splits used to discretize the social response indicator counts were selected empirically based on analysis of data.
TextSentencer_T102 12574-12894 Sentence denotes For 99.3% of weeks, no articles indicating Population Fear were collected; 0.6% of weeks had 1 article, 0.1% had between 2 and 5 articles, 0.02% had between … Let $X_{kt}$ be a random variable following the baseline distribution of social response indicator $k$, learned by the Bayesian network trained on weeks 1 through $t - 1$.
TextSentencer_T103 12895-13095 Sentence denotes Then, for each week $t$, country $i$, and disease $j$, we used likelihood weighting to calculate the probability of observing a social response profile as or more severe than the one observed during week $t$:
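Likelihood weighting can be illustrated on a toy network. In the sketch below, the network structure, its conditional probability tables, the discretized levels, and the reading of "as or more severe" as "every indicator at or above its observed level" are all assumptions made for illustration; the paper's actual network and query are not reproduced here. Evidence nodes (country and disease) are clamped and multiply each sample's weight by their likelihood, and every other node is sampled in topological order.

```python
import random

# Hypothetical network, listed in topological order: node -> (parents, CPT).
# Each CPT maps a tuple of parent values to a distribution over the node's levels.
NETWORK = {
    "country": ((), {(): {"IN": 0.5, "ZW": 0.5}}),
    "disease": ((), {(): {"dengue": 1.0}}),
    "fear": (("country", "disease"), {
        ("IN", "dengue"): {0: 0.6, 1: 0.3, 2: 0.1},
        ("ZW", "dengue"): {0: 0.9, 1: 0.08, 2: 0.02},
    }),
    "violence": (("country", "fear"), {
        ("IN", 0): {0: 0.95, 1: 0.05}, ("IN", 1): {0: 0.8, 1: 0.2}, ("IN", 2): {0: 0.5, 1: 0.5},
        ("ZW", 0): {0: 0.99, 1: 0.01}, ("ZW", 1): {0: 0.85, 1: 0.15}, ("ZW", 2): {0: 0.6, 1: 0.4},
    }),
}

def sample_from(dist, rng):
    """Draw one value from a {value: probability} distribution."""
    r, cumulative = rng.random(), 0.0
    for value, p in dist.items():
        cumulative += p
        if r < cumulative:
            return value
    return value  # guard against floating-point round-off in the cumulative sum

def tail_probability(network, evidence, observed, n_samples=20000, seed=0):
    """Likelihood-weighting estimate of P(every indicator >= its observed level | evidence)."""
    rng = random.Random(seed)
    total_w = hit_w = 0.0
    for _ in range(n_samples):
        values, weight = {}, 1.0
        for node, (parents, cpt) in network.items():
            dist = cpt[tuple(values[p] for p in parents)]
            if node in evidence:                 # clamp evidence, weight by its likelihood
                values[node] = evidence[node]
                weight *= dist.get(evidence[node], 0.0)
            else:                                # sample non-evidence nodes
                values[node] = sample_from(dist, rng)
        total_w += weight
        if all(values[k] >= level for k, level in observed.items()):
            hit_w += weight
    return hit_w / total_w  # raises ZeroDivisionError if the evidence is impossible

p = tail_probability(NETWORK, {"country": "IN", "disease": "dengue"}, {"fear": 1, "violence": 1})
print(round(p, 3))  # close to the exact value 0.3*0.2 + 0.1*0.5 = 0.11
```

Because the evidence nodes here are roots, every sample receives the same weight; the weights only start to matter when evidence nodes have parents.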
TextSentencer_T104 13096-13173 Sentence denotes The probabilities were translated into anomaly scores (Mascaro et al. 2014):
TextSentencer_T106 13174-13287 Sentence denotes High anomaly scores indicate weeks with abnormally severe social response profiles, compared with previous weeks.
TextSentencer_T107 13288-13368 Sentence denotes For example, a week with a probability of 5% would have an anomaly score of 2.8.
TextSentencer_T108 13369-13437 Sentence denotes A week with a probability of 80% would have an anomaly score of 0.2.
TextSentencer_T109 13438-13565 Sentence denotes The next step was to identify multi-week periods of unusually severe social response, using the weekly anomaly scores, $A_{ijt}$.
TextSentencer_T110 13566-13676 Sentence denotes For this task, we used the exponentially weighted moving average (EWMA) (Roberts 1959) of the anomaly scores.
TextSentencer_T111 13677-13789 Sentence denotes Alternative approaches to finding statistical breakpoints in social media data have been proposed (Servi 2013).
TextSentencer_T112 13790-14156 Sentence denotes Nevertheless, researchers have found that EWMA is a "simple and robust" method for outbreak identification based on surveillance of sparse syndromic data (Buckeridge et al. 2005), and, continuing the analogy of social response to disease, it is reasonable to expect that EWMA would provide good performance on a sparse data stream of social response anomaly scores.
TextSentencer_T114 14157-14257 Sentence denotes The EWMA, $Z_{ijt}$, is the weighted average of all previous anomaly scores and is defined as follows:
TextSentencer_T115 14258-14275 Sentence denotes $Z_{ijt} = \lambda A_{ijt} + (1 - \lambda) Z_{ij(t-1)}$, where $\lambda \in (0, 1)$.
TextSentencer_T116 14276-14434 Sentence denotes Since 2 years (104 weeks) of news articles were collected before the anomaly scores were calculated, the EWMA was started on the 105th week, with $Z_{ij,104} = 0$.
TextSentencer_T117 14435-14629 Sentence denotes We defined a binary indicator for the presence of unusually severe social response, which was 1 when the EWMA of the anomaly scores exceeded the upper control limit ($UCL_{ijt}$) and 0 otherwise: $S_{ijt} = 1$ if $Z_{ijt} > UCL_{ijt}$, and $S_{ijt} = 0$ otherwise.
TextSentencer_T118 14630-14739 Sentence denotes In Sect. 2.3, we introduce models to predict the probability that $S_{ijt} = 1$ in the coming 1, 2, and 3 weeks.
TextSentencer_T120 14740-14831 Sentence denotes The upper control limit for an EWMA control chart is defined as follows (Montgomery 2009):
TextSentencer_T121 14832-14888 Sentence denotes with width of the control limit $L > 0$, in-control mean,
TextSentencer_T122 14889-15017 Sentence denotes In the standard implementation of EWMA, both the upper control limit and EWMA are reset after the EWMA passes the control limit.
TextSentencer_T123 15018-15092 Sentence denotes We found that $S_{ijt}$ behaved most reasonably when these values were not reset.
TextSentencer_T124 15093-15179 Sentence denotes The EWMA parameter $L$ was set to 3 based on the recommendation of Montgomery (2009).
TextSentencer_T125 15180-15272 Sentence denotes The parameter $\lambda$ was tuned by visually inspecting the $S_{ijt}$ indicators for several values.
TextSentencer_T126 15273-15320 Sentence denotes The tuning process used data from 30 countries.
TextSentencer_T127 15321-15395 Sentence denotes Prediction results for these countries are presented in Online Resource 2.
TextSentencer_T128 15396-15498 Sentence denotes The selected value, $\lambda = 0.25$, produced results for $S_{ijt}$ that corresponded well with analyst opinion.
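The EWMA recursion, the standard time-varying upper control limit of an EWMA chart (the textbook form in Montgomery 2009), and the binary indicator can be sketched together. The in-control mean and standard deviation used in the paper cannot be recovered from this text, so `mu0` and `sigma` are assumed inputs here. The sketch uses the stated settings λ = 0.25 and L = 3, starts from Z = 0 at week 104, and, as described above, does not reset the chart after a crossing. The function name `ewma_control` is hypothetical.

```python
import math

def ewma_control(scores, lam=0.25, L=3.0, mu0=0.0, sigma=1.0):
    """Return (Z, UCL, S) for weekly anomaly scores starting at week 105.

    mu0 and sigma (in-control mean and standard deviation) are assumed inputs.
    The chart is never reset after a crossing.
    """
    z_prev = 0.0  # Z at week 104
    Z, UCL, S = [], [], []
    for t, a in enumerate(scores, start=1):  # t = weeks since week 104
        z = lam * a + (1 - lam) * z_prev
        ucl = mu0 + L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        Z.append(z)
        UCL.append(ucl)
        S.append(1 if z > ucl else 0)
        z_prev = z
    return Z, UCL, S

scores = [0.0] * 10 + [10.0] * 3 + [0.0] * 10
Z, UCL, S = ewma_control(scores)
print(S)  # ten 0s, then eight 1s, then five 0s
```

The indicator stays at 1 for several weeks after the three-week burst ends, because the EWMA decays gradually. This is the smoothing behavior that makes a single spurious indicator unlikely to trigger a period of social response.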
TextSentencer_T129 15499-15672 Sentence denotes Figure 3 shows the social response indicator counts, the exponentially weighted moving average, and the social response binary indicator for dengue fever outbreaks in India.
TextSentencer_T130 15673-15762 Sentence denotes Figures depicting several other countries and diseases can be found in Online Resource 1.
TextSentencer_T131 15763-15852 Sentence denotes Our approach to defining the binary social response indicator has a number of advantages.
TextSentencer_T132 15853-16059 Sentence denotes First, especially for country-disease pairs with high volumes of Internet-based media attention, the social response indicator is robust to errors in the automatic tagging of the social response indicators.
TextSentencer_T133 16060-16213 Sentence denotes A single incorrect indicator will typically be insufficient to produce an anomaly score that is high enough to cause the EWMA to cross the control limit.
TextSentencer_T134 16214-16345 Sentence denotes While data cleaning could be used to limit the effect of incorrect indicators, it also risks accidental removal of true indicators.
TextSentencer_T135 16346-16447 Sentence denotes We believe that EWMA is a more conservative approach and is better suited to our particular problem.
TextSentencer_T136 16448-16681 Sentence denotes A second advantage is that the social response indicator is comparable across countries and diseases, since it is defined relative to a baseline for the country and disease, removing the effect of differing volumes of media coverage.
TextSentencer_T137 16682-16738 Sentence denotes Finally, the social response indicator is interpretable.
TextSentencer_T138 16739-16835 Sentence denotes A value of 1 always indicates that an unusually severe social response signal has been observed.
TextSentencer_T139 16836-16959 Sentence denotes Now, we introduce the approach to forecasting unusually severe social response in the coming 1, 2, and 3 weeks (see Fig. 1c).
TextSentencer_T140 16960-17088 Sentence denotes The news articles were transformed and structured into time-series, cross-section data with a binary dependent variable (BTSCS).
TextSentencer_T141 17089-17233 Sentence denotes This type of data structure has been previously studied (Beck et al. 1998), with the key observation that BTSCS data are grouped duration data.
TextSentencer_T143 17234-17459 Sentence denotes Therefore, it is essential to predict the timing of (1) the transition from a state of no social response to a state of social response, and (2) the transition from a state of social response to a state of no social response.
TextSentencer_T144 17460-17637 Sentence denotes Note that transition from a state of no social response to a state of social response is a rare event, and most frequently our models predict that no transition will take place.
TextSentencer_T145 17638-17863 Sentence denotes Also, note that the signal indicating a transition from a state of no social response into a state of social response is likely different from the signal indicating a continuation of social response once it has already begun.
TextSentencer_T146 17864-18093 Sentence denotes It has been suggested that separate models should be built to predict the transitions into and out of a binary state (Beck et al. 2001; Jackman 2000), and we adopted that suggestion here for our binary social response indicator.
TextSentencer_T148 18094-18327 Sentence denotes Because we were interested in forecasting social response over time horizons longer than 1 week, we defined a target, $Y^{w}_{ijt}$, indicating the occurrence of social response for disease $j$ in country $i$ in the $w$ weeks following week $t$: $Y^{w}_{ijt} = 1$ if $S_{ij(t+s)} = 1$ for some $s \in \{1, \dots, w\}$, and $Y^{w}_{ijt} = 0$ otherwise.
TextSentencer_T149 18328-18421 Sentence denotes Note that $Y^{1}_{ijt} = S_{ij(t+1)}$, but in general $Y^{w}_{ijt}$ is not equivalent to $S_{ij(t+w)}$.
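Reading the target as "S equals 1 in at least one of the w weeks following week t" (an interpretation consistent with the identity for a 1-week horizon), the target can be computed from one country-disease pair's S series as sketched below. The function name and the `None` convention for horizons that run past the data are illustrative assumptions.

```python
def horizon_target(S, t, w):
    """Y^w at week t: 1 if S == 1 in any of weeks t+1..t+w (t is a 0-based index into S)."""
    window = S[t + 1 : t + 1 + w]
    if len(window) < w:
        return None  # horizon extends past the last observed week
    return int(any(window))

S = [0, 0, 1, 0, 0]
print([horizon_target(S, 0, w) for w in (1, 2, 3)])  # [0, 1, 1]
print(S[0 + 3])                                     # 0, so Y^3 differs from S at t + 3
```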
TextSentencer_T150 18422-18471 Sentence denotes We built two models: Model 0 → 1 and Model 1 → 0.
TextSentencer_T152 18472-18694 Sentence denotes Model 0 → 1 predicted the transition from a period of no social response into a period of social response [i.e., Model 0 → 1 estimates $P(Y^{w}_{ijt} = 1 \mid S_{ijt} = 0)$], and Model 1 → 0 predicted whether the period of … Table 1.
TextSentencer_T153 18695-18811 Sentence denotes During periods with no social response ($S_{ijt} = 0$), we used Model 0 → 1 to anticipate the onset of social response.
TextSentencer_T154 18812-18916 Sentence denotes Following social response onset ($S_{ijt} = 1$), we used Model 1 → 0 to predict the end of social response.
TextSentencer_T155 18917-19004 Sentence denotes The Model 0 → 1 training set consisted of all observations from weeks 105 through $t - w$.
TextSentencer_T156 19005-19256 Sentence denotes The Model 1 → 0 training set consisted only of observations from weeks 105 through $t - w$ that occurred within a period of social response (i.e., the observation from country $i$, disease $j$, and time $t_0$ would be included if $t_0 \le t - w$ and $S_{ijt_0} = 1$).⁵
TextSentencer_T157 19257-19404 Sentence denotes The training data were kept separate from the testing data by training the model on weeks 105 through $t - w$ and testing on week $t$ for all $t > 105$.
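The expanding-window scheme and the routing between the two models can be sketched as follows. Week t − w is the latest week whose w-week target is fully observed at week t, which is why the training window stops there. The `Obs` record, the function names, and the placeholder targets in the usage example are hypothetical, and features and the classifier itself are omitted.

```python
from dataclasses import dataclass

@dataclass
class Obs:
    week: int  # week index t for one country-disease pair
    S: int     # binary social response indicator S_ijt
    y: int     # target Y^w_ijt, known only once week t + w has been observed

def training_sets(history, t, w, first_week=105):
    """Expanding-window training sets for a forecast made at week t with horizon w."""
    # Week t - w is the latest week whose w-week target is fully observed at week t.
    usable = [o for o in history if first_week <= o.week <= t - w]
    model_0to1 = usable                           # Model 0 -> 1: all observations
    model_1to0 = [o for o in usable if o.S == 1]  # Model 1 -> 0: weeks inside a response period
    return model_0to1, model_1to0

def model_for_week(s_now):
    """Route the forecast at week t by the current state S_ijt."""
    return "Model 0 -> 1" if s_now == 0 else "Model 1 -> 0"

S_series = [0] * 10 + [1] * 5 + [0] * 6                # weeks 100..120
history = [Obs(week, s, 0) for week, s in zip(range(100, 121), S_series)]  # y = 0 as placeholder
train_01, train_10 = training_sets(history, t=118, w=2)
print(len(train_01), len(train_10), model_for_week(S_series[-1]))  # 12 5 Model 0 -> 1
```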
TextSentencer_T158 19405-19682 Sentence denotes Since transitions from periods of no social response to periods of social response were extremely rare events,⁶ the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al. 2002) was used on the Model 0 → 1 training set to increase the prevalence of the target to 20%.
TextSentencer_T160 19683-19819 Sentence denotes Edited nearest neighbors (ENN) (Wilson 1972) was then used to remove examples that were misclassified by two of their three nearest neighbors.
TextSentencer_T161 19820-19969 Sentence denotes The combination of SMOTE and ENN has been shown to be effective for a number of prediction problems involving imbalanced data (Batista et al. 2004).
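The resampling step can be illustrated with a small pure-Python sketch. SMOTE interpolates between a minority example and one of its k nearest minority neighbors until the target prevalence is reached. ENN with three neighbors then drops any example whose label disagrees with at least two of its three nearest neighbors. The source does not give these settings, so the following are assumptions: k = 5 for SMOTE, Euclidean distance, and applying ENN to both classes. In practice, a library implementation such as the one in imbalanced-learn would normally be used instead.

```python
import math
import random

def smote(X, y, target_prevalence=0.2, k=5, seed=0):
    """Add synthetic class-1 rows until class 1 makes up `target_prevalence` of the data.
    Requires at least two minority examples."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = len(X) - len(minority)
    # Solve n_min / (n_min + n_majority) = target_prevalence for n_min.
    n_target = math.ceil(target_prevalence * n_majority / (1 - target_prevalence) - 1e-9)
    synthetic = []
    for _ in range(max(0, n_target - len(minority))):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [1] * len(synthetic)

def edited_nearest_neighbours(X, y, k=3):
    """Drop rows whose label disagrees with a majority of their k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(xi, X[j]))[:k]
        if sum(y[j] != yi for j in nearest) <= k // 2:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y

X = [[float(i), 0.0] for i in range(20)] + [[100.0, 0.0], [101.0, 0.0], [102.0, 0.0]]
y = [0] * 20 + [1] * 3
X_res, y_res = smote(X, y)
print(len(y_res), sum(y_res))  # 25 5
X_clean, y_clean = edited_nearest_neighbours(X_res, y_res)
```

With 20 majority rows, a 20% prevalence requires 5 minority rows, so two synthetic rows are added between the existing minority points.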
TextSentencer_T163 19970-20079 Sentence denotes For both Model 0 → 1 and Model 1 → 0, features that had near-zero variance in the training data were removed.
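The source does not state the rule used to flag near-zero-variance features. The sketch below assumes a heuristic modelled on the defaults of R caret's `nearZeroVar`: a feature is dropped if it is constant, or if the ratio of its most common to its second most common value exceeds 95/5 and its distinct values make up at most 10% of the observations. The function name is hypothetical.

```python
from collections import Counter

def near_zero_variance(columns, freq_cut=95 / 5, unique_cut=10.0):
    """Return the names of features flagged as zero or near-zero variance."""
    flagged = []
    for name, values in columns.items():
        top_two = Counter(values).most_common(2)
        if len(top_two) < 2:
            flagged.append(name)  # constant feature
            continue
        freq_ratio = top_two[0][1] / top_two[1][1]
        percent_unique = 100.0 * len(set(values)) / len(values)
        if freq_ratio > freq_cut and percent_unique <= unique_cut:
            flagged.append(name)
    return flagged

features = {
    "constant": [0] * 100,
    "rare_spike": [0] * 98 + [1, 1],
    "balanced": [0, 1] * 50,
}
print(near_zero_variance(features))  # ['constant', 'rare_spike']
```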
TextSentencer_T164 20080-20194 Sentence denotes Finally, a random forest with 100 trees was trained (Breiman 2001), and a prediction was generated for $Y^{w}_{ijt}$.
TextSentencer_T165 20195-20354 Sentence denotes Since the sequence of features is of interest in this problem, Hidden Markov Models could be considered as an alternative classifier (Rabiner and Juang 1986).
TextSentencer_T166 20355-20485 Sentence denotes The model performance was evaluated on historical data for each of three time horizons: next week, next 2 weeks, and next 3 weeks.
TextSentencer_T167 20486-20606 Sentence denotes For each country-disease pair, 2 years of training data were observed before the first target was predicted for model performance evaluation.
TextSentencer_T168 20607-20856 Sentence denotes Table 2 Overall performance of the social response prediction models. Model performance was evaluated based on six metrics: accuracy, sensitivity, sensitivity looking only at weeks with articles in the preceding 3 weeks, sensitivity looking only at weeks with articles in the preceding week, specificity, and precision.
TextSentencer_T169 20857-21062 Sentence denotes Model 0 → 1 predicted the onset of periods of social response, while Model 1 → 0 predicted the end of such periods.
TextSentencer_T170 21063-21247 Sentence denotes In the results, we show how performance is affected by the length of the prediction window (1, 2, or 3 weeks) and by the volume of news articles published for the country-disease pair.
TextSentencer_T171 21248-21455 Sentence denotes Online Resource 2 provides additional summarized prediction results, including results for the model features and results for the set of 30 countries that were used for initial model construction and tuning.
TextSentencer_T172 21456-21572 Sentence denotes We used several metrics to evaluate the performance of our model: accuracy, sensitivity, specificity, and precision.⁷
TextSentencer_T173 21573-21743 Sentence denotes In addition, we evaluated the models' sensitivity looking only at weeks with social response that had at least one news article published in the prior week or the prior 3 weeks.
TextSentencer_T174 21744-21912 Sentence denotes The two additional sensitivity metrics were used because a large percentage of weeks with social response, 48%, had no articles on the disease in the preceding 3 weeks.
TextSentencer_T175 21913-22051 Sentence denotes Because there were no articles in the preceding weeks, those targets were essentially impossible to predict using data from news articles.
TextSentencer_T176 22052-22128 Sentence denotes Therefore, we wanted to assess our model's sensitivity excluding such weeks.
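The six metrics can be computed from weekly predictions as sketched below. The two restricted sensitivities keep only weeks with social response that were preceded by at least one article in the prior week or the prior 3 weeks. The function and argument names are hypothetical, and undefined ratios are returned as NaN.

```python
import math

def _ratio(num, den):
    return num / den if den else math.nan

def evaluate(y_true, y_pred, articles_prior_1wk, articles_prior_3wk):
    """Accuracy, sensitivity (overall and restricted), specificity, and precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def restricted_sensitivity(prior_counts):
        # Keep only weeks with social response preceded by at least one article.
        preds = [p for t, p, n in zip(y_true, y_pred, prior_counts) if t == 1 and n > 0]
        return _ratio(sum(preds), len(preds))

    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "sensitivity_prior_3wk": restricted_sensitivity(articles_prior_3wk),
        "sensitivity_prior_1wk": restricted_sensitivity(articles_prior_1wk),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }

metrics = evaluate(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 0, 1, 0, 1, 0, 0, 0],
    articles_prior_1wk=[2, 0, 1, 0, 0, 0, 0, 0],
    articles_prior_3wk=[2, 0, 1, 3, 0, 0, 0, 1],
)
print(metrics["accuracy"], metrics["sensitivity"], metrics["sensitivity_prior_1wk"])  # 0.625 0.5 1.0
```

Excluding weeks with no preceding articles raises sensitivity in this toy example, which mirrors the reasoning above: those weeks cannot be anticipated from news data at all.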
TextSentencer_T177 22129-22303 Sentence denotes The developed models achieved good performance for country-disease pairs with substantial media coverage, and fair performance for country-disease pairs with little coverage.
TextSentencer_T178 22304-22573 Sentence denotes Table 2 shows both Model 0 → 1 and Model 1 → 0 performance aggregated for all country-disease pairs.
TextSentencer_T179 22574-22825 Sentence denotes Fig. 3 Identification of periods of unusually severe social response for dengue fever outbreaks in India. (a) The social response indicator counts are shown by social response type. Overall, the peaks in social response indicator counts align well with the binary social response indicator, $S_{ijt}$. (b) The exponentially weighted moving average of the anomaly scores ($Z_{ijt}$) is shown along with the upper control limit ($UCL_{ijt}$).
TextSentencer_T180 22826-22957 Sentence denotes The binary social response indicator is 1 when the exponentially weighted moving average surpasses the control limit.
TextSentencer_T181 22958-23127 Sentence denotes Model 0 → 1 exhibited 46% sensitivity in predicting the onset of a social response period in the next week for weeks with at least one article in the preceding 3 weeks.
TextSentencer_T182 23128-23233 Sentence denotes Model predictions over longer time horizons were slightly less sensitive, but substantially more precise.
TextSentencer_T183 23234-23368 Sentence denotes Model 0 → 1's relatively low precision for the target $Y^{1}_{ijt}$ appears to result largely from premature prediction of social response.
TextSentencer_T184 23369-23518 Sentence denotes Twenty-four percent of Model 0 → 1 false positive predictions for $Y^{1}_{ijt}$ occurred in the 6 weeks prior to the onset of a period of social response.
TextSentencer_T185 23519-23721 Sentence denotes In these cases, the model likely detected signs that the situation was worsening but predicted that the transition into a period of social response would occur sooner than it actually did.
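The premature-prediction analysis above, which counts the share of false positives that fall within 6 weeks before the onset of a social response period, can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes that an onset is a week in which the binary indicator S switches from 0 to 1. For the toy data, it also assumes that the next-week target is Y^1_t = S_{t+1}. The helper names and the series are hypothetical.

```python
from typing import List, Sequence


def onset_weeks(s: Sequence[int]) -> List[int]:
    """Weeks where the binary social response indicator switches from 0 to 1."""
    return [t for t in range(1, len(s)) if s[t] == 1 and s[t - 1] == 0]


def premature_fp_fraction(
    y_true: Sequence[int], y_pred: Sequence[int], s: Sequence[int], window: int = 6
) -> float:
    """Share of false positives that fall within `window` weeks before an onset."""
    onsets = onset_weeks(s)
    false_pos = [t for t, (yt, yp) in enumerate(zip(y_true, y_pred)) if yp == 1 and yt == 0]
    if not false_pos:
        return float("nan")
    premature = sum(1 for t in false_pos if any(0 < o - t <= window for o in onsets))
    return premature / len(false_pos)


s = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_true = s[1:] + [0]  # hypothetical: Y^1_t = S_{t+1}
y_pred = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(premature_fp_fraction(y_true, y_pred, s))  # 0.5
```

In this example, the false positive in week 2 precedes the onset in week 5 and is counted as premature. The false positive in week 12 is not.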
TextSentencer_T186 23722-23866 Sentence denotes Model 1 → 0 consistently predicted the end of periods of social response for all time horizons, with over 74% specificity and over 90% accuracy.
TextSentencer_T187 23867-23974 Sentence denotes Figure 4 shows the predictions for social response in the next 2 weeks for dengue fever outbreaks in India.
TextSentencer_T188 23975-24087 Sentence denotes During periods without social response, the model predicted a low probability of social response in the next 2 weeks.
TextSentencer_T189 24088-24203 Sentence denotes As the onset of a period of social response approached, the predicted probability of social response increased.
TextSentencer_T190 24204-24277 Sentence denotes As the period of social response ended, the predicted probabilities fell.
TextSentencer_T191 24278-24382 Sentence denotes Additional figures depicting results for other countries and diseases can be found in Online Resource 3.
TextSentencer_T192 24383-24509 Sentence denotes The performance of the model varied depending upon the quantity of Internet-based news reporting for the country-disease pair.
TextSentencer_T193 24510-24723 Sentence denotes Table 3 compares Model 0 → 1 performance for country-disease pairs that had a median of 20 or more articles per year in our data (footnote 8) with performance for pairs that had a median of fewer than 20 articles per year.
TextSentencer_T194 24724-24792 Sentence denotes Model performance was much better for pairs with higher media volume.
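The coverage split used for Table 3 separates pairs with a median of 20 or more articles per year from pairs below that level. A minimal sketch of that split follows, assuming yearly article counts are already available for each (country, disease) pair. The pair names and counts are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (country, disease)


def split_by_coverage(
    yearly_counts: Dict[Pair, Sequence[int]], threshold: float = 20
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (median >= threshold, median < threshold) by articles per year."""
    high, low = [], []
    for pair, counts in yearly_counts.items():
        (high if median(counts) >= threshold else low).append(pair)
    return sorted(high), sorted(low)


counts = {
    ("CountryA", "dengue fever"): [35, 18, 40],  # median 35
    ("CountryB", "measles"): [3, 25, 10],        # median 10
    ("CountryC", "cholera"): [20, 12, 22, 19],   # median 19.5
}
high, low = split_by_coverage(counts)
print(high)  # [('CountryA', 'dengue fever')]
print(low)   # [('CountryB', 'measles'), ('CountryC', 'cholera')]
```

A median, rather than a mean, keeps a single unusually busy year from moving a low-coverage pair into the high-coverage group. With an even number of years, `statistics.median` averages the two middle values.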
TextSentencer_T195 24793-25026 Sentence denotes For country-disease pairs with a median of 20 or more articles per year, the onset of social response in the next week was correctly predicted over 60% of the time (67% of the time among events with articles in the past week).
TextSentencer_T196 25027-25276 Sentence denotes Table 3 Comparison of performance for predicting the onset of social response (Model 0 → 1) for country-disease pairs with a median of 20 or more news articles per year and those with fewer articles per year. Model performance was evaluated based on six metrics: accuracy, sensitivity, sensitivity looking only at weeks with articles in the preceding 3 weeks, sensitivity looking only at weeks with articles in the preceding week, specificity, and precision.
TextSentencer_T197 25277-25501 Sentence denotes Model sensitivity and precision were dramatically higher for the country-disease pairs with a median of 20 or more articles per year than for the pairs with fewer articles per year.
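The accuracy, sensitivity, specificity, and precision metrics listed for Table 3 all derive from the confusion-matrix counts of weekly predictions. The sketch below uses the standard definitions on hypothetical data and does not reproduce any values from the study.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary metrics from weekly targets and predictions (1 = social response)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # share of social-response weeks caught
        "specificity": ratio(tn, tn + fp),  # share of quiet weeks correctly left quiet
        "precision": ratio(tp, tp + fp),    # share of alarms that were correct
    }


m = confusion_metrics([1, 0, 0, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 0, 0, 1])
print(m)  # accuracy 0.75, sensitivity 2/3, specificity 0.8, precision 2/3
```

When positive weeks are rare, accuracy is dominated by true negatives. This is how the low-coverage pairs can show very high accuracy alongside low sensitivity.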
TextSentencer_T201 25813-25896 Sentence denotes The overall accuracy of the model was over 83% for each of the three time horizons.
TextSentencer_T202 25897-26222 Sentence denotes High accuracy (over 98% for all three time horizons) was achieved for country-disease pairs with a median of fewer than 20 articles per year, but the model was not successful at predicting the onset of social response, with only 12% sensitivity for Model 0 → 1 in predicting the occurrence of social response in the next week.
TextSentencer_T203 26223-26553 Sentence denotes Sensitivity was much higher, 31%, when looking only at weeks with one or more articles published in the prior week, suggesting that the lack of articles in the weeks preceding the onset of social response contributes to the low sensitivity of Model 0 → 1 for country-disease pairs with median news coverage below 20 articles per year.
TextSentencer_T204 26554-26653 Sentence denotes Model 1 → 0 performance for different volumes of news media coverage is shown in Online Resource 3.
TextSentencer_T205 26654-26949 Sentence denotes The presented results confirm that information derived from Internet-based news sources not only provides early and accurate information for disease detection and analysis of spread, but can also be successfully used for detecting and predicting the social response associated with the detected diseases.
TextSentencer_T206 26950-27185 Sentence denotes The developed models predicting the onset of social response and monitoring its progress and subsequent decline achieved good performance for diseases that receive substantial media attention in the country in which they are spreading.
TextSentencer_T207 27186-27363 Sentence denotes For country-disease pairs with a median of 20 or more articles per year in our data, the onset of social response in the next week was correctly predicted over 60% of the time.
TextSentencer_T208 27364-27490 Sentence denotes Sensitivity was higher still, 67%, when looking only at social response events with news articles published in the prior week.
TextSentencer_T209 27491-27761 Sentence denotes The continuation of periods of social response was predicted with over 95% success for all time horizons. Their end was also predicted consistently, with over 74% success for all time horizons.
TextSentencer_T210 27762-27940 Sentence denotes Fig. 4 Predicted probability of unusually severe social response in the next 2 weeks for dengue fever outbreaks in India. (a) The social response indicator counts are shown by social response type. Periods during which social response was occurring (S_ijt = 1) are shaded in grey.
TextSentencer_T211 27941-28114 Sentence denotes (b) The predicted probability of social response in the next 2 weeks, P(Y^2_ijt = 1), is shown. The predictions are colored according to whether an incorrect (false positive or false negative; red) or correct (true positive or true negative; black) prediction was made.
TextSentencer_T212 28115-28416 Sentence denotes Overall, the predictions exhibit the desired behavior: a low probability of social response in the next 2 weeks was predicted during periods without social response, and, as a social response period approached, the predicted probability increased.
TextSentencer_T214 28505-28798 Sentence denotes Compared with predictions for social response in the coming week, predictions for social response in the coming 2 and 3 weeks were slightly less sensitive (36% for the next 3 weeks vs. 39% for the next week), but substantially more precise (22% for the next 3 weeks vs. 15% for the next week).
TextSentencer_T215 28799-28884 Sentence denotes Thus, in practice, predictions over relatively long time horizons may be most useful.
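Longer horizons can be more precise because a target defined over several weeks is easier to hit. The sketch below shows this under an explicit assumption that is not stated in this section. It takes the k-week target Y^k_t to be 1 when the binary indicator S equals 1 in any of weeks t+1 through t+k, and it restricts an onset model to weeks with S_t = 0. Under this reading, an alarm raised a week or two early still counts as correct at a 3-week horizon, which is consistent with the premature false positives described earlier. The helper names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def horizon_targets(s: Sequence[int], k: int) -> List[int]:
    """Hypothetical Y^k_t: 1 if S = 1 in any of weeks t+1, ..., t+k.

    The last k weeks are dropped because their targets are not yet observable.
    """
    return [1 if any(s[t + 1 : t + k + 1]) else 0 for t in range(len(s) - k)]


def onset_rows(s: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """(week, target) pairs for an onset model: only weeks with S_t = 0."""
    y = horizon_targets(s, k)
    return [(t, y[t]) for t in range(len(y)) if s[t] == 0]


s = [0, 0, 0, 1, 1, 0, 0]
print(horizon_targets(s, 1))  # [0, 0, 1, 1, 0, 0]
print(horizon_targets(s, 3))  # [1, 1, 1, 1]
print(onset_rows(s, 1))       # [(0, 0), (1, 0), (2, 1), (5, 0)]
print(onset_rows(s, 3))       # [(0, 1), (1, 1), (2, 1)]
```

With k = 1, an alarm in week 0 or 1 would be a false positive. With k = 3, the same alarm is a true positive.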
TextSentencer_T216 28885-29020 Sentence denotes Country-disease pairs that received little media attention were not good candidates for predicting the onset of future social response.
TextSentencer_T217 29021-29135 Sentence denotes Internet-based news reporting on these pairs often does not begin until after social response has already started.
TextSentencer_T218 29136-29333 Sentence denotes For country-disease pairs with a median of fewer than 20 articles per year, 58% of weeks with social response had no articles about the disease in the prior 3 weeks; 85% had three or fewer articles.
TextSentencer_T219 29334-29491 Sentence denotes For such diseases, the main utility of our system is in identifying social response when it occurs, rather than predicting when it will happen in the future.
TextSentencer_T220 29492-29603 Sentence denotes There are several reasons why a disease would receive little media attention in a country prior to an outbreak.
TextSentencer_T221 29604-29764 Sentence denotes One reason is that the country has a relatively undeveloped online news reporting system, and few articles are published about any type of disease transmission.
TextSentencer_T222 29765-29903 Sentence denotes Other possible reasons are that the disease is perceived as benign and not newsworthy, or that government censorship suppresses reporting.
TextSentencer_T223 29904-30079 Sentence denotes In these cases, it is possible that alternative data sources (footnote 9) could be used to supplement data from Internet-based news media to improve prediction of future social response.
TextSentencer_T224 30080-30218 Sentence denotes Another reason why a disease would receive little reporting prior to an outbreak is that the disease is newly introduced into the country.
TextSentencer_T225 30219-30322 Sentence denotes In our data, the emergence of a new disease in a country is frequently associated with social response.
TextSentencer_T226 30323-30561 Sentence denotes There is little that can be done to improve prediction of the onset of this social response, because forecasting the exact timing of the introduction of a disease into a country is beyond the ability of current biosurveillance techniques.
TextSentencer_T227 30562-30781 Sentence denotes In summary, we have developed an approach for anticipating social response to infectious disease spread in near real-time, and have evaluated it using outbreaks of 16 different diseases in 72 locations around the world.
TextSentencer_T228 30782-30964 Sentence denotes We have demonstrated that Internet-based news can serve as a good data source for predicting social reaction to disease spread, when there is sufficient news coverage of the disease.
TextSentencer_T229 30965-31190 Sentence denotes In general, our system is most effective for countries with active Internet-news reporting systems and for diseases that receive frequent coverage: avian influenza, cholera, dengue fever, influenza, malaria, measles, and polio.
TextSentencer_T230 31191-31495 Sentence denotes By identifying ongoing social response and alerting decision makers and biosurveillance experts to probable social response in the near future, this warning system will provide responders with the information needed to better combat both the disease spread itself and its detrimental social consequences.