PubAnnotation

Id	Subject	Object	Predicate	Lexical cue
T1	0-69	Epistemic_statement	denotes	Early detection of dynamic harmful cascades in large-scale networks ଝ
T2	80-217	Epistemic_statement	denotes	Quickly detecting harmful cascades in networks can allow us to analyze the causes and prevent further spreading of destructive influence.
T3	371-567	Epistemic_statement	denotes	However, the harmful cascades are usually dynamic (e.g., the cascade initiators and diffusion trajectories can change over the time), which can severely destroy the robustness of selected sensors.
T4	673-830	Epistemic_statement	denotes	Motivated by the observation, in this paper we investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks.
T5	1304-1436	Epistemic_statement	denotes	Based on that, we propose an efficient upper bound based greedy (UBG) algorithm with the theoretical performance guarantee reserved.
T6	1881-2038	Epistemic_statement	denotes	A contagious disease like Severe Acute Respiratory Syndrome (SARS) can spread quickly through a population contact network and lead to an epidemic [12, 34] .
T7	2039-2162	Epistemic_statement	denotes	A computer virus on a few servers can fast spread to other servers or computers in a computer connection network [15, 17] .
T8	2163-2281	Epistemic_statement	denotes	In a similar vein, a rumor started by a few individuals can spread quickly through the online social network [2, 26] .
T9	2282-2473	Epistemic_statement	denotes	It is crucial to detect the harmful cascades as soon as they happen or shortly thereafter, since it allows us to study the causes and prevent further spreading of harmful influence [32, 38] .
T10	2474-2698	Epistemic_statement	denotes	In practice, it is commonly infeasible or unaffordable to monitor all nodes at all times, and therefore a common way to detect cascades is to select important nodes where we can place sensors for monitoring [7, 19, 31, 39] .
T11	2699-2979	Epistemic_statement	denotes	However, existing methods usually viewed the cascade data as static and deterministic, ignoring an important fact that the harmful cascades are usually dynamic and time-variant, in the sense that the cascade initiators and diffusion trajectories can change randomly over the time.
T12	2980-3117	Epistemic_statement	denotes	Actually the dynamic brings new challenges to network monitoring, since it can severely destroy the robustness of selected sensors [22] .
T13	3118-3288	Epistemic_statement	denotes	Meanwhile the existing sensor selection algorithms can only work for small networks and not be scalable well to large networks of the https://doi.org/10.1016/j.jocs.2017.
T14	3360-3533	Epistemic_statement	denotes	Motivated by these observations, in this paper we investigate the scalable solutions to sensor selection problem for early detection of dynamic harmful cascades in networks.
T15	4109-4444	Epistemic_statement	denotes	Under this unified network framework, we specifically propose a new dynamic cascade model to describe harmful cascade diffusions, and define a detection time minimization (DTM) problem S * = argmin \|S\|=k,S⊆V D(S), where k is a given parameter determined by budget or monitor capacity, S is sensor nodes set, and D(S) is detection time.
T16	4858-5086	Epistemic_statement	denotes	To further address the scalability issue, we propose an efficient upper bound based greedy (UBG) algorithm and its two accelerations Quickest-Path-UBG and Local-Reduction-UBG to cater for different types of large-scale networks.
T17	5307-5505	Epistemic_statement	denotes	Detecting the whole network by monitoring finite sensor nodes has been widely applied to detect water contaminations in a water distribution network [19] and virus outbreaks in a human society [7] .
T18	5632-5760	Epistemic_statement	denotes	However the effects along this idea are commonly unsatisfied, since they ignore the multi-step complexity of network diffusions.
T19	6184-6315	Epistemic_statement	denotes	In their works the data are a set of deterministic scenarios and the random property of diffusion are not taken into consideration.
T20	6385-6575	Epistemic_statement	denotes	In addition to the above two types of methods, another related work [1] set sensors along fixed paths in the network so as to gather sufficient information to locate possible contaminations.
T21	7174-7351	Epistemic_statement	denotes	By contrast, our work is geared to investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks from a model-driven prospective.
T22	7352-7659	Epistemic_statement	denotes	The main difference from previous works is that we take dynamic properties of cascades into consideration: (1) the cascade initiator is dynamic since we do not know who will initiate another harmful diffusion next time, and (2) the diffusion process is dynamic since the propagation trajectory is uncertain.
T23	7660-7789	Epistemic_statement	denotes	Under these two dynamic sources, we aim to accurately and fast select k nodes as sensors for early detection of harmful cascades.
T24	7790-7958	Epistemic_statement	denotes	Besides, as the size of network becomes more and more large-scale, we also need to ensure the effectiveness of the proposed sensor selection methods for large networks.
T25	8513-8631	Epistemic_statement	denotes	• We show the NP-hardness of DTM problem as it can be shown to contain the Set Cover problem as a simple special case.
T26	8844-8913	Epistemic_statement	denotes	• We theoretically establish new upper bounds for the remaining time.
T27	8914-9128	Epistemic_statement	denotes	Based on these bounds, we further propose a new Upper Bound based Greedy (UBG in short) algorithm which can significantly reduce the number of estimations in greedy-based algorithms, especially at the initial step.
T28	9355-9485	Epistemic_statement	denotes	In addition, our methods not only aim at early detection of harmful cascades, but also shed light on a host of other applications.
T29	9486-9639	Epistemic_statement	denotes	For example, the goal of Emerging Topic Detection is to identify emergent topics in a social network, assuming full access to the stream of all postings.
T30	9895-10150	Epistemic_statement	denotes	Another example, an emerging trend in algorithmic stock trading is the use of automatic search through the Web and social networks for pieces of information that can be used in trading decisions before they appear in the more popular news sites [23, 27] .
T31	10256-10358	Epistemic_statement	denotes	The theoretical framework and method established in this paper can also be used in these applications.
T32	10739-10860	Epistemic_statement	denotes	To prune the unnecessary estimation calls, we analysis the upper bounds for the new proposed UBG algorithm in Section 5 .
T33	11596-11805	Epistemic_statement	denotes	By considering operational models for the dynamic diffusion of a harmful item through a network G, represented by a directed graph, we will speak of each individual node as being either infected or uninfected.
T34	11806-12027	Epistemic_statement	denotes	Motivated by the properties of contaminants like rumor and virus, we will focus from now on the progressive case in which nodes can switch from being uninfected to being infected, but do not switch in the other direction.
T35	12166-12450	Epistemic_statement	denotes	Thus, the process can be reviewed with respect to some particular uninfected node v: as time unfolds, more and more of v's neighbors become infected; at some point, this may cause v to become infected, and v's decision may in turn trigger further infections by nodes connected with v.
T36	12451-12658	Epistemic_statement	denotes	Motivated by above observation, we put forward a dynamic susceptible-infected (DSI) model for the harmful cascade spreading, which can be seen as a variant of the common susceptible-infected (SI) model [3] .
T37	12905-13000	Epistemic_statement	denotes	can be seen as a priori knowledge of nodes being infected initially, where u ∈ V (u) = 1 holds.
T38	13294-13417	Epistemic_statement	denotes	Then, at step t + 1, each node u ∈ J t may infect its out-neighbors v ∈ V \J t with an independent probability of ip(u, v).
T39	14330-14549	Epistemic_statement	denotes	Note that the detection time (u, S) can be viewed as a special type of stopping time in stochastic process theory [11] .ᮀ Let the variable X denote the random initiator with distribution , i.e., for every u ∈ V, we have
T40	14550-14656	Epistemic_statement	denotes	The (expected) detection time from a random initiator to one of the selected sensors in S can be define as
T41	14657-14716	Epistemic_statement	denotes	where E is an expectation operator under the cascade model.
T42	14717-14820	Epistemic_statement	denotes	By the conditional expectation, we can calculate the expected detection time D(S) in the following way:
T43	14821-14832	Epistemic_statement	denotes	Indeed, Eq.
T44	14833-14861	Epistemic_statement	denotes	(5) can be reached like this
T45	14862-14921	Epistemic_statement	denotes	where (X) denotes the -algebra generated by variable X. Eq.
T46	15082-15245	Epistemic_statement	denotes	Our goal is to find k nodes as sensors in a network in order to minimize the time until an infection -starting from a random initiator in the network -is detected.
T47	15246-15430	Epistemic_statement	denotes	Formally, we formulate the problem as the following discrete optimization problem: we want to find a subset S * ⊆ V such that \|S * \| = k and D(S * ) = min{D(S)\|\|S\| = k, S ⊆ V}, i. e .,
T48	15431-15501	Epistemic_statement	denotes	where k is a given parameter determined by budget or monitor capacity.
T49	15502-15578	Epistemic_statement	denotes	We call this as detection time minimization problem (DTM problem for short).
T50	15707-15851	Epistemic_statement	denotes	(6), the first issue is how to calculate the objective function D(S) given a sensor set S. This question isn't as easy as its description in Eq.
T51	16034-16227	Epistemic_statement	denotes	Thus, it is not initially obvious that the process is even well-defined, in the sense that it yields the same distribution over outcomes regardless of how we schedule the attempted activations.
T52	16228-16365	Epistemic_statement	denotes	Actually the computation of D(S) is #P-hard, by showing a reduction from the positive partitioned 2-DNF assignment counting problem [9] .
T53	16523-16725	Epistemic_statement	denotes	ᮀ Since it is intractable to exactly compute D(S) on a typically sized graph, a natural idea is to employ Monte-Carlo methods to estimate D(S), which can be implemented in two different ways as follows:
T54	16726-16912	Epistemic_statement	denotes	The expected detection time D(S) is obtained by directly simulating the random process of diffusion triggered by a random node, say u, chosen according to the distribution defined as Eq.
T55	17778-17948	Epistemic_statement	denotes	We can flip all coins a priori to produce a weighted graph G = (V, E, c), where an edge (u, v) is labeled with the time cost c(u, v) -a sample of random variable C(u, v).
T56	18898-19076	Epistemic_statement	denotes	For estimating a specific D(S), the simulation method is faster, because it only needs to examine a small portion of edges while the snapshot method has to examine all the edges.
T57	19515-19660	Epistemic_statement	denotes	(6), as it can be shown to contain the Set Cover problem as a simple special case, then we propose a simple greedy algorithm for the DTM problem.
T58	19829-19918	Epistemic_statement	denotes	., u n }, we wish to know whether there exist k of the subsets whose union is equal to U.
T59	19919-19953	Epistemic_statement	denotes	We can assume that k < n < m here.
T60	19954-20031	Epistemic_statement	denotes	We show that this can be viewed as a special case of the optimal problem (6).
T61	20345-20458	Epistemic_statement	denotes	Define the probability distribution = { (u), u ∈ V} on the knowledge of nodes being initially infected as follows
T62	20459-20576	Epistemic_statement	denotes	The Set Cover problem is equivalent to deciding if there is a set S of k nodes in this bipartite graph with D(S) ≤ 1.
T63	20693-21115	Epistemic_statement	denotes	Monitoring k nodes to detect diffusion initiated from U corresponding to sets in a Set Cover solution results in covering all n nodes corresponding to the ground set U, and if any set S of k nodes has D(S) ≤ 1, then the Set Cover problem must be solvable.ᮀ Since the optimization problem is NP-hard and the network is prohibitively large, we cannot compute the optimum value to verify the actual quality of approximations.
T64	21196-21290	Epistemic_statement	denotes	To make better use of the greedy algorithm, we consider an equivalent optimization problem Eq.
T65	21502-21531	Epistemic_statement	denotes	Theoretically, a non-negative
T66	21532-21553	Epistemic_statement	denotes	for all S ⊆ T. Proof.
T67	22021-22136	Epistemic_statement	denotes	(14) is proven and we thus get that R(S) is submodular.ᮀ By above properties in Theorem 4, the problem given in Eq.
T68	22137-22408	Epistemic_statement	denotes	(9) can be approximated by the greedy algorithm in Algorithm 1 with the set function f : = R. For any submodular and monotone function f with f(∅) =0, the problem of finding a set S of size k that maximizes f(S) can be approximated by the greedy algorithm in Algorithm 1.
T69	22580-22755	Epistemic_statement	denotes	It is shown that the algorithm guarantees an approximation ratio of f(S)/f(S * ) ≥ 1 −1/e, where S is the output of the greedy algorithm and S * is the optimal solution [30] .
T70	22756-22972	Epistemic_statement	denotes	In Greedy(k, R), a thorny problem is that there is no efficient way to compute R(S) given a placement S, and we turn to run snapshots for 10, 000 trials to obtain an accurate estimate of R(S), mentioned in Section 3.
T71	23291-23552	Epistemic_statement	denotes	Hence, in order to improve the efficiency of Greedy(k, R), one can either reduce the number of calls for evaluating R(S), or develop advanced heuristic algorithms which can conduct fast and approximate estimations for R(S) at the expense of accuracy guarantees.
T72	23715-23908	Epistemic_statement	denotes	The principle behind is that the marginal gain of a node in the current iteration cannot be more than that in previous iterations, and thus the number of estimation calls can be greatly pruned.
T73	24159-24399	Epistemic_statement	denotes	In particular, in the first round to establish the initial upper bounds, CELF needs to estimate R({v}) using MC simulations for each node v, leading to N times of MC calls, which is time-consuming, especially when the network is very large.
T74	24400-24600	Epistemic_statement	denotes	The limitation leads to a rather fundamental question that, can we derive an upper bound of R({v}) which can be used to further prune unnecessary detection time estimations (MC calls) in Greedy(k, R)?
T75	24728-24870	Epistemic_statement	denotes	Based on the bound, we propose a new greedy algorithm Upper Bound based Greedy (UBG for short), which outperforms the original CELF algorithm.
T76	24871-24981	Epistemic_statement	denotes	Essentially different from the bounds that derived for the influence spread under the IC model [40] , we here
T77	24982-25036	Epistemic_statement	denotes	In this part, we aim to derive an upper bound of R(v).
T78	25288-25364	Epistemic_statement	denotes	For v ∈ V , the remaining time R(v) under the DSI model can be calculated as
T79	25365-25371	Epistemic_statement	denotes	Proof.
T80	25505-25646	Epistemic_statement	denotes	ᮀ Proposition 2 reveals that we can treat the global remaining time R(v) as a summation of all T max propagation steps of local probabilities
T81	25647-25738	Epistemic_statement	denotes	Based on Proposition 2, a following question is, what is the relationship between two sets,
T82	25739-25753	Epistemic_statement	denotes	Proposition 3.
T83	25805-25885	Epistemic_statement	denotes	For k ≥ 1, by the definition of conditional expectation and DSI model, we obtain
T84	25886-25954	Epistemic_statement	denotes	where the first '≤' is due to 1 {v/ ∈J k−1 } ≤ 1, and the second '≤'
T85	25955-26142	Epistemic_statement	denotes	x i in the above derivation.ᮀ Proposition 3 clearly identifies the ordering relationship between two adjacent elements in the series P u (v ∈ J 0 ), P u (v ∈ J 1 ), P u (v ∈ J 2 ), · · ·.
T86	26484-26542	Epistemic_statement	denotes	Now we can rewrite Proposition 3 by using the matrix form,
T87	26543-26627	Epistemic_statement	denotes	By iteration, we further get that u t ≤ u 0 · (E + IP) t , where E is a unit matrix.
T88	26775-26875	Epistemic_statement	denotes	With the above preparations, we can present the results on upper bound of remaining time as follows,
T89	26876-26945	Epistemic_statement	denotes	where E is a unit matrix and [A] (u,v) means the element at position
T90	26946-27029	Epistemic_statement	denotes	where is a prior distribution on the likelihood of nodes being the infected source.
T91	27030-27101	Epistemic_statement	denotes	We first use a toy example to explain how to calculate the upper bound.
T92	27286-27370	Epistemic_statement	denotes	To overcome the difficulty, we adopt the following procedure to calculate Tmax−1 t=0
T93	27371-27493	Epistemic_statement	denotes	If t ≥1 ≤ T max − 1, it implies that we do not need to calculate (E + IP) t any more when t ≥ t ≥1 , and therefore we have
T94	27494-27546	Epistemic_statement	denotes	where 1 is a N × N matrix with all elements being 1.
T95	27822-27908	Epistemic_statement	denotes	Based on the upper bound, we propose a new UBG algorithm for early outbreak detection.
T96	28062-28246	Epistemic_statement	denotes	However, CELF demands N (the number of nodes in the network) remaining time estimations to establish the initial bounds of marginal increments, which is time expensive on large graphs.
T97	28440-28573	Epistemic_statement	denotes	This way, the nodes are ranked by their upper bound scores which can be used to prune unnecessary calculations in the CELF algorithm.
T98	28610-28748	Epistemic_statement	denotes	Even with the UBG algorithm proposed in Section 5, its running time is still unbearable and may not be suitable for large social networks.
T99	28749-28924	Epistemic_statement	denotes	Although UBG can greatly reduce the number of remaining time estimation calls, each estimation call is very timeconsuming, as it needs produce enough samples and average them.
T100	28925-29091	Epistemic_statement	denotes	Hence a possible alternative to further accelerate UBG is to employ heuristics to approximate the remaining time R(S) or the marginal return R(S ∪ {u}) − R(S) in UBG.
T101	29234-29474	Epistemic_statement	denotes	The experimental results will show that (i) both of them are efficient in terms of running time, and (ii) the former one applies to sparse networks better, while the latter one is more effective in dense networks in terms of detection time.
T102	29584-29653	Epistemic_statement	denotes	(10), the key point of estimating R(S) is how to estimate E[ (u, S)].
T103	29654-29966	Epistemic_statement	denotes	Note that E[ (u, S)] measures the expected time delay of a diffusion initiated from node u propagating to sensor set S. An intuitive idea is that the most likely propagation path should be the quickest path from node u to the set S. Hence, a question is how to measure the quickest path from node u to the set S?
T104	30037-30177	Epistemic_statement	denotes	According to the geometric distribution, if a random variable X is distributed geometrically with a parameter p, it follows that E[X] = 1/p.
T105	30178-30442	Epistemic_statement	denotes	Since the time that a node u spends in infecting its uninfected neighborhood v is distributed geometrically with the parameter ip(u, v), the value 1/ip(u, v) should be the expected time cost that an infection propagates from node u to node v along the edge (u, v).
T106	30696-30841	Epistemic_statement	denotes	(27) Intuitively, d(u, S) denotes the expected time that an infection propagates from u to S along the quickest path in the graph G = (V, E, m) .
T107	30977-31009	Epistemic_statement	denotes	Theoretically we have Theorem 6.
T108	31010-31171	Epistemic_statement	denotes	When the shortest path from u to S in graph G = (V, E, m) is unique, the quantity d(u, S) ∧ T max can be used to approximate the detection time E[ (u, S)], i.e.,
T109	31172-31178	Epistemic_statement	denotes	Proof.
T110	31437-31476	Epistemic_statement	denotes	(28), we have the following derivation,
T111	31477-31520	Epistemic_statement	denotes	Hence we can approximate the remaining time
T112	31521-31611	Epistemic_statement	denotes	rather than the heavy Monte-Carlo estimation MC(S ∪ {u}) in the 09th row of UBG algorithm.
T113	31706-31945	Epistemic_statement	denotes	Note that the reason we call this acceleration as Quickest-Path-UBG rather than Shortest-Path-UBG lies in that, our Quickest-Path-UBG introduces a new measurement to reflect the infection time cost rather than the distance in common sense.
T114	32091-32275	Epistemic_statement	denotes	To facilitate the description, we will approximate the marginal reduction D(S) − D(S ∪ {u}) rather than the marginal return R(S ∪ {u}) − R(S), due to the fact that R(S) = T max − D(S).
T115	32418-32580	Epistemic_statement	denotes	These methods usually looked over the multi-step complexity of network diffusions, and assumed that the infection can propagate ahead only one hop before the end.
T116	32581-32689	Epistemic_statement	denotes	In other words, if a node initiates an infection, the diffusion spread is at most its first-order neighbors.
T117	32690-32735	Epistemic_statement	denotes	We call this as one-hop assumption hereafter.
T118	32973-33116	Epistemic_statement	denotes	Under this observation, we will propose a tractable method upon the one-hop assumption to approximate the marginal reduction D(S) − D(S ∪ {u}).
T119	33117-33554	Epistemic_statement	denotes	We first introduce a useful result in probability theory: if the random variable X i is distributed geometrically with parameter p i for i = 1, · · ·, n and they are independent, then ∧ i=1,· · ·,n X i is distributed geometrically with parameter 1 − According to the characteristic of the DSI model, the time cost that an infected node w spends in infecting its uninfected neighbor v is distributed geometrically with parameter ip(w, v).
T120	33680-33774	Epistemic_statement	denotes	Under the onehop assumption, the expected detection time of w can thus be calculated like this
T121	33775-33896	Epistemic_statement	denotes	If the initially infected node w has no neighbors in sensor set S, the expected detection time of w is defined as T max .
T122	35738-35822	Epistemic_statement	denotes	The Twitter and Epinions data sets can both be obtained from Stanford Datasource 2 .
T123	35823-35954	Epistemic_statement	denotes	Epinions is a general consumer review site where visitors can read reviews about a variety of items to help them decide a purchase.
T124	35955-36072	Epistemic_statement	denotes	The synthetic Small-world data set is the type of graphs in which each node can be reached by a small number of hops.
T125	36933-37066	Epistemic_statement	denotes	A heuristic algorithm which requires any pair of sensors to be at least d hops away, where d is as large as it can choose k monitors.
T126	37864-38159	Epistemic_statement	denotes	We mainly report results on a uniform infection probability of 0.1 assigned to each directed link in the network, i.e., ip(u, v) = 0.1 for any directed edge (u, v) ∈ E. One can refer to the work [14, 33, 36] for learning real values of the parameters {ip(u, v) : (u, v) ∈ E} from available data.
T127	38414-38663	Epistemic_statement	denotes	Here we select 10 nodes with the highest in-degrees in each network as the sensor set S. We can find that Propagation Simulation and Snapshot Simulation release almost the same estimation results, which confirms their equivalence in estimating D(S).
T128	38912-39098	Epistemic_statement	denotes	We can see that the cumulative time cost of Propagation Simulation increases linearly, while that of Snapshot Simulation has a big jump at the first sensor set and then increases slowly.
T129	39099-39310	Epistemic_statement	denotes	The reason behind is that Snapshot Simulation needs to establish numerous snapshots to estimate the first D(S 1 ), and these established snapshots can be reused in the posterior estimations of {D(S i )} 10 i=2 .
T130	39311-39400	Epistemic_statement	denotes	Table 4 shows the gap between the real value of remaining time R(S) and its upper bounds.
T131	39678-39847	Epistemic_statement	denotes	From the results, we can observe that the call number in UBG and its two accelerations are significantly reduced compared to that in CELF, especially at the first round.
T132	39848-39985	Epistemic_statement	denotes	One may notice that in Table 5 , CELF occasionally defeats our methods, but the total call number of our methods are much less than CELF.
T133	40265-40362	Epistemic_statement	denotes	From the observation, we can conclude that our UBG is more efficient than CELF on large networks.
T134	41459-41543	Epistemic_statement	denotes	One may argue that such a low improvement of UBG can be neglected in large networks.
T135	42294-42381	Epistemic_statement	denotes	We assign a uniform infection probability ip to each directed link under the DSI Model.
T136	42407-42502	Epistemic_statement	denotes	7 , we can conclude that with the parameter ip growing larger, the CELF is more time-consuming.
T137	43266-43394	Epistemic_statement	denotes	We take infimum operations on both sides of the first step conclusion, i.e., inf T(u, S 1 ∪ S 2 ) = inf T(u, S 1 ) ∪ T(u, S 2 ).
T138	43734-44069	Epistemic_statement	denotes	3) Similarly, in order to prove this property, we only need to , {u}) , i.e., for any t ≥ 0, the probability that t is located in T(u, {v}) equals the probability that t is located in 4) If S ⊂ T, in order to prove this property, we only need to prove T(u, S) ⊂ T(u, T ) and notice the fact that the infimum of a larger set is smaller.
T139	44203-44285	Epistemic_statement	denotes	So, T(u, S) ⊂ T(u, T ), by which we can easily conclude the property, is verified.
T140	44489-44536	Epistemic_statement	denotes	Therefore, we can conclude 0 ≤ (u, S) ≤ T max .
T141	44984-45135	Epistemic_statement	denotes	(A2): G = (V, E) is a directed graph, where each edge e ∈ E is associated with a weight w e which is geometrically distributed with a parameter p, i.e.
T142	45224-45320	Epistemic_statement	denotes	What is the probability of the event that the shortest path from s to t has length T at least k?
T143	45498-45668	Epistemic_statement	denotes	Also, we add a source node s and an edge (s, x i ), associated with weight w (s,x i ) which is geometrically distributed with parameter 1/2, for any i ∈ {i : (i, j) ∈ E}.
T144	45960-46215	Epistemic_statement	denotes	Now we can build a probability-preserving bijection between the valuations of X i , Y j and the subgraphs of the constructed random graph: for a valuation , the corresponding subgraph is the one where each edge adjacent to x i has length 1 iff (X i ) = 1.
T145	46638-47001	Epistemic_statement	denotes	Note that any edge of length greater than 1 is irrelevant as the structure of the graph, which ensures it can never be part of a path of length 3 from s to t, For any path from s to t must jump three times: from s to some x i , from x i to some y j , and from y j to t. Thus, from what we have stated, (A1) can be reducible to (A2), which implies (A2) is #P-hard.
T146	47002-47057	Epistemic_statement	denotes	We aim to solve the problem (denoted by (A)) of calcu-
T147	47058-47210	Epistemic_statement	denotes	In order to prove the random variables T(u, S) and c(u, S) are identically distributed, we need to prove P[T (u, S) = t] = P[c(u, S) = t] for any t > 0.
T148	47211-47274	Epistemic_statement	denotes	On the one hand, we can prove P[T (u, S) = t] ≤ P[c(u, S) = t].
T149	47275-47385	Epistemic_statement	denotes	Given t > 0, we find that for any simulation with T(u, S) = t, a snapshot with c(u, S) = t can be constructed.
T150	48125-48193	Epistemic_statement	denotes	On the other hand, we can also get P[c(u, S) = t] ≤ P[T (u, S) = t].
T151	48194-48278	Epistemic_statement	denotes	For any snapshot with c(u, S) = t, a simulation with T(u, S) = t can be constructed.
T152	48855-48944	Epistemic_statement	denotes	We now can construct a simulation of the diffusion model (I t ) t≥0 = (J t+1 \J t ) t≥0 .
T153	49403-49554	Epistemic_statement	denotes	To address this problem, we proposed an upper bound based greedy (UBG) algorithm and its two accelerative algorithms to cater for lager scale networks.
T154	49792-49840	Epistemic_statement	denotes	There are several interesting future directions.
T155	49841-50012	Epistemic_statement	denotes	First, the detection time D(S) is # P-hard to calculate exactly, it is still a question how to design an efficient algorithm to estimate D(S) with a theoretical guarantee.
T156	50013-50204	Epistemic_statement	denotes	Second, the infection probability ip and prior distribution are predefined in this paper, how to learn these parameters for the DSI model from available cascade data still remains unexplored.

T1

0-69

Epistemic_statement

denotes

Early detection of dynamic harmful cascades in large-scale networks ଝ

T2

80-217

Epistemic_statement

denotes

Quickly detecting harmful cascades in networks can allow us to analyze the causes and prevent further spreading of destructive influence.

T3

371-567

Epistemic_statement

denotes

However, the harmful cascades are usually dynamic (e.g., the cascade initiators and diffusion trajectories can change over the time), which can severely destroy the robustness of selected sensors.

T4

673-830

Epistemic_statement

denotes

Motivated by the observation, in this paper we investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks.

T5

1304-1436

Epistemic_statement

denotes

Based on that, we propose an efficient upper bound based greedy (UBG) algorithm with the theoretical performance guarantee reserved.

T6

1881-2038

Epistemic_statement

denotes

A contagious disease like Severe Acute Respiratory Syndrome (SARS) can spread quickly through a population contact network and lead to an epidemic [12, 34] .

T7

2039-2162

Epistemic_statement

denotes

A computer virus on a few servers can fast spread to other servers or computers in a computer connection network [15, 17] .

T8

2163-2281

Epistemic_statement

denotes

In a similar vein, a rumor started by a few individuals can spread quickly through the online social network [2, 26] .

T9

2282-2473

Epistemic_statement

denotes

It is crucial to detect the harmful cascades as soon as they happen or shortly thereafter, since it allows us to study the causes and prevent further spreading of harmful influence [32, 38] .

T10

2474-2698

Epistemic_statement

denotes

In practice, it is commonly infeasible or unaffordable to monitor all nodes at all times, and therefore a common way to detect cascades is to select important nodes where we can place sensors for monitoring [7, 19, 31, 39] .

T11

2699-2979

Epistemic_statement

denotes

However, existing methods usually viewed the cascade data as static and deterministic, ignoring an important fact that the harmful cascades are usually dynamic and time-variant, in the sense that the cascade initiators and diffusion trajectories can change randomly over the time.

T12

2980-3117

Epistemic_statement

denotes

Actually the dynamic brings new challenges to network monitoring, since it can severely destroy the robustness of selected sensors [22] .

T13

3118-3288

Epistemic_statement

denotes

Meanwhile the existing sensor selection algorithms can only work for small networks and not be scalable well to large networks of the https://doi.org/10.1016/j.jocs.2017.

T14

3360-3533

Epistemic_statement

denotes

Motivated by these observations, in this paper we investigate the scalable solutions to sensor selection problem for early detection of dynamic harmful cascades in networks.

T15

4109-4444

Epistemic_statement

denotes

Under this unified network framework, we specifically propose a new dynamic cascade model to describe harmful cascade diffusions, and define a detection time minimization (DTM) problem S * = argmin |S|=k,S⊆V D(S), where k is a given parameter determined by budget or monitor capacity, S is sensor nodes set, and D(S) is detection time.

T16

4858-5086

Epistemic_statement

denotes

To further address the scalability issue, we propose an efficient upper bound based greedy (UBG) algorithm and its two accelerations Quickest-Path-UBG and Local-Reduction-UBG to cater for different types of large-scale networks.

T17

5307-5505

Epistemic_statement

denotes

Detecting the whole network by monitoring finite sensor nodes has been widely applied to detect water contaminations in a water distribution network [19] and virus outbreaks in a human society [7] .

T18

5632-5760

Epistemic_statement

denotes

However the effects along this idea are commonly unsatisfied, since they ignore the multi-step complexity of network diffusions.

T19

6184-6315

Epistemic_statement

denotes

In their works the data are a set of deterministic scenarios and the random property of diffusion are not taken into consideration.

T20

6385-6575

Epistemic_statement

denotes

In addition to the above two types of methods, another related work [1] set sensors along fixed paths in the network so as to gather sufficient information to locate possible contaminations.

T21

7174-7351

Epistemic_statement

denotes

By contrast, our work is geared to investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks from a model-driven prospective.

T22

7352-7659

Epistemic_statement

denotes

The main difference from previous works is that we take dynamic properties of cascades into consideration: (1) the cascade initiator is dynamic since we do not know who will initiate another harmful diffusion next time, and (2) the diffusion process is dynamic since the propagation trajectory is uncertain.

T23

7660-7789

Epistemic_statement

denotes

Under these two dynamic sources, we aim to accurately and fast select k nodes as sensors for early detection of harmful cascades.

T24

7790-7958

Epistemic_statement

denotes

Besides, as the size of network becomes more and more large-scale, we also need to ensure the effectiveness of the proposed sensor selection methods for large networks.

T25

8513-8631

Epistemic_statement

denotes

• We show the NP-hardness of DTM problem as it can be shown to contain the Set Cover problem as a simple special case.

T26

8844-8913

Epistemic_statement

denotes

• We theoretically establish new upper bounds for the remaining time.

T27

8914-9128

Epistemic_statement

denotes

Based on these bounds, we further propose a new Upper Bound based Greedy (UBG in short) algorithm which can significantly reduce the number of estimations in greedy-based algorithms, especially at the initial step.

T28

9355-9485

Epistemic_statement

denotes

In addition, our methods not only aim at early detection of harmful cascades, but also shed light on a host of other applications.

T29

9486-9639

Epistemic_statement

denotes

For example, the goal of Emerging Topic Detection is to identify emergent topics in a social network, assuming full access to the stream of all postings.

T30

9895-10150

Epistemic_statement

denotes

Another example, an emerging trend in algorithmic stock trading is the use of automatic search through the Web and social networks for pieces of information that can be used in trading decisions before they appear in the more popular news sites [23, 27] .

T31

10256-10358

Epistemic_statement

denotes

The theoretical framework and method established in this paper can also be used in these applications.

T32

10739-10860

Epistemic_statement

denotes

To prune the unnecessary estimation calls, we analysis the upper bounds for the new proposed UBG algorithm in Section 5 .

T33

11596-11805

Epistemic_statement

denotes

By considering operational models for the dynamic diffusion of a harmful item through a network G, represented by a directed graph, we will speak of each individual node as being either infected or uninfected.

T34

11806-12027

Epistemic_statement

denotes

Motivated by the properties of contaminants like rumor and virus, we will focus from now on the progressive case in which nodes can switch from being uninfected to being infected, but do not switch in the other direction.

T35

12166-12450

Epistemic_statement

denotes

Thus, the process can be reviewed with respect to some particular uninfected node v: as time unfolds, more and more of v's neighbors become infected; at some point, this may cause v to become infected, and v's decision may in turn trigger further infections by nodes connected with v.

T36

12451-12658

Epistemic_statement

denotes

Motivated by above observation, we put forward a dynamic susceptible-infected (DSI) model for the harmful cascade spreading, which can be seen as a variant of the common susceptible-infected (SI) model [3] .

T37

12905-13000

Epistemic_statement

denotes

can be seen as a priori knowledge of nodes being infected initially, where u ∈ V (u) = 1 holds.

T38

13294-13417

Epistemic_statement

denotes

Then, at step t + 1, each node u ∈ J t may infect its out-neighbors v ∈ V \J t with an independent probability of ip(u, v).

T39

14330-14549

Epistemic_statement

denotes

Note that the detection time (u, S) can be viewed as a special type of stopping time in stochastic process theory [11] .ᮀ Let the variable X denote the random initiator with distribution , i.e., for every u ∈ V, we have

T40

14550-14656

Epistemic_statement

denotes

The (expected) detection time from a random initiator to one of the selected sensors in S can be define as

T41

14657-14716

Epistemic_statement

denotes

where E is an expectation operator under the cascade model.

T42

14717-14820

Epistemic_statement

denotes

By the conditional expectation, we can calculate the expected detection time D(S) in the following way:

T43

14821-14832

Epistemic_statement

denotes

Indeed, Eq.

T44

14833-14861

Epistemic_statement

denotes

(5) can be reached like this

T45

14862-14921

Epistemic_statement

denotes

where (X) denotes the -algebra generated by variable X. Eq.

T46

15082-15245

Epistemic_statement

denotes

Our goal is to find k nodes as sensors in a network in order to minimize the time until an infection -starting from a random initiator in the network -is detected.

T47

15246-15430

Epistemic_statement

denotes

Formally, we formulate the problem as the following discrete optimization problem: we want to find a subset S * ⊆ V such that |S * | = k and D(S * ) = min{D(S)||S| = k, S ⊆ V}, i. e .,

T48

15431-15501

Epistemic_statement

denotes

where k is a given parameter determined by budget or monitor capacity.

T49

15502-15578

Epistemic_statement

denotes

We call this as detection time minimization problem (DTM problem for short).

T50

15707-15851

Epistemic_statement

denotes

(6), the first issue is how to calculate the objective function D(S) given a sensor set S. This question isn't as easy as its description in Eq.

T51

16034-16227

Epistemic_statement

denotes

Thus, it is not initially obvious that the process is even well-defined, in the sense that it yields the same distribution over outcomes regardless of how we schedule the attempted activations.

T52

16228-16365

Epistemic_statement

denotes

Actually the computation of D(S) is #P-hard, by showing a reduction from the positive partitioned 2-DNF assignment counting problem [9] .

T53

16523-16725

Epistemic_statement

denotes

ᮀ Since it is intractable to exactly compute D(S) on a typically sized graph, a natural idea is to employ Monte-Carlo methods to estimate D(S), which can be implemented in two different ways as follows:

T54

16726-16912

Epistemic_statement

denotes

The expected detection time D(S) is obtained by directly simulating the random process of diffusion triggered by a random node, say u, chosen according to the distribution defined as Eq.

T55

17778-17948

Epistemic_statement

denotes

We can flip all coins a priori to produce a weighted graph G = (V, E, c), where an edge (u, v) is labeled with the time cost c(u, v) -a sample of random variable C(u, v).

T56

18898-19076

Epistemic_statement

denotes

For estimating a specific D(S), the simulation method is faster, because it only needs to examine a small portion of edges while the snapshot method has to examine all the edges.

T57

19515-19660

Epistemic_statement

denotes

(6), as it can be shown to contain the Set Cover problem as a simple special case, then we propose a simple greedy algorithm for the DTM problem.

T58

19829-19918

Epistemic_statement

denotes

., u n }, we wish to know whether there exist k of the subsets whose union is equal to U.

T59

19919-19953

Epistemic_statement

denotes

We can assume that k < n < m here.

T60

19954-20031

Epistemic_statement

denotes

We show that this can be viewed as a special case of the optimal problem (6).

T61

20345-20458

Epistemic_statement

denotes

Define the probability distribution = { (u), u ∈ V} on the knowledge of nodes being initially infected as follows

T62

20459-20576

Epistemic_statement

denotes

The Set Cover problem is equivalent to deciding if there is a set S of k nodes in this bipartite graph with D(S) ≤ 1.

T63

20693-21115

Epistemic_statement

denotes

Monitoring k nodes to detect diffusion initiated from U corresponding to sets in a Set Cover solution results in covering all n nodes corresponding to the ground set U, and if any set S of k nodes has D(S) ≤ 1, then the Set Cover problem must be solvable.ᮀ Since the optimization problem is NP-hard and the network is prohibitively large, we cannot compute the optimum value to verify the actual quality of approximations.

T64

21196-21290

Epistemic_statement

denotes

To make better use of the greedy algorithm, we consider an equivalent optimization problem Eq.

T65

21502-21531

Epistemic_statement

denotes

Theoretically, a non-negative

T66

21532-21553

Epistemic_statement

denotes

for all S ⊆ T. Proof.

T67

22021-22136

Epistemic_statement

denotes

(14) is proven and we thus get that R(S) is submodular.ᮀ By above properties in Theorem 4, the problem given in Eq.

T68

22137-22408

Epistemic_statement

denotes

(9) can be approximated by the greedy algorithm in Algorithm 1 with the set function f : = R. For any submodular and monotone function f with f(∅) =0, the problem of finding a set S of size k that maximizes f(S) can be approximated by the greedy algorithm in Algorithm 1.

T69

22580-22755

Epistemic_statement

denotes

It is shown that the algorithm guarantees an approximation ratio of f(S)/f(S * ) ≥ 1 −1/e, where S is the output of the greedy algorithm and S * is the optimal solution [30] .

T70

22756-22972

Epistemic_statement

denotes

In Greedy(k, R), a thorny problem is that there is no efficient way to compute R(S) given a placement S, and we turn to run snapshots for 10, 000 trials to obtain an accurate estimate of R(S), mentioned in Section 3.

T71

23291-23552

Epistemic_statement

denotes

Hence, in order to improve the efficiency of Greedy(k, R), one can either reduce the number of calls for evaluating R(S), or develop advanced heuristic algorithms which can conduct fast and approximate estimations for R(S) at the expense of accuracy guarantees.

T72

23715-23908

Epistemic_statement

denotes

The principle behind is that the marginal gain of a node in the current iteration cannot be more than that in previous iterations, and thus the number of estimation calls can be greatly pruned.

T73

24159-24399

Epistemic_statement

denotes

In particular, in the first round to establish the initial upper bounds, CELF needs to estimate R({v}) using MC simulations for each node v, leading to N times of MC calls, which is time-consuming, especially when the network is very large.

T74

24400-24600

Epistemic_statement

denotes

The limitation leads to a rather fundamental question that, can we derive an upper bound of R({v}) which can be used to further prune unnecessary detection time estimations (MC calls) in Greedy(k, R)?

T75

24728-24870

Epistemic_statement

denotes

Based on the bound, we propose a new greedy algorithm Upper Bound based Greedy (UBG for short), which outperforms the original CELF algorithm.

T76

24871-24981

Epistemic_statement

denotes

Essentially different from the bounds that derived for the influence spread under the IC model [40] , we here

T77

24982-25036

Epistemic_statement

denotes

In this part, we aim to derive an upper bound of R(v).

T78

25288-25364

Epistemic_statement

denotes

For v ∈ V , the remaining time R(v) under the DSI model can be calculated as

T79

25365-25371

Epistemic_statement

denotes

Proof.

T80

25505-25646

Epistemic_statement

denotes

ᮀ Proposition 2 reveals that we can treat the global remaining time R(v) as a summation of all T max propagation steps of local probabilities

T81

25647-25738

Epistemic_statement

denotes

Based on Proposition 2, a following question is, what is the relationship between two sets,

T82

25739-25753

Epistemic_statement

denotes

Proposition 3.

T83

25805-25885

Epistemic_statement

denotes

For k ≥ 1, by the definition of conditional expectation and DSI model, we obtain

T84

25886-25954

Epistemic_statement

denotes

where the first '≤' is due to 1 {v/ ∈J k−1 } ≤ 1, and the second '≤'

T85

25955-26142

Epistemic_statement

denotes

x i in the above derivation.ᮀ Proposition 3 clearly identifies the ordering relationship between two adjacent elements in the series P u (v ∈ J 0 ), P u (v ∈ J 1 ), P u (v ∈ J 2 ), · · ·.

T86

26484-26542

Epistemic_statement

denotes

Now we can rewrite Proposition 3 by using the matrix form,

T87

26543-26627

Epistemic_statement

denotes

By iteration, we further get that u t ≤ u 0 · (E + IP) t , where E is a unit matrix.

T88

26775-26875

Epistemic_statement

denotes

With the above preparations, we can present the results on upper bound of remaining time as follows,

T89

26876-26945

Epistemic_statement

denotes

where E is a unit matrix and [A] (u,v) means the element at position

T90

26946-27029

Epistemic_statement

denotes

where is a prior distribution on the likelihood of nodes being the infected source.

T91

27030-27101

Epistemic_statement

denotes

We first use a toy example to explain how to calculate the upper bound.

T92

27286-27370

Epistemic_statement

denotes

To overcome the difficulty, we adopt the following procedure to calculate Tmax−1 t=0

T93

27371-27493

Epistemic_statement

denotes

If t ≥1 ≤ T max − 1, it implies that we do not need to calculate (E + IP) t any more when t ≥ t ≥1 , and therefore we have

T94

27494-27546

Epistemic_statement

denotes

where 1 is a N × N matrix with all elements being 1.

T95

27822-27908

Epistemic_statement

denotes

Based on the upper bound, we propose a new UBG algorithm for early outbreak detection.

T96

28062-28246

Epistemic_statement

denotes

However, CELF demands N (the number of nodes in the network) remaining time estimations to establish the initial bounds of marginal increments, which is time expensive on large graphs.

T97

28440-28573

Epistemic_statement

denotes

This way, the nodes are ranked by their upper bound scores which can be used to prune unnecessary calculations in the CELF algorithm.

T98

28610-28748

Epistemic_statement

denotes

Even with the UBG algorithm proposed in Section 5, its running time is still unbearable and may not be suitable for large social networks.

T99

28749-28924

Epistemic_statement

denotes

Although UBG can greatly reduce the number of remaining time estimation calls, each estimation call is very timeconsuming, as it needs produce enough samples and average them.

T100

28925-29091

Epistemic_statement

denotes

Hence a possible alternative to further accelerate UBG is to employ heuristics to approximate the remaining time R(S) or the marginal return R(S ∪ {u}) − R(S) in UBG.

T101

29234-29474

Epistemic_statement

denotes

The experimental results will show that (i) both of them are efficient in terms of running time, and (ii) the former one applies to sparse networks better, while the latter one is more effective in dense networks in terms of detection time.

T102

29584-29653

Epistemic_statement

denotes

(10), the key point of estimating R(S) is how to estimate E[ (u, S)].

T103

29654-29966

Epistemic_statement

denotes

Note that E[ (u, S)] measures the expected time delay of a diffusion initiated from node u propagating to sensor set S. An intuitive idea is that the most likely propagation path should be the quickest path from node u to the set S. Hence, a question is how to measure the quickest path from node u to the set S?

T104

30037-30177

Epistemic_statement

denotes

According to the geometric distribution, if a random variable X is distributed geometrically with a parameter p, it follows that E[X] = 1/p.

T105

30178-30442

Epistemic_statement

denotes

Since the time that a node u spends in infecting its uninfected neighborhood v is distributed geometrically with the parameter ip(u, v), the value 1/ip(u, v) should be the expected time cost that an infection propagates from node u to node v along the edge (u, v).

T106

30696-30841

Epistemic_statement

denotes

(27) Intuitively, d(u, S) denotes the expected time that an infection propagates from u to S along the quickest path in the graph G = (V, E, m) .

T107

30977-31009

Epistemic_statement

denotes

Theoretically we have Theorem 6.

T108

31010-31171

Epistemic_statement

denotes

When the shortest path from u to S in graph G = (V, E, m) is unique, the quantity d(u, S) ∧ T max can be used to approximate the detection time E[ (u, S)], i.e.,

T109

31172-31178

Epistemic_statement

denotes

Proof.

T110

31437-31476

Epistemic_statement

denotes

(28), we have the following derivation,

T111

31477-31520

Epistemic_statement

denotes

Hence we can approximate the remaining time

T112

31521-31611

Epistemic_statement

denotes

rather than the heavy Monte-Carlo estimation MC(S ∪ {u}) in the 09th row of UBG algorithm.

T113

31706-31945

Epistemic_statement

denotes

Note that the reason we call this acceleration as Quickest-Path-UBG rather than Shortest-Path-UBG lies in that, our Quickest-Path-UBG introduces a new measurement to reflect the infection time cost rather than the distance in common sense.

T114

32091-32275

Epistemic_statement

denotes

To facilitate the description, we will approximate the marginal reduction D(S) − D(S ∪ {u}) rather than the marginal return R(S ∪ {u}) − R(S), due to the fact that R(S) = T max − D(S).

T115

32418-32580

Epistemic_statement

denotes

These methods usually looked over the multi-step complexity of network diffusions, and assumed that the infection can propagate ahead only one hop before the end.

T116

32581-32689

Epistemic_statement

denotes

In other words, if a node initiates an infection, the diffusion spread is at most its first-order neighbors.

T117

32690-32735

Epistemic_statement

denotes

We call this as one-hop assumption hereafter.

T118

32973-33116

Epistemic_statement

denotes

Under this observation, we will propose a tractable method upon the one-hop assumption to approximate the marginal reduction D(S) − D(S ∪ {u}).

T119

33117-33554

Epistemic_statement

denotes

We first introduce a useful result in probability theory: if the random variable X i is distributed geometrically with parameter p i for i = 1, · · ·, n and they are independent, then ∧ i=1,· · ·,n X i is distributed geometrically with parameter 1 − According to the characteristic of the DSI model, the time cost that an infected node w spends in infecting its uninfected neighbor v is distributed geometrically with parameter ip(w, v).

T120

33680-33774

Epistemic_statement

denotes

Under the onehop assumption, the expected detection time of w can thus be calculated like this

T121

33775-33896

Epistemic_statement

denotes

If the initially infected node w has no neighbors in sensor set S, the expected detection time of w is defined as T max .

T122

35738-35822

Epistemic_statement

denotes

The Twitter and Epinions data sets can both be obtained from Stanford Datasource 2 .

T123

35823-35954

Epistemic_statement

denotes

Epinions is a general consumer review site where visitors can read reviews about a variety of items to help them decide a purchase.

T124

35955-36072

Epistemic_statement

denotes

The synthetic Small-world data set is the type of graphs in which each node can be reached by a small number of hops.

T125

36933-37066

Epistemic_statement

denotes

A heuristic algorithm which requires any pair of sensors to be at least d hops away, where d is as large as it can choose k monitors.

T126

37864-38159

Epistemic_statement

denotes

We mainly report results on a uniform infection probability of 0.1 assigned to each directed link in the network, i.e., ip(u, v) = 0.1 for any directed edge (u, v) ∈ E. One can refer to the work [14, 33, 36] for learning real values of the parameters {ip(u, v) : (u, v) ∈ E} from available data.

T127

38414-38663

Epistemic_statement

denotes

Here we select 10 nodes with the highest in-degrees in each network as the sensor set S. We can find that Propagation Simulation and Snapshot Simulation release almost the same estimation results, which confirms their equivalence in estimating D(S).

T128

38912-39098

Epistemic_statement

denotes

We can see that the cumulative time cost of Propagation Simulation increases linearly, while that of Snapshot Simulation has a big jump at the first sensor set and then increases slowly.

T129

39099-39310

Epistemic_statement

denotes

The reason behind is that Snapshot Simulation needs to establish numerous snapshots to estimate the first D(S 1 ), and these established snapshots can be reused in the posterior estimations of {D(S i )} 10 i=2 .

T130

39311-39400

Epistemic_statement

denotes

Table 4 shows the gap between the real value of remaining time R(S) and its upper bounds.

T131

39678-39847

Epistemic_statement

denotes

From the results, we can observe that the call number in UBG and its two accelerations are significantly reduced compared to that in CELF, especially at the first round.

T132

39848-39985

Epistemic_statement

denotes

One may notice that in Table 5 , CELF occasionally defeats our methods, but the total call number of our methods are much less than CELF.

T133

40265-40362

Epistemic_statement

denotes

From the observation, we can conclude that our UBG is more efficient than CELF on large networks.

T134

41459-41543

Epistemic_statement

denotes

One may argue that such a low improvement of UBG can be neglected in large networks.

T135

42294-42381

Epistemic_statement

denotes

We assign a uniform infection probability ip to each directed link under the DSI Model.

T136

42407-42502

Epistemic_statement

denotes

7 , we can conclude that with the parameter ip growing larger, the CELF is more time-consuming.

T137

43266-43394

Epistemic_statement

denotes

We take infimum operations on both sides of the first step conclusion, i.e., inf T(u, S 1 ∪ S 2 ) = inf T(u, S 1 ) ∪ T(u, S 2 ).

T138

43734-44069

Epistemic_statement

denotes

3) Similarly, in order to prove this property, we only need to , {u}) , i.e., for any t ≥ 0, the probability that t is located in T(u, {v}) equals the probability that t is located in 4) If S ⊂ T, in order to prove this property, we only need to prove T(u, S) ⊂ T(u, T ) and notice the fact that the infimum of a larger set is smaller.

T139

44203-44285

Epistemic_statement

denotes

So, T(u, S) ⊂ T(u, T ), by which we can easily conclude the property, is verified.

T140

44489-44536

Epistemic_statement

denotes

Therefore, we can conclude 0 ≤ (u, S) ≤ T max .

T141

44984-45135

Epistemic_statement

denotes

(A2): G = (V, E) is a directed graph, where each edge e ∈ E is associated with a weight w e which is geometrically distributed with a parameter p, i.e.

T142

45224-45320

Epistemic_statement

denotes

What is the probability of the event that the shortest path from s to t has length T at least k?

T143

45498-45668

Epistemic_statement

denotes

Also, we add a source node s and an edge (s, x i ), associated with weight w (s,x i ) which is geometrically distributed with parameter 1/2, for any i ∈ {i : (i, j) ∈ E}.

T144

45960-46215

Epistemic_statement

denotes

Now we can build a probability-preserving bijection between the valuations of X i , Y j and the subgraphs of the constructed random graph: for a valuation , the corresponding subgraph is the one where each edge adjacent to x i has length 1 iff (X i ) = 1.

T145

46638-47001

Epistemic_statement

denotes

Note that any edge of length greater than 1 is irrelevant as the structure of the graph, which ensures it can never be part of a path of length 3 from s to t, For any path from s to t must jump three times: from s to some x i , from x i to some y j , and from y j to t. Thus, from what we have stated, (A1) can be reducible to (A2), which implies (A2) is #P-hard.

T146

47002-47057

Epistemic_statement

denotes

We aim to solve the problem (denoted by (A)) of calcu-

T147

47058-47210

Epistemic_statement

denotes

In order to prove the random variables T(u, S) and c(u, S) are identically distributed, we need to prove P[T (u, S) = t] = P[c(u, S) = t] for any t > 0.

T148

47211-47274

Epistemic_statement

denotes

On the one hand, we can prove P[T (u, S) = t] ≤ P[c(u, S) = t].

T149

47275-47385

Epistemic_statement

denotes

Given t > 0, we find that for any simulation with T(u, S) = t, a snapshot with c(u, S) = t can be constructed.

T150

48125-48193

Epistemic_statement

denotes

On the other hand, we can also get P[c(u, S) = t] ≤ P[T (u, S) = t].

T151

48194-48278

Epistemic_statement

denotes

For any snapshot with c(u, S) = t, a simulation with T(u, S) = t can be constructed.

T152

48855-48944

Epistemic_statement

denotes

We now can construct a simulation of the diffusion model (I t ) t≥0 = (J t+1 \J t ) t≥0 .

T153

49403-49554

Epistemic_statement

denotes

To address this problem, we proposed an upper bound based greedy (UBG) algorithm and its two accelerative algorithms to cater for lager scale networks.

T154

49792-49840

Epistemic_statement

denotes

There are several interesting future directions.

T155

49841-50012

Epistemic_statement

denotes

First, the detection time D(S) is # P-hard to calculate exactly, it is still a question how to design an efficient algorithm to estimate D(S) with a theoretical guarantee.

T156

50013-50204

Epistemic_statement

denotes

Second, the infection probability ip and prior distribution are predefined in this paper, how to learn these parameters for the DSI model from available cascade data still remains unexplored.

CORD-19:6e2db98aa242e3e84116afdfad5e250943c55ac5 JSON TXT 12 Projects

Annnotations TAB TSV DIC JSON TextAE-old TextAE

CORD-19:6e2db98aa242e3e84116afdfad5e250943c55ac5 JSONTXT 12 Projects

Annnotations TAB TSV DIC JSON TextAE-old TextAE

CORD-19:6e2db98aa242e3e84116afdfad5e250943c55ac5 JSON TXT 12 Projects