Id |
Subject |
Object |
Predicate |
Lexical cue |
T1 |
0-69 |
Epistemic_statement |
denotes |
Early detection of dynamic harmful cascades in large-scale networks ଝ |
T2 |
80-217 |
Epistemic_statement |
denotes |
Quickly detecting harmful cascades in networks can allow us to analyze the causes and prevent further spreading of destructive influence. |
T3 |
371-567 |
Epistemic_statement |
denotes |
However, the harmful cascades are usually dynamic (e.g., the cascade initiators and diffusion trajectories can change over the time), which can severely destroy the robustness of selected sensors. |
T4 |
673-830 |
Epistemic_statement |
denotes |
Motivated by the observation, in this paper we investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks. |
T5 |
1304-1436 |
Epistemic_statement |
denotes |
Based on that, we propose an efficient upper bound based greedy (UBG) algorithm with the theoretical performance guarantee reserved. |
T6 |
1881-2038 |
Epistemic_statement |
denotes |
A contagious disease like Severe Acute Respiratory Syndrome (SARS) can spread quickly through a population contact network and lead to an epidemic [12, 34] . |
T7 |
2039-2162 |
Epistemic_statement |
denotes |
A computer virus on a few servers can fast spread to other servers or computers in a computer connection network [15, 17] . |
T8 |
2163-2281 |
Epistemic_statement |
denotes |
In a similar vein, a rumor started by a few individuals can spread quickly through the online social network [2, 26] . |
T9 |
2282-2473 |
Epistemic_statement |
denotes |
It is crucial to detect the harmful cascades as soon as they happen or shortly thereafter, since it allows us to study the causes and prevent further spreading of harmful influence [32, 38] . |
T10 |
2474-2698 |
Epistemic_statement |
denotes |
In practice, it is commonly infeasible or unaffordable to monitor all nodes at all times, and therefore a common way to detect cascades is to select important nodes where we can place sensors for monitoring [7, 19, 31, 39] . |
T11 |
2699-2979 |
Epistemic_statement |
denotes |
However, existing methods usually viewed the cascade data as static and deterministic, ignoring an important fact that the harmful cascades are usually dynamic and time-variant, in the sense that the cascade initiators and diffusion trajectories can change randomly over the time. |
T12 |
2980-3117 |
Epistemic_statement |
denotes |
Actually the dynamic brings new challenges to network monitoring, since it can severely destroy the robustness of selected sensors [22] . |
T13 |
3118-3288 |
Epistemic_statement |
denotes |
Meanwhile the existing sensor selection algorithms can only work for small networks and not be scalable well to large networks of the https://doi.org/10.1016/j.jocs.2017. |
T14 |
3360-3533 |
Epistemic_statement |
denotes |
Motivated by these observations, in this paper we investigate the scalable solutions to sensor selection problem for early detection of dynamic harmful cascades in networks. |
T15 |
4109-4444 |
Epistemic_statement |
denotes |
Under this unified network framework, we specifically propose a new dynamic cascade model to describe harmful cascade diffusions, and define a detection time minimization (DTM) problem S * = argmin |S|=k,S⊆V D(S), where k is a given parameter determined by budget or monitor capacity, S is sensor nodes set, and D(S) is detection time. |
T16 |
4858-5086 |
Epistemic_statement |
denotes |
To further address the scalability issue, we propose an efficient upper bound based greedy (UBG) algorithm and its two accelerations Quickest-Path-UBG and Local-Reduction-UBG to cater for different types of large-scale networks. |
T17 |
5307-5505 |
Epistemic_statement |
denotes |
Detecting the whole network by monitoring finite sensor nodes has been widely applied to detect water contaminations in a water distribution network [19] and virus outbreaks in a human society [7] . |
T18 |
5632-5760 |
Epistemic_statement |
denotes |
However the effects along this idea are commonly unsatisfied, since they ignore the multi-step complexity of network diffusions. |
T19 |
6184-6315 |
Epistemic_statement |
denotes |
In their works the data are a set of deterministic scenarios and the random property of diffusion are not taken into consideration. |
T20 |
6385-6575 |
Epistemic_statement |
denotes |
In addition to the above two types of methods, another related work [1] set sensors along fixed paths in the network so as to gather sufficient information to locate possible contaminations. |
T21 |
7174-7351 |
Epistemic_statement |
denotes |
By contrast, our work is geared to investigate the scalable sensor selection problem for early detection of dynamic harmful cascades in networks from a model-driven prospective. |
T22 |
7352-7659 |
Epistemic_statement |
denotes |
The main difference from previous works is that we take dynamic properties of cascades into consideration: (1) the cascade initiator is dynamic since we do not know who will initiate another harmful diffusion next time, and (2) the diffusion process is dynamic since the propagation trajectory is uncertain. |
T23 |
7660-7789 |
Epistemic_statement |
denotes |
Under these two dynamic sources, we aim to accurately and fast select k nodes as sensors for early detection of harmful cascades. |
T24 |
7790-7958 |
Epistemic_statement |
denotes |
Besides, as the size of network becomes more and more large-scale, we also need to ensure the effectiveness of the proposed sensor selection methods for large networks. |
T25 |
8513-8631 |
Epistemic_statement |
denotes |
• We show the NP-hardness of DTM problem as it can be shown to contain the Set Cover problem as a simple special case. |
T26 |
8844-8913 |
Epistemic_statement |
denotes |
• We theoretically establish new upper bounds for the remaining time. |
T27 |
8914-9128 |
Epistemic_statement |
denotes |
Based on these bounds, we further propose a new Upper Bound based Greedy (UBG in short) algorithm which can significantly reduce the number of estimations in greedy-based algorithms, especially at the initial step. |
T28 |
9355-9485 |
Epistemic_statement |
denotes |
In addition, our methods not only aim at early detection of harmful cascades, but also shed light on a host of other applications. |
T29 |
9486-9639 |
Epistemic_statement |
denotes |
For example, the goal of Emerging Topic Detection is to identify emergent topics in a social network, assuming full access to the stream of all postings. |
T30 |
9895-10150 |
Epistemic_statement |
denotes |
Another example, an emerging trend in algorithmic stock trading is the use of automatic search through the Web and social networks for pieces of information that can be used in trading decisions before they appear in the more popular news sites [23, 27] . |
T31 |
10256-10358 |
Epistemic_statement |
denotes |
The theoretical framework and method established in this paper can also be used in these applications. |
T32 |
10739-10860 |
Epistemic_statement |
denotes |
To prune the unnecessary estimation calls, we analysis the upper bounds for the new proposed UBG algorithm in Section 5 . |
T33 |
11596-11805 |
Epistemic_statement |
denotes |
By considering operational models for the dynamic diffusion of a harmful item through a network G, represented by a directed graph, we will speak of each individual node as being either infected or uninfected. |
T34 |
11806-12027 |
Epistemic_statement |
denotes |
Motivated by the properties of contaminants like rumor and virus, we will focus from now on the progressive case in which nodes can switch from being uninfected to being infected, but do not switch in the other direction. |
T35 |
12166-12450 |
Epistemic_statement |
denotes |
Thus, the process can be reviewed with respect to some particular uninfected node v: as time unfolds, more and more of v's neighbors become infected; at some point, this may cause v to become infected, and v's decision may in turn trigger further infections by nodes connected with v. |
T36 |
12451-12658 |
Epistemic_statement |
denotes |
Motivated by above observation, we put forward a dynamic susceptible-infected (DSI) model for the harmful cascade spreading, which can be seen as a variant of the common susceptible-infected (SI) model [3] . |
T37 |
12905-13000 |
Epistemic_statement |
denotes |
can be seen as a priori knowledge of nodes being infected initially, where u ∈ V (u) = 1 holds. |
T38 |
13294-13417 |
Epistemic_statement |
denotes |
Then, at step t + 1, each node u ∈ J t may infect its out-neighbors v ∈ V \J t with an independent probability of ip(u, v). |
T39 |
14330-14549 |
Epistemic_statement |
denotes |
Note that the detection time (u, S) can be viewed as a special type of stopping time in stochastic process theory [11] .ᮀ Let the variable X denote the random initiator with distribution , i.e., for every u ∈ V, we have |
T40 |
14550-14656 |
Epistemic_statement |
denotes |
The (expected) detection time from a random initiator to one of the selected sensors in S can be define as |
T41 |
14657-14716 |
Epistemic_statement |
denotes |
where E is an expectation operator under the cascade model. |
T42 |
14717-14820 |
Epistemic_statement |
denotes |
By the conditional expectation, we can calculate the expected detection time D(S) in the following way: |
T43 |
14821-14832 |
Epistemic_statement |
denotes |
Indeed, Eq. |
T44 |
14833-14861 |
Epistemic_statement |
denotes |
(5) can be reached like this |
T45 |
14862-14921 |
Epistemic_statement |
denotes |
where (X) denotes the -algebra generated by variable X. Eq. |
T46 |
15082-15245 |
Epistemic_statement |
denotes |
Our goal is to find k nodes as sensors in a network in order to minimize the time until an infection -starting from a random initiator in the network -is detected. |
T47 |
15246-15430 |
Epistemic_statement |
denotes |
Formally, we formulate the problem as the following discrete optimization problem: we want to find a subset S * ⊆ V such that |S * | = k and D(S * ) = min{D(S)||S| = k, S ⊆ V}, i. e ., |
T48 |
15431-15501 |
Epistemic_statement |
denotes |
where k is a given parameter determined by budget or monitor capacity. |
T49 |
15502-15578 |
Epistemic_statement |
denotes |
We call this as detection time minimization problem (DTM problem for short). |
T50 |
15707-15851 |
Epistemic_statement |
denotes |
(6), the first issue is how to calculate the objective function D(S) given a sensor set S. This question isn't as easy as its description in Eq. |
T51 |
16034-16227 |
Epistemic_statement |
denotes |
Thus, it is not initially obvious that the process is even well-defined, in the sense that it yields the same distribution over outcomes regardless of how we schedule the attempted activations. |
T52 |
16228-16365 |
Epistemic_statement |
denotes |
Actually the computation of D(S) is #P-hard, by showing a reduction from the positive partitioned 2-DNF assignment counting problem [9] . |
T53 |
16523-16725 |
Epistemic_statement |
denotes |
ᮀ Since it is intractable to exactly compute D(S) on a typically sized graph, a natural idea is to employ Monte-Carlo methods to estimate D(S), which can be implemented in two different ways as follows: |
T54 |
16726-16912 |
Epistemic_statement |
denotes |
The expected detection time D(S) is obtained by directly simulating the random process of diffusion triggered by a random node, say u, chosen according to the distribution defined as Eq. |
T55 |
17778-17948 |
Epistemic_statement |
denotes |
We can flip all coins a priori to produce a weighted graph G = (V, E, c), where an edge (u, v) is labeled with the time cost c(u, v) -a sample of random variable C(u, v). |
T56 |
18898-19076 |
Epistemic_statement |
denotes |
For estimating a specific D(S), the simulation method is faster, because it only needs to examine a small portion of edges while the snapshot method has to examine all the edges. |
T57 |
19515-19660 |
Epistemic_statement |
denotes |
(6), as it can be shown to contain the Set Cover problem as a simple special case, then we propose a simple greedy algorithm for the DTM problem. |
T58 |
19829-19918 |
Epistemic_statement |
denotes |
., u n }, we wish to know whether there exist k of the subsets whose union is equal to U. |
T59 |
19919-19953 |
Epistemic_statement |
denotes |
We can assume that k < n < m here. |
T60 |
19954-20031 |
Epistemic_statement |
denotes |
We show that this can be viewed as a special case of the optimal problem (6). |
T61 |
20345-20458 |
Epistemic_statement |
denotes |
Define the probability distribution = { (u), u ∈ V} on the knowledge of nodes being initially infected as follows |
T62 |
20459-20576 |
Epistemic_statement |
denotes |
The Set Cover problem is equivalent to deciding if there is a set S of k nodes in this bipartite graph with D(S) ≤ 1. |
T63 |
20693-21115 |
Epistemic_statement |
denotes |
Monitoring k nodes to detect diffusion initiated from U corresponding to sets in a Set Cover solution results in covering all n nodes corresponding to the ground set U, and if any set S of k nodes has D(S) ≤ 1, then the Set Cover problem must be solvable.ᮀ Since the optimization problem is NP-hard and the network is prohibitively large, we cannot compute the optimum value to verify the actual quality of approximations. |
T64 |
21196-21290 |
Epistemic_statement |
denotes |
To make better use of the greedy algorithm, we consider an equivalent optimization problem Eq. |
T65 |
21502-21531 |
Epistemic_statement |
denotes |
Theoretically, a non-negative |
T66 |
21532-21553 |
Epistemic_statement |
denotes |
for all S ⊆ T. Proof. |
T67 |
22021-22136 |
Epistemic_statement |
denotes |
(14) is proven and we thus get that R(S) is submodular.ᮀ By above properties in Theorem 4, the problem given in Eq. |
T68 |
22137-22408 |
Epistemic_statement |
denotes |
(9) can be approximated by the greedy algorithm in Algorithm 1 with the set function f : = R. For any submodular and monotone function f with f(∅) =0, the problem of finding a set S of size k that maximizes f(S) can be approximated by the greedy algorithm in Algorithm 1. |
T69 |
22580-22755 |
Epistemic_statement |
denotes |
It is shown that the algorithm guarantees an approximation ratio of f(S)/f(S * ) ≥ 1 −1/e, where S is the output of the greedy algorithm and S * is the optimal solution [30] . |
T70 |
22756-22972 |
Epistemic_statement |
denotes |
In Greedy(k, R), a thorny problem is that there is no efficient way to compute R(S) given a placement S, and we turn to run snapshots for 10, 000 trials to obtain an accurate estimate of R(S), mentioned in Section 3. |
T71 |
23291-23552 |
Epistemic_statement |
denotes |
Hence, in order to improve the efficiency of Greedy(k, R), one can either reduce the number of calls for evaluating R(S), or develop advanced heuristic algorithms which can conduct fast and approximate estimations for R(S) at the expense of accuracy guarantees. |
T72 |
23715-23908 |
Epistemic_statement |
denotes |
The principle behind is that the marginal gain of a node in the current iteration cannot be more than that in previous iterations, and thus the number of estimation calls can be greatly pruned. |
T73 |
24159-24399 |
Epistemic_statement |
denotes |
In particular, in the first round to establish the initial upper bounds, CELF needs to estimate R({v}) using MC simulations for each node v, leading to N times of MC calls, which is time-consuming, especially when the network is very large. |
T74 |
24400-24600 |
Epistemic_statement |
denotes |
The limitation leads to a rather fundamental question that, can we derive an upper bound of R({v}) which can be used to further prune unnecessary detection time estimations (MC calls) in Greedy(k, R)? |
T75 |
24728-24870 |
Epistemic_statement |
denotes |
Based on the bound, we propose a new greedy algorithm Upper Bound based Greedy (UBG for short), which outperforms the original CELF algorithm. |
T76 |
24871-24981 |
Epistemic_statement |
denotes |
Essentially different from the bounds that derived for the influence spread under the IC model [40] , we here |
T77 |
24982-25036 |
Epistemic_statement |
denotes |
In this part, we aim to derive an upper bound of R(v). |
T78 |
25288-25364 |
Epistemic_statement |
denotes |
For v ∈ V , the remaining time R(v) under the DSI model can be calculated as |
T79 |
25365-25371 |
Epistemic_statement |
denotes |
Proof. |
T80 |
25505-25646 |
Epistemic_statement |
denotes |
ᮀ Proposition 2 reveals that we can treat the global remaining time R(v) as a summation of all T max propagation steps of local probabilities |
T81 |
25647-25738 |
Epistemic_statement |
denotes |
Based on Proposition 2, a following question is, what is the relationship between two sets, |
T82 |
25739-25753 |
Epistemic_statement |
denotes |
Proposition 3. |
T83 |
25805-25885 |
Epistemic_statement |
denotes |
For k ≥ 1, by the definition of conditional expectation and DSI model, we obtain |
T84 |
25886-25954 |
Epistemic_statement |
denotes |
where the first '≤' is due to 1 {v/ ∈J k−1 } ≤ 1, and the second '≤' |
T85 |
25955-26142 |
Epistemic_statement |
denotes |
x i in the above derivation.ᮀ Proposition 3 clearly identifies the ordering relationship between two adjacent elements in the series P u (v ∈ J 0 ), P u (v ∈ J 1 ), P u (v ∈ J 2 ), · · ·. |
T86 |
26484-26542 |
Epistemic_statement |
denotes |
Now we can rewrite Proposition 3 by using the matrix form, |
T87 |
26543-26627 |
Epistemic_statement |
denotes |
By iteration, we further get that u t ≤ u 0 · (E + IP) t , where E is a unit matrix. |
T88 |
26775-26875 |
Epistemic_statement |
denotes |
With the above preparations, we can present the results on upper bound of remaining time as follows, |
T89 |
26876-26945 |
Epistemic_statement |
denotes |
where E is a unit matrix and [A] (u,v) means the element at position |
T90 |
26946-27029 |
Epistemic_statement |
denotes |
where is a prior distribution on the likelihood of nodes being the infected source. |
T91 |
27030-27101 |
Epistemic_statement |
denotes |
We first use a toy example to explain how to calculate the upper bound. |
T92 |
27286-27370 |
Epistemic_statement |
denotes |
To overcome the difficulty, we adopt the following procedure to calculate Tmax−1 t=0 |
T93 |
27371-27493 |
Epistemic_statement |
denotes |
If t ≥1 ≤ T max − 1, it implies that we do not need to calculate (E + IP) t any more when t ≥ t ≥1 , and therefore we have |
T94 |
27494-27546 |
Epistemic_statement |
denotes |
where 1 is a N × N matrix with all elements being 1. |
T95 |
27822-27908 |
Epistemic_statement |
denotes |
Based on the upper bound, we propose a new UBG algorithm for early outbreak detection. |
T96 |
28062-28246 |
Epistemic_statement |
denotes |
However, CELF demands N (the number of nodes in the network) remaining time estimations to establish the initial bounds of marginal increments, which is time expensive on large graphs. |
T97 |
28440-28573 |
Epistemic_statement |
denotes |
This way, the nodes are ranked by their upper bound scores which can be used to prune unnecessary calculations in the CELF algorithm. |
T98 |
28610-28748 |
Epistemic_statement |
denotes |
Even with the UBG algorithm proposed in Section 5, its running time is still unbearable and may not be suitable for large social networks. |
T99 |
28749-28924 |
Epistemic_statement |
denotes |
Although UBG can greatly reduce the number of remaining time estimation calls, each estimation call is very timeconsuming, as it needs produce enough samples and average them. |
T100 |
28925-29091 |
Epistemic_statement |
denotes |
Hence a possible alternative to further accelerate UBG is to employ heuristics to approximate the remaining time R(S) or the marginal return R(S ∪ {u}) − R(S) in UBG. |
T101 |
29234-29474 |
Epistemic_statement |
denotes |
The experimental results will show that (i) both of them are efficient in terms of running time, and (ii) the former one applies to sparse networks better, while the latter one is more effective in dense networks in terms of detection time. |
T102 |
29584-29653 |
Epistemic_statement |
denotes |
(10), the key point of estimating R(S) is how to estimate E[ (u, S)]. |
T103 |
29654-29966 |
Epistemic_statement |
denotes |
Note that E[ (u, S)] measures the expected time delay of a diffusion initiated from node u propagating to sensor set S. An intuitive idea is that the most likely propagation path should be the quickest path from node u to the set S. Hence, a question is how to measure the quickest path from node u to the set S? |
T104 |
30037-30177 |
Epistemic_statement |
denotes |
According to the geometric distribution, if a random variable X is distributed geometrically with a parameter p, it follows that E[X] = 1/p. |
T105 |
30178-30442 |
Epistemic_statement |
denotes |
Since the time that a node u spends in infecting its uninfected neighborhood v is distributed geometrically with the parameter ip(u, v), the value 1/ip(u, v) should be the expected time cost that an infection propagates from node u to node v along the edge (u, v). |
T106 |
30696-30841 |
Epistemic_statement |
denotes |
(27) Intuitively, d(u, S) denotes the expected time that an infection propagates from u to S along the quickest path in the graph G = (V, E, m) . |
T107 |
30977-31009 |
Epistemic_statement |
denotes |
Theoretically we have Theorem 6. |
T108 |
31010-31171 |
Epistemic_statement |
denotes |
When the shortest path from u to S in graph G = (V, E, m) is unique, the quantity d(u, S) ∧ T max can be used to approximate the detection time E[ (u, S)], i.e., |
T109 |
31172-31178 |
Epistemic_statement |
denotes |
Proof. |
T110 |
31437-31476 |
Epistemic_statement |
denotes |
(28), we have the following derivation, |
T111 |
31477-31520 |
Epistemic_statement |
denotes |
Hence we can approximate the remaining time |
T112 |
31521-31611 |
Epistemic_statement |
denotes |
rather than the heavy Monte-Carlo estimation MC(S ∪ {u}) in the 09th row of UBG algorithm. |
T113 |
31706-31945 |
Epistemic_statement |
denotes |
Note that the reason we call this acceleration as Quickest-Path-UBG rather than Shortest-Path-UBG lies in that, our Quickest-Path-UBG introduces a new measurement to reflect the infection time cost rather than the distance in common sense. |
T114 |
32091-32275 |
Epistemic_statement |
denotes |
To facilitate the description, we will approximate the marginal reduction D(S) − D(S ∪ {u}) rather than the marginal return R(S ∪ {u}) − R(S), due to the fact that R(S) = T max − D(S). |
T115 |
32418-32580 |
Epistemic_statement |
denotes |
These methods usually looked over the multi-step complexity of network diffusions, and assumed that the infection can propagate ahead only one hop before the end. |
T116 |
32581-32689 |
Epistemic_statement |
denotes |
In other words, if a node initiates an infection, the diffusion spread is at most its first-order neighbors. |
T117 |
32690-32735 |
Epistemic_statement |
denotes |
We call this as one-hop assumption hereafter. |
T118 |
32973-33116 |
Epistemic_statement |
denotes |
Under this observation, we will propose a tractable method upon the one-hop assumption to approximate the marginal reduction D(S) − D(S ∪ {u}). |
T119 |
33117-33554 |
Epistemic_statement |
denotes |
We first introduce a useful result in probability theory: if the random variable X i is distributed geometrically with parameter p i for i = 1, · · ·, n and they are independent, then ∧ i=1,· · ·,n X i is distributed geometrically with parameter 1 − According to the characteristic of the DSI model, the time cost that an infected node w spends in infecting its uninfected neighbor v is distributed geometrically with parameter ip(w, v). |
T120 |
33680-33774 |
Epistemic_statement |
denotes |
Under the onehop assumption, the expected detection time of w can thus be calculated like this |
T121 |
33775-33896 |
Epistemic_statement |
denotes |
If the initially infected node w has no neighbors in sensor set S, the expected detection time of w is defined as T max . |
T122 |
35738-35822 |
Epistemic_statement |
denotes |
The Twitter and Epinions data sets can both be obtained from Stanford Datasource 2 . |
T123 |
35823-35954 |
Epistemic_statement |
denotes |
Epinions is a general consumer review site where visitors can read reviews about a variety of items to help them decide a purchase. |
T124 |
35955-36072 |
Epistemic_statement |
denotes |
The synthetic Small-world data set is the type of graphs in which each node can be reached by a small number of hops. |
T125 |
36933-37066 |
Epistemic_statement |
denotes |
A heuristic algorithm which requires any pair of sensors to be at least d hops away, where d is as large as it can choose k monitors. |
T126 |
37864-38159 |
Epistemic_statement |
denotes |
We mainly report results on a uniform infection probability of 0.1 assigned to each directed link in the network, i.e., ip(u, v) = 0.1 for any directed edge (u, v) ∈ E. One can refer to the work [14, 33, 36] for learning real values of the parameters {ip(u, v) : (u, v) ∈ E} from available data. |
T127 |
38414-38663 |
Epistemic_statement |
denotes |
Here we select 10 nodes with the highest in-degrees in each network as the sensor set S. We can find that Propagation Simulation and Snapshot Simulation release almost the same estimation results, which confirms their equivalence in estimating D(S). |
T128 |
38912-39098 |
Epistemic_statement |
denotes |
We can see that the cumulative time cost of Propagation Simulation increases linearly, while that of Snapshot Simulation has a big jump at the first sensor set and then increases slowly. |
T129 |
39099-39310 |
Epistemic_statement |
denotes |
The reason behind is that Snapshot Simulation needs to establish numerous snapshots to estimate the first D(S 1 ), and these established snapshots can be reused in the posterior estimations of {D(S i )} 10 i=2 . |
T130 |
39311-39400 |
Epistemic_statement |
denotes |
Table 4 shows the gap between the real value of remaining time R(S) and its upper bounds. |
T131 |
39678-39847 |
Epistemic_statement |
denotes |
From the results, we can observe that the call number in UBG and its two accelerations are significantly reduced compared to that in CELF, especially at the first round. |
T132 |
39848-39985 |
Epistemic_statement |
denotes |
One may notice that in Table 5 , CELF occasionally defeats our methods, but the total call number of our methods are much less than CELF. |
T133 |
40265-40362 |
Epistemic_statement |
denotes |
From the observation, we can conclude that our UBG is more efficient than CELF on large networks. |
T134 |
41459-41543 |
Epistemic_statement |
denotes |
One may argue that such a low improvement of UBG can be neglected in large networks. |
T135 |
42294-42381 |
Epistemic_statement |
denotes |
We assign a uniform infection probability ip to each directed link under the DSI Model. |
T136 |
42407-42502 |
Epistemic_statement |
denotes |
7 , we can conclude that with the parameter ip growing larger, the CELF is more time-consuming. |
T137 |
43266-43394 |
Epistemic_statement |
denotes |
We take infimum operations on both sides of the first step conclusion, i.e., inf T(u, S 1 ∪ S 2 ) = inf T(u, S 1 ) ∪ T(u, S 2 ). |
T138 |
43734-44069 |
Epistemic_statement |
denotes |
3) Similarly, in order to prove this property, we only need to , {u}) , i.e., for any t ≥ 0, the probability that t is located in T(u, {v}) equals the probability that t is located in 4) If S ⊂ T, in order to prove this property, we only need to prove T(u, S) ⊂ T(u, T ) and notice the fact that the infimum of a larger set is smaller. |
T139 |
44203-44285 |
Epistemic_statement |
denotes |
So, T(u, S) ⊂ T(u, T ), by which we can easily conclude the property, is verified. |
T140 |
44489-44536 |
Epistemic_statement |
denotes |
Therefore, we can conclude 0 ≤ (u, S) ≤ T max . |
T141 |
44984-45135 |
Epistemic_statement |
denotes |
(A2): G = (V, E) is a directed graph, where each edge e ∈ E is associated with a weight w e which is geometrically distributed with a parameter p, i.e. |
T142 |
45224-45320 |
Epistemic_statement |
denotes |
What is the probability of the event that the shortest path from s to t has length T at least k? |
T143 |
45498-45668 |
Epistemic_statement |
denotes |
Also, we add a source node s and an edge (s, x i ), associated with weight w (s,x i ) which is geometrically distributed with parameter 1/2, for any i ∈ {i : (i, j) ∈ E}. |
T144 |
45960-46215 |
Epistemic_statement |
denotes |
Now we can build a probability-preserving bijection between the valuations of X i , Y j and the subgraphs of the constructed random graph: for a valuation , the corresponding subgraph is the one where each edge adjacent to x i has length 1 iff (X i ) = 1. |
T145 |
46638-47001 |
Epistemic_statement |
denotes |
Note that any edge of length greater than 1 is irrelevant as the structure of the graph, which ensures it can never be part of a path of length 3 from s to t, For any path from s to t must jump three times: from s to some x i , from x i to some y j , and from y j to t. Thus, from what we have stated, (A1) can be reducible to (A2), which implies (A2) is #P-hard. |
T146 |
47002-47057 |
Epistemic_statement |
denotes |
We aim to solve the problem (denoted by (A)) of calcu- |
T147 |
47058-47210 |
Epistemic_statement |
denotes |
In order to prove the random variables T(u, S) and c(u, S) are identically distributed, we need to prove P[T (u, S) = t] = P[c(u, S) = t] for any t > 0. |
T148 |
47211-47274 |
Epistemic_statement |
denotes |
On the one hand, we can prove P[T (u, S) = t] ≤ P[c(u, S) = t]. |
T149 |
47275-47385 |
Epistemic_statement |
denotes |
Given t > 0, we find that for any simulation with T(u, S) = t, a snapshot with c(u, S) = t can be constructed. |
T150 |
48125-48193 |
Epistemic_statement |
denotes |
On the other hand, we can also get P[c(u, S) = t] ≤ P[T (u, S) = t]. |
T151 |
48194-48278 |
Epistemic_statement |
denotes |
For any snapshot with c(u, S) = t, a simulation with T(u, S) = t can be constructed. |
T152 |
48855-48944 |
Epistemic_statement |
denotes |
We now can construct a simulation of the diffusion model (I t ) t≥0 = (J t+1 \J t ) t≥0 . |
T153 |
49403-49554 |
Epistemic_statement |
denotes |
To address this problem, we proposed an upper bound based greedy (UBG) algorithm and its two accelerative algorithms to cater for lager scale networks. |
T154 |
49792-49840 |
Epistemic_statement |
denotes |
There are several interesting future directions. |
T155 |
49841-50012 |
Epistemic_statement |
denotes |
First, the detection time D(S) is # P-hard to calculate exactly, it is still a question how to design an efficient algorithm to estimate D(S) with a theoretical guarantee. |
T156 |
50013-50204 |
Epistemic_statement |
denotes |
Second, the infection probability ip and prior distribution are predefined in this paper, how to learn these parameters for the DSI model from available cascade data still remains unexplored. |