Spatial Visualization of Cluster-Specific COVID-19 Transmission Network in South Korea During the Early Epidemic Phase
Abstract
Coronavirus disease 2019 (COVID-19) has been rapidly spreading throughout China and other countries, including South Korea. As of March 12, 2020, a total of 7,869 cases and 66 deaths had been documented in South Korea.
Using spatial visualization, this paper identifies two early transmission clusters in South Korea (the Daegu cluster and the capital area cluster). Using a degree-weighted centrality measure, this paper identifies potential super-spreaders of the virus in the visualized clusters.
Compared with epidemiological measures such as the basic reproduction number, spatial visualizations of cluster-specific transmission networks and the proposed centrality measure may be more useful for characterizing super-spreaders and the spread of the virus, especially in the early epidemic phase.
The first pneumonia cases of unknown origin were identified in Wuhan in early December 2019, and COVID-19 has since spread rapidly throughout China and other countries, including South Korea. As of March 17, 2020, a total of 198,181 laboratory-confirmed cases and 7,965 deaths had been documented globally. The World Health Organization (WHO) has declared the COVID-19 outbreak a Public Health Emergency of International Concern.2 The first confirmed patients in South Korea had either visited or come from China. Secondary and tertiary transmissions have occurred since then, leading to an accelerating rate of transmission in South Korea. As of March 17, 2020, a total of 8,320 cases and 81 deaths had been documented in South Korea.
With the launch of a COVID-19 data hub, officials from the White House and other national organizations issued a call to action for researchers in a multitude of disciplines such as computer science, epidemiology, economics, and statistics. Open-access resources such as epidemiological data, interactive web-based dashboards, and descriptive statistics have informed many about the current state of the pandemic.3,4 In a concomitant effort to combat the virus and better understand its etiology, the Korea Centers for Disease Control and Prevention (KCDC), an agency under the South Korean Ministry of Health and Welfare, has made available online many datasets specific to confirmed COVID-19 cases in South Korea.5 The datasets include only confirmed COVID-19 patients, with unique numeric patient identifiers, geographical data, and infection information where available. In the epidemiological dataset, the KCDC released the region of each affected patient, the identifier of the person who infected the patient, and the number of contacts with other people. The aim of this report is to create spatial visualizations of early COVID-19 transmission networks in South Korea using these data, which may reveal transmission patterns within each network.

medRxiv preprint doi: https://doi.org/10.1101/2020.03.18.20038638. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
The time series data on COVID-19 in South Korea are analyzed to provide updated statistics. Using a spatial visualization of patients confirmed during the early epidemic phase, two major clusters are identified. As of March 12, 7,869 positive cases had been documented in South Korea, of which 70 include the identifier of the person who infected them.
Although the first confirmed case in South Korea was identified on January 20, 2020, the number of confirmed cases began to grow rapidly on February 19, 2020, subsequently reaching a total of 1,261 cases and 12 deaths according to the KCDC.6
As of March, newly reported data from South Korea suggest that the numbers of new positive cases and deaths are declining and that new cases remain within known clusters. Therefore, identifying the early clusters and examining the cases confirmed within them from January 20, 2020 to February 19, 2020 is crucial, because these clusters remain the longest-lasting sources of transmission. Of the 70 patients with a known infector, only the subset infected by cases confirmed during the early epidemic phase (January 20, 2020 to February 19, 2020) is used to construct the network from the epidemiological data and to visualize the transmission networks of the two clusters. All analyses and visualizations are performed using the ggplot2 package in R and Cytoscape.7,8
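The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline (which used R and Cytoscape). The field names `patient_id`, `infected_by` and `confirmed`, the helper functions, and the records themselves are all hypothetical. The sketch assumes that an infector-to-infectee edge is kept when the infector was confirmed between January 20 and February 19, 2020, and it takes the clusters to be the connected components of the undirected transmission graph.

```python
from collections import defaultdict, deque
from datetime import date

EARLY_START = date(2020, 1, 20)  # first confirmed case in South Korea
EARLY_END = date(2020, 2, 19)    # end of the early epidemic phase

# Hypothetical records mimicking the KCDC fields (not real KCDC data).
records = [
    {"patient_id": 1, "infected_by": None, "confirmed": date(2020, 1, 20)},
    {"patient_id": 2, "infected_by": 1, "confirmed": date(2020, 1, 30)},
    {"patient_id": 3, "infected_by": 2, "confirmed": date(2020, 2, 5)},
    {"patient_id": 4, "infected_by": None, "confirmed": date(2020, 2, 1)},
    {"patient_id": 5, "infected_by": 4, "confirmed": date(2020, 2, 10)},
    {"patient_id": 6, "infected_by": 5, "confirmed": date(2020, 3, 1)},
    {"patient_id": 7, "infected_by": 6, "confirmed": date(2020, 3, 5)},
]


def early_edges(records):
    """Return (infector, infectee) pairs whose infector was confirmed in the early phase."""
    confirmed = {r["patient_id"]: r["confirmed"] for r in records}
    edges = []
    for r in records:
        src = r["infected_by"]
        if src is None or src not in confirmed:
            continue  # infector unknown or absent from the dataset
        if EARLY_START <= confirmed[src] <= EARLY_END:
            edges.append((src, r["patient_id"]))
    return edges


def clusters(edges):
    """Split the undirected transmission graph into connected components (BFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components


edges = early_edges(records)
print(edges)  # [(1, 2), (2, 3), (4, 5), (5, 6)]
print(sorted(sorted(c) for c in clusters(edges)))  # [[1, 2, 3], [4, 5, 6]]
```

On these toy records the edge from patient 6 to patient 7 is dropped because patient 6 was confirmed after the early phase, leaving two clusters.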
The time series data contain overall statistics, such as the number of tests, as well as regional data within South Korea from January 20, 2020 to March 12, 2020. Figure 1 shows the cumulative COVID-19 statistics over this period.
probability distributions, which further complicates interpretation. Instead, visualizing the transmission networks could be useful for understanding the spread of the virus.
Although the 31st case appears to be a super-spreader in the Daegu cluster, it is uncertain who the super-spreader in the capital area cluster is. Of the 15 distinct cases in this network, nine have a reported degree (number of contacts) in the dataset, which allows centrality algorithms to be used to understand the role of particular nodes in the graph and their impact on the transmission network. Let $d_i$ denote the degree of the $i$th case. Since six nodes are missing degrees, the average degree of the nine cases with reported degrees (288/9 = 32) is used to impute the missing values. We define a population degree $D$, which is calculated after imputing the missing degrees under the assumption that the nodes in the network are independent of one another. Table 1 shows the degree of each case before and after imputation.

Case   Degree (reported)   Degree (imputed)
3      16                  16
6      17                  17
10     43                  43
11     0                   0
21     6                   6
28     1                   1
29     117                 117
30     27                  27
56     missing             32
83     missing             32
112    missing             32
136    missing             32
362    missing             32
1257   missing             32
1913   61                  61

Table 1. Number of degrees in the capital area cluster before and after imputation.
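The mean imputation in Table 1 can be reproduced with a short sketch. The degrees are copied from Table 1, with `None` marking the six missing values; the function name `impute_degrees` is hypothetical.

```python
# Degrees (number of contacts) of the 15 capital-area cases, copied from Table 1.
# None marks the six cases with no reported degree.
reported = {
    3: 16, 6: 17, 10: 43, 11: 0, 21: 6, 28: 1, 29: 117, 30: 27,
    56: None, 83: None, 112: None, 136: None, 362: None, 1257: None,
    1913: 61,
}


def impute_degrees(degrees):
    """Fill missing degrees with the mean of the observed degrees (hypothetical helper)."""
    observed = [d for d in degrees.values() if d is not None]
    fill = sum(observed) / len(observed)
    imputed = {case: fill if d is None else d for case, d in degrees.items()}
    return imputed, fill


imputed, fill = impute_degrees(reported)
print(fill, sum(imputed.values()))  # 32.0 480.0
```

The fill value of 32 matches the imputed column of Table 1; after imputation the 15 degrees sum to 480.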
Betweenness centrality is another graph centrality measure; it captures the influence of a node over the flow of information between every pair of nodes in the network, assuming that information flows along the shortest paths between them. The betweenness centrality $C_B(v)$ of a node $v$ is defined as

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ is the number of shortest paths with nodes $s$ and $t$ as their end nodes, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node $v$.11 To capture the super-spreader in the capital area network, we propose a degree-weighted betweenness centrality $C_{DB}(v)$, which prioritizes nodes with a high degree $d_v$ while penalizing them by the population degree $D$:

$$C_{DB}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \times \frac{d_v}{D}.$$

Considering betweenness centrality alone, the 6th case, who transmitted the virus to five distinct cases, has the highest betweenness centrality. However, the degree-weighted measure indicates that the 29th case, whose degree (117) is much larger than that of the 6th case (17), is the most central node in the network. This metric may be useful for identifying super-spreaders in small early transmission networks with limited information.
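As a hedged illustration of the degree-weighted measure, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm and multiplies each node's score by the ratio $d_v/D$ of its reported degree to the population degree. The eight-node graph, the reported degrees, and the choice of $D$ as the mean reported degree are hypothetical. Because $D$ is the same constant for every node, any choice of $D$ rescales the scores without changing the ranking. Pairs are counted once as unordered pairs, which halves the sum over ordered pairs.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of an unweighted, undirected graph (Brandes' algorithm).

    Each unordered pair {s, t} is counted once.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: c / 2 for v, c in cb.items()}


def degree_weighted_betweenness(adj, degree, population_degree):
    """Betweenness of v multiplied by d_v / D (hypothetical helper)."""
    cb = betweenness(adj)
    return {v: cb[v] * degree[v] / population_degree for v in adj}


# Hypothetical network: A infected five cases; X infected A and Y.
EDGES = [("X", "A"), ("X", "Y"), ("A", "B"), ("A", "C"),
         ("A", "D"), ("A", "E"), ("A", "F")]
adj = {}
for a, b in EDGES:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

# Hypothetical reported contact counts (not graph degrees).
degree = {"X": 50, "Y": 2, "A": 10, "B": 2, "C": 2, "D": 2, "E": 2, "F": 2}
population_degree = sum(degree.values()) / len(degree)  # assumed: mean degree, 9.0

cb = betweenness(adj)
dwb = degree_weighted_betweenness(adj, degree, population_degree)
print(max(cb, key=cb.get), max(dwb, key=dwb.get))  # A X
```

In this toy network, node A, which transmitted to five cases, has the highest plain betweenness (20 versus 6 for X), but X has the larger degree-weighted score (6 × 50/9 ≈ 33.3 versus 20 × 10/9 ≈ 22.2).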
The experience in China suggests that quarantine, social distancing, and isolation of infected populations may be able to contain the epidemic.12 This is encouraging for the many countries where COVID-19 is beginning to spread. South Korea once had the fastest growth rate of infection outside of China. The number of confirmed cases in Korea rose rapidly from late February, after the super node in the Daegu cluster was identified. Since then, the country has shown success in its mitigation efforts, in terms of both newly confirmed cases and deaths. The majority of new cases originate from those original clusters, one of which likely arose from a super-spreader, as the generated spatial network suggests.
Similar observations were made during the 2015 Middle East respiratory syndrome (MERS) outbreak in South Korea, in which the disease spread rapidly through super-spreaders.13 Therefore, it is important to better understand these clusters during the early epidemic phase, and visualizing them may help us understand how the virus spreads. Spatial networks can visualize early transmission clusters, and the proposed degree-weighted betweenness centrality measure can further help identify super-spreaders in these clusters, which may not only help reduce the spread of the virus but also inform policies such as enforced social distancing or quarantine.