Simulating epidemiological processes using community-structured scale-free networks

The transmission mechanisms of infectious diseases are well known and widely treated in the literature. However, because the interactions involved in transmission are complex, they are often studied through computational models. Understanding disease dynamics is of great interest, since it makes it possible to anticipate and mitigate the consequences of a possible outbreak. In this sense, the objective of this work is to present a model that employs scale-free networks with community structure to represent the interactions between individuals of a population. According to the results, the tool successfully simulated the spread of Influenza in a population of 6,500 individuals, and the outcome proved consistent with parameters found in the literature. As the main conclusion, we argue that the community structure of a contact network can significantly affect disease dynamics.


Introduction
Epidemiology is a science that quantitatively studies the distribution of health-disease phenomena in human populations and their determining factors. Some authors also claim that epidemiology allows assessing the effectiveness of public health interventions. Epidemiological research is responsible for producing knowledge about the health-disease process by studying the frequency and distribution of diseases in the human population and identifying their causes. From the individual point of view, the course of a disease is described by the period between the time at which the person starts to experience symptoms and the time at which the disease ends (e.g., cure or death). From the epidemiological point of view, however, it is much more important to understand the distribution in time and space of the infectious contacts between the infected individual and other individuals, and how this is reflected in the spread of infection through the population.
The effective management of diseases might save millions of lives and also play an important role in recovering from disruptions to social life and the economy. For better disaster management and surveillance plans against an epidemic, it is important to estimate the number of individuals that need to be treated, cared for, isolated, vaccinated and hospitalized. In this sense, epidemiological modeling enables researchers to determine these requirements [1]. The literature currently offers mathematical and computational models that aim to contribute to the understanding and eradication of infectious diseases, and several epidemiological models have been proposed using different approaches to simulate disease dynamics. The most used approaches are equation-based models [2,3], cellular automata models [4,5], agent-based models [6,7] and network-based models.
Even though these approaches are able to describe some aspects of the propagation of infectious diseases through populations, it is clear that human societies are much more complex than well-mixed populations (as assumed in equation-based models), two-dimensional lattices, or sets of diffusing individuals. This is also true for other complex systems in which a set of individuals forms a kind of network through which some quantity can propagate in the same way as an infectious disease. Within this scope, the objective of this work is to provide an epidemiological simulation model that uses scale-free networks with community structure to represent the interactions between individuals of a population. The use of scale-free networks for this purpose has already been explored in the literature [8,9,10], including the analysis of community-structured ones [11,12]. However, these articles present a theoretical approach, dealing with the physical aspects of the networks, while our work aims to present a model implementation and practical results.
This paper is organized as follows. Section 2 discusses the compartmental models used for modeling the processes of disease transmission. Section 3 presents the concepts related to scale-free community-structured networks. Section 4 describes the proposed epidemiological model and its implementation. Results and discussion are provided in Section 5. Finally, the conclusion and future work are in Section 6.

Compartmental Models
Most epidemic models are based on dividing the host population (e.g., humans or vectors) into a small number of compartments, each containing individuals that are identical in terms of their status with respect to the disease in question [13]. Such compartments are, for example, the group of individuals susceptible to the disease, the group of infectious individuals and the group of immune individuals.
In the study of disease transmission, the most common compartments are: Susceptible (S): healthy individuals who may contract the disease when in contact with infected ones; Exposed (E): individuals who have been infected and carry the disease but cannot yet transmit it to others; Infectious (I): individuals who, with or without symptoms, are capable of transmitting the disease to others; Removed (R): individuals removed from the transmission process, in other words, those that are isolated, dead, or recovered and immune to the disease.
The choice of which compartments to use in a model depends on the characteristics of the particular disease and on the purpose of the model. For example, in the SEIR model (Figure 1) an individual starts in class S, i.e., the class of people who can get infected. When an adequate contact occurs between an infective individual (from class I) and a susceptible individual, and the susceptible individual gets infected, that individual enters the class of exposed individuals E. These are the people in the latent period, who are infected but not yet infectious. When they become infectious (able to communicate the disease), they enter class I. Finally, they enter class R, acquiring permanent immunity. Acronyms for epidemiological models are based on the flow patterns between these compartments: SI, SIS, SIR, SIRS, SEIR, SEIRS, SEI, SEIS [14].
Many epidemiological models have only temporal dynamics and assume that the contacts among individuals of different classes are homogeneous. They take into account neither spatial aspects nor the fact that contacts happen through individual interactions, which makes them inconsistent with reality. Network models provide a natural way of describing a population and its interactions. Nodes (vertices) of the network represent individuals, and edges (links) depict interactions between individuals that could potentially lead to transmission of infection. Similar network representations can be used in a number of contexts, such as transportation networks, communication networks and social networks [15].
Epidemic dynamics have recently been studied in several types of networks, such as random networks, small-world networks and scale-free networks. In this work, we apply scale-free networks to represent the population [16]. This class of networks is so called because the number of links per vertex has no characteristic scale, unlike random networks, whose degree distribution follows a Poisson distribution. Scale-free networks exhibit a feature called preferential attachment, which is the tendency of a new vertex to connect to a vertex of the network that already has a high degree. This characteristic produces networks with a few highly connected vertices, called hubs, and many vertices with few connections [15]. Thus, it makes no sense to speak of a characteristic scale or a typical number of edges per vertex.
The scale-free network model most used currently is the Barabási-Albert (BA) model, which is based on two factors: growth and preferential attachment [16]. Network creation begins with a small number $N_0$ of connected nodes. At each time step a new node is added, forming $M$ new connections (with $M < N_0$) to different nodes already present in the network, following the concept of preferential attachment. Preferential attachment is modeled by a probabilistic relation in which, at each time step, each vertex has a probability $\Pi$ of acquiring a new connection. The probability that node $i$ receives a connection depends on the number of connections it already has: $\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$ and $\sum_j k_j$ is the sum of the degrees over the whole network.
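The growth rule can be illustrated with a minimal Python sketch. It is not the C implementation described later in this paper; the function name `barabasi_albert`, the fully connected initial core and the "stub list" sampling trick are assumptions made for the illustration. Drawing uniformly from a list in which each node appears once per incident edge selects node $i$ with probability $k_i / \sum_j k_j$.

```python
import random

def barabasi_albert(n, n0, m, seed=None):
    """Grow a BA-style graph; returns a dict node -> set of neighbours.

    n  : final number of nodes
    n0 : size of the initial core (N_0 in the text), assumed fully connected
    m  : connections formed by each new node (M in the text), with m < n0
    """
    if not 1 <= m < n0 <= n:
        raise ValueError("need 1 <= m < n0 <= n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n0)}
    # Initial core: every pair of the first n0 nodes is connected.
    for i in range(n0):
        for j in range(i + 1, n0):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in 'stubs' once per incident edge, so a uniform
    # draw picks node i with probability k_i / sum_j k_j.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(n0, n):
        targets = set()
        while len(targets) < m:          # m distinct existing nodes
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj

graph = barabasi_albert(n=200, n0=5, m=3, seed=1)
hub_degree = max(len(neigh) for neigh in graph.values())
```

With these parameters the graph has 10 core edges plus 3 per added node; `hub_degree` is typically much larger than `m`, which reflects the hubs mentioned above.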

Community-Structured Networks
A community can be understood as a set of inter-related individuals that inhabit the same place. In a complex network it is defined as a subset of vertices that have more connections among themselves than with the other elements of the network. The existence of community structure can substantially affect the epidemiological dynamics, and this work aims to study these effects in community-structured scale-free networks.
An example of the creation of a complex network with community structure is shown in Figure 2. Network creation begins with $C$ community nuclei, each containing $M_0$ fully connected vertices. In the next step, new vertices are linked to elements of their own community, according to the BA model. After the creation of the isolated communities, the inter-community connections are created. The number of links among communities can vary according to the parameters of the model.
The result of the network construction is the contact pattern between individuals that will be used in the simulation of epidemiological processes. Note that this pattern can be modified during the iterations, for example by a change in population behavior or by the insertion or removal of vertices representing births, deaths or migration of individuals. Other ways to generate scale-free networks with community structure can be found in the literature [11,12].
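The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `community_network`, its parameters, and the uniform random choice of endpoints for inter-community links are hypothetical, since the text does not specify how inter-community endpoints are selected.

```python
import random

def _link(adj, a, b):
    adj[a].add(b)
    adj[b].add(a)

def community_network(sizes, m0, m, inter_links, seed=None):
    """Build a community-structured network.

    sizes       : number of vertices in each community
    m0          : size of each fully connected nucleus (M_0 in the text)
    m           : links formed by each vertex added to a community (m < m0)
    inter_links : {(a, b): number of links between distinct communities a, b}
    Returns (adj, members): neighbour sets and the vertex ids per community.
    """
    if not 1 <= m < m0 <= min(sizes):
        raise ValueError("need 1 <= m < m0 <= smallest community size")
    rng = random.Random(seed)
    adj, members, next_id = {}, [], 0
    for size in sizes:
        ids = list(range(next_id, next_id + size))
        next_id += size
        members.append(ids)
        for v in ids:
            adj[v] = set()
        # Step 1: nucleus of m0 fully connected vertices.
        nucleus = ids[:m0]
        for i, a in enumerate(nucleus):
            for b in nucleus[i + 1:]:
                _link(adj, a, b)
        # Step 2: intra-community growth by preferential attachment.
        stubs = [v for v in nucleus for _ in adj[v]]
        for new in ids[m0:]:
            targets = set()
            while len(targets) < m:
                targets.add(rng.choice(stubs))
            for t in targets:
                _link(adj, new, t)
                stubs.extend((new, t))
    # Step 3: inter-community links (endpoints drawn uniformly: assumption).
    for (ca, cb), count in inter_links.items():
        added = 0
        while added < count:
            a, b = rng.choice(members[ca]), rng.choice(members[cb])
            if a != b and b not in adj[a]:
                _link(adj, a, b)
                added += 1
    return adj, members

adj, members = community_network([50, 60], m0=4, m=2,
                                 inter_links={(0, 1): 5}, seed=0)
```

Keeping a separate stub list per community restricts preferential attachment to the vertex's own community, so the communities stay isolated until step 3.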

Proposed Model
The proposed model is based on the concepts presented in Sections 2 and 3. Each individual is represented by a set of attributes that includes identification number, gender, age, community and compartment, which can be used in the simulation of different diseases. In addition, the contacts between the individuals of the population are represented by a scale-free network with community structure. These communities can be used to represent different cities, districts or locations (houses, schools, hospitals, etc.), depending on the scale to be simulated.
The disease dynamics are simulated through the temporal evolution of the contacts between individuals, defined by the contact network, according to a compartmental model representing a specific disease. The initial condition of the individuals and the disease characteristics are set by parameters supplied to the model.

Model Implementation
The model was implemented in the C language under the Linux operating system. It uses the Iniparser library to standardize the input files and the SQLite database to store the results. The tool is composed of 3 distinct modules, responsible for the generation of the population, the generation of the relationship network and the simulation of the epidemiological phenomenon. Figure 3 illustrates the simulation process using the 3 modules, which are described in the following subsections. Ready-to-use packages for the generation of complex networks currently exist, e.g. the Complex Networks Package for MATLAB and igraph for R. However, we developed our own implementation to understand the whole process and to enable a tailor-made simulation environment.

Population Module
The simulation of an epidemiological process starts with the creation of the target population by the population module. The input is a user-defined file (.ini file), as seen in Figure 3. This file contains the parameters for the creation of the communities, indicating the name of the population, the number of individuals and the distribution of gender and age. Figure 4 shows an example of a population configuration file.

Figure 4: Example of a configuration file for the population module
In this example, the input file defines two communities, $Z_0$ containing 1000 individuals and $Z_1$ containing 1200 individuals. The communities are divided equally between the male and female genders, and each gender is divided into 4 age groups (f_age1, f_age2, f_age3, f_age4). Gender and age may be used when simulating diseases that preferentially affect individuals with certain characteristics, such as measles and chickenpox in children between 0 and 2 years.
Once the information is read, the population module creates the communities and their respective populations, assigning to every individual an identification number, gender, age, community (zone), initial compartment (status) and the number of days spent in this compartment (days), according to the .ini input file. The last two attributes are initialized with the value 0 and change only during the simulation. The information generated for each individual is stored in a population file (.pop), as can be seen in Figure 5.
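As a rough illustration of this step, the Python sketch below reads a hypothetical configuration and creates the individuals. The key names `individuals`, `female` and `m_age1` to `m_age4`, the section names `Z0` and `Z1`, and the `Individual` record are assumptions; only `f_age1` to `f_age4` and the attributes zone, status and days are named in the text, and the actual file layout of Figure 4 may differ.

```python
import configparser
from dataclasses import dataclass

# Hypothetical configuration: two communities, equal gender split,
# four equally sized age groups per gender.
EXAMPLE_INI = """
[DEFAULT]
female = 0.5
f_age1 = 0.25
f_age2 = 0.25
f_age3 = 0.25
f_age4 = 0.25
m_age1 = 0.25
m_age2 = 0.25
m_age3 = 0.25
m_age4 = 0.25

[Z0]
individuals = 1000

[Z1]
individuals = 1200
"""

@dataclass
class Individual:
    ident: int
    gender: str      # "f" or "m"
    age_group: int   # 1..4
    zone: str        # community name
    status: int = 0  # initial compartment, set later by the simulation
    days: int = 0    # days spent in the current compartment

def split(total, fractions):
    """Split 'total' by fractions; the last group absorbs rounding."""
    counts = [int(total * f) for f in fractions[:-1]]
    counts.append(total - sum(counts))
    return counts

def build_population(ini_text):
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    population = []
    for zone in cfg.sections():          # DEFAULT is not listed here
        sec = cfg[zone]
        n = sec.getint("individuals")
        n_female = int(n * sec.getfloat("female"))
        for gender, total in (("f", n_female), ("m", n - n_female)):
            fractions = [sec.getfloat(f"{gender}_age{g}") for g in range(1, 5)]
            for group, count in enumerate(split(total, fractions), start=1):
                for _ in range(count):
                    population.append(
                        Individual(len(population), gender, group, zone))
    return population

population = build_population(EXAMPLE_INI)
```

Each `Individual` corresponds to one record of the population file; writing the list out in the .pop format is omitted because that format is not described in the text.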

Network Generator Module
After the population is created, the network generator module creates the contact network. As can be seen in Figure 3, the module takes as input the population file (.pop), the initial parameters for network creation, $M_0$ and $N$, and a community adjacency file (.adj).
Figure 6 shows an example of a .adj file and the corresponding community-structured network. The file describes the connectivity between the communities and the maximum probability that an individual of a community $C_i$ has contact with an individual of a community $C_j$. This means, for example, that an individual of community "0" has a maximum probability of 30% of having contact with an individual of community "2" and of 70% with an individual of community "4".
The creation of the contact network follows the algorithm presented in Section 3.1: first the nucleus of each community and its internal contacts are created, and then the external contacts are created. The result is a file (.net) that contains the edges between the elements, representing the possible contacts of each individual. An example is shown in Figure 7, which presents the links of individual 1486 with elements 5, 1005, 1172, 4081, 4513 and 4762.
Note that each individual has a single contact list, which holds local and external contacts in the same way. Nevertheless, logically there are two network levels: one level specifies the connectivity among the communities, given by the .adj file, and the other the connectivity among individuals, given by the .net file.
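One possible reading of these probabilities is sketched below in Python. Both the in-memory form `ADJ` and the rule used in `external_contacts` (one Bernoulli trial per adjacent community, followed by a uniform choice of partner) are assumptions made for illustration; the text states only that the file stores a maximum contact probability per pair of communities.

```python
import random

# Hypothetical in-memory form of an adjacency file: ADJ[i][j] is the maximum
# probability that an individual of community i contacts community j.
ADJ = {0: {2: 0.30, 4: 0.70}, 2: {0: 0.30}, 4: {0: 0.70}}

def external_contacts(community, adjacency, members, rng):
    """Draw external contacts for one individual of 'community'.

    For each adjacent community, a contact is created with the stored
    probability, and the partner is chosen uniformly among its members.
    """
    contacts = []
    for other, p_max in adjacency.get(community, {}).items():
        if rng.random() < p_max:
            contacts.append(rng.choice(members[other]))
    return contacts

members = {0: [0, 1], 2: [10, 11], 4: [20, 21]}
rng = random.Random(0)
sample = external_contacts(0, ADJ, members, rng)
```

Under this reading, an individual of community "0" ends up linked to community "4" in about 70% of the draws and to community "2" in about 30% of them.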

Simulation Module
Once the contact network is defined, the disease transmission process in a population can be simulated with the simulation module. As shown in Figure 8, the module takes as input the population files (.pop), the network file (.net) and a simulation configuration file (.sim). The initial configuration file of the simulation (Figure 9) defines the simulated period (days), the type of compartmental model to be used (model), the population files (population) and the network file (edges) to be used, and the disease characteristics: the periods of infection (infection), latency (latency) and acquired immunity (immunity), the mortality rate (mortality) and the probability of disease transmission (transmission) between an infected individual and a susceptible individual. The remaining parameters concern the initial distribution of susceptible, exposed, infected and immune individuals.
The simulation module algorithm is based on the compartmental models described in Section 2, implemented as a state machine. State machines allow us to think about the "state" of a system at a particular point in time and to characterize the behavior of the system based on that state [17]. The state of a system is defined as its condition at a particular point in time. In the case of disease simulation using a SEIR model, the possible states are: susceptible (S), latent (E), infected (I) and recovered (R). In our model, the transitions that move an individual from one state to another are:
• Contact with an infected individual: the individual leaves the susceptible condition for the latency condition, given a probability of transmission;
• End of latency period: the individual leaves the latency condition for the infection condition, which allows the infection of other individuals;
• End of infectivity period: the individual recovers from the disease, leaving the infected condition for the recovered condition;
• Death: the individual leaves any condition for the recovered condition.
Thus, each individual of the population has a state machine that determines his or her compartment. Note that the state machine can differ according to the model or the simulated disease. The simulation module allows the addition of customized state machines, giving flexibility in the modeling of several diseases. So far, we have implemented machines for the SIS, SIR and SEIR models. The state machine that represents the SEIR model (used in the case studies of this work) can be seen in Figure 10.
The simulation process starts with the reading of the input files. Next, the population parameters are loaded from the .sim file, providing the initial state for the state machine of each individual. Then the time (t) is iterated, causing changes in the states of the individuals according to the transitions shown in Figure 10 and represented in the accompanying pseudo-code. The result of a simulation is the evolution of disease transmission over the selected period. The output data are recorded in SQLite databases for further analysis. This kind of file was chosen for its portability and for the ease of extracting a wide range of information using only SQL queries, without the need for a separate parser for each type of information to be extracted.
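The four transitions listed above can be written as a small state machine. The Python sketch below is illustrative only and is not the authors' C code: the names `State`, `Disease`, `Person` and `next_state` are hypothetical, and two details are assumptions, namely that mortality is applied as a per-day probability and that each infectious contact is an independent transmission trial.

```python
import random
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    S = "susceptible"
    E = "latent"
    I = "infected"
    R = "recovered"

@dataclass
class Disease:
    latency: int = 3            # days in E
    infection: int = 4          # days in I
    transmission: float = 0.06  # probability per infectious contact
    mortality: float = 0.0001   # assumed to be a per-day probability

@dataclass
class Person:
    state: State = State.S
    days: int = 0               # days spent in the current state

def next_state(person, infectious_contacts, disease, rng):
    """Return the person's condition after one simulated day."""
    if person.state is State.R:
        return Person(State.R, person.days + 1)
    # Death: any condition -> R.
    if rng.random() < disease.mortality:
        return Person(State.R, 0)
    days = person.days + 1
    if person.state is State.S:
        # Contact with an infected individual: S -> E with given probability.
        for _ in range(infectious_contacts):
            if rng.random() < disease.transmission:
                return Person(State.E, 0)
    elif person.state is State.E and days >= disease.latency:
        return Person(State.I, 0)   # end of latency period
    elif person.state is State.I and days >= disease.infection:
        return Person(State.R, 0)   # end of infectivity period
    return Person(person.state, days)

rng = random.Random(42)
after = next_state(Person(State.E, 2), 0, Disease(mortality=0.0), rng)
# 'after' is in state I: the third day of latency has just ended.
```

Returning a new `Person` instead of mutating the old one makes it straightforward to update the whole population synchronously, so that infections created on a given day do not spread further on that same day.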
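The overall loop and the storage of results can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function names, the `daily` table and its columns are hypothetical, mortality and loss of immunity are omitted for brevity, and the ring-shaped contact network is a toy input rather than a scale-free one.

```python
import random
import sqlite3

def simulate(contacts, initial_infected, days, latency=3, infection=4,
             transmission=0.06, seed=None):
    """Run a SEIR simulation; returns rows (day, S, E, I, R).

    contacts: dict mapping each individual to an iterable of neighbours.
    """
    rng = random.Random(seed)
    state = {v: "S" for v in contacts}
    timer = {v: 0 for v in contacts}      # days spent in current state
    for v in initial_infected:
        state[v] = "I"
    history = []
    for day in range(days + 1):
        counts = {s: 0 for s in "SEIR"}
        for s in state.values():
            counts[s] += 1
        history.append((day, counts["S"], counts["E"],
                        counts["I"], counts["R"]))
        # Synchronous update: new states are computed from the old ones.
        new_state = dict(state)
        new_timer = {v: t + 1 for v, t in timer.items()}
        for v in contacts:
            if state[v] == "S":
                n_inf = sum(1 for u in contacts[v] if state[u] == "I")
                if any(rng.random() < transmission for _ in range(n_inf)):
                    new_state[v], new_timer[v] = "E", 0
            elif state[v] == "E" and new_timer[v] >= latency:
                new_state[v], new_timer[v] = "I", 0
            elif state[v] == "I" and new_timer[v] >= infection:
                new_state[v], new_timer[v] = "R", 0
        state, timer = new_state, new_timer
    return history

def store(history, db_path=":memory:"):
    """Save the daily counts in a SQLite table (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE daily (day INTEGER PRIMARY KEY, "
                "s INTEGER, e INTEGER, i INTEGER, r INTEGER)")
    con.executemany("INSERT INTO daily VALUES (?, ?, ?, ?, ?)", history)
    con.commit()
    return con

# Toy usage: 100 individuals on a ring, one initial case, certain transmission.
ring = {v: [(v - 1) % 100, (v + 1) % 100] for v in range(100)}
history = simulate(ring, [0], days=60, transmission=1.0, seed=1)
con = store(history)
peak_day, peak_infected = con.execute(
    "SELECT day, i FROM daily ORDER BY i DESC, day LIMIT 1").fetchone()
```

The final query shows the point made above about SQL: quantities such as the day of the infection peak can be read from the database with a single statement instead of a dedicated parser.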

Results and Discussion
This section presents two case studies. In the first case, the communities have a large number of edges linking their individuals to individuals of other communities. In the second case, the communities have few edges of this kind. The objective of this test is to check how the links among communities affect the spread of a specific disease. Note that the results presented are the average of 15 runs for each test, so that statistically irrelevant fluctuations are discarded.

Case Study 1: Strongly-Coupled Communities
In this case study we specified five communities with a large number of inter-community edges, distributed according to Figure 11. Note that the values of inter-community connectivity in the figure differ from one pair of communities to another. This factor, as can be seen in the results, significantly influences the process of disease transmission. The simulated disease is a variation of Influenza with a latency of 3 days, an infectivity period of 4 days, a mortality of 0.01% and a transmission rate of 6%, in which the end of the infection period confers partial immunity for 100 days. Therefore, the simulated model is of the SEIR type.
Although gender and age of the population should be considered in real situations, these characteristics were disregarded in the simulations to facilitate the data analysis. The simulation started with 99.85% of susceptible individuals and 0.15% of infected ones, the latter initially placed in community D.
In a community, an Influenza epidemic usually reaches its peak 2 to 3 weeks after the initial outbreak of the disease and lasts between 5 and 10 weeks [18]. We simulated a period of 60 days, the mean duration of an Influenza outbreak, and the results of the simulation can be observed in Figures 12 and 13.
The results are consistent with the data in the cited reference. Figure 12 shows that the peak of infection occurred around day 21, which matches the natural peak of the disease. The results also show that the latency curve, which reflects the latency period and the contact network, peaks after approximately 18 days. The natural period of an Influenza outbreak lasts from 35 to 70 days. The simulation results show that the outbreak ended after 45 days, remaining within this interval.
Since initially only community D had infected individuals, the curve resulting from the simulation of this community shows that the infection indeed spread first in this community and subsequently to the others. Community E has a direct connection with community D and has the largest number of individuals, and consequently a greater number of external contacts. Its infection curve therefore followed the curve of community D and showed the highest infection value, at 22 days of simulation.
Regarding the curve of community B in Figure 12, although this community is connected directly with community D, the rate of contact between them is small, as shown in Figure 11. Thus, community B was infected significantly more by community E, since the contact rate between these two communities is high. As a result, the curve is displaced in relation to the first two (D and E). In the case of community A, the results show that the infection curve grows mildly because the contact rates with the other communities are low. The displacement of the curve is due to the fact that there is no direct contact with community D, so that the infection of its individuals occurs through individuals of communities B, C and E. The case of community C is similar, except that it receives a significant contribution only from community A.
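For reference, the parameters of this case study can be collected in one place. The Python sketch below uses the parameter names listed for the .sim file, but the dictionary layout is an assumption (the actual file syntax of Figure 9 is not reproduced here). It also assumes that the population of 6,500 individuals mentioned in the abstract applies to this case study, and rounds the initial number of infected to the nearest integer.

```python
# Parameter names follow the .sim description; values are those quoted
# for the simulated Influenza variant. The dict layout is hypothetical.
SIM_PARAMS = {
    "days": 60,            # simulated period
    "model": "SEIR",
    "latency": 3,          # days
    "infection": 4,        # days
    "immunity": 100,       # days of partial immunity
    "mortality": 0.0001,   # 0.01 %
    "transmission": 0.06,  # 6 %
}

POPULATION = 6500                   # assumption: value from the abstract
INITIAL_INFECTED_FRACTION = 0.0015  # 0.15 %, all placed in community D

def initial_counts(population, infected_fraction):
    """Initial number of individuals per compartment."""
    infected = round(population * infected_fraction)
    return {"S": population - infected, "E": 0, "I": infected, "R": 0}

start = initial_counts(POPULATION, INITIAL_INFECTED_FRACTION)
```

Under these assumptions the simulation would start with about 10 infected individuals in community D and the remaining individuals susceptible.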

Strongly-Coupled Networks with Immunization
Using the same scenario presented in the previous section, we tested the effectiveness of immunization applied to the population of community D 10 days after the disease appeared.
The results presented in Figure 14 show the importance of immunization in reducing the total number of infected. This simulation represents a hypothetical case of mass vaccination in community D, where the first cases of infected individuals appeared. The scenario models vaccination 10 days after the start of the outbreak, considering that the public health authorities were not notified immediately after the appearance of the first cases. This simulation shows the relevance of notifying occurrences of infectious diseases, because measures can then be taken to minimize the impact of the infection on a specific population.
In fact, the results presented in Figure 12 show that 830 individuals would be infected by the disease if immunization were not applied. With immunization carried out only in community D, the total number of infected dropped to 442 individuals. This shows that vaccination, even when restricted to a part of the population (the community initially infected), already provides relevant results in terms of public health.
To complement the previous discussion, Figure 15 shows that the infection curve of community D does not show any infected individual after 16 days. This result is due to the fact that vaccination occurred on the tenth day of simulation and was applied only to susceptible individuals. Taking into consideration that the latency period is 3 days and the infection period is 4 days, and that individuals in the first day of the latency stage were not vaccinated, we observe a smaller growth in the number of infected individuals and their total absence around the sixteenth day.
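As a rough quantification of these figures (a derived value, not one reported in the original results), the relative reduction in the total number of infected individuals is:

```latex
\[
\frac{830 - 442}{830} = \frac{388}{830} \approx 0.467
\]
```

That is, vaccinating only community D on day 10 avoids roughly 47% of the infections in this scenario.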
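The timing can be checked with a short count, assuming (this convention is not stated in the text) that periods are counted in whole days and that the day of exposure is the first day of latency. The last individuals of community D to be exposed enter latency on day 10, so:

```latex
\begin{align*}
\text{latency:} &\quad \text{days } 10,\ 11,\ 12 && (3 \text{ days})\\
\text{infectivity:} &\quad \text{days } 13,\ 14,\ 15,\ 16 && (4 \text{ days})\\
t_{\text{last}} &= 10 + 3 + 4 - 1 = 16
\end{align*}
```

Under this convention no individual of community D remains infected after day 16, in line with Figure 15.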

Case Study 2: Loosely-Coupled Networks
For the second case study the same scenario of the previous study was used. However, this time the communities have less connectivity among them, as illustrated in Figure 16. The results of the simulation using weakly coupled communities can be seen in Figures 17 and 18. Note that Figure 18 presents the ratio between the number of infected and the total number of individuals in each community, unlike the graphs of Figures 13 and 15. The data are presented in this form to highlight the importance of connectivity among communities. Figure 17 shows that the qualitative behavior of the simulation using weakly coupled communities is, as expected, similar to the results obtained with strongly coupled communities, since the number of individuals and the rate of contacts are the same in both cases. There are two main differences between the results. First, with weakly coupled communities it takes longer to reach the peak of infection: in the first case the peak was reached in 21 days, whereas in the second case it occurred 2 days later. Second, considering the latency periods and the characteristics of the contact network, the total number of infected was lower, decreasing from 830 to 729 individuals. A hypothesis to explain this value is that retro-infection from other communities is lower, as a result of the smaller connectivity compared with the previous case.
According to the results shown in Figure 18, the individuals of community D are infected first. Considering the contact rates and the inter-community connectivity shown in Figure 16, community E is the next to present infected individuals. Taking into consideration the contacts between communities E-A and E-B and the latency period of the disease, the curves of communities A and B start on the same day but grow differently, because the coupling of B differs from the coupling of A.
Note that, because community D initially contains the infected individuals and has direct contact with community E, which has the largest number of individuals, the inter-community interaction D-E-B is stronger than the interaction D-E-A-C, since community C is initially isolated and has no infected individuals to feed the infection back into A.

Conclusion and Future Work
The objective of this work was to present a simulation tool for epidemiological phenomena based on scale-free networks with community structure. Each scenario has two levels of contact networks, where one level specifies the connectivity between the communities and the other the connectivity between individuals. Note that, logically, these networks are implemented in the same way. The distinction lies in the possibility of configuring them differently with respect to topology and to the contact rates between individuals.
To simulate the process, we used compartmental models implemented through state machines, where each compartment is represented by a state of the machine and the functional relations among the compartments are represented by transitions.
The results obtained for the case studies presented, which represent hypothetical scenarios, were qualitatively and quantitatively consistent with the data found in the literature. With proper parameterization and the integration of real data about the population and the disease, this model may be used in the simulation of realistic events.
Aiming at the simulation of real phenomena, one of the works in progress consists of simulating diseases reported to health agencies in Cascavel, Paraná, Brazil. Considering that the city has approximately 300,000 inhabitants, it is necessary to parallelize the model in order to improve performance and reduce the processing time.

Figure 2: Creation of a network with community structure: step 1 - creation of the nuclei of the communities; step 2 - insertion of intra-community vertices; step 3 - insertion of inter-community connections.

Figure 5: Example of individuals generated by the population module

Figure 6: Example of a .adj file and the corresponding community adjacencies

Figure 10: State machine of the SEIR model

Figure 17: Simulation results in the loosely-coupled network