## A basic epidemiological model: the *SIR* model

Before diving into the real-life application, we first introduce the model that will be used.

There are many epidemiological models, but we'll use one of the simplest, the **SIR model**. Tim Churches' explanation of this model and how to fit it using R is so good that I'll reproduce it here with a few minor changes.

The basic idea behind the *SIR* model (**S**usceptible – **I**nfectious – **R**ecovered) of communicable disease outbreaks is that there are three groups (also called compartments) of people:

- those who are healthy but susceptible to the disease: *S*
- the infectious (and thus, infected) people: *I*
- people who have recovered: *R*

To model the dynamics of the outbreak we need three differential equations to describe the rates of change in each group, parameterised by:

- \(\beta\), which controls the transition between *S* and *I*
- \(\gamma\), which controls the transition between *I* and *R*

Formally, this gives:

\[\frac{dS}{dt} = -\frac{\beta I S}{N}\]

\[\frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I\]

\[\frac{dR}{dt} = \gamma I\]
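One quick sanity check on these equations: the three rates sum to zero, so the total population \(N = S + I + R\) stays constant over time. A minimal base-R check, using illustrative (not fitted) values for the parameters and state:

```r
# illustrative values only -- beta, gamma and the state are made up
beta <- 0.5
gamma <- 0.25
N <- 1000
S <- 990; I <- 10; R <- 0

dS <- -beta * I * S / N
dI <- beta * I * S / N - gamma * I
dR <- gamma * I

dS + dI + dR # sums to 0: people only move between compartments
```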

Before fitting the *SIR* model to the data, the first step is to express these differential equations as an R function, with respect to time *t*.

```
SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta * I * S / N
    dI <- beta * I * S / N - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}
```

## Fitting a *SIR* model to the Belgium data

To fit the model to the data we need two things:

- a solver for these differential equations
- an optimiser to find the optimal values for our two unknown parameters, \(\beta\) and \(\gamma\)

The function `ode()` (for ordinary differential equations) from the `{deSolve}` R package makes solving the system of equations easy, and to find the optimal values for the parameters we wish to estimate, we can simply use the `optim()` function built into base R.
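Before applying `optim()` to the epidemiological fit, here is a quick illustration of how it works on a toy problem (a simple quadratic, unrelated to the SIR model) whose minimum is known to be at (3, -1):

```r
# toy objective function, minimised at p = c(3, -1)
f <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2

# optim() starts from an initial guess and searches for the minimum
fit <- optim(c(0, 0), f)
fit$par # close to 3 and -1
```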

Specifically, what we need to do is minimise the sum of the squared differences between \(I(t)\), which is the number of people in the infectious compartment \(I\) at time \(t\), and the corresponding number of cases as predicted by our model \(\hat{I}(t)\). This quantity is known as the residual sum of squares (*RSS*):

\[RSS(\beta, \gamma) = \sum_t \big(I(t) - \hat{I}(t)\big)^2\]
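On toy vectors (made up purely for illustration), the *RSS* computation is just:

```r
# made-up observed and model-predicted counts, for illustration only
observed  <- c(1, 2, 4, 8)
predicted <- c(1.2, 1.9, 4.5, 7.6)

# residual sum of squares
sum((observed - predicted)^2) # 0.46
```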

In order to fit a model to the incidence data for Belgium, we need a value *N* for the initial uninfected population. The population of Belgium in November 2019 was 11,515,793 people, according to Wikipedia. We will thus use *N = 11515793* as the initial uninfected population.

Next, we need to create a vector with the daily cumulative incidence for Belgium, from February 4 (when our daily incidence data starts), through to March 30 (last available date at the time of publication of this article). We will then compare the predicted incidence from the *SIR* model fitted to these data with the actual incidence since February 4. We also need to initialise the values for *N*, *S*, *I* and *R*. Note that the daily cumulative incidence for Belgium is extracted from the `{coronavirus}` R package developed by Rami Krispin.

```
# devtools::install_github("RamiKrispin/coronavirus")
library(coronavirus)
data(coronavirus)

`%>%` <- magrittr::`%>%`

# extract the cumulative incidence
df <- coronavirus %>%
  dplyr::filter(Country.Region == "Belgium") %>%
  dplyr::group_by(date, type) %>%
  dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%
  tidyr::pivot_wider(
    names_from = type,
    values_from = total
  ) %>%
  dplyr::arrange(date) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(active = confirmed - death - recovered) %>%
  dplyr::mutate(
    confirmed_cum = cumsum(confirmed),
    death_cum = cumsum(death),
    recovered_cum = cumsum(recovered),
    active_cum = cumsum(active)
  )

# put the daily cumulative incidence numbers for Belgium from
# Feb 4 to March 30 into a vector called Infected
library(lubridate)
Infected <- subset(df, date >= ymd("2020-02-04"))$confirmed_cum

# create an incrementing Day vector the same length as our
# cases vector
Day <- 1:(length(Infected))

# now specify initial values for N, S, I and R
N <- 11515793
init <- c(
  S = N - Infected[1],
  I = Infected[1],
  R = 0
)
```

Then we need to define a function to calculate the *RSS*, given a set of values for \(\beta\) and \(\gamma\).

```
# define a function to calculate the residual sum of squares
# (RSS), passing in parameters beta and gamma that are to be
# optimised for the best fit to the incidence data
RSS <- function(parameters) {
  names(parameters) <- c("beta", "gamma")
  out <- ode(y = init, times = Day, func = SIR, parms = parameters)
  fit <- out[, 3]
  sum((Infected - fit)^2)
}
```

Finally, we can fit the *SIR* model to our data by finding the values for \(\beta\) and \(\gamma\) that minimise the residual sum of squares between the observed cumulative incidence (observed in Belgium) and the predicted cumulative incidence (predicted by our model). We also need to check that our model has converged, as indicated by the message shown below:

```
# now find the values of beta and gamma that give the
# smallest RSS, which represents the best fit to the data.
# Start with values of 0.5 for each, and constrain them to
# the interval 0 to 1.0

# install.packages("deSolve")
library(deSolve)

Opt <- optim(c(0.5, 0.5),
  RSS,
  method = "L-BFGS-B",
  lower = c(0, 0),
  upper = c(1, 1)
)

# check for convergence
Opt$message
```

`## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"`

Convergence is confirmed. Now we can examine the fitted values for \(\beta\) and \(\gamma\):

```
Opt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par
```

```
##      beta     gamma
## 0.5986056 0.4271395
```

Remember that \(\beta\) controls the transition between *S* and *I* (i.e., susceptible and infectious) and \(\gamma\) controls the transition between *I* and *R* (i.e., infectious and recovered). However, these values do not mean much on their own; we use them to get the fitted numbers of people in each compartment of our *SIR* model for the dates up to March 30 that were used to fit the model, and compare those fitted values with the observed (real) data.
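One common way to give these fitted values some intuition (a standard SIR interpretation, not part of the original analysis) is that \(1/\gamma\) is the mean infectious period, in days, implied by the model:

```r
# the fitted values reported above
beta <- 0.5986056
gamma <- 0.4271395

# mean infectious period (in days) implied by the fitted model
1 / gamma # about 2.34 days
```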

```
sir_start_date <- "2020-02-04"

# time in days for predictions
t <- 1:as.integer(today() - ymd(sir_start_date))

# get the fitted values from our SIR model
fitted_cumulative_incidence <- data.frame(ode(
  y = init, times = t,
  func = SIR, parms = Opt_par
))

# add a Date column and the observed incidence data
library(dplyr)
fitted_cumulative_incidence <- fitted_cumulative_incidence %>%
  mutate(
    Date = ymd(sir_start_date) + days(t - 1),
    Country = "Belgium",
    cumulative_incident_cases = Infected
  )

# plot the data
library(ggplot2)
fitted_cumulative_incidence %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = I), color = "red") +
  geom_point(aes(y = cumulative_incident_cases), color = "blue") +
  labs(
    y = "Cumulative incidence",
    title = "COVID-19 fitted vs observed cumulative incidence, Belgium",
    subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"
  ) +
  theme_minimal()
```

From the above graph we see that the number of observed confirmed cases follows, unfortunately, the number of confirmed cases expected by our model. The fact that both trends overlap indicates that the pandemic is clearly in an exponential phase in Belgium. More data would be needed to see whether this trend is confirmed in the long term.

The following graph is similar to the previous one, except that the *y*-axis is measured on a log scale. This kind of plot is called a semi-log plot or, more precisely, a log-linear plot, because only the *y*-axis is transformed with a logarithmic scale. Transforming the scale to log has the advantage that the difference between the observed and expected number of confirmed cases is more easily readable, and it also shows how the number of observed confirmed cases differs from an exponential trend.
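Why exponential growth reads as a straight line on such a plot: if \(y(t) = a e^{rt}\), then \(\log y(t) = \log a + rt\), which is linear in \(t\). A small base-R check, with made-up values of \(a\) and \(r\):

```r
# made-up exponential series: a = 2, growth rate r = 0.3
t <- 0:10
y <- 2 * exp(0.3 * t)

# on the log scale, the increments are constant (equal to r),
# i.e. the series is a straight line with slope 0.3
diff(log(y)) # all equal to 0.3
```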

```
fitted_cumulative_incidence %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = I), color = "red") +
  geom_point(aes(y = cumulative_incident_cases), color = "blue") +
  labs(
    y = "Cumulative incidence (log scale)",
    title = "COVID-19 fitted vs observed cumulative incidence, Belgium",
    subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"
  ) +
  theme_minimal() +
  scale_y_log10()
```

The plot indicates that, at the beginning of the pandemic and until March 12, the number of confirmed cases stayed below what would be expected in an exponential phase. In particular, the number of confirmed cases stayed constant at 1 case from February 4 to February 29. From March 13 until March 30, the number of confirmed cases kept rising at a rate close to an exponential rate.

We also notice a small jump between March 12 and March 13, which may indicate an error in the data collection, or a change in the testing/screening methods.

## Reproduction number \(R_0\)

Our *SIR* model looks like a good fit to the observed cumulative incidence data in Belgium, so we can now use our fitted model to calculate the basic reproduction number \(R_0\), also referred to as the basic reproduction ratio.

The reproduction number gives the average number of susceptible people who are infected by each infectious person. In other words, the reproduction number refers to the number of healthy people that get infected per number of sick people. When \(R_0 > 1\) the disease starts spreading in a population, but not if \(R_0 < 1\). Usually, the larger the value of \(R_0\), the harder it is to control the epidemic.

Formally, we’ve:

\[R_0 = \frac{\beta}{\gamma}\]

We can compute it in R:

`Opt_par`

```
##      beta     gamma
## 0.5986056 0.4271395
```

```
R0 <- as.numeric(Opt_par[1] / Opt_par[2])
R0
```

`## [1] 1.401429`

An \(R_0\) of 1.4 is slightly below the values calculated by others for COVID-19 and the \(R_0\) for SARS and MERS, which are similar diseases also caused by coronaviruses. This is mainly due to the fact that the number of confirmed cases stayed constant and equal to 1 at the beginning of the pandemic.

An \(R_0\) of 1.4 means that, on average in Belgium, 1.4 people are infected by each infected person.

## Using our model to analyse the outbreak if there was no intervention

It is instructive to use our model fitted to the first 56 days of available data on confirmed cases in Belgium, to see what would happen if the outbreak were left to run its course, without public health intervention.

```
# time in days for predictions
t <- 1:120

# get the fitted values from our SIR model
fitted_cumulative_incidence <- data.frame(ode(
  y = init, times = t,
  func = SIR, parms = Opt_par
))

# add a Date column and join the observed incidence data
fitted_cumulative_incidence <- fitted_cumulative_incidence %>%
  mutate(
    Date = ymd(sir_start_date) + days(t - 1),
    Country = "Belgium",
    cumulative_incident_cases = I
  )

# plot the data
fitted_cumulative_incidence %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = I), color = "red") +
  geom_line(aes(y = S), color = "black") +
  geom_line(aes(y = R), color = "green") +
  geom_point(aes(y = c(Infected, rep(NA, length(t) - length(Infected)))),
    color = "blue"
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Persons", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +
  scale_colour_manual(
    name = "",
    values = c(red = "red", black = "black", green = "green", blue = "blue"),
    labels = c("Susceptible", "Recovered", "Observed incidence", "Infectious")
  ) +
  theme_minimal()
```

The same graph on a log scale for the *y*-axis, and with a legend for better readability:

```
# plot the data
fitted_cumulative_incidence %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = I, colour = "red")) +
  geom_line(aes(y = S, colour = "black")) +
  geom_line(aes(y = R, colour = "green")) +
  geom_point(aes(
    y = c(Infected, rep(NA, length(t) - length(Infected))),
    colour = "blue"
  )) +
  scale_y_log10(labels = scales::comma) +
  labs(y = "Persons (log scale)", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +
  scale_colour_manual(
    name = "",
    values = c(red = "red", black = "black", green = "green", blue = "blue"),
    labels = c("Susceptible", "Observed incidence", "Recovered", "Infectious")
  ) +
  theme_minimal()
```

### More summary statistics

Other interesting statistics can be computed from the fit of our model. For example:

- the date of the peak of the pandemic
- the number of severe cases
- the number of people in need of intensive care
- the number of deaths

```
fit <- fitted_cumulative_incidence

# peak of pandemic
fit[fit$I == max(fit$I), c("Date", "I")]
```

```
## Date I
## 87 2020-04-30 525315.7
```

```
# severe cases
max_infected <- max(fit$I)
max_infected / 5
```

`## [1] 105063.1`

```
# cases with need for intensive care
max_infected * 0.06
```

`## [1] 31518.94`

```
# deaths with assumed 0.7% fatality rate
max_infected * 0.007
```

`## [1] 3677.21`

Given these predictions, with the exact same settings and no intervention at all to limit the spread of the pandemic, the peak in Belgium is expected to be reached by the end of April. About 525,000 people would be infected by then, which translates to about 105,000 severe cases, about 32,000 people in need of intensive care, and up to 3,700 deaths (assuming a 0.7% fatality rate, as suggested by this source).

At this point, we understand why such strict containment measures and regulations are taken in Belgium!

Note that these predictions should be taken with a lot of caution. On the one hand, as mentioned above, they are based on rather unrealistic assumptions (for example, no public health interventions, a fixed reproduction number \(R_0\), etc.). More advanced projections are possible with the `{projections}` package, among others (see this section for more information on this topic). On the other hand, we still have to be careful and strictly follow public health interventions, because previous pandemics such as H1N1 and the Spanish flu have shown that extremely high numbers are not impossible!

The aim of this article was to give an illustration of how such analyses are done in R with a simple epidemiological model. These are the numbers our simple model produces, and we hope they are wrong.