Estimating R In The UK
Note this post needs revision as the method of calculating R0, especially after the peak, is incorrect - see here for details
I have written previously about the R0 package and how it can be used to estimate R, and I have used this to build a Shiny app to re-calculate R on demand with the latest data. In the app, I make a rather wildly inaccurate, or rather, unstable prediction about when the peak or plateau will end by using the slope of R. This instability magnifies the underlying uncertainty in estimated R over time, which in turn depends on the accuracy of the data and the R0 model.
What is the consensus value for R across the UK as I write this (12th May 2020)? According to this BBC article, R is currently in the range 0.75 to 1 (according to Imperial College London?). What does the R0 package with the “time-dependent” model say:
library(tidyverse) library(lubridate) library(R0)
confirmed <- read.csv(url("https://raw.githubusercontent.com/CSSEGISandData/2019-nCoV/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) confirmed <- subset(confirmed, select = -c(Lat, Long)) ukConfirmed <- confirmed %>% filter(Country.Region == "United Kingdom") sums <- colSums(ukConfirmed[,-match(c("Province.State", "Country.Region"), names(ukConfirmed))], na.rm=TRUE) sumsByDateCode <- as.data.frame(t(t(sums))) colnames(sumsByDateCode) <- c("count") sumsByDateCode$datecode <- rownames(sumsByDateCode) dailyCounts <-mutate(sumsByDateCode, date = mdy(substring(datecode, 2))) dailyCounts$day <- seq.int(nrow(dailyCounts)) dailyCounts <- dailyCounts %>% filter(count > 0)
We’ll continue to use a generation time distribution with a mean of 5 days, standard deviation 1.9:
meanGenerationTime <- generation.time("gamma", c(5, 1.9))
Let’s estimate R:
est <- estimate.R(dailyCounts$count, methods=c("TD"), GT=meanGenerationTime)
estimatedR <- est$estimates$TD$R[1:length(est$estimates$TD$R)-1] estimateRByDay <- as.data.frame(estimatedR) %>% rename(R = estimatedR) %>% mutate(day = row_number()) plot(estimateRByDay$day, estimateRByDay$R, xlab="days", ylab="R", type="l", main="Effective R") abline(h=1, col="gray60")
This model predicts that the latest value of R is 1.1087439, which is above the aforementioned range. Critically, it’s above 1.
So, can we quantify the errors in our estimate? For starters, the R0 package estimation produces a 95% confidence interval, which we can plot. Note that the confidence interval variable contains data for one day in the future, so we’ll slice it off before plotting:
confInt <- est$estimates$TD$conf.int %>% mutate(day = row_number()) %>% slice(-n()) plot(estimateRByDay$day, estimateRByDay$R, xlab="days", ylab="R", type="l", main="Effective R") abline(h=1, col="gray60") lines(confInt$day, confInt$upper, col="orange") lines(confInt$day, confInt$lower, col="orange")
The graph shows increasing confidence (ie narrower bands) as more data comes in and the slope settles down. The upper and lower confidence levels (shown above in orange) have a range on the last day of 1.0540665 to 1.1647471, so the R0 package is pretty confident with the figure 1.1087439.
So, setting aside R for a moment, what does the predicted case data look like, compared to actual case data? Note the predicted case data day 1 starts from the first case, which happens on day 10, so we need to offset our predictions to match this:
predictedCases <- as.data.frame(est$estimates$TD$pred) %>% rename(prediction = "est$estimates$TD$pred") %>% mutate(day = row_number() + dailyCounts$day) %>% slice(-n())
The actual cases are in black and the predictions in green. This doesn’t look like a bad fit either. Indeed it was seen previously that the “time-dependent” method produced a better fit than all the other options that the R0 package supplies.
We can see what the other methods the latest estimate of R to be easily enough:
allMethodsEstimate <- estimate.R(dailyCounts$count, methods=c("TD", "EG", "ML", "SB"), GT=meanGenerationTime) lastDay <- allMethodsEstimate$end -1 timeDependentR <- allMethodsEstimate$estimates$TD$R[lastDay] exponentialGrowthR <- allMethodsEstimate$estimates$EG$R maximumLikelihoodR <- allMethodsEstimate$estimates$ML$R sequentialBayesianR <- allMethodsEstimate$estimates$SB$R[lastDay] variousRs <- data.frame(method=factor(), R=numeric()) variousRs <- variousRs %>% add_row(method="Time-Dependent", R=timeDependentR) %>% add_row(method="Exponential Growth", R=exponentialGrowthR) %>% add_row(method="Maximum Likelihood", R=maximumLikelihoodR) %>% add_row(method="Sequential Bayesian", R=sequentialBayesianR)
The “Time-Dependent” method still aligns more closely with the published range of 0.75-1.0.
Is the generation time a factor? I used the distribution of 5 days with a standard deviation of 1.9, as per this paper. What effect does varying this have?
This CDC article reports a generation time with a distribution with a mean of 3.96 days and a standard deviation of 4.75 days. Using the “time-dependent” method, what is the most recent estimated R value?
est2 <- estimate.R(dailyCounts$count, methods=c("TD"), GT=generation.time("gamma", c(3.96, 4.75))) estimatedR2 <- est2$estimates$TD$R[est2$end-1]
With this generation time, the estimated most recent R is 1.1079562. This isn’t significantly different.
Very roughly speaking, the higher the generation time, the larger R is estimated to be. Compare the estimates for a generation time mean of 20 days versus 1 day:
estHigh <- estimate.R(dailyCounts$count, methods=c("TD"), GT=generation.time("gamma", c(20, 1))) estimatedRHigh <- estHigh$estimates$TD$R[estHigh$end-1] estLow <- estimate.R(dailyCounts$count, methods=c("TD"), GT=generation.time("gamma", c(1, 1))) estimatedRLow <- estLow$estimates$TD$R[estLow$end-1]
The 20 -day estimate is 1.7229135 whereas the 1-day estimate is 1.0286179. While these differences are very critical when considering the R=1 boundary, it doesn’t look like different generation time estimates would get us close to the 0.75-1.0 range with the given case data.
I haven’t managed to find any specific details about how the published R values are produced. I think I will have to spend more time digging through the Imperial College London COVID-19 scientific resources to find out.