A COVID Small Multiple

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers.]



John Burn-Murdoch has been doing very good work at the Financial Times producing various visualizations of the progress of COVID-19. One of his recent images is a small-multiple plot of cases by country, showing the trajectory of the outbreak for many countries, with the background of each small-multiple panel also showing (in grey) the trajectory of every other country for comparison. It’s a useful technique. In this example, I’ll draw a version of it in R and ggplot. The main difference is that instead of ordering the panels alphabetically by country, I’ll arrange them from highest to lowest current reported cases.

Here’s the figure we’ll end up with:

Figure: Cumulative reported COVID-19 cases to date, top 50 countries.

There are two small tricks. First, getting all the data to show (in grey) in each panel while highlighting just one country. Second, for reasons of space, moving the panel labels (in ggplot’s terminology, the strip labels) inside the panels, in order to tighten up the space a bit. Doing this is really the same trick both times, viz, creating some mini-datasets to use for particular layers of the plot.

The code for this (including code to pull the data) is in my COVID GitHub repository. See the repo for details on downloading and cleaning it. Just this morning the ECDC changed how it’s supplying its data, moving from an Excel file to your choice of JSON, CSV, or XML, so this earlier post walking through the process for the Excel file is already out of date for the downloading step. There’s a new function in the repo, though.

We’ll start with the data mostly cleaned and organized.

```
> cov_case_curve
# A tibble: 1,165 x 9
# Groups:   iso3 [94]
   date       cname iso3  cases deaths cu_cases cu_deaths days_elapsed end_label
   <date>     <chr> <chr> <dbl>  <dbl>    <dbl>     <dbl> <drtn>       <chr>
 1 2020-01-19 China CHN     136      1      216         3 0 days       NA
 2 2020-01-20 China CHN      19      0      235         3 1 days       NA
 3 2020-01-21 China CHN     151      3      386         6 2 days       NA
 4 2020-01-22 China CHN     140     11      526        17 3 days       NA
 5 2020-01-23 China CHN      97      0      623        17 4 days       NA
 6 2020-01-24 China CHN     259      9      882        26 5 days       NA
 7 2020-01-25 China CHN     441     15     1323        41 6 days       NA
 8 2020-01-26 China CHN     665     15     1988        56 7 days       NA
 9 2020-01-27 China CHN     787     25     2775        81 8 days       NA
10 2020-01-28 China CHN    1753     25     4528       106 9 days       NA
# … with 1,155 more rows
```

Then we pick out the top 50 countries, isolating their maximum case value. The code here is a bit inefficient as I keep having to recode some of the country names in the mini-datasets. There are other inefficiencies too, but oh well. I’ll clean them up later.

```r
library(tidyverse)  # dplyr verbs, the pipe, and ggplot2

top_50 <- cov_case_curve %>%
  group_by(cname) %>%
  filter(cu_cases == max(cu_cases)) %>%
  ungroup() %>%
  top_n(50, cu_cases) %>%
  select(iso3, cname, cu_cases) %>%
  # Same x and y position for every country: this is where the labels go
  mutate(days_elapsed = 1,
         cu_cases = max(cov_case_curve$cu_cases) - 1e4,
         cname = recode(cname, `United States` = "USA",
                        `Iran, Islamic Republic of` = "Iran",
                        `Korea, Republic of` = "South Korea",
                        `United Kingdom` = "UK"))

top_50
```

```
# A tibble: 50 x 4
   iso3  cname          cu_cases days_elapsed
   <chr> <chr>             <dbl>        <dbl>
 1 ARG   Argentina         75991            1
 2 AUS   Australia         75991            1
 3 AUT   Austria           75991            1
 4 BEL   Belgium           75991            1
 5 BRA   Brazil            75991            1
 6 CAN   Canada            75991            1
 7 CHL   Chile             75991            1
 8 CHN   China             75991            1
 9 CZE   Czech Republic    75991            1
10 DNK   Denmark           75991            1
# … with 40 more rows
```

This gives us our label layer. We’ve set days_elapsed and cu_cases values to the same thing for every country, because these are the x and y locations where the country labels will go.
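To see the label-layer idea in isolation, here is a minimal sketch on made-up data. The `toy` and `toy_labels` data frames and their values are invented for illustration and are not from the ECDC data; only the technique (one row per country, a shared x/y position, and `geom_text()` with inward justification) mirrors the approach described above.

```r
library(ggplot2)

# Toy data: three hypothetical countries, five days each
toy <- data.frame(
  cname = rep(c("A", "B", "C"), each = 5),
  days_elapsed = rep(0:4, times = 3),
  cu_cases = c(100, 150, 220, 300, 420,
               100, 120, 140, 170, 200,
               100, 200, 400, 800, 1600)
)

# Label mini-dataset: one row per country, all at the same x and y position
toy_labels <- data.frame(
  cname = c("A", "B", "C"),
  days_elapsed = min(toy$days_elapsed),
  cu_cases = max(toy$cu_cases)
)

p_labels <- ggplot(toy, aes(x = days_elapsed, y = cu_cases)) +
  geom_line(color = "firebrick") +
  # The label sits inside the panel, nudged inward from the top-left corner
  geom_text(data = toy_labels, aes(label = cname),
            hjust = "inward", vjust = "inward", fontface = "bold") +
  facet_wrap(~ cname) +
  # With the label inside the panel, the strip label is redundant
  theme(strip.text = element_blank())
```

Because `toy_labels` keeps the `cname` column, each label lands only in its own country's panel.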

Next, a data layer for the grey line traces and a data layer for the little endpoints at the current case-count value.

```r
# Background traces: every top-50 country, with the faceting variable dropped
cov_case_curve_bg <- cov_case_curve %>%
  select(-cname) %>%
  filter(iso3 %in% top_50$iso3)

# Endpoints: the row with the latest (maximum) cumulative count per country
cov_case_curve_endpoints <- cov_case_curve %>%
  filter(iso3 %in% top_50$iso3) %>%
  mutate(cname = recode(cname, `United States` = "USA",
                        `Iran, Islamic Republic of` = "Iran",
                        `Korea, Republic of` = "South Korea",
                        `United Kingdom` = "UK")) %>%
  group_by(iso3) %>%
  filter(cu_cases == max(cu_cases)) %>%
  select(cname, iso3, days_elapsed, cu_cases) %>%
  ungroup()
```

We drop cname in the cov_case_curve_bg layer, because we’re going to facet by that value with the main dataset in a moment. That’s the trick that allows the traces for all the countries to appear in each panel.
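The effect of dropping the faceting variable is easy to check on a small example. The sketch below uses invented data (`toy`, `toy_bg`, and the three-letter codes are placeholders, not real countries). When a layer's data lacks the variable named in `facet_wrap()`, ggplot2 draws that layer in every panel, which is what puts the grey traces behind each country.

```r
library(ggplot2)

# Toy data: three hypothetical countries, five days each
toy <- data.frame(
  cname = rep(c("A", "B", "C"), each = 5),
  iso3  = rep(c("AAA", "BBB", "CCC"), each = 5),
  days_elapsed = rep(0:4, times = 3),
  cu_cases = c(100, 150, 220, 300, 420,
               100, 120, 140, 170, 200,
               100, 200, 400, 800, 1600)
)

# Background copy WITHOUT the faceting variable (cname);
# iso3 stays so that each country is still drawn as its own line
toy_bg <- toy[, setdiff(names(toy), "cname")]

p_bg <- ggplot(toy, aes(x = days_elapsed, y = cu_cases)) +
  # Layer 1: all countries, repeated in every panel
  geom_line(data = toy_bg, aes(group = iso3), color = "grey80") +
  # Layer 2: only the panel's own country, from the main data
  geom_line(color = "firebrick") +
  facet_wrap(~ cname)

# Inspect how many rows each layer contributes across the panels
bg_rows <- nrow(layer_data(p_bg, 1))
fg_rows <- nrow(layer_data(p_bg, 2))
```

Comparing `bg_rows` with `fg_rows` shows the background layer being replicated once per panel, while the foreground layer is split across panels.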

And now we can draw the plot. I really need to fix that country recode; it's a prime candidate for DRY (Don't Repeat Yourself).

```r
cov_case_sm <- cov_case_curve %>%
  filter(iso3 %in% top_50$iso3) %>%
  mutate(cname = recode(cname, `United States` = "USA",
                        `Iran, Islamic Republic of` = "Iran",
                        `Korea, Republic of` = "South Korea",
                        `United Kingdom` = "UK")) %>%
  ggplot(mapping = aes(x = days_elapsed, y = cu_cases)) +
  # The line traces for every country, in every panel
  geom_line(data = cov_case_curve_bg,
            aes(group = iso3),
            size = 0.15, color = "grey80") +
  # The line trace in red, for the country in any given panel
  geom_line(color = "firebrick",
            lineend = "round") +
  # The point at the end. Bonus trick: some points can have fills!
  geom_point(data = cov_case_curve_endpoints,
             size = 1.1,
             shape = 21,
             color = "firebrick",
             fill = "firebrick2"
             ) +
  # The country label inside the panel, in lieu of the strip label
  geom_text(data = top_50,
            mapping = aes(label = cname),
            vjust = "inward",
            hjust = "inward",
            fontface = "bold",
            color = "firebrick",
            size = 2.1) +
  # Log transform and friendly labels
  scale_y_log10(labels = scales::label_number_si()) +
  # Facet by country, order from high to low
  facet_wrap(~ reorder(cname, -cu_cases), ncol = 5) +
  labs(x = "Days Since 100th Confirmed Case",
       y = "Cumulative Number of Cases (log10 scale)",
       title = "Cumulative Number of Reported Cases of COVID-19: Top 50 Countries",
       subtitle = paste("Data as of", format(max(cov_curve$date), "%A, %B %e, %Y")),
       caption = "Kieran Healy @kjhealy / Data: https://www.ecdc.europa.eu/") +
  theme(plot.title = element_text(size = rel(1), face = "bold"),
        plot.subtitle = element_text(size = rel(0.7)),
        plot.caption = element_text(size = rel(1)),
        # turn off the strip label and tighten the panel spacing
        strip.text = element_blank(),
        panel.spacing.x = unit(-0.05, "lines"),
        panel.spacing.y = unit(0.3, "lines"),
        axis.text.y = element_text(size = rel(0.5)),
        axis.title.x = element_text(size = rel(1)),
        axis.title.y = element_text(size = rel(1)),
        axis.text.x = element_text(size = rel(0.5)),
        legend.text = element_text(size = rel(1)))

ggsave("figures/cov_case_sm.png",
       cov_case_sm, width = 10, height = 12, dpi = 300)
```
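As for the repeated country recode, one possible cleanup is to wrap it in a small helper and call that in each pipeline. This is only a sketch: `shorten_cname()` is a hypothetical name, not a function from the repo, and `toy_names` is made-up data to show it working.

```r
library(dplyr)

# Hypothetical helper: the long-to-short country name mapping lives in one place
shorten_cname <- function(x) {
  recode(x,
         `United States` = "USA",
         `Iran, Islamic Republic of` = "Iran",
         `Korea, Republic of` = "South Korea",
         `United Kingdom` = "UK")
}

# Toy usage: names without a mapping pass through unchanged
toy_names <- tibble(cname = c("United States", "Italy", "Korea, Republic of"))
toy_short <- toy_names %>%
  mutate(cname = shorten_cname(cname))
```

Each `mutate(cname = recode(...))` step above could then become `mutate(cname = shorten_cname(cname))`, so a new country name only needs to be added once.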


