How to standardize group colors in data visualizations in R

One best practice in visualization is to keep your color scheme consistent across figures.

For instance, if you're making multiple plots of the same dataset (say, a group of 5 companies), you want each company to have the same color across all these plots.

R has some great data visualization capabilities. The ggplot2 package in particular makes it easy to spin up a good-looking visualization quickly.

The default in ggplot2 is to look at the number of groups in your data and pick "evenly spaced" colors around a hue color wheel. This looks great straight out of the box:

# install.packages('ggplot2')
library(ggplot2)
theme_set(new = theme_minimal()) # sets a default theme
set.seed(1) # ensure reproducibility

# generate some data
n_companies = 5
df1 = data.frame(
  company = paste('Company', seq_len(n_companies), sep = '_'),
  employees = sample(50:500, n_companies),
  stringsAsFactors = FALSE
)

# make a simple column/bar plot
ggplot(data = df1) +
  geom_col(aes(x = company, y = employees, fill = company))

However, it can be challenging to keep the coloring consistent across plots.

For instance, suppose we want to visualize a subset of these data points.

index_subset1 = c(1, 3, 4, 5) # specify a subset

# make a plot using the subsetted dataframe
ggplot(data = df1[index_subset1, ]) +
  geom_col(aes(x = company, y = employees, fill = company))

As you can see, the color scheme has now changed. With one group / company fewer, R now picks four new colors evenly spaced around the color wheel. All but the first differ from the original colors we had for the companies.
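The shift can also be inspected without drawing anything. The sketch below compares the default palettes for five and four groups, assuming the scales package is installed; the variable names are only illustrative.

```r
# default hue palettes for five and four groups
palette_five = scales::hue_pal()(5)
palette_four = scales::hue_pal()(4)

print(palette_five)
print(palette_four)

# which of the four-group colors also occur in the five-group palette?
shared = palette_four %in% palette_five
print(shared) # only the first color is shared
```

Both palettes start at the same hue, which is why only the first company keeps its color when a group disappears.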

One way to deal with this in R and ggplot2 is to add a scale_* layer to the plot.

Here we manually set hex color values in the scale_fill_manual function. The hex values I provide are the default colors for five groups, with the color that belonged to the dropped company removed.

# install.packages('scales')
# the hue_pal function from the scales package looks up a number of evenly spaced colors
# which we can save as a vector of character hex values
default_palette = scales::hue_pal()(5)

# these colors we can then use in a scale_* function to manually override the color schema
ggplot(data = df1[index_subset1, ]) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = default_palette[-2]) # we remove the element that belonged to company 2

As you can see, the colors are now aligned with the previous schema. Only Company 2 is dropped, and all other companies retained their color.

However, this was very much hard-coded into our program. We had to specify which company to drop using default_palette[-2].

If the subset changes, which often happens in real life, our solution will break, because the values in the palette no longer align with the groups R encounters:

index_subset2 = c(1, 2, 5) # but the subset might change

# and all manually-set colors will immediately misalign
ggplot(data = df1[index_subset2, ]) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = default_palette[-2])
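To see why the colors misalign, it may help to mimic the matching by hand. As I understand the ggplot2 documentation, an unnamed values vector is matched in order with the groups on the scale. The base R sketch below reproduces that positional matching for the subset above; the palette is hard-coded from the five default colors, and all variable names are illustrative.

```r
# the five default colors and the company labels they originally belonged to
default_palette = c("#F8766D", "#A3A500", "#00BF7D", "#00B0F6", "#E76BF3")
companies = paste("Company", 1:5, sep = "_")

# the groups that end up in the plot for the changed subset
groups_in_plot = companies[c(1, 2, 5)]

# hard-coded palette with the second color removed, matched by position
hard_coded = default_palette[-2]
assigned = setNames(hard_coded[seq_along(groups_in_plot)], groups_in_plot)

# the colors these groups had in the original five-company plot
intended = setNames(default_palette, companies)[groups_in_plot]

comparison = data.frame(
  company = groups_in_plot,
  assigned = unname(assigned),
  intended = unname(intended),
  matches = unname(assigned == intended)
)
print(comparison)
```

Only Company_1 keeps its original color; Company_2 and Company_5 pick up colors that previously belonged to other companies.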

Fortunately, R is a smart language, and you can work your way around this!

All we need to do is create what I call a named color palette!

It's as simple as specifying a vector of hex color values! Alternatively, you can use the grDevices::rainbow() or grDevices::colors() functions, or one of the many functions included in the scales package.

# you can hard-code a palette using color strings
c('purple', 'blue', 'green')

# or you can use the rainbow or colors functions of the grDevices package
rainbow(n_companies)
colors()[seq_len(n_companies)]

# or you can use the scales::hue_pal() function
palette1 = scales::hue_pal()(n_companies)
print(palette1)
[1] "#F8766D" "#A3A500" "#00BF7D" "#00B0F6" "#E76BF3"

Now we need to assign names to this vector of hex color values. These names have to correspond to the labels of the groups that we want to color.

You can use the names function for this.

names(palette1) = df1$company
print(palette1)
Company_1 Company_2 Company_3 Company_4 Company_5
"#F8766D" "#A3A500" "#00BF7D" "#00B0F6" "#E76BF3"

But I prefer to use the setNames function, so I can do the initialization, assignment, and naming simultaneously. The result is the same though.

palette1_named = setNames(object = scales::hue_pal()(n_companies), nm = df1$company)
print(palette1_named)
Company_1 Company_2 Company_3 Company_4 Company_5
"#F8766D" "#A3A500" "#00BF7D" "#00B0F6" "#E76BF3"
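Because the matching now happens by name, a label without a matching palette name would not get its intended color. A quick sanity check in base R could look like the sketch below; the palette deliberately lacks one company, and all names here are made up for illustration.

```r
# group labels as they appear in the data
groups = paste("Company", 1:5, sep = "_")

# a named palette that deliberately lacks an entry for Company_4
palette_named = setNames(
  c("#F8766D", "#A3A500", "#00BF7D", "#E76BF3"),
  c("Company_1", "Company_2", "Company_3", "Company_5")
)

# labels in the data that have no color assigned
missing_colors = setdiff(groups, names(palette_named))
print(missing_colors)

# palette names that match no label in the data (often a sign of a typo)
unused_colors = setdiff(names(palette_named), groups)
print(unused_colors)
```

Running a check like this before plotting catches misspelled or missing labels early.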

With this named color vector and the scale_*_manual functions we can now manually override the fill and color schemes in a flexible way. This results in the same plot we had without using the scale_*_manual function:

ggplot(data = df1) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette1_named)

However, now it doesn't matter if the dataframe is subsetted, as we specifically tell R which colors to use for which group labels by means of the named color palette:

# the colors remain the same if some groups are not found
ggplot(data = df1[index_subset1, ]) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette1_named)

# and also if other groups are not found
ggplot(data = df1[index_subset2, ]) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette1_named)
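If you build such palettes often, the naming step could be wrapped in a small helper. The function below is a hypothetical convenience wrapper, not part of ggplot2 or scales; it uses grDevices::rainbow as the default color generator purely to stay within base R.

```r
# hypothetical helper: one color per unique group label, named by that label
make_named_palette = function(groups, palette_fun = grDevices::rainbow) {
  labels = sort(unique(as.character(groups)))
  setNames(palette_fun(length(labels)), labels)
}

# group labels with duplicates and in arbitrary order
companies = c("Company_3", "Company_1", "Company_3", "Company_2")
pal = make_named_palette(companies)
print(names(pal))

# looking up by name returns the same colors for any subset of groups
print(pal[c("Company_1", "Company_3")])
```

The palette should be built once from the full set of groups and then reused for every subset; rebuilding it per subset would bring back the original problem. Any function that takes a number and returns that many colors should work as palette_fun, for instance the function returned by scales::hue_pal().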

Once you are aware of these superpowers, you can do so much more with them!

How about highlighting a specific group?

Just set all the other colors to 'gray'…

# let's create an all-gray color palette vector
palette2 = rep('gray', times = n_companies)
palette2_named = setNames(object = palette2, nm = df1$company)
print(palette2_named)
Company_1 Company_2 Company_3 Company_4 Company_5
   "gray"    "gray"    "gray"    "gray"    "gray"

# this looks terrible in a plot
ggplot(data = df1) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette2_named)

… and assign a different color to one of the companies:

# override one of the 'gray' elements using an index by name
palette2_named['Company_2'] = 'purple'
print(palette2_named)
Company_1 Company_2 Company_3 Company_4 Company_5
   "gray"  "purple"    "gray"    "gray"    "gray"

# and our plot now highlights a specific group
ggplot(data = df1) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette2_named)
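The two highlighting steps could also be combined into one function. The sketch below is a hypothetical helper (the name highlight_palette and its arguments are my own) that grays out every group and then overrides the highlighted ones by name.

```r
# hypothetical helper: gray palette with one or more highlighted groups
highlight_palette = function(groups, highlight,
                             highlight_color = "purple", base_color = "gray") {
  labels = unique(as.character(groups))
  # fail early if a group to highlight does not occur in the data
  stopifnot(all(highlight %in% labels))
  pal = setNames(rep(base_color, times = length(labels)), labels)
  pal[highlight] = highlight_color
  pal
}

companies = paste("Company", 1:5, sep = "_")
pal = highlight_palette(companies, highlight = c("Company_2", "Company_4"))
print(pal)
```

The returned vector can be passed as the values argument of scale_fill_manual or scale_color_manual in the same way as the named palettes above.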

We can apply these principles to other types of data and plots.

For instance, let's generate some time series data…

timepoints = 10
df2 = data.frame(
  company = rep(df1$company, each = timepoints),
  employees = rep(df1$employees, each = timepoints) +
    round(rnorm(n = nrow(df1) * timepoints, mean = 0, sd = 10)),
  time = rep(seq_len(timepoints), times = n_companies),
  stringsAsFactors = FALSE
)

… and visualize these using a line plot, adding the color palette in the same way as before:

# note: from ggplot2 3.4.0 onwards, linewidth = 2 is the preferred argument for line thickness
ggplot(data = df2) +
  geom_line(aes(x = time, y = employees, col = company), size = 2) +
  scale_color_manual(values = palette1_named)

If we leave out one of the companies (let's skip Company 2), the palette makes sure the others remain colored as specified:

ggplot(data = df2[df2$company %in% df1$company[index_subset1], ]) +
  geom_line(aes(x = time, y = employees, col = company), size = 2) +
  scale_color_manual(values = palette1_named)

The highlighted color palette we used before will also still work like a charm!

ggplot(data = df2) +
  geom_line(aes(x = time, y = employees, col = company), size = 2) +
  scale_color_manual(values = palette2_named)

Now, let's scale up the problem! Pretend we have not 5, but 20 companies.

The code will work all the same!

set.seed(1) # ensure reproducibility

# generate new data for more companies
n_companies = 20
df1 = data.frame(
  company = paste('Company', seq_len(n_companies), sep = '_'),
  employees = sample(50:500, n_companies),
  stringsAsFactors = FALSE
)

# let's create an all-gray color palette vector
palette2 = rep('gray', times = n_companies)
palette2_named = setNames(object = palette2, nm = df1$company)

# highlight one company in a different color
palette2_named['Company_2'] = 'purple'
print(palette2_named)

# make a bar plot
ggplot(data = df1) +
  geom_col(aes(x = company, y = employees, fill = company)) +
  scale_fill_manual(values = palette2_named) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) # rotate and align the x labels

The same goes for the time series line plot:

timepoints = 10
df2 = data.frame(
  company = rep(df1$company, each = timepoints),
  employees = rep(df1$employees, each = timepoints) +
    round(rnorm(n = nrow(df1) * timepoints, mean = 0, sd = 10)),
  time = rep(seq_len(timepoints), times = n_companies),
  stringsAsFactors = FALSE
)

ggplot(data = df2) +
  geom_line(aes(x = time, y = employees, col = company), size = 2) +
  scale_color_manual(values = palette2_named)

The possibilities are endless; the power is now yours!

Just think of the efficiency gain if you made a custom color palette with, for instance, your company's brand colors!

For more R tricks to up your programming productivity and effectiveness, visit the R tips and tricks page!


