soldiers %>%
  mutate(
    race = fct_infreq(race)
  ) %>%
  ggplot(aes(x = race, fill = race)) +
  geom_bar() +
  scale_x_discrete(labels = scales::label_wrap(8)) # one of many ways to fix long labels on the axis
and the forcats package
Steen Flammild Harsted
January 10, 2024
forcats
Use the soldiers dataset for the following exercises.
race
race is now ordered alphabetically (try: soldiers %>% count(race)).
Reorder race by frequency with fct_infreq() and add this step to your 01_import.R file.
Make a bar plot with x = race. Do you see the difference?
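As a small illustration of what fct_infreq() does to the level order, here is a sketch on a toy vector. The values are made up for the example and are not taken from the soldiers dataset.

```r
library(forcats)

# Toy data (hypothetical values, not the soldiers dataset)
race <- factor(c("White", "Black", "White", "Hispanic", "White", "Black"))

# By default, factor levels are sorted alphabetically
levels(race)

# fct_infreq() reorders the levels so the most frequent level comes first
race_by_freq <- fct_infreq(race)
levels(race_by_freq)
```

Because ggplot2 draws a discrete axis in level order, a bar plot of race_by_freq would show the bars from most to least frequent.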
Ethnicity
Background information: DODRace was collected by assigning fixed race values (1:7) to each soldier. Ethnicity was a blank space where the soldiers themselves filled in their race.
Inspect Ethnicity using count() and view().
fct_lump()
OMG... we probably need to merge some of the many Ethnicity groups.
Try with fct_lump().
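To show what fct_lump() does, here is a minimal sketch on a toy vector. The free-text values are hypothetical stand-ins and do not come from the soldiers dataset.

```r
library(forcats)

# Toy free-text answers (hypothetical values, not the soldiers dataset)
ethnicity <- c(
  "Mexican", "Mexican", "Mexican", "Mexican",
  "Filipino", "Filipino", "Filipino",
  "Korean", "Korean",
  "Cherokee"
)

# Keep the 2 most frequent levels and lump everything else into "Other"
ethnicity_lumped <- fct_lump(ethnicity, n = 2)

table(ethnicity_lumped)
```

Note that fct_lump() only looks at frequencies. It cannot tell that two differently spelled answers mean the same thing.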
fct_collapse()
Hmmm… fct_lump() is probably not the best choice for the Ethnicity variable. It has too many groups, and many groups have similar sounding names. We need to fix this manually.
Add your changes to the 01_import.R file.
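A manual merge with fct_collapse() could look like the sketch below. The level names and the grouping are hypothetical examples, not the actual values or the intended grouping in the soldiers dataset.

```r
library(forcats)

# Toy free-text answers with similar sounding names (hypothetical values)
ethnicity <- factor(c(
  "Mexican", "Mexican American", "Filipino",
  "Filipino American", "Mexican"
))

# Each argument has the form: new_level = c("old level 1", "old level 2", ...)
ethnicity_merged <- fct_collapse(
  ethnicity,
  Mexican  = c("Mexican", "Mexican American"),
  Filipino = c("Filipino", "Filipino American")
)

table(ethnicity_merged)
```

Levels that are not mentioned in the call are left unchanged, so the merge can be built up a few groups at a time.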
category
DODRace, race, and Ethnicity are all true factors in the sense that no values in any of these variables are more 'race' than other values. Think about category: do the values here imply more or less of the same thing?
Change category into an ordered variable. Use the function factor() and set the argument ordered = TRUE.
Add this step to your 01_import.R file.
Inspect category again. Notice the difference?
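The sketch below shows how factor() with ordered = TRUE behaves. The level names are assumptions made for the example; check the actual values of category in the soldiers dataset before setting levels.

```r
# Toy values (hypothetical level names, not necessarily those in soldiers)
category <- c("Normal", "Overweight", "Underweight", "Normal")

# levels = sets the order from lowest to highest,
# ordered = TRUE tells R that this order is meaningful
category_ord <- factor(
  category,
  levels = c("Underweight", "Normal", "Overweight"),
  ordered = TRUE
)

category_ord

# Ordered factors support comparisons between levels
category_ord < "Overweight"
```

When printed, an ordered factor shows its levels separated by `<`, and ggplot2 picks a sequential color scale for it by default.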
forcats skills
forcats + ggplot + dplyr
mpg
Collapse a4 quattro and a4 into a4.
I have cheated and used some functions (str_to_title() and facet_grid()) and theme options that I haven't shown you before.
mpg %>%
  group_by(manufacturer) %>%
  mutate(
    model = model %>%
      str_to_title() %>%
      fct_collapse(A4 = c("A4", "A4 Quattro")) %>%
      fct_infreq(),
    manufacturer = str_to_title(manufacturer)
  ) %>%
  ggplot(aes(y = model, fill = manufacturer)) +
  geom_bar() +
  geom_text(aes(x = 0.5, label = model), size = 3, hjust = 0, check_overlap = TRUE) + # Display the model name at the position (x = 0.5, y = model)
  scale_x_continuous(expand = c(0, 0)) + # Remove the padding between the y-axis and the start of the bars
  facet_grid(
    rows = vars(manufacturer),
    scales = "free_y",
    space = "free_y",
    switch = "y"
  ) +
  ggthemes::theme_pander() +
  theme(
    axis.text.y = element_blank(), # Remove the names from the y-axis (we used geom_text instead)
    axis.ticks.y = element_blank(), # Remove y-axis ticks (the small lines)
    strip.text.y.left = element_text(angle = 0, hjust = 1), # Change strip text orientation
    legend.position = "none" # Remove fill legend
  ) +
  labs(
    title = "Count of car models in `ggplot2::mpg` data set",
    x = "Count of car models",
    y = NULL,
    caption = "Consider a different fill color scale. The current one seems to imply a gradient"
  )
starwars plot
I have cheated and used three functions (str_to_title(), after_stat(), and scale_fill_gradient()) that I haven't shown you before.
starwars %>%
  mutate(eye_color = fct_recode(str_to_title(eye_color))) %>% # Change all factor levels to Title case
  mutate(eye_color = fct_lump(eye_color, 7) %>% fct_infreq()) %>%
  ggplot(aes(
    y = eye_color,
    fill = after_stat(count) # Set the fill color to the count value
  )) +
  geom_bar() +
  ggthemes::theme_foundation() +
  scale_fill_gradient(low = "grey", high = "black") + # Create a new fill scale going from grey to black
  labs(
    x = "Count",
    y = "Eye Color",
    title = "Eye color counts of Starwars characters",
    caption = "Consider... Is the grey gradient disturbing? e.g. ´brown´ has a black color "
  ) +
  theme(plot.title.position = "plot") # Place the title all the way to the left side
table1 (another inbuilt dataset)
table1 displays the number of TB cases documented by WHO in Afghanistan, Brazil, and China in 1999 and 2000.
Create case_pr_mill_pop (cases per million).
Change the country labels so that (Afghanistan = Afg, Brazil = Bra, China = Chi).
Reorder the country factor after case_pr_mill_pop.
country
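One possible way to approach the table1 steps is sketched below. It assumes that table1 is the dataset shipped with tidyr, with the columns country, year, cases, and population, and that "cases per million" means cases divided by population times one million. Treat it as a sketch rather than the intended solution.

```r
library(dplyr)
library(forcats)
library(tidyr) # assumed source of the inbuilt table1 dataset

tb <- table1 %>%
  mutate(
    # Cases per million inhabitants
    case_pr_mill_pop = cases / population * 1e6,
    # fct_recode() takes pairs of the form new_label = "old label"
    country = fct_recode(country, Afg = "Afghanistan", Bra = "Brazil", Chi = "China"),
    # fct_reorder() sorts the levels by a summary of another variable
    # (the median per level by default)
    country = fct_reorder(country, case_pr_mill_pop)
  )

levels(tb$country)
```

Since each country has two rows (one per year), fct_reorder() has to summarise them. The .fun argument can be changed, for example to max, if a different ordering rule fits better.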