Chapter 4 Categorical data
4.1 Aesthetics: group
library(ggplot2)

ggplot() +
geom_line(
mapping=aes(
x=c(1, 2, 3),
y=c(2, 3, 2)
)
) +
geom_line(
mapping=aes(
x=c(1, 2, 3),
y=c(5, 2, 6)
)
)

Use the group aesthetic to combine multiple layers of the same geom into a single layer.
ggplot() +
geom_line(
mapping=aes(
x=c(1, 2, 3, 1, 2, 3),
y=c(2, 3, 2, 5, 2, 6),
group=c("m", "m", "m", "f", "f", "f")
)
)

Mapping color to the same variable as group also draws each group in its own color:

ggplot() +
geom_line(
mapping=aes(
x=c(1, 2, 3, 1, 2, 3),
y=c(2, 3, 2, 5, 2, 6),
group=c("m", "m", "m", "f", "f", "f"),
color=c("m", "m", "m", "f", "f", "f")
)
)

- Any aesthetic that differentiates the groups, such as color, can replace group:
ggplot() +
geom_line(
mapping=aes(
x=c(1, 2, 3, 1, 2, 3),
y=c(2, 3, 2, 5, 2, 6),
# group=c("m", "m", "m", "f", "f", "f"),
color=c("m", "m", "m", "f", "f", "f")
)
)

- When no other aesthetic mapping differentiates the groups, use the group aesthetic mapping.
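A quick way to confirm that one layer really contains two groups is to look at the data ggplot2 builds for it with layer_data(). A minimal sketch using the toy vectors above; rows from the same group share the same integer id in the group column:

```r
library(ggplot2)

# One geom_line layer whose rows are split into two groups by `group`
p <- ggplot() +
  geom_line(
    mapping = aes(
      x = c(1, 2, 3, 1, 2, 3),
      y = c(2, 3, 2, 5, 2, 6),
      group = c("m", "m", "m", "f", "f", "f")
    )
  )

# Built data for layer 1: the `group` column holds one integer id per group
ld <- layer_data(p, 1)
table(ld$group)
```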
4.2 Geom overlapping
When geom layers overlap, we can set the alpha aesthetic to make them semi-transparent.
If multiple geometries are created within one geom_ call (using grouping aesthetics), we can also set
- position: "stack", "dodge" or "jitter" (some of these may not apply to certain geom_ functions)
https://clauswilke.com/dataviz/visualizing-proportions.html#fig:health-vs-age
For a stacked area chart of proportions like the one linked above, x is continuous (or discrete with many categories)
and y is the cumulative proportion.
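In the example below, the second area layer adds the first series to the second by hand, so its top edge is a running total. A minimal base-R sketch of the same running total, assuming the two series are stored in one long data frame (area_long and y_cum are illustrative names):

```r
# The two series from the area example, in long format
area_long <- data.frame(
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(0.2, 0.3, 0.2, 0.4, 0.3, 0.52),
  group = c("m", "m", "m", "f", "f", "f")
)

# Running total of y within each x, in row order: "m" first, then "f" on top of it
area_long$y_cum <- ave(area_long$y, area_long$x, FUN = cumsum)
area_long
```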
ggplot() +
geom_area(
mapping=aes(
x=c(1, 2, 3),
y=c(0.2, 0.3, 0.2)
)
) +
geom_area(
mapping=aes(
x=c(1, 2, 3),
y=c(0.4, 0.3, 0.52) + c(0.2, 0.3, 0.2) # add the first series so the top edge shows the cumulative proportion
),
alpha=0.5
)

4.3 Position: stack
Stacking puts each group's y on top of the y values already stacked at the same x,
creating a cumulative result.
ggplot() +
geom_area(
mapping=aes(
x=c(1, 2, 3,
1, 2, 3),
y=c(0.2, 0.3, 0.2,
0.4, 0.3, 0.52),
fill=c("m", "m", "m",
"f", "f", "f")
),
position="stack" # also try "identity" to draw the areas unstacked
)

The stack position accumulates y for you, so there is no need to compute the cumulative values yourself.
The default position of geom_area is "stack", so the position argument can be omitted.
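One way to check that omitting position changes nothing is to compare the layer data built with and without position = "stack". A sketch with the same values as the stack example above (data_area is an illustrative name); the top of the stack at x = 3 should be 0.2 + 0.52 = 0.72:

```r
library(ggplot2)

data_area <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(0.2, 0.3, 0.2, 0.4, 0.3, 0.52),
  fill = c("m", "m", "m", "f", "f", "f")
)

# The same layer with the default position and with position = "stack"
p_default <- ggplot(data_area) +
  geom_area(aes(x = x, y = y, fill = fill))
p_stack <- ggplot(data_area) +
  geom_area(aes(x = x, y = y, fill = fill), position = "stack")

built_default <- layer_data(p_default, 1)
built_stack   <- layer_data(p_stack, 1)

# The built data, including the stacked ymin/ymax columns, should match
all.equal(built_default, built_stack)

# Top of the stack over all x
max(built_stack$ymax)
```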
data_cat1 <- data.frame(
x=c(1, 2, 3, 1, 2, 3),
y=c(0.2, 0.3, 0.2, 0.4, 0.4, 0.52),
fill=c("m", "m", "m", "f", "f", "f")
)

ggplot(
data=data_cat1
) +
geom_area(
mapping=aes(
x=x,
y=y,
fill=fill
)
)

When an aesthetic is mapped to unordered discrete data (for example, a character vector), ggplot2 will
- convert the series into a factor (unless it already is one);
- conduct the mapping according to the level order of that factor.
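The level order decides which colour each group receives. A sketch comparing the default conversion with an explicit level order; it uses geom_point() with colour instead of geom_area() with fill, because a point layer keeps one row per observation in layer_data(), and it assumes the default discrete hue palette, which scales::hue_pal() reproduces:

```r
library(ggplot2)

data_cat1 <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(0.2, 0.3, 0.2, 0.4, 0.4, 0.52),
  fill = c("m", "m", "m", "f", "f", "f")
)

# Default: the character column is converted with factor(), so "f" comes first
p_default <- ggplot(data_cat1) +
  geom_point(aes(x = x, y = y, colour = fill))

# Explicit level order: "m" comes first
p_m_first <- ggplot(data_cat1) +
  geom_point(aes(x = x, y = y, colour = factor(fill, levels = c("m", "f"))))

# Row 1 of data_cat1 is an "m" observation; the point layer keeps row order
col_default <- layer_data(p_default, 1)$colour[1]
col_m_first <- layer_data(p_m_first, 1)$colour[1]

# Default two-colour palette: the first colour goes to the first level
palette2 <- scales::hue_pal()(2)
c(default = col_default, m_first = col_m_first)
palette2
```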
data_cat1$fill |>
factor() |>
levels() # returns "f" "m": the unique values in sorted order

4.4 Factor
When grouping aesthetics vary the look of geometries across groups of data, it is crucial to declare the mapped series with the proper class, usually a factor with the intended level order.
factor(data_series, levels) parses data_series into categorical data whose level order is defined by levels. If levels is omitted, the levels are the sorted unique values, and the sort order follows the collation rules of your locale (set by your operating system).
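The effect of levels can be checked on a plain vector. A small sketch with a hypothetical vector sex that contains one stray value, "x"; note that values missing from levels become NA rather than raising an error:

```r
# Hypothetical vector with one value ("x") that is not listed in `levels`
sex <- c("m", "m", "f", "f", "x")

# Without `levels`: the sorted unique values become the levels
levels(factor(sex))

# With `levels`: the order is exactly the one supplied
sex_f <- factor(sex, levels = c("m", "f"))
levels(sex_f)

# Values that are not among the levels silently become NA
is.na(sex_f)
```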
ggplot(
data=data_cat1
) +
geom_area(
mapping=aes(
x=x,
y=y,
fill=factor(fill, levels=c("m", "f"))
)
)

- Here we declare the factor on the fly, inside aes().
We can also declare the factor in the data frame first:
data_cat1_copy <- data_cat1
data_cat1_copy$fill |>
factor(levels=c("m", "f")) ->
data_cat1_copy$fill

|> is R's native pipe operator, available since R 4.1.0. It makes the following two calls equivalent:
f(x, ...) # equivalent to
x |> f(...)

ggplot(
data=data_cat1_copy
) +
geom_area(
mapping=aes(
x=x,
y=y,
fill=fill
)
)

4.5 Proportional data
data_cat2_wide <- data.frame(
x=c(1, 2, 3),
y_a=c(0.2, 0.3, 0.2),
y_b=c(0.4, 0.4, 0.52),
y_c=c(0.4, 0.3, 0.28)
)
data_cat2_wide |>
tidyr::pivot_longer(
cols=y_a:y_c,
names_to = "fill",
values_to= "y"
) ->
data_cat2
View(data_cat2)

ggplot(
data=data_cat2
) +
geom_area(
mapping=aes(
x=x,
y=y,
fill=fill
),
color="white"
)

When the x series is discrete with only a few distinct values, a bar chart with position "dodge" is a better choice.
ggplot(
data=data_cat2
) +
geom_col(
mapping=aes(
x=x,
y=y,
fill=fill
),
color="white",
width=0.8, # bar width
linewidth=0, # outline width; this argument was called size before ggplot2 3.4.0
position = "dodge" # also try "stack"
)

- width: the width of the bars
- linewidth: the width of the bar outline (stroke)
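A pie chart, like any part-to-whole chart, assumes the shares add up to a whole. A quick base-R check on the wide table used in this section, comparing with a small tolerance because decimals such as 0.3 are not stored exactly in floating point:

```r
data_cat2_wide <- data.frame(
  x = c(1, 2, 3),
  y_a = c(0.2, 0.3, 0.2),
  y_b = c(0.4, 0.4, 0.52),
  y_c = c(0.4, 0.3, 0.28)
)

# Sum the three shares in each row (one row per x value)
totals <- rowSums(data_cat2_wide[, c("y_a", "y_b", "y_c")])
totals

# TRUE when every row adds up to 1 within the tolerance
all(abs(totals - 1) < 1e-9)
```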
Pie chart:
- not well suited to comparing proportions across more than one dimension
library(dplyr)
data_cat2 %>%
filter(
x==1
) ->
data_cat2_x1only

ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
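)

Adding coord_polar(theta = "y") maps the stacked heights to angles, turning the single bar into a pie chart:

ggplot(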
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
coord_polar(
theta = "y"
)

4.6 Adding text
Labels can be added with geom_text():
ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
geom_text(
aes(
x=x,
y=y,
label=fill
),
position = "stack"
)

The stacking order of geom_col is based on the level order of fill, whereas the stacking order of geom_text is based on the row order of the observations, because label is not treated as a grouping aesthetic.
Grouping aesthetics determine the order of stacking. In geom_col, fill is the grouping aesthetic. To make geom_text stack its labels in the same order as the fill segments of geom_col, put group=fill in geom_text:
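The effect of group=fill can be checked by comparing the built data of the two layers. A sketch with the x = 1 shares from data_cat2_x1only (rebuilt here as data_x1 so the block runs on its own); with group=fill, each label's y should equal the top (ymax) of the bar segment with the same group id:

```r
library(ggplot2)

# The x = 1 shares (same values as data_cat2_x1only)
data_x1 <- data.frame(
  x = 1,
  y = c(0.2, 0.4, 0.4),
  fill = c("y_a", "y_b", "y_c")
)

p <- ggplot(data_x1) +
  geom_col(aes(x = x, y = y, fill = fill)) +
  geom_text(aes(x = x, y = y, label = fill, group = fill), position = "stack")

bars   <- layer_data(p, 1)
labels <- layer_data(p, 2)

# Line the two layers up by group id and compare bar tops with label heights
bar_top <- bars$ymax[order(bars$group)]
label_y <- labels$y[order(labels$group)]
rbind(bar_top, label_y)
```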
ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
geom_text(
aes(
x=x,
y=y,
label=fill,
group=fill
),
position = "stack"
)

Change the labels to show the proportion values of y:
ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
geom_text(
aes(
x=x,
y=y,
label=y, # use y to label now
group=fill
),
position = "stack"
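)

The position argument also accepts position functions. When you know which type of position you want, the corresponding position function lets you fine-tune it; for example, position_stack(vjust=0.5) places each label halfway up its segment instead of at the top (the default, vjust=1).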
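With position_stack(vjust = 0.5), each label should sit at the midpoint of its segment. A sketch that checks this against the built bar data, again with the x = 1 shares (data_x1 is an illustrative name):

```r
library(ggplot2)

# The x = 1 shares (same values as data_cat2_x1only)
data_x1 <- data.frame(
  x = 1,
  y = c(0.2, 0.4, 0.4),
  fill = c("y_a", "y_b", "y_c")
)

p <- ggplot(data_x1) +
  geom_col(aes(x = x, y = y, fill = fill)) +
  geom_text(
    aes(x = x, y = y, label = y, group = fill),
    # vjust = 0 puts a label at the bottom of its segment, 1 at the top
    position = position_stack(vjust = 0.5)
  )

bars   <- layer_data(p, 1)
labels <- layer_data(p, 2)

# Midpoint of each bar segment and the label height, both in group order
bars <- bars[order(bars$group), ]
midpoint <- (bars$ymin + bars$ymax) / 2
label_y  <- labels$y[order(labels$group)]
rbind(midpoint, label_y)
```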
ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
geom_text(
aes(
x=x,
y=y,
label=y,
group=fill
),
position = position_stack(vjust=0.5)
)

Combined with coord_polar() and theme_void(), the same layers give a labelled pie chart:

ggplot(
data=data_cat2_x1only
) +
geom_col(
aes(
x=x,
y=y,
fill=fill
)
) +
geom_text(
aes(
x=x,
y=y,
label=y,
group=fill
),
position = position_stack(vjust=0.5)
) +
coord_polar(
theta = "y"
) +
theme_void()

When the x-axis also represents categorical data:
dy=0.03 # how far each label is moved below the top of its bar
ggplot(
data=data_cat2
) +
geom_col(
mapping=aes(
x=x,
y=y,
fill=fill
),
color="white",
width=0.8, # bar width; also used as the dodge width of the labels below
position = "dodge"
)+
geom_text(
mapping=aes(
x=x,
y=y-dy,
group=fill,
label=y
),
size=8, # text size
position=position_dodge(width=0.8) # same width as the bars in geom_col
)

- The text layer's position_dodge uses the same width as geom_col, so the labels are dodged by the same distance as the bars.
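The effect of matching the widths can be checked from the built data. A sketch that dodges the labels once with the bar width (0.8) and once with a narrower, mismatched width (0.4); label_x_vs_bar_x is a hypothetical helper:

```r
library(ggplot2)

# Long-format shares, same values as data_cat2 (built with pivot_longer above)
data_cat2 <- data.frame(
  x = rep(c(1, 2, 3), each = 3),
  fill = rep(c("y_a", "y_b", "y_c"), times = 3),
  y = c(0.2, 0.4, 0.4, 0.3, 0.4, 0.3, 0.2, 0.52, 0.28)
)

# Dodged bars of width `bar_width`, labels dodged by `text_width`;
# returns the sorted x positions of bars and labels after dodging
label_x_vs_bar_x <- function(text_width, bar_width = 0.8) {
  p <- ggplot(data_cat2) +
    geom_col(aes(x = x, y = y, fill = fill), width = bar_width, position = "dodge") +
    geom_text(
      aes(x = x, y = y, label = y, group = fill),
      position = position_dodge(width = text_width)
    )
  list(bars = sort(layer_data(p, 1)$x), labels = sort(layer_data(p, 2)$x))
}

matched <- label_x_vs_bar_x(text_width = 0.8)  # same width as the bars
shifted <- label_x_vs_bar_x(text_width = 0.4)  # narrower than the bars

all.equal(matched$bars, matched$labels)  # labels sit over their bars
all.equal(shifted$bars, shifted$labels)  # labels drift toward each x centre
```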
4.7 More on position
https://ggplot2.tidyverse.org/reference/index.html#section-position-adjustment
4.8 Coordinate flip
ggplot()+
geom_col(
mapping=
aes(
x=c("A", "B", "C"),
y=c(56, 77, 92)
)
)+
coord_flip()

Another common application of coord_flip is a single horizontal stacked bar with a reference line and label:
dx=4 # half-range of the x axis around the bar
h=0.5 # half-length of the reference segment
dt=0 # shift of the "270" label along the x axis
ggplot()+
geom_col(
mapping=aes(
x=c(1, 1),
y=c(306, 232),
fill=c("biden","trump")
),
width=1
)+
geom_segment(
mapping=aes(
x=1-h,
y=270,
xend=1+h,
yend=270
)
)+
geom_text(
mapping=aes(
x=1+dt,
y=270,
label="270"
),
size=8 # text size
)+
xlim(1-dx, 1+dx)+ # the range must cover 0.5-1.5 so the full bar width is accommodated
coord_flip()+
theme_void()+
theme(legend.position = "none")

4.9 Summary
Grouping aesthetics split a data frame into subsample data frames, one per group, and apply the
geom_ function to each of them in the order determined by the level order of the mapped factor.
When the group aesthetic and another aesthetic share the same mapping variable, the group aesthetic can be omitted.
When dealing with a grouping variable, values of y from different groups at the same x can take one of these positions:
- "identity": keep the y values as they are.
- "stack": stack the y values according to the grouping level order.
- "dodge": keep the y values as they are, but shift their x values left and right according to the grouping level order.
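The three positions can be compared on one small data set. A sketch with two hypothetical groups a and b measured at x = 1 and x = 2; the helper top_of_bars (an illustrative name) returns the built x, group, ymin and ymax of geom_col for a given position:

```r
library(ggplot2)

# Two hypothetical groups measured at the same two x values
pos_data <- data.frame(
  x = c(1, 2, 1, 2),
  y = c(3, 1, 2, 4),
  grp = c("a", "a", "b", "b")
)

# Build geom_col with the given position and return the computed rectangles
top_of_bars <- function(position) {
  p <- ggplot(pos_data) +
    geom_col(aes(x = x, y = y, fill = grp), position = position)
  ld <- layer_data(p, 1)
  ld[order(ld$x, ld$group), c("x", "group", "ymin", "ymax")]
}

top_of_bars("identity")  # ymax equals each y; bars at the same x overlap
top_of_bars("stack")     # ymax accumulates within each x
top_of_bars("dodge")     # ymax equals each y; x is shifted left/right per group
```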
4.10 Exercise
1
2
3
4