This vignette showcases the gghalves extension by walking through the individual _half_ geoms and explaining their usage and function arguments.
The general idea of gghalves stems from a StackOverflow question on how to plot a hybrid boxplot, which led me to develop the ggpol extension for ggplot2. Over time, however, ggpol became a sort of aggregation for all kinds of geoms, and seeing that many things can be cut in half ultimately led to this library.
The idea is that many geoms that aggregate data, such as geom_boxplot, geom_violin and geom_dotplot, are (near) symmetric. Given that the space to display information is limited, we can make better use of it by cutting the geoms in half and displaying additional geoms that, for example, give information about the sample size.
GeomHalfPoint, perhaps counterintuitively, does not display a literal half-circle. Rather, it plots the data points such that they take up the space of a _half_ geom. Further, by default geom_half_point jitters the points horizontally and vertically.
library(ggplot2)   # ggplot(), aes()
library(gghalves)  # the geom_half_*() functions used throughout
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_point()
The way this works is that transformation = PositionJitter is passed to the geom. We could play with the default values of this transformation by passing along a transformation_params argument:
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_point(transformation_params = list(height = 0, width = 0.001, seed = 1))
#> Warning: Ignoring unknown parameters: transformation_params
or we could change the transformation argument itself:
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_point(transformation = PositionIdentity)
Making the transformation work with custom Positions from ggplot2 extensions is something that will hopefully be included in future updates of this package.
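The warning printed above indicates that transformation_params was not recognised by the version used to build this vignette. As a hedged sketch, and assuming an installed gghalves release in which transformation accepts an evaluated position object (this is an assumption about the installed version, not something the text above confirms), the jitter can instead be configured directly through ggplot2's position_jitter():

```r
library(ggplot2)
library(gghalves)

# Assumption: `transformation` accepts an evaluated position object.
# The width, height and seed values are illustrative only.
p_jitter <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_point(
    transformation = position_jitter(width = 0.05, height = 0, seed = 1)
  )

# Print the object (or type `p_jitter` at the console) to draw it.
```

Setting seed makes the jitter reproducible between renders, and height = 0 keeps the y-values exactly as recorded.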
Sometimes we want to color points within the aes() groupings. In that case, we can make use of geom_half_point_panel().
ggplot(iris, aes(y = Sepal.Width)) +
geom_half_boxplot() +
geom_half_point_panel(aes(x = 0.5, color = Species), range_scale = .5)
Like all _half_ geoms, geom_half_point also takes a side argument, with "l" for left and "r" for right.
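To illustrate the side argument, the following minimal sketch places a half-boxplot on the left and the raw points on the right of each factor level. It only uses functions and arguments already named in this vignette; the object name p_sides is arbitrary.

```r
library(ggplot2)
library(gghalves)

# Boxplot occupies the left half, jittered points the right half.
p_sides <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_boxplot(side = "l") +
  geom_half_point(side = "r")

# Print the object (or type `p_sides` at the console) to draw it.
```

Swapping the two side values mirrors the layout.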
GeomHalfBoxplot displays a boxplot that is cut in half and plotted on either the left or right side of the space allotted to the specific factor on the x-axis.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot()
In addition to the standard side argument, you can also center the half-boxplot and decide whether an errorbar is drawn.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot(side = "r", center = TRUE, errorbar.draw = FALSE)
GeomHalfViolin draws a half-violin plot. Besides the side argument, it supports all the arguments that can be passed to the standard GeomViolin.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_violin()
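As a small sketch of passing standard violin arguments through, the example below combines side with trim and scale, two arguments of ggplot2's geom_violin(). The particular values are illustrative choices rather than recommendations.

```r
library(ggplot2)
library(gghalves)

# trim = FALSE extends the density tails past the data range;
# scale = "width" gives every half-violin the same maximum width.
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_violin(side = "r", trim = FALSE, scale = "width")

# Print the object (or type `p_violin` at the console) to draw it.
```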
Furthermore, if we have a binary grouping variable (such as control/treatment), we can plot side-by-side comparisons with the optional split aesthetic:
ggplot() +
  geom_half_violin(
    data = ToothGrowth,
    aes(x = as.factor(dose), y = len, split = supp, fill = supp),
    position = "identity"
  )
GeomHalfDotplot is slightly different from the other _half_ geoms in that it does not support a side argument, since this is already inherently built into the standard GeomDotplot via stackdir:
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_violin() +
geom_dotplot(binaxis = "y", method="histodot", stackdir="up")
#> Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
So, given that geom_dotplot can be used as a _half_ geom, why the need for geom_half_dotplot? The reason is that geom_dotplot does not support dodging when there are multiple factors in play. Let’s consider the following example:
df <- data.frame(score = rgamma(150, 4, 1),
                 gender = sample(c("M", "F"), 150, replace = TRUE),
                 genotype = factor(sample(1:3, 150, replace = TRUE)))
Given this data, we want to group by genotype, but also separate the plots by gender. This does not quite work using the standard geom:
ggplot(df, aes(x = genotype, y = score, fill = gender)) +
geom_half_violin() +
geom_dotplot(binaxis = "y", method="histodot", stackdir="up", position = PositionDodge)
#> Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
Using geom_half_dotplot, however, we can make this work:
ggplot(df, aes(x = genotype, y = score, fill = gender)) +
geom_half_violin() +
geom_half_dotplot(method="histodot", stackdir="up")
#> Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
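The message above suggests choosing binwidth explicitly. A minimal sketch of the same plot follows, with two additions that are not part of the original example: a seed so that the simulated data is reproducible, and an illustrative binwidth of 0.25 (in units of score), which would need tuning for real data.

```r
library(ggplot2)
library(gghalves)

set.seed(42)  # hypothetical seed, only to make the random draws repeatable
df <- data.frame(
  score    = rgamma(150, 4, 1),
  gender   = sample(c("M", "F"), 150, replace = TRUE),
  genotype = factor(sample(1:3, 150, replace = TRUE))
)

# An explicit binwidth avoids the "defaults to 1/30 of the range" message.
p_dots <- ggplot(df, aes(x = genotype, y = score, fill = gender)) +
  geom_half_violin() +
  geom_half_dotplot(method = "histodot", stackdir = "up", binwidth = 0.25)

# Print the object (or type `p_dots` at the console) to draw it.
```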
As mentioned in the package description, gghalves can work well in combination with certain ggplot2 extensions. One of them is geom_beeswarm of the ggbeeswarm package. Note that, currently, you will need to install the latest version from GitHub to support the passing of beeswarmArgs.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot() +
geom_beeswarm(beeswarmArgs = list(side = 1))
Lastly, let us remake the plot displayed in the GitHub README. It is for display purposes only, and thus uses a lot of filtering and a lot of geoms:
library(dplyr)  # provides %>% and filter()
ggplot() +
  # setosa: half-boxplot plus a one-sided beeswarm
  geom_half_boxplot(
    data = iris %>% filter(Species=="setosa"),
    aes(x = Species, y = Sepal.Length, fill = Species), outlier.color = NA) +
  ggbeeswarm::geom_beeswarm(
    data = iris %>% filter(Species=="setosa"),
    aes(x = Species, y = Sepal.Length, fill = Species, color = Species), beeswarmArgs=list(side=+1)
  ) +
  # versicolor: half-violin on the right, dots stacked downwards
  geom_half_violin(
    data = iris %>% filter(Species=="versicolor"),
    aes(x = Species, y = Sepal.Length, fill = Species), side="r") +
  geom_half_dotplot(
    data = iris %>% filter(Species=="versicolor"),
    aes(x = Species, y = Sepal.Length, fill = Species), method="histodot", stackdir="down") +
  # virginica: half-boxplot on the right, jittered points on the left
  geom_half_boxplot(
    data = iris %>% filter(Species=="virginica"),
    aes(x = Species, y = Sepal.Length, fill = Species), side = "r", errorbar.draw = TRUE,
    outlier.color = NA) +
  geom_half_point(
    data = iris %>% filter(Species=="virginica"),
    aes(x = Species, y = Sepal.Length, fill = Species, color = Species), side = "l") +
  scale_fill_manual(values = c("setosa" = "#cba1d2", "versicolor"="#7067CF","virginica"="#B7C0EE")) +
  scale_color_manual(values = c("setosa" = "#cba1d2", "versicolor"="#7067CF","virginica"="#B7C0EE")) +
  theme(legend.position = "none")