ggmatplot
is a quick and easy way of plotting the columns of two matrices or data frames against each other using ggplot2
.
ggmatplot
is built upon ggplot2
, and its functionality is inspired by matplot
. Therefore, ggmatplot
can be considered as a ggplot
version of matplot
.
ggmatplot
do?Similar to matplot
, ggmatplot
plots a vector against the columns of a matrix, or the columns of two matrices against each other, or a vector/matrix on its own. However, unlike matplot
, ggmatplot
returns a ggplot
object.
Suppose we have a covariate vector x
and a matrix z
with the response y
and the fitted value fit.y
as the two columns.
# vector x
<- c(rnorm(100, sd = 2))
x
head(x)
#> [1] -1.5523838 0.6616481 -0.7609050 -0.1805734 2.6801909 -1.4998950
# matrix z
<- x * 0.5 + rnorm(100, sd = 1)
y <- fitted(lm(y ~ x))
fit.y <- cbind(actual = y,
z fitted = fit.y)
head(z)
#> actual fitted
#> 1 -1.2737256 -1.0038396
#> 2 0.3032631 0.2026667
#> 3 -2.6566818 -0.5725341
#> 4 0.5961153 -0.2562904
#> 5 2.2466124 1.3026440
#> 6 0.3743863 -0.9752366
ggmatplot
plots vector x
against each column of matrix z
using the default plot_type = "point"
. This will be represented on the resulting plot as two groups, identified using different shapes and colors.
library(ggmatplot)
ggmatplot(x, z)
The default aesthetics used to differentiate the two groups can be updated using ggmatplot()
arguments. Since the two groups in this example are differentiated using their shapes and colors, the shape
and color
parameters can be used to change them. If we want points in both groups to have the same shape, we can simply set the shape
parameter to a single value. However, if we want the points in the groups to be differentiated by color, we can pass a list of colors as the color
parameter - but we should make sure the number of colors in the list matches up with the number of groups.
ggmatplot(x, z,
shape = "circle", # using a single shape over both groups
color = c("blue","purple") # assigning two colors to the two groups
)
Since ggmatplot
is built upon ggplot2
and creates a ggplot object, ggplot add ons such as scales, faceting specifications, coordinate systems, and themes can be added on to plots created using ggmatplot
too.
Each plot_type
allowed by the ggmatplot()
function is also built upon a ggplot2 geom
(geometric object), as listed here. Therefore, ggmatplot()
will support additional parameters specific to each plot type. These are often aesthetics, used to set an aesthetic to a fixed value, like size = 2
or alpha = 0.5
. However, they can also be other parameters specific to different types of plots.
ggmatplot(x, z,
shape = "circle",
color = c("blue","purple"),
size = 2,
alpha = 0.5
+
) theme_bw()
This list of examples includes other types of plots we can create using ggmatplot
.
ggmatplot
over ggplot2
?ggplot2
requires wide format data to be wrangled into long format for plotting, which can be quite cumbersome when creating simple plots. Therefore, the motivation for ggmatplot
is to provide a solution that allows ggplot2
to handle wide format data. Although ggmatplot
doesn’t provide the same flexibility as ggplot2
, it can be used as a workaround for having to wrangle wide format data into long format and creating simple plots using ggplot2
.
Suppose we want to use the iris
dataset to plot the distributions of its numeric variables individually.
library(tidyr)
library(dplyr)
<- iris %>%
iris_numeric select(is.numeric)
head(iris_numeric)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3.0 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5.0 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
If we were to plot this data using ggplot2
, we’d have to wrangle the data into long format before plotting.
<- iris_numeric %>%
iris_numeric_long pivot_longer(cols = everything(),
names_to = "Feature",
values_to = "Measurement")
head(iris_numeric_long)
#> # A tibble: 6 × 2
#> Feature Measurement
#> <chr> <dbl>
#> 1 Sepal.Length 5.1
#> 2 Sepal.Width 3.5
#> 3 Petal.Length 1.4
#> 4 Petal.Width 0.2
#> 5 Sepal.Length 4.9
#> 6 Sepal.Width 3
%>%
iris_numeric_long ggplot(aes(x = Measurement,
color = Feature)) +
geom_density()
But the wide format data can be directly used with ggmatplot
to achieve the same result. Note that the order of the categories in the legend follows the column order in the original dataset.
ggmatplot(iris_numeric, plot_type = "density", alpha = 0)
Suppose we also have the following dataset of the monthly totals of international airline passengers(in thousands) from January 1949 to December 1960.
<- matrix(AirPassengers,
AirPassengers ncol = 12, byrow = FALSE,
dimnames = list(month.abb, as.character(1949:1960))
)
AirPassengers#> 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
#> Jan 112 115 145 171 196 204 242 284 315 340 360 417
#> Feb 118 126 150 180 196 188 233 277 301 318 342 391
#> Mar 132 141 178 193 236 235 267 317 356 362 406 419
#> Apr 129 135 163 181 235 227 269 313 348 348 396 461
#> May 121 125 172 183 229 234 270 318 355 363 420 472
#> Jun 135 149 178 218 243 264 315 374 422 435 472 535
#> Jul 148 170 199 230 264 302 364 413 465 491 548 622
#> Aug 148 170 199 242 272 293 347 405 467 505 559 606
#> Sep 136 158 184 209 237 259 312 355 404 404 463 508
#> Oct 119 133 162 191 211 229 274 306 347 359 407 461
#> Nov 104 114 146 172 180 203 237 271 305 310 362 390
#> Dec 118 140 166 194 201 229 278 306 336 337 405 432
If we want to plot the trend of the number of passengers over the years using ggplot2
, we’d have to wrangle the data into long format. But we can use ggmatplot
as a workaround.
First, we can split the data into two matrices as follows:
months
: a vector containing the list of months
nPassengers
: a matrix of passenger numbers with each column representing a year
<- rownames(AirPassengers)
months <- AirPassengers[, 1:12] nPassengers
Then we can use ggmatplot()
to plot the months
matrix against each column of the nPassengers
matrix - which can be more simply understood as grouping the plot using each column(year
) of the nPassengers
matrix.
ggmatplot(
x = months,
y = nPassengers,
plot_type = "line",
size = 1,
legend_label = c(1949:1960),
xlab = "Month",
ylab = "Total airline passengers (in thousands)",
legend_title = "Year"
+
) theme_minimal()