The common package is a lightweight package that contains solutions for commonly encountered problems when working in Base R.
Each function is described below with a short explanation and examples.
Normally, when working in Base R, it is necessary to quote variable names when passing them into a function or operator. For example, observe the R subset brackets:
# Variable names passed to subset are quoted
dat <- mtcars[1:10 , c("mpg", "cyl", "disp")]
# View results
dat
#                    mpg cyl  disp
# Mazda RX4         21.0   6 160.0
# Mazda RX4 Wag     21.0   6 160.0
# Datsun 710        22.8   4 108.0
# Hornet 4 Drive    21.4   6 258.0
# Hornet Sportabout 18.7   8 360.0
# Valiant           18.1   6 225.0
# Duster 360        14.3   8 360.0
# Merc 240D         24.4   4 146.7
# Merc 230          22.8   4 140.8
# Merc 280          19.2   6 167.6
Some Base R functions and almost all tidyverse functions use Non-standard Evaluation (NSE) when passing variable names. This style of evaluation allows the user to type variables without using quotation marks or other methods of resolution.
Picking up from the previous example, let’s now subset the dat data frame created above using the subset() function, which uses NSE:
# No quotes on "cyl" using subset() function
dt <- subset(dat, cyl == 4)
# View results
dt
#             mpg cyl  disp
# Datsun 710 22.8   4 108.0
# Merc 240D  24.4   4 146.7
# Merc 230   22.8   4 140.8
The v() function in the common package is a quoting function. It
allows you to use Non-standard Evaluation (NSE) even on functions that
were not specifically written for NSE. Observe:
# Create a vector of unquoted names
v1 <- v(mpg, cyl, disp)
# Result is a quoted vector
v1
# [1] "mpg" "cyl" "disp"
# Variable names not quoted
dat2 <- mtcars[1:10, v(mpg, cyl, disp)]
# Works as expected
dat2
#                    mpg cyl  disp
# Mazda RX4         21.0   6 160.0
# Mazda RX4 Wag     21.0   6 160.0
# Datsun 710        22.8   4 108.0
# Hornet 4 Drive    21.4   6 258.0
# Hornet Sportabout 18.7   8 360.0
# Valiant           18.1   6 225.0
# Duster 360        14.3   8 360.0
# Merc 240D         24.4   4 146.7
# Merc 230          22.8   4 140.8
# Merc 280          19.2   6 167.6
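To give a feel for how a quoting function of this kind can work, here is a minimal Base R sketch built on substitute(). It is an illustration only and is not the implementation used by the common package; the name quote_names() is hypothetical.

```r
# Hypothetical helper (not part of the common package): capture unquoted
# names and return them as a character vector.
quote_names <- function(...) {
  # substitute() yields the unevaluated call list(mpg, cyl, ...);
  # dropping the first element removes the `list` symbol itself
  args <- as.list(substitute(list(...)))[-1]
  # deparse() converts each captured symbol to its text form
  vapply(args, deparse, character(1), USE.NAMES = FALSE)
}

# The result can be used anywhere a quoted vector is expected
cols <- quote_names(mpg, cyl, disp)
dat3 <- mtcars[1:10, cols]
```

Because the arguments are never evaluated, the names do not need to exist as objects in the calling environment.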
Base R provides sort and order functions that work adequately on vectors. For data frames, the options are more limited. In particular, if you want to sort a data frame by multiple columns, there are no functions in Base R to do it. The R documentation makes the following suggestion:
# Prepare data
dat <- mtcars[1:10, 1:3]
# Get sort order
ord <- do.call('order', dat[ ,c("cyl", "mpg")])
# Sort data
dat[ord, ]
#                    mpg cyl  disp
# Datsun 710        22.8   4 108.0
# Merc 230          22.8   4 140.8
# Merc 240D         24.4   4 146.7
# Valiant           18.1   6 225.0
# Merc 280          19.2   6 167.6
# Mazda RX4         21.0   6 160.0
# Mazda RX4 Wag     21.0   6 160.0
# Hornet 4 Drive    21.4   6 258.0
# Duster 360        14.3   8 360.0
# Hornet Sportabout 18.7   8 360.0
In the above example, notice that a) there is no actual sorting function for data frames, and b) the method as written gives no direct control over the sort direction of the individual variables. They are all sorted ascending.
The sort.data.frame() function is an overload to the generic sort()
function that is tailored for data frames. It allows you to sort by
multiple columns, and control the sort direction for each sort
variable. Here is an example:
# Sort by cyl then mpg
dat1 <- sort(dat, by = v(cyl, mpg))
dat1
#                    mpg cyl  disp
# Datsun 710        22.8   4 108.0
# Merc 230          22.8   4 140.8
# Merc 240D         24.4   4 146.7
# Valiant           18.1   6 225.0
# Merc 280          19.2   6 167.6
# Mazda RX4         21.0   6 160.0
# Mazda RX4 Wag     21.0   6 160.0
# Hornet 4 Drive    21.4   6 258.0
# Duster 360        14.3   8 360.0
# Hornet Sportabout 18.7   8 360.0
# Sort by cyl descending then mpg ascending
dat2 <- sort(dat, by = v(cyl, mpg),
             ascending = c(FALSE, TRUE))
dat2
#                    mpg cyl  disp
# Duster 360        14.3   8 360.0
# Hornet Sportabout 18.7   8 360.0
# Valiant           18.1   6 225.0
# Merc 280          19.2   6 167.6
# Mazda RX4         21.0   6 160.0
# Mazda RX4 Wag     21.0   6 160.0
# Hornet 4 Drive    21.4   6 258.0
# Datsun 710        22.8   4 108.0
# Merc 230          22.8   4 140.8
# Merc 240D         24.4   4 146.7
The sort.data.frame() function also allows you to control whether NA
values are sorted to the top or bottom. See the documentation for
further information and more examples.
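For comparison, a mixed-direction sort can also be approximated in Base R by calling order() with method = "radix", which accepts one decreasing flag per sort key. The sketch below is a Base R illustration only and says nothing about how sort.data.frame() is implemented; the variable names are chosen for this example.

```r
# Prepare data
dat <- mtcars[1:10, 1:3]

# method = "radix" accepts one `decreasing` flag per sort key:
# cyl descending, mpg ascending. Radix ordering is stable, so ties
# keep their original row order.
ord <- order(dat$cyl, dat$mpg,
             decreasing = c(TRUE, FALSE),
             method = "radix")

sorted <- dat[ord, ]
```

The trade-off is readability: the sort keys and their directions live in separate arguments, which is the kind of bookkeeping a dedicated data frame sort function removes.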
While many data operations in R do not require control over the labels on a data frame, some types of programming do. Particularly in situations where you are sharing data between multiple people and groups, the column labels can provide valuable information about the data contained in a particular column.
Unfortunately, Base R does not supply an easy way to manipulate these
labels. The only approach is to use the attr() function to set the
labels individually for each column. Like this:
# Prepare data
dat <- mtcars[1:10, 1:3]
# Assign labels
attr(dat$mpg, "label") <- "Miles Per Gallon"
attr(dat$cyl, "label") <- "Cylinders"
attr(dat$disp, "label") <- "Displacement"
The labels.data.frame() function is an overload to the Base R labels()
function that is specific to data frames. The function allows you to
set labels for an entire data frame using a named list. Here is an
example:
# Prepare data
dat <- mtcars[1:10, 1:3]
# Assign labels
labels(dat) <- list(mpg = "Miles Per Gallon",
cyl = "Cylinders",
disp = "Displacement")
# View label attributes
labels(dat)
# $mpg
# [1] "Miles Per Gallon"
#
# $cyl
# [1] "Cylinders"
#
# $disp
# [1] "Displacement"
This function makes it much easier to set and retrieve labels on a
data frame. The labels make it easier for users to understand the data.
This function should be included in Base R, but for some reason is
not.
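The idea of setting labels from a named list can be sketched in Base R as a loop over attr() calls. This is an illustration only, not the package's code; the helper names set_labels_sketch() and get_labels_sketch() are hypothetical.

```r
# Hypothetical helper: assign a "label" attribute to each column that
# appears in the named list `labs`.
set_labels_sketch <- function(df, labs) {
  for (nm in intersect(names(labs), names(df))) {
    attr(df[[nm]], "label") <- labs[[nm]]
  }
  df
}

# Hypothetical helper: collect the "label" attributes as a named list,
# skipping columns that have no label.
get_labels_sketch <- function(df) {
  labs <- lapply(df, attr, which = "label", exact = TRUE)
  labs[!vapply(labs, is.null, logical(1))]
}

dat <- set_labels_sketch(mtcars[1:10, 1:3],
                         list(mpg = "Miles Per Gallon", cyl = "Cylinders"))
labs <- get_labels_sketch(dat)
```

Columns without an entry in the list (disp here) are simply left unlabeled.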
Most programming languages provide a built-in concatenation operator.
R does not. Instead, it provides the paste() and paste0() functions.
While these functions do perform concatenation adequately, it is
sometimes more convenient to have an operator.
The %p% operator is an infix version of the paste0() function. It
provides the same functionality as paste0(), but in a more compact
manner. Like so:
# Concatenation using paste0() function
print(paste0("There are ", nrow(mtcars), " rows in the mtcars data frame"))
# [1] "There are 32 rows in the mtcars data frame"
# Concatenation using %p% operator
print("There are " %p% nrow(mtcars) %p% " rows in the mtcars data frame")
# [1] "There are 32 rows in the mtcars data frame"
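For readers unfamiliar with infix operators, R lets you define one by assigning a two-argument function to a name wrapped in percent signs. The sketch below shows the general technique with a hypothetical operator name, %cat%; it is not the package's definition of %p%.

```r
# Hypothetical infix operator: a thin wrapper around paste0()
`%cat%` <- function(lhs, rhs) paste0(lhs, rhs)

# Custom infix operators are evaluated left to right
msg <- "There are " %cat% nrow(mtcars) %cat% " rows in the mtcars data frame"
```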
The common package contains an enhanced equality operator. The
objective of the %eq% operator is to return a TRUE or FALSE value when
any two objects are compared. This enhanced equality operator is useful
for situations when you don’t want to check for NULL or NA values, or
care about the data types of the objects you are comparing.
The %eq% operator also compares data frames. The comparison will
include all data values, but no attributes. This functionality is
particularly useful when comparing tibbles, as tibbles often have many
attributes assigned by dplyr functions.
Below is an example of several comparisons using the %eq% infix
operator:
# Comparing of NULLs and NA
NULL %eq% NULL # TRUE
NULL %eq% NA # FALSE
NA %eq% NA # TRUE
1 %eq% NULL # FALSE
1 %eq% NA # FALSE
# Comparing of atomic values
1 %eq% 1 # TRUE
"one" %eq% "one" # TRUE
1 %eq% "one" # FALSE
1 %eq% Sys.Date() # FALSE
# Comparing of vectors
v1 <- c("A", "B", "C")
v2 <- c("A", "B", "C", "D")
v1 %eq% v1 # TRUE
v1 %eq% v2 # FALSE
# Comparing of data frames
mtcars %eq% mtcars # TRUE
mtcars %eq% iris # FALSE
iris %eq% iris[1:50,] # FALSE
# Mixing it up
mtcars %eq% NULL # FALSE
v1 %eq% NA # FALSE
1 %eq% v1 # FALSE
While it can be advantageous to have a comparison operator that does
not give errors when encountering a NULL or NA value, note that this
behavior can also mask problems with your code. Therefore, use the
%eq% operator with care.
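A rough Base R approximation of an "always TRUE or FALSE" comparison can be built from all.equal() wrapped in isTRUE(). This is only a sketch under stated assumptions: the operator name %same% is hypothetical, all.equal() applies a numeric tolerance, and the results are not guaranteed to match %eq% in every case.

```r
# Hypothetical operator: always returns a single TRUE or FALSE.
# all.equal() returns TRUE or a character description of the differences;
# isTRUE() collapses anything other than TRUE to FALSE, and tryCatch()
# turns any error into FALSE as well.
`%same%` <- function(x, y) {
  isTRUE(tryCatch(all.equal(x, y, check.attributes = FALSE),
                  error = function(e) FALSE))
}

r_null  <- NULL %same% NULL
r_mixed <- NULL %same% NA
r_na    <- NA %same% NA
r_type  <- 1 %same% "one"
r_df    <- mtcars %same% mtcars
r_df2   <- mtcars %same% iris
```

The same caution applies here: a comparison that never fails can hide a value that should not have been NULL or NA in the first place.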
Most programming languages provide a simple way to get the path of
the currently running program. This basic feature has been left out of
R. The Sys.path() function aims to make up for the oversight.
# Get current path
pth <- Sys.path()
# View path
pth
# [1] "C:/packages/common/vignettes/common.Rmd"
Note that this function returns the full path of the currently
running program, including the file name and extension. This
functionality is different from getwd(), which returns only the
current working directory.
Credit for this function goes to Andrew Simmons and the this.path
package. The this.path functionality is renamed and provided here for
convenience.
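To illustrate why a dedicated function is helpful, here is a minimal Base R sketch that recovers the script path in one situation only: when the program is launched with Rscript, which passes the file as a --file= argument. The helper name script_path_sketch() is hypothetical, and the sketch returns NULL in other contexts such as an interactive session or a knitted document, which a full solution would also need to handle.

```r
# Hypothetical helper: works only for scripts started with Rscript
script_path_sketch <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  # Rscript passes the script as an argument of the form --file=<path>
  hit <- grep("^--file=", args, value = TRUE)
  if (length(hit) == 0) {
    return(NULL)
  }
  normalizePath(sub("^--file=", "", hit[1]), winslash = "/", mustWork = FALSE)
}

pth <- script_path_sketch()
```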
As many R users know, the R round() function rounds a trailing 5 to
the nearest even digit. For example:
# Prepare sample vector
v1 <- seq(0.5,9.5,by=1)
v1
# [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
# Base R round function
r1 <- round(v1)
# Rounds to nearest even
r1
# [1] 0 2 2 4 4 6 6 8 8 10
However, humans and other software systems usually round 5 up. The reasons for R rounding the way it does are valid. Yet this difference in the way R rounds sometimes makes it difficult to compare R results to results from other software systems, particularly SAS®. It would be convenient if there were another rounding function that could be used when trying to compare R results to SAS®.
That is the purpose of the roundup() function. Observe the
differences in output to what was shown above:
# Round up function
r2 <- roundup(v1)
# Rounds 5 up
r2
# [1] 1 2 3 4 5 6 7 8 9 10
Note that the function behaves differently when rounding negative values.
# Negate original vector
v2 <- -v1
v2
# [1] -0.5 -1.5 -2.5 -3.5 -4.5 -5.5 -6.5 -7.5 -8.5 -9.5
# Rounding negative values
r3 <- roundup(v2)
# Rounds away from zero
r3
# [1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
As you can see, when dealing with negative numbers, the roundup()
function actually rounds down. “Round away from zero” is the best
description of this function. The rounding logic of the roundup()
function matches SAS® software, and can be used when comparing output
between the two systems.
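The "round away from zero" rule can be written down in one line of Base R for whole-number rounding. The sketch below is an illustration of the rule only, not the roundup() implementation; the name round_half_away() is hypothetical, and it ignores the decimal-places case, where floating point representation needs extra care.

```r
# Hypothetical helper: round halves away from zero, to whole numbers only.
# Work on the absolute value, push halves up with floor(x + 0.5),
# then restore the original sign.
round_half_away <- function(x) {
  sign(x) * floor(abs(x) + 0.5)
}

v1 <- seq(0.5, 9.5, by = 1)
pos <- round_half_away(v1)    # positive halves move up
neg <- round_half_away(-v1)   # negative halves move down, away from zero
```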
Sometimes you know the name of the file you are looking for, but do not know the exact location. It might be in the directory above your program, or it might be in the directory below. It could be one level up, or 3 levels up.
The file.find() function provides an easy way to search for files you
are looking for. You tell the function where to start searching from
and what to look for, and it will begin looking in the base directory.
Once the base directory is searched, it will expand the search above
and below the base directory. The search routine will continue
expanding the search until it hits the limits imposed by the up and
down parameters. Here is an example:
# Look for a file named "globals.R"
pths <- file.find(getwd(), "globals.R")
pths
# Look for Rdata files three levels up, and two levels down
pths <- file.find(getwd(), "*.Rdata", up = 3, down = 2)
pths
The function will return a vector of full paths that meet the search criteria and are within the bounds of the search. If no file is found that meets the search criteria, the function returns NULL.
The dir.find() function works the same as file.find(), but for
directories instead of files. Note that these two functions may be
used together to perform complex searches.
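The bounded up-and-down search described above can be sketched in Base R with dirname(), list.dirs(), and list.files(). This is a simplified illustration under stated assumptions, not the package's algorithm: the name find_file_sketch() is hypothetical, and when moving upward it looks only inside each ancestor directory itself, not in the ancestor's other subdirectories.

```r
# Hypothetical helper: search `path`, up to `up` ancestor directories,
# and up to `down` levels of subdirectories for files matching `pattern`.
find_file_sketch <- function(path, pattern, up = 1, down = 1) {
  base <- normalizePath(path, winslash = "/", mustWork = TRUE)
  dirs <- base

  # Walk upward: collect up to `up` ancestor directories
  parent <- base
  for (i in seq_len(up)) {
    parent <- dirname(parent)
    dirs <- c(dirs, parent)
  }

  # Walk downward: collect subdirectories one level at a time
  level <- base
  for (i in seq_len(down)) {
    level <- list.dirs(level, full.names = TRUE, recursive = FALSE)
    if (length(level) == 0) break
    dirs <- c(dirs, level)
  }

  # Translate the wildcard pattern to a regular expression and list matches
  hits <- list.files(unique(dirs), pattern = utils::glob2rx(pattern),
                     full.names = TRUE)
  hits <- hits[!dir.exists(hits)]   # keep files, drop directories
  if (length(hits) == 0) NULL else hits
}

# Small demonstration in a temporary directory tree: root/a/b
root <- file.path(tempdir(), "find_sketch_demo")
dir.create(file.path(root, "a", "b"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "globals.R"))

found     <- find_file_sketch(file.path(root, "a"), "globals.R", up = 1, down = 1)
not_found <- find_file_sketch(file.path(root, "a", "b"), "globals.R", up = 1, down = 0)
```

Starting from root/a, one level up reaches the file; starting from root/a/b with the same limit does not, so the second call returns NULL.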
Sometimes you have a data frame with many variables, and you need to
perform an operation on only some of them. The find.names() function
can help you subset these variable names. There are parameters to
define the search criteria, provide exclusions, and a beginning and
ending range to perform the search. Here are some simple examples:
# Prepare data
dat <- mtcars
# View names
names(dat)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
# Get all names starting with "c"
find.names(dat, pattern = "c*")
# [1] "cyl" "carb"
# Get all names starting with "c" or "d"
find.names(dat, pattern = c("c*", "d*"))
# [1] "cyl" "carb" "disp" "drat"
# Get names starting with "c" or "d" from column 4 on
find.names(dat, pattern = c("c*", "d*"), start = 4)
# [1] "carb" "drat"
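Wildcard matching on column names can be approximated in Base R by converting each pattern with glob2rx() and passing it to grep(). The sketch below is an illustration only; the name match_names_sketch() is hypothetical, it omits the exclusion and ending-range parameters mentioned above, and it assumes start is no larger than the number of columns.

```r
# Hypothetical helper: return column names matching any wildcard pattern,
# searching from column `start` onward.
match_names_sketch <- function(df, pattern, start = 1) {
  nms <- names(df)[seq(start, ncol(df))]
  # One grep() per pattern, so results are grouped by pattern
  hits <- lapply(pattern, function(p) {
    grep(utils::glob2rx(p), nms, value = TRUE)
  })
  unique(unlist(hits))
}

n1 <- match_names_sketch(mtcars, "c*")
n2 <- match_names_sketch(mtcars, c("c*", "d*"))
n3 <- match_names_sketch(mtcars, c("c*", "d*"), start = 4)
```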
Base R functions that work with data frames are annoying in that they often drop any attributes assigned to data frame columns. Observe:
# Prepare sample dataset
dat <- mtcars[ , 1:3]
# Assign some labels
labels(dat) <- list(mpg = "Miles Per Gallon",
cyl = "Cylinders",
disp = "Displacement")
# View labels
labels(dat)
# $mpg
# [1] "Miles Per Gallon"
#
# $cyl
# [1] "Cylinders"
#
# $disp
# [1] "Displacement"
# Subset the data
dat2 <- subset(dat, cyl == 4)
# Labels are gone!
labels(dat2)
# list()
To get the attributes back, one must copy the attributes from the
original data frame to the subset data frame. That is what the
copy.attributes() function does. Picking up from the example above,
let’s now restore the attributes lost during the subset() operation:
# Restore attributes
dat2 <- copy.attributes(dat, dat2)
# Labels are back!
labels(dat2)
# $mpg
# [1] "Miles Per Gallon"
#
# $cyl
# [1] "Cylinders"
#
# $disp
# [1] "Displacement"
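The copy step can be sketched in Base R by matching columns on name and transferring a single attribute. This is a narrower illustration than copy.attributes(): the name copy_labels_sketch() is hypothetical, and only the "label" attribute is copied.

```r
# Hypothetical helper: copy the "label" attribute from `source` to
# `target` for every column name the two data frames share.
copy_labels_sketch <- function(source, target) {
  for (nm in intersect(names(source), names(target))) {
    attr(target[[nm]], "label") <- attr(source[[nm]], "label")
  }
  target
}

# Label a column, then lose the label by subsetting
src <- mtcars[, 1:3]
attr(src$mpg, "label") <- "Miles Per Gallon"
sub <- subset(src, cyl == 4)
label_after_subset <- attr(sub$mpg, "label")

# Restore the label from the original data frame
sub <- copy_labels_sketch(src, sub)
label_restored <- attr(sub$mpg, "label")
```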
There are many occasions when you need to create a superscript or subscript. The UTF-8 character set provides superscript and subscript versions of many commonly used characters. For example, the following code can be used to add a superscript ‘1’ to the front of a footnote string:
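A minimal sketch of such code follows. The escape \u00b9 is the Unicode code point for superscript one; the footnote text and variable name are hypothetical and chosen only for illustration.

```r
# Prefix a footnote string with a superscript "1" (U+00B9)
ftnt <- paste0("\u00b9", "Footnote text")
```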
Remembering these UTF-8 codes, however, can be a challenge for most
people. The supsc() and subsc() functions look up the superscript or
subscript version of a normal character, without having to remember or
research the proper UTF-8 code.
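The lookup idea can be illustrated in Base R with a named character vector that maps each digit to its superscript code point. This is a sketch of the technique only, not the package's implementation: the names sup_digits and to_superscript_sketch() are hypothetical, and only the digits 0 through 9 are covered.

```r
# Lookup table: digit -> Unicode superscript digit
sup_digits <- c("0" = "\u2070", "1" = "\u00b9", "2" = "\u00b2",
                "3" = "\u00b3", "4" = "\u2074", "5" = "\u2075",
                "6" = "\u2076", "7" = "\u2077", "8" = "\u2078",
                "9" = "\u2079")

# Hypothetical helper: replace each digit in a single string with its
# superscript version; other characters pass through unchanged.
to_superscript_sketch <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  hit <- chars %in% names(sup_digits)
  chars[hit] <- sup_digits[chars[hit]]
  paste0(chars, collapse = "")
}

sup12 <- to_superscript_sketch("12")
```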
Using these functions, we can therefore rewrite the above example as follows:
Here are a couple more examples:
Note that using the glue package, you can embed these functions directly in your character strings: