This vignette is aimed at package authors who need to update their code because of a backward incompatible change to dplyr. We do try and minimise backward incompatible changes as much as possible, but sometimes they are necessary in order to radically simplify existing code, or unlock a lot of potential value in the future.
This vignette starts with some general advice on writing package code that works with multiple versions of dplyr, then continues to discuss specific changes in dplyr versions.
Ideally, you want to make sure that your package works with both the released version and the development version of dplyr. This is typically a little bit more work, but has two big advantages:
It’s more convenient for your users, since they’re not forced to update dplyr if they don’t want to.
It’s easier on CRAN since it doesn’t require a massive coordinated release of multiple packages.
To make code work with multiple versions of a package, your first tool is the simple if statement:
if (utils::packageVersion("dplyr") > "0.5.0") {
  # code for new version
} else {
  # code for old version
}
Always condition on > current-version, not >= next-version, because this will ensure that this branch is also used for the development version of the package. For example, if the current release is version “0.5.0”, the development version will be “0.5.0.9000”.
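The comparison can be checked without dplyr installed. A minimal sketch using base R's package_version(), where the version strings are illustrative only and "0.6.0" is an assumed next release:

```r
# Illustrative version numbers only: "0.5.0" plays the current release,
# "0.5.0.9000" its development version, "0.6.0" the assumed next release.
dev_version <- package_version("0.5.0.9000")

# Conditioning on > current-version: the development version takes the new branch.
uses_new_branch <- dev_version > "0.5.0"

# Conditioning on >= next-version: the development version would fall back
# to the old branch, which is what we want to avoid.
misses_new_branch <- !(dev_version >= "0.6.0")

uses_new_branch
misses_new_branch
```

Both values are TRUE, which is why the > current-version form is the safer condition.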
Occasionally, you’ll run into a situation where the NAMESPACE has changed and you need to conditionally import different functions. This typically occurs when functions are moved from one package to another. We try our best to provide automatic fallbacks, but this is not always possible. Often you can work around the problem by avoiding importFrom and using :: instead.
Do this where possible:
if (utils::packageVersion("dplyr") > "0.5.0") {
  dbplyr::build_sql(...)
} else {
  dplyr::build_sql(...)
}
This will generate an R CMD check NOTE (because one of the functions will always be missing), but this is ok. Simply explain that you get the note because you have written a wrapper to make sure your code is backward compatible.
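One way to keep the version check in a single place is a small lookup helper. The sketch below is an assumption, not part of dplyr: compat_fun() is a hypothetical name, and it relies on base R's getExportedValue() to fetch the function from whichever package applies. It is demonstrated on base packages so that it runs without dplyr or dbplyr installed:

```r
# Hypothetical helper (not part of dplyr): return the function `name` from
# `new_pkg` when the installed version of `version_pkg` is newer than `cutoff`,
# and from `old_pkg` otherwise.
compat_fun <- function(name, new_pkg, old_pkg, cutoff, version_pkg = old_pkg) {
  pkg <- if (utils::packageVersion(version_pkg) > cutoff) new_pkg else old_pkg
  getExportedValue(pkg, name)
}

# Demonstration with base packages only; the cutoff is deliberately tiny so
# that the "new" branch is taken.
median_fun <- compat_fun("median", new_pkg = "stats", old_pkg = "utils",
                         cutoff = "0.0.1", version_pkg = "stats")
median_fun(c(1, 3, 5))
```

In a package, the equivalent lookup would be compat_fun("build_sql", "dbplyr", "dplyr", "0.5.0"), with the caveat that the old branch only works while dplyr still exports the function.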
Sometimes it’s not possible to avoid importFrom(). For example you might be importing a generic so that you can define a method for it. In this case, you can take advantage of a little-known feature in the NAMESPACE file: you can include if statements.
#' @rawNamespace
#' if (utils::packageVersion("dplyr") > "0.5.0") {
#' importFrom("dbplyr", "build_sql")
#' } else {
#' importFrom("dplyr", "build_sql")
#' }
Almost all database related code has been moved out of dplyr and into a new package, dbplyr. This makes dplyr simpler, and will make it easier to release fixes for bugs that only affect databases. If you’ve implemented a database backend for dplyr, please read the backend news.
Depending on what generics you use, and what generics you provide methods for, you may need to write some conditional code. To help make this easier we’ve written wrap_dbplyr_obj(), which will write the helper code for you:
wrap_dbplyr_obj("build_sql")
wrap_dbplyr_obj("base_agg")
Simply copy the results of this function into your package.
These will generate R CMD check NOTES, so make sure to tell CRAN that this is to ensure backward compatibility.
verbs_()
Because the tidyeval framework allows us to combine SE and NSE semantics within the same functions, the underscored verbs have been softly deprecated.
The legacy underscored versions take objects for which a lazyeval::as.lazy() method is defined. This includes symbols and calls, strings, and formulas. All of these objects have been replaced with quosures and you can call tidyeval verbs with unquoted quosures:
quo <- quo(cyl)
select(mtcars, !! quo)
Symbolic expressions are also supported, but note that bare symbols and calls do not carry scope information. If you’re referring to objects in the data frame, it’s safe to omit specifying an enclosure:
sym <- quote(cyl)
select(mtcars, !! sym)

call <- quote(mean(cyl))
summarise(mtcars, cyl = !! call)
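To make the point about scope concrete, here is a small sketch, assuming rlang is installed. The helper names make_quosure() and make_symbol() are made up for illustration. A quosure remembers the environment where it was created, whereas a bare symbol is looked up wherever it is eventually evaluated:

```r
# Both helpers define a local `x`; only the quosure keeps a reference to it.
make_quosure <- function() {
  x <- 10
  rlang::quo(x)
}
make_symbol <- function() {
  x <- 10
  quote(x)
}

x <- 1

# The quosure carries its enclosure, so the local `x` (10) is found.
from_quosure <- rlang::eval_tidy(make_quosure())

# The bare symbol has no enclosure; evaluating it here finds this `x` (1).
from_symbol <- eval(make_symbol())

from_quosure
from_symbol
```

This difference does not matter when the symbol names a column of the data frame, since the data is searched first.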
Transforming objects into quosures is generally straightforward. To enclose with the current environment, you can unquote directly in quo() or you can use as_quosure():
quo(!! sym)
#> <quosure>
#> expr: ^cyl
#> env: global
quo(!! call)
#> <quosure>
#> expr: ^mean(cyl)
#> env: global
rlang::as_quosure(sym)
#> Warning: `as_quosure()` requires an explicit environment as of rlang 0.3.0.
#> Please supply `env`.
#> This warning is displayed once per session.
#> <quosure>
#> expr: ^cyl
#> env: global
rlang::as_quosure(call)
#> <quosure>
#> expr: ^mean(cyl)
#> env: global
Note that while formulas and quosures are very similar objects (and in the most general sense, formulas are quosures), they can’t be used interchangeably in tidyeval functions. Early implementations did treat bare formulas as quosures, but this created compatibility issues with modelling functions of the stats package. Fortunately, it’s easy to transform formulas to quosures that will self-evaluate in tidyeval functions:
f <- ~cyl
f
#> ~cyl
rlang::as_quosure(f)
#> <quosure>
#> expr: ^cyl
#> env: global
Finally, and perhaps most importantly, strings are not and should not be parsed. As developers, it is tempting to try and solve problems using strings because we have been trained to work with strings rather than quoted expressions. However it’s almost always the wrong way to approach the problem. The exception is for creating symbols. In that case it is perfectly legitimate to use strings:
rlang::sym("cyl")
#> cyl
rlang::syms(letters[1:3])
#> [[1]]
#> a
#>
#> [[2]]
#> b
#>
#> [[3]]
#> c
But you should never use strings to create calls. Instead you can use quasiquotation:
syms <- rlang::syms(c("foo", "bar", "baz"))
quo(my_call(!!! syms))
#> <quosure>
#> expr: ^my_call(foo, bar, baz)
#> env: global
fun <- rlang::sym("my_call")
quo((!!fun)(!!! syms))
#> <quosure>
#> expr: ^my_call(foo, bar, baz)
#> env: global
Or create the call with call2():
call <- rlang::call2("my_call", !!! syms)
call
#> my_call(foo, bar, baz)
rlang::as_quosure(call)
#> <quosure>
#> expr: ^my_call(foo, bar, baz)
#> env: global
# Or equivalently:
quo(!! rlang::call2("my_call", !!! syms))
#> <quosure>
#> expr: ^my_call(foo, bar, baz)
#> env: global
Note that idioms based on interp() should now generally be avoided and replaced with quasiquotation. Where you used to interpolate:
lazyeval::interp(~ mean(var), var = rlang::sym("mpg"))
You would now unquote:
var <- "mpg"
quo(mean(!! rlang::sym(var)))
See also vignette("programming") for more about quasiquotation and quosures.
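Putting these pieces together, a package function that receives column names as strings can turn them into symbols and unquote them. A minimal sketch, assuming dplyr 0.7.0 or later and rlang are installed; mean_by() is a hypothetical helper, not a dplyr function:

```r
library(dplyr)

# Hypothetical helper: average `value_col` within each level of `group_col`,
# where both columns are given as strings.
mean_by <- function(data, group_col, value_col) {
  # Strings are only used to create symbols, never parsed into calls.
  group_sym <- rlang::sym(group_col)
  value_sym <- rlang::sym(value_col)

  data %>%
    group_by(!!group_sym) %>%
    summarise(avg = mean(!!value_sym))
}

result <- mean_by(mtcars, "cyl", "mpg")
result
```

The result has one row per distinct value of cyl, with the group mean of mpg in the avg column.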
mutate_each() and summarise_each()
These functions have been replaced by a more complete family of functions. This family has suffixes _if, _at and _all and includes more verbs than just mutate and summarise.
If you need to update your code to the new family, there are two relevant functions depending on which variables you apply funs() to. If you called mutate_each() without supplying a selection of variables, funs is applied to all variables. In this case, you should update your code to use mutate_all() instead:
mutate_each(starwars, funs(as.character))
mutate_all(starwars, funs(as.character))
Note that the new verbs support bare functions as well, so you don’t necessarily need to wrap with funs():
mutate_all(starwars, as.character)
On the other hand, if you supplied a variable selection, you should use mutate_at(). The variable selection should be wrapped with vars().
mutate_each(starwars, funs(as.character), height, mass)
mutate_at(starwars, vars(height, mass), as.character)
vars() supports all the selection helpers that you usually use with select():
summarise_at(mtcars, vars(starts_with("d")), mean)
Note that instead of a vars() selection, you can also supply character vectors of column names:
mutate_at(starwars, c("height", "mass"), as.character)
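The _if suffix selects columns with a predicate function rather than by name. A short sketch, assuming a dplyr version that provides the scoped verbs (0.7.0 or later); it uses the built-in iris data and bare functions rather than funs():

```r
library(dplyr)

# Convert every factor column to character; other columns are left untouched.
iris_chr <- mutate_if(iris, is.factor, as.character)

# Average every numeric column.
iris_means <- summarise_if(iris, is.numeric, mean)

class(iris_chr$Species)
iris_means
```

The predicate is applied to each column in turn, so no vars() selection is needed here.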