Peter Meißner
2015-12-14
Base R's long-standing choice of setting stringsAsFactors to TRUE by default within data.frame() and as.data.frame() is a design decision that makes sense on the one hand (more efficient storage, and statistical models are naturally built with factors), and on the other hand is supposedly the most complained-about piece of code in the R infrastructure. A search through the source code of all CRAN packages in December 2015 (https://github.com/search?utf8=%E2%9C%93&q=user%3Acran+stringsAsFactors&type=Code) returned 3,795 mentions of stringsAsFactors, and most of them simply set the value to FALSE. The hellno package provides an explicit solution to the problem without changing R itself or having to mess around with options. It does so by providing alternative data.frame() and as.data.frame() functions that are in fact simple wrappers around base R's data.frame() and as.data.frame() with the stringsAsFactors option set to HELLNO (which equals FALSE) by default.
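To make the wrapper idea concrete, here is a minimal sketch of what such functions could look like. It is not hellno's actual source: the names sketch_data_frame() and sketch_as_data_frame() are hypothetical and chosen so that nothing in base R is masked, and HELLNO is defined locally for illustration.

```r
# Illustrative constant; the text above only states that HELLNO equals FALSE.
HELLNO <- FALSE

# Hypothetical wrapper: forwards everything to base::data.frame(),
# changing only the default of stringsAsFactors.
sketch_data_frame <- function(..., stringsAsFactors = HELLNO) {
  base::data.frame(..., stringsAsFactors = stringsAsFactors)
}

# Hypothetical wrapper for as.data.frame(), built the same way.
sketch_as_data_frame <- function(x, ..., stringsAsFactors = HELLNO) {
  base::as.data.frame(x, ..., stringsAsFactors = stringsAsFactors)
}

# Strings stay strings by default ...
sketch_data_frame(a = letters[1:3])$a
## [1] "a" "b" "c"

# ... but the caller can still ask for factors explicitly.
class(sketch_data_frame(a = letters[1:3], stringsAsFactors = TRUE)$a)
## [1] "factor"
```

Because only the default changes, every other argument of the base functions keeps working as before.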
R's default behaviour...
df1 <- data.frame(a=letters[1:3])
df1$a
## [1] a b c
## Levels: a b c
class(df1$a)
## [1] "factor"
Behaviour after loading the package:
library(hellno)
##
## Attaching package: 'hellno'
##
## The following objects are masked from 'package:base':
##
## as.data.frame, data.frame
df2 <- data.frame(a=letters[1:3])
df2$a
## [1] "a" "b" "c"
class(df2$a)
## [1] "character"
While using the hellno package in interactive R mode is nice, the same could have been achieved simply by setting options("stringsAsFactors"=FALSE). The strength of hellno is that it can be imported when writing packages, thereby providing as.data.frame() and data.frame() with the stringsAsFactors option consistently set to FALSE. Once imported, stringsAsFactors=FALSE will be the default for all uses of data.frame() and as.data.frame() within all functions of the package, but not outside of it. Thus it eases programming while ensuring that package users can still choose whichever flavor of stringsAsFactors they like best.
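One way to import the two functions into a package is sketched below, using R's standard NAMESPACE directives. The package name mypkg is hypothetical; the names data.frame and as.data.frame are the ones hellno reports as masked when it is attached.

```r
# File: mypkg/NAMESPACE  (mypkg is a hypothetical package name)
# Make hellno's versions the ones found by code inside mypkg.
importFrom(hellno, data.frame, as.data.frame)

# In addition, mypkg/DESCRIPTION needs hellno listed as a dependency:
#   Imports: hellno
```

These are package configuration lines, not code to run at the console. With roxygen2, the same directive can be generated from the tag `@importFrom hellno data.frame as.data.frame`.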
Let us see how this works with a little example. Again, we start by loading the hellno package:
library(hellno)
data.frame(a=letters[1:2])$a
## [1] "a" "b"
As shown before, character vectors are not converted to factors.
We unload hellno again to start clean.
unloadNamespace("hellno")
Now we install the hellnotests package from GitHub and load it. The package uses hellno internally in two functions. While internal uses of data.frame() and as.data.frame() will work with stringsAsFactors=FALSE as the default, this does not change how things work everywhere else.
# install hellnotests only if it is not installed yet
if (!("hellnotests" %in% rownames(installed.packages()))) {
  devtools::install_github("petermeissner/hellnotests")
}
library(hellnotests)
data.frame(a=letters[1:2])$a
## [1] a b
## Levels: a b
All functions within the package, however, use hellno's alternative implementations:
hellno_df
## function ()
## {
## data.frame(a = letters[1:3])$a
## }
## <environment: namespace:hellnotests>
... and hence string conversion is no longer an issue for them:
hellno_df()
## [1] "a" "b" "c"
... and once again to bring the point home:
data.frame(a=letters[1:2])$a
## [1] a b
## Levels: a b
WRITING PACKAGES WITH HELLNO DOES NOT CHANGE OUTSIDE BEHAVIOR.
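The scoping behind this can be imitated in plain R without installing anything. The sketch below is a simplification: fake_ns is an ordinary environment standing in for a package namespace (real namespaces are created by R when a package is loaded), and fake_ns and pkg_fun() are hypothetical names.

```r
# Stand-in for a package namespace. Its parent is the global environment,
# so anything not overridden here is found as usual.
fake_ns <- new.env(parent = globalenv())

# Inside the stand-in namespace, data.frame() is a wrapper that
# defaults to stringsAsFactors = FALSE.
fake_ns$data.frame <- function(..., stringsAsFactors = FALSE) {
  base::data.frame(..., stringsAsFactors = stringsAsFactors)
}

# A "package function" whose enclosing environment is the stand-in namespace.
pkg_fun <- function() data.frame(a = letters[1:3])$a
environment(pkg_fun) <- fake_ns

# Inside: the wrapper is found first, so strings stay strings.
pkg_fun()
## [1] "a" "b" "c"

# Outside: as long as nothing else masks it, the name data.frame
# still resolves to base R's function.
identical(data.frame, base::data.frame)
## [1] TRUE
```

The override lives only in the environment the package functions are evaluated in, which is why callers outside the package never see it.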