library(datefixR)
This vignette describes the functionality of datefixR in more detail than the README. datefixR is a lightweight package consisting of two user-accessible functions, fix_date_char() and fix_date_df(), which convert dates written in different formats into R’s built-in Date class. The former is designed to modify character vectors whilst the latter is intended for modifying columns of a data frame (or tibble).
Practically, this package is most useful when handling date data which has been supplied via text boxes (instead of a date-specific input with a consistent date format). However, this package may also be useful to validate the format of date data (see date and month imputation).
Firstly, we will demonstrate date standardization without imputation. We consider a data frame with two columns of dates in differing formats with no missing data.
bad.dates <- data.frame(
  id = seq(5),
  some.dates = c(
    "02/05/92",
    "01-04-2020",
    "1996/05/01",
    "2020-05-01",
    "02-04-96"
  ),
  some.more.dates = c(
    "01 03 2015",
    "02/05/00",
    "01/05/1990",
    "03-Dec-2012",
    "02 April 2020"
  )
)
knitr::kable(bad.dates)
id | some.dates | some.more.dates |
---|---|---|
1 | 02/05/92 | 01 03 2015 |
2 | 01-04-2020 | 02/05/00 |
3 | 1996/05/01 | 01/05/1990 |
4 | 2020-05-01 | 03-Dec-2012 |
5 | 02-04-96 | 02 April 2020 |
fix_date_df() requires two arguments: df, a data frame (or tibble) object, and col.names, a character vector containing the names of columns to be standardized. By default, the first column of the data frame is assumed to contain row IDs. These IDs are used if a warning or error is raised to assist with locating the source of the error. The ID column can also be manually provided via the id argument.
The output from this function is a data frame object with the selected date columns now belonging to the Date class.
fixed.dates <- fix_date_df(
  bad.dates,
  c("some.dates", "some.more.dates")
)
knitr::kable(fixed.dates)
id | some.dates | some.more.dates |
---|---|---|
1 | 1992-05-02 | 2015-03-01 |
2 | 2020-04-01 | 2000-05-02 |
3 | 1996-05-01 | 1990-05-01 |
4 | 2020-05-01 | 2012-12-03 |
5 | 1996-04-02 | 2020-04-02 |
datefixR can handle many different formats, including -, /, or white space separation, year-first or day-first order, and the month supplied as a number, an abbreviation, or a full-length name.
fix_date_char() is similar to fix_date_df() but only applies to a single character object.
fix_date_char("01 02 2014")
#> [1] "2014-02-01"
Functions in datefixR assume day-first instead of month-first when day, month, and year are all given (unless year is given first). However, this behavior can be modified by passing format = "mdy" to function calls.
fix_date_char("01 02 2014", format = "mdy")
#> [1] "2014-01-02"
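The ambiguity that the format argument resolves can be seen with base R alone. The sketch below is only an illustration using as.Date() with explicit format strings; it is not a description of how datefixR parses dates internally, and the variable names are made up for the example.

```r
# Base R illustration (not datefixR internals): the same string is a valid
# date under both a day-first and a month-first reading.
ambiguous <- "01 02 2014"

# Day-first reading: 1 February 2014
day_first <- as.Date(ambiguous, format = "%d %m %Y")

# Month-first reading: 2 January 2014
month_first <- as.Date(ambiguous, format = "%m %d %Y")

print(day_first)
print(month_first)
```

Both readings are valid calendar dates, which is why the intended order has to be assumed by default or stated through format.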
By default, datefixR imputes missing months as July, and missing days of the month as the first day. As such, “1992” converts to
fix_date_char("1992")
#> [1] "1992-07-01"
The argument for defaulting to July is that 1-2 July is halfway through the year (in a non-leap year). Therefore, assuming the year supplied is indeed correct, the imputed date is at most about 6 months away from the true date. However, this behavior can be changed by supplying the day.impute and month.impute arguments with an integer corresponding to the desired day and month. For example, day.impute = 1 and month.impute = 1 results in the first day of January being imputed instead.
fix_date_char("1992", day.impute = 1, month.impute = 1)
#> [1] "1992-01-01"
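The midpoint reasoning behind the July default can be checked with a little base R arithmetic. This is a minimal sketch rather than anything datefixR does itself: 1993 is an arbitrary non-leap year chosen for the example, and year_days, midpoint, and max_gap are hypothetical names.

```r
# All days of an arbitrary non-leap year (1993 is an assumption for this example)
year_days <- seq(as.Date("1993-01-01"), as.Date("1993-12-31"), by = "day")

# The middle day of a 365-day year is day 183
midpoint <- year_days[ceiling(length(year_days) / 2)]

# Largest possible error, in days, if every date in the year were replaced
# by a single imputed date
max_gap <- function(imputed) {
  max(abs(as.numeric(year_days - imputed)))
}

max_gap_july <- max_gap(as.Date("1993-07-01"))
max_gap_jan <- max_gap(as.Date("1993-01-01"))

print(midpoint)
print(c(july = max_gap_july, january = max_gap_jan))
```

Imputing 1 July keeps the worst-case error at roughly half a year, whereas imputing 1 January can be off by almost a full year when the true date falls in late December.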
The imputation mechanism can also be modified to impute NA if a month or day is missing by setting day.impute or month.impute to NA. This will also result in a warning being raised.
fix_date_char("1992", month.impute = NA)
#> Warning: NA imputed (date: 1992)
#> [1] NA
Finally, imputation can be prevented by setting day.impute or month.impute to NULL. This will result in an error being raised if the day or month, respectively, is missing.
fix_date_char("1992", month.impute = NULL)
# ERROR
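When imputation is disabled in a script that should keep running, the error can be caught with base R's tryCatch(). The following is a minimal sketch, assuming datefixR is installed; falling back to a missing Date and the name result are illustrative choices rather than package behavior.

```r
# Sketch: catch the error raised when the month is missing and
# month.impute = NULL, then fall back to a missing Date.
# The fallback value is an assumption for this example, not datefixR behavior.
result <- tryCatch(
  datefixR::fix_date_char("1992", month.impute = NULL),
  error = function(e) {
    # Any error raised by the call is reported here instead of stopping the script
    message("Could not standardize date: ", conditionMessage(e))
    as.Date(NA)
  }
)

print(result)
```

Handling the error explicitly keeps the strictness of month.impute = NULL while letting the caller decide what to do with incomplete dates.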
day.impute and month.impute can also be passed to fix_date_df() for similar functionality.
example.df <- data.frame(
  id = seq(1, 3),
  some.dates = c("2014", "April 1990", "Mar 19")
)
fix_date_df(example.df, "some.dates", day.impute = 1, month.impute = 1)
#> id some.dates
#> 1 1 2014-01-01
#> 2 2 1990-04-01
#> 3 3 2019-03-01
If you use this package in your research, please consider citing datefixR. An up-to-date citation can be obtained by running
citation("datefixR")
#>
#> To cite datefixR in publications use:
#>
#> Constantine-Cooke, Nathan . datefixR: Fix Really Messy Dates in R. R
#> package version 1.1.0. DOI: 10.5281/zenodo.5655311.
#> https://CRAN.R-project.org/package=datefixR
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {{datefixR}: Standardize Dates in Different Formats or with Missing Data},
#> author = {Nathan Constantine-Cooke},
#> url = {https://CRAN.R-project.org/package=datefixR},
#> doi = {10.5281/zenodo.5655311},
#> note = {R package version 1.1.0},
#> }