Maintainer: | Julie Josse, Imke Mayer, Nicholas Tierney, and Nathalie Vialaneix (r-miss-tastic team) |
Contact: | r-miss-tastic at clementine.wf |
Version: | 2022-08-24 |
URL: | https://CRAN.R-project.org/view=MissingData |
Source: | https://github.com/cran-task-views/MissingData/ |
Contributions: | Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide. |
Citation: | Julie Josse, Imke Mayer, Nicholas Tierney, and Nathalie Vialaneix (r-miss-tastic team) (2022). CRAN Task View: Missing Data. Version 2022-08-24. URL https://CRAN.R-project.org/view=MissingData. |
Installation: | The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("MissingData", coreOnly = TRUE) installs all the core packages, whereas ctv::update.views("MissingData") installs all packages that are not yet installed and up to date. See the CRAN Task View Initiative for more details. |
Missing data are very frequently found in datasets. Base R provides a few options to handle them using computations that involve only observed data (na.rm = TRUE in functions mean, var, … or use = complete.obs|na.or.complete|pairwise.complete.obs in functions cov, cor, …). The base package stats also contains the generic function na.action that extracts information of the NA action used to create an object. In addition, the package ie2misc contains a dyadic operator + that behaves differently than the original + operator regarding missing data.
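The base R options mentioned above can be illustrated with a minimal sketch. The toy vectors and the object names (x, y, fit, dropped) are assumptions made for illustration only; the functions and arguments are those of base R and stats.

```r
# Toy vectors with missing entries (illustrative values only)
x <- c(1, 2, NA, 4, 5)
y <- c(2, NA, 6, 8, 11)

# Without na.rm = TRUE, a single NA propagates to the result
m_raw <- mean(x)

# na.rm = TRUE restricts the computation to the observed values
m <- mean(x, na.rm = TRUE)
v <- var(x, na.rm = TRUE)

# cor() and cov() take a `use` argument instead of na.rm;
# "complete.obs" keeps only the rows where both x and y are observed
r <- cor(x, y, use = "complete.obs")

# Model fitting functions record how NAs were handled;
# na.action() extracts that information from the fitted object
df <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = df, na.action = na.omit)
dropped <- na.action(fit)  # indices of the rows removed before fitting
```

Here dropped carries the class "omit" and lists the rows that were excluded, which is useful for checking how many observations a fitted model actually used.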
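These basic options are complemented by many packages on CRAN. In this task view, we focus on the most important ones, which were published more than one year ago and are regularly updated. The task view is structured into the following main topics: exploration of missing data, likelihood based approaches, single imputation, multiple imputation, weighting methods, specific types of data, specific tasks, and specific application fields.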
In addition to the present task view, this reference website on missing data might also be helpful. Complementary information might also be found in TimeSeries, SpatioTemporal, Survival, and OfficialStatistics. Note that most packages dealing with temporal and spatio-temporal interpolation or with censored data are not covered by the Missing Data task view.
If you think we have missed some important packages in this list, please e-mail the maintainers or submit an issue or pull request in the GitHub repository linked above.
Exploration of missing data
ampute of mice, the package simFrame, which proposes a very general framework for simulations, or the package simglm, which simulates data and missing values in simple and generalized linear regression models. Similarly, imputeTestbench provides a benchmark to evaluate univariate time series imputation.
Likelihood based approaches
em.norm for multivariate Gaussian data), norm2 (using the function emNorm), in cat (function em.cat for multivariate categorical data), in mix (function em.mix for multivariate mixed categorical and continuous data). These packages also implement Bayesian approaches (with Imputation and Posterior steps) for the same models (functions da.XXX for norm, cat and mix) and can be used to obtain imputed complete datasets or multiple imputations (functions imp.XXX for norm, cat and mix), once the model parameters have been estimated. monomvn proposes similar methods for multivariate normal and Student distributions when the missingness pattern is monotonic.
MixtComp. It can be used in combination with RMixtCompUtilities, which provides various graphical, getter, and utility functions.
Single imputation
hotdeck) and a fractional version (using weights) is provided in FHDI. StatMatch also uses hot-deck imputation to impute surveys from an external dataset.
regressionImp). iai tunes optimal imputation based on knn, tree or SVM.
Multiple imputation
Some of the above-mentioned packages can also handle multiple imputation.
In addition, mitools provides a generic approach to handle multiple imputation in combination with any imputation method, NADIA provides a uniform interface to compare the performance of several imputation algorithms, cobalt computes balance tables and plots for multiply imputed datasets, and SynthTools provides confidence intervals for multiply imputed datasets.
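To make the idea of combining results across multiply imputed datasets concrete, here is a minimal base R sketch of pooling with Rubin's rules. It does not use the API of mitools or of any other package listed here. The simulated data, the bootstrap-based stochastic regression imputation, and all object names are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated data (illustrative): y depends linearly on x, 60 of 200 y values set to NA
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$y[sample(n, 60)] <- NA
miss <- is.na(d$y)
obs_rows <- which(!miss)

m <- 20              # number of imputed datasets
est <- numeric(m)    # slope estimate from each completed dataset
se2 <- numeric(m)    # squared standard error of that estimate

for (i in seq_len(m)) {
  # Refit the imputation model on a bootstrap sample of the observed rows,
  # a simple way to approximate uncertainty about the model parameters
  boot_rows <- sample(obs_rows, replace = TRUE)
  fit_b <- lm(y ~ x, data = d[boot_rows, ])

  # Stochastic regression imputation: prediction plus residual noise
  d_imp <- d
  d_imp$y[miss] <- predict(fit_b, newdata = d[miss, ]) +
    rnorm(sum(miss), sd = summary(fit_b)$sigma)

  # Analysis model fitted on the completed dataset
  fit_i <- lm(y ~ x, data = d_imp)
  est[i] <- coef(fit_i)[["x"]]
  se2[i] <- vcov(fit_i)["x", "x"]
}

# Rubin's rules
q_bar <- mean(est)                  # pooled point estimate
w_bar <- mean(se2)                  # within-imputation variance
b     <- var(est)                   # between-imputation variance
t_var <- w_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
pooled_se <- sqrt(t_var)
```

The between-imputation term b is what separates multiple imputation from single imputation: it inflates the variance to reflect the uncertainty caused by the missing values.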
Weighting methods
Specific types of data
data.table framework.
Specific tasks
Specific application fields