boost_tree() engine. To supply engine-specific arguments that are documented in xgboost::xgb.train() as arguments to be passed via params, supply the list elements directly as named arguments to set_engine(). Read more in ?details_boost_tree_xgboost (#787).
Enable the use of case weights for models that support them.
show_model_info() now indicates which models can utilize case weights.
Model type functions will now message informatively if a needed parsnip extension package is not loaded (#731).
Refactored internals of model specification printing functions. These changes are non-breaking for extension packages, but the new print_model_spec() helper is exported for use in extensions if desired (#739).
Fixed a bug where previously set engine arguments would propagate through update() methods despite fresh = TRUE (#704).
Fixed a bug where an error would be thrown if arguments to model functions were namespaced (#745).
predict(type = "prob") will now throw an error if the outcome variable has a level called "class" (#720).
An inconsistency in probability type predictions for two-class GAM models was fixed (#708).
Fixed translated printing for null_model() (#752).
Added a glm_grouped() function to convert long data to the grouped format required by glm() for logistic regression.
xgb_train() now allows for case weights.
Added ctree_train() and cforest_train() wrappers for the functions in the partykit package. Engines for these will be added to other parsnip extension packages.
Exported xgb_predict(), which wraps xgboost’s predict() method for use with parsnip extension packages (#688).
Added a developer function, .model_param_name_key, that translates names of tuning parameters.
Fixed a major bug in spark models induced in the previous version (#671).
Updated the parsnip add-in with new models and engines.
Updated parameter ranges for some tunable() methods and added a missing engine argument for brulee models.
Added information about how to install the mixOmics package for PLS models (#680)
Bayesian additive regression trees (BART) were added via the bart() function.
Added the "glm" engine for linear_reg() for numeric outcomes (#624).
Added brulee engines for linear_reg(), logistic_reg(), multinom_reg() and mlp().
A bug for class predictions of two-class GAM models was fixed (#541)
Fixed a bug for logistic_reg() with the LiblineaR engine (#552).
The list column produced when creating survival probability predictions is now always called .pred (with .pred_survival being used inside of the list column).
Fixed outcome type checking affecting a subset of regression models (#625).
Prediction using multinom_reg() with the nnet engine with a single row no longer fails (#612).
When the xy interface is used and the underlying model expects to use a matrix, a better warning is issued when predictors contain non-numeric columns (including dates).
The fit time is only calculated when the verbosity argument of control_parsnip() is 2L or greater. Also, the call to system.time() now uses gcFirst = FALSE (#611).
fit_control() is soft-deprecated in favor of control_parsnip().
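As a minimal sketch of the replacement for fit_control(), the example below builds a control object with control_parsnip() and passes it to fit(). The lm engine and the mtcars data are illustrative choices only, not taken from this changelog.

```r
library(parsnip)

# verbosity = 2L is the level at which the fit time is calculated.
ctrl <- control_parsnip(verbosity = 2L)

# Illustrative model: ordinary least squares on a built-in data set.
fit_obj <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = mtcars, control = ctrl)
```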
New extract_parameter_set_dials() method to extract parameter sets from model specs.
New extract_parameter_dials() method to extract a single parameter from model specs.
Argument interval was added for prediction: For types "survival" and "quantile", estimates for the confidence or prediction interval can be added if available (#615).
set_dependency() now allows developers to create package requirements that are specific to the model’s mode (#604).
varying() is soft-deprecated in favor of tune().
varying_args() is soft-deprecated in favor of tune_args().
An autoplot() method was added for glmnet objects, showing the coefficient paths versus the penalty values (#642).
parsnip is now more robust working with keras and tensorflow for a larger range of versions (#596).
xgboost engines now use the new iterationrange parameter instead of the deprecated ntreelimit (#656).
devtools::load_all() (#653).
A model function (gen_additive_mod()) was added for generalized additive models.
Each model now has a default engine that is used when the model is defined. The default for each model is listed in the help documents. This also adds functionality to declare an engine in the model specification function. set_engine() is still required if engine-specific arguments need to be added. (#513)
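The two ways of declaring an engine described in this entry can be sketched as follows. The choice of linear_reg() with the lm engine is only an illustration.

```r
library(parsnip)

# Engine declared directly in the model specification function.
spec_arg <- linear_reg(engine = "lm")

# Equivalent two-step form; set_engine() remains the place to pass
# engine-specific arguments.
spec_set <- linear_reg() %>%
  set_engine("lm")
```

Both objects carry the same engine; only the set_engine() form accepts additional engine arguments.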
parsnip now checks for a valid combination of engine and mode (#529)
The default engine for multinom_reg() was changed to nnet.
The helper functions .convert_form_to_xy_fit(), .convert_form_to_xy_new(), .convert_xy_to_form_fit(), and .convert_xy_to_form_new() for converting between formula and matrix interface are now exported for developer use (#508).
Fix bug in augment() when non-predictor, non-outcome variables are included in data (#510).
New article “Fitting and Predicting with parsnip” which contains examples for various combinations of model type and engine. (#527)
A new linear SVM model svm_linear() is now available with the LiblineaR engine (#424) and the kernlab engine (#438), and the LiblineaR engine is available for logistic_reg() as well (#429). These models can use sparse matrices via fit_xy() (#447) and have a tidy method (#474).
For models with glmnet engines:
penalty (either a single numeric value or a value of tune()) (#481).
path_values can be used to set the lambda path as a specific set of numbers (independent of the value of penalty). A pure ridge regression model (i.e., mixture = 0) will generate incorrect values if the path does not include zero. See issue #431 for discussion (#486).
The liquidSVM engine for svm_rbf() was deprecated due to that package’s removal from CRAN. (#425)
The xgboost engine for boosted trees was translating mtry to xgboost’s colsample_bytree. We now map mtry to colsample_bynode since that is more consistent with how random forest works. colsample_bytree can still be optimized by passing it in as an engine argument. colsample_bynode was added to xgboost after the parsnip package code was written. (#495)
For xgboost, mtry and colsample_bytree can be passed as integer counts or proportions, while subsample and validation should always be proportions. xgb_train() now has a new option counts (TRUE or FALSE) that states which scale for mtry and colsample_bytree is being used. (#461)
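A hedged sketch of the two xgboost entries above: mtry is a main argument, while colsample_bytree and counts are passed as engine arguments through set_engine(). The numeric values are arbitrary, and this assumes counts is forwarded to xgb_train() like any other engine argument. The specification is only constructed here, not fit, so xgboost itself is not needed to run it.

```r
library(parsnip)

# With counts = FALSE, mtry and colsample_bytree are read as proportions.
spec <- boost_tree(mtry = 0.5, trees = 100) %>%
  set_engine("xgboost", colsample_bytree = 0.8, counts = FALSE) %>%
  set_mode("regression")

# Engine arguments are stored on the specification until fit time.
names(spec$eng_args)
```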
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
set_mode() now checks if mode is compatible with the model class, similar to new_model_spec() (@jtlandis, #467). Both set_mode() and set_engine() now error for NULL or missing arguments (#503).
Re-organized model documentation:
update methods were moved out of the model help files (#479).
generics::required_pkgs() was extended for parsnip objects.
Prediction functions now give a consistent error when a user requests an unavailable value of type (#489).
The augment() method was changed to avoid failing if the model does not enable class probabilities. The method now returns tibbles regardless of the input data class (#487) (#478).
xgboost engines now respect the event_level option for predictions (#460).
An RStudio add-in is available that writes multiple parsnip model specifications to the source window. It can be accessed via the IDE addin menus or by calling parsnip_addin().
For xgboost models, users can now pass objective to set_engine("xgboost"). (#403)
Changes to test for cases when CRAN cannot get xgboost to work on their Solaris configuration.
There is now an augment() method for fitted models. See augment.model_fit. (#401)
Column names for x are now required when fit_xy() is used. (#398)
There is now an event_level argument for the xgboost engine. (#420)
New mode “censored regression” and new prediction types “linear_pred”, “time”, “survival”, “hazard”. (#396)
Censored regression models cannot use fit_xy() (use fit()). (#442)
show_engines() will provide information on the current set of engines for a model.
For three models (glmnet, xgboost, and ranger), enable sparse matrix use via fit_xy() (#373).
Protections were added for function arguments that are dependent on the data dimensions (e.g., mtry, neighbors, min_n, etc). (#184)
Infrastructure was improved for running parsnip models in parallel using PSOCK clusters on Windows.
A glance() method for model_fit objects was added (#325)
Specific tidy() methods for glmnet models fit via parsnip were created so that the coefficients for the specific fitted parsnip model are returned.
glmnet models were fitting two intercepts (#349)
The various update() methods now work with engine-specific parameters.
parsnip now has options to set specific types of predictor encodings for different models. For example, ranger models run using parsnip and workflows do the same thing by not creating indicator variables. These encodings can be overridden using the blueprint options in workflows. As a consequence, it is possible to get a different model fit than previous versions of parsnip. More details about specific encoding changes are below. (#326)
tidyr >= 1.0.0 is now required.
SVM models produced by kernlab now use the formula method (see breaking change notice above). This change was due to how ksvm() made indicator variables for factor predictors (with one-hot encodings). Since the ordinary formula method did not do this, the data are passed as-is to ksvm() so that the results are closer to what one would get if ksvm() were called directly.
MARS models produced by earth now use the formula method.
For xgboost, a one-hot encoding is used when indicator variables are created.
Under-the-hood changes were made so that non-standard data arguments in the modeling packages can be accommodated. (#315)
A new main argument was added to boost_tree() called stop_iter for early stopping. The xgb_train() function gained arguments for early stopping and a percentage of data to leave out for a validation set.
If fit() is used and the underlying model uses a formula, the actual formula is passed to the model (instead of a placeholder). This makes the model call better.
A function named repair_call() was added. This can help change the underlying model's call object to better reflect what would have been obtained if the model function had been used directly (instead of via parsnip). This is only useful when the user chooses a formula interface and the model uses a formula interface. It will also be of limited use when a recipe is used to construct the feature set in workflows or tune.
The predict() function now checks to see if required modeling packages are installed. The packages are loaded (but not attached). (#249) (#308) (tidymodels/workflows#45)
The function req_pkgs() is a user interface to determining the required packages. (#308)
liquidSVM was added as an engine for svm_rbf() (#300).
tidy() was broken on R 4.0.
glmnet model.
glmnet was removed as a dependency since the new version depends on R 3.6.0 or greater. Keeping it would constrain parsnip to that same requirement. All glmnet tests are run locally.
A set of internal functions are now exported. These are helpful when creating a new package that registers new model specifications.
nnet was added as an engine to multinom_reg() (#209).
parsnip and the underlying model function) for spark boosted trees and some keras models. See 897c927.
The time elapsed during model fitting is stored in the $elapsed slot of the parsnip model object, and is printed when the model object is printed.
Some default parameter ranges were updated for SVM, KNN, and MARS models.
The model update() methods gained a parameters argument for cases when the parameters are contained in a tibble or list.
fit_control() is soft-deprecated in favor of control_parsnip().
A bug was fixed standardizing the output column types of multi_predict and predict for multinom_reg.
A bug was fixed related to using data descriptors and fit_xy().
A bug was fixed related to the column names generated by multi_predict(). The top-level tibble will always have a column named .pred and this list column contains tibbles across sub-models. The column names for these sub-model tibbles will have names consistent with predict() (which was previously incorrect). See 43c15db.
A bug was fixed standardizing the column names of nnet class probability predictions.
Test case update due to CRAN running extra tests (#202)
Unplanned release based on CRAN requirements for Solaris.
The way that parsnip stores model information has changed. Any custom models from previous versions will need to use the new method for registering models. The methods are detailed in ?get_model_env and the package vignette for adding models.
The mode needs to be declared for models that can be used for more than one mode prior to fitting and/or translation.
For surv_reg(), the engine that uses the survival package is now called survival instead of survreg.
For glmnet models, the full regularization path is always fit regardless of the value given to penalty. Previously, the model was fit by passing penalty to glmnet’s lambda argument and the model could only make predictions at those specific values. (#195)
add_rowindex() can add a column called .row to a data frame.
If a computational engine is not explicitly set, a default will be used. Each default is documented on the corresponding model page. A warning is issued at fit time unless verbosity is zero.
nearest_neighbor() gained a multi_predict method. The multi_predict() documentation is a little better organized.
A suite of internal functions were added to help with upcoming model tuning features.
A parsnip object now always saves the name(s) of the outcome variable(s) for proper naming of the predicted values.
Small release driven by changes in sample() in the current r-devel.
A “null model” is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
fit_xy() can take a single column data frame or matrix for y without error.
varying_args() now has a full argument to control whether the full set of possible varying arguments is returned (as opposed to only the arguments that are actually varying).
fit_control() now returns an S3 method.
For classification models, an error occurs if the outcome data are not encoded as factors (#115).
The prediction modules (e.g. predict_class, predict_numeric, etc) were de-exported. These were internal functions not intended for direct use, but users were calling them.
An event time data set (check_times) was included that contains the time (in seconds) to run R CMD check using the “r-devel-windows-ix86+x86_64” flavor. Packages that errored are censored.
varying_args() now uses the version from the generics package. This means that the first argument, x, has been renamed to object to align with generics.
For the recipes step method of varying_args(), there is now error checking to catch if a user tries to specify an argument that cannot be varying as varying (for example, the id) (#132).
find_varying(), the internal function for detecting varying arguments, now returns correct results when a size 0 argument is provided. It can also now detect varying arguments nested deeply into a call (#131, #134).
For multinomial regression, the .pred_ prefix is now only added to prediction column names once (#107).
For multinomial regression using glmnet, multi_predict() now pulls the correct default penalty (#108).
Confidence and prediction intervals for logistic regression were only computed for a single level. Both are now computed. (#156)
First CRAN release
set_engine(). There is no engine argument
others has been replaced by ...
regularization was changed to penalty in a few models to be consistent with this change.
predict methods, the earth package will need to be attached to be fully operational.
snake_case, newdata was changed to new_data.
predict_raw method was added.
fit interface was previously used to cover both the x/y interface as well as the formula interface. Now, fit() is the formula interface and fit_xy() is for the x/y interface.
NEWS.md file to track changes to the package.
predict methods were overhauled to be consistent.
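The split between fit() and fit_xy() noted in the entries above can be sketched as follows. The lm engine and the mtcars columns are illustrative choices, not taken from this changelog.

```r
library(parsnip)

spec <- linear_reg() %>%
  set_engine("lm")

# Formula interface.
fit_form <- fit(spec, mpg ~ wt + hp, data = mtcars)

# x/y interface: predictors and outcome are passed separately.
fit_xy_res <- fit_xy(spec, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# Predictions use new_data (snake_case), not newdata.
preds <- predict(fit_form, new_data = head(mtcars))
```

Both calls fit the same underlying linear model, so the estimated coefficients should agree.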