First, a word of caution: the examples in this section are meant only to illustrate what the functions do, not to present the best possible models. For a real use case, please perform the necessary model checks and post-hoc analyses, and choose predictor variables and model types based on domain knowledge.
With this in mind, let us look at how we can perform modeling tasks using manymodelr.
multi_model_1
This is one of the core functions of the package. multi_model_1 aims to allow model fitting, prediction, and reporting with a single function call. The multi part of the name reflects the fact that several model types can be fitted at once. An example follows.
For the purposes of this report, we first create a simple dataset.
library(manymodelr)
set.seed(520)
# Create a simple dataset with a binary target.
# "normal" is a fictional target that we assume indicates whether some criterion is met.
# Note that data.frame() recycles the shorter numeric vectors (length 100)
# to match the 1000 values in "normal".
sample_data <- data.frame(normal = as.factor(rep(c("Yes", "No"), 500)),
                          height = rnorm(100, mean = 0.5, sd = 0.2),
                          weight = runif(100, 0, 0.6),
                          yield = rnorm(100, mean = 520, sd = 10))
head(sample_data)
#> normal height weight yield
#> 1 Yes 0.2849090 0.13442312 520.2837
#> 2 No 0.2427826 0.37484971 504.4754
#> 3 Yes 0.2579432 0.47134828 515.6463
#> 4 No 0.5175604 0.50143592 522.2247
#> 5 Yes 0.4026023 0.47171755 502.6406
#> 6 No 0.9789886 0.04191937 509.4663
set.seed(520)
# createDataPartition() and trainControl() come from the caret package;
# attach it with library(caret) if they are not available after loading manymodelr
train_set <- createDataPartition(sample_data$normal, p = 0.6, list = FALSE)
valid_set <- sample_data[-train_set, ]
train_set <- sample_data[train_set, ]
ctrl <- trainControl(method = "cv", number = 5)
# Fit k-nearest neighbours and a decision tree on the training set,
# score on accuracy, and predict on the held-out validation set
m <- multi_model_1(train_set, "normal", ".", c("knn", "rpart"),
                   "Accuracy", ctrl, new_data = valid_set)
The above returns a list containing metrics, predictions, and a model summary. These can be extracted as shown below.
m$metric
#> # A tibble: 1 x 2
#> knn_accuracy rpart_accuracy
#> <dbl> <dbl>
#> 1 0.872 0.68
head(m$predictions)
#> # A tibble: 6 x 2
#> knn rpart
#> <chr> <chr>
#> 1 Yes Yes
#> 2 No Yes
#> 3 No No
#> 4 No Yes
#> 5 No No
#> 6 Yes Yes
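Since new_data was set to valid_set, we can also compute the share of correct predictions ourselves. This is a minimal sanity check, assuming the rows of m$predictions are in the same order as valid_set:
# Proportion of validation rows classified correctly by each model
mean(as.character(m$predictions$knn) == as.character(valid_set$normal))
mean(as.character(m$predictions$rpart) == as.character(valid_set$normal))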
multi_model_2
This is similar to multi_model_1, with one difference: it does not report metrics such as RMSE or accuracy. It is useful if one would like to fit and predict with "simpler" models such as linear or generalized linear models. Let's take a look:
# Fit a linear model of mpg on wt and get predictions
lin_model <- multi_model_2(mtcars[1:16, ], mtcars[17:32, ], "mpg", "wt", "lm")
lin_model[, c("predicted", "mpg")]
#> predicted mpg
#> Mazda RX4 10.17314 21.0
#> Mazda RX4 Wag 24.32264 21.0
#> Datsun 710 26.95458 22.8
#> Hornet 4 Drive 25.96479 21.4
#> Hornet Sportabout 23.13039 18.7
#> Valiant 18.38390 18.1
#> Duster 360 18.76632 14.3
#> Merc 240D 16.94420 24.4
#> Merc 230 16.92171 22.8
#> Merc 280 25.51488 19.2
#> Merc 280C 24.59258 17.8
#> Merc 450SE 27.41348 16.4
#> Merc 450SL 19.95856 17.3
#> Merc 450SLC 21.75818 15.2
#> Cadillac Fleetwood 18.15895 10.4
#> Lincoln Continental 21.71319 10.4
From the above, we see that wt alone may not be a great predictor of mpg. We can instead fit a multiple linear regression model with additional predictors. Suppose disp and drat are also important; we then add those to the model.
multi_lin <- multi_model_2(mtcars[1:16, ], mtcars[17:32, ], "mpg",
                           "wt + disp + drat", "lm")
multi_lin[, c("predicted", "mpg")]
#> predicted mpg
#> Mazda RX4 10.43041 21.0
#> Mazda RX4 Wag 24.39765 21.0
#> Datsun 710 25.56629 22.8
#> Hornet 4 Drive 25.38957 21.4
#> Hornet Sportabout 23.15234 18.7
#> Valiant 17.36908 18.1
#> Duster 360 17.67102 14.3
#> Merc 240D 15.59802 24.4
#> Merc 230 14.96161 22.8
#> Merc 280 25.05592 19.2
#> Merc 280C 23.66222 17.8
#> Merc 450SE 25.95326 16.4
#> Merc 450SL 17.05637 17.3
#> Merc 450SLC 21.97756 15.2
#> Cadillac Fleetwood 17.22593 10.4
#> Lincoln Continental 22.17872 10.4
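To put a number on the comparison, we can compute the root mean squared error of each set of predictions by hand, assuming (as in the tables above) that the predicted and mpg columns are aligned row by row:
# RMSE of the single-predictor and multiple-predictor fits
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse(lin_model$mpg, lin_model$predicted)
rmse(multi_lin$mpg, multi_lin$predicted)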
fit_model
This function allows us to fit a model of any supported type without necessarily returning predictions.
lm_model <- fit_model(mtcars, "mpg", "wt", "lm")
lm_model
#>
#> Call:
#> lm(formula = mpg ~ wt, data = use_df)
#>
#> Coefficients:
#> (Intercept) wt
#> 37.285 -5.344
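The printed output suggests that the return value is an ordinary lm fit; assuming that is the case, the usual base R helpers can be applied to it directly:
# Coefficients, adjusted R squared, and predictions for two hypothetical weights
coef(lm_model)
summary(lm_model)$adj.r.squared
predict(lm_model, newdata = data.frame(wt = c(2.5, 3.5)))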
fit_models
This is similar to fit_model
with the ability to fit many models with many predictors at once. A simple linear model for instance:
models <- fit_models(df = sample_data, yname = c("height", "weight"),
                     xname = "yield", modeltype = "glm")
One can then use these models as needed, for example to append residuals or predictions from each model to the data:
# Append residuals from each fitted model to sample_data
res_residuals <- lapply(models[[1]], add_model_residuals, sample_data)
# Append predictions from each fitted model to sample_data
res_predictions <- lapply(models[[1]], add_model_predictions, sample_data, sample_data)
# Get height predictions for the model height ~ yield
head(res_predictions[[1]])
#> normal height weight yield predicted
#> 1 Yes 0.2849090 0.13442312 520.2837 0.5028866
#> 2 No 0.2427826 0.37484971 504.4754 0.4943626
#> 3 Yes 0.2579432 0.47134828 515.6463 0.5003860
#> 4 No 0.5175604 0.50143592 522.2247 0.5039331
#> 5 Yes 0.4026023 0.47171755 502.6406 0.4933732
#> 6 No 0.9789886 0.04191937 509.4663 0.4970537
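Assuming the list elements follow the order of yname, the second element holds the corresponding results for the weight ~ yield model, and res_residuals holds the same data with residuals appended instead:
# Predictions from the weight ~ yield model
head(res_predictions[[2]])
# Residuals from the height ~ yield model
head(res_residuals[[1]])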
If one would like to drop non-numeric columns from the analysis, one can set drop_non_numeric to TRUE as follows. The same can be done for fit_model above:
fit_models(df = sample_data, yname = c("height", "weight"),
           xname = ".", modeltype = c("lm", "glm"), drop_non_numeric = TRUE)
#> [[1]]
#> [[1]][[1]]
#>
#> Call:
#> lm(formula = height ~ ., data = use_df)
#>
#> Coefficients:
#> (Intercept) weight yield
#> 0.2176942 -0.2185572 0.0006712
#>
#>
#> [[1]][[2]]
#>
#> Call:
#> lm(formula = weight ~ ., data = use_df)
#>
#> Coefficients:
#> (Intercept) height yield
#> 0.0112753 -0.1463926 0.0006827
#>
#>
#>
#> [[2]]
#> [[2]][[1]]
#>
#> Call: glm(formula = height ~ ., data = use_df)
#>
#> Coefficients:
#> (Intercept) weight yield
#> 0.2176942 -0.2185572 0.0006712
#>
#> Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
#> Null Deviance: 45.82
#> Residual Deviance: 44.32 AIC: -270.3
#>
#> [[2]][[2]]
#>
#> Call: glm(formula = weight ~ ., data = use_df)
#>
#> Coefficients:
#> (Intercept) height yield
#> 0.0112753 -0.1463926 0.0006827
#>
#> Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
#> Null Deviance: 30.7
#> Residual Deviance: 29.69 AIC: -671.1
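Based on the printed structure, the result is a nested list indexed first by model type and then by response variable, so individual fits can be pulled out once the result is stored. A small sketch:
# Store the result and extract a single fit: [[model type]][[response]]
all_fits <- fit_models(df = sample_data, yname = c("height", "weight"),
                       xname = ".", modeltype = c("lm", "glm"),
                       drop_non_numeric = TRUE)
all_fits[[2]][[1]]  # the glm fit of height on the remaining numeric columns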
extract_model_info
To extract information about a given model, we can use extract_model_info as follows.
extract_model_info(lm_model, "r2")
#> [1] 0.7528328
To extract the adjusted R squared:
extract_model_info(lm_model, "adj_r2")
#> [1] 0.7445939
For the p-values of the coefficients:
extract_model_info(lm_model, "p_value")
#> (Intercept) wt
#> 8.241799e-19 1.293959e-10
To extract multiple attributes:
extract_model_info(lm_model, c("p_value", "response", "call", "predictors"))
#> $p_value
#> (Intercept) wt
#> 8.241799e-19 1.293959e-10
#>
#> $response
#> [1] "mpg"
#>
#> $call
#> lm(formula = mpg ~ wt, data = use_df)
#>
#> $predictors
#> [1] "wt"
This is not restricted to linear models; it works for most model types. See help(extract_model_info) for the currently supported model types.
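As a closing example, and assuming fit_model accepts the same multi-predictor xname syntax that multi_model_2 used above, we can fit the larger mpg model and pull out its fit statistics in one go:
# Fit mpg on wt, disp and drat, then extract R squared and adjusted R squared
multi_fit <- fit_model(mtcars, "mpg", "wt + disp + drat", "lm")
extract_model_info(multi_fit, c("r2", "adj_r2"))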