Once a model has been estimated, it can be used to predict probabilities and/or outcomes for a set of alternatives. This vignette demonstrates how to do so using the predict() method with an estimated model.
You can make predictions for any set of alternatives, as long as the columns in the alternatives correspond to estimated coefficients in your model. By default, if no new data are provided via the newdata argument, predictions are made for the original data used to estimate the model.
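The columns must line up with the coefficients because of how multinomial logit probabilities are computed. Each alternative's observed utility is the product of its attribute row and the coefficient vector. The utilities are then exponentiated and normalized within each choice observation (obsID). The base-R sketch below illustrates this. The coefficient values, the small data frame, and the helper mnl_probs() are hypothetical and are not taken from logitr.

```r
# Hypothetical coefficients; names must match the attribute columns
beta <- c(price = -0.4, feat = 0.5)

# Hypothetical alternatives: two choice observations with two alternatives each
alts <- data.frame(
  obsID = c(1, 1, 2, 2),
  price = c(8, 6, 9, 7),
  feat  = c(0, 1, 1, 0)
)

# MNL probabilities: softmax of X %*% beta within each obsID
mnl_probs <- function(beta, alts, obsID = "obsID") {
  X <- as.matrix(alts[, names(beta)])  # columns selected in coefficient order
  v <- drop(X %*% beta)                # observed utility of each alternative
  # Subtract the per-observation maximum before exponentiating, for stability
  v <- v - ave(v, alts[[obsID]], FUN = max)
  ev <- exp(v)
  ev / ave(ev, alts[[obsID]], FUN = sum)
}

probs <- mnl_probs(beta, alts)
probs
```

If a coefficient name has no matching column, the column selection fails. This mirrors the requirement that new data contain a column for every estimated coefficient.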
Predictions can be made using both preference space and WTP space models, as well as multinomial logit and mixed logit models. For mixed logit models, heterogeneity is modeled by simulating draws from the model's estimated population parameters.
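The simulation idea for mixed logit models can be illustrated in base R. The sketch draws many coefficient values from a population distribution, computes MNL probabilities for each draw, and averages the results. It is illustrative only and is not logitr's internal code. The normal distribution, the parameter values, the number of draws, and the alternatives are all hypothetical.

```r
set.seed(123)

# Hypothetical population estimates: mean and SD of a normally distributed
# 'feat' coefficient, plus a fixed price coefficient
mu_feat <- 0.5
sd_feat <- 1.0
b_price <- -0.4

# Hypothetical alternatives for a single choice observation
price <- c(8, 6, 9)
feat  <- c(0, 1, 1)

# Simulate draws of the random 'feat' coefficient
n_draws <- 10000
feat_draws <- rnorm(n_draws, mean = mu_feat, sd = sd_feat)

# MNL probabilities for each draw (one row per draw, one column per alternative)
draw_probs <- t(sapply(feat_draws, function(b_feat) {
  v <- b_price * price + b_feat * feat
  ev <- exp(v - max(v))
  ev / sum(ev)
}))

# Simulated mixed logit probabilities: average over the draws
mxl_probs <- colMeans(draw_probs)
mxl_probs
```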
In the example below, a preference space MNL model is estimated (mnl_pref) and then used to predict probabilities for the data used to estimate the model:
library("logitr")
mnl_pref <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand')
)
probs <- predict(mnl_pref)
head(probs)
#> obsID predicted_prob
#> 1 1 0.41802407
#> 2 1 0.02118240
#> 3 1 0.23691737
#> 4 1 0.32387615
#> 5 2 0.26643822
#> 6 2 0.02255486
The predict() method returns a data frame containing the observation ID and the predicted probabilities. The original data can also be included in the returned data frame by setting returnData = TRUE:
probs <- predict(mnl_pref, returnData = TRUE)
head(probs)
#> obsID predicted_prob price feat brandhiland brandweight brandyoplait choice
#> 1 1 0.41802407 8.1 0 0 0 0 0
#> 2 1 0.02118240 6.1 0 1 0 0 0
#> 3 1 0.23691737 7.9 0 0 1 0 1
#> 4 1 0.32387615 10.8 0 0 0 1 0
#> 5 2 0.26643822 9.8 0 0 0 0 1
#> 6 2 0.02255486 6.4 0 1 0 0 0
To make predictions for a new set of alternatives, use the newdata argument. The example below makes predictions for just two of the choice observations from the yogurt dataset:
# Keep only observations 13 and 42 and the columns needed for prediction
data <- subset(
  yogurt, obsID %in% c(42, 13),
  select = c('obsID', 'alt', 'price', 'feat', 'brand'))
probs_mnl_pref <- predict(
  mnl_pref,
  newdata = data,
  obsID = "obsID"
)
probs_mnl_pref
#> obsID predicted_prob
#> 1 13 0.43685145
#> 2 13 0.03312986
#> 3 13 0.19155548
#> 4 13 0.33846321
#> 5 42 0.60764778
#> 6 42 0.02602007
#> 7 42 0.17803313
#> 8 42 0.18829902
The ci argument can be used to obtain the lower and upper bounds of a confidence interval for the predicted probabilities. These bounds are estimated using the Krinsky and Robb parametric bootstrapping method (Krinsky and Robb 1986). For example, a 95% CI is obtained with ci = 0.95:
probs_mnl_pref <- predict(
  mnl_pref,
  newdata = data,
  obsID = "obsID",
  ci = 0.95
)
probs_mnl_pref
#> obsID predicted_prob predicted_prob_lower predicted_prob_upper
#> 1 13 0.43685145 0.41490293 0.45807457
#> 2 13 0.03312986 0.02605820 0.04220765
#> 3 13 0.19155548 0.17660037 0.20716517
#> 4 13 0.33846321 0.31736913 0.35773877
#> 5 42 0.60764778 0.57204770 0.64057358
#> 6 42 0.02602007 0.01872655 0.03732723
#> 7 42 0.17803313 0.16220968 0.19451696
#> 8 42 0.18829902 0.16778560 0.20931234
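The Krinsky and Robb procedure can be sketched in base R in three steps:

1. Draw many coefficient vectors from a multivariate normal distribution centered on the point estimates, using the estimated covariance matrix.
2. Compute the predicted probabilities for each draw.
3. Take empirical quantiles across the draws.

The coefficients, covariance matrix, number of draws, and alternatives below are hypothetical. The draws use a Cholesky factor with rnorm() rather than any logitr internals.

```r
set.seed(42)

# Hypothetical point estimates and covariance matrix (e.g., from an MNL fit)
beta_hat <- c(price = -0.4, feat = 0.5)
vcov_hat <- matrix(c(0.0025, 0.0005,
                     0.0005, 0.0100), nrow = 2,
                   dimnames = list(names(beta_hat), names(beta_hat)))

# Hypothetical alternatives for one choice observation
X <- cbind(price = c(8, 6, 9), feat = c(0, 1, 1))

# Softmax of the observed utilities for this observation
logit_probs <- function(beta) {
  v <- drop(X %*% beta)
  ev <- exp(v - max(v))
  ev / sum(ev)
}

# Draw coefficients: beta_hat + z %*% chol(vcov_hat), with z ~ N(0, I)
n_draws <- 10000
z <- matrix(rnorm(n_draws * length(beta_hat)), ncol = length(beta_hat))
beta_draws <- sweep(z %*% chol(vcov_hat), 2, beta_hat, "+")

# Probabilities for every draw: one row per draw, one column per alternative
prob_draws <- t(apply(beta_draws, 1, logit_probs))

# Interval bounds from the empirical quantiles of the simulated probabilities
ci <- 0.95
alpha <- (1 - ci) / 2
result <- data.frame(
  predicted_prob       = logit_probs(beta_hat),
  predicted_prob_lower = apply(prob_draws, 2, quantile, probs = alpha),
  predicted_prob_upper = apply(prob_draws, 2, quantile, probs = 1 - alpha)
)
result
```

R's chol() returns an upper-triangular factor R with t(R) %*% R equal to the covariance matrix. This is why the standard normal rows are post-multiplied by R.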
WTP space models can also be used to predict probabilities. In the example below, a WTP space model is estimated and used to predict probabilities for the same data data frame as in the previous examples:
mnl_wtp <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('feat', 'brand'),
  scalePar = 'price',
  numMultiStarts = 10
)
probs_mnl_wtp <- predict(
  mnl_wtp,
  newdata = data,
  obsID = "obsID",
  ci = 0.95
)
probs_mnl_wtp
#> obsID predicted_prob predicted_prob_lower predicted_prob_upper
#> 1 13 0.43686141 0.41626125 0.45759058
#> 2 13 0.03312947 0.02631329 0.04204450
#> 3 13 0.19154829 0.17587351 0.20824388
#> 4 13 0.33846083 0.31824968 0.35810533
#> 5 42 0.60767120 0.57426109 0.63738465
#> 6 42 0.02601800 0.01817677 0.03564045
#> 7 42 0.17802363 0.16251472 0.19536331
#> 8 42 0.18828717 0.16746411 0.20889147
Here is a bar chart comparing the predicted probabilities from the preference space and WTP space models. The two are the same model expressed in different spaces, so their predicted probabilities are essentially identical, differing only around the fifth decimal place:
library("ggplot2")
# Stack both sets of predictions and label them for plotting
probs <- rbind(probs_mnl_pref, probs_mnl_wtp)
probs$model <- c(rep("mnl_pref", 8), rep("mnl_wtp", 8))
probs$alt <- rep(c("dannon", "hiland", "weight", "yoplait"), 4)
probs$obs <- paste0("Observation ID: ", probs$obsID)
ggplot(probs, aes(x = alt, y = predicted_prob, fill = model)) +
  geom_bar(stat = 'identity', width = 0.7, position = "dodge") +
  geom_errorbar(aes(ymin = predicted_prob_lower, ymax = predicted_prob_upper),
                width = 0.2, position = position_dodge(width = 0.7)) +
  facet_wrap(vars(obs)) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = 'Alternative', y = 'Expected Choice Probabilities') +
  theme_bw()
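The equivalence can be checked numerically with a small base-R sketch. Assume the WTP space utility takes the form lambda * (omega * x - p), where p is the scale parameter column (price here). Multiplying out gives a preference space coefficient of lambda * omega for each attribute and a price coefficient of -lambda, so both parameterizations yield the same probabilities. The values of lambda and omega and the alternatives are hypothetical.

```r
# Hypothetical WTP space parameters: scale (lambda) and WTP for 'feat' (omega)
lambda <- 0.4
omega  <- c(feat = 1.25)

# Hypothetical alternatives for one choice observation
price <- c(8, 6, 9)
feat  <- c(0, 1, 1)

# Numerically stable softmax over one observation's utilities
softmax <- function(v) exp(v - max(v)) / sum(exp(v - max(v)))

# WTP space utility: lambda * (omega * feat - price)
p_wtp <- softmax(lambda * (omega[["feat"]] * feat - price))

# Equivalent preference space: beta_feat = lambda * omega, beta_price = -lambda
beta_feat  <- lambda * omega[["feat"]]
beta_price <- -lambda
p_pref <- softmax(beta_feat * feat + beta_price * price)

all.equal(p_wtp, p_pref)  # TRUE
```

In the estimated models above, the two spaces are fit separately by numerical optimization. This is why their predictions agree only to about four decimal places rather than exactly.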
The predict() method can also be used to predict outcomes by setting type = "outcome" (the default is "prob" for predicting probabilities). In the examples below, outcomes are predicted using the same preference space and WTP space models as in the previous examples. The returnData argument is also set to TRUE so that the predicted outcomes can be compared to the actual choices made:
outcomes_pref <- predict(mnl_pref, type = "outcome", returnData = TRUE)
head(outcomes_pref)
#> obsID predicted_outcome price feat brandhiland brandweight brandyoplait
#> 1 1 1 8.1 0 0 0 0
#> 2 1 0 6.1 0 1 0 0
#> 3 1 0 7.9 0 0 1 0
#> 4 1 0 10.8 0 0 0 1
#> 5 2 0 9.8 0 0 0 0
#> 6 2 0 6.4 0 1 0 0
#> choice
#> 1 0
#> 2 0
#> 3 1
#> 4 0
#> 5 1
#> 6 0
outcomes_wtp <- predict(mnl_wtp, type = "outcome", returnData = TRUE)
head(outcomes_wtp)
#> obsID predicted_outcome feat brandhiland brandweight brandyoplait scalePar
#> 1 1 1 0 0 0 0 8.1
#> 2 1 0 0 1 0 0 6.1
#> 3 1 0 0 0 1 0 7.9
#> 4 1 0 0 0 0 1 10.8
#> 5 2 1 0 0 0 0 9.8
#> 6 2 0 0 1 0 0 6.4
#> choice
#> 1 0
#> 2 0
#> 3 1
#> 4 0
#> 5 1
#> 6 0
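One way to turn predicted probabilities into predicted outcomes is to simulate a single choice per observation. Each alternative is sampled with probability equal to its predicted probability. This could explain why the two nearly identical models above predict different outcomes for observation 2. However, that is an assumption about how type = "outcome" works, not something stated here. The data frame and the helper simulate_outcomes() below are hypothetical.

```r
set.seed(1)

# Hypothetical predicted probabilities for two choice observations
preds <- data.frame(
  obsID = c(1, 1, 1, 2, 2, 2),
  predicted_prob = c(0.5, 0.3, 0.2, 0.1, 0.6, 0.3)
)

# Mark exactly one alternative per obsID as chosen (1), sampled with
# probability equal to each alternative's predicted probability
simulate_outcomes <- function(preds) {
  outcome <- integer(nrow(preds))
  for (rows in split(seq_len(nrow(preds)), preds$obsID)) {
    pick <- sample.int(length(rows), size = 1, prob = preds$predicted_prob[rows])
    outcome[rows[pick]] <- 1L
  }
  outcome
}

preds$predicted_outcome <- simulate_outcomes(preds)
preds
```

A deterministic alternative would mark the highest-probability alternative in each observation. That rule always gives the same answer for the same probabilities.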
The accuracy of each model can be computed by dividing the number of correctly predicted choices by the total number of choices:
# Preference space model: share of chosen alternatives predicted correctly
chosen_pref <- subset(outcomes_pref, choice == 1)
chosen_pref$correct <- chosen_pref$choice == chosen_pref$predicted_outcome
accuracy_pref <- sum(chosen_pref$correct) / nrow(chosen_pref)
accuracy_pref
#> [1] 0.3888889
# WTP space model: same calculation
chosen_wtp <- subset(outcomes_wtp, choice == 1)
chosen_wtp$correct <- chosen_wtp$choice == chosen_wtp$predicted_outcome
accuracy_wtp <- sum(chosen_wtp$correct) / nrow(chosen_wtp)
accuracy_wtp
#> [1] 0.3909619
These results show that both models correctly predicted the choice for approximately 39% of the observations in the yogurt data frame. This is well above the 25% expected from randomly guessing among the four alternatives.
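The accuracy calculation can be wrapped in a small reusable helper. This minimal sketch uses a hypothetical function, prediction_accuracy(), and hypothetical toy data. As above, it keeps only the rows for chosen alternatives. It also reports the accuracy expected from random guessing, which is the average of 1 / (number of alternatives) across observations; for the yogurt data this is 1/4 = 25%.

```r
# Share of observations whose chosen alternative was also the predicted one,
# plus the random-guess baseline
prediction_accuracy <- function(df, obsID = "obsID") {
  chosen <- df[df$choice == 1, ]
  accuracy <- mean(chosen$predicted_outcome == 1)
  # Random-guess baseline: mean of 1 / (alternatives per observation)
  n_alts <- table(df[[obsID]])
  baseline <- mean(1 / n_alts)
  c(accuracy = accuracy, random_baseline = baseline)
}

# Hypothetical predictions for three observations with four alternatives each
toy <- data.frame(
  obsID = rep(1:3, each = 4),
  choice = c(1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 0, 1),
  predicted_outcome = c(1, 0, 0, 0,  1, 0, 0, 0,  0, 0, 0, 1)
)
prediction_accuracy(toy)
```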