This document will walk you through some of the methods you could use to generate pooled model results that account for both sampling variability and across imputation variability. The package hot.deck
does not come with a set of functions to do inference, so we will show you how you could use the data generated by hot.deck
in combination with glm.mids
(and similarly lm.mids
) from the mice
package, zelig
from the Zelig
package and by using MIcombine
from the mitools
package on a list of model objects.
The data we will use come from Poe, Tate, and Keith (1999) dealing with democracy and state repression. First we need to call the hot.deck
routine on the dataset.
library(hot.deck)
data(isq99)
hot.deck(isq99, sdCutoff=3, IDvars = c("IDORIGIN", "YEAR"))
out <-#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 52 observations with no observed data. These observations were removed
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 45 of 4661 imputations with # donors < 5, consider increasing sdCutoff or using method='p.draw'
This shows us that there are still 45 observations with fewer than 5 donors. Using a different method or further widening the sdCutoff
parameter may alleviate the problem. If you want to see the frequency distribution of the number of donors, you could look at:
sapply(out$donors, length)
numdonors <- sapply(out$donors, length)
numdonors <- ifelse(numdonors > 5, 6, numdonors)
numdonors <- factor(numdonors, levels=1:6, labels=c(1:5, ">5"))
numdonors <-table(numdonors)
#> numdonors
#> 1 2 3 4 5 >5
#> 18 10 11 6 20 4596
Before running a model, three variables have to be created from those existing. Generally, if variables are deterministic functions of other variables (e.g., transformations, lags, etc…) it is advisable to impute the constituent variables of the calculations and then do the calculations after the fact. Here, we need to lag the AI
variable and create percentage change variables for both population and per-capita GNP. First, to create the lag of AI
, PCGNP
and LPOP
. To do this, we will make a little function.
function(dat, x, id, time){
tscslag <- apply(dat[, c(id, time)], 1, paste, collapse=".")
obs <- dat[[time]] - 1
tm1 <- apply(cbind(dat[[id]], tm1), 1, paste, collapse=".")
lagobs <- dat[match(lagobs, obs), x]
lagx <-
}for(i in 1:length(out$data)){
$data[[i]]$lagAI <- tscslag(out$data[[i]], "AI", "IDORIGIN", "YEAR")
out$data[[i]]$lagPCGNP <- tscslag(out$data[[i]], "PCGNP", "IDORIGIN", "YEAR")
out$data[[i]]$lagLPOP <- tscslag(out$data[[i]], "LPOP", "IDORIGIN", "YEAR")
out }
Now, we can use the lagged values of PCGNP
and LPOP
, to create percentage change variables:
for(i in 1:length(out$data)){
$data[[i]]$pctchgPCGNP <- with(out$data[[i]], c(PCGNP-lagPCGNP)/lagPCGNP)
out$data[[i]]$pctchgLPOP <- with(out$data[[i]], c(LPOP-lagLPOP)/lagLPOP)
out }
You can use the MIcombine
command from the mitools
package to generate inferences, too. Here, you have to produce a list of model estimates and the function will combine across the different results.
# initialize list
hd2amelia(out)
out <- list()
results <-# loop over imputed datasets
for(i in 1:length(out$imputations)){
lm(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
results[[i]] <- BRIT + POLRT + CWARCOW + IWARCOW2, data=out$imputations[[i]])
}summary(mitools::MIcombine(results))
#> Multiple imputation results:
#> MIcombine.default(results)
#> results se (lower upper) missInfo
#> (Intercept) 5.331437e-01 1.590137e-01 2.081119e-01 8.581754e-01 41 %
#> lagAI 4.631427e-01 1.865951e-02 4.256578e-01 5.006277e-01 31 %
#> pctchgPCGNP 5.988383e-03 3.892269e-03 -2.488344e-03 1.446511e-02 63 %
#> PCGNP -2.136767e-05 3.184249e-06 -2.772946e-05 -1.500588e-05 27 %
#> pctchgLPOP -6.311582e-01 1.425422e+00 -3.993216e+00 2.730900e+00 80 %
#> LPOP 7.487820e-02 8.753472e-03 5.759435e-02 9.216205e-02 17 %
#> MIL2 1.015658e-01 4.301407e-02 1.333586e-02 1.897957e-01 42 %
#> LEFT -1.628915e-01 5.421410e-02 -2.756770e-01 -5.010597e-02 48 %
#> BRIT -1.287129e-01 3.114049e-02 -1.898496e-01 -6.757634e-02 8 %
#> POLRT -7.204190e-02 9.919891e-03 -9.221483e-02 -5.186898e-02 38 %
#> CWARCOW 6.237819e-01 5.715956e-02 5.096580e-01 7.379058e-01 27 %
#> IWARCOW2 1.814699e-01 5.570487e-02 7.133305e-02 2.916067e-01 18 %
The final method for combining results is to convert the data object returned by the hot.deck
function to an object of class mids
. This can be done with the datalist2mids
function from the miceadds
package.
miceadds::datalist2mids(out$imputations)
out.mids <-#> Warning: Number of logged events: 1
summary(mice::pool(mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
s <-+ POLRT + CWARCOW + IWARCOW2, data=out.mids)))
BRIT #> Warning: Use with(imp, lm(yourmodel).
print(s, digits=4)
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 5.383e-01 2.046e-01 2.6310 10.390 2.439e-02
#> 2 lagAI 4.694e-01 2.087e-02 22.4870 22.843 0.000e+00
#> 3 pctchgPCGNP 1.182e-03 5.107e-03 0.2315 6.972 8.236e-01
#> 4 PCGNP -1.998e-05 3.511e-06 -5.6909 24.624 6.669e-06
#> 5 pctchgLPOP -4.829e-01 1.112e+00 -0.4341 9.233 6.742e-01
#> 6 LPOP 7.295e-02 1.083e-02 6.7359 19.749 1.593e-06
#> 7 MIL2 9.926e-02 3.987e-02 2.4894 50.175 1.615e-02
#> 8 LEFT -1.535e-01 4.934e-02 -3.1120 38.888 3.476e-03
#> 9 BRIT -1.216e-01 3.592e-02 -3.3850 43.506 1.518e-03
#> 10 POLRT -7.111e-02 1.088e-02 -6.5345 19.262 2.748e-06
#> 11 CWARCOW 6.165e-01 6.984e-02 8.8277 16.373 1.259e-07
#> 12 IWARCOW2 1.826e-01 5.224e-02 3.4947 971.761 4.960e-04
Poe, Steven, C. Neal Tate, and Linda Camp Keith. 1999. “Repression of the Human Right to Personal Integrity Revisited: A Global, Cross-National Study Covering the Years 1976-1993.” International Studies Quarterly 43: 291–313.