Using Multiple Hot Deck Data Sets for Inference

Skyler Cranmer, Jeff Gill, Natalie Jackson, Andreas Murr, David A. Armstrong II, Simon Heuberger

August 17, 2021

This document will walk you through some of the methods you could use to generate pooled model results that account for both sampling variability and across imputation variability. The package hot.deck does not come with a set of functions to do inference, so we will show you how you could use the data generated by hot.deck in combination with glm.mids (and similarly lm.mids) from the mice package, zelig from the Zelig package and by using MIcombine from the mitools package on a list of model objects.

1 Generating Imputations

The data we will use come from Poe, Tate, and Keith (1999) dealing with democracy and state repression. First we need to call the hot.deck routine on the dataset.

library(hot.deck)
data(isq99)
out <- hot.deck(isq99, sdCutoff=3, IDvars = c("IDORIGIN", "YEAR"))
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 52 observations with no observed data.  These observations were removed
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 45 of 4661 imputations with # donors < 5, consider increasing sdCutoff or using method='p.draw'

This shows us that there are still 45 observations with fewer than 5 donors. Using a different method or further widening the sdCutoff parameter may alleviate the problem. If you want to see the frequency distribution of the number of donors, you could look at:

numdonors <- sapply(out$donors, length)
numdonors <- sapply(out$donors, length)
numdonors <- ifelse(numdonors > 5, 6, numdonors)
numdonors <- factor(numdonors, levels=1:6, labels=c(1:5, ">5"))
table(numdonors)
#> numdonors
#>    1    2    3    4    5   >5 
#>   18   10   11    6   20 4596

Before running a model, three variables have to be created from those existing. Generally, if variables are deterministic functions of other variables (e.g., transformations, lags, etc…) it is advisable to impute the constituent variables of the calculations and then do the calculations after the fact. Here, we need to lag the AI variable and create percentage change variables for both population and per-capita GNP. First, to create the lag of AI, PCGNP and LPOP. To do this, we will make a little function.

tscslag <- function(dat, x, id, time){
  obs <- apply(dat[, c(id, time)], 1, paste, collapse=".")
  tm1 <- dat[[time]] - 1
  lagobs <- apply(cbind(dat[[id]], tm1), 1, paste, collapse=".")
  lagx <- dat[match(lagobs, obs), x]
}
for(i in 1:length(out$data)){
  out$data[[i]]$lagAI <- tscslag(out$data[[i]], "AI", "IDORIGIN", "YEAR")
  out$data[[i]]$lagPCGNP <- tscslag(out$data[[i]], "PCGNP", "IDORIGIN", "YEAR")
  out$data[[i]]$lagLPOP <- tscslag(out$data[[i]], "LPOP", "IDORIGIN", "YEAR")
}

Now, we can use the lagged values of PCGNP and LPOP, to create percentage change variables:

for(i in 1:length(out$data)){
  out$data[[i]]$pctchgPCGNP <- with(out$data[[i]], c(PCGNP-lagPCGNP)/lagPCGNP)
  out$data[[i]]$pctchgLPOP <- with(out$data[[i]], c(LPOP-lagLPOP)/lagLPOP)
}

1.1 Using MIcombine

You can use the MIcombine command from the mitools package to generate inferences, too. Here, you have to produce a list of model estimates and the function will combine across the different results.

# initialize list
out <- hd2amelia(out)
results <- list()
# loop over imputed datasets
for(i in 1:length(out$imputations)){
    results[[i]] <- lm(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
    BRIT + POLRT + CWARCOW + IWARCOW2, data=out$imputations[[i]])
}
summary(mitools::MIcombine(results))
#> Multiple imputation results:
#>       MIcombine.default(results)
#>                   results           se        (lower        upper) missInfo
#> (Intercept)  5.331437e-01 1.590137e-01  2.081119e-01  8.581754e-01     41 %
#> lagAI        4.631427e-01 1.865951e-02  4.256578e-01  5.006277e-01     31 %
#> pctchgPCGNP  5.988383e-03 3.892269e-03 -2.488344e-03  1.446511e-02     63 %
#> PCGNP       -2.136767e-05 3.184249e-06 -2.772946e-05 -1.500588e-05     27 %
#> pctchgLPOP  -6.311582e-01 1.425422e+00 -3.993216e+00  2.730900e+00     80 %
#> LPOP         7.487820e-02 8.753472e-03  5.759435e-02  9.216205e-02     17 %
#> MIL2         1.015658e-01 4.301407e-02  1.333586e-02  1.897957e-01     42 %
#> LEFT        -1.628915e-01 5.421410e-02 -2.756770e-01 -5.010597e-02     48 %
#> BRIT        -1.287129e-01 3.114049e-02 -1.898496e-01 -6.757634e-02      8 %
#> POLRT       -7.204190e-02 9.919891e-03 -9.221483e-02 -5.186898e-02     38 %
#> CWARCOW      6.237819e-01 5.715956e-02  5.096580e-01  7.379058e-01     27 %
#> IWARCOW2     1.814699e-01 5.570487e-02  7.133305e-02  2.916067e-01     18 %

1.2 Using mids

The final method for combining results is to convert the data object returned by the hot.deck function to an object of class mids. This can be done with the datalist2mids function from the miceadds package.

out.mids <- miceadds::datalist2mids(out$imputations)
#> Warning: Number of logged events: 1
s <- summary(mice::pool(mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
BRIT + POLRT + CWARCOW + IWARCOW2, data=out.mids)))
#> Warning: Use with(imp, lm(yourmodel).
print(s, digits=4)
#>           term   estimate std.error statistic      df   p.value
#> 1  (Intercept)  5.383e-01 2.046e-01    2.6310  10.390 2.439e-02
#> 2        lagAI  4.694e-01 2.087e-02   22.4870  22.843 0.000e+00
#> 3  pctchgPCGNP  1.182e-03 5.107e-03    0.2315   6.972 8.236e-01
#> 4        PCGNP -1.998e-05 3.511e-06   -5.6909  24.624 6.669e-06
#> 5   pctchgLPOP -4.829e-01 1.112e+00   -0.4341   9.233 6.742e-01
#> 6         LPOP  7.295e-02 1.083e-02    6.7359  19.749 1.593e-06
#> 7         MIL2  9.926e-02 3.987e-02    2.4894  50.175 1.615e-02
#> 8         LEFT -1.535e-01 4.934e-02   -3.1120  38.888 3.476e-03
#> 9         BRIT -1.216e-01 3.592e-02   -3.3850  43.506 1.518e-03
#> 10       POLRT -7.111e-02 1.088e-02   -6.5345  19.262 2.748e-06
#> 11     CWARCOW  6.165e-01 6.984e-02    8.8277  16.373 1.259e-07
#> 12    IWARCOW2  1.826e-01 5.224e-02    3.4947 971.761 4.960e-04

References

Poe, Steven, C. Neal Tate, and Linda Camp Keith. 1999. “Repression of the Human Right to Personal Integrity Revisited: A Global, Cross-National Study Covering the Years 1976-1993.” International Studies Quarterly 43: 291–313.