This vignette illustrates how to estimate a longitudinal IRT model following the methodology described in Proust-Lima et al. (2022, https://doi.org/10.1016/j.ymeth.2022.01.005). The model combines a graded response measurement model, which links a series of binary or ordinal items measured repeatedly over time to their underlying latent process, with a linear mixed model that describes the trajectory of this latent process over time.

Continuous-time longitudinal IRT model estimation

We consider here a longitudinal IRT model with natural cubic splines on the time spent on the waiting list, to account for a possibly nonlinear trajectory over time, and we adjust the trajectory for the group. We place 2 internal knots at 7 and 15, and shift the right boundary knot to 60 because of the long tail of the distribution. In the main analysis, we estimate a model with no differential item functioning (DIF) on the group and no response shift (RS) on time.
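To see what this spline specification produces, the following minimal sketch builds the basis with the splines package shipped with R. The time grid `tgrid` is a hypothetical example and is not taken from the study data.

```r
library(splines)

# Hypothetical grid of times (months), for illustration only
tgrid <- c(0, 3, 7, 15, 30, 60, 72)

# Natural cubic spline basis: 2 internal knots, boundary knots at 0 and 60
B <- ns(tgrid, knots = c(7, 15), Boundary.knots = c(0, 60))

dim(B)   # one row per time, 2 internal knots + 1 = 3 basis columns
B[1, ]   # the basis is zero at the left boundary knot (time 0)
```

With 3 basis columns, the trajectory uses 3 fixed effects per group contrast and 3 random effects on top of the random intercept. Times beyond the right boundary (such as 72 here) are extrapolated linearly.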

Estimation of the model with no DIF/RS

To reduce computation time, we first estimate a model with a single random effect, a random intercept. We then use its estimates as initial values for the model that also includes random effects on the time functions. Estimation involves a numerical integration approximated by quasi-Monte Carlo (QMC) with 1000 points. This makes the computations long and intensive, but the approach was shown to give accurate results in simulations.
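As a rough illustration of the idea behind quasi-Monte Carlo integration, the sketch below approximates a one-dimensional Gaussian expectation with a deterministic low-discrepancy sequence. It is not the integration routine used internally by lcmm: the helper `van_der_corput()` and the constant `a` are hypothetical and chosen only because the target has a closed form to compare with.

```r
# Radical-inverse (van der Corput) sequence in base 2: a simple 1D QMC point set
van_der_corput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    x <- 0
    f <- 1 / base
    while (i > 0) {
      x <- x + f * (i %% base)
      i <- i %/% base
      f <- f / base
    }
    x
  })
}

a <- 0.5                             # hypothetical constant
u <- qnorm(van_der_corput(1000))     # map the uniform points to N(0, 1) nodes
qmc_estimate <- mean(pnorm(a + u))   # QMC approximation of E[pnorm(a + U)]
exact <- pnorm(a / sqrt(2))          # closed form when U ~ N(0, 1)
c(qmc = qmc_estimate, exact = exact)
```

In the model above the integral is taken over the vector of random effects, so the same principle is applied in several dimensions.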

library(lcmm)
library(splines)

# step 1: model with a random intercept only
modIRT_i <- multlcmm(hads_2 + hads_4 + hads_6 + hads_8 + hads_10 + hads_12 + hads_14 ~ ns(time, knots = c(7,15), Boundary.knots = c(0,60)) * grp,
                     random = ~1, data = simdataHADS, subject = "ID",
                     link = "thresholds", methInteg = "QMC", nMC = 1000)
# use the estimates as initial values - the vector c(0,1,0,0,1,0,0,0,1) initializes the Cholesky matrix of the random effects
# (it is inserted after the 7 fixed effects)
Binit <- c(modIRT_i$best[1:7], c(0,1,0,0,1,0,0,0,1), modIRT_i$best[8:length(modIRT_i$best)])

# step 2: model with random effects on the intercept and on the time functions
modIRT <- multlcmm(hads_2 + hads_4 + hads_6 + hads_8 + hads_10 + hads_12 + hads_14 ~ ns(time, knots = c(7,15), Boundary.knots = c(0,60)) * grp,
                   random = ~1 + ns(time, knots = c(7,15), Boundary.knots = c(0,60)),
                   data = simdataHADS, subject = "ID",
                   link = "thresholds", methInteg = "QMC", nMC = 1000, B = Binit)
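The initial values given for the Cholesky matrix can be checked by hand. The sketch below assumes the column-wise upper-triangular layout used later in this vignette, with the first element (fixed at 1) prepended; the names `init` and `U` are introduced here for illustration only.

```r
# Initial Cholesky values passed in Binit (the first element, fixed at 1, is prepended)
init <- c(0, 1, 0, 0, 1, 0, 0, 0, 1)

U <- matrix(0, nrow = 4, ncol = 4)
U[upper.tri(U, diag = TRUE)] <- c(1, init)   # fill the upper triangle column by column

U            # identity matrix
t(U) %*% U   # implied starting variance-covariance matrix of the random effects
```

Under this layout the starting point corresponds to independent random effects with unit variance.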

The summary of the estimation:

summary(modIRT)
#> General latent class mixed model 
#>      fitted by maximum likelihood method 
#>  
#> multlcmm(fixed = hads_2 + hads_4 + hads_6 + hads_8 + hads_10 + 
#>     hads_12 + hads_14 ~ ns(time, knots = c(7, 15), Boundary.knots = c(0, 
#>     60)) * grp, random = ~1 + ns(time, knots = c(7, 15), Boundary.knots = c(0, 
#>     60)), subject = "ID", link = "thresholds", data = simdataHADS, 
#>     methInteg = "QMC", nMC = 1000, nproc = 20)
#>  
#> Statistical Model: 
#>      Dataset: simdataHADS 
#>      Number of subjects: 561 
#>      Number of observations: 7980 
#>      Number of latent classes: 1 
#>      Number of parameters: 44  
#>      Link functions: Thresholds for hads_2 
#>                      Thresholds for hads_4 
#>                      Thresholds for hads_6 
#>                      Thresholds for hads_8 
#>                      Thresholds for hads_10 
#>                      Thresholds for hads_12 
#>                      Thresholds for hads_14 
#>  
#> Iteration process: 
#>      Convergence criteria satisfied 
#>      Number of iterations:  23 
#>      Convergence criteria: parameters= 1.5e-07 
#>                          : likelihood= 2.9e-05 
#>                          : second derivatives= 6e-06 
#>  
#> Goodness-of-fit statistics: 
#>      maximum log-likelihood: -7218.07  
#>      AIC: 14524.13  
#>      BIC: 14714.64  
#>  
#> Maximum Likelihood Estimates: 
#>  
#> Fixed effects in the longitudinal model:
#> 
#>                               coef      Se   Wald p-value
#> intercept (not estimated)  0.00000                       
#> ns(...)1                  -0.12557 0.13637 -0.921 0.35718
#> ns(...)2                   0.21602 0.22892  0.944 0.34534
#> ns(...)3                   0.52819 0.17406  3.034 0.00241
#> grp                       -0.56649 0.15226 -3.721 0.00020
#> ns(...)1:grp               0.21044 0.20316  1.036 0.30027
#> ns(...)2:grp               0.46316 0.33009  1.403 0.16058
#> ns(...)3:grp              -0.00636 0.24646 -0.026 0.97942
#> 
#> 
#> Variance-covariance matrix of the random-effects:
#> (the variance of the first random effect is not estimated)
#>           intercept ns(...)1 ns(...)2 ns(...)3
#> intercept   1.00000                           
#> ns(...)1   -0.46121  0.84470                  
#> ns(...)2   -1.14421  1.12605  2.66212         
#> ns(...)3   -0.26878  0.58923  0.66262  0.46238
#> 
#>                            hads_2   hads_4   hads_6   hads_8  hads_10  hads_12
#> Residual standard error:  0.79159  0.64076  1.03750  1.00833  1.06157  0.68536
#>                           hads_14
#> Residual standard error:  1.63011
#> 
#> Parameters of the link functions:
#> 
#>                     coef      Se   Wald p-value
#> hads_2-Thresh1  -0.53483 0.11863 -4.508 0.00001
#> hads_2-Thresh2   1.13007 0.05254 21.509 0.00000
#> hads_2-Thresh3   0.89283 0.04974 17.949 0.00000
#> hads_4-Thresh1  -0.22854 0.10998 -2.078 0.03770
#> hads_4-Thresh2   1.01804 0.04766 21.361 0.00000
#> hads_4-Thresh3   1.04826 0.05791 18.102 0.00000
#> hads_6-Thresh1  -0.37848 0.11770 -3.216 0.00130
#> hads_6-Thresh2   1.35627 0.06718 20.188 0.00000
#> hads_6-Thresh3   1.32585 0.09338 14.198 0.00000
#> hads_8-Thresh1  -1.37992 0.16595 -8.315 0.00000
#> hads_8-Thresh2   1.34364 0.06319 21.262 0.00000
#> hads_8-Thresh3   1.10211 0.05741 19.196 0.00000
#> hads_10-Thresh1 -0.01232 0.11044 -0.112 0.91115
#> hads_10-Thresh2  1.00872 0.05355 18.839 0.00000
#> hads_10-Thresh3  1.10947 0.06459 17.176 0.00000
#> hads_12-Thresh1 -0.19444 0.11018 -1.765 0.07760
#> hads_12-Thresh2  0.99923 0.04764 20.973 0.00000
#> hads_12-Thresh3  1.02758 0.05720 17.966 0.00000
#> hads_14-Thresh1  0.92540 0.17048  5.428 0.00000
#> hads_14-Thresh2  1.49075 0.09725 15.330 0.00000
#> hads_14-Thresh3  0.87089 0.10087  8.634 0.00000

Predicted underlying depressive symptomatology trajectory

plot of predicted mean trajectories

The predicted trajectory of the underlying process can be obtained with the predictL function and plotted with the associated plot function.

datnew <- data.frame(time = seq(0,75,by=1))
# predictions for grp = 0
datnew$grp <- 0
pIRT0 <- predictL(modIRT,datnew,var.time="time",confint = TRUE)
# predictions for grp = 1
datnew$grp <- 1
pIRT1 <- predictL(modIRT,datnew,var.time="time",confint = TRUE)
plot(pIRT0,col=1,lwd=2,ylim=c(-1.5,1.5),legend=NULL,main="",
     ylab="latent depressive symptomatology",xlab="months since entry on the waiting list",
     type="l",bty="l",shades=TRUE)
plot(pIRT1,add=TRUE,col=4,lwd=2,shades=TRUE)
legend(x="topleft",legend=c("dialysed","preemptive"),lty=c(1,1),col=c(1,4),lwd=2,bty="n")
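The point predictions can also be reproduced by hand from the fixed effects, which may help to read the coefficients. This is a minimal sketch that copies the estimates from the summary output above; the object names are hypothetical and no confidence bands are computed.

```r
library(splines)

# Fixed-effect estimates copied from the summary output
b_time <- c(-0.12557, 0.21602, 0.52819)   # ns(...)1 to ns(...)3
b_grp  <- -0.56649                         # grp
b_int  <- c(0.21044, 0.46316, -0.00636)   # ns(...)1:grp to ns(...)3:grp

time <- seq(0, 75, by = 1)
S <- ns(time, knots = c(7, 15), Boundary.knots = c(0, 60))

mean_grp0 <- as.vector(S %*% b_time)                     # intercept fixed at 0
mean_grp1 <- as.vector(b_grp + S %*% (b_time + b_int))   # group effect plus interactions

head(cbind(time, mean_grp0, mean_grp1))
```

At time 0 the spline basis is zero, so the two groups differ exactly by the `grp` coefficient.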

Posterior distribution

To better appreciate the range of the underlying depressive symptomatology, the empirical posterior distribution can be computed by simulation:

library(splines)
library(mvtnorm)
library(ggplot2)
beta <- modIRT$best
t <- 0:72
Z <- cbind(rep(1,length(t)),ns(t,knots=c(7,15),Boundary.knots = c(0,60)))
# Cholesky matrix of the random effects: the first element is fixed at 1,
# the 9 other elements follow the 7 fixed effects in the vector of estimates
chol <- matrix(0,ncol=4,nrow=4)
chol[upper.tri(chol, diag = TRUE)] <- c(1,beta[8:16])
V <- Z%*%t(chol)%*%chol%*%t(Z)
# draws of the latent process in each group (grp = 0, then grp = 1)
Lambda0 <- rmvnorm(10000,mean = Z%*%c(0,beta[1:3]),sigma = V)
Lambda1 <- rmvnorm(10000,mean = Z%*%(c(0,beta[1:3])+beta[4:7]),sigma = V)
Lambda <- data.frame(Lambda = as.vector(rbind(Lambda0,Lambda1)))
phist <- ggplot(Lambda,aes(x=Lambda))+ geom_density(color="grey", fill="grey") + theme_bw() +
  xlab("underlying depressive symptomatology") +xlim(-7,7)
phist
Warning: Removed 32004 rows containing non-finite values (stat_density).

The 95% range of the underlying distribution is approximately:

quantile(Lambda$Lambda,probs=c(0.025,0.975))
#>      2.5%     97.5% 
#> -5.459380  5.533434
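The spread of the latent process at a given time follows from the random-effect structure as Z B Z', where B is the variance-covariance matrix of the random effects. The sketch below computes this marginal variance directly from the matrix printed in the summary output; the times are hypothetical and the object names are introduced for illustration.

```r
library(splines)

# Variance-covariance matrix of the random effects, copied from the summary output
B <- matrix(c( 1.00000, -0.46121, -1.14421, -0.26878,
              -0.46121,  0.84470,  1.12605,  0.58923,
              -1.14421,  1.12605,  2.66212,  0.66262,
              -0.26878,  0.58923,  0.66262,  0.46238),
            nrow = 4, byrow = TRUE)

time <- c(0, 6, 12, 24, 48)   # hypothetical times (months), for illustration
Z <- cbind(1, ns(time, knots = c(7, 15), Boundary.knots = c(0, 60)))

# Marginal variance of the latent process at each time: diag(Z B Z')
v <- rowSums((Z %*% B) * Z)
data.frame(time = time, variance = v, sd = sqrt(v))
```

At time 0 only the random intercept contributes, so the variance equals the fixed value 1.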

Location and discrimination of the items (with SE by Delta-Method)

The location and discrimination parameters are transformations of the estimated parameters. They are retrieved in two steps.

First, isolate the corresponding parameters within the vector of estimated parameters (and the corresponding variance-covariance matrix):

## Parameters
nlevel <- 4
nitems <- 7
levels <- rep(nlevel,nitems)
npm <- length(modIRT$best)
seuils <- modIRT$best[(npm-(nlevel-1)*(nitems)+1):(npm)]
err <- abs(modIRT$best[(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems))])
seuils
#>     Thresh1     Thresh2     Thresh3     Thresh1     Thresh2     Thresh3 
#> -0.53482702  1.13007471  0.89283211 -0.22854149  1.01804259  1.04826421 
#>     Thresh1     Thresh2     Thresh3     Thresh1     Thresh2     Thresh3 
#> -0.37847963  1.35626552  1.32584882 -1.37992176  1.34363560  1.10210680 
#>     Thresh1     Thresh2     Thresh3     Thresh1     Thresh2     Thresh3 
#> -0.01232312  1.00871824  1.10947092 -0.19444285  0.99923061  1.02758213 
#>     Thresh1     Thresh2     Thresh3 
#>  0.92539951  1.49075284  0.87088592
err
#> std.err 1 std.err 2 std.err 3 std.err 4 std.err 5 std.err 6 std.err 7 
#> 0.7915877 0.6407606 1.0374977 1.0083297 1.0615722 0.6853637 1.6301109
# Variance
Vseuils <- VarCov(modIRT)[(npm-(nlevel-1)*(nitems)+1):(npm),(npm-(nlevel-1)*(nitems)+1):(npm)]
Verr <- VarCov(modIRT)[(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems)),(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems))]

Then back-transform these parameters to obtain the locations and the discrimination of each item, with the corresponding standard errors computed by the Delta-method:

# generic function
location <- function(min,max,param,Vparam){
  loc <- param[1]
  se <- sqrt(Vparam[1,1])
  param2 <- rep(0,length(param))
  param2[1] <- 1
  if ((max-min)>1) {
    for (l in 1:(max-min-1)) {
      param2[l+1] <- 2*param[l+1]
      loc[l+1] <- loc[l] + param[1+l]^2
      se[l+1] <- sqrt(t(param2) %*%Vparam %*%param2)
    }
  }
  return(c(loc,se))
}
# application
ItemLoc <- sapply(1:nitems,function(k){location(min=0,max=nlevel-1,param=seuils[((nlevel-1)*(k-1)+1):((nlevel-1)*k)],Vparam=Vseuils[((nlevel-1)*(k-1)+1):((nlevel-1)*k),((nlevel-1)*(k-1)+1):((nlevel-1)*k)])})
colnames(ItemLoc) <- paste("Item",(1:nitems)*2)
ItemLoc <- ItemLoc[c(1,4,2,5,3,6),]
rownames(ItemLoc) <- rep(c("Threshold","SE"),nlevel-1)
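# discrimination = inverse of the residual standard error
discrimination <- 1/abs(err)
# variance-covariance matrix of the discriminations (Delta-method);
# the standard errors are the square roots of its diagonal
sediscr <- diag(err^(-2))%*%Verr%*%diag(err^(-2))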

The 3 thresholds and discrimination estimates of each item are:

t(rbind(ItemLoc,discrimination,Se=sqrt(diag(sediscr))))
#>           Threshold        SE Threshold        SE Threshold        SE
#> Item 2  -0.53482702 0.1186315 0.7422418 0.1288603  1.539391 0.1787193
#> Item 4  -0.22854149 0.1099764 0.8078692 0.1294450  1.906727 0.2056184
#> Item 6  -0.37847963 0.1177005 1.4609765 0.1831477  3.218852 0.3678817
#> Item 8  -1.37992176 0.1659480 0.4254349 0.1201425  1.640074 0.1930347
#> Item 10 -0.01232312 0.1104402 1.0051894 0.1503464  2.236115 0.2509225
#> Item 12 -0.19444285 0.1101780 0.8040190 0.1314460  1.859944 0.2049276
#> Item 14  0.92539951 0.1704771 3.1477435 0.4110930  3.906186 0.5132664
#>         discrimination         Se
#> Item 2       1.2632839 0.12076274
#> Item 4       1.5606453 0.15190436
#> Item 6       0.9638575 0.09865421
#> Item 8       0.9917392 0.09708010
#> Item 10      0.9419990 0.09708166
#> Item 12      1.4590793 0.14172215
#> Item 14      0.6134552 0.07734546
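As a check of the transformation, the values for item hads_2 can be recomputed by hand from the raw estimates printed above. This is a minimal sketch with hypothetical object names; the Jacobian `J` is the matrix that the Delta-method would combine with the variance-covariance matrix of the raw parameters, which is not reproduced here.

```r
# Estimates for item hads_2, copied from the outputs above
raw   <- c(-0.53482702, 1.13007471, 0.89283211)   # Thresh1, Thresh2, Thresh3
sigma <- 0.7915877                                 # residual standard error

# The first threshold is used as is; the next ones add squared increments
loc <- cumsum(c(raw[1], raw[-1]^2))

# Jacobian of the locations with respect to the raw parameters,
# as needed for the Delta-method: Var(loc) is approximated by J %*% V %*% t(J)
J <- rbind(c(1, 0,          0),
           c(1, 2 * raw[2], 0),
           c(1, 2 * raw[2], 2 * raw[3]))

discrimination <- 1 / sigma
round(loc, 4)
round(discrimination, 4)
```

The squared increments guarantee that the three locations are ordered, whatever the sign of the raw parameters.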

Item category probability curves

The probability curve of each item category according to the underlying level of depressive symptomatology can be obtained using the ItemInfo function.

## computations
info_modIRT <- ItemInfo(modIRT, lprocess=seq(-6,6,0.1))

meaning <- c("Enjoy","Laugh","Cheerful" ,"Slow" ,"Appearance" ,"Looking Forward" ,"Leisure")
items <- paste("hads", seq(2,14,2), sep="_")

## automatic graph
par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0))
for(k in 1:7)
{     
 plot(info_modIRT, which="LevelProb", outcome=items[k],
      main=paste("Item",2*k,"-",meaning[k]), lwd=2, legend=NULL)
}
plot(0,axes=FALSE, xlab="", ylab="", col="white")
legend("center", legend=paste("Level",0:3), col=1:4, lwd=2, box.lty=0)

## graph with ggplot
library(ggplot2)
library(ggpubr)     # get_legend() and as_ggplot()
library(gridExtra)  # grid.arrange()
p <- list()
for (k in 1:7){
  # probabilities of each level for item k
  ICC <- info_modIRT$LevelInfo[which(info_modIRT$LevelInfo[,1]==items[k]),]
  p[[k]] <- (ggplot(ICC)
        + geom_line(aes(x = Lprocess, y = Prob, group = Level,color=Level), show.legend = FALSE,alpha = 1,size=1.2)
        + theme_bw()
        + labs(title=paste("Item",2*k,"-",meaning[k]))
        + xlab("construct")
        + ylim(0,1)
        + ylab("Probability of item level"))
}
# additional plot built only to extract a common legend
p[[8]] <- (ggplot(ICC)
      + geom_line(aes(x = Lprocess, y = Prob, group = Level,color=Level),alpha = 1,size=1.2)
      + theme_bw()
)
legend <- get_legend(p[[8]])
grid.arrange(p[[1]],p[[2]],p[[3]],p[[4]],p[[5]],p[[6]],p[[7]],as_ggplot(legend),ncol=4)
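The category probabilities plotted in this section can be approximated by hand from the item locations and residual standard error. The sketch below assumes a cumulative probit form, P(Y <= k | Lambda) = pnorm((eta_k - Lambda) / sigma), and uses the values reported above for item hads_2; the function `level_prob()` is hypothetical and is not part of lcmm.

```r
# Locations and residual standard error of item hads_2 (from the tables above)
eta   <- c(-0.53482702, 0.7422418, 1.539391)
sigma <- 0.7915877

# Assumed cumulative probit form: P(Y <= k | Lambda) = pnorm((eta_k - Lambda) / sigma)
level_prob <- function(lambda, eta, sigma) {
  cum <- outer(lambda, eta, function(l, e) pnorm((e - l) / sigma))
  cum <- cbind(0, cum, 1)                                   # add P(Y <= -1) = 0 and P(Y <= max) = 1
  probs <- cum[, -1, drop = FALSE] - cum[, -ncol(cum), drop = FALSE]
  colnames(probs) <- paste("Level", 0:length(eta))
  probs
}

lambda <- seq(-6, 6, by = 0.1)
probs <- level_prob(lambda, eta, sigma)
round(probs[lambda %in% c(-2, 0, 2), ], 3)
```

Each row gives the probabilities of the 4 levels at one value of the latent process, and the rows sum to 1.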

Item characteristic curves

The following code computes the expectation of each item according to the underlying level of depressive symptomatology. This is achieved with the predictYcond function, and the result can be plotted in two ways, with the direct plot function or with ggplot:

expe <- predictYcond(modIRT,lprocess = seq(-6,6,by=0.1))
# via the internal plot function 
plot(expe, xlab="underlying depressive symptomatology", main="Item Expectation Curves",
     legend=paste("Item",(1:nitems)*2), lwd=2)

# via ggplot
j <- table(expe$pred$Yname)[1]
expe$pred$item <- as.factor(c(rep(2,j),rep(4,j),rep(6,j),rep(8,j),rep(10,j),rep(12,j),rep(14,j)))
p <- (ggplot(expe$pred)
      + geom_line(aes(x = Lprocess, y = Ypred, group=item,color=item),alpha = 1,size=1.2)
      + theme_bw()
      + xlab("underlying depressive symptomatology")
      + ylim(0,3)
      + ylab("Item Expectation"))
p
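For a single item, this expectation can be sketched by hand as the sum of the probabilities of exceeding each threshold. The code assumes the same cumulative probit form as the measurement model described in this vignette and reuses the values reported above for item hads_2; `expected_item()` is a hypothetical helper, not an lcmm function.

```r
# Locations and residual standard error of item hads_2 (from the tables above)
eta   <- c(-0.53482702, 0.7422418, 1.539391)
sigma <- 0.7915877

# E[Y | Lambda] = sum over k of P(Y >= k | Lambda), for levels coded 0 to 3
expected_item <- function(lambda, eta, sigma) {
  rowSums(outer(lambda, eta, function(l, e) 1 - pnorm((e - l) / sigma)))
}

lambda <- seq(-6, 6, by = 0.1)
ey <- expected_item(lambda, eta, sigma)
range(ey)
```

The expectation increases with the latent level and stays between 0 and 3, the lowest and highest item levels.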

Item Information Function

The level of information brought by each item category (information share) and brought in total by each item is also computed by the ItemInfo function. The curves can again be plotted directly with options which="LevelInfo" and which="ItemInfo" respectively.

by Category

par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0))
for(k in 1:7)
{     
 plot(info_modIRT, which="LevelInfo", outcome=items[k],
 main=paste("Item",2*k,"-",meaning[k]), lwd=2, legend=NULL, ylim=c(0,1.3))
}
plot(0,axes=FALSE, xlab="", ylab="", col="white")
legend("center", legend=paste("Level",0:3), col=1:4, lwd=2, box.lty=0)

by Item

plot(info_modIRT, which="ItemInfo", lwd=2, legend.loc="topleft")
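To give an idea of what an item information curve measures, the sketch below computes the standard Fisher information of an ordinal item under a cumulative probit form, using the values reported above for item hads_2. Whether ItemInfo uses exactly this parametrisation is an assumption not verified here, and `item_information()` is a hypothetical helper.

```r
# Locations and residual standard error of item hads_2 (from the tables above)
eta   <- c(-0.53482702, 0.7422418, 1.539391)
sigma <- 0.7915877

item_information <- function(lambda, eta, sigma) {
  z    <- outer(lambda, eta, function(l, e) (e - l) / sigma)
  cum  <- cbind(0, pnorm(z), 1)             # cumulative probabilities
  dcum <- cbind(0, -dnorm(z) / sigma, 0)    # their derivatives with respect to lambda
  p  <- cum[, -1] - cum[, -ncol(cum)]       # category probabilities
  dp <- dcum[, -1] - dcum[, -ncol(dcum)]    # derivatives of the category probabilities
  rowSums(dp^2 / p)                         # Fisher information of the item
}

# Grid restricted to a range where all category probabilities are numerically positive
lambda <- seq(-3, 4, by = 0.1)
info <- item_information(lambda, eta, sigma)
lambda[which.max(info)]
```

The information is highest where the latent level lies among the item thresholds and decreases towards the extremes.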

Predicted item trajectory according to time

Predicted item trajectories for a specific profile of covariates can be computed using the predictY function:

head(datnew)
datnew$grp <- 0
ns0 <- predictY(modIRT,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T)
datnew$grp <- 1
ns1 <- predictY(modIRT,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T)

par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0))
for(k in 1:7){
plot(ns0,outcome = k,shades = T,ylim=c(0,3),bty="l",legend=NULL,main=paste("Item",2*k,"-",meaning[k]),ylab="Item level",xlab="months on the waiting list")
plot(ns1,outcome=k,shades=T,add=T,col=2)
}
