Deady (2004) studied naive mock-jurors’ responses to two types of vertic setups. One has two options only (guilt vs. acquittal) and the other has an additional option “not proven”. The researcher also investigated the influence of conflicting testimonial evidence on judgments. The study had 2 (two vs. three-option verdict) by 2 (conflict vs. no-conflict conditions) factorial design. The dependent variable was the juror’s degree of confidence in her or his verdict, expressed as a percentage rating (0–100). The data included the responses from 104 first-year psychology students at The Australian National University.
The data included three variables.
crc99
: The ratings of confidence levels with rescaling into the (0, 1) interval to avoid 1 and 0 values.vert
: was the dummy variable for coding the conditions of verdict types, whereas
verdict = -1
indicates responses from the two-option verdict condition andverdict = 1
indicates responses from the three-option verdict condition.confl
: was the dummy variable for coding the conflict conditions, whereas
confl = -1
indicates the no-conflict condition andconfl = 1
indicates the conflict condition.library(cdfquantreg)
data(cdfqrExampleData)
#Quick overview of the variables
rbind(head(JurorData,4),tail(JurorData,4))
## vert confl crc99
## 1 -1 -1 0.500
## 2 -1 -1 0.698
## 3 -1 -1 0.797
## 4 -1 -1 0.698
## 101 1 1 0.946
## 102 1 1 0.698
## 103 1 1 0.995
## 104 1 1 0.847
The hypothesis was that the conflicting evidence would lower juror confidence and that the three-option condition would increase it.
That is, it was expected that there would be significant interaction effect between vert
and confl
when predicting crc99
. To test the hypothesis, we fit a a model in cdfquantreg by using the T2-T2 distribution as a demonstration.
# We use T2-T2 distribution
<- "t2" # The parent distribution
fd <- "t2" # The child distribution
sd
# Fit the null model
<- cdfquantreg(crc99 ~ 1 | 1, fd, sd, data = JurorData)
fit_null
# Fit a main effect model
<- cdfquantreg(crc99 ~ vert + confl | 1, fd, sd, data = JurorData)
fit1
# Fit the full model
<- cdfquantreg(crc99 ~ vert*confl | 1, fd, sd, data = JurorData)
fit2
anova(fit1,fit2)
## Likelihood ratio tests
##
## Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1 100 -53.625
## 2 99 -54.527 1 0.90193 0.3423
# Obtain the statistics for the null model
summary(fit2)
## Family: t2 t2
## Call: cdfquantreg(formula = crc99 ~ vert * confl | 1, data = JurorData,
## fd = fd, sd = sd)
##
## Mu coefficients (Location submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.76694 0.09919 7.732 1.07e-14 ***
## vert 0.09942 0.09778 1.017 0.309
## confl 0.07937 0.09778 0.812 0.417
## vert:confl 0.09335 0.09778 0.955 0.340
##
## Sigma coefficients (Dispersion submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.2258 0.1209 -1.868 0.0618 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Converge: successful completion
## Log-Likelihood: 27.2635
##
## Gradient: 3e-04 0 -3e-04 0 -3e-04
The results were similar to the ones outlined in Smithson & Verkuilen (2006). There was no significant interaction between vert
and confl
in the location submodel.
Next, we replicate the steps in Smithson & Verkuilen by fitting models with the dispersion submodel, expanding the formula to crc99 ~ vert*confl |vert + confl
and crc99 ~ vert*confl |vert*confl
.
# Fit a main effect model
<- cdfquantreg(crc99 ~ vert*confl |vert + confl, fd, sd, data = JurorData)
fit3
# Fit the full model
<- cdfquantreg(crc99 ~ vert*confl |vert*confl, fd, sd, data = JurorData)
fit4
anova(fit2, fit3, fit4)
## Likelihood ratio tests
##
## Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1 99 -54.527
## 2 97 -60.958 2 6.4308 0.04014 *
## 3 96 -61.746 1 0.7878 0.37475
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Obtain the statistics for the null model
summary(fit4)
## Family: t2 t2
## Call:
## cdfquantreg(formula = crc99 ~ vert * confl | vert * confl, data = JurorData,
## fd = fd, sd = sd)
##
## Mu coefficients (Location submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.77441 0.10998 7.041 1.91e-12 ***
## vert 0.14261 0.10998 1.297 0.195
## confl 0.07258 0.10998 0.660 0.509
## vert:confl 0.08055 0.10998 0.732 0.464
##
## Sigma coefficients (Dispersion submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.20126 0.12122 -1.660 0.0969 .
## vert 0.30779 0.12122 2.539 0.0111 *
## confl -0.03512 0.12122 -0.290 0.7720
## vert:confl -0.10766 0.12122 -0.888 0.3745
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Converge: successful completion
## Log-Likelihood: 30.8728
##
## Gradient: 4e-04 -5e-04 -1e-04 0 -1e-04 -1e-04 -1e-04 -1e-04
As shown in the results, the number of options had significant effect on the dispersion of the distribution. While whether the evidence is conflict or not does not influence the dispersion. There was no significant interaction between the two predictors.
The following figures shows how well the model fit the actual response distributions.
# Compare the empirical distribution and the fitted values distribution
<- seq(0,1,length.out =11)
breaks
plot(fit4,xlim = c(0.1,1),ylim = c(0,3), breaks = breaks)
We can check the fitted values as well as model residuals. Currently three types of residuals are available: raw residuals, Pearson’s residuals and deviance residuals.
par(mfrow=c(2,2),mar = c(2,3,2,2))
# Plot the fitted values
plot(fitted(fit4, "full"), main = "Fitted Values")
# Check Residuals
plot(residuals(fit4, "raw"), main = "Raw Residuals")
plot(residuals(fit4, "pearson"), main = "Pearson Residuals")
plot(residuals(fit4, "deviance"), main = "Deviance Residuals")
A second example is a data from a study that investigates the relationship between stress and anxiety. The data includes 166 nonclinical women in Townsville, Queensland, Australia. The variables are linearly transformed scales from the Depression Anxiety Stress Scales, which normally range from 0 to 42.
The scores of the two continuous variables Anxiety
and Stress
were firstly linearly transformed to the (0, 1) interval. The figure below shows kernel density estimates for the two variables. As one would expect in a nonclinical population, there is an active floor for each variable, with this being most pronounced for anxiety. It should be clear that anxiety is strongly skewed.
head(AnxStrData, 8)
## Anxiety Stress
## 1 0.01 0.01
## 2 0.17 0.29
## 3 0.01 0.17
## 4 0.05 0.41
## 5 0.09 0.21
## 6 0.41 0.45
## 7 0.05 0.21
## 8 0.01 0.01
plot(density(AnxStrData$Anxiety), main = "Anxiety and Stress")
lines(density(AnxStrData$Stress), lty = 2)
Heteroscedasticity is to be expected, as theoretically it is likely that someone experiencing anxiety will also experience a great deal of stress, but the converse is not true, because many people might experience stress without being anxious. That is, stress could be thought of as a necessary but not sufficient condition for anxiety. If so, the relationship between the variables is not one-to-one plus noise, as predicted by ordinary homoscedastic regression, but instead is many-to-one plus noise, which involves heteroscedasticity.
We can fit the cdf model the same way as we did for Example data set 1.
# Fit the null model
<- cdfquantreg(Anxiety ~ 1 | 1, fd, sd, data = AnxStrData)
fit_null
# Fit the location model
<- cdfquantreg(Anxiety ~ Stress | 1, fd, sd, data = AnxStrData)
fit1
# Fit the full model
<- cdfquantreg(Anxiety ~ Stress | Stress, fd, sd, data = AnxStrData)
fit2
anova(fit_null,fit1, fit2)
## Likelihood ratio tests
##
## Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1 164 -496.90
## 2 163 -615.08 1 118.177 <2e-16 ***
## 3 162 -617.03 1 1.949 0.1627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit2)
## Family: t2 t2
## Call: cdfquantreg(formula = Anxiety ~ Stress | Stress, data = AnxStrData,
## fd = fd, sd = sd)
##
## Mu coefficients (Location submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.3397 0.2145 -34.22 <2e-16 ***
## Stress 10.7807 0.7845 13.74 <2e-16 ***
##
## Sigma coefficients (Dispersion submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2673 0.1742 1.535 0.125
## Stress 0.7715 0.5517 1.398 0.162
##
## Converge: successful completion
## Log-Likelihood: 308.5143
##
## Gradient: 0 0 1e-04 0
It is clear that Stress
influenced both the mean and the dispersion of scores on anxiety
.
# Compare the empirical distribution and the fitted values distribution
plot(fit2)
# Plot the fitted values
plot(fitted(fit2, "full"))
# Check Residuals
plot(residuals(fit2, "raw"))