# Define a function for formatting numbers
<- function(x, d = 2) format(x, digits = d, big.mark = ",") comma
Interrater Reliability (Confidence Intervals)
library(lme4)
Warning in check_dep_version(): ABI version mismatch:
lme4 was built with Matrix ABI version 2
Current Matrix ABI version is 1
Please re-install lme4 from source or restore original 'Matrix' package
library(tidyverse)
<- read.csv("https://osf.io/download/p9gqk/")
slp_dat <- slp_dat |>
slp_vas_wide select(slpID, Speaker, slp_VAS) |>
pivot_wider(names_from = slpID, values_from = slp_VAS)
head(slp_vas_wide)
Derivation
Each score consists of grand mean + ratee effect + error. For \(i = 1, \ldots, k\) and \(j = 1, \ldots, N\),
\[ Y_{ij} = \beta_j + e_{ij} \]
with \(\beta_j \sim N(\mu, \sigma_\text{ratee}^2)\) and \(e_{ij} \sim N(0, \sigma_E^2)\)
With \(N\) ratees, the mean rating for ratee \(j\) is
\[ \bar Y_{.j} = \frac{\sum_i Y_{ij}}{k} = \beta_j + \frac{\sum_i e_{ij}}{k} \]
The grand mean is
\[ \bar Y_{..} = \frac{\sum_j \sum_i Y_{ij}}{kN} = \frac{\sum_j \beta_j}{N} + \frac{\sum_j \sum_i e_{ij}}{kN} \]
The sum of squares for the ratee effect is
\[ \begin{aligned} \mathit{SS}_\text{ratee}& = \sum_j k (\bar Y_{.j} - \bar Y_{..})^2 \\ & = \sum_j k \left(\beta_j + \frac{\sum_i e_{ij}}{k} - \frac{\sum_j \beta_j}{N} - \frac{\sum_j \sum_i e_{ij}}{kN}\right)^2 \\ & = \sum_j k [(\beta_j - \bar \beta) + (\bar e_{.j} - \bar e_{..})]^2 \end{aligned} \]
Taking the expected value, we have
\[ \begin{aligned} E[\sum_j (\beta_j - \bar \beta)^2] & = E[\sum_j (\beta_j - \mu + \mu - \bar \beta)^2] \\ & = E[\sum_j (\beta_j - \mu)^2] - 2 E[(\mu - \bar \beta) \sum_j (\beta_j - \mu)] + E[\sum_j (\mu - \bar \beta)^2] \\ & = N \sigma_\text{ratee}^2 - 2 E[N (\mu - \bar \beta)^2] + E[N (\bar \beta - \mu)^2] \\ & = N \sigma_\text{ratee}^2 - N \sigma_\text{ratee}^2 / N \\ & = (N - 1) \sigma_\text{ratee}^2 \end{aligned} \]
Similarly,
\[ E[\sum_j (\bar e_{.j} - \bar e_{..})^2] = (N - 1) \sigma_E^2 / k \]
and given \(\beta_j\) and \(e_{ij}\) are independent, we have
\[ E(\mathit{SS}_\text{ratee}) = k (N - 1) \sigma_\text{ratee}^2 + (N - 1) \sigma_E^2. \]
The mean square is the sum of squares divided by the degrees of freedom, which is \(N - 1\) for the rater effect. Therefore, the expected mean square is
\[ E(\mathit{MS}_\text{ratee}) = k \sigma_\text{ratee}^2 + \sigma_E^2. \]
For sum of squares error, we have
\[ \begin{aligned} \mathit{SS}_E& = \sum_j \sum_i k (Y_{ij} - \bar Y_{.j})^2 \\ & = \sum_j \sum_i \left(\beta_j + \frac{\sum_i e_{ij}}{k} - \beta_j - e_{ij}\right)^2 \\ & = \sum_j \sum_i (e_{ij} - \bar e_{.})^2 \end{aligned} \]
And the expected sum of squares (and mean squares) error can be shown as
\[ \begin{aligned} E(\mathit{SS}_E) & = N (k - 1) \sigma_E^2 \\ E(\mathit{MS}_E) & = \sigma_E^2 \end{aligned} \]
Therefore, using method of moments, the estimators for \(\sigma_E^2\) and \(\sigma_\text{ratee}^2\) are
\[ \begin{aligned} \tilde \sigma_E^2 & = \mathit{MS}_E\\ \tilde \sigma_\text{ratee}^2 & = (\mathit{MS}_\text{ratee}- \mathit{MS}_E) / k \\ \end{aligned} \]
Using the above, you can obtain the ICC formulas for “one-way random” of Table 9.5 of your text.
Confidence intervals
From standard ANOVA results Cochran’s theorem, with a balanced design, one gets
\[ \begin{aligned} \mathit{SS}_E& \sim \sigma_E^2 \chi^2_{N (k - 1)} \\ \mathit{SS}_\text{ratee}& \sim (k\sigma_\text{ratee}^2 + \sigma_E^2) \chi^2_{N - 1} \end{aligned} \]
and the two sum of squares are independently distributed, and
\[ \frac{\sigma_E^2}{k\sigma_\text{ratee}^2 + \sigma_E^2}\frac{\mathit{MS}_\text{ratee}}{\mathit{MS}_E} \sim F_{N - 1, N (k - 1)} \]
Note that the distribution of \(\frac{\sigma_E^2}{k\sigma_\text{ratee}^2 + \sigma_E^2}\frac{\mathit{MS}_\text{ratee}}{\mathit{MS}_E}\) does not depend on the parameters, \(\sigma_\text{ratee}^2\) and \(\sigma_E^2\). Therefore, we can invert a two-tailed \(F\) test to consider values that are “non-significant” to obtain a confidence interval. Let \(F_{\frac{\alpha}{2}, N - 1, N(k - 1)}\) and \(F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}\) be the \(\alpha / 2\) and \(1 - \alpha / 2\) quantiles of a central \(F\) distribution with df1 = \(N - 1\) and df2 = \(N (k - 1)\), and \(F_0 = \mathit{MS}_\text{ratee}/ \mathit{MS}_E\), then because ICC(1, \(k\)) for average rating is
\[ \begin{aligned} \frac{\sigma_\text{ratee}^2}{\sigma_\text{ratee}^2 + \sigma_E^2 / k} & = \frac{k \sigma_\text{ratee}^2}{k \sigma_\text{ratee}^2 + \sigma_E^2} \\ & = 1 - \frac{\sigma_E^2}{k \sigma_\text{ratee}^2 + \sigma_E^2}, \end{aligned} \]
We have
\[ \frac{\sigma_E^2}{k\sigma_\text{ratee}^2 + \sigma_E^2}\frac{\mathit{MS}_\text{ratee}}{\mathit{MS}_E} = [1 - \rho(1, k)]F_0 \sim F_{N - 1, N (k - 1)} \]
Define \(F = [1 - \rho(1, k)]F_0\), which follows an \(F\) distribution. Because \(\rho(F) = 1 - F / F_0\) is a monotonic decreasing function with known \(F_0\), we can obtain 95% CI by transforming \(F_{\frac{\alpha}{2}, N - 1, N(k - 1)}\) and \(F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}\), so an analytic 95% CI for ICC(1, \(k\)) is
\[ \left\{1 - \frac{F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}}{F_0} \leq \rho(1, k) \leq 1 - \frac{F_{\frac{\alpha}{2}, N - 1, N(k - 1)}}{F_0}\right\} \]
Similarly, for ICC(1, 1) = \(\sigma_\text{ratee}^2 / (\sigma_\text{ratee}^2 + \sigma_E^2)\), as \(F_0 / F - 1\) = \(k \sigma_\text{ratee}^2 / \sigma_E^2\) = \(k \rho(1, 1) / [1 - \rho(1, 1)]\), we have
\[ \rho(F) = \frac{F_0 / F - 1}{F_0 / F - 1 + k}, \]
which is monotonic decreasing in \(F\). Therefore, we can again obtain a 95% CI by transforming \(F_{\frac{\alpha}{2}, N - 1, N(k - 1)}\) and \(F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}\)
\[ \left\{\frac{\frac{F_0}{F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}} - 1}{\frac{F_0}{F_{1 - \frac{\alpha}{2}, N - 1, N(k - 1)}} - 1 + k} \leq \rho(1, 1) \leq \frac{\frac{F_0}{F_{\frac{\alpha}{2}, N - 1, N(k - 1)}} - 1}{\frac{F_0}{F_{\frac{\alpha}{2}, N - 1, N(k - 1)}} - 1 + k}\right\} \]
For 95% CIs with two-way crossed designs, see the following papers:
Fleiss, J.L., Shrout, P.E. Approximate interval estimation for a certain intraclass correlation coefficient. Psychometrika 43, 259–262 (1978). https://doi.org/10.1007/BF02293867
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. https://doi.org/10.1037/1082-989X.1.1.30
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420
See also this paper for some updated guidelines and recent development on ICC:
ten Hove, D., Jorgensen, T. D., & van der Ark, L. A. (2022). Updated guidelines on selecting an intraclass correlation coefficient for interrater reliability, with applications to incomplete observational designs. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000516
Demonstrating Analytic CIs in R
Let’s look at the results by treating the raters as different for each ratee (which is incorrect, but just to show the calculation for CIs):
<- lmer(slp_VAS ~ 1 + (1 | Speaker), data = slp_dat)
m1_full <- as.data.frame(VarCorr(m1_full))
vc_m1f # Obtain mean squares from the variance components
<- 21
k <- k * vc_m1f$vcov[1] + vc_m1f$vcov[2]
msr <- vc_m1f$vcov[2]
mse <- msr / mse
f0 # 95% CI (df1 = 19, df2 = 20 * 20 = 400)
# For ICC(1, k)
c("LL" = 1 - 1 / f0 * qf(.975, 19, 400),
"UL" = 1 - 1 / f0 * qf(.025, 19, 400))
LL UL
0.9410646 0.9844892
# For ICC(1, 1)
c("LL" = (f0 / qf(.975, 19, 400) - 1) / (f0 / qf(.975, 19, 400) - 1 + k),
"UL" = (f0 / qf(.025, 19, 400) - 1) / (f0 / qf(.025, 19, 400) - 1 + k))
LL UL
0.4319368 0.7513950
Bootstrap Confidence Intervals
Because the models we fitted are random-effect or mixed-effect models (i.e., multilevel models), we’ll need to do multilevel bootstrap. We could use the bootMer()
available in lme4
for parametric bootstrap; the bootmlm
package provides alternative bootstrap methods that are useful with assumption violations.
From my observation, the bootstrap CIs seem wider than the analytic CIs for ICCs, at least for this example. This observation is not unique for bootstrap CIs.
One-Way Random
# One-way random-effect
<- function(x) {
icc_11 # Function for ICC(1, 1)
<- as.data.frame(VarCorr(x))
vc_x $vcov[1] / sum(vc_x$vcov)
vc_x
}<- bootMer(m1_full, icc_11, nsim = 1999,
boo # change to .progress = "txt" for progress bar
.progress = "none")
::boot.ci(boo, index = 1, type = c("norm", "basic", "perc")) boot
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL :
boot::boot.ci(boot.out = boo, type = c("norm", "basic", "perc"),
index = 1)
Intervals :
Level Normal Basic Percentile
95% ( 0.4299, 0.7564 ) ( 0.4453, 0.7737 ) ( 0.3849, 0.7134 )
Calculations and Intervals on Original Scale
Two-Way Random
<- lmer(slp_VAS ~ 1 + (1 | Speaker) + (1 | slpID), data = slp_dat)
m2 # Two-way random-effect
<- function(x) {
icc_2k # Function for ICC(2, k), which is for consistency
<- as.data.frame(VarCorr(x))
vc_x $vcov[2] / (vc_x$vcov[2] + (vc_x$vcov[1] + vc_x$vcov[3]) / 21)
vc_x
}<- bootMer(m2, icc_2k, nsim = 1999,
boo2 # change to .progress = "txt" for progress bar
.progress = "none")
::boot.ci(boo2, index = 1, type = c("norm", "basic", "perc")) boot
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL :
boot::boot.ci(boot.out = boo2, type = c("norm", "basic", "perc"),
index = 1)
Intervals :
Level Normal Basic Percentile
95% ( 0.9414, 0.9988 ) ( 0.9513, 1.0073 ) ( 0.9260, 0.9820 )
Calculations and Intervals on Original Scale