Person | R1 | R2 | R3 |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | 2 | 1 | 3 |
3 | 2 | 2 | 3 |
4 | 3 | 3 | 4 |
5 | 3 | 2 | 4 |
6 | 3 | 3 | 4 |
7 | 4 | 4 | 4 |
8 | 4 | 3 | 5 |
9 | 5 | 5 | 5 |
10 | 5 | 5 | 5 |
PSYC 520
Concepts in interrater reliability have also been applied to repeated measures
Important assumption: construct does not change over the time points
One can also consider R1, R2, R3 as items
R1 \ R2 | 1 | 2 | 3 | 4 | 5 | Sum |
---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 | 1 |
2 | 1 | 1 | 0 | 0 | 0 | 2 |
3 | 0 | 1 | 2 | 0 | 0 | 3 |
4 | 0 | 0 | 1 | 1 | 0 | 2 |
5 | 0 | 0 | 0 | 0 | 2 | 2 |
Sum | 2 | 2 | 3 | 1 | 2 | 10 |
Nominal agreement: 1 + 1 + 2 + 1 + 2 = 7 (sum of diagonal)
Chance agreement: \(\frac{1}{N} \sum n_{i+} n_{+i}\)
Example
\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]
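Plugging in the counts from the cross-tabulation above: observed agreement is 7 out of 10, and the chance agreement formula gives \(\frac{1}{N} \sum n_{i+} n_{+i} = (2 \cdot 2 + 2 \cdot 2 + 3 \cdot 3 + 1 \cdot 1 + 2 \cdot 2) / 10 = 2.2\) agreements expected by chance, so

\[ p_o = \frac{7}{10} = .70, \qquad p_e = \frac{2.2}{10} = .22, \qquad \kappa = \frac{.70 - .22}{1 - .22} = \frac{.48}{.78} \approx .62 \]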
Thinking
R4 | R5 |
---|---|
1 | 3 |
2 | 4 |
2 | 4 |
3 | 5 |
3 | 5 |
Key references
See Table 9.5 of textbook
\[ \operatorname{Var}(Y) = \underbrace{\sigma^2_p}_{\text{Person}} + \underbrace{\sigma^2_r}_{\text{Rater}} + \underbrace{\sigma^2_{pr}}_{\text{Person $\times$ Rater}} + \underbrace{\sigma^2_e}_{\text{Error}} \]
To estimate \(\sigma^2_r\), one needs multiple observations per rater
To estimate \(\sigma^2_{pr}\), one needs multiple observations per combination of person and rater
Examples
ICC: \(\dfrac{\sigma^2_p}{\sigma^2_p + \text{Error} / k}\)
psych::ICC
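As a sketch of how `psych::ICC` could be applied to the rating table above (assuming the `psych` package is installed; the function expects data in wide format, one row per person and one column per rater):

```r
library(psych)

# Ratings from the table above: 10 persons rated by 3 raters (wide format)
dat_wide <- data.frame(
  R1 = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5),
  R2 = c(1, 1, 2, 3, 2, 3, 4, 3, 5, 5),
  R3 = c(1, 3, 3, 4, 4, 4, 4, 5, 5, 5)
)

# Prints all six ICCs: ICC1, ICC2, ICC3 (single rater) and
# ICC1k, ICC2k, ICC3k (average of k raters)
ICC(dat_wide)
```

The output reports both the single-rater and the average-score versions, so the appropriate row can be picked from the table of ICC variants below.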
Single rater:

Agreement | Design | ICC (ST) | ICC (MW) |
---|---|---|---|
Consistency | Nested | ICC(1, 1) | ICC(1) |
Agreement | Nested | ICC(1, 1) | ICC(1) |
Consistency | Crossed | ICC(3, 1) | ICC(C, 1) |
Agreement | Crossed | ICC(2, 1) | ICC(A, 1) |
Average of k raters:

Agreement | Design | ICC (ST) | ICC (MW) |
---|---|---|---|
Consistency | Nested | ICC(1, k) | ICC(k) |
Agreement | Nested | ICC(1, k) | ICC(k) |
Consistency | Crossed | ICC(3, k) | ICC(C, k) |
Agreement | Crossed | ICC(2, k) | ICC(A, k) |
Wide to long formats
Person | Rater | Score |
---|---|---|
1 | R1 | 1 |
1 | R2 | 1 |
1 | R3 | 1 |
2 | R1 | 2 |
2 | R2 | 1 |
2 | R3 | 3 |
3 | R1 | 2 |
3 | R2 | 2 |
3 | R3 | 3 |
4 | R1 | 3 |
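One way to sketch this conversion in base R is with `stats::reshape` (the `tidyr::pivot_longer` function is a common alternative); the column names and data here come from the rating table above:

```r
# Wide format: one row per person, one column per rater
dat_wide <- data.frame(
  Person = 1:10,
  R1 = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5),
  R2 = c(1, 1, 2, 3, 2, 3, 4, 3, 5, 5),
  R3 = c(1, 3, 3, 4, 4, 4, 4, 5, 5, 5)
)

# Long format: one row per person-rater combination
dat_long <- reshape(dat_wide, direction = "long",
                    varying = c("R1", "R2", "R3"), v.names = "Score",
                    timevar = "Rater", times = c("R1", "R2", "R3"),
                    idvar = "Person")
dat_long <- dat_long[order(dat_long$Person), ]
head(dat_long)  # 30 rows: Person, Rater, Score
```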
Full mixed-effects model
\[ Y_{ij} = \mu + P_i + R_j + (P \times R)_{ij} + e_{ij} \]
translates to the R formula
library(lme4)
# If nested design:
# lmer(Score ~ (1 | Person), data = dat_long)
# If crossed design:
m1 <- lmer(Score ~ (1 | Person) + (1 | Rater), data = dat_long)
summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ (1 | Person) + (1 | Rater)
Data: dat_long
REML criterion at convergence: 73.5
Scaled residuals:
Min 1Q Median 3Q Max
-1.5147 -0.5966 0.1211 0.4753 1.2630
Random effects:
Groups Name Variance Std.Dev.
Person (Intercept) 1.5704 1.2531
Rater (Intercept) 0.1889 0.4346
Residual 0.2111 0.4595
Number of obs: 30, groups: Person, 10; Rater, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.3000 0.4765 6.926
From the R output, compute ICC for agreement when averaging across 3 raters
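Using the variance estimates from the output above (with the person $\times$ rater interaction absorbed into the residual, since there is only one observation per person-rater cell), the agreement ICC averaging across $k = 3$ raters works out to

\[ \text{ICC}(A, 3) = \frac{\sigma^2_p}{\sigma^2_p + (\sigma^2_r + \sigma^2_e) / 3} = \frac{1.5704}{1.5704 + (0.1889 + 0.2111) / 3} \approx .92 \]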
See R notes for another example