Reliability and Classical Test Theory

Learning Objectives

  • Explain the importance of reliability in measurement
  • Explain what true score and error score are in classical test theory (CTT)
  • Define and derive reliability in CTT
  • Explain what parallel, tau-equivalent, and congeneric tests are

Reliability

A test is reliable if we would obtain very similar scores were we to repeat the test

Also known as dependability, or consistency across some condition

  • Also precision

  • E.g., across time, items, forms, raters

Important

  • Reliability concerns observed scores

  • The reliability coefficient is defined not for a single score but for a set of (hypothetical) scores

  • On the other hand, precision can be defined for each score

What is Considered Error?

Variability in scores is not necessarily error

E.g., day-to-day variation in a person’s measured weight (which genuinely fluctuates) vs. measured height (which should not)

What about when an examinee answers a question incorrectly on a first try and then answers the same question correctly on a second try? Should we consider the difference in responses across the two trials “error”?

Classical Test Theory

\[X = T + E\]

  • \(X\): Observed score
  • \(E\): Random error/inconsistencies
  • \(T\): “True” score
    • The hypothetical average score we would obtain if we could test a person repeatedly and “brainwash” them after each testing

Propensity Distribution

See Figure 7.1

Standard deviation of the propensity distribution = standard error of measurement (SEM)
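A minimal simulation sketch of one person’s propensity distribution, assuming normally distributed random error. The true score of 10 and SEM of 1 are hypothetical values, not taken from Figure 7.1. Each draw plays the role of one independent (“brainwashed”) testing, so the mean of many draws approximates \(T\) and their standard deviation approximates the SEM.

```python
import random
import statistics

# Hypothetical values chosen for illustration only
TRUE_SCORE = 10.0   # the person's true score T
SEM = 1.0           # SD of the propensity distribution (standard error of measurement)

def propensity_draws(true_score, sem, n_reps, seed=0):
    """Simulate n_reps independent testings of one person,
    assuming normally distributed random error."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, sem) for _ in range(n_reps)]

draws = propensity_draws(TRUE_SCORE, SEM, n_reps=100_000)
mean_x = statistics.fmean(draws)   # approximates T = E(X)
sd_x = statistics.pstdev(draws)    # approximates the SEM
print(f"mean = {mean_x:.3f}, sd = {sd_x:.3f}")
```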

Random vs. Systematic Error

CTT assumes random error: \(E_1, E_2, \ldots\) are independent

  • Errors are also random across persons

In practice, error can be systematic

  • E.g., raters who are consistently too lenient; a blood pressure meter that is not calibrated

True Score and Error Score

CTT defines \(T = E(X)\), the expected value (mean) of a person’s propensity distribution

So \(T\) can contain systematic error
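A short sketch of why a systematic error ends up inside \(T\), assuming a hypothetical lenient rater who adds a constant 1.5 points to every rating on top of normal random error. Because CTT defines \(T\) as the mean of the observed scores, the constant bias never averages out and becomes part of \(T\).

```python
import random
import statistics

# Hypothetical values for illustration
LATENT_LEVEL = 10.0  # what an unbiased rater would target
LENIENCY = 1.5       # constant (systematic) bias of a lenient rater
SEM = 1.0            # SD of the random error

rng = random.Random(1)
# Every rating carries the same bias plus fresh random error
ratings = [LATENT_LEVEL + LENIENCY + rng.gauss(0.0, SEM) for _ in range(100_000)]

ctt_true_score = statistics.fmean(ratings)   # CTT's T = E(X)
print(f"T = {ctt_true_score:.2f}")           # near 11.5, not 10.0: the bias is inside T
```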

Important

By construction of CTT

  • The expected value of \(E\) is zero
  • Corr(\(E\), \(T\)) = 0
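Both properties follow from defining \(E = X - E(X)\). A sketch under hypothetical settings (normal true scores with mean 10 and SD 2, normal error with SD 1), estimating each person’s \(T\) by the mean of many simulated replications: the errors average to zero within every person, and the error on a single occasion is essentially uncorrelated with \(T\) across persons.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(2)
N_PERSONS, N_REPS = 2_000, 100   # hypothetical sample size and replications

true_scores = []    # estimated T for each person
first_errors = []   # E on the first testing occasion for each person
mean_errors = []    # average E over all occasions for each person

for _ in range(N_PERSONS):
    t_person = rng.gauss(10.0, 2.0)
    xs = [t_person + rng.gauss(0.0, 1.0) for _ in range(N_REPS)]
    t_hat = statistics.fmean(xs)        # T = E(X), estimated by the mean
    errors = [x - t_hat for x in xs]    # E = X - T
    true_scores.append(t_hat)
    first_errors.append(errors[0])
    mean_errors.append(statistics.fmean(errors))

print(max(abs(m) for m in mean_errors))   # ~0 (floating-point noise only)
print(corr(true_scores, first_errors))    # close to 0
```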

Reliability in CTT

  • \(\sigma^2_T\) = variance of \(T\) across persons
  • \(\sigma^2_X\) = variance of \(X\) across persons

\[\rho_{X X'} = \frac{\sigma^2_T}{\sigma^2_X}\]

Note

Reliability in CTT is sample-specific: it depends on \(\sigma^2_T\), the spread of true scores in the group being tested
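A minimal simulation sketch of \(\rho_{X X'} = \sigma^2_T / \sigma^2_X\) and of its sample dependence, assuming normal true scores and normal error; the SDs are hypothetical. The same instrument (error SD 1) gives a reliability of about \(4/(4+1) = 0.8\) in a sample with true-score SD 2, but only about \(1/(1+1) = 0.5\) in a more homogeneous sample with true-score SD 1.

```python
import random
import statistics

def simulated_reliability(sd_true, sd_error, n=100_000, seed=4):
    """Return var(T) / var(X) for a simulated sample of n persons.

    Assumes T ~ Normal(0, sd_true) and X = T + E with E ~ Normal(0, sd_error).
    """
    rng = random.Random(seed)
    t = [rng.gauss(0.0, sd_true) for _ in range(n)]
    x = [ti + rng.gauss(0.0, sd_error) for ti in t]
    return statistics.pvariance(t) / statistics.pvariance(x)

# Same error SD, different spread of true scores in the sample
heterogeneous = simulated_reliability(sd_true=2.0, sd_error=1.0)
homogeneous = simulated_reliability(sd_true=1.0, sd_error=1.0)
print(f"SD(T) = 2: {heterogeneous:.3f}")
print(f"SD(T) = 1: {homogeneous:.3f}")
```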

Because \(T\) is not observed, \(\rho_{X X'}\) cannot be obtained

This is solved using the concept of parallel tests

Parallel Tests

Two tests \(X_1\) and \(X_2\), with true scores \(T_1\) and \(T_2\), are parallel if and only if

\(T_1\) = \(T_2\); \(Var(E_1)\) = \(Var(E_2)\)

     t1   x11   x12   x13    t2   x21   x22   x23
1  8.77  9.17  9.21  7.94  8.77  8.96  8.22  9.14
2  9.42  9.99  8.45  9.82  9.42  9.54  8.53 10.20
3 10.98 11.52 10.74 10.70 10.98 12.37 11.02  9.56
4 10.22  9.51 12.00  9.15 10.22  8.85 10.93 10.88
5 11.47 11.81 10.51 12.07 11.47  9.43 14.06 10.90

Without loss of generality, assume \(X_1\) and \(X_2\) have been centered

\[E(X_1 X_2) = E(T_1 T_2) + E(T_1 E_2) + E(T_2 E_1) + E(E_1 E_2)\]

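The last three terms are zero: the two cross terms because errors are uncorrelated with true scores, and the last because errors are independent across tests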

Because \(T_1\) = \(T_2\) = \(T\), and \(E(T)\) = 0 (the scores are centered),

\[E(X_1 X_2) = E(T^2) = \sigma^2_T\]

With parallel tests, \(Var(X_1)\) = \(Var(X_2)\) = \(\sigma^2_X\), so

\[\rho_{X X'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{E(X_1 X_2)}{\sqrt{Var(X_1) Var(X_2)}}\]

where the last term is the correlation between \(X_1\) and \(X_2\)

So,

Note

Reliability = Correlation between two parallel tests
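A simulation sketch of this result, assuming two parallel forms generated as \(X_j = T + E_j\) with normal \(T\) (SD 2) and independent normal errors of equal SD 1 (hypothetical values). The correlation between the two observed forms, which needs no knowledge of \(T\), recovers \(\sigma^2_T / \sigma^2_X \approx 0.8\), which does require \(T\).

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(5)
N = 100_000
SD_T, SD_E = 2.0, 1.0   # hypothetical; equal error SD for both forms => parallel

t = [rng.gauss(10.0, SD_T) for _ in range(N)]
x1 = [ti + rng.gauss(0.0, SD_E) for ti in t]
x2 = [ti + rng.gauss(0.0, SD_E) for ti in t]

r_parallel = corr(x1, x2)                                            # observable
rho_definition = statistics.pvariance(t) / statistics.pvariance(x1)  # needs T
print(f"corr(X1, X2) = {r_parallel:.3f}")
print(f"var(T)/var(X1) = {rho_definition:.3f}")
```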

Tau-Equivalent Tests

\(T_1\) = \(T_2\); \(Var(E_1)\) may be different from \(Var(E_2)\)

     t1   x11   x12   x13    t2   x21   x22   x23
1  7.96  7.38  7.58  8.91  7.96 15.05  1.89  6.93
2  8.16  7.37  8.18  8.94  8.16 11.89  4.36  8.23
3  6.80  7.51  5.59  7.30  6.80  8.61  2.18  9.60
4 14.86 14.20 14.57 15.81 14.86 14.02 22.74  7.83
5  9.72  9.19 10.23  9.73  9.72  1.53 13.86 13.77
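With tau-equivalent tests, the parallel-test derivation breaks at its last step because \(Var(X_1) \neq Var(X_2)\). The covariance is still \(\sigma^2_T\), so \(Corr(X_1, X_2) = \sigma^2_T / \sqrt{Var(X_1) Var(X_2)} = \sqrt{\rho_1 \rho_2}\), where \(\rho_j\) is the reliability of test \(j\): the correlation no longer equals either test’s reliability. A simulation sketch under hypothetical values (true-score SD 2, error SDs 1 and 3, so \(\rho_1 = 0.8\) and \(\rho_2 = 4/13 \approx 0.31\)):

```python
import math
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(6)
N = 100_000
SD_T, SD_E1, SD_E2 = 2.0, 1.0, 3.0   # hypothetical: same T, unequal error SDs

t = [rng.gauss(10.0, SD_T) for _ in range(N)]
x1 = [ti + rng.gauss(0.0, SD_E1) for ti in t]
x2 = [ti + rng.gauss(0.0, SD_E2) for ti in t]

rho1 = SD_T**2 / (SD_T**2 + SD_E1**2)   # population reliability of test 1
rho2 = SD_T**2 / (SD_T**2 + SD_E2**2)   # population reliability of test 2
r12 = corr(x1, x2)
print(f"corr = {r12:.3f}, sqrt(rho1*rho2) = {math.sqrt(rho1 * rho2):.3f}")
```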

Essentially Tau-Equivalent Tests

\(T_1\) = \(a\) + \(T_2\); \(Var(E_1)\) may be different from \(Var(E_2)\)

     t1   x11   x12   x13    t2   x21   x22   x23
1 10.36 10.85  9.84 10.39 12.36 11.48 14.77 10.84
2  8.64  8.37  8.48  9.07 10.64 14.46  4.00 13.46
3 10.51 11.07 11.16  9.29 12.51 13.06 11.76 12.70
4  6.69  6.20  7.69  6.16  8.69  9.53  6.11 10.42
5  7.32  7.61  7.38  6.97  9.32  8.47 10.77  8.72
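Adding a constant \(a\) shifts means but leaves variances and covariances, and hence correlations, unchanged; for correlational purposes, essentially tau-equivalent tests therefore behave like tau-equivalent ones. A quick check on hypothetical simulated scores with \(a = 2\):

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(7)
N = 10_000
A = 2.0  # hypothetical constant a in T1 = a + T2

t2 = [rng.gauss(10.0, 2.0) for _ in range(N)]
t1 = [A + ti for ti in t2]                     # T1 = a + T2
x1 = [ti + rng.gauss(0.0, 1.0) for ti in t1]   # error SDs differ across tests
x2 = [ti + rng.gauss(0.0, 3.0) for ti in t2]
x1_unshifted = [xi - A for xi in x1]           # test 1 with the constant removed

mean_gap = statistics.fmean(x1) - statistics.fmean(x1_unshifted)   # equals a
corr_gap = corr(x1, x2) - corr(x1_unshifted, x2)                   # zero up to rounding
print(f"mean gap = {mean_gap:.3f}, correlation gap = {corr_gap:.1e}")
```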

Congeneric Tests

\(T_1\) = \(a\) + \(\color{red}{b}\) \(T_2\); \(Var(E_1)\) may be different from \(Var(E_2)\)

     t1   x11   x12   x13    t2   x21   x22   x23
1 10.48 10.14  9.63 11.68  9.34  8.92  8.25 10.85
2  9.62  8.71 11.03  9.12  8.73  6.97 12.45  6.79
3  7.13  7.01  7.38  7.02  6.99  2.55  7.57 10.86
4 12.92 12.24 13.61 12.90 11.04 10.77 11.10 11.25
5  9.98  8.80 11.52  9.61  8.98 11.36  9.36  6.23
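Under the congeneric model the two true scores are perfectly correlated but on different scales, so the tests can differ in true-score variance as well as in error variance. Since \(Var(T_1) = b^2 \sigma^2_{T_2}\), the reliability of test 1 is \(b^2 \sigma^2_{T_2} / (b^2 \sigma^2_{T_2} + Var(E_1))\). A simulation sketch with hypothetical values \(a = 3\), \(b = 0.7\), \(SD(T_2) = 2\), and error SD 1 for both tests, which gives \(\rho_1 = 1.96/2.96 \approx 0.66\) and \(\rho_2 = 0.8\):

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(8)
N = 100_000
A, B = 3.0, 0.7                     # hypothetical intercept a and slope b
SD_T2, SD_E1, SD_E2 = 2.0, 1.0, 1.0

t2 = [rng.gauss(10.0, SD_T2) for _ in range(N)]
t1 = [A + B * ti for ti in t2]                   # T1 = a + b * T2
x1 = [ti + rng.gauss(0.0, SD_E1) for ti in t1]
x2 = [ti + rng.gauss(0.0, SD_E2) for ti in t2]

rho1 = statistics.pvariance(t1) / statistics.pvariance(x1)
rho2 = statistics.pvariance(t2) / statistics.pvariance(x2)
print(f"corr(T1, T2) = {corr(t1, t2):.6f}")   # 1 up to rounding: same construct, different scale
print(f"rho1 = {rho1:.3f}, rho2 = {rho2:.3f}")
```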

Additional Note on Reliability in CTT

  • Theoretically speaking, \(0 \leq \rho_{X X'} \leq 1\)

  • \(\rho_{X X'}\) = squared correlation between \(T\) and \(X\) (think of \(R^2\))
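The reason: \(Cov(T, X) = Cov(T, T + E) = \sigma^2_T\) because \(Corr(E, T) = 0\), so \(Corr(T, X)^2 = \sigma^4_T / (\sigma^2_T \sigma^2_X) = \sigma^2_T / \sigma^2_X\). A numerical check on hypothetical simulated scores (true-score SD 2, error SD 1):

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation based on population (co)variances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(9)
N = 100_000
t = [rng.gauss(10.0, 2.0) for _ in range(N)]   # hypothetical true scores
x = [ti + rng.gauss(0.0, 1.0) for ti in t]     # observed scores X = T + E

r_tx_squared = corr(t, x) ** 2
rho = statistics.pvariance(t) / statistics.pvariance(x)
print(f"corr(T, X)^2 = {r_tx_squared:.3f}, var(T)/var(X) = {rho:.3f}")
```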

  • Only one error variance is estimated: the average of the error variances across persons
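A sketch of this averaging, assuming (hypothetically) that each person’s error SD is 0.5, 1.0, or 1.5 with equal probability. The variance of \(E\) pooled across persons, which is the single error variance CTT works with, is close to the average of the person-specific error variances (about \((0.25 + 1 + 2.25)/3 \approx 1.17\)), and its square root is the one SEM reported for everyone.

```python
import math
import random
import statistics

rng = random.Random(10)
N = 100_000

person_error_vars = []   # each person's own (hypothetical) error variance
errors = []              # one error draw per person
for _ in range(N):
    sd_e = rng.choice([0.5, 1.0, 1.5])
    person_error_vars.append(sd_e ** 2)
    errors.append(rng.gauss(0.0, sd_e))

pooled_error_var = statistics.pvariance(errors)            # single CTT error variance
average_person_var = statistics.fmean(person_error_vars)   # mean of person-level variances
sem = math.sqrt(pooled_error_var)                          # one SEM for all persons
print(f"pooled = {pooled_error_var:.3f}, average = {average_person_var:.3f}, SEM = {sem:.3f}")
```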