Validity

Learning Objectives

Contrast historical and current views of validity
Explain the argument-based approach to validity
Describe types of evidence that should be obtained, and the common approaches to obtain them
Articulate how one can obtain convergent and discriminant evidence in an MTMM matrix

Historical View

First edition of the Standards

The degree to which a test measures “what it is supposed to measure.”
Types of validity: “3 C’s”
- Content
- Criterion-related: concurrent & predictive
- Construct

Current View

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.

(Standards, 2014, p. 11)

Unified view
“Validity is a unitary concept. It is the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use” (Standards, 2014, p. 14)

Focus on Interpretation and Usage

There is disagreement as to whether validity should be about (a) the test itself or (b) the interpretation of the scores

Bandalos (2018) viewed validity as the relationship between test, test scores

Focus on Explanation and Cognitive Models

Argument-Based Approach to Validity

State the proposed interpretation and use explicitly and in some detail, and then evaluate the plausibility of these proposals

(Kane, p. 1)

Types of Evidence

Test content
Response processes
Internal structure
Relations to other variables
Consequences of testing

Table 11.1

Content

Construct underrepresentation
Construct-irrelevant variance

Response Processes

Process models
- Think-alouds, analysis of errors, expert-novice studies, concept maps
- Experimental studies & eye tracking

Internal Structure

Correlation Matrix (Table 11.2)
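
As an illustration of how an inter-item correlation matrix can be computed, here is a minimal sketch in plain Python. The item names and responses are hypothetical and are not the values in Table 11.2; `pearson` and `correlation_matrix` are helper names made up for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(items):
    """Return a nested dict: matrix[a][b] = correlation of item a with item b."""
    names = list(items)
    return {a: {b: pearson(items[a], items[b]) for b in names} for a in names}

# Hypothetical responses of five people to three items (illustration only).
responses = {
    "item1": [1, 2, 3, 4, 5],
    "item2": [2, 1, 4, 3, 5],
    "item3": [5, 4, 3, 2, 1],
}

matrix = correlation_matrix(responses)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

Items written to measure the same construct would be expected to correlate more strongly with each other than with items written for a different construct.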

Internal Structure (cont’d)

Factor analysis
- Finding the hypothesized number of factors
Item response theory (IRT)
- Dichotomous and polytomous items
- Differential item functioning
Generalizability theory

Relations to Other Variables

Test-criterion
- Issues: appropriate criterion, restriction of range
  - E.g., height is only weakly related to success among NBA players, because nearly all of them are tall (restricted range)
- Issues: attenuation due to unreliability
  - \(\rho_{XY} \leq \sqrt{\rho_{X X'} \rho_{Y Y'}}\)
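
To make the attenuation bound concrete, the sketch below computes the largest observed test-criterion correlation that two reliabilities allow, and applies the classical correction for attenuation. The function names and the numeric values are assumptions chosen for illustration.

```python
import math

def max_observed_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation: sqrt(rho_XX' * rho_YY')."""
    return math.sqrt(rel_x * rel_y)

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rho_XX' * rho_YY')."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical reliabilities for the test (X) and the criterion (Y).
rel_x, rel_y = 0.80, 0.50
r_observed = 0.30  # hypothetical observed test-criterion correlation

print(round(max_observed_correlation(rel_x, rel_y), 3))
print(round(disattenuate(r_observed, rel_x, rel_y), 3))
```

With an unreliable criterion, even a test that relates perfectly to the true criterion cannot show an observed correlation above the bound, so a modest observed coefficient should be read alongside both reliabilities.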

Classification Accuracy

See Figure 11.1

From a 2 × 2 classification table (TP = true positives, FN = false negatives, TN = true negatives, FP = false positives):

Sensitivity: TP / (TP + FN)
Specificity: TN / (TN + FP)
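
The two formulas can be sketched in a few lines of Python. The cell counts below are hypothetical and are not taken from Figure 11.1.

```python
def sensitivity(tp, fn):
    """Proportion of true cases that the test flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of non-cases that the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical 2 x 2 table: 50 true cases and 50 non-cases.
tp, fn = 40, 10
tn, fp = 45, 5

print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```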

ROC curve: sensitivity against 1 - specificity for different cutoffs
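
A minimal sketch of how the points of an ROC curve arise: for each cutoff, classify everyone scoring at or above it as positive, then record (1 - specificity, sensitivity). The scores, true statuses, and the "at or above" decision rule are assumptions for illustration.

```python
def roc_points(scores, labels, cutoffs):
    """Return one (1 - specificity, sensitivity) pair per cutoff.

    labels: 1 = truly positive, 0 = truly negative.
    A person is classified positive when score >= cutoff.
    """
    points = []
    for cutoff in cutoffs:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
        fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
        fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
        tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        points.append((1 - spec, sens))
    return points

# Hypothetical test scores and true statuses for six people.
scores = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 0, 1, 1]

for cutoff, (fpr, sens) in zip([1, 3, 5, 7], roc_points(scores, labels, [1, 3, 5, 7])):
    print(cutoff, round(fpr, 2), round(sens, 2))
```

Lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off the curve displays.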

Group Differences

Between person (e.g., known-group studies)
Within person
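
One common way to summarize a known-group comparison is a standardized mean difference. The sketch below uses Cohen's d with a pooled standard deviation; that choice of statistic, the group labels, and the scores are assumptions for illustration, not something prescribed by the slides.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores: a group expected to score high vs. a comparison group.
expected_high = [10, 12, 14]
comparison = [6, 8, 10]

print(cohens_d(expected_high, comparison))  # 2.0
```

A large difference in the predicted direction counts as supporting evidence; a difference near zero would call the intended interpretation into question.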

Convergent and discriminant

Embed construct in a nomological network

Multitrait-Multimethod

Example: (Extraversion, Agreeableness, and Conscientiousness) \(\times\) (Self-Reports, Reports by a female close friend [RF], and IAT)

Monotrait-heteromethod: Same trait, different methods
- Convergent validity; should be high
- E.g., correlation between Extraversion by self-report (ESR) and Extraversion by friend report (ERF)
Heterotrait-monomethod: Different traits, same method
- Should be smaller than the monotrait-heteromethod correlations; otherwise substantial method variance exists
Heterotrait-heteromethod: Different traits, different methods
- Should be the lowest
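
The three comparisons can be sketched by labeling every correlation in an MTMM matrix by type and averaging within each type. The example below uses two traits (E, A) and two methods (SR, RF) to stay short; all correlation values are hypothetical, and `classify_pair` and `summarize` are names made up for this illustration.

```python
from statistics import mean

def classify_pair(var_a, var_b):
    """Label a correlation by whether the two measures share trait or method."""
    (trait_a, method_a), (trait_b, method_b) = var_a, var_b
    if trait_a == trait_b and method_a != method_b:
        return "monotrait-heteromethod"
    if trait_a != trait_b and method_a == method_b:
        return "heterotrait-monomethod"
    if trait_a != trait_b and method_a != method_b:
        return "heterotrait-heteromethod"
    raise ValueError("a measure cannot be paired with itself")

def summarize(correlations):
    """Average correlation within each MTMM category."""
    groups = {}
    for (var_a, var_b), r in correlations.items():
        groups.setdefault(classify_pair(var_a, var_b), []).append(r)
    return {kind: mean(values) for kind, values in groups.items()}

# Hypothetical correlations; each measure is a (trait, method) pair.
correlations = {
    (("E", "SR"), ("E", "RF")): 0.60,  # same trait, different methods
    (("A", "SR"), ("A", "RF")): 0.50,
    (("E", "SR"), ("A", "SR")): 0.30,  # different traits, same method
    (("E", "RF"), ("A", "RF")): 0.20,
    (("E", "SR"), ("A", "RF")): 0.10,  # different traits, different methods
    (("E", "RF"), ("A", "SR")): 0.05,
}

summary = summarize(correlations)
for kind, avg in summary.items():
    print(kind, round(avg, 3))
```

In this made-up matrix the averages fall in the expected order (convergent highest, heterotrait-heteromethod lowest), which is the pattern that would support convergent and discriminant evidence.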

Consequences of Testing

Positive vs. negative consequences

E.g., selecting the most qualified vs. adverse impact for certain groups

Intended vs. unintended consequences

E.g., establishing a standard for knowledge vs. teaching only to the test

Unintended Testing Consequences

“[I]t is important to distinguish between evidence that is directly relevant to validity and evidence that may inform decisions about social policy but falls outside the realm of validity.”

(Standards, 2014, p. 20)

E.g., using the ability to carry heavy weights as a test → more male fighters are selected
- Q1: If males generally score higher, does that make the test invalid?
- Q2: Is this “test” valid for this selection purpose?