Validity

Learning Objectives

  • Contrast historical and current views of validity
  • Explain the argument-based approach to validity
  • Describe types of evidence that should be obtained, and the common approaches to obtain them
  • Articulate how one can obtain convergent and discriminant evidence in an MTMM matrix

Historical View

First edition of the Standards

  • The degree to which a test measures “what it is supposed to measure.”

  • Types of validity: “3 C’s”

    • Content
    • Criterion-related: concurrent & predictive
    • Construct

Current View

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.

(Standards, 2014, p. 11)

  • Unified view
  • “Validity is a unitary concept. It is the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use” (Standards, 2014, p. 14)

Focus on Interpretation and Usage

There is disagreement as to whether validity should be about (a) the test itself or (b) the interpretation of the scores

Bandalos (2018) viewed validity as the relationship between test, test scores

Focus on Explanation and Cognitive Models

Argument-Based Approach to Validity

To state the proposed interpretation and use explicitly and in some detail, and then to evaluate the plausibility of these proposals

(Kane, p. 1)

Types of Evidence

  • Test content
  • Response processes
  • Internal structure
  • Relations to other variables
  • Consequences of testing

Table 11.1

Content

  • Construct underrepresentation
  • Construct-irrelevant variance

Response Processes

  • Process models
    • Think-alouds, analysis of errors, expert-novice studies, concept map
    • Experimental studies & eye tracking

Internal Structure

  • Correlation Matrix (Table 11.2)

Internal Structure (cont’d)

  • Factor analysis
    • Finding the hypothesized number of factors
  • Item response theory (IRT)
    • Dichotomous and polytomous items
    • Differential item functioning
  • Generalizability theory

Relations to Other Variables

  • Test-criterion
    • Issues: appropriate criterion, restriction of range
      • E.g., height appears unrelated to success in the NBA, because players are already selected on height (restricted range)
    • Issues: attenuation due to unreliability
      • \(\rho_{XY} \leq \sqrt{\rho_{X X'} \rho_{Y Y'}}\)
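Both issues in the list above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the reliabilities (0.80 and 0.50), the observed correlation (0.40), the population correlation (0.5), and the "top 10%" selection rule are all hypothetical values chosen for the example, and the function names are not taken from any library.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_observed_correlation(rel_x, rel_y):
    """Upper bound on rho_XY given the reliabilities of X and Y."""
    return math.sqrt(rel_x * rel_y)


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: estimated true-score correlation."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Attenuation (hypothetical reliabilities 0.80 for the test, 0.50 for the criterion)
bound = max_observed_correlation(0.80, 0.50)   # sqrt(0.40), about 0.63
corrected = disattenuate(0.40, 0.80, 0.50)     # observed 0.40 corrected upward

# Restriction of range: simulate test X and criterion Y with correlation 0.5,
# then keep only the top 10% on X (hypothetical selection rule).
rng = random.Random(2024)
rho = 0.5
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
r_full = pearson(x, y)

cutoff = sorted(x)[int(0.9 * len(x))]
selected = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
r_restricted = pearson([s[0] for s in selected], [s[1] for s in selected])

print(f"upper bound on observed r: {bound:.3f}")
print(f"disattenuated r:           {corrected:.3f}")
print(f"r in full sample:          {r_full:.3f}")
print(f"r in selected sample:      {r_restricted:.3f}")
```

In the simulated data the correlation in the selected group is noticeably smaller than in the full sample, which is the pattern behind the NBA example: a criterion can look unrelated to a predictor simply because everyone in the sample was selected on that predictor.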

Classification Accuracy

See Figure 11.1

From a 2 x 2 Classification table,

  • Sensitivity: TP / (TP + FN)
  • Specificity: TN / (TN + FP)

ROC curve: sensitivity against 1 - specificity for different cutoffs
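As a minimal sketch of these definitions, the code below computes sensitivity and specificity at one cutoff and then traces ROC points across all cutoffs. The scores, the true-status labels, and the rule "classify as positive when score >= cutoff" are assumptions made up for illustration; they are not the data behind Figure 11.1.

```python
def sensitivity(tp, fn):
    """Proportion of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of true negatives among all actual negatives."""
    return tn / (tn + fp)


def confusion_counts(scores, labels, cutoff):
    """Build the 2 x 2 table, classifying as positive when score >= cutoff."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if label and predicted_positive:
            tp += 1
        elif label and not predicted_positive:
            fn += 1
        elif not label and predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for each observed cutoff, strictest first."""
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        tp, fn, tn, fp = confusion_counts(scores, labels, cutoff)
        points.append((1 - specificity(tn, fp), sensitivity(tp, fn)))
    return points


# Hypothetical test scores and true status (1 = positive, 0 = negative)
scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

tp, fn, tn, fp = confusion_counts(scores, labels, cutoff=5)
print(sensitivity(tp, fn))  # 0.75
print(specificity(tn, fp))  # 0.75
print(roc_points(scores, labels))
```

Lowering the cutoff moves along the curve toward the upper right: sensitivity rises while specificity falls.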

Group Differences

  • Between person (e.g., known-group studies)
  • Within person

Convergent and Discriminant Evidence

  • Embed construct in a nomological network

Multitrait-Multimethod

Example: (Extraversion, Agreeableness, and Conscientiousness) \(\times\) (Self-Reports, Reports by a female close friend [RF], and IAT)

  1. Monotrait-heteromethod: Same trait, different methods
    • Convergent validity; should be high
    • E.g., correlation between ESR and ERF
  2. Heterotrait-monomethod: Different traits, same method
    • Should be smaller than (1); otherwise substantial method variance exists
  3. Heterotrait-heteromethod: Different traits, different methods
    • Should be the lowest
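The three kinds of correlations can be sorted mechanically once each measure is labeled by its trait and method. The sketch below is a reduced, hypothetical illustration: it uses only two traits (E, A) and two methods (SR, RF), and the correlation values are invented for the example rather than taken from any study. The helper names `classify_pair` and `summarize_mtmm` are likewise assumptions.

```python
from statistics import mean


def classify_pair(a, b):
    """Classify a pair of measures, each given as a (trait, method) tuple."""
    (trait_a, method_a), (trait_b, method_b) = a, b
    if trait_a == trait_b and method_a != method_b:
        return "monotrait-heteromethod"
    if trait_a != trait_b and method_a == method_b:
        return "heterotrait-monomethod"
    if trait_a != trait_b and method_a != method_b:
        return "heterotrait-heteromethod"
    raise ValueError("a measure cannot be paired with itself")


def summarize_mtmm(correlations):
    """Average the correlations within each of the three MTMM categories."""
    groups = {}
    for (a, b), r in correlations.items():
        groups.setdefault(classify_pair(a, b), []).append(r)
    return {kind: mean(values) for kind, values in groups.items()}


# Hypothetical correlations among four measures: E and A by SR and RF
correlations = {
    (("E", "SR"), ("E", "RF")): 0.60,  # same trait, different methods
    (("A", "SR"), ("A", "RF")): 0.50,
    (("E", "SR"), ("A", "SR")): 0.30,  # different traits, same method
    (("E", "RF"), ("A", "RF")): 0.20,
    (("E", "SR"), ("A", "RF")): 0.10,  # different traits, different methods
    (("A", "SR"), ("E", "RF")): 0.05,
}

summary = summarize_mtmm(correlations)
for kind, avg in summary.items():
    print(f"{kind}: {avg:.3f}")
```

With these invented values the averages fall in the expected order (monotrait-heteromethod highest, heterotrait-heteromethod lowest), which is the pattern that would support convergent and discriminant evidence. If the heterotrait-monomethod average approached the monotrait-heteromethod average, that would point to method variance.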

Consequences of Testing

Positive vs. negative consequences

  • E.g., selecting the most qualified vs. adverse impact for certain groups

Intended vs. unintended consequences

  • E.g., establishing a standard for knowledge vs. teaching only to the test

Unintended Testing Consequences

It is important to distinguish between evidence that is directly relevant to validity and evidence that may inform decisions about social policy but falls outside the realm of validity.

(Standards, p. 20)

  • E.g., using the ability to carry heavy weights as a test → selecting more male fighters
    • Q1: If males generally score higher, does that make the test invalid?
    • Q2: Is this “test” valid for this selection purpose?