Learning Objectives
- Contrast historical and current views of validity
- Explain the argument-based approach to validity
- Describe types of evidence that should be obtained, and the common approaches to obtain them
- Articulate how one can obtain convergent and discriminant evidence in an MTMM matrix
Historical View
First edition of the Standards
Current View
Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.
(Standards, 2014, p. 11)
- Unified view
- “Validity is a unitary concept. It is the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use” (Standards, 2014, p. 14)
Focus on Interpretation and Usage
There is disagreement as to whether validity should be about (a) the test itself or (b) the interpretation of the scores
Bandalos (2018) viewed validity as the relationship between test, test scores
Focus on Explanation and Cognitive Models
Argument-Based Approach to Validity
State the proposed interpretation and use explicitly and in some detail, and then evaluate the plausibility of these proposals
(Kane, p. 1)
Types of Evidence
- Test content
- Response processes
- Internal structure
- Relations to other variables
- Consequences of testing
Content
- Construct underrepresentation
- Construct-irrelevant variance
Response Processes
- Process models
- Think-alouds, analysis of errors, expert-novice studies, concept maps
- Experimental studies & eye tracking
Internal Structure
- Correlation Matrix (Table 11.2)
- Factor analysis
  - Finding the hypothesized number of factors
- Item response theory (IRT)
  - Dichotomous and polytomous items
- Differential item functioning
- Generalizability theory
Relations to Other Variables
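- Test-criterion
  - Issues: appropriate criterion, restriction of range
    - E.g., height can appear unrelated to success within the NBA, because players' heights span a restricted range
  - Issues: attenuation due to unreliability
    - \(\rho_{XY} \leq \sqrt{\rho_{X X'} \rho_{Y Y'}}\), where \(\rho_{X X'}\) and \(\rho_{Y Y'}\) are the reliabilities of the test and the criterion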
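The two test-criterion issues above can be illustrated with a small sketch. The data and reliability values below are made up for illustration, and the function names `pearson_r` and `max_observed_correlation` are hypothetical helpers, not part of any cited source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_observed_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation implied by the two reliabilities."""
    return math.sqrt(rel_x * rel_y)

# Made-up test scores and criterion values for ten people.
test_scores = list(range(1, 11))
criterion = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(test_scores, criterion)            # about 0.94

# Restriction of range: keep only people selected on a high test score.
selected = [(x, y) for x, y in zip(test_scores, criterion) if x >= 7]
r_restricted = pearson_r([x for x, _ in selected],
                         [y for _, y in selected])    # about 0.60

# Attenuation: with assumed reliabilities of .81 and .64, the observed
# test-criterion correlation cannot exceed sqrt(.81 * .64) = .72.
bound = max_observed_correlation(0.81, 0.64)
```

In this toy data set the correlation drops from about .94 in the full group to about .60 in the selected group, even though the underlying relation is unchanged.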
Classification Accuracy
See Figure 11.1
From a 2 x 2 classification table (TP = true positives, FN = false negatives, TN = true negatives, FP = false positives):
- Sensitivity: TP / (TP + FN)
- Specificity: TN / (TN + FP)
ROC curve: sensitivity against 1 - specificity for different cutoffs
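As a minimal sketch of these definitions, the code below computes sensitivity and specificity at one cutoff and then collects ROC points across several cutoffs. The scores, the true statuses, and the rule "classify as positive when score >= cutoff" are assumptions made for illustration. The sketch also assumes that both true positives and true negatives are present in the data, so no denominator is zero.

```python
def classification_rates(scores, truth, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is called positive."""
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, truth):
        predicted_positive = score >= cutoff
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

def roc_points(scores, truth, cutoffs):
    """Return (1 - specificity, sensitivity) pairs, one per cutoff."""
    points = []
    for cutoff in cutoffs:
        sensitivity, specificity = classification_rates(scores, truth, cutoff)
        points.append((1 - specificity, sensitivity))
    return points

# Made-up screening scores and true statuses for eight people.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
truth = [False, False, False, True, False, True, True, True]

rates_at_5 = classification_rates(scores, truth, 5)   # (0.75, 0.75)
curve = roc_points(scores, truth, [4, 5, 7])
```

Lowering the cutoff from 7 to 4 raises sensitivity from .50 to 1.00 while specificity falls from 1.00 to .75, which is the trade-off the ROC curve displays.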
Group Differences
- Between person (e.g., known-group studies)
- Within person
Convergent and Discriminant Evidence
- Embed construct in a nomological network
Multitrait-Multimethod
Example: (Extraversion, Agreeableness, and Conscientiousness) \(\times\) (Self-Reports, Reports by a female close friend [RF], and IAT)
- Monotrait-heteromethod: Same trait, different methods
  - Convergent validity; should be high
  - E.g., correlation between Extraversion by self-report (ESR) and Extraversion by friend report (ERF)
- Heterotrait-monomethod: Different traits, same method
  - Should be smaller than the monotrait-heteromethod correlations; otherwise substantial method variance exists
- Heterotrait-heteromethod: Different traits, different methods
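The three kinds of MTMM correlations can be sorted mechanically once each measure is labeled by its trait and method. The sketch below uses a reduced, hypothetical example with two traits (E, A) and two methods (SR, RF). The correlation values are invented for illustration, and `classify_pair` and `summarize_mtmm` are hypothetical helper names.

```python
from statistics import mean

def classify_pair(measure_a, measure_b):
    """Classify a correlation between two (trait, method) measures."""
    (trait_a, method_a), (trait_b, method_b) = measure_a, measure_b
    if trait_a == trait_b and method_a != method_b:
        return "monotrait-heteromethod"
    if trait_a != trait_b and method_a == method_b:
        return "heterotrait-monomethod"
    if trait_a != trait_b and method_a != method_b:
        return "heterotrait-heteromethod"
    raise ValueError("same trait and same method: a reliability entry, not a validity entry")

def summarize_mtmm(correlations):
    """Average the correlations within each of the three MTMM categories."""
    groups = {}
    for (measure_a, measure_b), r in correlations.items():
        groups.setdefault(classify_pair(measure_a, measure_b), []).append(r)
    return {kind: mean(values) for kind, values in groups.items()}

# Invented correlations among four measures: (trait, method).
correlations = {
    (("E", "SR"), ("E", "RF")): 0.60,
    (("A", "SR"), ("A", "RF")): 0.50,
    (("E", "SR"), ("A", "SR")): 0.30,
    (("E", "RF"), ("A", "RF")): 0.20,
    (("E", "SR"), ("A", "RF")): 0.10,
    (("E", "RF"), ("A", "SR")): 0.10,
}

summary = summarize_mtmm(correlations)
```

In this invented matrix the monotrait-heteromethod average (about .55) exceeds the heterotrait-monomethod average (about .25), which in turn exceeds the heterotrait-heteromethod average (about .10). That ordering is what convergent and discriminant evidence would look like.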
Consequences of Testing
Positive vs. negative consequences
- E.g., selecting the most qualified vs. adverse impact for certain groups
Intended vs. unintended consequences
- E.g., establishing a standard for knowledge vs. teaching only to the test
Unintended Testing Consequences
it is important to distinguish between evidence that is directly relevant to validity and evidence that may inform decisions about social policy but falls outside the realm of validity.
(Standards, 2014, p. 20)
- E.g., using the ability to carry heavy weights as a test → selecting more male fighters
- Q1: If males generally score higher, does that make the test invalid?
- Q2: Is this “test” valid for this selection purpose?