The Test Development Process

Learning Objectives

  • Summarize and apply the standards for test development
  • Describe the components in test specification
  • Explain the restriction of range problem and its implications for selecting participants for test tryout

Steps of Test Development

flowchart TD
  A(Test Specification) --> B(Item development and review)
  B --> A
  B --> C(Administration materials)

  • Test specification
    • Statement of purpose
    • Content specifications
    • Determine whether a measure already exists
    • Format specifications
    • Develop a test blueprint
  • Item development and review
    • Create the initial item pool
    • Conduct the initial item review (and revisions)
    • Field test of items
    • Analyze, revise, and re-test items
  • Develop Procedures and Materials for Administration

Test Specification

  • Description of
    • Content and format
    • Purpose and intended uses
    • Decisions about content, format, test length, psychometric characteristics
    • Delivery mode
    • Administration
    • Scoring and score reporting

State the Purpose of the Test

  • For the general population? Diagnosis?

  • Norm-referenced vs. criterion-referenced interpretations?

Content Specification

  • Delineation of the construct or domain to be measured
  • What goes in the definitions:
    • What the construct encompasses and what it does not
    • Meaning of low and high levels of the construct
    • Degree of breadth (e.g., self-concept)
    • Generality (e.g., population characteristic, culture)

Determine Whether a Measure Already Exists

  • Mental Measurements Yearbook
  • Tests in Print
  • PsycTESTS
  • Measures of Personality and Social Psychological Attitudes
  • Health and Psychosocial Instruments (HaPI) database

Format Specifications

  • Cognitive: multiple-choice, true-false, matching, short-answer, performance tasks, etc.
  • Noncognitive: Thurstone, Guttman, Likert, etc.

Considerations:

  • Accessibility

Test Blueprint

aka table of specifications; more common for cognitive tests

E.g., test blueprint for WISC-IV

  • Delineates
    • the content areas to be tested, and the number of items for each
    • which cognitive levels the test items target
    • scoring specifications (e.g., rubrics, algorithms)
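As an illustration, a blueprint can be laid out as a content-by-cognitive-level matrix whose cells give planned item counts. The content areas, level labels, and counts below are hypothetical, not taken from any published test:

```r
# Hypothetical test blueprint: rows are content areas, columns are
# cognitive levels; each cell is the number of items planned for it.
blueprint <- matrix(
  c(6, 4, 2,
    5, 5, 2,
    4, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    content = c("Vocabulary", "Arithmetic", "Reasoning"),
    level   = c("Recall", "Application", "Analysis")
  )
)
blueprint
rowSums(blueprint)  # items planned per content area
sum(blueprint)      # total test length: 35 items
```

Row and column sums make it easy to verify that the planned item counts match the intended emphasis of each content area and cognitive level.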

Objective for Noncognitive Measurement

E.g., subdimensions \(\times\) levels of affect

Krathwohl’s taxonomy:

  • receiving, responding, valuing, organization, characterization

Initial Item Pool

  • Write three to four times the intended final number of items (DeVellis, 2003)
  • Sources of generating items
    • Developers’ knowledge
    • Literature and existing instruments
    • Focus groups
    • Experts

Item Review

  • Feedback on
    • item clarity
    • match to the test specifications
    • writing (grammar and readability)
    • possibility of offensiveness or unfairness
    • readability level

Item Tryout

  • A small sample is sufficient for initial tryout
  • Cognitive labs: structured interviews with selected test takers to identify irrelevant barriers
  • Items are revised but not deleted at this stage

Field Test

  • Need a large sample for stable statistics
  • Need a representative sample to avoid restriction of range problem
    • Correlations computed in the selected subset are attenuated
# Population
set.seed(1911)
npop <- 1000
x <- rnorm(npop, mean = 15, sd = 5)
y <- 0.5 * (x - 15) +
    rnorm(npop, mean = 15, sd = 5 * sqrt(0.75))
# correlation in the full sample
cor(x, y)
#> [1] 0.4831516
# correlation in the subset with x > 20
cor(x[x > 20], y[x > 20])
#> [1] 0.2900582
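When selection is made directly on one variable and its unrestricted standard deviation is known, the attenuation can be partially undone with a standard range-restriction correction (Thorndike's Case 2). A minimal sketch, continuing the simulation above; the function name is my own:

```r
# Thorndike Case 2 correction for direct range restriction on x:
# given the restricted correlation r and the unrestricted and
# restricted SDs of x, estimate the unrestricted correlation.
correct_restriction <- function(r, sd_unrestricted, sd_restricted) {
  k <- sd_unrestricted / sd_restricted
  (r * k) / sqrt(1 - r^2 + r^2 * k^2)
}

set.seed(1911)
npop <- 1000
x <- rnorm(npop, mean = 15, sd = 5)
y <- 0.5 * (x - 15) + rnorm(npop, mean = 15, sd = 5 * sqrt(0.75))
keep <- x > 20
r_restricted <- cor(x[keep], y[keep])
# the corrected value moves back toward the full-sample correlation
correct_restriction(r_restricted, sd(x), sd(x[keep]))
```

The correction assumes the selection was made directly on x and the regression of y on x is linear with homoscedastic errors; it cannot fully substitute for collecting a representative field-test sample.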

Item Selection

Based on both theoretical and empirical considerations

  • If driven mostly by data, cross-validation studies are needed
    • Results from one study are usually unstable
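A minimal simulated sketch of why cross-validation matters: select the "best" items by item-total correlation in one random half of a sample, then check how those same items fare in the other half (all data here are simulated for illustration):

```r
set.seed(2024)
n <- 200
n_items <- 40
# Simulated item responses: a common factor plus noise, so every
# item has the same true quality; differences are sampling error.
theta <- rnorm(n)
items <- sapply(seq_len(n_items), function(j) 0.4 * theta + rnorm(n))

half1 <- 1:(n / 2)
half2 <- (n / 2 + 1):n

# Item-total correlations in each half
r1 <- cor(items[half1, ], rowSums(items[half1, ]))
r2 <- cor(items[half2, ], rowSums(items[half2, ]))

# Items that look best in half 1 typically regress toward the mean
# in half 2: selecting on noisy estimates capitalizes on chance.
best <- order(r1, decreasing = TRUE)[1:10]
mean(r1[best])  # inflated by the selection itself
mean(r2[best])  # cross-validated, more honest estimate
```

Because every simulated item is equally good by construction, the gap between the two means is entirely capitalization on chance, which is exactly what a cross-validation study is designed to expose.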

Developing Test Administration and Scoring Procedures and Materials

  • Need to be clear and detailed (Standards 4.15, 4.16, 4.18)
  • Include a statement if the test is intended for research use only (Standard 4.17)
  • Specify qualifications and training of scorers, if appropriate (Standard 4.20)