The Test Development Process

Learning Objectives

  • Summarize and apply the standards for test development
  • Describe the components in test specification
  • Explain the restriction of range problem and its implications for selecting participants for test tryout

Steps of Test Development

flowchart TD
  A(Test Specification) --> B(Item development and review)
  B --> A
  B --> C(Administration materials)

  • Test specification
    • Statement of purpose
    • Content specifications
    • Determine whether a measure already exists
    • Format specifications
    • Develop a test blueprint
  • Item development and review
    • Create the initial item pool
    • Conduct the initial item review (and revisions)
    • Field test of items
    • Analyze, revise, and re-test items
  • Develop Procedures and Materials for Administration

Test Specification

  • Description of
    • Content and format
    • Purpose and intended uses
    • Decisions about content, format, test length, psychometric characteristics
    • Delivery mode
    • Administration
    • Scoring and score reporting

State the Purpose of the Test

  • For the general population? Diagnosis?

  • Norm-referenced vs. criterion-referenced interpretations?

Content Specification

  • Delineation of the construct or domain to be measured
  • What goes in the definitions:
    • What the construct encompasses and what it does not
    • Meaning of low and high levels of the construct
    • Degree of breadth (e.g., self-concept)
    • Generality (e.g., population characteristic, culture)

Determine Whether a Measure Already Exists

  • Mental Measurements Yearbook
  • Tests in Print
  • PsycTESTS
  • Measures of Personality and Social Psychological Attitudes
  • Health and Psychosocial Instruments (HaPI) database

Format Specifications

  • Cognitive: multiple-choice, true-false, matching, short-answer, performance tasks, etc.
  • Noncognitive: Thurstone, Guttman, Likert, etc.

Considerations:

  • Accessibility

Test Blueprint

aka table of specifications; more common for cognitive tests

E.g., test blueprint for WISC-IV

  • Delineates
    • the content areas to be tested, and the number of items for each
    • which cognitive levels the test items target
    • scoring specifications (e.g., rubrics, algorithms)
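As an illustration, a blueprint can be laid out as a content-by-cognitive-level matrix whose cells give planned item counts. The content areas, level labels, and counts below are hypothetical, not taken from any published test:

```r
# Hypothetical test blueprint: rows are content areas, columns are
# cognitive levels; each cell is the number of items planned for it.
blueprint <- matrix(
  c(6, 4, 2,
    5, 5, 2,
    4, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    content = c("Vocabulary", "Arithmetic", "Reasoning"),
    level   = c("Recall", "Application", "Analysis")
  )
)
blueprint
rowSums(blueprint)  # items planned per content area
sum(blueprint)      # total test length: 35 items
```

Row and column sums make it easy to verify that the planned item counts match the intended emphasis of each content area and cognitive level.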

Objective for Noncognitive Measurement

E.g., subdimensions \(\times\) levels of affect

Krathwohl’s taxonomy:

  • receiving, responding, valuing, organization, characterization

Initial Item Pool

  • Write three to four times the intended final number of items (DeVellis, 2003)
  • Sources of generating items
    • Developers’ knowledge
    • Literature and existing instruments
    • Focus groups
    • Experts

Item Review

  • Feedback on
    • item clarity
    • match to the test specifications
    • writing (grammar and readability)
    • possibility of offensiveness or unfairness
    • readability level

Item Tryout

  • A small sample is sufficient for initial tryout
  • Cognitive labs: structured interviews with selected test takers to identify irrelevant barriers
  • Items are revised but not deleted at this stage

Field Test

  • Need a large sample for stable statistics
  • Need a representative sample to avoid restriction of range problem
    • Correlations computed in the selected subset are attenuated
# Population
set.seed(1911)
npop <- 1000
x <- rnorm(npop, mean = 15, sd = 5)
y <- 0.5 * (x - 15) +
    rnorm(npop, mean = 15, sd = 5 * sqrt(0.75))
# correlation in the full sample
cor(x, y)
#> [1] 0.4831516
# correlation in the subset with x > 20
cor(x[x > 20], y[x > 20])
#> [1] 0.2900582
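When selection is made directly on one variable and its unrestricted standard deviation is known, the attenuation can be partially undone with a standard range-restriction correction (Thorndike's Case 2). A minimal sketch, continuing the simulation above; the function name is my own:

```r
# Thorndike Case 2 correction for direct range restriction on x:
# given the restricted correlation r and the unrestricted and
# restricted SDs of x, estimate the unrestricted correlation.
correct_restriction <- function(r, sd_unrestricted, sd_restricted) {
  k <- sd_unrestricted / sd_restricted
  (r * k) / sqrt(1 - r^2 + r^2 * k^2)
}

set.seed(1911)
npop <- 1000
x <- rnorm(npop, mean = 15, sd = 5)
y <- 0.5 * (x - 15) + rnorm(npop, mean = 15, sd = 5 * sqrt(0.75))
keep <- x > 20
r_restricted <- cor(x[keep], y[keep])
# the corrected value moves back toward the full-sample correlation
correct_restriction(r_restricted, sd(x), sd(x[keep]))
```

The correction assumes the selection was made directly on x and the regression of y on x is linear with homoscedastic errors; it cannot fully substitute for collecting a representative field-test sample.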

Item Selection

Based on both theoretical and empirical considerations

  • If driven mostly by data, cross-validation studies are needed
    • Results from one study are usually unstable
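A minimal simulated sketch of why cross-validation matters: select the "best" items by item-total correlation in one random half of a sample, then check how those same items fare in the other half (all data here are simulated for illustration):

```r
set.seed(2024)
n <- 200
n_items <- 40
# Simulated item responses: a common factor plus noise, so every
# item has the same true quality; differences are sampling error.
theta <- rnorm(n)
items <- sapply(seq_len(n_items), function(j) 0.4 * theta + rnorm(n))

half1 <- 1:(n / 2)
half2 <- (n / 2 + 1):n

# Item-total correlations in each half
r1 <- cor(items[half1, ], rowSums(items[half1, ]))
r2 <- cor(items[half2, ], rowSums(items[half2, ]))

# Items that look best in half 1 typically regress toward the mean
# in half 2: selecting on noisy estimates capitalizes on chance.
best <- order(r1, decreasing = TRUE)[1:10]
mean(r1[best])  # inflated by the selection itself
mean(r2[best])  # cross-validated, more honest estimate
```

Because every simulated item is equally good by construction, the gap between the two means is entirely capitalization on chance, which is exactly what a cross-validation study is designed to expose.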

Developing Test Administration and Scoring Procedures and Materials

  • Need to be clear and detailed (Standards 4.15, 4.16, 4.18)
  • Include a statement if the test is intended for research use only (Standard 4.17)
  • Specify qualifications and training of scorers, if appropriate (Standard 4.20)