Exercise 11

Author

Instructor of PSYC 573

library(tidyverse)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)

Instruction

!! Remember to set eval: true in YAML.

Answer questions 1-4 below. Submit the rendered file (PDF/WORD) to Brightspace.

When estimating [an] . . . effect under a cluster design, it is not enough to fit a multilevel model with varying intercepts. The slopes . . . must also be allowed to vary . . . (Gelman & Brown, 2024, How Statistical Challenges and Misreadings of the Literature Combine to Produce Unreplicable Science: An Example From Psychology)

# Function to simulate data
simulate_data <- function(n_subjects = 18, n_days = 10, beta0 = 250, beta1 = 0,
                          sd_intercept = 25, sd_slope = 10, sd_residual = 25) {
    subject <- rep(1:n_subjects, each = n_days)
    days <- rep(0:(n_days - 1), times = n_subjects)

    # Random effects
    random_intercepts <- rnorm(n_subjects, mean = 0, sd = sd_intercept)
    random_slopes <- rnorm(n_subjects, mean = 0, sd = sd_slope)

    # Generate reaction times
    reaction <- beta0 + beta1 * days +
        random_intercepts[subject] +
        random_slopes[subject] * days +
        rnorm(n_subjects * n_days, mean = 0, sd = sd_residual)

    data.frame(Subject = factor(subject), Days = days, Reaction = reaction)
}

# Simulate one data set
set.seed(1308)
simulated_data <- simulate_data()
m_ri <- brm(Reaction ~ Days + (1 | Subject), data = simulated_data,
            iter = 1000, chains = 2,
            control = list(adapt_delta = .95))
m_rs <- brm(Reaction ~ Days + (Days | Subject), data = simulated_data,
            iter = 1000, chains = 2,
            control = list(adapt_delta = .95))

Q1: Among the two models above, which model is better based on LOOIC?

Q2: Examine the posterior distributions for the fixed effect of Days (i.e., average slope across participants). How are they different in the two models?

Q3: The following code implements a simulation examining the effect of ignoring the random slopes on Type I error rates. How much is that inflated? This will take a few minutes. [Note: I use lme4::lmer() for frequentist analyses to save time.]

library(lme4)
library(lmerTest)
# Function to perform analysis and check Type I error
check_type1_error <- function(form = Reaction ~ Days + (1 | Subject),
                              n_simulations = 1000, alpha = 0.05) {
    p_values <- numeric(n_simulations)

    for (i in 1:n_simulations) {
        data <- simulate_data()

        # Fit model ignoring random slope
        model <- lmer(form, data = data)

        # Extract p-value for fixed effect of Days
        p_values[i] <- summary(model)$coefficients[2, "Pr(>|t|)"]
    }

    # Calculate Type I error rate
    type1_error_rate <- mean(p_values < alpha)
    return(type1_error_rate)
}

# Run the simulation and check Type I error rate
set.seed(123)
type1_error_rate_ri <- check_type1_error()
print(paste("Type I error rate with random intercepts:", type1_error_rate_ri))

Q4: Repeat Q3, but change the model to one including the random slopes for Days.

# Run the simulation and check Type I error rate

--- title: "Exercise 11" author: "Instructor of PSYC 573" eval: false --- ```{r} #| message: false library(tidyverse) library(brms) options(brms.backend = "cmdstanr", mc.cores = 2) ``` ## Instruction !! Remember to set `eval: true` in YAML. Answer questions 1-4 below. Submit the **rendered** file (PDF/WORD) to Brightspace. > When estimating [an] . . . effect under a cluster design, it is not enough to fit a multilevel model with varying intercepts. The slopes . . . must also be allowed to vary . . . (Gelman & Brown, 2024, [How Statistical Challenges and Misreadings of the Literature Combine to Produce Unreplicable Science: An Example From Psychology](https://journals.sagepub.com/doi/full/10.1177/25152459241276398)) ```{r} # Function to simulate data simulate_data <- function(n_subjects = 18, n_days = 10, beta0 = 250, beta1 = 0, sd_intercept = 25, sd_slope = 10, sd_residual = 25) { subject <- rep(1:n_subjects, each = n_days) days <- rep(0:(n_days - 1), times = n_subjects) # Random effects random_intercepts <- rnorm(n_subjects, mean = 0, sd = sd_intercept) random_slopes <- rnorm(n_subjects, mean = 0, sd = sd_slope) # Generate reaction times reaction <- beta0 + beta1 * days + random_intercepts[subject] + random_slopes[subject] * days + rnorm(n_subjects * n_days, mean = 0, sd = sd_residual) data.frame(Subject = factor(subject), Days = days, Reaction = reaction) } ``` ```{r} # Simulate one data set set.seed(1308) simulated_data <- simulate_data() m_ri <- brm(Reaction ~ Days + (1 | Subject), data = simulated_data, iter = 1000, chains = 2, control = list(adapt_delta = .95)) m_rs <- brm(Reaction ~ Days + (Days | Subject), data = simulated_data, iter = 1000, chains = 2, control = list(adapt_delta = .95)) ``` Q1: Among the two models above, which model is better based on LOOIC? Q2: Examine the posterior distributions for the fixed effect of `Days` (i.e., average slope across participants). How are they different in the two models? Q3: The following code implements a simulation examining the effect of ignoring the random slopes on Type I error rates. How much is that inflated? This will take a few minutes. [Note: I use `lme4::lmer()` for frequentist analyses to save time.] ```{r} library(lme4) library(lmerTest) # Function to perform analysis and check Type I error check_type1_error <- function(form = Reaction ~ Days + (1 | Subject), n_simulations = 1000, alpha = 0.05) { p_values <- numeric(n_simulations) for (i in 1:n_simulations) { data <- simulate_data() # Fit model ignoring random slope model <- lmer(form, data = data) # Extract p-value for fixed effect of Days p_values[i] <- summary(model)$coefficients[2, "Pr(>|t|)"] } # Calculate Type I error rate type1_error_rate <- mean(p_values < alpha) return(type1_error_rate) } # Run the simulation and check Type I error rate set.seed(123) type1_error_rate_ri <- check_type1_error() print(paste("Type I error rate with random intercepts:", type1_error_rate_ri)) ``` Q4: Repeat Q3, but change the model to one including the random slopes for `Days`. ```{r} # Run the simulation and check Type I error rate ```