PSYC 573
rbeta(), rnorm(), rbinom(): generate values that imitate independent samples from known distributions
E.g., rbeta(n, shape1 = 15, shape2 = 10)
With a large number of draws (S), summaries of the draws (e.g., mean, SD, quantiles) closely approximate the corresponding features of the distribution
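As a quick sketch (the draw count is a hypothetical choice), summaries of many rbeta() draws approximate the features of a Beta(15, 10) distribution, whose analytic mean is 15 / (15 + 10) = 0.6:

```r
set.seed(1)  # for reproducibility
S <- 1e5     # a large number of draws (hypothetical choice)
draws <- rbeta(S, shape1 = 15, shape2 = 10)
mean(draws)                       # approximates the analytic mean 15 / 25 = 0.6
sd(draws)                         # approximates the analytic SD, about 0.096
quantile(draws, c(.05, .5, .95))  # approximate quantiles
```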
Main problem in Bayesian: no way to draw independent samples from posterior \[ P(\theta \mid y) = \frac{\mathrm{e}^{-(\theta - 1 / 2)^2} \theta^y (1 - \theta)^{n - y}} {\int_0^1 \mathrm{e}^{-(\theta^* - 1 / 2)^2} {\theta^*}^y (1 - {\theta^*})^{n - y} d\theta^*} \]
MCMC: draw dependent (correlated) samples without evaluating the integral in the denominator
Some commonly used algorithms
The Metropolis algorithm (also called random-walk Metropolis)
Gibbs sampling (in BUGS, JAGS)
Hamiltonian Monte Carlo (and the No-U-Turn sampler; in Stan)
You have a task: tour all regions in LA County, and the time you spend in each region should be proportional to its popularity
However, you don’t know which region is the most popular
Each day, you will decide whether to stay in the current region or move to a neighboring region
You have a tour guide that tells you whether region A is more or less popular than region B and by how much
How would you proceed?
In the long run, distribution of time spent in each region = distribution of popularity of each region
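The touring rule above can be simulated directly. Below is a minimal sketch with five regions arranged on a line and made-up popularity values (all numbers hypothetical); the guide's answer is the popularity ratio of the proposed region to the current one:

```r
set.seed(2)
popularity <- c(1, 2, 3, 4, 5)  # hypothetical relative popularity of 5 regions
n_days <- 50000
region <- integer(n_days)
region[1] <- 3                  # start in the middle region
for (day in 2:n_days) {
  current <- region[day - 1]
  proposal <- current + sample(c(-1, 1), size = 1)  # pick a neighboring region
  if (proposal < 1 || proposal > 5) {
    region[day] <- current      # off the map: stay put
  } else if (runif(1) < popularity[proposal] / popularity[current]) {
    region[day] <- proposal     # move with probability min(1, popularity ratio)
  } else {
    region[day] <- current      # otherwise stay
  }
}
table(region) / n_days  # long-run time shares approach popularity / sum(popularity)
```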
Data from LA Barometer (by the USC Dornsife Center for Economic and Social Research)
338 first-gen immigrants, 86 used the metro in the previous year
Question:
What proportion of first-gen immigrants uses the metro in a year?
Beta(1.5, 2) prior \(\to\) Beta(87.5, 254) posterior
1,000 independent draws from the posterior:
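Because the posterior here is a known Beta distribution, independent draws are straightforward (the draw count mirrors the slide):

```r
set.seed(4)
draws <- rbeta(1000, shape1 = 87.5, shape2 = 254)
mean(draws)                   # near the posterior mean 87.5 / 341.5, about 0.256
quantile(draws, c(.05, .95))  # endpoints of a 90% credible interval
```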
Proposal density: \(N(0, 0.1)\); Starting value: \(\theta^{(1)} = 0.1\)
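A minimal random-walk Metropolis sketch for this posterior, assuming the proposal SD is 0.1 (the slide's \(N(0, 0.1)\) could also denote variance 0.1). The known Beta(87.5, 254) density is used as the target, so the normalizing integral is never evaluated; only density ratios matter:

```r
set.seed(3)
log_target <- function(th) {
  if (th <= 0 || th >= 1) return(-Inf)  # zero density outside (0, 1)
  dbeta(th, shape1 = 87.5, shape2 = 254, log = TRUE)
}
S <- 10000
theta <- numeric(S)
theta[1] <- 0.1                         # starting value from the slide
for (s in 2:S) {
  proposal <- theta[s - 1] + rnorm(1, mean = 0, sd = 0.1)
  # accept with probability min(1, target ratio), computed on the log scale
  if (log(runif(1)) < log_target(proposal) - log_target(theta[s - 1])) {
    theta[s] <- proposal
  } else {
    theta[s] <- theta[s - 1]            # reject: repeat the current value
  }
}
mean(theta[-(1:(S / 2))])  # after discarding warm-up, near 87.5 / 341.5 ≈ 0.256
```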
With enough iterations, the Metropolis algorithm will simulate samples from the target distribution
It is less efficient than rbeta() because the draws are dependent
Markov chain: a sequence of iterations, \(\{\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(S)}\}\)
Based on ergodic theorems, a well-behaved chain will reach a stationary distribution
It can take anywhere from a few to a few hundred thousand iterations for the chain to reach the stationary distribution
Therefore, a common practice is to discard the first \(S_\text{warm-up}\) iterations (e.g., the first half)
Stationarity: the chain does not get stuck in one region of the parameter space
Mixing: multiple chains cross each other
For more robust diagnostics, see Vehtari et al. (2021)
\[ \hat{R} = \frac{\text{between-chain variance} + \text{within-chain variance}} {\text{within-chain variance}} \]
When the chains converge, each should be exploring the same stationary distribution
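A sketch of the classical (non-rank-normalized) \(\hat R\), assuming the chains are stored as the columns of a matrix; Vehtari et al. (2021) add chain splitting and rank normalization on top of this:

```r
rhat <- function(chains) {
  # chains: iterations in rows, one column per chain
  S <- nrow(chains)
  B <- S * var(colMeans(chains))       # between-chain variance
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  sqrt(((S - 1) / S * W + B / S) / W)  # pooled variance relative to within
}

set.seed(5)
# four well-mixed (here: independent) chains should give rhat close to 1
chains <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
rhat(chains)
```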
In the previous examples, the MCMC draws are dependent, so they contain less information about the target posterior distribution than independent draws would
Effective sample size (ESS): what would be the equivalent number of draws if the draws were independent?
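One rough way to estimate this equivalent count divides S by a factor reflecting the autocorrelation in the chain. The sketch below truncates the sum at the first non-positive autocorrelation; this truncation rule is a simplification of what software such as Stan actually uses:

```r
ess <- function(x, max_lag = 100) {
  rho <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  cut <- which(rho <= 0)[1]             # truncate at first non-positive lag
  if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
  length(x) / (1 + 2 * sum(rho))        # S / (1 + 2 * sum of autocorrelations)
}

set.seed(6)
ess(rnorm(2000))          # independent draws: roughly 2000
ess(cumsum(rnorm(2000)))  # highly autocorrelated random walk: far fewer
```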
We used Markov chain Monte Carlo (MCMC), specifically a Metropolis algorithm implemented in R, to approximate the posterior distribution of the model parameters. We ran two chains of 10,000 draws each and discarded the first 5,000 draws of each chain as warm-up. Trace plots of the posterior samples (Figure X) showed good mixing, and \(\hat R\) statistics (Vehtari et al., 2021) were < 1.01 for all model parameters, indicating convergence of the MCMC chains. Effective sample sizes were > 2,376 for all model parameters, so the MCMC draws are sufficient for summarizing the posterior distributions.
The model estimated that 25.6% (posterior SD = 2.3%, 90% CI [21.8%, 29.5%]) of first-generation immigrants took the metro in 2019.