PSYC 573
2024-09-03
Origin: To study gambling problems
A mathematical way to study uncertainty/randomness
Thought Experiment
Someone asks you to play a game. The person will flip a coin. You win $10 if it shows heads and lose $10 if it shows tails. Would you play?
Kolmogorov axioms
For an event \(A_i\) (e.g., getting a “1” from throwing a die)
\(P(A_i) \geq 0\) [All probabilities are non-negative]
\(P(A_1 \cup A_2 \cup \cdots) = 1\) [Union of all possibilities is 1]
\(P(A_1) + P(A_2) = P(A_1 \text{ or } A_2)\) for mutually exclusive \(A_1\) and \(A_2\) [Addition rule]
\(A_1\) = getting a one, …, \(A_6\) = getting a six
Mutually exclusive: \(A_1\) and \(A_2\) cannot both be true
Classical: Counting rules
Frequentist: long-run relative frequency
Subjectivist: Rational belief
Trial | Outcome |
---|---|
1 | 2 |
2 | 3 |
3 | 1 |
4 | 3 |
5 | 1 |
6 | 1 |
7 | 5 |
8 | 6 |
9 | 3 |
10 | 3 |
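As a quick sketch (not part of the original slides), simulating many rolls of a fair die shows the relative frequency of any outcome settling near 1/6:

```python
import random

random.seed(1)

count_ones = 0
for i in range(1, 10_001):
    roll = random.randint(1, 6)  # one roll of a fair six-sided die
    count_ones += (roll == 1)
    if i in (10, 100, 1_000, 10_000):
        # long-run relative frequency of "1"; approaches 1/6 ≈ .167
        print(f"after {i:>6} rolls: relative frequency of 1 = {count_ones / i:.3f}")
```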
Some events cannot be repeated
Or, probability that the null hypothesis is true
For a frequentist, probability is not meaningful for a single case
Discrete outcome: Probability mass
Continuous outcome: Probability density
Instead, we obtain probability density: \[ P(x_0) = \lim_{\Delta x \to 0} \frac{P(x_0 < X < x_0 + \Delta x)}{\Delta x} \]
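A small numerical check of this limit (my own sketch, using SciPy's standard normal as the continuous distribution):

```python
from scipy.stats import norm

x0 = 1.0
for dx in (1.0, 0.1, 0.001):
    # P(x0 < X < x0 + dx) / dx approaches the density at x0 as dx shrinks
    print(dx, (norm.cdf(x0 + dx) - norm.cdf(x0)) / dx)

print("density:", norm.pdf(x0))  # exact density at x0, ≈ 0.242
```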
Central tendency
The center is usually the region of values with high plausibility
Dispersion
How concentrated the region with high plausibility is
Interval
|  | >= 4 | <= 3 | Marginal (odd/even) |
|---|---|---|---|
| odd | 1/6 | 2/6 | 3/6 |
| even | 2/6 | 1/6 | 3/6 |
| Marginal (>= 4 or <= 3) | 3/6 | 3/6 | 1 |
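A sketch (mine, not from the slides) that stores this joint table as an array and recovers the marginals by summing rows and columns:

```python
import numpy as np

# rows: odd, even; columns: >= 4, <= 3
joint = np.array([[1/6, 2/6],
                  [2/6, 1/6]])

print(joint.sum(axis=1))  # marginal of odd/even:    [0.5, 0.5]
print(joint.sum(axis=0))  # marginal of >= 4 / <= 3: [0.5, 0.5]
print(joint.sum())        # all probabilities sum to 1
```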
Knowing the value of \(B\), the relative plausibility of each value of outcome \(A\)
\[ P(A \mid B_1) = \frac{P(A, B_1)}{P(B_1)} \]
E.g., P(Alzheimer’s) vs. P(Alzheimer’s | family history)
E.g., Knowing that the number is odd
|  | >= 4 | <= 3 |
|---|---|---|
| odd | 1/6 | 2/6 |
| Marginal (>= 4 or <= 3) | 3/6 | 3/6 |
Conditional = Joint / Marginal
|  | >= 4 | <= 3 |
|---|---|---|
| odd | 1/6 | 2/6 |
| Marginal (>= 4 or <= 3) | 3/6 | 3/6 |
| Conditional (odd) | (1/6) / (3/6) = 1/3 | (2/6) / (3/6) = 2/3 |
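Continuing the same sketch, dividing the joint probabilities in the "odd" row by the marginal P(odd) gives the conditional probabilities:

```python
import numpy as np

joint = np.array([[1/6, 2/6],   # odd:  (>= 4, <= 3)
                  [2/6, 1/6]])  # even: (>= 4, <= 3)

p_odd = joint[0].sum()          # marginal P(odd) = 3/6
print(joint[0] / p_odd)         # conditional given odd: [1/3, 2/3]
```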
\(P\)(number is six | even number) = 1 / 3
\(P\)(even number | number is six) = 1
Another example:
\(P\)(road is wet | it rains) vs. \(P\)(it rains | road is wet)
Sometimes called the confusion of the inverse
\(A\) and \(B\) are independent if
\[ P(A \mid B) = P(A) \]
E.g.,
P(>= 5) = 1/3. P(>=5 | odd number) = ? P(>=5 | even number) = ?
P(<= 5) = 5/6. P(<= 5 | odd number) = ? P(<= 5 | even number) = ?
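These can be checked by brute-force enumeration of the six equally likely outcomes (an illustrative sketch; the helper `prob` is mine):

```python
die = [1, 2, 3, 4, 5, 6]

def prob(event, given=None):
    """P(event | given) by counting equally likely die outcomes."""
    space = [x for x in die if given is None or given(x)]
    return sum(event(x) for x in space) / len(space)

odd = lambda x: x % 2 == 1
even = lambda x: x % 2 == 0

# >= 5 is independent of odd/even: all three are 1/3
print(prob(lambda x: x >= 5), prob(lambda x: x >= 5, odd), prob(lambda x: x >= 5, even))

# <= 5 is not independent of odd/even: 5/6, 1, and 2/3
print(prob(lambda x: x <= 5), prob(lambda x: x <= 5, odd), prob(lambda x: x <= 5, even))
```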
From conditional \(P(A \mid B)\) to marginal \(P(A)\)
\[ \begin{align} P(A) & = P(A, B_1) + P(A, B_2) + \cdots + P(A, B_n) \\ & = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_n) P(B_n) \\ & = \sum_{k = 1}^n P(A \mid B_k) P(B_k) \end{align} \]
Example
Consider the use of a depression screening test for people with diabetes. For a person with depression, there is an 85% chance the test is positive. For a person without depression, there is a 28.4% chance the test is positive. Assume that 19.1% of people with diabetes have depression. If the test is given to 1,000 people with diabetes, around how many people will be tested positive?
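A sketch of the arithmetic for this example (numbers taken from the problem statement):

```python
p_dep = 0.191              # P(depression) among people with diabetes
p_pos_given_dep = 0.85     # P(positive | depression)
p_pos_given_nodep = 0.284  # P(positive | no depression)

# law of total probability: P(positive) = sum_k P(positive | B_k) P(B_k)
p_pos = p_pos_given_dep * p_dep + p_pos_given_nodep * (1 - p_dep)
print(p_pos)                 # ≈ 0.392
print(round(1000 * p_pos))   # ≈ 392 positive tests per 1,000 people
```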
Bayes Theorem
Given \(P(A, B) = P(A \mid B) P(B) = P(B \mid A) P(A)\) (joint = conditional \(\times\) marginal)
\[ P(B \mid A) = \dfrac{P(A \mid B) P(B)}{P(A)} \]
Which says how we can go from \(P(A \mid B)\) to \(P(B \mid A)\)
Consider \(B_i\) \((i = 1, \ldots, n)\) as one of the many possible mutually exclusive events
\[ \begin{aligned} P(B_i \mid A) & = \frac{P(A \mid B_i) P(B_i)}{P(A)} \\ & = \frac{P(A \mid B_i) P(B_i)}{\sum_{k = 1}^n P(A \mid B_k)P(B_k)} \end{aligned} \]
A police officer stops a driver at random and does a breathalyzer test for the driver. The breathalyzer is known to detect true drunkenness 100% of the time, but in 1% of the cases, it gives a false positive when the driver is sober. We also know that in general, for every 1,000 drivers passing through that spot, one is driving drunk. Suppose that the breathalyzer shows positive for the driver. What is the probability that the driver is truly drunk?
Goal: Find the probability that the person is drunk, given the test result
Parameter (\(\theta\)): drunk status (possible values: drunk, sober)
Data (\(D\)): test (possible values: positive, negative)
Bayes theorem: \(\underbrace{P(\theta \mid D)}_{\text{posterior}} = \underbrace{P(D \mid \theta)}_{\text{likelihood}} \underbrace{P(\theta)}_{\text{prior}} / \underbrace{P(D)}_{\text{marginal}}\)
Usually, the marginal is not given, so
\[ P(\theta \mid D) = \frac{P(D \mid \theta)P(\theta)}{\sum_{\theta^*} P(D \mid \theta^*)P(\theta^*)} \]
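A minimal sketch of the computation, plugging the breathalyzer numbers into Bayes' theorem:

```python
p_drunk = 1 / 1000         # prior: P(drunk)
p_pos_given_drunk = 1.0    # likelihood: P(positive | drunk)
p_pos_given_sober = 0.01   # false positive rate: P(positive | sober)

# marginal: P(positive) = sum over theta* of P(positive | theta*) P(theta*)
p_pos = p_pos_given_drunk * p_drunk + p_pos_given_sober * (1 - p_drunk)

# posterior: P(drunk | positive)
print(p_pos_given_drunk * p_drunk / p_pos)  # ≈ 0.091
```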
The posterior is a synthesis of two sources of information: prior and data (likelihood)
Generally speaking, a narrower distribution (i.e., smaller variance) means more/stronger information
Prior beliefs used in data analysis must be admissible by a skeptical scientific audience (Kruschke, 2015, p. 115)
Probability of observing the data as a function of the parameter(s)
The posterior is the same whether we observe \(D_1\) first and then \(D_2\), \(D_2\) first and then \(D_1\), or \(D_1\) and \(D_2\) together, if
Exchangeability
Joint distribution of the data does not depend on the order of the data
E.g., \(P(D_1, D_2, D_3) = P(D_2, D_3, D_1) = P(D_3, D_2, D_1)\)
Example of non-exchangeable data:
Q: Estimate the probability that a coin gives a head
Flip a coin; it shows heads
Bernoulli model is natural for binary outcomes
Assume the flips are exchangeable given \(\theta\), \[ \begin{align} P(y_1, \ldots, y_N \mid \theta) &= \prod_{i = 1}^N P(y_i \mid \theta) \\ &= \theta^z (1 - \theta)^{N - z} \end{align} \]
\(z\) = # of heads; \(N\) = # of flips
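A sketch of this likelihood as a function of \(\theta\) (hypothetical data of z = 7 heads in N = 10 flips, chosen only for illustration):

```python
import numpy as np

def bernoulli_likelihood(theta, z, n):
    """P(y | theta) = theta^z * (1 - theta)^(n - z), for z heads in n flips."""
    return theta ** z * (1 - theta) ** (n - z)

thetas = np.linspace(0, 1, 11)
print(np.round(bernoulli_likelihood(thetas, z=7, n=10), 5))
# the likelihood peaks near theta = z / N = 0.7
```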
Prior belief, weighted by the likelihood
\[ P(\theta \mid y) \propto \underbrace{P(y \mid \theta)}_{\text{weights}} P(\theta) \]
Likelihood, weighted by the strength of prior belief
\[ P(\theta \mid y) \propto \underbrace{P(\theta)}_{\text{weights}} P(y \mid \theta) \]
See Exercise 2
Discretize a continuous parameter into a finite number of discrete values
For example, with \(\theta\): [0, 1] \(\to\) [.05, .15, .25, …, .95]
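A sketch of grid approximation for the coin example above (one flip showing heads), with a uniform prior over the grid points; this is my illustration, not code from the course:

```python
import numpy as np

theta_grid = np.arange(0.05, 1.0, 0.10)                 # [.05, .15, ..., .95]
prior = np.full(len(theta_grid), 1 / len(theta_grid))   # uniform prior

z, n = 1, 1                                             # one flip, one head
likelihood = theta_grid ** z * (1 - theta_grid) ** (n - z)

posterior = prior * likelihood
posterior /= posterior.sum()                            # normalize over the grid
print(np.round(posterior, 3))                           # more mass on larger theta
```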
Main controversy: subjectivity in choosing a prior
Counters to the Subjectivity Criticism
Counters to the Subjectivity Criticism 2
Subjectivity in choosing a prior is
Counters to the Subjectivity Criticism 3
The prior is a way to incorporate previous research efforts to accumulate scientific evidence
Why should we ignore all previous literature every time we conduct a new study?