1 Introduction
1.1 History of Bayesian Statistics
Here is a nice brief video that covers some of the 250+ years of history of Bayesian statistics:
If you are interested in learning more about the story, check out the popular science book, “The Theory That Would Not Die,” by McGrayne (2011).
1.1.1 Thomas Bayes (1701–1762)
You may find a biography of Bayes at https://www.britannica.com/biography/Thomas-Bayes; there is also a nice account in the book by Lambert (2018). Bayes was an English Presbyterian minister. The important work that founded Bayesian statistics was his essay “An Essay Towards Solving a Problem in the Doctrine of Chances,” which he never published; it was discovered after his death and edited by his friend, Richard Price.1
1.1.2 Pierre-Simon Laplace (1749–1827)
Laplace, a French mathematician, was an important figure not just in Bayesian statistics but also in other areas of mathematics, astronomy, and physics. We know much more about the work of Laplace than that of Bayes, and Laplace worked independently on the inverse probability problem (i.e., finding \(P[\text{Parameter} | \text{Data}]\)). Indeed, although the discipline is called “Bayesian,” he is credited with largely formalizing the Bayesian interpretation of probability and most of the machinery of Bayesian statistics, and with making it a useful technique for a variety of problems. His other contributions include the method of least squares and the central limit theorem. See a short biography of him at https://www.britannica.com/biography/Pierre-Simon-marquis-de-Laplace.
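To make the idea of inverse probability concrete, here is a minimal sketch (in Python, with made-up numbers) of going from the “forward” probability \(P[\text{Data} | \text{Parameter}]\) to the “inverse” probability \(P[\text{Parameter} | \text{Data}]\) via Bayes’s theorem. The coin-flip scenario and all quantities are hypothetical, chosen only for illustration.

```python
from math import comb

# Hypothetical example: a coin is either fair (theta = 0.5) or biased
# (theta = 0.8), with equal prior probability on each hypothesis.
priors = {0.5: 0.5, 0.8: 0.5}
heads, flips = 7, 10  # made-up observed data

# "Forward" probability: P[Data | Parameter], the binomial likelihood
def likelihood(theta, heads, flips):
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# Bayes's theorem gives the "inverse" probability P[Parameter | Data]:
# posterior is proportional to prior times likelihood
joint = {th: p * likelihood(th, heads, flips) for th, p in priors.items()}
evidence = sum(joint.values())  # P[Data], the normalizing constant
posterior = {th: j / evidence for th, j in joint.items()}

print(posterior)  # posterior probabilities of the two hypotheses
```

With 7 heads in 10 flips, the posterior shifts toward the biased-coin hypothesis (to roughly 0.63), while the prior had treated both hypotheses as equally plausible.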
1.1.3 20th Century
Until the early 1920s, the inverse probability method, based on what is now called Bayes’s theorem, was the predominant point of view in statistics. Then a school of thought later known as frequentist statistics arrived, quickly became the mainstream approach to statistical inference, and remains the primary framework for quantitative research. In the early 1920s, frequentist scholars, most notably R. A. Fisher and Jerzy Neyman, criticized Bayesian inference for injecting subjective elements into an objective discipline. In Fisher’s words,
The theory of inverse probability is founded upon an error, and must be wholly rejected—Fisher, 1925
Ironically, the term Bayesian was first used in one of Fisher’s works. And interestingly, Fisher actually thought he “[had] been doing almost exactly what Bayes had done in the 18th century.”2
Despite criticisms from frequentist scholars, Bayesian methods were used by Allied scientists in World War II. Alan Turing, for example, used them in an algorithm to break coded messages produced by the Enigma machine, which the German Navy used to communicate. However, because of the more complex mathematics involved, Bayesian statistics was limited to straightforward problems and theoretical discussions until the early 1980s, when tremendous increases in computing speed made Markov chain Monte Carlo (MCMC)—the primary estimation algorithm in modern Bayesian statistics—feasible. With that computational help, Bayesian statistics has made a comeback as an alternative way of thinking, especially given the growing dissatisfaction with the misuse of frequentist statistics among some scholars across disciplines. Bayesian estimation methods have also been applied to many new research questions where frequentist approaches work less well, as well as in big data analytics and machine learning.
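To give a flavor of what MCMC does, here is a minimal, hypothetical Python sketch of the Metropolis algorithm—one early member of the MCMC family. It draws samples from a posterior distribution (here a standard normal, standing in for prior × likelihood) using only its unnormalized density, which is exactly the situation where the normalizing constant is too hard to compute directly. All names and settings are illustrative, not from the original text.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Unnormalized target density: proportional to a standard normal.
# In a real Bayesian analysis this would be prior * likelihood.
def unnormalized_posterior(theta):
    return math.exp(-0.5 * theta**2)

def metropolis(n_samples, proposal_sd=1.0, start=0.0):
    samples = []
    current = start
    for _ in range(n_samples):
        # Propose a new value from a symmetric (normal) jumping distribution
        proposal = random.gauss(current, proposal_sd)
        # Accept with probability min(1, ratio of target densities);
        # the unknown normalizing constant cancels in the ratio
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(current)
        if random.random() < ratio:
            current = proposal
        samples.append(current)
    return samples

draws = metropolis(20000)
mean = sum(draws) / len(draws)
print(round(mean, 2))  # should be near 0, the mean of the target
```

The chain wanders through parameter space, but in the long run the draws behave like samples from the posterior, so summaries (means, intervals) can be computed from them directly.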
1.2 Motivations for Using Bayesian Methods
Based on my personal experience, Bayesian methods are used quite often in statistics and related departments because they are consistent and coherent. In contrast, in frequentist statistics a new, often ad hoc procedure must be developed for each new problem. In the Bayesian framework, as long as you can formulate a model, you run the analysis the same way you would for simpler problems—“turning the Bayesian crank,” as Bayesians say—and the remaining difficulties tend to be technical rather than theoretical, usually solvable with more computational power.
Social and behavioral scientists have been relatively slow to adopt Bayesian methods, but things have been changing. Van De Schoot et al. (2017) reviewed psychology papers published between 1990 and 2015 and found that whereas fewer than 10% of the papers from 1990 to 1996 mentioned “Bayesian,” the proportion increased steadily, reaching close to 45% of psychology papers in 2015. Among studies using Bayesian methods, more than a quarter cited computational problems (e.g., nonconvergence) in frequentist methods as a reason, and about 13% cited the need to incorporate prior knowledge into the estimation process. Other reasons included the flexibility of Bayesian methods for complex and nonstandard problems, and the use of techniques traditionally associated with Bayesian statistics, such as handling missing data and model comparison.
1.2.1 Problem with classical (frequentist) statistics
The rise of Bayesian methods is also related to the statistical reform movement of the past two decades. The problem is that applied researchers are obsessed with \(p < .05\) and often misinterpret a small \(p\)-value as something it is not (read Gigerenzer, 2004). Some scholars coined the term \(p\)-hacking for the practice of obtaining statistical significance by choosing, consciously or subconsciously, to analyze the data in a particular way (e.g., dichotomizing at the mean or median, testing the same hypothesis with different measures of the same variable). This is closely related to the recent “replication crisis” in scientific research, with psychology at the center of the scrutiny.
Bayesian statistics is no panacea for these problems. Indeed, if misused, it can give rise to the same problems as statistical significance testing. My goal in this class is to help you appreciate the Bayesian tradition of embracing the uncertainty in your results, and to adopt rigorous model checking and comprehensive reporting rather than relying merely on a \(p\)-value. I see this as the most important mission for someone teaching statistics.
1.3 Comparing Bayesian and Frequentist Statistics
| Attributes | Frequentist | Bayesian |
|---|---|---|
| Interpretation of probability | Frequentist (long-run relative frequency) | Subjectivist (degree of belief) |
| Uncertainty | How estimates vary in repeated sampling from the same population | How much prior beliefs about parameters change in light of data |
| What’s relevant? | Current data set + all that might have been observed | Only the data set that is actually observed |
| How to proceed with analyses | MLE; ad hoc, depends on the problem | “Turning the Bayesian crank” |
1.4 Software for Bayesian Statistics
The following summarizes some widely used Bayesian software; currently, JAGS and Stan are the most popular. General statistical programs such as SPSS, SAS, and Stata also offer some support for Bayesian analyses.
- WinBUGS
  - **B**ayesian inference **U**sing **G**ibbs **S**ampling
  - Free, and the most popular option until the late 2000s; many Bayesian scholars still use WinBUGS
  - No longer under active development
  - One can communicate from R to WinBUGS using the package `R2WinBUGS`
- JAGS
  - **J**ust **A**nother **G**ibbs **S**ampler
  - Very similar to WinBUGS, but written in C++ and supports user-defined functionality
  - Cross-platform compatibility
  - One can communicate from R to JAGS using the packages `rjags` or `runjags`
- Stan
  - Named in honour of Stanislaw Ulam, one of the inventors of the Markov chain Monte Carlo method
  - Uses newer sampling algorithms that are different from Gibbs sampling
  - Under very active development
  - Can interface with R through the package `rstan`; the R packages `rstanarm` and `brms` automate the procedure for fitting many commonly used models in Stan
Price was another important figure in mathematics and philosophy; he took Bayes’s theorem and applied it to insurance and moral philosophy.↩︎
See the paper by John Aldrich on this.↩︎