Overview

1 Introduction to Bayesian statistics: Representing our knowledge and uncertainty with probabilities

Bayesian statistics provides a clear, quantitative language for representing what we know and don’t know, and for making decisions under uncertainty. The chapter motivates probability through everyday choices—like checking a weather app—showing why deterministic yes/no predictions are inadequate and how probabilistic outputs let people act according to their own risk tolerance. It introduces random variables and the trade-off between coarse and fine-grained modeling (binary, categorical, continuous), emphasizing that more informative, probabilistic models are often more useful and trustworthy, especially in noisy, data-scarce, or high-stakes settings.

The Bayesian viewpoint treats unknowns as quantities with probability distributions that encode our beliefs, controlled by parameters and summarized by measures like expected values or event probabilities. Crucially, Bayesians update beliefs with data: a prior combined with evidence yields a posterior that reflects both initial knowledge and what was observed—mirroring how people naturally revise opinions. The chapter contrasts this with frequentism, where probability is long-run frequency, uncertainty arises from data-generating randomness, and methods are often computationally lighter. It argues that Bayesian “subjectivity” via priors is a strength: it makes assumptions explicit, leverages domain knowledge, and helps when data are limited, while frequentist tools can be preferable when data are abundant or objectives are well served by established techniques.

As a modern application, the chapter connects Bayesian thinking to large language models, framing next-word prediction as conditional probability over many plausible continuations. While practical LLMs use approximations rather than full Bayesian posteriors for tractability, probabilistic generation enables multiple high-quality candidates and supports refinement through user feedback. The chapter closes by outlining the book’s path: an intuitive, visual build-up from priors, data, and posteriors to core Bayesian techniques (model comparison, hierarchical modeling, Monte Carlo, variational inference), specialized models for sequences and neural networks, and finally Bayesian decision theory for principled action under uncertainty.

Figure: An illustration of machine learning models without probabilistic reasoning capabilities being susceptible to noise and making overconfident wrong predictions.
Figure: An example categorical distribution for rainfall rate.

Summary

  • We need probability to model phenomena in the real world whose outcomes we haven’t observed.
  • With Bayesian probability, we use probability to represent our personal belief about an unknown quantity, which we model using a random variable.
  • From a Bayesian belief about a quantity of interest, we can compute summaries, such as expected values and event probabilities, that represent our knowledge and uncertainty about that quantity.
  • There are three main components to a Bayesian model: the prior distribution, the data, and the posterior distribution. The posterior results from combining the prior with the data and is what we ultimately want out of a Bayesian model.
  • Bayesian probability is useful when we want to incorporate prior knowledge into a model, when data is limited, and for decision-making under uncertainty.
  • A different interpretation of probability, frequentism, views probability as the long-run frequency of an event under infinitely many repetitions, which limits its application to phenomena that cannot, even in principle, be repeated.
  • Large language models, which power popular chat-based artificial intelligence assistants, predict the next word in a sentence using conditional probabilities, in the spirit of Bayesian modeling.

FAQ

Why do we need probability instead of simple yes/no predictions?
Because the world is uncertain and predictions are rarely 100% accurate. Probabilities convey both the possible outcomes and how confident we are in each, enabling better decisions for different risk preferences (for example, whether to bring an umbrella at 10% vs 40% chance of rain).

What is a random variable in this chapter’s context?
A random variable is a numerical representation of an uncertain quantity. For weather, a binary random variable can model “rain today” (0 = no, 1 = yes), while a categorical or continuous variable can model “how much it rains.”

How do binary, categorical, and continuous variables differ, and why choose one over another?
  • Binary captures yes/no outcomes (simple but coarse).
  • Categorical discretizes values into labeled bins (balanced granularity).
  • Continuous allows any real value (most detailed but often costlier to compute).
The choice trades off granularity, usefulness, and practicality.

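As a rough sketch of this trade-off, the snippet below writes the same rainfall quantity at each level of granularity; the probabilities, bin edges, and the exponential density are made-up illustrations, not values from the chapter.

```python
import math

# Binary: does it rain today at all? (0 = no, 1 = yes)
binary_rain = {0: 0.7, 1: 0.3}

# Categorical: rainfall rate discretized into labeled bins.
categorical_rain = {
    "none": 0.7,
    "light (< 2 mm)": 0.2,
    "moderate (2-10 mm)": 0.08,
    "heavy (> 10 mm)": 0.02,
}

# Continuous: a density over rainfall in millimeters instead of a finite table
# (an exponential density here, purely for illustration).
def rainfall_density(x_mm, rate=0.5):
    return rate * math.exp(-rate * x_mm) if x_mm >= 0 else 0.0

# Both discrete models are valid distributions: their probabilities sum to 1.
print(round(sum(binary_rain.values()), 10), round(sum(categorical_rain.values()), 10))
```
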
What is a probability distribution and how does the Bernoulli distribution fit in?
A probability distribution assigns likelihoods to the possible values of a random variable. The Bernoulli distribution models a single trial with two outcomes (success = 1, failure = 0) using a parameter p, where p = Pr(success) and 1 − p = Pr(failure).

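A minimal Python illustration of a Bernoulli variable, using an arbitrary p = 0.3 as the assumed chance of rain:

```python
import random

p = 0.3  # assumed Pr(rain today), i.e., Pr(X = 1)

def bernoulli_pmf(x, p):
    """Probability mass of a Bernoulli(p) variable at x in {0, 1}."""
    return p if x == 1 else 1 - p

print(bernoulli_pmf(1, p))  # 0.3 -> Pr(success)
print(bernoulli_pmf(0, p))  # 0.7 -> Pr(failure)

# Drawing one sample: rain (1) with probability p, no rain (0) otherwise.
sample = 1 if random.random() < p else 0
print(sample)
```
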
What does “expected value” mean, and why is it not just the simple average?
The expected value is a probability-weighted average of all possible outcomes. It emphasizes more likely outcomes and summarizes the distribution’s central tendency; it is not necessarily the most likely or typical realized value.

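A small worked example, with illustrative rainfall amounts and probabilities, showing how the probability-weighted average differs from the simple average of the outcomes:

```python
# Expected value as a probability-weighted average (illustrative numbers).
values = [0, 1, 5, 15]           # representative rainfall amounts in mm
probs  = [0.7, 0.2, 0.08, 0.02]  # their probabilities (sum to 1)

expected = sum(v * p for v, p in zip(values, probs))
simple_average = sum(values) / len(values)

print(expected)        # ~0.9 mm: weighted toward the likely "no rain" outcome
print(simple_average)  # 5.25 mm: treats every outcome as equally likely
```
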
How does Bayesian updating work (prior, data, posterior)?
Start with a prior distribution (initial belief), observe data (evidence), and combine them to form the posterior distribution (updated belief), often written as Pr(X | D). For example, seeing dark clouds can shift a prior that favors “no rain” toward a posterior that assigns a higher chance of rain.

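A minimal numerical sketch of this update for the dark-clouds example, with every probability chosen purely for illustration:

```python
# Bayes'-rule update for the dark-clouds example (all numbers are assumptions).
prior_rain = 0.2              # Pr(rain) before looking outside
p_clouds_given_rain = 0.9     # Pr(dark clouds | rain)
p_clouds_given_no_rain = 0.3  # Pr(dark clouds | no rain)

# Total probability of observing dark clouds (the evidence).
p_clouds = (p_clouds_given_rain * prior_rain
            + p_clouds_given_no_rain * (1 - prior_rain))

# Posterior: Pr(rain | dark clouds) = Pr(clouds | rain) * Pr(rain) / Pr(clouds).
posterior_rain = p_clouds_given_rain * prior_rain / p_clouds
print(round(posterior_rain, 3))  # 0.429: belief in rain rose from the 0.2 prior
```
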
How is Bayesian probability different from frequentist probability?
Bayesian: probability quantifies belief about unknowns and updates with data (prior → posterior). Frequentist: probability is the long-run frequency of outcomes under repeated sampling; the parameter is fixed and uncertainty comes from the data-generation process. With lots of data, both can yield similar answers, but their interpretations differ.

Is the subjectivity of priors a drawback?
Not necessarily. Priors transparently encode domain knowledge, improve decisions with limited data, and make assumptions explicit for scrutiny. Different priors can lead to different conclusions, but this flexibility is often a strength, not a flaw.

Are neural network “confidence scores” true probabilities?
Typical classifier scores (e.g., softmax outputs) are normalized for convenience but are not inherently calibrated probabilities. Such models can be overconfident on wrong predictions, motivating probabilistic reasoning and calibration techniques.

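A small illustration of this point, using hypothetical logits: rescaling the logits changes the reported confidence without changing which class is predicted, which is one reason softmax scores should not be read as calibrated probabilities.

```python
import math

def softmax(logits):
    """Normalize raw scores into values that sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # hypothetical class scores
print(softmax(logits))                   # ~[0.63, 0.23, 0.14]
print(softmax([3 * z for z in logits]))  # ~[0.94, 0.05, 0.01] -- same winner, far "surer"
```
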
What do large language models (LLMs) have to do with Bayesian ideas?
LLMs perform next-word prediction using conditional probabilities (e.g., Pr(next word | context, data)). While not fully Bayesian (full posteriors over all words are computationally prohibitive), they operate “in the spirit” of Bayesian modeling by scoring multiple plausible continuations and leveraging data-driven likelihoods; generating multiple likely options also supports user-feedback refinement.

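A toy sketch of next-word prediction as sampling from a conditional distribution; the vocabulary and probabilities are made up, and a real LLM computes these scores with a neural network rather than a lookup table:

```python
import random

# Pr(next word | context): a tiny lookup table standing in for an LLM's output.
context = "the weather today is"
next_word_probs = {"sunny": 0.4, "rainy": 0.3, "cloudy": 0.2, "uncertain": 0.1}

# Sampling (rather than always taking the single most likely word) is what
# produces multiple plausible continuations to choose among or refine.
words, probs = zip(*next_word_probs.items())
print(random.choices(words, weights=probs, k=3))  # e.g. ['sunny', 'rainy', 'sunny']
```
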
