Overview

1 Introduction to Bayesian statistics: Representing our knowledge and uncertainty with probabilities

Bayesian statistics is introduced as a practical language for reasoning and making decisions under uncertainty. Using examples like weather prediction, the chapter shows why purely yes/no forecasts are inadequate and how probabilistic outputs better capture real-world ambiguity and support different risk preferences. It frames uncertainty with random variables—binary, categorical, or continuous—highlighting the value of finer granularity for usefulness and trust, and motivates probability as essential not only for everyday choices but also for robust machine learning, where non-probabilistic models can be confidently wrong.

The Bayesian viewpoint represents unknowns with probability distributions that encode belief, then updates those beliefs as evidence arrives. Through simple cases like a Bernoulli model for “will it rain” and categorical distributions for “how much rain,” the text illustrates parameters, expected values, and how conditional probabilities summarize what we know. Central to Bayesian modeling are the prior (initial belief), data (evidence), and posterior (updated belief), an approach that mirrors common-sense reasoning while exposing a tradeoff between model granularity and computational cost.

Contrasting Bayesian and frequentist perspectives, the chapter explains that Bayesians treat parameters as uncertain and data as evidence to update beliefs, while frequentists view parameters as fixed and uncertainty as arising from data-generating variability across repeated trials. It discusses the subjectivity of priors as a transparent and often beneficial feature—especially with limited data—and notes that both paradigms can converge when data are abundant. As a contemporary application, large language models are framed as probabilistic next-word predictors whose conditioning and multiple likely outputs reflect Bayesian thinking in spirit, even if full posteriors are computationally infeasible. The chapter closes by positioning Bayesian methods as intuitive, flexible, and powerful foundations for modeling and decision-making across real-world tasks.

An illustration of how machine learning models without probabilistic reasoning capabilities are susceptible to noise and can overconfidently make wrong predictions.
An example categorical distribution for rainfall rate.

Summary

  • We need probability to model phenomena in the real world whose outcomes we haven’t observed.
  • With Bayesian probability, we use probability to represent our personal belief about an unknown quantity, which we model using a random variable.
  • From a Bayesian belief about a quantity of interest, we can compute summaries, such as probabilities and expected values, that represent our knowledge and uncertainty about it.
  • There are three main components to a Bayesian model: the prior distribution, the data, and the posterior distribution. The posterior results from combining the prior with the data and is what we want out of a Bayesian model.
  • Bayesian probability is useful when we want to incorporate prior knowledge into a model, when data is limited, and for decision-making under uncertainty.
  • A different interpretation of probability, frequentism, views probability as the long-run frequency of an event under infinite repeats, which makes probability hard to apply to one-off events that cannot be repeated.
  • Large language models, which power popular chat artificial intelligence applications, predict the next word in a sentence with conditional probabilities, reflecting Bayesian thinking in spirit.

FAQ

Why do we need probability instead of simple yes/no predictions?
Because the world is uncertain. Probabilities quantify how unsure we are and let different people make different choices based on their risk tolerance. A yes/no output hides uncertainty and can lead to overconfident, costly mistakes (for example, bringing or skipping an umbrella, or overconfident ML predictions).

What do the three weather app designs illustrate?
They show increasing granularity of probabilistic modeling: (1) a non-probabilistic yes/no app; (2) a probabilistic app that reports the chance of rain; (3) a more granular app that assigns probabilities to specific rainfall amounts. Each more detailed app is a generalization of the simpler one and carries more actionable information.

What is a random variable, and how do binary, categorical, and continuous variables differ?
A random variable numerically represents an uncertain quantity. Binary variables take two values (for example, rain: 0/1). Categorical variables take one of several predefined values (for example, discretized rainfall rates). Continuous variables can take any value in a range (for example, exact rainfall rate).

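As a small sketch of a categorical random variable, the snippet below samples from a belief over discretized rainfall rates; the outcome values and probabilities are illustrative assumptions, not numbers from the chapter:

```python
import random

# Hypothetical categorical belief over discretized rainfall rates (mm/day);
# these outcomes and probabilities are illustrative, not real data.
outcomes = [0, 1, 5, 10]
probs = [0.5, 0.3, 0.15, 0.05]  # must sum to 1

def categorical_sample(outcomes, probs, rng=random):
    """Draw one outcome according to the given probabilities."""
    return rng.choices(outcomes, weights=probs, k=1)[0]

sample = categorical_sample(outcomes, probs)  # one of 0, 1, 5, or 10
```

A binary variable is just the special case with two outcomes, and a continuous variable would replace the finite list with a density over a range.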
What is the Bernoulli distribution and what does its parameter represent?
Bernoulli models a single binary event. Its parameter p is the probability that the event equals 1 (for example, rain), and 1−p is the probability it equals 0 (no rain). Choosing p encodes your belief about the event's likelihood.

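A minimal Bernoulli sketch in plain Python; the value of p here is an illustrative assumption:

```python
import random

p_rain = 0.3  # illustrative belief: a 30% chance of rain

# P(rain = 1) = p and P(rain = 0) = 1 - p, so the two probabilities sum to 1.
def bernoulli_sample(p, rng=random):
    """Draw one sample from Bernoulli(p): 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

outcome = bernoulli_sample(p_rain)  # 1 means rain, 0 means no rain
```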
What is the expected value and how should I interpret it?
The expected value is the probability-weighted average of all possible outcomes. It summarizes a distribution with a single number, but it is not necessarily the most likely outcome. You compute it by summing value × probability over all possibilities.

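For example, with a made-up categorical belief over daily rainfall amounts, the expected value is the probability-weighted sum:

```python
# Hypothetical belief over daily rainfall in millimetres (illustrative numbers).
rainfall_dist = {0: 0.5, 1: 0.3, 5: 0.15, 10: 0.05}

# Expected value: sum of value * probability over all possible outcomes.
expected_rainfall = sum(value * prob for value, prob in rainfall_dist.items())
print(expected_rainfall)  # about 1.55 mm, which is not itself a possible outcome
```

Note that 1.55 mm is not one of the four outcomes, and the most likely outcome here is 0 mm: the expected value summarizes the distribution but is not the most probable value.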
How does Bayesian updating work (prior → data → posterior)?
You start with a prior distribution (initial belief), observe data (evidence), and combine them to produce a posterior distribution—your updated belief conditioned on the data. For example, a low prior chance of rain may increase after you see dark clouds.

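The rain-and-clouds update can be sketched with Bayes' rule; all the numbers below are illustrative assumptions, not figures from the chapter:

```python
# Illustrative numbers for the rain-and-clouds example.
prior_rain = 0.2            # P(rain): a low prior chance of rain
p_clouds_given_rain = 0.9   # P(dark clouds | rain)
p_clouds_given_dry = 0.3    # P(dark clouds | no rain)

# Bayes' rule: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
p_clouds = (p_clouds_given_rain * prior_rain
            + p_clouds_given_dry * (1 - prior_rain))
posterior_rain = p_clouds_given_rain * prior_rain / p_clouds

# Seeing dark clouds raises the belief in rain from 0.2 to roughly 0.43.
```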
How do Bayesian and frequentist interpretations of probability differ?
Bayesian: probability is a degree of belief about unknown quantities; the unknowns are treated as random and updated with data. Frequentist: probability is long-run frequency from repeated sampling; parameters are fixed, and randomness comes from data collection. With abundant data, conclusions often coincide.

Why might two Bayesians disagree given the same data, and is that a feature or a bug?
They may use different priors, leading to different posteriors. This subjectivity is often a feature: priors encode domain knowledge, improve performance with limited data, and make assumptions explicit and transparent for scrutiny.

When should I use Bayesian methods versus frequentist methods?
Use Bayesian methods when you have useful prior knowledge, limited data, or need tailored decision analysis under uncertainty. Use frequentist methods when data are abundant and standard, computationally simple tools (for example, classical hypothesis tests, A/B tests) fit the objective.

How do large language models (LLMs) relate to Bayesian probability?
LLMs perform next-word prediction via conditional probabilities given context and training data. They're Bayesian in spirit but not fully Bayesian—computing full posteriors over all words is too costly—so they approximate by focusing on likely candidates and sampling to produce multiple plausible outputs (useful for user feedback and refinement). Note: softmax "confidences" are normalized scores and shouldn't be treated as calibrated, true probabilities.
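To illustrate that last caveat, a softmax turns raw scores into values that sum to 1 without making them calibrated probabilities; the candidate words and scores below are made up:

```python
import math

def softmax(logits):
    """Normalize raw scores into a distribution that sums to 1."""
    m = max(logits)                           # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word scores for three candidate words.
candidates = ["sunny", "rainy", "cloudy"]
scores = [2.0, 1.0, 0.5]
probs = softmax(scores)  # sums to 1, but is not a calibrated posterior
```

The outputs behave like probabilities arithmetically, which is exactly why they are easy to mistake for calibrated beliefs.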
