Statistics Every Programmer Needs

Practical Python implementations and quantitative methods

Gary Sutton

July 2025
ISBN 9781633436053
448 pages

printed in black & white

Table of contents

1 Laying the groundwork

1.1 Stats and quant

1.1.1 Understanding the basics

1.1.2 Why they matter

1.1.3 The broader effect

1.1.4 Diving deeper: Core concepts

1.2 Why Python?

1.2.1 Rich ecosystem

1.2.2 Ease of learning

1.2.3 Online support and community

1.2.4 Industry adoption

1.2.5 Versatility

1.3 Python IDEs

1.3.1 IDLE: A starting point

1.3.2 PyCharm: A professional tool

1.3.3 Other popular IDEs

1.4 Benefits and learning approach

1.4.1 From statistical measures to real-world application

1.4.2 Expanding beyond traditional techniques

1.4.3 A balanced approach to theory and practice

1.5 How this book works

1.5.1 Foundational learning with exploration and practice

1.5.2 Using Python for precision and efficiency

1.5.3 Adaptable learning for diverse skill levels

1.6 What this book does not cover

2 Exploring probability and counting

2.1 Basic probabilities

2.1.1 Probability types

2.1.2 Converting and measuring probabilities

2.2 Counting rules

2.2.1 Multiplication rule

2.2.2 Addition rule

2.2.3 Combinations and permutations

2.3 Continuous random variables

2.3.1 Examples

2.3.2 Probability density function

2.3.3 Cumulative distribution function

2.4 Discrete random variables

2.4.1 Examples

2.4.2 Probability mass function

2.4.3 Cumulative distribution function

3 Exploring probability distributions and conditional probabilities

3.1 Probability distributions

3.1.1 Normal distribution

3.1.2 Binomial distribution

3.1.3 Discrete uniform distribution

3.1.4 Poisson distribution

3.2 Probability problems

3.2.1 Complement rule for probability

3.2.2 Quick reference guide

3.2.3 Applied probability: Examples and solutions

3.3 Conditional probabilities

3.3.1 Examples

3.3.2 Conditional probabilities and independence

3.3.3 Intuitive approach to conditional probability

3.3.4 Formulaic approach to conditional probability

4 Fitting a linear regression

4.1 Primer on linear regression

4.1.1 Linear equation

4.1.2 Goodness of fit

4.1.3 Conditions for best fit

4.2 Simple linear regression

4.2.1 Importing and exploring the data

4.2.2 Fitting the model

4.2.3 Interpreting and evaluating the results

4.2.4 Testing model assumptions

5 Fitting a logistic regression

5.1 Logistic regression vs. linear regression

5.2 Multiple logistic regression

5.2.1 Importing and exploring the data

5.2.2 Fitting the model

5.2.3 Interpreting and evaluating the results

5.2.4 Calculating and evaluating classification metrics

6 Fitting a decision tree and a random forest

6.1 Understanding decision trees and random forests

6.2 Importing, wrangling, and exploring the data

6.2.1 Understanding the data

6.2.2 Wrangling the data

6.2.3 Exploring the data

6.3 Fitting a decision tree

6.3.1 Splitting the data

6.3.2 Fitting the model

6.3.3 Predicting responses

6.3.4 Evaluating the model

6.3.5 Plotting the decision tree

6.3.6 Interpreting and understanding decision trees

6.3.7 Advantages and disadvantages of decision trees

6.4 Fitting a random forest

6.4.1 Fitting the model

6.4.2 Predicting responses

6.4.3 Evaluating the model

6.4.4 Feature importance

6.4.5 Extracting random trees

7 Fitting time series models

7.1 Distinguishing forecasts from predictions

7.2 Importing and plotting the data

7.2.1 Fetching financial data

7.2.2 Understanding the data

7.2.3 Plotting the data

7.3 Fitting an ARIMA model

7.3.1 Autoregression (AR) component

7.3.2 Integration (I) component

7.3.3 Moving average (MA) component

7.3.4 Combining ARIMA components

7.3.5 Stationarity

7.3.6 Differencing

7.3.7 Stationarity and differencing applied

7.3.8 AR and MA components

7.3.9 Fitting the model

7.3.10 Evaluating model fit

7.3.11 Forecasting

7.4 Fitting exponential smoothing models

7.4.1 Model structure

7.4.2 Applicability

7.4.3 Mathematical properties

7.4.4 Types of exponential smoothing models

7.4.5 Choosing between ARIMA and exponential smoothing

7.4.6 SES and DES models

7.4.7 Holt–Winters model

8 Transforming data into decisions with linear programming

8.1 Problem formulation

8.1.1 The scenario

8.1.2 The challenge

8.1.3 The approach

8.1.4 Feature summaries

8.2 Developing the linear optimization framework

8.2.1 Explanation of linear equations and inequalities

8.2.2 Data definition

8.2.3 Objective function

8.2.4 Constraints

8.2.5 Decision variable bounds

8.2.6 Solving the linear programming problem

8.2.7 Result evaluation

9 Running Monte Carlo simulations

9.1 Applications and benefits of Monte Carlo simulations

9.2 Step-by-step process

9.3 Hands-on approach

9.3.1 Establishing a probability distribution (step 1)

9.3.2 Computing a cumulative probability distribution (step 2)

9.3.3 Establishing an interval of random numbers for each variable (step 3)

9.3.4 Generating random numbers (step 4)

9.3.5 Simulating a series of trials (step 5)

9.3.6 Analyzing the results (step 6)

9.4 Automating simulations on discrete data

9.4.1 Plotting and analyzing the results

9.5 Automating simulations on continuous data

9.5.1 Predicting stock prices with Monte Carlo simulations

9.5.2 Analyzing historical data (step 1)

9.5.3 Calculating log returns (step 2)

9.5.4 Computing statistical parameters (step 3)

9.5.5 Generating random daily returns (step 4)

9.5.6 Simulating prices (step 5)

9.5.7 Simulating multiple trials (step 6)

9.5.8 Analyzing the results (step 7)

10 Building and plotting a decision tree

10.1 Decision-making without probabilities

10.1.1 Maximax method

10.1.2 Maximin method

10.1.3 Minimax Regret method

10.1.4 Expected Value method

10.2 Decision trees

10.2.1 Creating the schema

10.2.2 Plotting the tree

11 Predicting future states with Markov analysis

11.1 Understanding the mechanics of Markov analysis

11.2 States and state probabilities

11.2.1 Understanding the vector of state probabilities for multistate systems

11.2.2 Matrix of transition probabilities

11.3 Equilibrium conditions

11.3.1 Predicting equilibrium conditions programmatically

11.4 Absorbing states

11.4.1 Obtaining the fundamental matrix

11.4.2 Predicting absorbing states

11.4.3 Predicting absorbing states programmatically

12 Examining and testing naturally occurring number sequences

12.1 Benford’s law explained

12.2 Naturally occurring number sequences

12.3 Uniform and random distributions

12.3.1 Uniform distribution

12.3.2 Random distribution

12.3.3 Plotted distributions

12.4 Examples

12.4.1 Street addresses

12.4.2 World population figures

12.4.3 Payment amounts

12.5 Validating Benford’s law

12.5.1 Chi-square test

12.5.2 Mean absolute deviation

12.5.3 Distortion factor and z-statistic

12.5.4 Mantissa statistics

13 Managing projects

13.1 Creating a work breakdown structure

13.2 Estimating activity times with PERT

13.3 Finding the critical path

13.3.1 Earliest times

13.3.2 Latest times

13.3.3 Slack

13.3.4 Finding the critical path programmatically

13.4 Estimating the probability of project completion

13.5 Crashing the project

14 Visualizing quality control

14.1 Quality control measures

14.1.1 Upper control limit and lower control limit

14.1.2 Mean and center line

14.1.3 Standard deviation

14.1.4 Range

14.1.5 Sample size

14.1.6 Proportion defective

14.1.7 Number of defective items

14.1.8 Number of defects

14.1.9 Defects per unit

14.1.10 Moving range

14.1.11 z-score

14.1.12 Process capability indices

14.2 Control charts for attributes

14.2.1 p-charts

14.2.2 np-charts

14.2.3 c-charts

14.2.4 g-charts

14.3 Control charts for variables

14.3.1 x-bar charts

14.3.2 r-charts

14.3.3 s-charts

14.3.4 I-MR charts

14.3.5 EWMA charts

Overview

3 Exploring probability distributions and conditional probabilities

This chapter deepens the journey from random variables to practical reasoning under uncertainty by focusing on core probability distributions and conditional probabilities. It builds on prior fundamentals to explain how distributions model real-world phenomena and how programmers can translate those ideas into precise computations. Throughout, the text balances intuition with implementation, showing how to move smoothly between theory, visualization, and executable code.

The discussion centers on four widely used distributions. For the normal distribution, it distinguishes density from probability, highlights the 68–95–99.7 rule, and shows how to compute probabilities via standardization and the cumulative distribution function. The binomial distribution is framed as a model for repeated binary trials, with parameters n and p determining its mean and spread, and a shape that often resembles the normal distribution under common conditions; computations use the probability mass function directly or via library routines. The discrete uniform distribution emphasizes equal likelihood across a finite set and its implications for simulation and fairness. The Poisson distribution models event counts over fixed intervals using a single rate parameter λ (equal to both mean and variance), with shapes that evolve from right-skewed to approximately normal as λ grows. A companion section on probability problems reinforces essential rules—the complement rule, multiplication and addition rules, odds, permutations, and combinations—through concise, realistic exercises.

The final part introduces conditional probability as reasoning with updated information: narrowing the sample space changes the denominator and, consequently, the probability. Through accessible examples (weather, medical testing, traffic, sports, and investing), the chapter contrasts dependence with independence and demonstrates two complementary solution strategies: an intuitive contingency-table approach and the formulaic P(A|B) = P(A and B) / P(B). Programmatic workflows show how to extract counts and compute conditional probabilities directly from tabulated data. Together, these tools equip readers to compute, combine, and condition probabilities with confidence in real-world analytical settings.

A 2 x 2 grid of normal distributions where each plot shares the same x-axis and y-axis scales. The mean equals 0 throughout and the standard deviation equals 5, 10, 15, or 20. Increases in the standard deviation translate to greater dispersion from the mean, flatter distributions, and wider tails. In other words, the probability density function spreads across a larger range of values, and the distribution is broader and less peaked around the mean. The bell-like shape nevertheless holds throughout, with values distributed symmetrically around the mean.

A standard normal distribution is a normal distribution where the mean is equal to 0, the standard deviation is equal to 1, and the values have been standardized from their raw form. The probability density function peaks at the mean, where its height is 1/√(2π), or approximately 0.399.

The top of a typical z-score table. The probability (the area to the left of a particular standardized value) is found where the row for the integer part and first decimal place of the value intersects the column for its second decimal place.

A first pair of binomial distributions where the probability of success equals 0.20 (on the left) and 0.50 (on the right), given 20 independent trials. Although the distributions are binomial, their shapes are nonetheless approximately normal.

A second pair of binomial distributions where the probability of success equals 0.75 (on the left) and 0.90 (on the right), given 20 independent trials. The distribution mean shifts with the probability of success, but the binomial distributions otherwise keep a roughly bell-like look, with a longer left tail as the probability of success approaches 1.

A typical discrete uniform distribution, where each value of the discrete random variable has the same probability of occurrence. Whatever the lower and upper bounds are, the discrete uniform distribution always assumes this rectangular shape.

Four Poisson distributions, distinguished by their rate parameters, plotted in a single graph. As the rate parameter increases, the distribution shifts from right-skewed to approximately normal.

Summary

The normal distribution is a probability distribution symmetric around the mean so that, when plotted, it assumes a bell-like shape; it is defined by two parameters: the mean, which determines the center of the distribution, and the standard deviation, which determines the spread.
A standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. The raw data has been transformed, or standardized, so that the values represent the number of standard deviations they are below or above the mean.
The binomial distribution describes the probability of a specific number of successes in a fixed number of independent trials, where each trial has only two outcomes and the probabilities of those outcomes remain constant throughout; it is characterized by two parameters: the number of trials and the probability of success. When the number of trials is large and the probability of success is not too close to 0 or 1, binomial outcomes are approximately normally distributed.
The discrete uniform distribution is a probability distribution where all outcomes in a finite set of values have an equal probability of occurring; it is characterized by the lower and upper bounds of the set, which together fix the number of possible outcomes.
The Poisson distribution is a probability distribution that describes the number of events occurring within a fixed interval of time or space, given a known average rate of occurrence and assuming independence between events; it is characterized by a single parameter: the rate of occurrence.
When computing probabilities, it is often necessary to combine several probability and counting rules to accurately assess the likelihood of events occurring.
In some scenarios, it is simpler to compute the complement of an event (the probability that it does not occur) and subtract that from 1, which provides an alternative route to the same probability.
Conditional probabilities represent the likelihood of an event occurring given that another event has already occurred, thereby allowing for a deeper understanding of how events relate to each other. They are calculated by adjusting the probability of the event of interest based on the additional information provided by the occurrence of another event. In other words, the sample space is reduced, which shrinks the denominator and possibly the numerator as well. That shift can significantly influence subsequent decision-making and risk assessment.

FAQ

What’s the difference between a probability density function (PDF) and an actual probability for the normal distribution?

The PDF gives relative likelihood (density) at a point; it does not equal the probability of that exact value, which is 0 for any continuous distribution. To get an actual probability under a normal curve, integrate the PDF over an interval, which is what the cumulative distribution function (CDF) does. In practice, use a z-table or a function like norm.cdf to get areas/probabilities.
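
The distinction can be sketched in a few lines of Python. This is only an illustration, not the book's code: it assumes the standard library's statistics.NormalDist as a stand-in for a function like norm.cdf, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Density at a single point: a height on the curve, not a probability.
density_at_zero = standard.pdf(0.0)

# Probability of an interval: area under the curve, computed from the CDF.
prob_within_half = standard.cdf(0.5) - standard.cdf(-0.5)

# A very narrow normal shows that a density can exceed 1,
# which a probability never can.
narrow = NormalDist(mu=0.0, sigma=0.1)
density_narrow = narrow.pdf(0.0)

print(f"pdf(0) for N(0, 1)      = {density_at_zero:.4f}")
print(f"P(-0.5 <= Z <= 0.5)     = {prob_within_half:.4f}")
print(f"pdf(0) for N(0, 0.1)    = {density_narrow:.4f}")
```

The first value is a height of about 0.399, while the second is an area of about 0.383; only the area is a probability.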

What is the 68-95-99.7 (three-sigma) rule and why is it useful?

For a normal distribution, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. It’s a quick way to judge how unusual a value is and to reason about variability and risk when normality is a reasonable assumption.
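
The three percentages can be checked numerically. The sketch below assumes the standard library's statistics.NormalDist; the helper name share_within is hypothetical.

```python
from statistics import NormalDist


def share_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The rule does not depend on the particular mean or standard deviation.
shares = {k: share_within(k) for k in (1, 2, 3)}
shares_rescaled = {k: share_within(k, mu=100.0, sigma=15.0) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```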

How do I compute and use a z-score to find probabilities?

A z-score standardizes a value x by z = (x − μ) / σ. Once standardized, you can look up P(Z ≤ z) in a z-table or compute it via norm.cdf(z). To find the probability between two values, compute the two CDFs and subtract: P(a ≤ Z ≤ b) = CDF(b) − CDF(a).
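
A minimal sketch of that workflow follows. It assumes statistics.NormalDist from the standard library in place of a z-table or norm.cdf, and the exam-score figures (mean 70, standard deviation 10) are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations it sits from the mean."""
    return (x - mu) / sigma


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for a normal X, via CDF(z_b) - CDF(z_a)."""
    z_a = z_score(a, mu, sigma)
    z_b = z_score(b, mu, sigma)
    return STANDARD_NORMAL.cdf(z_b) - STANDARD_NORMAL.cdf(z_a)


# Hypothetical exam scores: mean 70, standard deviation 10.
z = z_score(85, mu=70, sigma=10)          # 1.5 standard deviations above the mean
p_below = STANDARD_NORMAL.cdf(z)           # P(X <= 85)
p_between = prob_between(60, 85, 70, 10)   # P(60 <= X <= 85)

print(f"z = {z:.2f}, P(X <= 85) = {p_below:.4f}, P(60 <= X <= 85) = {p_between:.4f}")
```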

When should I use the binomial distribution, and what are its key properties?

Use it for a fixed number of independent trials, each with two outcomes (success/failure) and constant success probability p. The PMF is P(X = k) = C(n, k) p^k (1 − p)^(n − k). Its mean is μ = np and standard deviation is σ = √(np(1 − p)).
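
The PMF, mean, and standard deviation translate directly into code. This is a sketch using only the standard library (math.comb); the function name binomial_pmf and the choice of n = 20, p = 0.5 are illustrative assumptions.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.5
prob_ten = binomial_pmf(10, n, p)       # probability of exactly 10 successes
mean = n * p                            # mu = np
sd = sqrt(n * p * (1 - p))              # sigma = sqrt(np(1 - p))
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))  # PMF sums to 1

print(f"P(X = 10) = {prob_ten:.4f}, mean = {mean}, sd = {sd:.4f}")
```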

Why do binomial histograms often look “normal”?

As the number of trials grows (and p is not extremely close to 0 or 1), the binomial distribution becomes approximately symmetric and bell-shaped. This is why binomial outcomes often resemble a normal curve in practice.
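
One way to see this is to measure skewness (0 for a perfectly symmetric distribution) directly from the PMF. The sketch below is an illustration rather than the book's method; the helper names are hypothetical and only the standard library is used.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_skewness(n: int, p: float) -> float:
    """Third standardized moment, computed by summing over the PMF."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    third_moment = sum((k - mean) ** 3 * binomial_pmf(k, n, p) for k in range(n + 1))
    return third_moment / sd**3


skew_20_half = binomial_skewness(20, 0.5)    # symmetric
skew_20_high = binomial_skewness(20, 0.9)    # noticeably left-skewed
skew_200_high = binomial_skewness(200, 0.9)  # same p, more trials: closer to 0

print(f"n=20,  p=0.5: {skew_20_half:+.4f}")
print(f"n=20,  p=0.9: {skew_20_high:+.4f}")
print(f"n=200, p=0.9: {skew_200_high:+.4f}")
```

With p = 0.9, the skewness moves toward 0 as n grows from 20 to 200, which is the sense in which more trials make the histogram look more normal.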

What defines a discrete uniform distribution and how do I get its probabilities?

In a discrete uniform distribution over integers a through b, every integer from a to b has equal probability 1 / (b − a + 1). The set of outcomes is finite, each outcome is equally likely, and successive draws are typically treated as independent. Its plot is “rectangular” because all bars have the same height.
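
A fair six-sided die (a = 1, b = 6) makes a convenient example. The sketch below uses exact fractions from the standard library; the function name discrete_uniform_pmf is hypothetical.

```python
from fractions import Fraction


def discrete_uniform_pmf(a: int, b: int) -> dict[int, Fraction]:
    """Map each integer from a to b to the probability 1 / (b - a + 1)."""
    n_outcomes = b - a + 1
    return {k: Fraction(1, n_outcomes) for k in range(a, b + 1)}


die = discrete_uniform_pmf(1, 6)
p_each = die[3]                                            # 1/6 for any face
p_at_most_two = sum(p for k, p in die.items() if k <= 2)   # P(X <= 2) = 1/3
mean = sum(k * p for k, p in die.items())                  # expected value 7/2

print(f"P(X = 3) = {p_each}, P(X <= 2) = {p_at_most_two}, mean = {mean}")
```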

What is the Poisson distribution and what does lambda (λ) represent?

The Poisson distribution models the count of events in a fixed interval when events occur independently at a constant average rate. The rate λ is both the mean and the variance (σ² = λ). For small λ the distribution is right-skewed; as λ increases it becomes more symmetric and can resemble a normal distribution.
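
The claim that λ is both the mean and the variance can be checked numerically. This sketch uses the standard Poisson PMF, P(X = k) = e^(−λ) λ^k / k!, with only the standard library; the rate of 3 events per interval is an arbitrary example.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.0                      # hypothetical average of 3 events per interval
p_two = poisson_pmf(2, lam)    # probability of exactly 2 events

# Sum over enough values of k that the remaining tail is negligible.
ks = range(100)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(f"P(X = 2) = {p_two:.4f}, mean = {mean:.4f}, variance = {variance:.4f}")
```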

How do I choose between binomial and Poisson models?

Use binomial for a fixed number of independent trials with success probability p. Use Poisson for counts over a time/space interval with an average rate λ and independence of occurrences. Poisson also approximates binomial when n is large and p is small (rare events).
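
The rare-event approximation can be illustrated by comparing the two PMFs side by side. The numbers here (n = 1000 trials, p = 0.003, so λ = np = 3) are assumptions chosen for the example, and the helper names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 1000, 0.003   # many trials, rare success
lam = n * p          # matching Poisson rate

# Largest pointwise difference between the two PMFs over k = 0..10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))

for k in range(6):
    print(f"k={k}: binomial={binomial_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")
print(f"largest gap for k in 0..10: {max_gap:.6f}")
```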

How do the multiplication, addition, and complement rules speed up probability problems?

- Multiplication rule: For independent events, P(A and B) = P(A)P(B).
- Addition rule: For mutually exclusive events, P(A or B) = P(A) + P(B).
- Complement rule: P(A) = 1 − P(Aᶜ). Often the fastest path (e.g., “at least one” = 1 − P(none)).
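
A short dice example shows the three rules in use. It is a sketch with exact fractions from the standard library, assuming a fair six-sided die and independent rolls.

```python
from fractions import Fraction

p_six = Fraction(1, 6)  # probability of a six on one fair roll

# Multiplication rule (independent events): a six on each of two rolls.
p_two_sixes = p_six * p_six

# Addition rule (mutually exclusive events): a 1 or a 2 on a single roll.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Complement rule: "at least one six in four rolls" = 1 - P(no six in four rolls).
p_no_six_in_four = (1 - p_six) ** 4
p_at_least_one_six = 1 - p_no_six_in_four

print(f"two sixes in a row:       {p_two_sixes}")
print(f"a 1 or a 2 on one roll:   {p_one_or_two}")
print(f"at least one six in four: {p_at_least_one_six} (about {float(p_at_least_one_six):.4f})")
```

Enumerating every way to get one, two, three, or four sixes would reach the same answer, but the complement needs only one multiplication.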

What is conditional probability and how do I compute it (intuitively and by formula)?

Conditional probability updates a probability given new information: P(A | B) = P(A and B) / P(B). Intuitively, reduce the sample space to cases where B is true (e.g., a contingency table) and take “successes within B” divided by “total within B.” By formula, use counts or probabilities to compute P(A and B) and P(B), then divide.
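
Both strategies can be sketched against the same contingency table. The counts below are invented for illustration (a hypothetical screening of 1,000 people), and the table is a plain dictionary rather than any particular library's data structure.

```python
from fractions import Fraction

# Hypothetical contingency table: (health status, test result) -> count.
counts = {
    ("disease", "positive"): 90,
    ("disease", "negative"): 10,
    ("healthy", "positive"): 45,
    ("healthy", "negative"): 855,
}
total = sum(counts.values())

# Intuitive approach: restrict the sample space to positive tests (event B),
# then take "disease within B" divided by "total within B".
positives = sum(n for (_, result), n in counts.items() if result == "positive")
p_intuitive = Fraction(counts[("disease", "positive")], positives)

# Formulaic approach: P(A | B) = P(A and B) / P(B).
p_a_and_b = Fraction(counts[("disease", "positive")], total)
p_b = Fraction(positives, total)
p_formula = p_a_and_b / p_b

# Independence check: A and B are independent only if P(A | B) equals P(A).
diseased = sum(n for (status, _), n in counts.items() if status == "disease")
p_disease = Fraction(diseased, total)
independent = p_formula == p_disease

print(f"P(disease | positive) = {p_intuitive} (table) = {p_formula} (formula)")
print(f"P(disease) = {p_disease}, independent: {independent}")
```

The two approaches agree because dividing both counts by the same total does not change their ratio; the gap between P(disease | positive) and P(disease) shows the events are dependent.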

eBook

pdf, ePub, online

$63.99 $38.39

you save $25.60 (40%)

include audio $24.99 $14.99
