Statistics Every Programmer Needs

Practical Python implementations and quantitative methods

Gary Sutton

July 2025
ISBN 9781633436053
448 pages

Included with a Manning Online subscription

printed in black & white

Table of contents

1 Laying the groundwork

1.1 Stats and quant

1.1.1 Understanding the basics

1.1.2 Why they matter

1.1.3 The broader effect

1.1.4 Diving deeper: Core concepts

1.2 Why Python?

1.2.1 Rich ecosystem

1.2.2 Ease of learning

1.2.3 Online support and community

1.2.4 Industry adoption

1.2.5 Versatility

1.3 Python IDEs

1.3.1 IDLE: A starting point

1.3.2 PyCharm: A professional tool

1.3.3 Other popular IDEs

1.4 Benefits and learning approach

1.4.1 From statistical measures to real-world application

1.4.2 Expanding beyond traditional techniques

1.4.3 A balanced approach to theory and practice

1.5 How this book works

1.5.1 Foundational learning with exploration and practice

1.5.2 Using Python for precision and efficiency

1.5.3 Adaptable learning for diverse skill levels

1.6 What this book does not cover

2 Exploring probability and counting

2.1 Basic probabilities

2.1.1 Probability types

2.1.2 Converting and measuring probabilities

2.2 Counting rules

2.2.1 Multiplication rule

2.2.2 Addition rule

2.2.3 Combinations and permutations

2.3 Continuous random variables

2.3.1 Examples

2.3.2 Probability density function

2.3.3 Cumulative distribution function

2.4 Discrete random variables

2.4.1 Examples

2.4.2 Probability mass function

2.4.3 Cumulative distribution function

3 Exploring probability distributions and conditional probabilities

3.1 Probability distributions

3.1.1 Normal distribution

3.1.2 Binomial distribution

3.1.3 Discrete uniform distribution

3.1.4 Poisson distribution

3.2 Probability problems

3.2.1 Complement rule for probability

3.2.2 Quick reference guide

3.2.3 Applied probability: Examples and solutions

3.3 Conditional probabilities

3.3.1 Examples

3.3.2 Conditional probabilities and independence

3.3.3 Intuitive approach to conditional probability

3.3.4 Formulaic approach to conditional probability

4 Fitting a linear regression

4.1 Primer on linear regression

4.1.1 Linear equation

4.1.2 Goodness of fit

4.1.3 Conditions for best fit

4.2 Simple linear regression

4.2.1 Importing and exploring the data

4.2.2 Fitting the model

4.2.3 Interpreting and evaluating the results

4.2.4 Testing model assumptions

5 Fitting a logistic regression

5.1 Logistic regression vs. linear regression

5.2 Multiple logistic regression

5.2.1 Importing and exploring the data

5.2.2 Fitting the model

5.2.3 Interpreting and evaluating the results

5.2.4 Calculating and evaluating classification metrics

6 Fitting a decision tree and a random forest

6.1 Understanding decision trees and random forests

6.2 Importing, wrangling, and exploring the data

6.2.1 Understanding the data

6.2.2 Wrangling the data

6.2.3 Exploring the data

6.3 Fitting a decision tree

6.3.1 Splitting the data

6.3.2 Fitting the model

6.3.3 Predicting responses

6.3.4 Evaluating the model

6.3.5 Plotting the decision tree

6.3.6 Interpreting and understanding decision trees

6.3.7 Advantages and disadvantages of decision trees

6.4 Fitting a random forest

6.4.1 Fitting the model

6.4.2 Predicting responses

6.4.3 Evaluating the model

6.4.4 Feature importance

6.4.5 Extracting random trees

7 Fitting time series models

7.1 Distinguishing forecasts from predictions

7.2 Importing and plotting the data

7.2.1 Fetching financial data

7.2.2 Understanding the data

7.2.3 Plotting the data

7.3 Fitting an ARIMA model

7.3.1 Autoregression (AR) component

7.3.2 Integration (I) component

7.3.3 Moving average (MA) component

7.3.4 Combining ARIMA components

7.3.5 Stationarity

7.3.6 Differencing

7.3.7 Stationarity and differencing applied

7.3.8 AR and MA components

7.3.9 Fitting the model

7.3.10 Evaluating model fit

7.3.11 Forecasting

7.4 Fitting exponential smoothing models

7.4.1 Model structure

7.4.2 Applicability

7.4.3 Mathematical properties

7.4.4 Types of exponential smoothing models

7.4.5 Choosing between ARIMA and exponential smoothing

7.4.6 SES and DES models

7.4.7 Holt–Winters model

8 Transforming data into decisions with linear programming

8.1 Problem formulation

8.1.1 The scenario

8.1.2 The challenge

8.1.3 The approach

8.1.4 Feature summaries

8.2 Developing the linear optimization framework

8.2.1 Explanation of linear equations and inequalities

8.2.2 Data definition

8.2.3 Objective function

8.2.4 Constraints

8.2.5 Decision variable bounds

8.2.6 Solving the linear programming problem

8.2.7 Result evaluation

9 Running Monte Carlo simulations

9.1 Applications and benefits of Monte Carlo simulations

9.2 Step-by-step process

9.3 Hands-on approach

9.3.1 Establishing a probability distribution (step 1)

9.3.2 Computing a cumulative probability distribution (step 2)

9.3.3 Establishing an interval of random numbers for each variable (step 3)

9.3.4 Generating random numbers (step 4)

9.3.5 Simulating a series of trials (step 5)

9.3.6 Analyzing the results (step 6)

9.4 Automating simulations on discrete data

9.4.1 Plotting and analyzing the results

9.5 Automating simulations on continuous data

9.5.1 Predicting stock prices with Monte Carlo simulations

9.5.2 Analyzing historical data (step 1)

9.5.3 Calculating log returns (step 2)

9.5.4 Computing statistical parameters (step 3)

9.5.5 Generating random daily returns (step 4)

9.5.6 Simulating prices (step 5)

9.5.7 Simulating multiple trials (step 6)

9.5.8 Analyzing the results (step 7)

10 Building and plotting a decision tree

10.1 Decision-making without probabilities

10.1.1 Maximax method

10.1.2 Maximin method

10.1.3 Minimax Regret method

10.1.4 Expected Value method

10.2 Decision trees

10.2.1 Creating the schema

10.2.2 Plotting the tree

11 Predicting future states with Markov analysis

11.1 Understanding the mechanics of Markov analysis

11.2 States and state probabilities

11.2.1 Understanding the vector of state probabilities for multistate systems

11.2.2 Matrix of transition probabilities

11.3 Equilibrium conditions

11.3.1 Predicting equilibrium conditions programmatically

11.4 Absorbing states

11.4.1 Obtaining the fundamental matrix

11.4.2 Predicting absorbing states

11.4.3 Predicting absorbing states programmatically

12 Examining and testing naturally occurring number sequences

12.1 Benford’s law explained

12.2 Naturally occurring number sequences

12.3 Uniform and random distributions

12.3.1 Uniform distribution

12.3.2 Random distribution

12.3.3 Plotted distributions

12.4 Examples

12.4.1 Street addresses

12.4.2 World population figures

12.4.3 Payment amounts

12.5 Validating Benford’s law

12.5.1 Chi-square test

12.5.2 Mean absolute deviation

12.5.3 Distortion factor and z-statistic

12.5.4 Mantissa statistics

13 Managing projects

13.1 Creating a work breakdown structure

13.2 Estimating activity times with PERT

13.3 Finding the critical path

13.3.1 Earliest times

13.3.2 Latest times

13.3.3 Slack

13.3.4 Finding the critical path programmatically

13.4 Estimating the probability of project completion

13.5 Crashing the project

14 Visualizing quality control

14.1 Quality control measures

14.1.1 Upper control limit and lower control limit

14.1.2 Mean and center line

14.1.3 Standard deviation

14.1.4 Range

14.1.5 Sample size

14.1.6 Proportion defective

14.1.7 Number of defective items

14.1.8 Number of defects

14.1.9 Defects per unit

14.1.10 Moving range

14.1.11 z-score

14.1.12 Process capability indices

14.2 Control charts for attributes

14.2.1 p-charts

14.2.2 np-charts

14.2.3 c-charts

14.2.4 g-charts

14.3 Control charts for variables

14.3.1 x-bar charts

14.3.2 r-charts

14.3.3 s-charts

14.3.4 I-MR charts

14.3.5 EWMA charts

Overview

1 Getting started

This opening chapter frames the book as a pragmatic guide for making sound, data-driven decisions under uncertainty. It motivates statistics and quantitative techniques through real-world, high-stakes scenarios and positions the material as both a conceptual foundation and a practical toolkit. Rather than offering either abstract math or cookbook code alone, it emphasizes a dual promise: learn powerful methods such as regression, Monte Carlo simulations, decision trees, optimization, and Markov chains, and understand the assumptions and reasoning that make them work.

Python is presented as the computational backbone because of its clarity, rich ecosystem, and broad industry adoption. The chapter highlights how libraries like Pandas, NumPy/SciPy, Matplotlib/Seaborn, scikit-learn, and Statsmodels streamline everything from data wrangling to modeling and visualization, supported by an active global community. It also surveys IDE options to match project scale and style—IDLE for quick experiments, PyCharm (used throughout the book) for full-featured development, and popular alternatives like Jupyter, Spyder, and PyDev—so readers can choose an environment that keeps the focus on analysis rather than tooling.

The learning approach balances theory and practice: concepts are introduced carefully, then reinforced with annotated, reusable Python code and exploratory workflows that test assumptions, assess fit, and validate results. Early chapters solidify probability and distributions before advancing to models and methods used across finance, operations, and analytics. Readers are taught to go beyond visual checks to formal tests (e.g., normality), to derive and interpret metrics like R-squared, and to communicate findings clearly to stakeholders. The chapter closes by clarifying scope: this is not a Python tutorial or installation guide, but a structured, adaptable path for novices and practitioners to build a transferable, industry-aligned quantitative toolkit.
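To make "derive and interpret metrics like R-squared" concrete, here is a minimal sketch (not the book's code) that fits a simple least-squares line with NumPy and decomposes the total variation into explained and residual parts. The helper name `fit_and_decompose` and the sample data are hypothetical. The naming follows the textbook convention SST = SSR + SSE, where SSR is the regression (explained) sum of squares; note that Statsmodels uses `ssr` for the residual sum of squares and `ess` for the explained one.

```python
import numpy as np


def fit_and_decompose(x, y):
    """Hypothetical helper: fit y = b0 + b1 * x by ordinary least squares
    and decompose the variation in y.

    Returns a dict with the coefficients, SST, SSR, SSE, and R-squared.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # np.polyfit with deg=1 returns coefficients highest power first: [slope, intercept]
    b1, b0 = np.polyfit(x, y, deg=1)
    y_hat = b0 + b1 * x
    y_bar = y.mean()

    sst = np.sum((y - y_bar) ** 2)      # total sum of squares
    ssr = np.sum((y_hat - y_bar) ** 2)  # regression (explained) sum of squares
    sse = np.sum((y - y_hat) ** 2)      # error (residual) sum of squares

    # For OLS with an intercept, SST = SSR + SSE, so SSR / SST equals 1 - SSE / SST
    return {"b0": b0, "b1": b1, "sst": sst, "ssr": ssr, "sse": sse, "r2": ssr / sst}


# Small illustrative data set, made up for this sketch
x = [1, 2, 3, 4, 5, 6]
y = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2]
result = fit_and_decompose(x, y)
print(f"y = {result['b0']:.3f} + {result['b1']:.3f}x, R^2 = {result['r2']:.4f}")
```

For a simple linear regression with an intercept, R-squared computed this way equals the squared Pearson correlation between x and y, which is a handy sanity check.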

The chapter's process map depicts how most of the subsequent chapters are expected to flow. Its linear component closely matches a typical statistical or quantitative problem at work or in the classroom, especially if the theoretical background were swapped for an opening problem definition. There is also some nonlinearity, because the underlying concepts are mixed with practice where and when it makes sense.

Summary

This book combines statistical and quantitative theory with hands-on Python implementation, empowering readers to solve real-world problems with confidence. It emphasizes both the "how" and the "why," ensuring a deep understanding of each method.
Using Python’s robust ecosystem, including the Pandas, NumPy, Matplotlib, scikit-learn, and Statsmodels libraries, this book demonstrates how to execute techniques efficiently while providing reusable, annotated code for practical applications.
Whether you're a novice seeking foundational knowledge or an experienced practitioner looking to expand your skill set, the book’s clear structure and progression make it accessible and rewarding for all.
Each chapter is dedicated to a single technique, progressing from foundational concepts to advanced applications. Chapters 2 and 3 serve as foundational pillars, providing essential knowledge in basic and conditional probabilities, common probability distributions, counting rules, and combinations and permutations.
Real-world examples, such as optimizing staffing levels, forecasting stock prices, and analyzing Markov chains, illustrate the applicability of these techniques across industries. Readers gain not only technical expertise but also the critical thinking skills needed to apply these methods effectively in diverse scenarios.

FAQ

What is the purpose of this book and who is it for?

This book equips programmers, students, and practitioners to make data-driven decisions in high-stakes settings by blending solid statistical theory with hands-on Python implementations. It’s designed for both beginners building foundations and experienced professionals expanding their quantitative toolkit.

Why does the book use Python for stats and quant?

Python offers a rare mix of simplicity, power, and versatility. Its rich ecosystem (Pandas, NumPy/SciPy, Matplotlib/Seaborn, scikit-learn, Statsmodels), ease of learning, strong community support, and broad industry adoption make it ideal for bridging theory and practice across analysis, modeling, optimization, and simulation.

Which Python libraries will I use most?

The core toolkit includes Pandas for data manipulation, NumPy and SciPy for numerical computing, Matplotlib and Seaborn for visualization, and scikit-learn and Statsmodels for machine learning and statistical models. These libraries streamline end-to-end workflows from exploration to modeling and presentation.

Do I need to already know Python?

Yes—basic familiarity is assumed. This is not a Python tutorial; the focus is applying Python to statistical and quantitative methods. The book does not cover installing Python or IDEs, but provides reusable, well-annotated code so you can implement techniques effectively.

Which IDE is used for the examples?

All examples were developed in PyCharm 2023.3.3 (Community Edition) on macOS Sonoma 14.2.1 using Python 3.12.12, with libraries installed via pip. While PyCharm is the book’s primary IDE, you can follow along on other platforms with similar setups.

Can I use IDLE or other IDEs instead of PyCharm?

Yes. IDLE is fine for quick experiments but lacks advanced debugging, version control, and project management. Jupyter Notebook excels at interactive, exploratory work; Spyder suits scientific computing with its variable explorer; PyDev (Eclipse) fits multi-language, large projects. Choose based on your workflow and project scale.

How are chapters structured?

Each chapter is self-contained and follows a practical flow: establish theory and assumptions, explore data, implement solutions in Python, and interpret results. You can read sequentially for a full journey or jump to specific topics as needed.

How does the book balance theory and practice?

Concepts come first (e.g., regression assumptions, randomness in Monte Carlo, Markov state transitions), followed by annotated Python code that shows how and why each step works. This dual approach builds both conceptual understanding and implementation skills.

What topics will I learn?

You’ll cover regression (linear and logistic), decision trees and random forests, time series analysis, Monte Carlo simulations, Markov chains, and optimization (e.g., linear programming). You’ll also learn to evaluate assumptions, decompose sums of squares, run formal tests (like Shapiro–Wilk), and communicate results effectively.
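As an illustration of a formal normality test such as Shapiro–Wilk, the following minimal sketch (an assumption-laden example, not taken from the book) uses `scipy.stats.shapiro` on two synthetic samples: one drawn from a normal distribution and one from a skewed exponential distribution. The helper `looks_normal` and the `alpha=0.05` threshold are hypothetical choices for this sketch.

```python
import numpy as np
from scipy import stats


def looks_normal(sample, alpha=0.05):
    """Hypothetical helper around the Shapiro–Wilk test.

    H0: the sample was drawn from a normal distribution.
    Returns (W statistic, p-value, whether H0 is not rejected at alpha).
    """
    stat, p_value = stats.shapiro(sample)
    # Failing to reject H0 (p >= alpha) is not proof of normality
    return stat, p_value, p_value >= alpha


# Synthetic data, generated only for this sketch
rng = np.random.default_rng(seed=42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    stat, p, ok = looks_normal(sample)
    print(f"{name}: W={stat:.3f}, p={p:.4g}, fail to reject H0: {ok}")
```

In a regression workflow, the same test would typically be applied to the model residuals rather than to the raw response.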

What is out of scope for this book?

The book does not teach general Python programming or provide step-by-step installation guides for Python or IDEs. It focuses on applying statistical and quantitative techniques with Python, assuming your development environment is already set up.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$74.99 $47.24

you save $27.75 (37%)

include audio $24.99 $15.74
