Pandas in Action
Boris Paskhaver
  • MEAP began February 2020
  • Publication in Spring 2021 (estimated)
  • ISBN 9781617297434
  • 525 pages (estimated)
  • printed in black & white

An outstanding Pandas reference for developers just starting to use Pandas and/or Python for data analysis or data science.

Jeff Smith
Pandas has rapidly become one of Python's most popular data analysis libraries. With pandas you can efficiently sort, analyze, filter and munge almost any type of data. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career.

About the Technology

Anyone who’s used spreadsheet software will find pandas familiar. While its column-based grids might remind you of Excel or Google Sheets, pandas is more flexible and far more powerful. It can efficiently perform operations on millions of rows and be used in tandem with other Python libraries for statistics, machine learning, and more. And best of all, using pandas doesn’t mean sacrificing user productivity or needing to write tons of complex code. It’s clean, intuitive, and fast.
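
To make the spreadsheet comparison concrete, here is a minimal sketch (not taken from the book) of a "formula column": instead of filling a formula down row by row, one pandas expression computes the result for every row at once. The table and its column names (`product`, `units`, `unit_price`) are invented for illustration, and the one-million-row table is randomly generated rather than real data.

```python
import numpy as np
import pandas as pd

# A small, hypothetical table that looks like a spreadsheet:
# named columns and numbered rows.
sales = pd.DataFrame(
    {
        "product": ["Pen", "Notebook", "Stapler"],
        "units": [120, 45, 8],
        "unit_price": [1.50, 3.25, 12.00],
    }
)

# Like dragging a formula down a spreadsheet column, but a single
# vectorized expression computes the value for every row.
sales["revenue"] = sales["units"] * sales["unit_price"]
print(sales)

# The same expression style scales to much larger tables.
# Here we build one million rows of synthetic (random) data.
rng = np.random.default_rng(seed=0)
big = pd.DataFrame(
    {
        "units": rng.integers(1, 100, size=1_000_000),
        "unit_price": rng.uniform(0.5, 20.0, size=1_000_000),
    }
)
big["revenue"] = big["units"] * big["unit_price"]
print(len(big), big["revenue"].sum())
```

The code for the three-row table and the million-row table is identical, which is the practical point of the "more powerful than a spreadsheet" claim: the expression does not change as the data grows.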

About the book

Pandas in Action makes it easy to dive into Python-based data analysis. You’ll learn to use pandas to automate repetitive spreadsheet functionality and derive insight from data by sorting columns, filtering data subsets, and creating multi-leveled indices. Each chapter is a self-contained tutorial, letting you dip in when you need to troubleshoot tricky problems. Best of all, you won’t be learning from sterile or randomly created data. You’ll start with a variety of datasets that are big, small, incomplete, broken, and messy and learn how to clean and format them for proper analysis.
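
As a rough illustration of the three operations named above (sorting a column, filtering a subset, and building a multi-level index), here is a small sketch. It is not code from the book; the `store`/`quarter`/`sales` table is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "C"],
        "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
        "sales": [250, 310, 180, 205, 90],
    }
)

# Sort rows by a column, largest values first.
by_sales = df.sort_values("sales", ascending=False)

# Filter a subset of rows with a Boolean condition.
big_sellers = df[df["sales"] >= 200]

# Build a multi-level (hierarchical) index from two columns.
indexed = df.set_index(["store", "quarter"]).sort_index()

# Look up a single value using both index levels.
b_q2 = indexed.loc[("B", "Q2"), "sales"]

print(by_sales)
print(big_sellers)
print(indexed)
print(b_q2)
```

Sorting the MultiIndex with `sort_index` before lookups is a common habit, since some label-based selections on an unsorted MultiIndex can be slower or raise warnings.
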
Table of Contents

Part 1: Getting Started

1 Introducing Pandas

1.1 Data in the 21st Century

1.2 Introducing pandas

1.2.1 Pandas vs Graphical Spreadsheet Applications

1.2.2 Pandas vs Its Competitors

1.3 Importing a Dataset

1.4 Manipulating a DataFrame

1.5 Counting Values in a Series

1.6 Filtering a Column by One or More Criteria

1.7 Grouping Data

1.8 Summary

Part 2: The Series

2 The Series Object

2.1 Overview of a Series

2.1.1 Modules, Classes, and Instances

2.1.2 Populating the Series with Values

2.1.3 Customizing the Index

2.1.4 Creating a Series with Missing Values

2.2 Create a Series from Python Objects

2.2.1 Dictionaries

2.2.2 Tuples

2.2.3 Sets

2.2.4 NumPy Arrays

2.3 Retrieving the First and Last Rows

2.4 Mathematical Operations

2.4.1 Arithmetic Operations

2.4.2 Broadcasting

2.5 Passing the Series to Python’s Built-In Functions

2.6 Coding Challenges / Exercises

2.7 Summary

3 Series Methods

3.1 Importing a Dataset with the read_csv Method

3.2 Sorting a Series

3.2.1 Sorting by Values with the sort_values Method

3.2.2 Sorting by Index with the sort_index Method

3.2.3 Retrieving the Smallest and Largest Values with the nsmallest and nlargest Methods

3.3 Overwriting a Series with the inplace Parameter

3.4 Counting Values with the value_counts Method

3.5 Invoking a Function on Every Series Value with the apply Method

3.6 Coding Challenge: Deriving Insights from a Series

3.6.1 Problem

3.6.2 Solution

3.7 Summary

Part 3: The DataFrame

4 The DataFrame Object

4.1 Overview of a DataFrame

4.1.1 Creating A DataFrame from a Dictionary

4.1.2 Creating A DataFrame from a NumPy ndarray

4.2 Similarities between Series and DataFrames

4.2.1 Importing a CSV File with the read_csv Method

4.2.2 Shared and Exclusive Attributes between Series and DataFrames

4.2.3 Shared Methods between Series and DataFrames

4.3 Sorting a DataFrame

4.3.1 Sort by Single Column

4.3.2 Sort by Multiple Columns

4.4 Sort by Index

4.4.1 Sort by Row Index

4.4.2 Sort by Column Index

4.5 Setting a New Index

4.6 Selecting Columns or Rows from a DataFrame

4.6.1 Select a Single Column from a DataFrame

4.6.2 Select Multiple Columns from a DataFrame

4.7 Select Rows from a DataFrame

4.7.1 Extract Rows by Index Label

4.7.2 Extract Rows by Index Position

4.7.3 Extract Values from Specific Columns

4.8 Extract Value from Series

4.9 Rename Column or Row

4.10 Resetting an Index

4.11 Coding Challenge

4.12 Summary

5 Filtering a DataFrame

5.1 Optimizing A Dataset for Memory Usage

5.1.1 Converting Data Types with the astype Method

5.2 Filtering by a Single Condition

5.3 Filtering by Multiple Conditions

5.3.1 The AND Condition

5.3.2 The OR Condition

5.3.3 Inversion with ~

5.3.4 Methods for Booleans

5.4 Filtering by Condition

5.4.1 The isin Method

5.4.2 The between Method

5.4.3 The isnull and notnull Methods

5.4.4 Dealing with Null Values

5.5 Dealing with Duplicates

5.5.1 The duplicated Method

5.5.2 The drop_duplicates Method

5.6 Coding Challenge

5.6.1 The Problem

5.6.2 Solutions

5.7 Summary

Part 4: Working with Text Data

6 Working with Text Data

6.1 String Casing

6.2 String Slicing

6.2.1 String Slicing and Character Replacement

6.3 Boolean Methods

6.4 Splitting Strings

6.5 Coding Challenge

6.6 A Note on Regular Expressions

6.7 Summary

Part 5: Grouping, Aggregating and Merging Data

7 MultiIndex DataFrames

7.1 The MultiIndex Object

7.2 MultiIndex DataFrames

7.3 Sorting A MultiIndex

7.4 Indexing with a MultiIndex

7.4.1 Extracting One or More Columns

7.4.2 Extracting One or More Rows with loc

7.4.3 Extracting One or More Rows with iloc

7.5 Cross Sections

7.6 Manipulating the Index

7.6.1 Resetting the Index

7.6.2 Setting the Index

7.7 Summary

8 Reshaping and Pivoting

8.1 Wide vs. Narrow Data

8.2 Creating a Pivot Table from a DataFrame

8.2.1 The pivot_table Method

8.2.2 Additional Options for Pivot Tables

8.3 Stacking and Unstacking Index Levels

8.4 Melting a Dataset

8.4.1 Melting a Dataset

8.5 Exploding a List of Values

8.6 Coding Challenge

8.7 Summary

9 The GroupBy Object

9.1 Creating a GroupBy Object from Scratch

9.2 Creating a GroupBy Object from a Dataset

9.3 Attributes and Methods on a GroupBy Object

9.4 Aggregate Operations

9.5 Applying an Operation to all Groups

9.6 Grouping by Multiple Columns

9.7 Coding Challenge

9.8 Summary

10 Merging, Joining and Concatenating

10.1 Introducing the Datasets

10.2 Concatenating the Datasets

10.3 Inner Joins

10.4 Outer Joins

10.5 Left and Right Joins

10.6 The left_on and right_on Parameters

10.7 Merging by Indexes

10.8 Coding Challenge

10.9 Summary

Part 6: Working with Dates and Times

11 Working with Dates and Times

11.1 Introducing the Timestamp Object

11.1.1 How Python works with datetimes

11.1.2 How pandas works with datetimes

11.2 Storing Multiple Timestamps in a DatetimeIndex

11.3 Converting a Column or Index to Store Datetimes

11.4 Using the DatetimeProperties Object

11.5 Adding and Subtracting Durations of Time

11.6 Date Offsets

11.7 The timedelta Object

11.8 Coding Challenge

11.8.1 Questions

11.8.2 Answers

11.9 Summary

Part 7: Input and Output

12 Imports and Exports

13 Configuring Pandas

Part 8: Visualization

14 Visualization


Appendix A: Installation and Setup

Appendix B: Python Crash Course

B.1 Simple Data Types

B.1.1 Numbers

B.1.2 Strings

B.1.3 Booleans

B.2 Operators

B.2.1 Mathematical Operators

B.2.2 Equality and Inequality Operators

B.3 Variables

B.4 Functions

B.4.1 Arguments and Return Values

B.4.2 Custom Functions

B.5 Objects and Methods

B.5.1 Attributes

B.5.2 Methods

B.5.3 Additional String Methods

B.6 Lists

B.6.1 List Iteration

B.6.2 List Comprehension

B.6.3 Converting a String to a List and Vice Versa

B.7 Tuples

B.8 Dictionaries

B.8.1 Dictionary Iteration

B.9 Sets

B.9.1 Set Operations

B.10 Modules, Classes, and Datetimes

B.11 Summary

Appendix C: NumPy Crash Course

C.1 Dimensions

C.2 The ndarray Object

C.2.1 Generating a Numeric Range with the arange Method

C.2.2 Attributes on an ndarray Object

C.2.3 The reshape Method

C.2.4 The randint Function

C.2.5 The randn Function

C.3 The nan Object

C.4 Summary

What's inside

  • Import a CSV, identify issues with its data structures, and convert it to the proper format
  • Sort, filter, pivot, and draw conclusions from a dataset and its subsets
  • Identify trends from text-based and time-based data
  • Organize, group, merge, and join separate datasets
  • Real-world datasets that are easy to download and explore
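
The workflow in the list above can be sketched end to end on a tiny scale. This is a hedged illustration, not the book's code: the CSV text, the `orders` and `regions` tables, and all column names are made up, and filling the missing amount with 0 is just one possible cleaning policy among several.

```python
import io

import pandas as pd

# A small, made-up CSV standing in for a downloaded dataset.
# Messy parts: inconsistent name casing/spacing and a missing amount.
raw_orders = io.StringIO(
    """order_id,customer,order_date,amount
1, alice ,2021-01-05,20.5
2,BOB,2021-01-17,
3,Alice,2021-02-03,15.0
4,bob,2021-02-20,42.0
"""
)
orders = pd.read_csv(raw_orders)

# Convert to proper types: text dates become datetimes.
orders["order_date"] = pd.to_datetime(orders["order_date"])
# Assumed policy for this sketch: treat a missing amount as 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Clean text data: strip surrounding spaces and normalize casing.
orders["customer"] = orders["customer"].str.strip().str.title()

# Time-based data: derive a year-month label from each date.
orders["month"] = orders["order_date"].dt.strftime("%Y-%m")

# Pivot: one row per customer, one column per month, summed amounts.
pivot = orders.pivot_table(
    index="customer", columns="month", values="amount", aggfunc="sum", fill_value=0
)

# Group and aggregate: total spend per customer.
totals = orders.groupby("customer")["amount"].sum()

# Merge with a second (also made-up) dataset on the shared key.
regions = pd.DataFrame({"customer": ["Alice", "Bob"], "region": ["East", "West"]})
report = totals.reset_index().merge(regions, on="customer", how="left")

print(pivot)
print(report)
```

Normalizing the `customer` text before grouping matters here: without `str.strip().str.title()`, " alice ", "Alice", "BOB", and "bob" would be treated as four different customers.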

About the reader

For readers experienced with spreadsheet software who know the basics of Python.

About the author

Boris Paskhaver is a software engineer, Agile consultant, and educator. His six programming courses on Udemy have amassed 236,000 students, with an average course rating of 4.59 out of 5. He first used Python and the pandas library to derive a variety of business insights at the world’s #1 jobs site,

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.