Exploring Data with Python
With chapters selected by Naomi Ceder
  • June 2018
  • ISBN 9781617296048
  • 110 pages

Python has become a required skill for data science, and it’s easy to see why. It’s powerful, easy to learn, and supported by libraries like pandas, NumPy, and scikit-learn that help you slice, scrub, munge, and wrangle your data. Even with a great language and fantastic tools, though, there’s plenty to learn!

Exploring Data with Python is a collection of chapters from three Manning books, hand-picked by Naomi Ceder, the chair of the Python Software Foundation. This free eBook builds your foundation in data science with practical Python tips and techniques for working and aspiring data scientists. In it, you’ll get a clear introduction to the data science process. Then, you’ll practice using Python for processing, cleaning, and exploring interesting datasets. Finally, you’ll get a practical demonstration of modeling and prediction with classification and regression. When you finish, you’ll have a good overview of Python in data science and a well-lit path to continue your learning.

Table of Contents

Introduction

Part 1: The data science process

2 The data science process

2.1 Overview of the data science process

2.1.1 Don’t be a slave to the process

2.2 Step 1: Defining research goals and creating a project charter

2.2.1 Spend time understanding the goals and context of your research

2.2.2 Create a project charter

2.3 Step 2: Retrieving data

2.3.1 Start with data stored within the company

2.3.2 Don’t be afraid to shop around

2.3.3 Do data quality checks now to prevent problems later

2.4 Step 3: Cleansing, integrating, and transforming data

2.4.1 Cleansing data

2.4.2 Correct errors as early as possible

2.4.3 Combining data from different data sources

2.4.4 Transforming data

2.5 Step 4: Exploratory data analysis

2.6 Step 5: Build the models

2.6.1 Model and variable selection

2.6.2 Model execution

2.6.3 Model diagnostics and model comparison

2.7 Step 6: Presenting findings and building applications on top of them

2.8 Summary

Part 2: Processing data files

21 Processing data files

21.1 Welcome to ETL

21.2 Reading text files

21.2.1 Text encoding: ASCII, Unicode, and others

21.2.2 Unstructured text

21.2.3 Delimited flat files

21.2.4 The csv module

21.2.5 Reading a csv file as a list of dictionaries

21.3 Excel files

21.4 Data cleaning

21.4.1 Cleaning

21.4.2 Sorting

21.4.3 Data cleaning issues and pitfalls

21.5 Writing data files

21.5.1 CSV and other delimited files

21.5.2 Writing Excel files

21.5.3 Packaging data files

Summary

Part 3: Exploring data

24 Exploring data

24.1 Python tools for data exploration

24.1.1 Python’s advantages for exploring data

24.1.2 Python can be better than a spreadsheet

24.2 Jupyter notebook

24.2.1 Starting a kernel

24.2.2 Executing code in a cell

24.3 Python and pandas

24.3.1 Why you might want to use pandas

24.3.2 Installing pandas

24.3.3 Data frames

24.4 Data cleaning

24.4.1 Loading and saving data with pandas

24.4.2 Data cleaning with a data frame

24.5 Data aggregation and manipulation

24.5.1 Merging data frames

24.5.2 Selecting data

24.5.3 Grouping and aggregation

24.6 Plotting data

24.7 Why you might not want to use pandas

Summary

Part 4: Modeling and prediction

3 Modeling and prediction

3.1 Basic machine-learning modeling

3.1.1 Finding the relationship between input and target

3.1.2 The purpose of finding a good model

3.1.3 Types of modeling methods

3.1.4 Supervised versus unsupervised learning

3.2 Classification: predicting into buckets

3.2.1 Building a classifier and making predictions

3.2.2 Classifying complex, nonlinear data

3.2.3 Classifying with multiple classes

3.3 Regression: predicting numerical values

3.3.1 Building a regressor and making predictions

3.3.2 Performing regression on complex, nonlinear data

3.4 Summary

3.5 Terms from this chapter

index

Symbols

Numerics

What's inside

  • "The data science process" from Introducing Data Science by Davy Cielen, Arno D. B. Meysman, and Mohamed Ali
  • "Processing data files" from The Quick Python Book, Third Edition by Naomi Ceder
  • "Exploring data" from The Quick Python Book, Third Edition by Naomi Ceder
  • "Modeling and prediction" from Real-World Machine Learning by Henrik Brink, Joseph W. Richards, and Mark Fetherolf

About the author

Naomi Ceder has been learning, using, and teaching Python since 2001. She is chair of the Python Software Foundation, the originator of the PyCon and PyCon UK poster sessions, and founder of the Python Education Summit. Naomi is the author of The Quick Python Book, Third Edition (Manning Publications, 2018).
