High Performance Python for Data Analytics
Tiago Rodrigues Antao
  • MEAP began September 2020
  • Publication in Summer 2021 (estimated)
  • ISBN 9781617297939
  • 375 pages (estimated)
  • printed in black & white

"If you want to go beyond scripting in Python, you need this book."
Brian S. Cole

Master these effective techniques to reduce costs and run times, handle huge datasets, and implement complex machine learning applications efficiently in Python.

High Performance Python for Data Analytics is your guide to optimizing every part of your Python-based data analysis process, from the pure Python code you write to the way you manage the resources of modern hardware and GPUs. You'll learn to rewrite inefficient data structures, improve underperforming code with multithreading, and simplify your datasets without sacrificing accuracy.

About the technology

Fast, accurate systems are vital for handling the huge datasets and complex analytical algorithms that are common in modern data science. Python programmers need to boost performance by writing faster pure-Python programs, optimizing the use of libraries, and utilizing modern multi-processor hardware; High Performance Python for Data Analytics shows you how.

About the book

High Performance Python for Data Analytics is a hands-on guide to writing Python code that can process more data, faster, and with fewer resources. It takes a holistic approach to Python performance, showing you how your code, libraries, and computing architecture interact and can be optimized together.

Written for experienced practitioners, this book dives right into practical solutions for improving computation and storage efficiency. You'll experiment with fun and interesting examples such as rewriting games in lower-level Cython and implementing a MapReduce framework from scratch. Finally, you'll go deep into Python GPU computing and learn how modern hardware has rehabilitated some former antipatterns and made counterintuitive ideas the most efficient way of working.

Table of Contents

Part 1: First Steps

1 The need for efficient computing and data storage

1.1 The overwhelming need for efficient computing in Python

1.2 The impact of modern computing architectures on high performance computing

1.2.1 Changes inside the computer

1.2.2 Changes in the network

1.2.3 The cloud

1.3 Working with Python’s limitations

1.3.1 The Global Interpreter Lock (GIL)

1.4 What will you learn from this book

1.5 The reader for this book

1.6 Summary

Part 2: Efficient computation

2 Extracting maximum performance from built-in features

2.1 Introducing the project dataset

2.1.1 An architecture for big data processing

2.1.2 Preparing the data

2.2 Profiling code to detect performance bottlenecks

2.2.1 Using Python’s built-in profiling module

2.2.2 Visualizing profiling information

2.2.3 Line profiling

2.3 Optimizing basic data structures for speed: lists, sets, dictionaries

2.3.1 Performance of list searches

2.3.2 Using the bisect module

2.3.3 Content aware search approaches

2.3.4 Searching using sets or dictionaries

2.3.5 List complexity in Python

2.4 Finding excessive memory allocation

2.4.1 Navigating the minefield of Python memory estimation

2.4.2 Using more compact representations

2.4.3 Packing many observations in a number

2.4.4 Use the array module

2.4.5 Systematizing what we have learned: Estimating the memory usage of Python objects

2.5 Using laziness and generators for big-data pipelining

2.5.1 Using generators instead of standard functions

2.5.2 Enabling code pipelining with generators

2.6 Summary

3 Concurrency in Python

4 Using NumPy more efficiently

5 Re-implementing critical parts with Cython

6 Efficient Pandas with Apache Arrow

Part 3: Efficient storage

7 Understanding the impact of CPU and storage hierarchy in Python programs

8 Understanding file system limitations and advantages

9 Efficient persistent storage of large amounts of data with Python

Part 4: Advanced topics

10 GPU computing with Python

11 Sub-sampling and simplifying data to improve performance


Appendix A: Setting up the environment

A.1 Code style and organization

What's inside

  • Writing efficient pure-Python code
  • Optimizing the NumPy and pandas libraries
  • Rewriting critical code in Cython
  • Designing persistent data structures
  • Tailoring code for different architectures
  • Implementing Python GPU computing

About the reader

For intermediate Python programmers familiar with the basics of concurrency.

About the author

Tiago Antao works in the field of genetics, analyzing very large datasets and implementing complex algorithms to process the data. He leverages Python with all its libraries to do scientific computing and data engineering tasks. He is one of the co-authors of Biopython, a major bioinformatics package written in Python. He holds a BE in informatics and a PhD in bioinformatics.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the print book long before it's in bookstores.