Parallel and High Performance Computing
Robert Robey and Yuliana Zamora
  • MEAP began May 2019
  • Publication in Summer 2020 (estimated)
  • ISBN 9781617296468
  • 425 pages (estimated)
  • printed in black & white

"This is an authoritative, comprehensive, and detailed introduction to parallel computing."

Domingo Salazar
Complex calculations, like training deep learning models or running large-scale simulations, can take an extremely long time. Efficient parallel programming can save hours—or even days—of computing time. Parallel and High Performance Computing shows you how to deliver faster run-times, greater scalability, and increased energy efficiency to your programs by mastering parallel techniques for multicore processors and GPU hardware.
Table of Contents

Part 1: Introduction to Parallel Computing

1 Why parallel computing

1.1 Why should you learn about parallel computing?

1.1.1 What are the potential benefits of parallel computing?

1.1.2 Parallel computing cautions

1.2 The fundamental laws of parallel computing

1.2.1 The limit to parallel computing: Amdahl’s Law

1.2.2 Breaking through the parallel limit: Gustafson-Barsis’s Law

1.3 How does parallel computing work?

1.3.1 Walk through a sample application

1.3.2 A hardware model for today’s heterogeneous parallel systems

1.3.3 The application/software model for today’s heterogeneous parallel systems

1.4 Categorizing parallel approaches

1.5 Parallel strategies

1.6 Parallel speedup vs comparative speedups: two different measures

1.7 What will you learn in this book?

1.7.1 Exercises

1.8 Summary

2 Planning for parallel

2.1 Approaching a new project: the preparation

2.1.1 Version control: creating a safety vault for your parallel code

2.1.2 Test suites: the first step to creating a robust, reliable application

2.1.3 Finding and fixing memory issues

2.1.4 Improving code portability

2.2 Profiling step: probing the gap between system capabilities and application performance

2.3 Planning step: a foundation for success

2.3.1 Exploring with benchmarks and mini-apps

2.3.2 Design of the core data structures and code modularity

2.3.3 Algorithms: redesign for parallel

2.4 Implementation step: where it all happens

2.5 Commit step: wrapping it up with quality

2.6 Further explorations

2.6.1 Additional reading

2.6.2 Exercises

2.7 Summary

3 Performance limits and profiling

3.1 Know your application’s potential performance limits

3.2 Determine your hardware capabilities: benchmarking

3.2.1 Tools for gathering system characteristics

3.2.2 Calculating theoretical maximum FLOPS

3.2.3 The memory hierarchy and theoretical memory bandwidth

3.2.4 Empirical measurement of bandwidth and flops

3.2.5 Calculating the machine balance between flops and bandwidth

3.3 Characterizing your application: profiling

3.3.1 Profiling tools

3.3.2 Empirical measurement of processor clock frequency and energy consumption

3.3.3 Tracking memory during runtime

3.4 Further explorations

3.4.1 Additional reading

3.4.2 Exercises

3.5 Summary

4 Data design and performance models

4.1 Performance data structures: data-oriented design

4.1.1 Multidimensional arrays

4.1.2 Array of Structures (AOS) versus Structures of Arrays (SOA)

4.1.3 Array of Structure of Arrays (AOSOA)

4.2 Three C’s of cache misses: compulsory, capacity, conflict

4.3 Simple performance models: a case study

4.3.1 Full matrix data representations

4.3.2 Compressed sparse storage representations

4.4 Advanced performance models

4.5 Network messages

4.6 Further explorations

4.6.1 Additional reading

4.6.2 Exercises

4.7 Summary

5 Parallel algorithms and patterns

5.1 Algorithm analysis for parallel computing applications

5.2 Parallel algorithms: what are they?

5.3 What is a hash function?

5.4 Spatial hashing: a highly parallel algorithm

5.4.1 Using perfect hashing for spatial mesh operations

5.4.2 Using compact hashing for spatial mesh operations

5.5 Prefix sum (scan) pattern and its importance in parallel computing

5.5.1 Step-efficient parallel scan operation

5.5.2 Work-efficient parallel scan operation

5.5.3 Parallel scan operations for large arrays

5.6 Parallel global sum: addressing the problem of associativity

5.7 Future of parallel algorithm research

5.8 Further explorations

5.8.1 Additional reading

5.8.2 Exercises

5.9 Summary

Part 2: Introduction to CPU Programming

6 Vectorization

7 OpenMP


Part 3: Introduction to GPU Programming

9 GPU architectures and concepts

10 GPU programming model

11 Directive-based GPU programming

12 GPU languages

13 GPU profiling

Part 4: High Performance Computing Ecosystem

14 Affinity

15 Schedulers

16 Parallel input/output

17 Tools and resources


Appendix A: References

A.1 Chapter 1

A.2 Chapter 2

A.3 Chapter 3

A.4 Chapter 4

A.5 Chapter 5

Appendix B: Solutions to Exercises

B.1 Chapter 1

B.2 Chapter 2

B.3 Chapter 3

B.4 Chapter 4

B.5 Chapter 5

Appendix C: Glossary

About the Technology

Modern computing hardware comes equipped with multicore CPUs and GPUs that can execute many instruction streams simultaneously. Parallel computing takes advantage of this now-standard computer architecture to execute multiple operations at the same time, offering the potential for applications that run faster, are more energy efficient, and can be scaled to tackle problems that demand large computational capabilities. But to get these benefits, you must change the way you design and write software. Taking advantage of the tools, algorithms, and design patterns created specifically for parallel processing is essential to creating top-performing applications.

About the book

Parallel and High Performance Computing is an irreplaceable guide for anyone who needs to maximize application performance and reduce execution time. Parallel computing experts Robert Robey and Yuliana Zamora take a fundamental approach to parallel programming, providing novice practitioners the skills needed to tackle any high-performance computing project with modern CPU and GPU hardware. Get under the hood of parallel computing architecture and learn to evaluate hardware performance, scale up your resources to tackle larger problem sizes, and deliver a level of energy efficiency that makes high performance possible on hand-held devices. When you’re done, you’ll be able to build parallel programs that are reliable, robust, and require minimal code maintenance.

This book is unique in its breadth, with discussions of parallel algorithms, techniques for successfully developing parallel programs, and wide coverage of the most effective languages for the CPU and GPU. The programming paradigms include MPI, OpenMP threading, and vectorization for the CPU. For the GPU, the book covers the directive-based OpenMP and OpenACC approaches and the native CUDA and OpenCL languages.

What's inside

  • Steps for planning a new parallel project
  • Choosing the right data structures and algorithms
  • Addressing underperforming kernels and loops
  • The differences in CPU and GPU architecture

About the reader

For experienced programmers with proficiency in a high performance computing language such as C, C++, or Fortran.

About the authors

Robert Robey has been active in the field of parallel computing for over 30 years. He works at Los Alamos National Laboratory, and has previously worked at the University of New Mexico, where he started up the Albuquerque High Performance Computing Center. Yuliana Zamora has lectured on efficient programming of modern hardware at national conferences, based on her work developing applications running on tens of thousands of processing cores and the latest GPU architectures.

Manning Early Access Program (MEAP): read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.
