Data Preparation for AI and Analytics

From data to insights
  • MEAP began December 2024
  • Last updated May 2025
  • Publication in Fall 2025 (estimated)
  • ISBN 9781633435742
  • 300 pages (estimated)
  • printed in black & white

pro $24.99 per month

  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose one free eBook per month to keep
  • exclusive 50% discount on all purchases

lite $19.99 per month

  • access to all Manning books, including MEAPs!

team

5, 10, or 20+ seats for your team


In the world of ML and AI, data preparation is the key component!

Data Preparation for AI and Analytics is a practical guide to one of the most critical yet underappreciated aspects of the data science and AI pipeline: preparing the data you feed into your machine learning and AI models. This book provides end-to-end coverage, from initial data exploration through quality assessment, transformation, enrichment, and final preparation for both analytics and ML applications.

“Data is food for AI.” Those are the words of Andrew Ng, founder of DeepLearning.AI and a prominent expert in the field. Making sure your models get good, nourishing “food” is the most critical and often the most time-intensive part of data science. This book helps you meet that need by balancing theoretical foundations with implementation details, making it valuable for newcomers and experienced practitioners alike.

In Data Preparation for AI and Analytics you’ll:

  • Understand the importance of data quality and why to pursue it
  • Perform exploratory analysis to understand new datasets
  • Clean, transform, and organize your data to drive decision making
  • Deal with missing data and inconsistencies in your data
  • Merge data from different sources into a unified stream
  • Build explainability into your models right from the start
  • Apply generative AI techniques to automate repetitive tasks
  • Use AI to boost data quality and simplify workflows
  • Apply the right data preparation technique for the right outcome

The quality and integrity of your data determine the accuracy, reliability, and usefulness of your AI models. Investing substantial effort in data preparation isn't just beneficial; it's essential. The data preparation practices described in this book lead to more accurate predictions, more actionable insights, and more confident business decisions.

Data Preparation for AI and Analytics is for data engineers who build data pipelines in support of AI models, machine learning models, and business analytics. It presents data preparation methods with clear language and concrete examples. You’ll explore tried-and-true approaches along with emerging generative AI techniques. You’ll especially appreciate the insights into automation and data governance.

about the book

Data Preparation for AI and Analytics teaches you to tackle the challenges you’ll face as you prepare data for your AI models and analytics pipelines. The author doesn’t just talk theory; he provides detailed code snippets, real-world scenarios, and discussions of the ethical and explainability concerns that other resources sometimes gloss over. You’ll master popular data wrangling tools like Python and Alteryx. You’ll benefit from the author’s skill at presenting complex data preparation concepts as clear, manageable steps, fully illustrated with engaging data sets, including data on the Titanic disaster, video game ratings, sentiment analysis of Los Angeles restaurant recommendations, and more. The book is packed with vital advice for complex tasks, including merging multiple data sets, building data quality alerting systems, and scaling data preparation into large cloud-based pipelines.

about the reader

For data engineers of all skill levels who know Python.

about the author

Benoît Cayla is a Senior Solutions Architect at Databricks and understands the practical challenges that data scientists and AI professionals regularly face. He is a data engineer with 25 years of experience in data management and an expert in data management and AI. He has deep experience contributing to large-scale projects in manufacturing, insurance, and finance with major players like IBM, Informatica, and Tableau.

choose your plan

team

monthly: $49.99
annual: $399.99 (only $33.33 per month)
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • Data Preparation for AI and Analytics ebook for free