Four-Project Series

DS Pipeline with Python

basic Python (Jupyter Notebook, NumPy, Matplotlib, NLTK, and RegEx) • intermediate pandas
skills learned
feature extraction with NumPy • text vectorization with TF-IDF and SVD • feature engineering with pandas • interactive data visualization with Matplotlib • data augmentation for ML with object-oriented programming (OOP) • statistical modeling with SciPy
Earn a Certificate of Completion with this liveProject series
Ruihao Qiu
4 weeks · 3-6 hours per week average · INTERMEDIATE
includes 4 liveProjects
liveProject $41.99 $59.99 self-paced learning

The tasks you’ll tackle in this series of liveProjects are typical of those a data scientist or data engineer would encounter in an online recruiting tech company, a large organization’s HR department, or similar environments. You’ll develop a data pipeline for processing, extracting, and transforming various types of data to be consumed by different types of users, including machine learning engineers, data analysts, and product developers. You’ll build data processing tools with NumPy, use pandas for feature extraction and engineering, use Matplotlib to explore, visualize, and analyze processed data, and build data augmentation tools to enhance ML modeling. By the end, you’ll have already finished 80% of the work of a typical data science project. You’ll have acquired skills, experience, and confidence that will take you closer to a career in data science.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.
How to get your FREE
Certificate of Completion
  • Finish all the projects in this liveProject series
  • Take a short online test
  • Answer questions from the liveProject mentor
That's it!

The techniques in the building of the data processing tool were very useful. I can use that in my work projects.

Hung Le, data-engineer, Eco-Energy

here's what's included

Project 1 Data Processing Tools with NumPy

As a data engineer in an online recruiting tech company or a large organization’s HR department, you’ll build a series of practical tools to process and extract useful information from unstructured text data using NumPy. You’ll learn important methods (including the trie data structure, TF-IDF, and SVD), how to implement them, and how they are applied in the real world. When you’re finished, you’ll have the know-how to build data processing tools that meet the needs of machine learning engineers, data analysts, and product developers.
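To give a feel for the kind of tool this project describes, here is a minimal sketch of TF-IDF vectorization followed by a truncated SVD, using only NumPy. The toy corpus, the variable names, and the smoothed IDF formula are assumptions made for illustration; they are not taken from the project materials, and the project may use a different weighting.

```python
import numpy as np

# Hypothetical toy corpus (illustrative only, not project data).
docs = [
    "python data engineer",
    "senior data scientist python",
    "hr recruiting manager",
]

tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})
index = {term: i for i, term in enumerate(vocab)}

# Term-frequency matrix: rows are documents, columns are vocabulary terms.
tf = np.zeros((len(docs), len(vocab)))
for row, doc in enumerate(tokens):
    for term in doc:
        tf[row, index[term]] += 1
tf /= tf.sum(axis=1, keepdims=True)  # normalize counts by document length

# Smoothed inverse document frequency (one common variant, assumed here).
doc_freq = (tf > 0).sum(axis=0)
idf = np.log((1 + len(docs)) / (1 + doc_freq)) + 1
tfidf = tf * idf

# Truncated SVD: keep the top-k singular directions as dense document features.
k = 2
U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
doc_vectors = U[:, :k] * S[:k]
```

Each row of `doc_vectors` is a compact numeric representation of one document that downstream users could feed into a similarity search or an ML model.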

Project 2 Pandas for Feature Extraction

Master the basic methods for handling most real-world scenarios as you play the role of a data scientist in an online recruiting tech company or a large organization’s HR department. Using pandas, you’ll process, extract, and transform numerical, categorical, time series, and text data into structured features that are ready for data analysis and ML model training. When you’re done, you’ll have hands-on experience working with most data types you’ll find in the real world, as well as useful skills for extracting and engineering features.
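As a rough illustration of the four data types mentioned above, the following sketch turns one hypothetical table into model-ready features with pandas. The column names, the sample rows, and the specific feature choices (median fill, one-hot encoding, month and weekday, word counts) are assumptions for the example, not the project's actual schema or method.

```python
import pandas as pd

# Hypothetical job-posting records; column names are illustrative only.
df = pd.DataFrame({
    "salary": [50000.0, None, 72000.0],
    "contract": ["full-time", "part-time", "full-time"],
    "posted": ["2021-01-04", "2021-02-15", "2021-03-01"],
    "title": ["Data Engineer", "HR Manager", "Senior Data Scientist"],
})

features = pd.DataFrame(index=df.index)

# Numerical: fill missing values with the column median.
features["salary"] = df["salary"].fillna(df["salary"].median())

# Categorical: one-hot encode the contract type.
features = features.join(pd.get_dummies(df["contract"], prefix="contract"))

# Time series: parse dates and derive calendar features.
posted = pd.to_datetime(df["posted"])
features["posted_month"] = posted.dt.month
features["posted_weekday"] = posted.dt.dayofweek  # Monday is 0

# Text: simple length and keyword features from the job title.
features["title_words"] = df["title"].str.split().str.len()
features["title_has_data"] = df["title"].str.contains("data", case=False)
```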

Project 3 Data Visualization for Exploratory Analysis

Visualize this: you’re a data analyst in an online recruiting tech company or a large organization’s HR department. You’ll use Matplotlib to explore, visualize, and analyze processed data to identify missing data and outliers. You’ll build interactive plots for superior data presentation, analyze the correlation of different features using visualization methods, and create analytics dashboards for two types of users. By the end, you’ll be a better data analyst and have the skills to build storytelling tools that let you answer important business questions.
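A small sketch of the first step described above, spotting missing data and outliers, might look like the following. The salary values are made up, and the 1.5 × IQR rule (Tukey's rule) is one common outlier heuristic chosen for this example; the project may use a different approach.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical salary column (in thousands) with one gap and one extreme value.
salary = np.array([48.0, 52.0, 55.0, np.nan, 51.0, 50.0, 250.0])

missing = np.isnan(salary)
observed = salary[~missing]

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(observed, [25, 75])
iqr = q3 - q1
is_outlier = (observed < q1 - 1.5 * iqr) | (observed > q3 + 1.5 * iqr)
outliers = observed[is_outlier]

# Left panel: how much data is missing. Right panel: box plot showing the outlier.
fig, (ax_missing, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_missing.bar(["present", "missing"], [int((~missing).sum()), int(missing.sum())])
ax_missing.set_title("Missing values")
ax_box.boxplot(observed)
ax_box.set_title("Salary (thousands)")
fig.tight_layout()
```

In a Jupyter Notebook you can drop the `matplotlib.use("Agg")` line and the figure will render inline.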

Project 4 Data Augmentation for ML

As a machine learning engineer in an online recruiting tech company or the HR department of a large organization, your task is to address a lack of data, a common problem in data science projects. To solve this, you’ll create multiple tools that augment processed data to increase its volume, and along the way you’ll learn the essentials of probability distributions, random sampling, and OOP. Completing this project will enhance your data analysis and visualization skills, taking you further down the path to a career in data science.
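The combination of distribution fitting, random sampling, and OOP can be sketched as a small class like the one below. `NormalAugmenter` is a hypothetical helper written for this example, and the choice of a normal distribution is an assumption; real data may call for a different family.

```python
import numpy as np
from scipy import stats


class NormalAugmenter:
    """Fit a normal distribution to a numeric column and sample synthetic values.

    Hypothetical helper for illustration; the normal family is an assumption.
    """

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu = None
        self.sigma = None

    def fit(self, values):
        # Maximum-likelihood estimates of the mean and standard deviation.
        self.mu, self.sigma = stats.norm.fit(values)
        return self

    def sample(self, n):
        # Draw n synthetic values from the fitted distribution.
        return self.rng.normal(self.mu, self.sigma, size=n)

    def augment(self, values, n_new):
        # Return the original values followed by n_new synthetic ones.
        return np.concatenate([values, self.fit(values).sample(n_new)])


observed = np.array([3.1, 2.8, 3.4, 3.0, 2.9])
augmenter = NormalAugmenter(seed=42)
augmented = augmenter.augment(observed, n_new=20)
```

Wrapping the logic in a class keeps the fitted parameters alongside the sampling method, so the same interface could later be reused for other distributions.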

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.

The free project does not include full access to this Manning book. Purchase the full series to unlock this access in the free project, too!


I believe people will want to purchase this project because it is an interesting topic at a good price.

Casey Childers, software engineering manager, Mindbody

project author

Ruihao Qiu

Ruihao Qiu is a senior data scientist at a German tech company and has more than five years of experience in data science and machine learning. As part of the process of earning his PhD in statistical physics, he developed statistical models to simulate and search for new nanomaterials. In his early days as a data science consultant, he helped his clients from DAX30 multinational companies solve real-world data challenges. As a senior data scientist, he designed and built data pipelines and ML recommender systems for online recruitment applications. He enjoys taking on different career roles and sharing ideas about his data science work in tech blogs and in public presentations.


These liveProjects are for Python beginners who are passionate about data and who would like to advance their careers as data analysts, data engineers, or data scientists. To begin these liveProjects you’ll need to be familiar with the following:

  • Basic Python
  • Basic Jupyter Notebook
  • Basic NumPy
  • Intermediate pandas
  • Basic Matplotlib
  • Basic NLTK
  • Basic RegEx
  • Basic matrix operations
  • Basics of trie data structure
  • Basics of TF-IDF, SVD
  • Basics of tokenization and text cleaning
  • Basics of plot types
  • Basic statistics

you will learn

In this liveProject series, you’ll learn to build data processing, data augmentation, feature extraction and engineering tools, and create interactive data analytics dashboards for storytelling.

  • Use built-in Python modules: string, RegEx (regular expression)
  • Use NumPy for different matrix operations
  • Use SciPy to compute cosine similarity
  • Use SciPy’s stats module for probability distribution fitting
  • Use pandas for dataframe operations
  • Use Matplotlib plot types
  • Use ipywidgets for interactive widgets
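As one concrete example from the list above, cosine similarity can be computed with SciPy roughly as follows. Note that `scipy.spatial.distance.cosine` returns a distance, so the similarity is one minus that value. The vectors are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cosine

# Hypothetical feature vectors (illustrative only).
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # same direction as a
c = np.array([0.0, 0.0, 3.0])  # orthogonal to a

# SciPy returns cosine *distance*; similarity = 1 - distance.
sim_ab = 1.0 - cosine(a, b)
sim_ac = 1.0 - cosine(a, c)
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score close to 0, regardless of their lengths.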


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Certificate of Completion
Earn a certificate of completion, including a badge to display on your resume, LinkedIn page, and other social media, after you complete this series.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.