Four-Project Series

DS Pipeline with Python

prerequisites: basic Python (Jupyter Notebook, NumPy, Matplotlib, NLTK, and RegEx) • intermediate pandas
skills learned: feature extraction with NumPy • text vectorization with TF-IDF and SVD • feature engineering with pandas • interactive data visualization with Matplotlib • data augmentation for ML with object-oriented programming (OOP) • statistical modeling with SciPy

Ruihao Qiu

4 weeks · 3-6 hours per week average · INTERMEDIATE

Included with a Manning Online subscription

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

whole series

$69.99 $48.99

you save $21.00 (30%)

The tasks you’ll tackle in this series of liveProjects are typical of what a data scientist or data engineer encounters at an online recruiting tech company, in a large organization’s HR department, or in similar environments. You’ll develop a data pipeline that processes, extracts, and transforms various types of data for different users, including machine learning engineers, data analysts, and product developers. You’ll build data processing tools with NumPy, use pandas for feature extraction and engineering, use Matplotlib to explore, visualize, and analyze processed data, and build data augmentation tools to support ML modeling. By the end, you’ll have completed 80% of the work of a typical data science project, and you’ll have acquired skills, experience, and confidence that take you closer to a career in data science.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

The techniques in the building of the data processing tool were very useful. I can use that in my work projects.

Hung Le, data-engineer, Eco-Energy

here's what's included

Project 1 Data Processing Tools with NumPy

As a data engineer in an online recruiting tech company or a large organization’s HR department, you’ll build a series of practical tools to process and extract useful information from unstructured text data using NumPy. You’ll learn important methods (including trie data structure, TF-IDF, SVD), how to implement them, and their applications in the real world. When you’re finished, you’ll have the know-how to build data processing tools that meet the needs of machine learning engineers, data analysts, and product developers.
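
To give a feel for two of the methods named above, here is a minimal sketch of TF-IDF followed by a truncated SVD, written with plain NumPy. The toy corpus, the smoothed IDF formula, and the choice of `k = 2` are assumptions made for illustration; they are not taken from the project materials.

```python
import numpy as np

# Hypothetical toy corpus; the project's real data is not shown on this page.
docs = [
    "python data engineer",
    "senior data scientist python",
    "hr recruiting manager",
]

# Build a vocabulary and a document-term count matrix.
tokens = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokens for word in doc})
index = {word: j for j, word in enumerate(vocab)}
counts = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(tokens):
    for word in doc:
        counts[i, index[word]] += 1

# TF-IDF: term frequency times a smoothed inverse document frequency
# (one common variant, assumed here).
tf = counts / counts.sum(axis=1, keepdims=True)
df = (counts > 0).sum(axis=0)
idf = np.log((1 + len(docs)) / (1 + df)) + 1
tfidf = tf * idf

# Truncated SVD: keep the top-k singular directions as dense document features.
k = 2
U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
doc_vectors = U[:, :k] * S[:k]
```

Each row of `doc_vectors` is a compact representation of one document that could be handed to downstream consumers such as a similarity search or an ML model.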

$29.99 $19.99

Project 2 Pandas for Feature Extraction

Master the basic methods for handling most real-world scenarios as you play the role of a data scientist in an online recruiting tech company or a large organization’s HR department. Using pandas, you’ll process, extract, and transform numerical, categorical, time series, and text data into structured features that are ready for data analysis and ML model training. When you’re done, you’ll have hands-on experience working with most data types you’ll find in the real world, as well as useful skills for extracting and engineering features.
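
As a rough illustration of turning the four data types mentioned above into structured features, the sketch below uses pandas on a made-up table of job postings. The column names, the values, and the specific transformations (median imputation, one-hot encoding, date parts, simple text flags) are assumptions, not the project's actual steps.

```python
import pandas as pd

# Hypothetical job-posting records, invented for this example.
raw = pd.DataFrame({
    "title": ["Data Engineer", "HR Manager", "Data Scientist"],
    "salary": [70000.0, None, 85000.0],
    "contract": ["full-time", "part-time", "full-time"],
    "posted": ["2021-01-04", "2021-02-15", "2021-03-01"],
})

features = pd.DataFrame(index=raw.index)

# Numerical: fill the missing salary with the column median.
features["salary"] = raw["salary"].fillna(raw["salary"].median())

# Categorical: one-hot encode the contract type.
features = features.join(pd.get_dummies(raw["contract"], prefix="contract"))

# Time series: parse dates and derive calendar features.
posted = pd.to_datetime(raw["posted"])
features["posted_month"] = posted.dt.month
features["posted_weekday"] = posted.dt.dayofweek  # Monday is 0

# Text: simple keyword and length features from the job title.
features["title_has_data"] = raw["title"].str.lower().str.contains("data")
features["title_word_count"] = raw["title"].str.split().str.len()
```

The resulting `features` frame has one row per posting and only numeric or boolean columns, which is the shape most analysis and model-training code expects.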

$29.99 $19.99

Project 3 Data Visualization for Exploratory Analysis

Visualize this: you’re a data analyst in an online recruiting tech company or a large organization’s HR department. You’ll use Matplotlib to explore, visualize, and analyze processed data to identify missing data and outliers. You’ll build interactive plots for superior data presentation, analyze the correlation of different features using visualization methods, and create analytics dashboards for two types of users. By the end, you’ll be a better data analyst and have the skills to build storytelling tools that let you answer important business questions.

$29.99 $19.99

Project 4 Data Augmentation for ML

As a machine learning engineer in an online recruiting tech company or the HR department of a large organization, your task is to address a lack of data, a common problem in data science projects. To solve this, you’ll create multiple tools that augment processed data to increase its volume, learning the essentials of probability distributions, random sampling, and OOP along the way. Completing this project will enhance your data analysis and visualization skills, taking you further down the path to a career in data science.
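
One way such an augmentation tool could look is sketched below: a small class that fits a normal distribution to observed values with SciPy and then draws synthetic samples from it. The class name `NormalAugmenter`, the choice of a normal distribution, and the sample values are all hypothetical; the project may use different distributions and a different design.

```python
import numpy as np
from scipy import stats


class NormalAugmenter:
    """Hypothetical helper: fit a normal distribution, then sample from it."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu = None
        self.sigma = None

    def fit(self, values):
        # Maximum-likelihood estimates of the mean and standard deviation.
        self.mu, self.sigma = stats.norm.fit(values)
        return self

    def sample(self, n):
        # Draw n synthetic values from the fitted distribution.
        return stats.norm.rvs(
            loc=self.mu, scale=self.sigma, size=n, random_state=self.rng
        )


# Made-up observations standing in for a scarce numeric feature.
observed = np.array([48.0, 52.0, 50.0, 47.0, 53.0])
augmenter = NormalAugmenter(seed=42).fit(observed)
synthetic = augmenter.sample(100)
augmented = np.concatenate([observed, synthetic])
```

Wrapping the fit and sample steps in a class keeps the fitted parameters together with the sampling logic, so the same interface could later be reused for other distributions.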

$29.99 $19.99

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.



I believe people will want to purchase this project because it is an interesting topic at a good price.

Casey Childers, software engineering manager, Mindbody

project author

Ruihao Qiu

Ruihao Qiu is a senior data scientist at a German tech company and has more than five years of experience in data science and machine learning. While earning his PhD in statistical physics, he developed statistical models to simulate and search for new nanomaterials. In his early days as a data science consultant, he helped clients from DAX30 multinational companies solve real-world data challenges. As a senior data scientist, he has designed and built data pipelines and ML recommender systems for online recruitment applications. He enjoys taking on different career roles and sharing ideas about his data science work in tech blogs and public presentations.

Prerequisites

These liveProjects are for Python beginners who are passionate about data and who would like to advance their careers as data analysts, data engineers, or data scientists. To begin these liveProjects you’ll need to be familiar with the following:

TOOLS

Basic Python
Basic Jupyter Notebook
Basic NumPy
Intermediate pandas
Basic Matplotlib
Basic NLTK
Basic RegEx

TECHNIQUES

Basic matrix operations
Basics of trie data structure
Basics of TF-IDF, SVD
Basics of tokenization and text cleaning
Basics of plot types
Basic statistics

you will learn

In this liveProject series, you’ll learn to build data processing, data augmentation, feature extraction and engineering tools, and create interactive data analytics dashboards for storytelling.

Use built-in Python modules: string, RegEx (regular expression)
Use NumPy for different matrix operations
Use SciPy to compute cosine similarity
Use stats modules for probability distribution fitting
Use pandas for dataframe operations
Use Matplotlib plot types
Use ipywidgets for interactive widgets
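
As a small example of one item in the list above, cosine similarity can be computed with SciPy as sketched here. Note that `scipy.spatial.distance.cosine` returns a distance, so the similarity is one minus that value. The vectors are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cosine

# Made-up feature vectors: b points in the same direction as a,
# while c is orthogonal to a.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])
c = np.array([0.0, 0.0, 3.0])

# SciPy returns the cosine distance; similarity = 1 - distance.
sim_ab = 1.0 - cosine(a, b)
sim_ac = 1.0 - cosine(a, c)
```

Here `sim_ab` is close to 1 (same direction) and `sim_ac` is close to 0 (no shared direction).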

features

Self-paced: You choose the schedule and decide how much time to invest as you build your project.
Project roadmap: Each project is divided into several achievable steps.
Get Help: While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others: For each step, compare your deliverable to the solutions by the author and other participants.
Book resources: Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.