Data Processing Tools with NumPy

This free project is part of the liveProject series DS Pipeline with Python
prerequisites
basic Python (Jupyter Notebook, NumPy, RegEx) • basic matrix operations • basic knowledge of trie data structure • basic knowledge of TF-IDF and SVD
skills learned
extract features with NumPy • vectorize text with TF-IDF and SVD • compute similar items with embedded vectors
Ruihao Qiu
1 week · 4-6 hours per week · INTERMEDIATE

As a data engineer in an online recruiting tech company or a large organization’s HR department, you’ll build a series of practical tools to process and extract useful information from unstructured text data using NumPy. You’ll learn important methods (including the trie data structure, TF-IDF, and SVD), how to implement them, and how they are applied in the real world. When you’re finished, you’ll have the know-how to build data processing tools that meet the needs of machine learning engineers, data analysts, and product developers.

project author

Ruihao Qiu

Ruihao Qiu is a senior data scientist at a German tech company and has more than five years of experience in data science and machine learning. While earning his PhD in statistical physics, he developed statistical models to simulate and search for new nanomaterials. In his early days as a data science consultant, he helped clients from DAX30 multinational companies solve real-world data challenges. As a senior data scientist, he designed and built data pipelines and ML recommender systems for online recruitment applications. He enjoys taking on different career roles and sharing ideas about his data science work in tech blogs and in public presentations.


This liveProject is for Python beginners who are interested in building data processing tools using NumPy. To begin this liveProject, you’ll need to be familiar with the following:

  • Basic Python
  • Basic Jupyter Notebook/JupyterLab
  • Basic NumPy and pandas
  • Basic matrix operations
  • Basic knowledge of the trie data structure
  • Basic concepts of TF-IDF and SVD (what the terms stand for and what they are used for)
  • Basic understanding of tokenization and cleaning of text data
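
As a quick refresher on the trie (prefix tree) named in the project description, here is a minimal sketch in plain Python. It is not taken from the project materials: the class names, the method names, and the sample job titles are all hypothetical and chosen only for illustration.

```python
class TrieNode:
    """One node of the trie: child nodes keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_end = False  # True if a stored word ends at this node


class Trie:
    """Hypothetical minimal trie supporting insert, lookup, and prefix search."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def _walk(self, text):
        """Follow text from the root; return the last node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Return all stored words that begin with prefix, sorted."""
        node = self._walk(prefix)
        if node is None:
            return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Illustrative data only: made-up job titles, not the project's dataset.
trie = Trie()
for title in ["data engineer", "data analyst", "developer"]:
    trie.insert(title)

matches = trie.starts_with("data")
```

Here `matches` holds the two titles that begin with "data". Lookup cost depends on the length of the query string rather than on how many words are stored, which is why tries are a common choice for prefix search over large vocabularies.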

you will learn

In this liveProject, you’ll learn commonly used methods for machine learning data preprocessing, various use cases for NumPy and sklearn, and how to build custom data processing tools.

  • Use built-in Python modules
  • Use NumPy for various matrix operations
  • Use SciPy to calculate cosine similarity
  • Use sklearn
  • Work with the trie data structure
  • Use TF-IDF
  • Work with correlation matrices
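
To make the skills above more concrete, the following is a rough NumPy-only sketch of the general idea: vectorize text with TF-IDF, reduce it with SVD, and compare items by cosine similarity. It is an illustration under stated assumptions, not the project's solution. The sample documents are invented, the TF-IDF weighting used here (term frequency times `log(N / df) + 1`) is one simple variant and differs from scikit-learn's smoothed default, and the number of kept components `k = 2` is arbitrary.

```python
import numpy as np

# Illustrative data only: made-up job titles, not the project's dataset.
docs = [
    "python data engineer",
    "senior python data engineer",
    "registered nurse hospital",
]

# Whitespace tokenization and a sorted vocabulary (assumed preprocessing).
tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})
index = {t: i for i, t in enumerate(vocab)}

# Document-term count matrix, shape (n_docs, n_terms).
counts = np.zeros((len(docs), len(vocab)))
for row, doc in enumerate(tokens):
    for t in doc:
        counts[row, index[t]] += 1

# TF-IDF: term frequency per document times inverse document frequency.
tf = counts / counts.sum(axis=1, keepdims=True)
df = (counts > 0).sum(axis=0)
idf = np.log(len(docs) / df) + 1.0
tfidf = tf * idf

# Truncated SVD: keep the k largest singular values as the embedding.
U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2
embedded = U[:, :k] * S[:k]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


sim_engineers = cosine_similarity(embedded[0], embedded[1])
sim_unrelated = cosine_similarity(embedded[0], embedded[2])
```

In this toy example the two engineering titles share most of their terms, so `sim_engineers` comes out much higher than `sim_unrelated`, which compares titles with no terms in common. The library counterparts of these steps are scikit-learn's `TfidfVectorizer` and `TruncatedSVD` and SciPy's `scipy.spatial.distance.cosine` (which returns a distance, that is, one minus the similarity).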


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.