Latent Semantic Analysis for NLP

This project is part of the liveProject series Math for Machine Learning
prerequisites
intermediate Python (particularly NumPy, Matplotlib, and/or seaborn) • vectors and spaces from linear algebra
skills learned
clean data with regular expressions • the mathematical concepts behind latent semantic analysis and cosine similarity, and how and when to apply them
Nicole Königstein
1 week · 8-10 hours per week · INTERMEDIATE


At Finative, an ESG analytics company, you’re a data scientist who helps measure the sustainability of publicly traded companies by analyzing environmental, social, and governance (ESG) factors so Finative can report back to its clients. Recently, the CEO has decided that Finative should increase its own sustainability. You’ve been assigned the task of saving digital storage space by storing only relevant data. You’ll test different methods, including keyword retrieval with TF-IDF, cosine similarity, and latent semantic analysis, to find relevant keywords in documents and determine whether the documents should be discarded or saved for use in training your ML models.
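As a rough illustration of keyword retrieval and document comparison, the sketch below builds TF-IDF vectors for a toy corpus with NumPy, lists the highest-weighted keywords for one document, and compares documents with cosine similarity. The corpus, the unsmoothed `log(N / df)` weighting, and the helper names `top_keywords` and `cosine_similarity` are assumptions made for illustration; they are not the project's own code or data.

```python
from collections import Counter

import numpy as np

# Hypothetical toy corpus (an assumption; the project's documents are not shown).
docs = [
    "carbon emissions and renewable energy targets",
    "renewable energy investment and carbon reporting",
    "quarterly football results and player transfers",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({term for tokens in tokenized for term in tokens})
index = {term: i for i, term in enumerate(vocab)}

# Term frequency: count of a term divided by the document length.
tf = np.zeros((len(docs), len(vocab)))
for row, tokens in enumerate(tokenized):
    for term, count in Counter(tokens).items():
        tf[row, index[term]] = count / len(tokens)

# Inverse document frequency: log(N / df), one common unsmoothed variant.
df = np.count_nonzero(tf, axis=0)
idf = np.log(len(docs) / df)
tfidf = tf * idf


def top_keywords(doc_row, k=2):
    """Return the k terms with the highest TF-IDF weight in one document."""
    order = np.argsort(tfidf[doc_row])[::-1][:k]
    return [vocab[i] for i in order]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


sim_related = cosine_similarity(tfidf[0], tfidf[1])    # two ESG-style documents
sim_unrelated = cosine_similarity(tfidf[0], tfidf[2])  # ESG vs. sports document
print(top_keywords(0), sim_related, sim_unrelated)
```

In a filtering workflow like the one described, a document whose similarity to known-relevant material falls below some chosen threshold might be flagged for discarding; the threshold itself would have to be tuned on real data.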

This project is designed for learning purposes and is not a complete, production-ready application or solution.

book resources

When you start your liveProject, you get full access to the following books for 90 days.

project author

Nicole Königstein

Nicole Königstein currently works as data science and technology lead at impactvise, an ESG analytics company, and as a quantitative researcher and technology lead at Quantmate, an innovative FinTech startup that leverages alternative data as part of its predictive modeling strategy. She’s a regular speaker, sharing her expertise at conferences such as ODSC Europe. In addition, she teaches Python, machine learning, and deep learning, and holds workshops at conferences including the Women in Tech Global Conference.

prerequisites

This liveProject is for ML engineers, intermediate-level Python programmers, and early-stage data scientists who are familiar with the basics of linear algebra. To begin this liveProject, you’ll need to be familiar with the following:

TOOLS
  • Intermediate Python (declaring variables, loops, branches, working with arrays)
  • How to use Jupyter Notebook
  • Understanding of systems of linear equations, vector spaces, and matrix transformations
  • Basic familiarity with NumPy (indexing arrays, array creation, and manipulation)
  • Basic understanding of regular expressions to manipulate a string
TECHNIQUES
  • Basic linear algebra
  • Basic data science

you will learn

In this liveProject, you’ll learn how to preprocess text data using NLP tools, including regular expressions, tokenization, and stop-word removal.
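A minimal sketch of that kind of preprocessing, using only the standard library, might look like the following. The stop-word list, the sample sentence, and the function name `preprocess` are hypothetical; a real pipeline would likely use a fuller stop-word list and a more careful tokenizer.

```python
import re

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "its", "by"}


def preprocess(text):
    """Lowercase, strip URLs and non-letters, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    tokens = text.split()                      # whitespace tokenization
    return [token for token in tokens if token not in STOP_WORDS]


sample = "The company cut its carbon emissions by 12% in 2021: https://example.com/report"
tokens = preprocess(sample)
print(tokens)
```

Note that dropping every non-letter character is a blunt choice: it also discards numbers and splits terms such as "CO2", so the pattern would need adjusting if those carry meaning in the data.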

  • Mathematical insights into singular value decomposition (SVD) and why it is such a powerful and useful algorithm
  • Basic mathematics of cosine similarity and when to apply it
  • How to tokenize, clean, and prepare text data
  • The algorithm and mathematical principles of Term Frequency - Inverse Document Frequency (TF-IDF)
  • The mathematical concepts and application of latent semantic analysis (LSA), how it builds on SVD, how it differs from cosine similarity, and when to apply it
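To make the relationship between these topics more concrete, here is a small sketch of LSA as a truncated SVD of a term-document matrix, followed by cosine similarity in the reduced space. The matrix, the term labels, and the choice of `k = 2` latent dimensions are invented for illustration and are not taken from the project.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are documents.
terms = ["carbon", "emissions", "energy", "football", "transfers"]
A = np.array([
    [1, 0, 0, 0],  # carbon
    [1, 1, 0, 0],  # emissions
    [0, 1, 1, 0],  # energy
    [0, 0, 0, 2],  # football
    [0, 0, 0, 1],  # transfers
], dtype=float)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


# Documents 0 and 2 share no terms, so their raw cosine similarity is zero.
raw_sim = cosine_similarity(A[:, 0], A[:, 2])

# SVD factors A into U * diag(s) * Vt, with singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# LSA keeps only the k largest singular values (k = 2 is an arbitrary choice here).
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# In the latent space, documents 0 and 2 are linked through document 1,
# which shares "emissions" with one and "energy" with the other.
latent_sim = cosine_similarity(doc_vectors[0], doc_vectors[2])
latent_sim_offtopic = cosine_similarity(doc_vectors[0], doc_vectors[3])
print(raw_sim, latent_sim, latent_sim_offtopic)
```

The contrast between `raw_sim` and `latent_sim` is the point of the example: cosine similarity on raw counts only sees shared terms, while the truncated SVD can place documents close together when they are connected through co-occurring vocabulary.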

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.