In this liveProject, you’ll step into the life of a budding data scientist looking for their first job in the industry. Thousands of potential roles are advertised online, but only a few are a good match for your skill set. Your challenge is to use data science tools to automate the process of trawling through job listings, saving time as you optimize your resume, identify the most in-demand skills, and find jobs that are a good fit for you. To do this, you’ll use Python to perform natural language processing and text analysis on pre-scraped data from job posting websites.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
The liveProject is for intermediate Python programmers who know basic data science techniques. To begin this liveProject, you should be familiar with the following topics:
- Basics of Jupyter Notebook
- Basics of pandas
- Basics of scikit-learn
- Basics of k-means clustering
- Basics of TF-IDF
What you will learn
In this liveProject, you’ll learn how to use libraries in the Python data ecosystem to analyze text-based data. You’ll clean data derived from HTML files, use text similarity analysis to find the perfect job for you, and visualize your results using word clouds and display plots.
- Parsing HTML web pages with the BeautifulSoup library
- Storing and processing data with pandas DataFrames
- Converting raw text to numeric features with the scikit-learn library
- Measuring text similarity with a cosine distance function
- Dimensionality reduction with singular value decomposition using scikit-learn
- k-means clustering using scikit-learn
- Creating word clouds with the WordCloud library for text cluster visualization
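To give a feel for how several of these skills fit together, here is a minimal sketch of the text-analysis core: TF-IDF features, cosine similarity against a resume, SVD for dimensionality reduction, and k-means clustering. It is not the project's actual solution. The `postings` and `resume` strings are invented stand-ins for the cleaned job data, and the parameter choices (two SVD components, two clusters) are assumptions made only to keep the example small. HTML parsing and word clouds are left out.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for cleaned job-posting text and a resume.
postings = [
    "Data scientist with Python, pandas and scikit-learn experience",
    "Machine learning engineer: Python, scikit-learn, clustering",
    "Registered nurse for hospital night shift patient care",
    "Hospital nurse practitioner providing patient care",
]
resume = "Python programmer skilled in pandas, scikit-learn and clustering"

# Convert raw text to TF-IDF feature vectors; the resume is transformed
# with the vocabulary learned from the postings.
vectorizer = TfidfVectorizer(stop_words="english")
posting_vectors = vectorizer.fit_transform(postings)
resume_vector = vectorizer.transform([resume])

# Score every posting against the resume with cosine similarity.
similarities = cosine_similarity(resume_vector, posting_vectors).ravel()
best_match = int(similarities.argmax())

# Reduce the sparse TF-IDF matrix to two dimensions, then cluster.
# Two components and two clusters are assumptions for this toy data.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(posting_vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

print("Best match:", postings[best_match])
print("Cluster labels:", labels.tolist())
```

On real data you would typically keep far more SVD components and choose the number of clusters by inspecting the results, for example with the word clouds mentioned above.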