In this liveProject, you’ll step into the life of a budding data scientist looking for their first job in the industry. There are thousands of potential roles being advertised online, but only a few that are a good match to your skill set. Your challenge is to use data science tools to automate the process of trawling through job listings, saving time as you optimize your resume, identify the most in-demand skills, and find jobs that are a good fit for you. To do this, you’ll use Python to perform Natural Language Processing and text analysis on pre-scraped data from job posting websites.
Nate George started his career studying LEDs for his PhD and working on solar cell manufacturing. He then leveraged his programming and mathematics experience to move into data science. Nate has been teaching and developing several data science and math courses at Regis University since 2017, mentors students at Udacity, and has developed a Python machine learning course at DataCamp. Nate’s expertise includes data engineering (database technologies such as MongoDB and PostgreSQL, and cloud technologies such as GCP and AWS), data science (Python, R, statistics), and machine learning.
The liveProject is for intermediate Python programmers who know basic data science techniques. To begin this liveProject, you should be familiar with the following topics:
Basics of Jupyter notebooks
Basics of pandas
Basics of scikit-learn
Basics of K-means clustering
Basics of TF-IDF
You will learn
In this liveProject, you’ll learn how to use libraries in the Python data ecosystem to analyze text-based data. You’ll clean data derived from HTML files, use text similarity analysis to find the jobs that best match your resume, and visualize your results with word clouds and plots.
Parsing HTML web pages with the BeautifulSoup library
Storing and processing data with pandas DataFrames
Converting raw text to numeric features with the scikit-learn library
Measuring text similarity with a cosine distance function
Dimensionality reduction with singular value decomposition using scikit-learn
k-means clustering using scikit-learn
Creating word clouds with the WordCloud library for text cluster visualization
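The following is a minimal sketch of how several of these steps fit together: TF-IDF features, cosine similarity ranking, SVD, and k-means. It assumes scikit-learn is installed. The four posting strings and the resume text are invented stand-ins for the project's pre-scraped data, and the variable names are hypothetical. This is not the project's reference solution. The BeautifulSoup parsing and WordCloud visualization steps are left out.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: in the project these would be job-posting texts
# extracted from the pre-scraped HTML pages, plus your own resume text.
postings = [
    "Python pandas scikit-learn machine learning models",
    "SQL PostgreSQL database administration and backups",
    "Deep learning with Python, TensorFlow and GPUs",
    "Database engineering with MongoDB and SQL queries",
]
resume = "Data scientist experienced in Python, pandas and machine learning"

# Fit TF-IDF on postings plus resume so both share one vocabulary.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(postings + [resume])
posting_vectors, resume_vector = matrix[:-1], matrix[-1]

# Cosine similarity between the resume and every posting, highest first.
scores = cosine_similarity(resume_vector, posting_vectors).ravel()
ranking = scores.argsort()[::-1]
for i in ranking:
    print(f"{scores[i]:.3f}  {postings[i]}")

# Reduce the sparse TF-IDF matrix with truncated SVD, then cluster with k-means.
# Two components and two clusters are arbitrary choices for this tiny example.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(posting_vectors)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)
print(labels)
```

Fitting the vectorizer on the postings and the resume together keeps them in one vocabulary. `TfidfVectorizer` L2-normalizes each row by default, so the cosine similarity here reduces to a dot product. `TruncatedSVD` works directly on sparse TF-IDF matrices, which is one reason it is a common choice before k-means on text.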
You choose the schedule and decide how much time to invest as you build your project.
Each project is divided into several achievable steps.
Chat with other participants within the liveProject platform.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Book and video resources
Excerpts from Manning books and videos are included, as well as references to other resources.
1. Extracting Text from Online Job Postings
1.1. Extracting Raw Text from Job Posting HTML Web Pages
Python Tools for Data Exploration
Using the Filesystem
Extracting Text from Web Pages
1.2. Submit Your Work
2. Ranking Job Postings by Similarity
2.1. Find the Most Similar Job Postings to Our Resume
Computing TF-IDF Vectors with Scikit-Learn
Computing Similarities Across Large Document Datasets
2.2. Submit Your Work
3. Clustering Job Posting Skill Requirements
3.1. Finding Clusters of Job Posting Skills
Efficient Dimension Reduction Using SVD and Scikit-Learn
K-means Clustering Using Scikit-Learn
Clustering Texts by Topic
Visualizing Text Clusters
3.2. Submit Your Work
4. Finding Missing Skills from Our Resume
4.1. Determine Missing Resume Skills
4.2. Submit Your Work