The company you work for, which offers lifestyle products, has enjoyed success for several years, but management has decided it’s not competitive enough. Your task is to identify current market trends and new niche markets for the company’s lifestyle products. Using Python and Ray, you’ll build a web scraper that loads and saves multiple web pages concurrently. To preprocess the data, you’ll read each of your locally stored pages, split it into sentences, tokenize each sentence with Hugging Face tokenizers, and store the tokenized documents in a new (pickled) format on your file system. When you’re finished, you’ll have leveraged Ray to preprocess a large amount of data while bypassing Python’s notorious concurrency limitations. The data science team will thank you for helping minimize the data preparation time, and the management team will thank you for helping the company sharpen its competitive edge.
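The pipeline described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's reference solution: the function names, the file naming scheme, the regex-based sentence splitter, and the `bert-base-uncased` tokenizer are all hypothetical choices made for illustration, and the saved pages are treated as plain text rather than parsed as HTML.

```python
import pickle
import re
from pathlib import Path
from urllib.request import urlopen


def fetch_page(url, out_dir):
    """Download one page and save it under out_dir; return the saved path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: turn the URL into a safe file name.
    name = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".html"
    path = out / name
    with urlopen(url, timeout=10) as response:
        path.write_bytes(response.read())
    return str(path)


def scrape_concurrently(urls, out_dir):
    """Fetch all URLs as parallel Ray tasks and return the saved paths."""
    import ray  # imported lazily so the helpers below work without Ray

    ray.init(ignore_reinit_error=True)
    remote_fetch = ray.remote(fetch_page)
    return ray.get([remote_fetch.remote(url, out_dir) for url in urls])


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_document(text, encode):
    """Return one encoded item per sentence, using the given encoder."""
    return [encode(sentence) for sentence in split_sentences(text)]


def preprocess_file(path, encode):
    """Read a saved page, tokenize it, and pickle the result next to it."""
    source = Path(path)
    tokens = tokenize_document(source.read_text(encoding="utf-8"), encode)
    target = source.with_suffix(".pkl")
    with target.open("wb") as handle:
        pickle.dump(tokens, handle)
    return str(target)


def load_hf_encoder(name="bert-base-uncased"):
    """Build an encoder from a Hugging Face tokenizer (downloads on first use)."""
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_pretrained(name)
    return lambda sentence: tokenizer.encode(sentence).ids
```

With Ray and the `tokenizers` package installed, a run could call `scrape_concurrently(urls, "pages")` and then `preprocess_file(path, load_hf_encoder())` for each returned path. The encoder is passed in as a plain callable so that the splitting and pickling steps can be tried out without downloading a tokenizer.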
This liveProject is for data scientists who want to prepare their ML models for deployment to production, as well as software engineers who need to overcome the challenges of ML applications. To begin these liveProjects you’ll need to be familiar with the following:
TOOLS
In this liveProject, you’ll learn how to use Ray to build a web scraper and how to prepare your scraped data for training an ML model.