In this series of liveProjects, you’ll build a custom search engine that’s capable of quickly and accurately sourcing documents from the CDC’s document database. Your search engine will improve the CDC’s ability to handle future pandemics, with the capability to aggregate and search unstructured text data from records of earlier outbreaks. Each liveProject in this series tackles a different aspect of searching with natural language processing so you can pick and choose the specific skills you need.
This liveProject is for intermediate Python programmers who are comfortable with basic string, list, and dictionary manipulation. To begin this liveProject, you will need to be familiar with the following:
In this liveProject, you will learn to implement the simple but effective term frequency-inverse document frequency (TF-IDF) search method. TF-IDF scores a document by how often each query word appears in it, and gives less weight to words that are common across the whole document collection.
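As a rough illustration of the TF-IDF idea, here is a minimal sketch in plain Python. It is not the project's reference solution. The function names (`tokenize`, `rank_documents`), the sample documents, and the specific weighting (term count divided by document length, multiplied by the natural log of total documents over documents containing the term) are all assumptions chosen for brevity. TF-IDF has several common variants, and the liveProject may use a different one.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def rank_documents(query, docs):
    """Return (doc_index, score) pairs sorted by TF-IDF score, best first.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Documents with a score of zero are left out of the result.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        if not tokens:
            continue  # an empty document cannot match anything
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in doc_freq:
                continue  # term appears nowhere in the collection
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        if score > 0:
            scores.append((index, score))

    # Highest score first; ties broken by document order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample documents, for illustration only.
docs = [
    "measles outbreak in ohio",
    "influenza vaccine guidance",
    "measles vaccine schedule for children",
]
results = rank_documents("measles outbreak", docs)
for index, score in results:
    print(index, round(score, 4), docs[index])
```

In this sketch the first document ranks highest because it contains both query words and "outbreak" is rare in the collection, while the second document is omitted because it contains neither word.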