In this liveProject, you’ll take on the role of a data scientist at the World Health Organization (WHO). The WHO is responsible for responding to international epidemics, and a critical part of that work is monitoring global news headlines for signs of disease outbreaks. However, the daily deluge of news data is far too large to analyze manually. Your challenge is to pull geographic information from headlines and determine where in the world outbreaks are occurring. Problems you will have to solve include extracting information from text using regular expressions, plotting the extracted locations on a map with the Basemap Matplotlib extension to look for patterns that indicate an epidemic, and reporting your findings to your superiors so resources can be dispatched.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
This liveProject is for intermediate Python programmers who know the basics of data science. To begin this liveProject, you will need to be familiar with:
- Basics of pandas
- Basics of scikit-learn
- Basics of text extraction
- Basics of K-means and DBSCAN clustering
- Basics of Jupyter Notebook
What you will learn
In this liveProject, you’ll develop techniques for text extraction, data manipulation, clustering, and interpreting algorithm outputs, and you’ll learn to produce an actionable report. All of these skills transfer readily to data science roles in business and other organizations.
- Extracting city and country name data from text using regular expressions
- Manipulating data and matching location names to geographic coordinates
- Clustering geographic coordinates with K-means and/or DBSCAN
- Visualizing clusters on a geographic map
- Analyzing algorithm output and tuning model settings to improve results
- Sorting between clusters based on size and within clusters based on
- Interpreting algorithm results in the problem domain
- Summarizing findings of a data science project effectively
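To give a flavor of the first two skills, here is a minimal sketch of matching city names in headlines with a regular expression and looking up their coordinates. It is only an illustration, not the project's reference solution: the `CITY_COORDS` table, its approximate coordinates, the function names, and the sample headlines are all assumptions made up for this example. In the project itself you would work with a full list of location names rather than a three-entry dictionary.

```python
import re

# Hypothetical lookup table with approximate (latitude, longitude) values.
# A real project would load a complete list of cities instead.
CITY_COORDS = {
    "Miami": (25.7617, -80.1918),
    "San Juan": (18.4655, -66.1057),
    "San Juan Capistrano": (33.5017, -117.6626),
}


def build_city_pattern(names):
    """Compile one regex that matches any of the given city names."""
    # Put longer names first so that, when two names start at the same
    # position ("San Juan" vs. "San Juan Capistrano"), the longer one wins.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # Word boundaries keep a name from matching inside a longer word.
    return re.compile(rf"\b(?:{alternation})\b")


CITY_PATTERN = build_city_pattern(CITY_COORDS)


def locate_headline(headline):
    """Return (city, (lat, lon)) for the first known city, or None."""
    match = CITY_PATTERN.search(headline)
    if match is None:
        return None
    city = match.group(0)
    return city, CITY_COORDS[city]


# Made-up sample headlines, for illustration only.
for text in ["Zika Outbreak Hits Miami",
             "Measles reported in San Juan Capistrano",
             "Stock markets rally"]:
    print(text, "->", locate_headline(text))
```

The matching here is case-sensitive and returns only the first city found, which keeps the sketch short; handling accents, casing, and headlines that mention several places is part of what you would need to work out in the project.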