Data Science Bookcamp

Discovering Disease Outbreaks from News Headlines

This project is part of the liveProject series Data Science Bookcamp Projects
prerequisites
intermediate Python • beginner scikit-learn • basics of pandas • basics of data science
skills learned
text extraction with pandas • clustering with K-means and DBSCAN • visualizing clusters
Leonard Apeltsin and Will Koehrsen
4 weeks · 5-10 hours per week · INTERMEDIATE


In this liveProject, you’ll take on the role of a data scientist at the World Health Organization (WHO). The WHO is responsible for responding to international epidemics, a critical part of which is monitoring global news headlines for signs of disease outbreaks. However, this daily deluge of news data is too large to analyze manually. Your challenge is to pull geographic information from headlines and determine where in the world outbreaks are occurring. Problems you will have to solve include extracting information from text using regular expressions, using the Basemap Matplotlib extension to plot locations on a map and look for patterns indicating an epidemic, and reporting your findings to your superiors so resources can be dispatched.
This project is designed for learning purposes and is not a complete, production-ready application or solution.

book resources

When you start your liveProject, you get full access to the following books for 90 days.

project authors

Leonard Apeltsin
Leonard Apeltsin is a co-founder of Primer AI, a startup that develops advanced technology to analyze terabytes of unstructured text data. Leonard helped expand the Primer AI team from four employees to over 80. His PhD research on bioinformatics required analyzing millions of sequenced DNA patterns to uncover genetic links in deadly diseases. This research led him to realize that his skills were transferable to other areas of analysis, and Leonard's data science consultancy was born. Leonard is currently a research fellow at the Berkeley Institute for Data Science.
William Koehrsen
Will Koehrsen is lead data scientist at Cortex Building Intelligence, a startup helping engineers improve energy efficiency in office buildings using analytics and machine learning. He has built numerous machine learning pipelines to optimize building operations, including algorithms to find the best time for engineers to start and stop their buildings' air conditioning/heating in some of the largest buildings in Manhattan, including the Empire State Building. Will is passionate about data science and helping others join the field. He writes for Towards Data Science.

prerequisites

The liveProject is for intermediate Python programmers who know the basics of data science. To begin this liveProject, you will need to be familiar with:

TOOLS
  • Basics of pandas
  • Basics of scikit-learn
  • Basics of text extraction
  • Basics of K-means and DBSCAN clustering
  • Basics of Jupyter Notebook

you will learn

In this liveProject, you’ll develop techniques for text extraction, data manipulation, clustering, and interpreting algorithm outputs, and you’ll learn to produce an actionable report. All these skills transfer readily to a variety of data science roles in business and other organizations.

  • Extracting city and country name data from text using regular expressions
  • Manipulating data and matching location names to geographic coordinates
  • Clustering geographic coordinates with KMeans and/or DBSCAN
  • Visualizing clusters on a geographic map
  • Analyzing algorithm output and tuning model settings to improve results
  • Sorting between clusters based on size and within clusters based on distance
  • Interpreting algorithm results in the problem domain
  • Summarizing findings of a data science project effectively
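
As a rough illustration of the first few steps in this list, here is a minimal sketch that uses only the Python standard library. It is not the project's solution. The headlines, the small city-to-coordinate table, and the cluster labels are all made up for the example. In the project itself, the labels would come from a clustering algorithm such as KMeans or DBSCAN rather than being assigned by hand. The cluster center is taken as the plain mean of latitudes and longitudes, which is a simplification that only holds for compact clusters away from the antimeridian.

```python
import re
from math import radians, sin, cos, asin, sqrt

# Hypothetical lookup table: city name -> (latitude, longitude) in degrees.
CITY_COORDS = {
    "Miami": (25.77, -80.19),
    "Orlando": (28.54, -81.38),
    "Tampa": (27.95, -82.46),
    "Manila": (14.60, 120.98),
    "Cebu": (10.32, 123.89),
}

# Hypothetical cluster labels, standing in for the output of KMeans or DBSCAN.
ASSUMED_LABELS = {"Miami": 0, "Orlando": 0, "Tampa": 0, "Manila": 1, "Cebu": 1}

# Whole-word match on any known city name; longer names are tried first.
CITY_REGEX = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, CITY_COORDS), key=len, reverse=True))
    + r")\b"
)


def extract_city(headline):
    """Return the first known city name found in the headline, or None."""
    match = CITY_REGEX.search(headline)
    return match.group(1) if match else None


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def rank_clusters(cities_by_label):
    """Sort clusters largest first, and cities within each cluster by distance to its center."""
    ranked = []
    for label, cities in cities_by_label.items():
        coords = [CITY_COORDS[city] for city in cities]
        # Simplification: the center is the plain mean of the coordinates.
        center = (
            sum(lat for lat, _ in coords) / len(coords),
            sum(lon for _, lon in coords) / len(coords),
        )
        ordered = sorted(cities, key=lambda city: haversine_km(CITY_COORDS[city], center))
        ranked.append((label, ordered))
    ranked.sort(key=lambda item: len(item[1]), reverse=True)
    return ranked


# Made-up headlines for illustration only.
headlines = [
    "Zika Outbreak Hits Miami",
    "Could Zika Reach Orlando?",
    "Tampa Reports New Zika Case",
    "Dengue Spreads in Manila",
    "Cebu Confirms Dengue Cases",
    "Flu Season Starts Early",
]

clusters = {}
for headline in headlines:
    city = extract_city(headline)
    if city is not None:  # headlines with no known city are skipped
        clusters.setdefault(ASSUMED_LABELS[city], []).append(city)

ranked = rank_clusters(clusters)
for label, cities in ranked:
    print(label, cities)
```

A fixed alternation of known names keeps the regular expression simple, but it only finds cities already present in the lookup table. A real gazetteer would also need to handle accents, multi-word names, and ambiguous place names.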

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.
