Five-Project Series

Data Science Bookcamp Projects

prerequisites
basic Python and pandas • basic visualization with Matplotlib or seaborn • basic statistics • basics of machine learning
skills learned
simulating real-life game environments in Python • using Python fundamentals to set up environments to test hypotheses • using NetworkX to analyze and visualize network datasets
Leonard Apeltsin, William Koehrsen, Nathan George, and Emre Rencberoglu
11 weeks · 4-8 hours per week average · INTERMEDIATE

Are you ready to work out with the Data Science Bookcamp? This series of liveProjects takes you hands-on with fun and engaging data science challenges from the bestselling book by Leonard Apeltsin. It features Discovering Disease Outbreaks from News Headlines, which he co-created with Will Koehrsen; Nate George’s Decoding Data Science Job Postings to Improve Your Resume; and three original projects by Emre Rencberoglu. Each challenge stretches your data science muscles and teaches you useful new skills through practice, such as using NumPy and SciPy for mathematical operations, clustering with scikit-learn, and analyzing and visualizing network datasets with NetworkX. Tackle them individually, or take on all five for an intensive workout of your data capabilities!

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

here's what's included

Project 1 Discovering Disease Outbreaks from News Headlines
In this liveProject, you’ll take on the role of a data scientist at the World Health Organization (WHO). The WHO is responsible for responding to international epidemics, a critical component of which involves monitoring global news headlines for signs of disease outbreaks. However, this daily deluge of news data is too large to analyze manually. Your challenge is to pull geographic information from headlines and determine where in the world outbreaks are occurring. Problems you will have to solve include extracting information from text using regular expressions, using the Basemap Matplotlib extension to visualize map locations for patterns indicating an epidemic, and reporting your findings to your superiors so resources can be dispatched.
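As a rough illustration of the text-extraction step, the sketch below matches a small list of place names against headlines using word-boundary regular expressions. The headlines, the place list, and the function name `find_locations` are all made up for this example; the project itself may take a different approach.

```python
import re

# Hypothetical place names; a real run would need a much larger list.
PLACES = ["New York", "York", "Miami", "Brazil"]

# Try longer names first so that "New York" wins over "York".
_pattern = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(PLACES, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)

def find_locations(headline):
    """Return the place names found in a headline, in order of appearance."""
    return [m.group(0) for m in _pattern.finditer(headline)]

# Illustrative headlines, not taken from the project data.
headlines = [
    "Zika Outbreak Hits Miami",
    "Flu season arrives in New York",
    "No location mentioned here",
]
for h in headlines:
    print(h, "->", find_locations(h))
```

Sorting the names by length before building the pattern is one simple way to keep a short name from shadowing a longer one that contains it.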
Project 2 Decoding Data Science Job Postings to Improve Your Resume
In this liveProject, you’ll step into the life of a budding data scientist looking for their first job in the industry. There are thousands of potential roles being advertised online, but only a few are a good match for your skill set. Your challenge is to use data science tools to automate the process of trawling through job listings, saving time as you optimize your resume, identify the most in-demand skills, and find jobs that are a good fit for you. To do this you’ll use Python to perform natural language processing and text analysis on pre-scraped data from job posting websites.
Project 3 Win a Card Game

In this liveProject, you’ll stretch your Python data science skills by building a simulator for 21, a popular variant of Blackjack. You’ll design your simulation step-by-step, then use visualization techniques to interpret your findings. By the end of your project, you’ll have a winning strategy for playing card games and new skills with fundamental Python libraries like NumPy and SciPy.
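To give a feel for what simulating a card game can look like, here is a minimal Monte Carlo sketch that estimates how often drawing one more card pushes a hand over 21. It is not the project's simulator: the simplified deck (cards drawn with replacement, ace always worth 1) and the function name `bust_probability` are assumptions made for this example.

```python
import random

# Card values for one suit; face cards count as 10 and the ace is
# simplified to 1 (an assumption made to keep the sketch short).
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

def bust_probability(hand_total, n_trials=100_000, seed=0):
    """Estimate the chance that one more card pushes the hand over 21."""
    rng = random.Random(seed)
    busts = sum(
        hand_total + rng.choice(CARD_VALUES) > 21 for _ in range(n_trials)
    )
    return busts / n_trials

for total in (12, 16, 20):
    print(total, round(bust_probability(total), 3))
```

With this simplified deck the exact probabilities are 4/13, 8/13, and 12/13 for totals of 12, 16, and 20, so the estimates should land close to those values.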

Project 4 When to Tweet for Best Effect

In this liveProject, you’ll build a fun (and useful!) data analysis tool that can determine which day of the week is the best to Tweet. You’ll test the hypothesis that Friday is the best day for engagement by calculating p-values and interpreting the results. You’ll utilize common techniques such as the permutation test and the Bonferroni correction to see if your hypothesis holds up, both essential skills for any data scientist.
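The two techniques named above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the engagement numbers are invented, the function name `permutation_p_value` is hypothetical, and the test is one-sided (it asks whether the first group's mean is larger).

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test: is mean(group_a) greater than mean(group_b)?"""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels by shuffling the pooled values and re-splitting.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Made-up engagement counts, for illustration only.
friday = [30, 42, 35, 50, 38, 45]
monday = [28, 33, 25, 31, 29, 36]

p = permutation_p_value(friday, monday)

# Bonferroni correction: comparing Friday with each of the other six days
# means six tests, so each test is judged against alpha / 6.
alpha, n_tests = 0.05, 6
print(f"p = {p:.4f}, significant after correction: {p < alpha / n_tests}")
```

The correction matters because running several comparisons raises the chance of a false positive; dividing the significance level by the number of tests is the simplest way to compensate.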

Project 5 Analyze a Trust Network

In this liveProject, you’ll turn your data science skills to analyzing an OTC network dataset scraped from Bitcoin users in order to establish the most (and least!) trustworthy users. You’ll start out by reading and examining the provided trust network dataset in Python, then generate features, create and interpret user clusters, and finally visualize the nodes and edges of the network. This fast and engaging data science project will stretch your skills and build your knowledge of clustering.

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.

project authors

Leonard Apeltsin
Leonard Apeltsin is a co-founder of Primer AI, a startup that develops advanced technology to analyze terabytes of unstructured text data. Leonard helped expand the Primer AI team from four employees to over 80. His PhD research on bioinformatics required analyzing millions of sequenced DNA patterns to uncover genetic links in deadly diseases. It was this research that led him to realize that his skills were transferable to other areas of analysis, and Leonard's data science consultancy was born. Leonard is currently a research fellow at the Berkeley Institute for Data Science.
William Koehrsen
Will Koehrsen is lead data scientist at Cortex Building Intelligence, a startup helping engineers improve energy efficiency in office buildings using analytics and machine learning. He has built numerous machine learning pipelines to optimize building operations, including algorithms to find the best time for engineers to start and stop their buildings' air conditioning/heating in some of the largest buildings in Manhattan, including the Empire State Building. Will is passionate about data science and helping others join the field. He writes for Towards Data Science.
Nathan George
Nate George started his career studying LEDs for his Ph.D. and working on solar cell manufacturing. He then leveraged his programming and mathematics experience to move to data science. Nate has been teaching and developing several data science and math courses at Regis University since 2017, mentors students at Udacity, and has developed a Python machine learning course at DataCamp. Nate's expertise includes data engineering (database technologies such as MongoDB and PostgreSQL and cloud technologies such as GCP and AWS), data science (Python, R, statistics), and machine learning.
Emre Rencberoglu
Emre Rencberoglu is a senior data scientist with over seven years of experience in machine learning, statistics, analytics, and data engineering. He has developed numerous machine learning projects and built data pipelines from scratch using R, Python, and Spark. Currently, he leads a data science team of ten at one of the biggest e-commerce companies in the Europe, Middle East, and Africa region.

Prerequisites

These liveProjects are for intermediate Python programmers who want to improve their data science skills. To begin these liveProjects you will need to be familiar with the following:


TOOLS
  • Intermediate Python: loops (for, while) and conditional statements (if, elif, else)
  • Basic Python data structures: dictionaries, lists, arrays; Python functions
  • Basic pandas: dataframes, transformations
  • Basic visualization with Matplotlib or seaborn
  • Basic Python scripting with an IDE or notebook
  • Basic scikit-learn
  • Basic Jupyter Notebook
TECHNIQUES
  • Basic statistics: mean, median, distributions
  • Basic machine learning: K-means and DBSCAN clustering
  • Basic text extraction with TF-IDF

you will learn

In these liveProjects, you’ll work out your Python data science skills and develop an important understanding of common data science and statistics techniques, such as:


  • Simulating real-life game environments in Python
  • Calculating statistical confidence intervals
  • Using permutation tests to calculate p-values
  • Using NumPy and SciPy for mathematical operations
  • Using Matplotlib and seaborn to visualize results
  • Using Python fundamentals to set up an analysis environment
  • Using pandas for data operations
  • Using scikit-learn for clustering
  • Using NetworkX to analyze and visualize network datasets
  • Extracting city and country name data from text using regular expressions
  • Manipulating data and matching location names to geographic coordinates
  • Visualizing clusters on a geographic map
  • Analyzing algorithm output and tuning model settings to improve results
  • Sorting between clusters based on size and within clusters based on distance
  • Interpreting algorithm results in the problem domain
  • Summarizing findings of a data science project effectively
  • Parsing HTML web pages with the BeautifulSoup library
  • Storing and processing data with pandas DataFrames
  • Converting raw text to numeric features with the scikit-learn library
  • Creating word clouds with the WordCloud library for text cluster visualization

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.