There’s never been a better time to get into data science. But where do you start? Data science is a broad field, incorporating aspects of statistics, machine learning, and data engineering. It's easy to become overwhelmed, or to end up focused on a narrow slice of the field or a single methodology.

*Exploring Data Science* is a collection of five hand-picked chapters introducing you to various areas of data science and explaining which methodologies work best for each. John Mount and Nina Zumel, authors of *Practical Data Science with R*, selected these chapters to give you a big-picture view of the many domains that make up the field. You’ll learn about time series, neural networks, text analytics, and more. As you explore different modeling practices, you’ll see practical examples of how R, Python, and other languages are used in data science. Along the way, you'll experience a sample of Manning books you may want to add to your library.

# Introduction

## 1. Exploring data

### 1.1. Using summary statistics to spot problems

#### 1.1.1. Typical problems revealed by data summaries

### 1.2. Spotting problems using graphics and visualization

#### 1.2.1. Visually checking distributions for a single variable

#### 1.2.2. Visually checking relationships between two variables

### 1.3. Summary

#### 1.3.1. What's inside

## 2. Time series

### 2.1. Creating a time-series object in R

### 2.2. Smoothing and seasonal decomposition

#### 2.2.1. Smoothing with simple moving averages

#### 2.2.2. Seasonal decomposition

### 2.3. Exponential forecasting models

#### 2.3.1. Simple exponential smoothing

#### 2.3.2. Holt and Holt-Winters exponential smoothing

#### 2.3.3. The ets() function and automated forecasting

### 2.4. ARIMA forecasting models

#### 2.4.1. Prerequisite concepts

#### 2.4.2. ARMA and ARIMA models

#### 2.4.3. Automated ARIMA forecasting

### 2.5. Going further

### 2.6. Summary

#### 2.6.1. What's inside

## 3. Deep learning and neural networks

### 3.1. An intuitive approach to deep learning

### 3.2. Neural networks

### 3.3. The perceptron

#### 3.3.1. Training

#### 3.3.2. Training a perceptron in scikit-learn

#### 3.3.3. A geometric interpretation of the perceptron for two inputs

### 3.4. Multilayer perceptrons

#### 3.4.1. Training using backpropagation

#### 3.4.2. Activation functions

#### 3.4.3. Intuition behind backpropagation

#### 3.4.4. Backpropagation theory

#### 3.4.5. MLNN in scikit-learn

#### 3.4.6. A learned MLP

### 3.5. Going deeper: from multilayer neural networks to deep learning

#### 3.5.1. Restricted Boltzmann Machines

#### 3.5.2. The Bernoulli Restricted Boltzmann Machine

#### 3.5.3. RBMs in action

### 3.6. Conclusions

#### 3.6.1. What's inside

## 4. Text mining and text analytics

### 4.1. Text mining in the real world

### 4.2. Text mining techniques

#### 4.2.1. Bag of words

#### 4.2.2. Stemming and lemmatization

#### 4.2.3. Decision tree classifier

### 4.3. Case study: Classifying Reddit posts

#### 4.3.1. Meet the Natural Language Toolkit

#### 4.3.2. Data science process overview and step 1: The research goal

#### 4.3.3. Step 2: Data retrieval

#### 4.3.4. Step 3: Data preparation

#### 4.3.5. Step 4: Data exploration

#### 4.3.6. Step 3 revisited: Data preparation adapted

#### 4.3.7. Step 5: Data analysis

#### 4.3.8. Step 6: Presentation and automation

### 4.4. Summary

#### 4.4.1. What's inside

## 5. Modeling dependencies with Bayesian and Markov networks

### 5.1. Modeling dependencies

#### 5.1.1. Directed dependencies

#### 5.1.2. Undirected dependencies

#### 5.1.3. Direct and indirect dependencies

### 5.2. Using Bayesian networks

#### 5.2.1. Bayesian networks defined

#### 5.2.2. How a Bayesian network defines a probability distribution

#### 5.2.3. Reasoning with Bayesian networks

### 5.3. Exploring a Bayesian network example

#### 5.3.1. Designing a computer system diagnosis model

#### 5.3.2. Reasoning with the computer system diagnosis model

### 5.4. Using probabilistic programming to extend Bayesian networks: predicting product success

#### 5.4.1. Designing a product success prediction model

#### 5.4.2. Reasoning with the product success prediction model

### 5.5. Using Markov networks

#### 5.5.1. Markov networks defined

#### 5.5.2. Representing and reasoning with Markov networks

### 5.6. Summary

### 5.7. Exercises

#### 5.7.1. What's inside


## About the authors

**Nina Zumel** and **John Mount** are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com.