Machine Learning with R, tidyverse, and mlr
Machine learning is a collection of programming techniques for discovering relationships in data. With ML algorithms, you can cluster and classify data for tasks like making recommendations or detecting fraud, and make predictions for sales trends, risk analysis, and other forecasts. Once the domain of academic data scientists, machine learning has become a mainstream business process, and tools like the easy-to-learn R programming language put high-quality data analysis in the hands of any programmer. Machine Learning with R, tidyverse, and mlr teaches you widely used ML techniques and how to apply them to your own datasets using the R programming language and its powerful ecosystem of tools. This book will get you started!
A great combination of statistics and code.
Table of Contents
Part 1: Introduction
1.1 What is machine learning?
1.1.1 Artificial intelligence and machine learning
1.1.2 The difference between a model and an algorithm
1.2 Classes of machine learning algorithms
1.2.1 Differences between supervised, unsupervised, and semi-supervised learning
1.2.2 Classification, regression, dimension reduction, and clustering
1.2.3 A brief word on deep learning
1.3 Why use R for machine learning?
1.4 Which datasets will we use?
1.5 What you will learn in this book
2 Tidying, manipulating, and plotting data with the tidyverse
2.1 What is the tidyverse and what is tidy data?
2.2 Loading the tidyverse
2.3 What the tibble package is and what it does
2.3.1 Creating tibbles
2.3.2 Converting existing data frames into tibbles
2.3.3 Differences between data frames and tibbles
2.4 What the dplyr package is and what it does
2.4.1 Manipulating the CO2 dataset with dplyr
2.4.2 Chaining dplyr functions together
2.5 What the ggplot2 package is and what it does
2.6 What the tidyr package is and what it does
2.8 Solutions to exercises
Part 2: Classification
3 Classifying based on similar observations: the k-nearest neighbors algorithm
3.1 What is the k-nearest neighbors algorithm?
3.1.1 How does the k-nearest neighbors algorithm learn?
3.1.2 What happens if the vote is tied?
3.2 Building our first k-NN model
3.2.1 Loading and exploring the diabetes dataset
3.2.2 Using mlr to train your first k-NN model
3.2.3 Telling mlr what we’re trying to achieve: defining the task
3.2.4 Telling mlr which algorithm to use: defining the learner
3.2.5 Putting it all together: training the model
3.3 Balancing two sources of model error: the bias-variance trade-off
3.4 How to tell if you’re over/underfitting: cross-validation
3.5 Cross-validating our k-NN model
3.5.1 Hold-out cross-validation
3.5.2 k-fold cross-validation
3.5.3 Leave-one-out cross-validation
3.6 What algorithms can learn and what they must be told: parameters and hyperparameters
3.7 Tuning k to improve our model
3.7.1 Including hyperparameter tuning in our cross-validation
3.7.2 Using our model to make predictions
3.8 Strengths and weaknesses of k-NN
3.10 Solutions to exercises
4 Classifying based on odds: logistic regression
4.1 What is logistic regression?
4.1.1 How does logistic regression learn?
4.1.2 What if I have more than two classes?
4.2 Building our first logistic regression model
4.2.1 Loading and exploring the Titanic dataset
4.2.2 Making the most of the data: feature engineering and feature selection
4.2.3 Plotting the data
4.2.4 Training the model
4.2.5 Dealing with missing data
4.2.6 Training the model (take two)
4.3 Cross-validating our logistic regression model
4.3.1 Including missing value imputation in our cross-validation
4.3.2 Accuracy is the most important performance metric, right?
4.4 Interpreting the model: the odds ratio
4.4.1 Converting model parameters into odds ratios
4.4.2 When a one unit increase doesn’t make sense
4.5 Using our model to make predictions
4.6 Strengths and weaknesses of logistic regression
4.8 Solutions to exercises
5 Classifying by maximizing class separation: discriminant analysis
5.1 What is discriminant analysis?
5.1.1 How does discriminant analysis learn?
5.1.2 What if I have more than two classes?
5.1.3 Learning curves instead of straight lines: QDA
5.1.4 How do LDA and QDA make predictions?
5.2 Building our first linear and quadratic discriminant models
5.2.1 Loading and exploring the wine dataset
5.2.2 Plotting the data
5.2.3 Training the models
5.3 Strengths and weaknesses of LDA and QDA
5.5 Solutions to exercises
6 Classifying based on probabilities and hyperplanes: naive Bayes and support vector machines
6.1 What is the naive Bayes algorithm?
6.1.1 Using naive Bayes for classification
6.1.2 How is the likelihood calculated for categorical and continuous predictors?
6.2 Building our first naive Bayes model
6.2.1 Loading and exploring the HouseVotes84 dataset
6.2.2 Plotting the data
6.2.3 Training the model
6.3 Strengths and weaknesses of naive Bayes
6.4 What is the support vector machine (SVM) algorithm?
6.4.1 SVMs for linearly separable data
6.4.2 SVMs for non-linearly separable data
6.4.3 Hyperparameters of the SVM algorithm
6.4.4 What if I have more than two classes?
6.5 Building our first SVM model
6.5.1 Loading and exploring the spam dataset
6.5.2 Tuning our hyperparameters
6.5.3 Training the model with the tuned hyperparameters
6.6 Cross-validating our SVM model
6.7 Strengths and weaknesses of the SVM algorithm
6.9 Solutions to exercises
7 Classifying with trees: decision trees, random forests, and gradient boosting
7.1 What is the recursive partitioning algorithm?
7.1.1 Using Gini gain to split the tree
7.1.2 What about continuous and multi-level categorical predictors?
7.1.3 Hyperparameters of the rpart algorithm
7.2 Building our first decision tree model
7.3 Loading and exploring the zoo dataset
7.4 Training the decision tree model
7.4.1 Training the model with the tuned hyperparameters
7.5 Cross-validating our decision tree model
7.6 Ensemble techniques: bagging, boosting, and stacking
7.6.1 Training models on sampled data: bootstrap aggregating
7.6.2 Learning from the previous models' mistakes: boosting
7.6.3 Learning from predictions made by other models: stacking
7.7 Building our first random forest model
7.8 Building our first XGBoost model
7.9 Strengths and weaknesses of tree-based algorithms
7.10 Benchmarking algorithms against each other
Part 3: Regression
8 Regression with lines: linear regression and generalized additive models
8.1 What is linear regression?
8.1.1 What if we have multiple predictors?
8.1.2 What if my predictors are categorical?
8.2 When the relationship isn’t linear: polynomial terms
8.3 When we need even more flexibility: splines and generalized additive models
8.4 Building our first linear regression model
8.4.1 Loading and exploring the Ozone dataset
8.4.2 Imputing missing values
8.4.3 Automating feature selection
8.4.4 Including imputation and feature selection in our cross-validation
8.4.5 Interpreting the model
8.5 Building our first GAM
8.6 Strengths and weaknesses of linear regression and GAMs
8.8 Solutions to exercises
9 Preventing overfitting in regression: Ridge regression, LASSO, and elastic net
10 Regression with distance and trees: k-nearest neighbors, random forest, and XGBoost
Part 4: Dimension reduction
11 Maximizing variance and similarity: Principal components analysis and t-SNE
12 Dimension reduction with networks and local structure: Self-organizing maps and locally linear embedding
Part 5: Clustering
13 Clustering by finding centers and hierarchies in data: k-means and hierarchical clustering
14 Clustering based on the distribution of data: Density and mixture model clustering
15 Final notes and further reading
About the Technology
Machine learning techniques accurately and efficiently identify patterns and relationships in data, and the models they produce can make predictions about new data. ML techniques work well even on relatively small datasets, making these skills a powerful ally for nearly any data analysis task. The R programming language was designed with mathematical and statistical applications in mind. Small datasets are its sweet spot, and its modern data science tools, including the popular tidyverse package, make R a natural choice for ML.
About the book
Machine Learning with R, tidyverse, and mlr teaches you how to gain valuable insights from your data using the powerful R programming language. In his engaging and informal style, author and R expert Hefin Ioan Rhys lays a firm foundation of ML basics and introduces you to the tidyverse, a powerful set of R tools designed specifically for practical data science. Armed with the fundamentals, you’ll delve deeper into commonly used machine learning techniques, including classification, regression, dimension reduction, and clustering algorithms, applying each to real data to make predictions on fun and interesting problems.
Using the tidyverse packages, you’ll transform, clean, and plot your data, picking up data science best practices as you go. To simplify your learning process, you’ll also use R’s mlr package, an incredibly flexible interface for a variety of core algorithms that allows you to perform complicated ML tasks with minimal coding. You’ll explore essential concepts like overfitting, underfitting, validating model performance, and how to choose the best model for your task. Illuminating visuals provide clear explanations, cementing your new knowledge.
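To give a flavor of the define-task/define-learner/train workflow described above, here is a minimal sketch in R. It assumes the tidyverse and mlr packages are installed; the built-in iris data and the "classif.knn" learner are illustrative choices, not examples taken from the book itself.

```r
library(tidyverse)
library(mlr)

# tidyverse: hold the data in a tibble for exploration and plotting
irisTib <- as_tibble(iris)

# mlr: define the task (the data and what we want to predict)
task <- makeClassifTask(data = as.data.frame(irisTib), target = "Species")

# mlr: define the learner (which algorithm to use, with a hyperparameter)
learner <- makeLearner("classif.knn", k = 3)

# mlr: train the model
model <- train(learner, task)

# Estimate performance with 10-fold cross-validation
cvResult <- resample(learner, task, resampling = cv10)
```

The same three-step pattern (task, learner, train) is what lets mlr swap one algorithm for another with a single line changed, which is the flexibility the paragraph above refers to.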
Whether you’re tackling business problems, crunching research data, or just a data-minded developer, you’ll be building your own ML pipelines in no time with this hands-on tutorial!
- Commonly used ML techniques
- Using the tidyverse packages to organize and plot your data
- Validating model performance
- Choosing the best ML model for your task
- A variety of hands-on coding exercises
- ML best practices
About the reader
For readers with basic programming skills in R, Python, or another standard programming language.
About the author
Hefin Ioan Rhys is a senior laboratory research scientist in the Flow Cytometry Shared Technology Platform at The Francis Crick Institute. He spent the final year of his PhD program teaching basic R skills at the university. A data science and machine learning enthusiast, he has his own YouTube channel featuring screencast tutorials in R and RStudio.
Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.