Five-Project Series

Anomaly Detection with Python

prerequisites
basic Python • basic pandas • basic scikit-learn • basics of machine learning
skills learned
apply unsupervised and supervised anomaly detection methods with scikit-learn • use PyOD to detect outliers • use imblearn to tackle imbalanced data
Stylianos Kampakis and Shreesha Jagadeesh
5 weeks · 5-8 hours per week average · INTERMEDIATE

In this series of liveProjects, you’ll learn different methods for detecting anomalies and outliers with Python machine learning techniques. Anomaly detection is a vital tool for tasks like spotting medical problems, and even detecting seismic events like earthquakes. You’ll explore both supervised and unsupervised learning methods for anomaly detection to master this valuable ML task.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

here's what's included

Project 1 Using scikit-learn

In this liveProject, you’ll explore the basics of anomaly detection by analyzing a medical dataset using unsupervised learning. You’ll create a model that can determine whether patients referred to a clinic have abnormal thyroid function. To accomplish this, you’ll download and prepare your dataset, and then use scikit-learn to compare anomaly detection algorithms and find the most effective one. The algorithms you’ll compare are Isolation Forest, Local Outlier Factor (LOF), One-Class SVM, and Robust Covariance.
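
A minimal sketch of such a comparison, assuming a synthetic two-feature dataset in place of the thyroid data and an illustrative contamination rate of 5%. In scikit-learn, the robust covariance approach is available as `EllipticEnvelope`. The toy data, the parameter values, and the `detectors` dictionary are assumptions for illustration, not the project's solution.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the medical data (assumption):
# 190 "normal" points plus 10 widely scattered points.
rng = np.random.RandomState(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(190, 2))
X_scattered = rng.uniform(low=-8.0, high=8.0, size=(10, 2))
X = np.vstack([X_normal, X_scattered])

contamination = 0.05  # assumed share of anomalies; tune this for real data

detectors = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=42),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "One-Class SVM": OneClassSVM(nu=contamination, kernel="rbf", gamma="scale"),
    "Robust Covariance": EllipticEnvelope(contamination=contamination, random_state=42),
}

# fit_predict returns +1 for inliers and -1 for outliers for all four estimators.
predictions = {name: det.fit_predict(X) for name, det in detectors.items()}

for name, labels in predictions.items():
    print(f"{name}: flagged {int((labels == -1).sum())} of {len(X)} points")
```

With labeled data, the flagged points could then be compared against the known diagnoses to decide which detector works best.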

Project 2 Using Oversampling

In this liveProject, you’ll go hands-on with supervised learning methods for anomaly detection. You’ll explore an imbalanced dataset of seismic activity. To balance this dataset, you’ll use the SMOTE and ADASYN oversampling algorithms to generate synthetic examples of the minority class. You’ll then compare performance using random forest, logistic regression, and Naive Bayes binary classification algorithms.
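
The interpolation idea behind SMOTE can be sketched in plain NumPy. This is a hand-rolled illustration, not imblearn's implementation; in practice you would use the library's `SMOTE` and `ADASYN` classes from `imblearn.over_sampling` and their `fit_resample(X, y)` method. The function name `smote_sketch`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                         # pick a minority sample
        neighbour = neighbours[base, rng.integers(k)]  # and one of its neighbours
        gap = rng.random()                             # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic

# Toy imbalanced data (assumption): 95 majority samples, 5 minority samples.
data_rng = np.random.default_rng(1)
X_majority = data_rng.normal(0.0, 1.0, size=(95, 2))
X_minority = data_rng.normal(3.0, 0.5, size=(5, 2))

X_new = smote_sketch(X_minority, n_synthetic=len(X_majority) - len(X_minority))
X_minority_balanced = np.vstack([X_minority, X_new])
print(X_minority_balanced.shape)  # same number of rows as X_majority
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies.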

Project 3 Using Undersampling

In this liveProject, you’ll use undersampling techniques to balance a seismic activity dataset. You’ll apply the ClusterCentroids, NearMiss, and CondensedNearestNeighbour algorithms to downsample the majority class. You’ll then compare performance using random forest, logistic regression, and Naive Bayes binary classification algorithms.
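
To give a feel for how one of these undersamplers chooses which majority samples to keep, here is a simplified NumPy sketch of the NearMiss (version 1) selection rule: keep the majority samples whose mean distance to their closest minority samples is smallest. It is an illustration only, not imblearn's implementation; the function name `nearmiss1_sketch`, its parameters, and the toy points are hypothetical.

```python
import numpy as np

def nearmiss1_sketch(X_majority, X_minority, n_keep, k=3):
    """Return the n_keep majority samples (and their indices) with the
    smallest mean distance to their k nearest minority samples."""
    X_majority = np.asarray(X_majority, dtype=float)
    X_minority = np.asarray(X_minority, dtype=float)
    k = min(k, len(X_minority))

    # Distance from every majority sample to every minority sample.
    diffs = X_majority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Mean distance to the k closest minority samples, per majority sample.
    mean_k = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_k)[:n_keep]
    return X_majority[keep], keep

# Toy points on a line (assumption): four majority samples, two minority samples.
X_majority = [[0.0, 0.0], [1.0, 0.0], [6.0, 0.0], [10.0, 0.0]]
X_minority = [[2.0, 0.0], [3.0, 0.0]]

X_kept, kept_idx = nearmiss1_sketch(X_majority, X_minority, n_keep=2, k=2)
print(kept_idx.tolist())  # indices of the majority samples that are kept
```

The majority class is cut down to the samples closest to the minority class, so the classifier sees a balanced dataset concentrated around the class boundary.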

Project 4 Using PyOD

In this liveProject, you’ll use scikit-learn and the PyOD library to build an unsupervised machine learning model for detecting hyperthyroidism. PyOD is a Python toolkit for detecting outlying objects in multivariate data. You’ll compare the performance of four anomaly detection methods on a specialized thyroid dataset: Principal Component Analysis (PCA), Clustering-Based Local Outlier Factor (CBLOF), Histogram-Based Outlier Score (HBOS), and k-Nearest Neighbors (KNN).
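
As a rough illustration of one of these methods, the sketch below computes a simplified histogram-based outlier score in NumPy: each feature gets its own histogram, and a point's score is the sum of the negative log densities of the bins it falls into. This is a simplification written for this page, not PyOD's HBOS implementation; the function name `hbos_sketch`, the bin count, and the toy data are assumptions.

```python
import numpy as np

def hbos_sketch(X, n_bins=10):
    """Simplified histogram-based outlier score: the sum over features of
    -log(estimated density of the bin the value falls into)."""
    X = np.asarray(X, dtype=float)
    scores = np.zeros(len(X))
    for j in range(X.shape[1]):
        density, edges = np.histogram(X[:, j], bins=n_bins, density=True)
        # The inner edges map each value to a bin index in 0..n_bins-1,
        # matching np.histogram's binning (last bin closed on the right).
        bin_index = np.digitize(X[:, j], edges[1:-1])
        scores += -np.log(density[bin_index])
    return scores

# Toy data (assumption): 35 points on a small grid plus one far-away point at row 35.
X = np.array([[i % 5, i % 7] for i in range(35)] + [[30, 30]], dtype=float)
scores = hbos_sketch(X)
print("most anomalous row:", int(np.argmax(scores)))
```

Points that sit in sparsely populated bins on several features accumulate a high score, which is why the isolated point stands out.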

Project 5 Using PyOD and Ensemble Methods

In this liveProject, you’ll explore a dataset with more variables and use scikit-learn and the PyOD library to build an unsupervised machine learning model for detecting cardiac arrhythmias. You’ll develop an algorithm that detects arrhythmias from device data such as ECG readings, using the Locally Selective Combination in Parallel Outlier Ensembles (LSCP) algorithm. An LSCP model takes a list of other detection algorithms as input, so it can combine detectors configured with different settings.
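
The idea of pooling detectors with different settings can be sketched with scikit-learn alone. The example below builds several Local Outlier Factor detectors with different neighbourhood sizes and simply averages their standardized scores. Note that this is plain score averaging, a much simpler combination rule than LSCP, which selects detectors per data point; the toy data and the `neighbour_settings` values are assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy data (assumption): 200 normal points in five dimensions
# plus three isolated points appended as rows 200, 201 and 202.
rng = np.random.RandomState(0)
X_normal = rng.normal(0.0, 1.0, size=(200, 5))
X_isolated = np.array([
    [10.0, 10.0, 10.0, 10.0, 10.0],
    [-10.0, -10.0, -10.0, -10.0, -10.0],
    [10.0, -10.0, 10.0, -10.0, 10.0],
])
X = np.vstack([X_normal, X_isolated])

# A pool of base detectors that differ only in their settings.
neighbour_settings = [5, 10, 20, 35]
score_columns = []
for n_neighbors in neighbour_settings:
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    raw = -lof.negative_outlier_factor_  # larger means more anomalous
    # Standardize so that detectors with different score ranges are comparable.
    score_columns.append((raw - raw.mean()) / raw.std())

# Simple combination rule: average the standardized scores across detectors.
ensemble_scores = np.mean(score_columns, axis=0)
top3 = np.argsort(ensemble_scores)[-3:]
print("rows with the highest ensemble score:", sorted(top3.tolist()))
```

Combining several settings reduces the dependence on any single choice of `n_neighbors`, which is the motivation for ensemble approaches in general.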

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.

project authors

Stylianos Kampakis
Dr. Stylianos (Stelios) Kampakis is a data scientist with more than 10 years of experience. He has worked with decision-makers from companies of all sizes from startups to organizations like the US Navy, Vodafone, and British Land. He has also helped many people follow a career in data science and technology. He is a member of the Royal Statistical Society, honorary research fellow at the UCL Centre for Blockchain Technologies, a data science advisor for London Business School and CEO of The Tesseract Academy. A natural polymath with a PhD in machine learning and degrees in artificial intelligence, statistics, psychology, and economics, he loves using his broad skillset to solve difficult problems and help companies improve their efficiency.
Shreesha Jagadeesh
Shreesha Jagadeesh is a product manager at Amazon creating data science-driven HR products for talent retention, career growth, and internal mobility. He previously worked as a manager at Ernst & Young, where he led a global team of 25+ data scientists and engineers driving the data science-led digital transformation of the firm's tax business units. Aside from his day job, he is a startup advisor helping young companies build out their data science functions. He has a master’s degree in electrical and computer engineering from the University of Toronto. He has been teaching for more than a decade and has written data science articles on Medium, reviewed other Manning courses, and developed a popular Udemy course on Agile data science.

Prerequisites

This liveProject is for Python programmers who are interested in learning anomaly detection techniques. To begin this liveProject, you will need to be familiar with the following:


TOOLS
  • Basic Python
  • Basic pandas
  • Basic methods for data processing
  • Basic scikit-learn
TECHNIQUES
  • Basics of machine learning (supervised learning, unsupervised learning)

you will learn

Across the different liveProjects, you’ll master the domain of anomaly detection through exploring various methods.


  • Analyze and pre-process data using scikit-learn
  • Learn how to apply scikit-learn-based anomaly detection algorithms
  • Learn how to upsample the minority class using imblearn methods
  • Learn how to downsample the majority class using imblearn methods
  • Learn how to correctly pre-process data for novelty vs outlier detection
  • Run novelty and outlier detection algorithms in PyOD
  • Use the LSCP algorithm in PyOD to create ensembles from base algorithms
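
The difference between outlier detection and novelty detection mentioned in the list above comes down to what the training data contains. A minimal scikit-learn sketch, assuming synthetic data and illustrative parameter values: in outlier detection the anomalies are already mixed into the data being fitted, while in novelty detection the model is fitted on clean data and then applied to new observations.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy data (assumption): a clean cluster and one far-away point.
rng = np.random.RandomState(0)
X_clean = rng.normal(0.0, 1.0, size=(200, 2))
far_point = np.array([[8.0, 8.0]])

# Outlier detection: the data being fitted already contains the anomaly,
# and labels are produced for that same data.
X_contaminated = np.vstack([X_clean, far_point])
outlier_model = LocalOutlierFactor(n_neighbors=20)
outlier_labels = outlier_model.fit_predict(X_contaminated)  # -1 marks outliers

# Novelty detection: fit on clean data only, then label unseen observations.
novelty_model = LocalOutlierFactor(n_neighbors=20, novelty=True)
novelty_model.fit(X_clean)
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
novelty_labels = novelty_model.predict(X_new)  # -1 marks novelties

print("label of the far point in the fitted data:", int(outlier_labels[-1]))
print("labels of the new observations:", novelty_labels.tolist())
```

In practice this means a novelty-detection training set has to be cleaned of known anomalies first, whereas an outlier-detection dataset is used as it is.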

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.