Four-Project Series

Three Anomaly Detection Methods

prerequisites
intermediate Python programming
skills learned
anomaly detection techniques • anomaly detection algorithm benchmark techniques • anomaly detection algorithms tradeoffs and limitations
Sergio Solórzano
1 week · 7-9 hours per week average · BEGINNER


Welcome to Sigma Corp, a large conglomerate that produces energy from nuclear, coal, and renewable sources. As a lead data scientist at Sigma, you’ve been tasked with creating mission-critical anomaly detection algorithms that will prevent operation interruptions at Sigma’s many facilities. You’ll develop the means to evaluate the performance of the algorithms using the receiver operating characteristic (ROC) curve and the area under the curve (AUC) metric. You’ll then build and implement a simple z-score anomaly detection algorithm for one-dimensional data. According to requirements and feedback, you’ll progress to implementing more complex methods designed for multidimensional data, including the Mahalanobis distance (MD) method, the principal component analysis (PCA) method, the Empirical Cumulative distribution-based Outlier Detection (ECOD) method, and the Isolation Forest algorithm. When you’re finished with this series of liveProjects, you’ll have a solid understanding of how anomaly detection methods work as well as the knowledge and skills to build them according to your specific needs.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

This series is truly exceptional, encompassing the most vital approaches to anomaly detection.

Ninoslav Cerkez, Senior Machine Learning Engineer, Rimac Technology

here's what's included

Project 0 Develop Z-score and Baseline Results

Failure is not an option for Sigma Corp. As a lead data scientist for the large conglomerate of energy production companies, it’s up to you to help ensure interruption-free operations by developing a means for detecting anomalies that signal potential problems. Using metrics including the receiver operating characteristic (ROC) curve and the area under the curve (AUC) score, you’ll evaluate anomaly detection algorithms. You’ll build a z-score anomaly detection algorithm, which focuses on a single feature and provides a simple benchmark, and you’ll apply it to a dataset to establish a reference for comparison. When you’re finished, you’ll have a firm grasp of z-score anomaly detection, classification error categories, and evaluating anomaly detection algorithms.
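
To give a feel for the kind of code this project involves, here is a minimal sketch (not the project's reference solution) of a z-score scorer for a single feature, together with a ROC curve and AUC computed with NumPy only. The function names, the toy data, and the convention that label 1 means "anomaly" are assumptions made for illustration; tied scores are not given special treatment.

```python
import numpy as np

def zscore_scores(x):
    """Anomaly score = absolute z-score of each value of one feature."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) / x.std()

def roc_curve(scores, labels):
    """Sweep a threshold from the highest score down; return (fpr, tpr).

    labels: 1 = anomaly, 0 = normal (an assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # highest score first
    labels = labels[order]
    tp = np.cumsum(labels)          # true positives at each cut
    fp = np.cumsum(1 - labels)      # false positives at each cut
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data: 500 ordinary readings plus three obvious outliers.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), [8.0, -9.0, 10.0]])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(3, dtype=int)])

fpr, tpr = roc_curve(zscore_scores(x), y)
area = auc(fpr, tpr)
print(f"AUC = {area:.3f}")
```

An AUC near 1 means anomalies almost always receive higher scores than normal points, while 0.5 corresponds to random ranking.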

Project 1 Methods for Multidimensional Datasets

Preventing operation failures and interruptions is mission-critical at Sigma Corp. The large conglomerate of energy production companies has recently implemented a z-score anomaly detection algorithm that focuses on a single feature. Now that the algorithm has proved its value, members of Sigma have requested additional algorithms that are just as simple to use, but that can handle multidimensional data. As a lead data scientist at Sigma, you’ll implement the Mahalanobis distance (MD) method and the principal component analysis (PCA) method as you build anomaly detection algorithms for multidimensional data. To gauge the performance of your algorithms, you’ll test them against a benchmark dataset as well as synthetic anomalies generated by your own algorithms. When you’re done, you’ll have firsthand experience building anomaly detection algorithms for multidimensional datasets as well as testing anomaly detection algorithms against both benchmark datasets and synthetic anomalies.
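
As a rough illustration of the two methods named above, the sketch below scores points by Mahalanobis distance and by PCA reconstruction error. It is one common formulation under stated assumptions, not necessarily the variant built in the project: the function names are hypothetical, the pseudo-inverse is used to tolerate a singular covariance matrix, and the PCA score is simply the distance between a point and its projection onto the top principal components.

```python
import numpy as np

def mahalanobis_scores(X_train, X):
    """Distance of each row of X from the training mean, measured in
    units of the training covariance."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    cov_inv = np.linalg.pinv(cov)            # tolerates singular covariance
    d = X - mu
    sq = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    return np.sqrt(np.maximum(sq, 0.0))      # guard against rounding below 0

def pca_scores(X_train, X, n_components):
    """Reconstruction error after projecting onto the top components."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components] # largest-variance directions
    d = X - mu
    residual = d - d @ top @ top.T
    return np.linalg.norm(residual, axis=1)

# Two strongly correlated features: x2 is roughly equal to x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X_train = np.column_stack([x1, x1 + 0.1 * rng.normal(size=500)])

# (2, 2) follows the correlation; (2, -2) breaks it, although each
# coordinate on its own looks unremarkable.
X_test = np.array([[2.0, 2.0], [2.0, -2.0]])
md = mahalanobis_scores(X_train, X_test)
pca = pca_scores(X_train, X_test, n_components=1)
print(md, pca)
```

The second test point would pass a per-feature z-score check, which is why methods that account for correlations between features are useful for multidimensional data.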

Project 2 ECOD Algorithm

Sigma Corp, a large conglomerate of energy production companies, has recently implemented anomaly detection algorithms and is generally pleased with their performance. However, analysts report that not all anomalies are being identified and the algorithms are too slow at times. As a lead data scientist at Sigma, it’s up to you to address these concerns. To increase the robustness of the algorithms, you’ll implement and optimize the probability-based Empirical Cumulative distribution-based Outlier Detection (ECOD) method, an alternative to statistical methods. You’ll benchmark the ECOD method in order to compare its performance with the statistical MD and PCA methods Sigma is currently using. When you’re finished, you’ll have firsthand experience implementing the highly efficient ECOD method to detect anomalies in multidimensional data.
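
The following is a simplified sketch of the idea behind ECOD, based on the published description of the method and not on the project's own solution: estimate a left-tail and a right-tail empirical CDF for every feature, turn tail probabilities into negative log scores, and combine them across features. The helper names are hypothetical, and the sketch assumes no feature is constant (the skewness would be undefined).

```python
import numpy as np

def ecdf(col):
    """Fraction of samples less than or equal to each value."""
    sorted_col = np.sort(col)
    return np.searchsorted(sorted_col, col, side="right") / len(col)

def ecod_scores(X):
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Tail probabilities per feature; both are at least 1/n, so log is finite.
    left = np.column_stack([ecdf(X[:, j]) for j in range(n_features)])
    right = np.column_stack([ecdf(-X[:, j]) for j in range(n_features)])
    u_left, u_right = -np.log(left), -np.log(right)

    # Sample skewness picks which tail matters for each feature.
    centered = X - X.mean(axis=0)
    skew = (centered ** 3).mean(axis=0) / (centered ** 2).mean(axis=0) ** 1.5
    u_auto = np.where(skew < 0, u_left, u_right)

    # Final score: the largest of the three aggregated tail scores.
    return np.maximum.reduce(
        [u_left.sum(axis=1), u_right.sum(axis=1), u_auto.sum(axis=1)]
    )

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(300, 3)), [[6.0, 6.0, 6.0]]])
scores = ecod_scores(X)
print(scores[-1], scores.max())
```

Because the method only needs sorting and lookups per feature, it has no parameters to tune and scales well with the number of samples.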

Project 3 Isolation Forests

Red alert! One of the energy production companies managed by Sigma Corp has suffered an outage. An investigation has led to the conclusion that the facility’s anomaly detection mechanism failed to detect early signals due to a sudden change in the distribution of the analyzed data. As a lead data scientist at Sigma, you’ll build an Isolation Forest algorithm, which is less likely than the Empirical Cumulative distribution-based Outlier Detection (ECOD) method to fail in such scenarios. To gauge how robust your method is, you’ll benchmark your algorithms against adversarial scenarios, synthetic anomalies, and standard datasets. When you’re done, you’ll have practical experience creating, using, and testing the Isolation Forest algorithm as an effective alternative to ECOD in circumstances where the data distribution changes.
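
For orientation, here is a compact from-scratch sketch of an Isolation Forest in NumPy. It follows the usual description of the algorithm (random axis-aligned splits on subsamples, with shorter average path lengths indicating anomalies), but it is a simplified illustration and not the project's solution. All names and default values are assumptions, a node becomes a leaf if the randomly chosen feature is constant, and at least two training points are required.

```python
import numpy as np

def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree
    built from n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(X, depth, max_depth, rng):
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:                       # simplification: stop on constant feature
        return {"size": n}
    split = rng.uniform(lo, hi)
    mask = X[:, feature] < split
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(X[mask], depth + 1, max_depth, rng),
        "right": build_tree(X[~mask], depth + 1, max_depth, rng),
    }

def path_length(x, node):
    depth = 0
    while "size" not in node:          # descend until a leaf is reached
        go_left = x[node["feature"]] < node["split"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + c_factor(node["size"])

def isolation_forest_scores(X_train, X, n_trees=100, sample_size=256, seed=0):
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X_train))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X_train), size=psi, replace=False)
        trees.append(build_tree(X_train[idx], 0, max_depth, rng))
    mean_path = np.array(
        [np.mean([path_length(x, t) for t in trees]) for x in X]
    )
    return 2.0 ** (-mean_path / c_factor(psi))   # higher = more anomalous

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 2))
X_test = np.array([[0.0, 0.0], [8.0, 8.0]])
scores = isolation_forest_scores(X_train, X_test)
print(scores)
```

Points that are easy to separate from the rest end up in shallow leaves, so their scores are closer to 1; points deep inside the bulk of the data score lower.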

The knowledge is universal, and I can apply it to other similar tasks.

Oluwatosin Oluwole, Research Analyst Intern, HIFMB

The subject covered is very important; implementing algorithms from scratch is always a good idea to understand.

Simone De Bonis, Data Science student, Università Politecnica delle Marche

project author

Sergio Solórzano

Sergio Solórzano holds a PhD in physics from ETH Zürich, where he specialized in computational physics and published various papers on numerical algorithms for physical simulation and analysis. Currently, he’s a senior researcher and developer at Exeon Analytics, developing systems for anomaly detection in cybersecurity.

Prerequisites

This liveProject series is for beginner data scientists interested in learning the sought-after skills of building, implementing, and evaluating anomaly detection algorithms. To begin these liveProjects, you’ll need to be familiar with the following:

TOOLS
  • Python 3.10 (intermediate level)
TECHNIQUES
  • Intermediate Python: function definitions, classes, list comprehensions, importing libraries
  • Basic data science: classification problems, supervised/unsupervised methods distinction
  • Basic statistics: mean, variance, covariance
  • Basic mathematics: familiarity with matrices, eigenvalues, and eigenvectors
  • Basic probability: probability distribution function and cumulative distribution function

you will learn

In this liveProject series, you’ll learn to implement and evaluate progressively more complex anomaly detection algorithms.

  • Build the functions to calculate the ROC curve and the AUC metric using Python and NumPy
  • Implement the z-score anomaly detection method
  • Implement the Mahalanobis distance (MD) method
  • Implement the principal component analysis (PCA) method
  • Implement the Empirical Cumulative distribution-based Outlier Detection (ECOD) method
  • Implement the Isolation Forest method
  • Benchmark anomaly detection implementations using known datasets
  • Benchmark anomaly detection implementations using synthetic anomalies
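
One way to picture benchmarking with synthetic anomalies is sketched below. It is an illustration under assumptions, not the procedure used in the projects: anomalies are drawn uniformly from an enlarged bounding box around the normal data, the detector is a deliberately simple placeholder (distance from the training mean), and the AUC is computed from pairwise score comparisons. All function names and the `margin` parameter are hypothetical.

```python
import numpy as np

def make_synthetic_anomalies(X, n, rng, margin=0.5):
    """Draw n points uniformly from the bounding box of X, enlarged by
    `margin` times the per-feature range on every side."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    return rng.uniform(lo - margin * span, hi + margin * span,
                       size=(n, X.shape[1]))

def rank_auc(scores, labels):
    """AUC as the probability that a random anomaly scores higher than
    a random normal point (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(400, 3))
X_train, X_holdout = X_normal[:300], X_normal[300:]

# Evaluation set: held-out normal points plus synthetic anomalies.
X_anom = make_synthetic_anomalies(X_train, 100, rng)
X_eval = np.vstack([X_holdout, X_anom])
y_eval = np.concatenate([np.zeros(len(X_holdout)), np.ones(len(X_anom))])

# Placeholder detector: Euclidean distance from the training mean.
eval_scores = np.linalg.norm(X_eval - X_train.mean(axis=0), axis=1)
benchmark_auc = rank_auc(eval_scores, y_eval)
print(f"AUC on synthetic anomalies = {benchmark_auc:.3f}")
```

Any detector that returns one score per row can be dropped in place of the placeholder, which makes it straightforward to compare methods on the same evaluation set.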

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.