Four-Project Series

- prerequisites: intermediate Python programming
- skills learned: anomaly detection techniques; benchmarking techniques for anomaly detection algorithms; tradeoffs and limitations of anomaly detection algorithms

Welcome to Sigma Corp, a large conglomerate that produces energy from nuclear, coal, and renewable sources. As a lead data scientist at Sigma, you’ve been tasked with creating mission-critical anomaly detection algorithms that will prevent operation interruptions at Sigma’s many facilities. You’ll first develop the means to evaluate the algorithms’ performance using the receiver operating characteristic (ROC) curve and the area under the curve (AUC) metric. You’ll then build and implement a simple z-score anomaly detection algorithm for one-dimensional data. As requirements and feedback come in, you’ll progress to more complex methods designed for multidimensional data, including the Mahalanobis distance (MD) method, the principal component analysis (PCA) method, the Empirical Cumulative distribution-based Outlier Detection (ECOD) method, and the Isolation Forest algorithm. When you’re finished with this series of liveProjects, you’ll have a solid understanding of how anomaly detection methods work as well as the knowledge and skills to build them according to your specific needs.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

This series is truly exceptional, encompassing the most vital approaches to anomaly detection.

Project 0 Develop Z-score and Baseline Results

Project 1 Methods for Multidimensional Datasets

Project 2 ECOD Algorithm

Project 3 Isolation Forests

The knowledge is universal, and I can apply it to other similar tasks.

The subject covered is very important; implementing algorithms from scratch is always a good idea to understand.

This liveProject series is for beginner data scientists interested in learning the sought-after skills of building, implementing, and evaluating anomaly detection algorithms. To begin these liveProjects, you’ll need to be familiar with the following:

TOOLS

- Intermediate Python (3.10)

- Intermediate Python: function definitions, classes, list comprehensions, importing libraries
- Basic data science: classification problems, the distinction between supervised and unsupervised methods
- Basic statistics: mean, variance, covariance
- Basic mathematics: familiarity with matrices, eigenvalues, and eigenvectors
- Basic probability: probability distribution function and cumulative distribution function

In this liveProject series, you’ll learn to implement and evaluate progressively more complex anomaly detection algorithms.

- Build the functions to calculate the ROC curve and the AUC metric using Python and NumPy
- Implement the z-score anomaly detection method
- Implement the Mahalanobis distance (MD) method
- Implement the principal component analysis (PCA) method
- Implement the Empirical Cumulative distribution-based Outlier Detection (ECOD) method
- Implement the Isolation Forest method
- Benchmark anomaly detection implementations using known data sets
- Benchmark anomaly detection implementations using synthetic anomalies
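
To give a feel for the first two items in the list above, here is a minimal sketch of a z-score anomaly score evaluated with a hand-built ROC curve and AUC, using only NumPy. It is an illustration rather than the projects' reference solution: the function names (`z_score`, `roc_curve`, `auc`), the population standard deviation, and the synthetic data (1,000 normal points plus 20 injected anomalies) are all assumptions made for this example.

```python
import numpy as np


def z_score(x):
    """Anomaly score: absolute distance from the mean, in standard deviations."""
    x = np.asarray(x, dtype=float)
    # Assumption: population standard deviation (NumPy's default, ddof=0).
    return np.abs(x - x.mean()) / x.std()


def roc_curve(labels, scores):
    """Return (fpr, tpr) by sweeping a threshold from the highest score down.

    labels: 1 for anomaly, 0 for normal; both classes must be present.
    scores: higher means more anomalous.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    labels, scores = labels[order], scores[order]
    tp = np.cumsum(labels == 1)  # true positives when flagging the top-k points
    fp = np.cumsum(labels == 0)  # false positives when flagging the top-k points
    # Tied scores share one threshold, so keep only the last point of each tie.
    last = np.append(np.diff(scores) != 0, True)
    tpr = np.concatenate(([0.0], tp[last] / tp[-1]))
    fpr = np.concatenate(([0.0], fp[last] / fp[-1]))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Hypothetical benchmark: normal readings plus synthetic anomalies far from the bulk.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=1000)
anomalies = rng.normal(6.0, 1.0, size=20)
x = np.concatenate([normal, anomalies])
y = np.concatenate([np.zeros(1000, dtype=int), np.ones(20, dtype=int)])

fpr, tpr = roc_curve(y, z_score(x))
print(f"AUC = {auc(fpr, tpr):.3f}")
```

An AUC near 1 means the score ranks almost every anomaly above the normal points, while 0.5 corresponds to random ranking. The same `roc_curve` and `auc` helpers could, under the same assumptions, be reused to compare scores from the multidimensional methods listed above.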

