Use Machine Learning to Detect Phishing Websites

pandas, NumPy, scikit-learn, logistic regression, Weights & Biases
Sayak Paul
4 weeks · 5-8 hours per week
In this liveProject, you’ll take on the role of a data scientist employed by the cybersecurity manager of a large organization. Recently, your colleagues have received multiple fake emails containing links to phishing websites. Phishing attacks are among the most common and most effective online security threats, and your manager is worried that someone will hand passwords or other sensitive information to an attacker. You have been assigned the task of creating a machine learning model that can detect whether a linked website is a phishing site. Your challenges will include loading and understanding a tabular dataset, cleaning it, and building a logistic regression model.

Prerequisites

This liveProject is designed for developers interested in data science and for beginner data scientists. To begin this liveProject, you will need to be familiar with:

TOOLS
  • Basics of Python and its utility functions
  • Basics of pandas
  • Basics of NumPy
  • Basics of scikit-learn
TECHNIQUES
  • Basics of data science

you will learn

In this liveProject, you’ll learn to build a machine learning model using common Python libraries. You’ll develop techniques for querying datasets, cleaning data, tuning hyperparameters, and analyzing and summarizing the performance of your models. These skills carry over to a wide variety of machine learning tasks and other data projects.

  • Loading and understanding tabular datasets using pandas
  • Preprocessing tabular datasets with NumPy
  • Preparing reports on your data with visualization tools
  • Creating a logistic regression classifier as a baseline model using scikit-learn
  • Using random search to find good hyperparameters for the baseline model
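
To give a feel for how these steps fit together, here is a minimal sketch of the workflow. It is not the project's solution: the data is randomly generated as a stand-in for the real phishing dataset, and the column names, the -1/1 label coding, and the search range for `C` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 500 "websites", 5 features coded as -1, 0, or 1.
n_rows = 500
feature_names = [f"feature_{i}" for i in range(5)]
df = pd.DataFrame(rng.integers(-1, 2, size=(n_rows, 5)), columns=feature_names)

# Hypothetical label coded as -1 / 1, loosely driven by two features plus noise.
signal = df["feature_0"] + df["feature_1"] + rng.normal(0, 0.5, n_rows)
df["label"] = np.where(signal > 0, 1, -1)

# Load-and-understand step: peek at the rows and count missing values.
print(df.head())
print(df.isna().sum())

# Cleaning step (assumed): recode the labels from -1 / 1 to 0 / 1.
df["label"] = df["label"].map({-1: 0, 1: 1})

# Preprocessing step: hand NumPy arrays to scikit-learn and hold out a test set.
X = df[feature_names].to_numpy()
y = df["label"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline: logistic regression with default regularization.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Random search over the inverse regularization strength C (range is an assumption).
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
tuned_acc = search.score(X_test, y_test)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"tuned accuracy:    {tuned_acc:.3f}")
print(f"best params:       {search.best_params_}")
```

With a real dataset, the synthetic `DataFrame` would be replaced by a file load such as `pd.read_csv`, and the label recoding would depend on how the classes are actually encoded.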

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Peer support
Chat with other participants within the liveProject platform.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Book and video resources
Excerpts from Manning books and videos are included, as well as references to other resources.

project outline

Introduction

about this liveProject

1. Loading and Understanding the Phishing Websites Dataset

1.1. Knowing the Dataset

1.2. A Quick Tour of Pandas

1.3. Submit Your Work

Solution

2. Further Data Investigation and Preparing Investigation Reports

2.1. Getting Useful Information from the Dataset

2.2. Submit Your Work

Solution

3. Cleaning the Class Labels and Inspecting for Missing Values

3.1. Cleaning the Class Labels and Inspecting for Missing Values

3.2. Submit Your Work

Solution

4. Training a Logistic Regression Model

4.1. Training a Logistic Regression Model

4.2. A Quick Primer on Logistic Regression

4.3. A Brief Take on Scikit-Learn

4.4. A Continuous Approach to Splitting Points: Logistic Regression

4.5. Submit Your Work

Solution

Summary

Project Conclusions

FAQs

project author

Sayak Paul
Sayak works at PyImageSearch, where he applies deep learning to computer vision problems and brings solutions to edge devices. He also provides Q&A support to PyImageSearch readers. Previously, Sayak developed projects and practice pools for DataCamp. Outside of work, Sayak enjoys writing technical articles and giving talks at developer meetups and conferences.