In this liveProject, you’ll take on the role of a data scientist employed by the cybersecurity manager of a large organization. Recently, your colleagues have received multiple fake emails containing links to phishing websites. Phishing attacks are one of the most common—and most effective—online security threats, and your manager is worried that passwords or other information will be given to an attacker. You have been assigned the task of creating a machine learning model that can detect whether a linked website is a phishing site. Your challenges will include loading and understanding a tabular dataset, cleaning your dataset, and building a logistic regression model.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
This liveProject is designed for developers interested in data science and for beginner data scientists. To begin this liveProject, you will need to be familiar with:
- Basics of Python and its utility functions
- Basics of pandas
- Basics of NumPy
- Basics of scikit-learn
you will learn
In this liveProject, you’ll learn to build a machine learning model using common Python libraries. You’ll practice querying and cleaning datasets, tuning hyperparameters, and analyzing and summarizing the performance of your models. These skills carry over to a wide variety of machine learning tasks and other data projects.
- Loading and understanding tabular datasets using pandas
- Preprocessing tabular datasets with NumPy
- Preparing reports on your data with visualization tools
- Creating a logistic regression classifier as a baseline model using scikit-learn
- Using random search to find well-performing hyperparameters for the baseline model
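
To give a feel for the last two skills, here is a minimal sketch of a logistic regression baseline tuned with random search in scikit-learn. It is not the project's solution: the data is a synthetic stand-in generated with `make_classification`, and the search range for `C`, the number of iterations, and the accuracy metric are all illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the phishing dataset (assumption: the real
# project data is a labeled table with numeric features).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline model: scale the features, then fit logistic regression.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Random search samples the regularization strength C from a
# log-uniform range (the range itself is an illustrative choice).
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", best_params)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Random search only samples a fixed number of candidates, so it finds a good setting within the sampled range and not a guaranteed optimum. Increasing `n_iter` trades compute time for a more thorough search.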