Authorship Identification with Text Mining and Machine Learning

prerequisites: beginner Python • basics of pandas • basics of NumPy • basics of machine learning and scikit-learn
skills learned: extract features from text using scikit-learn and spaCy • build a predictive classification model • visualize authorship styles with an interactive plot • incorporate your trained model into a user-friendly program

Robert Layton

4 weeks · 7-10 hours per week · INTERMEDIATE

Included with a Manning Online subscription

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10, or 20+ seats for your team

project $37.49 (regular price $49.99)

you save $12.50 (25%)

In this liveProject, you’ll step into the boots of an investigator trying to find the anonymous author of a seriously defamatory blog post. You’ve narrowed down your list of suspects, acquired a dataset of writing samples, and now plan to find the culprit using a custom machine learning project. Your challenge is to build an authorship analysis model that will match a writing sample to the defamatory blog post and reveal the guilty party. To do this, you’ll need to extract data from a corpus of documents, build a model that can learn authorship style, scale the model to handle hundreds of suspects, and finally develop a user-friendly program that will allow non-technical colleagues to make use of your findings.
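
The description above does not say which features or classifier the project uses. As a rough illustration of the idea of matching a disputed text to a suspect, here is a minimal sketch that assumes character n-gram TF-IDF features and a linear SVM from scikit-learn. The writing samples, the suspect labels, and the choice of `TfidfVectorizer` with `LinearSVC` are all assumptions made for this example, not the project's actual solution.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy writing samples, invented for illustration only.
samples = [
    "honestly i'm gonna say it again!!! this is sooo wrong!!!",
    "gonna be real with you!!! nobody asked for this!!!",
    "lol i'm gonna lose it!!! sooo annoying!!!",
    "Furthermore, the committee has concluded; therefore, the matter is closed.",
    "The evidence is, therefore, insufficient; furthermore, it is dated.",
    "Therefore, we must proceed with caution; furthermore, we must document it.",
]
# Hypothetical suspect labels, one per sample.
authors = ["suspect_a"] * 3 + ["suspect_b"] * 3

# Character n-grams capture punctuation, spelling, and casing habits,
# so case is preserved (lowercase=False).
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4), lowercase=False),
    LinearSVC(random_state=0),
)
model.fit(samples, authors)

# A made-up "disputed" text whose author we want to guess.
disputed_post = "i'm gonna tell everyone!!! this is sooo unfair!!!"
predicted_author = model.predict([disputed_post])[0]
print(predicted_author)
```

With only six toy samples this shows the mechanics, not a reliable result. A real experiment would hold out labeled samples and measure accuracy before trusting any prediction.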

This project is designed for learning purposes and is not a complete, production-ready application or solution.

liveProject mentor Lawrence Nderu shares what he likes about the Manning liveProject platform.

project author

Robert Layton

Rob Layton is a data scientist and a past core contributor to scikit-learn. He holds a PhD in cybercrime analytics, for which he analyzed phishing websites to identify authorship patterns. He runs his own data analytics business, dataPipeline, and for more than 5 years has delivered training with expert training provider Python Charmers to students in finance, government, and private-sector organizations.

prerequisites

This liveProject is for software developers with an interest in data science and for beginner data scientists. It requires a machine with at least 2 GB of free hard drive space and 4 GB of RAM. To begin this liveProject, you will need to be familiar with:

TOOLS

Beginner Python and its utility functions (Python 3.9 or later)
Basics of pandas
Basics of NumPy
Basics of scikit-learn (version 0.24.0 or later)

TECHNIQUES

Basics of data science and machine learning
Reading text files with Python
Saving and loading pandas DataFrames
Running a training and evaluation experiment
Running Python code from Jupyter Notebook
Basics of running terminal commands

features

Self-paced: You choose the schedule and decide how much time to invest as you build your project.
Project roadmap: Each project is divided into several achievable steps.
Get help: Within the liveProject platform, get help from fellow participants, or book paid sessions with our expert mentors for additional support.
Compare with others: For each step, compare your deliverable to the solutions by the author and other participants.
Book resources: Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.

choose your plan

pro

monthly $24.99
annual $249.99 (only $20.83 per month)

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose another free product every time you renew
choose twelve free products per year
exclusive 50% discount on all purchases
Authorship Identification with Text Mining and Machine Learning project for free

team

monthly $49.99
annual $399.99 (only $33.33 per month)

five seats for your team
access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose another free product every time you renew
choose twelve free products per year
exclusive 50% discount on all purchases
Authorship Identification with Text Mining and Machine Learning project for free
