
Authorship Identification with Text Mining and Machine Learning

prerequisites
beginner Python • basics of pandas • basics of NumPy • basics of machine learning and scikit-learn
skills learned
extract features from text using scikit-learn and spaCy • build a predictive classification model • visualize authorship styles with an interactive plot • incorporate your trained model into a user-friendly program
Robert Layton
4 weeks · 7-10 hours per week · INTERMEDIATE
liveProject liveProjects give you the opportunity to learn new skills by completing real-world challenges in your local development environment. Solve practical problems, write working code, and analyze real data—with liveProject, you learn by doing. These self-paced projects also come with full liveBook access to select books for 90 days plus permanent access to other select Manning products.

In this liveProject, you’ll step into the boots of an investigator trying to find the anonymous author of a seriously defamatory blog post. You’ve narrowed down your list of suspects and acquired a dataset of writing samples, and you now plan to find the culprit using a custom machine learning project. Your challenge is to build an authorship analysis model that will match a writing sample to the defamatory blog post and reveal the guilty party. To do this, you’ll need to extract data from a corpus of documents, build a model that can learn authorship style, scale the model to handle hundreds of suspects, and finally develop a user-friendly program that will allow non-technical colleagues to make use of your findings.
This project is designed for learning purposes and is not a complete, production-ready application or solution.

project author

Robert Layton
Rob Layton is a data scientist and a past core contributor to scikit-learn. He holds a PhD in cybercrime analytics, for which he analysed phishing websites to identify authorship patterns. He runs his own data analytics business, dataPipeline, and has delivered training with expert training provider Python Charmers for more than 5 years to students in finance, government, and the private sector.

prerequisites

This liveProject is for software developers with an interest in data science, and beginner data scientists. It will require a machine with a minimum of 2GB of free hard drive space and 4GB of RAM. To begin this liveProject, you will need to be familiar with:

TOOLS
  • Beginner Python and its utility functions
  • Basics of pandas
  • Basics of NumPy
  • Basics of scikit-learn
TECHNIQUES
  • Basics of data science and machine learning
  • Reading text files with Python
  • Saving and loading pandas DataFrames
  • Running a training and evaluation experiment
  • Running Python code from Jupyter Notebook
  • Basics of running terminal commands

you will learn

This liveProject will teach you important text mining and machine learning techniques that can be used for both author identification and other text-based tasks. The skills you’ll develop can also be applied to spotting bots on social media, building predictive models, and detecting fraud in electronic communications.

  • Extract data from original datasets using XML parsing and BeautifulSoup
  • Clean data to remove noise that will affect your model
  • Extract useful features using preprocessing techniques, spaCy and scikit-learn
  • Build a predictive classification model with scikit-learn
  • Visualize authorship styles with an interactive plot built with Altair
  • Incorporate your trained model into a user-friendly program by converting Jupyter notebooks with Voilà
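
The feature-extraction and classification steps in this list can be illustrated with a minimal sketch. It is not the project's solution. It uses only the Python standard library as a stand-in for the scikit-learn and spaCy tooling the project teaches, and it swaps a trained classifier for a simple nearest-profile comparison. The suspect names, sample texts, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples):
    """Merge each suspect's writing samples into one n-gram profile."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text))
    return profiles


def rank_suspects(profiles, disputed_text):
    """Return (author, similarity) pairs, most similar first."""
    target = char_ngrams(disputed_text)
    scores = {author: cosine(profile, target) for author, profile in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data; real samples would come from the project's dataset.
samples = [
    ("suspect_a", "Honestly, I reckon the whole thing is rubbish, honestly."),
    ("suspect_a", "I reckon they are wrong, and honestly it is rubbish."),
    ("suspect_b", "The committee has determined that the proposal is inadequate."),
    ("suspect_b", "It has been determined that the committee will reconvene."),
]
profiles = build_profiles(samples)
ranking = rank_suspects(profiles, "Honestly, I reckon this proposal is rubbish.")
for author, score in ranking:
    print(f"{author}: {score:.3f}")
```

Character n-grams are a common choice for authorship features because they capture habits of spelling, punctuation, and word choice without needing a tokenizer. With real data, a vectorizer and a classifier from scikit-learn would likely replace the hand-rolled counting and similarity shown here.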

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.