News Media Corp needs to move quickly to get ahead of its competitors. Its current news front page is assembled manually, in a time-consuming process where human editors create flashcards that summarize articles. That’s too slow, so senior management wants to supercharge the process using natural language processing. To get this built, they’ve turned to you. Your challenge in this liveProject is to create an NLP model that reduces turnaround time for news editors by summarizing text automatically. To do this, you’ll prepare and process your dataset with tokenization and padding, extract meaningful statistics from it, and finally use it to train a deep learning model that can quickly summarize body text.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
prerequisites
The liveProject is for intermediate Python programmers who know the basics of deep learning and NLP. To begin this liveProject, you should be familiar with the following topics:
TOOLS
- Jupyter Notebooks
- pandas
- scikit-learn
- Keras
TECHNIQUES
- Basic data manipulation and visualization
- ROUGE scoring
- Tokenization
- Word embeddings
- Neural network architectures like convolutional neural networks and recurrent neural networks
you will learn
In this liveProject, you’ll master extractive text summarization, a well-established field at the intersection of natural language processing and deep learning, and build skills that transfer easily to other NLP projects.
- Preparing your dataset with text cleaning and text processing
- Converting an abstractive text summarization dataset to an extractive one
- Calculating a ROUGE score between a pair of sentences
- Preprocessing a prepared extractive text summarization dataset
- Preparing the train, test, and validation splits with the Python data ecosystem
- Building deep learning models and evaluating them with TensorFlow and scikit-learn
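To give a feel for two of the skills listed above, here is a minimal sketch of a ROUGE-1 F1 score between a pair of sentences, and of how that score might be used to turn an abstractive summary into extractive labels. It is an illustration under stated assumptions, not the project's actual solution: the helper names (`tokenize`, `rouge1_f1`, `label_sentences`), the simple regex tokenizer, the 0.5 threshold, and the sample sentences are all hypothetical, and a real project would more likely rely on an established ROUGE implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def label_sentences(article_sentences, summary, threshold=0.5):
    """Label each article sentence 1 if it overlaps strongly with the summary, else 0.

    The threshold is an arbitrary assumption chosen for this illustration.
    """
    return [int(rouge1_f1(s, summary) >= threshold) for s in article_sentences]


# Hypothetical sample data, invented for this sketch.
article = [
    "The council approved the new budget on Monday.",
    "Local residents attended the meeting.",
    "The weather was sunny.",
]
summary = "Council approved the new budget."

labels = label_sentences(article, summary)
print(labels)  # [1, 0, 0]
```

Sentences labeled 1 become the extractive targets a model can learn to select. Thresholding is only one possible design; another common choice is to keep the top-scoring sentence or sentences per article.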