In this liveProject, you’ll step into the role of a data scientist working for an investment firm. Your company wants to make sure its investments meet European Union guidelines for environmental sustainability. That’s where you come in.
The EU taxonomy for sustainable finance is big, complex, and confusing. Your bosses need a program that saves them from searching through hundreds of pages whenever they have a query. You’ve been tasked with building a machine learning model that can pose questions to the EU guidelines and return reliable answers.
Your challenges will include extracting text data from the EU taxonomy document and matching environmental questions with the corresponding paragraph in the guidelines. You’ll then set up a pretrained transformer question-answering model, evaluate its performance, and combine it with your question-paragraph model for an end-to-end solution. When you’re done, you’ll have an interface into which you can type a sustainable finance question and receive the correct answer from the EU guidelines.
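To make the question-paragraph matching step concrete, here is a minimal sketch in plain Python. It is not the project's method: it assumes a simple bag-of-words cosine similarity in place of the embeddings covered later, and the names `tokenize`, `cosine`, and `best_paragraph`, as well as the sample paragraphs, are hypothetical placeholders rather than text from the EU taxonomy.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def best_paragraph(question, paragraphs):
    """Return the index of the paragraph most similar to the question."""
    q_vec = Counter(tokenize(question))
    scores = [cosine(q_vec, Counter(tokenize(p))) for p in paragraphs]
    return max(range(len(paragraphs)), key=scores.__getitem__)


# Placeholder paragraphs (assumptions for illustration, not taxonomy text).
paragraphs = [
    "Placeholder paragraph about solar electricity generation.",
    "Placeholder paragraph about water supply and sewerage.",
    "Placeholder paragraph about forest management and forestry.",
]
question = "Which rules apply to forest management?"

match_index = best_paragraph(question, paragraphs)
print(paragraphs[match_index])
```

In an end-to-end setup, the selected paragraph would then be passed as context to the question-answering model. Word counts are only a baseline here; swapping in paragraph embeddings would change how the vectors are built but not the overall ranking logic.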
This project is designed for learning purposes and is not a complete, production-ready application or solution.
This liveProject is for intermediate Python programmers who already know the basics of data science and machine learning. To begin this liveProject, you will need to be familiar with:
- Intermediate Python
- Basics of pandas
- Basics of NumPy
- Basics of scikit-learn
- Basics of data science
- Basics of machine learning
What you will learn
In this liveProject, you’ll get to grips with the fundamentals of information retrieval and natural language processing that are the cornerstones of data and deep learning projects.
- Deep learning with PyTorch and spaCy
- Extracting text from PDFs
- Evaluating machine learning models
- Loading and working with pretrained models
- Transformers and auto-encoders
- Word and paragraph embeddings