Four-Project Series

Transformers and Explainable AI for Computer Vision

prerequisites
intermediate Python • intermediate TensorFlow • intermediate PyTorch • intermediate knowledge of deep learning models • basic knowledge of transformer models • intermediate knowledge of computer vision concepts
skills learned
Train and use transformers for classification, segmentation, and object detection • Build real-world applications by deploying a trained transformer model and explaining its predictions.
Anuradha Kar
4 weeks · 3-5 hours per week average · INTERMEDIATE

In this series of liveProjects, you’ll join up with four different computer vision companies to explore computer vision models powered by the latest deep learning architectures. You’ll utilize the groundbreaking transformer architecture, which forms the driving force behind ChatGPT, to develop a series of increasingly complex models. Starting with a classifier to detect brain tumors, you'll move on to a segmentation algorithm and an object detection application capable of detecting construction vehicles and structural flaws. Finally, you’ll take on the role of an MLOps expert, implementing model deployment and explainability in the systems you’ve developed.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

It’s an excellent course, exactly what I needed in my neuroscience PhD, as I am dealing with ultra-high field fMRI data, where we need to manually segment portions of the fMRI image - the layers. This live project series is golden for anyone interested in learning transformers and their application to different computer vision tasks, like MRI!

Arslan Gabdulkhakov, neuroscience PhD student

here's what's included

Project 1 Image Classification

In this liveProject, you’ll join BrainAI’s MRI data analysis team. BrainAI needs you to develop a state-of-the-art AI module utilizing vision transformer models to detect brain tumors with exceptional accuracy. Armed with Python tools like Hugging Face Transformers, PyTorch, and more, you'll detect the presence or absence of tumors within human-brain MRI datasets. With Google Colab's GPU computing resources, you'll utilize deep learning to try and achieve a 95%+ accuracy rate.
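An accuracy target such as 95%+ is easier to interpret alongside per-class error rates, since a tumor detector can score well on accuracy while still missing tumors when the dataset is imbalanced. The following is a minimal sketch in plain Python, assuming binary labels where 1 means "tumor present" and 0 means "absent". The label encoding and the `binary_metrics` name are illustrative and not taken from the project.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    Assumption for this sketch: 1 = tumor present, 0 = tumor absent.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumors found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed tumors

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual tumors that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of tumor-free scans that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

Reporting sensitivity next to accuracy makes missed tumors visible even when most scans in the evaluation set are tumor-free.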

Project 2 Image Segmentation

In this liveProject, you'll pioneer the development of cutting-edge MRI segmentation algorithms using transformer architecture for computer vision company VisionSys. Manual segmentation is labor-intensive and expensive, so you’ll be developing a custom model that can do it for you. You'll train and evaluate SegFormer and MaskFormer models to identify brain tumor regions with over 90% accuracy. With Python tools like Hugging Face Transformers and Google Colab's GPU computing resources, you'll create pipelines, preprocess data, and showcase sample predictions and quantitative results.
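The description above does not say how the 90% figure is measured. Segmentation quality is commonly reported with overlap scores such as IoU and Dice in addition to pixel accuracy, because tumor regions usually cover only a small part of each image. The following is a minimal sketch, assuming binary masks stored as nested lists of 0/1 values. The `overlap_scores` name and the mask layout are illustrative.

```python
from typing import Dict, List

Mask = List[List[int]]  # assumed layout: rows of 0/1 pixel labels


def overlap_scores(pred_mask: Mask, true_mask: Mask) -> Dict[str, float]:
    """Compute IoU, Dice, and pixel accuracy for two binary masks."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true) or not pred:
        raise ValueError("masks must be non-empty and the same size")

    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    pred_area = sum(pred)
    true_area = sum(true)
    union = pred_area + true_area - intersection

    return {
        # Two empty masks agree perfectly, so score them as 1.0.
        "iou": intersection / union if union else 1.0,
        "dice": 2 * intersection / (pred_area + true_area)
        if (pred_area + true_area)
        else 1.0,
        "pixel_accuracy": sum(1 for p, t in zip(pred, true) if p == t) / len(pred),
    }
```

Pixel accuracy counts the background too, so it tends to read higher than IoU or Dice on the same prediction.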

Project 3 Object Detection

In this liveProject, you'll spearhead the development of AI-aided surveillance software for construction site supervision. You’ll build two computer vision applications: one that detects construction vehicles and their types across a large worksite, and a more powerful model that detects building defects such as cracks and fissures. Start by working with a pre-trained DETR model, then explore the Roboflow platform to assist you as you create a multi-class object detection dataset from multiple datasets with non-identical classes. With these datasets, you will train different transformer models for object detection to identify construction vehicles and cracks in buildings.
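Merging datasets with non-identical classes comes down to mapping each source dataset's labels onto one shared class list and re-indexing the annotations. Roboflow handles this on its platform. The sketch below only illustrates the idea in plain Python and does not use the Roboflow API. The dataset names, class names, annotation layout, and the `merge_annotations` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Annotation = Tuple[str, str, Box]        # (image_id, class_name, box)
MergedAnnotation = Tuple[str, int, Box]  # (image_id, class_index, box)


def merge_annotations(
    datasets: Dict[str, List[Annotation]],
    aliases: Dict[str, str],
) -> Tuple[List[str], List[MergedAnnotation]]:
    """Merge datasets whose class names differ into one shared label space.

    `aliases` maps a source class name to its canonical name; names that
    are not listed are kept unchanged.
    """
    renamed = []
    for dataset_name, annotations in datasets.items():
        for image_id, class_name, box in annotations:
            canonical = aliases.get(class_name, class_name)
            # Prefix the image id so files from different datasets cannot collide.
            renamed.append((f"{dataset_name}/{image_id}", canonical, box))

    classes = sorted({name for _, name, _ in renamed})
    class_index = {name: i for i, name in enumerate(classes)}
    merged = [(image_id, class_index[name], box) for image_id, name, box in renamed]
    return classes, merged
```

Sorting the class names keeps the integer ids stable no matter which dataset is loaded first.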

Project 4 Deploying Models

In this liveProject, you’ll take on the role of an engineer at AISoft, where you'll be part of two dynamic teams: MLOps and Core-ML. On the MLOps team, you'll utilize software engineering techniques to ensure your models are not only accurate but also scalable, reliable, and maintainable. You’ve been assigned two important support tasks for the Core-ML team. First, you'll utilize the Gradio Python library to create an interactive web user interface that runs MRI classification and segmentation transformer models in the background. Then, you'll build a pipeline that provides interpretability to the decisions of a construction vehicle classification model.
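A Gradio interface for a classifier is essentially a prediction function wired to an image input and a label output. The following is a minimal sketch of that wiring, not the project's actual app. It uses Gradio's `gr.Interface`, `gr.Image`, and `gr.Label` components, while `score_fn`, the label names, and both helper functions are hypothetical placeholders for a trained model.

```python
from typing import Callable, Dict


def make_predict_fn(score_fn: Callable[[object], float]) -> Callable[[object], Dict[str, float]]:
    """Wrap a scoring function so it returns the {label: confidence} dict
    that a Gradio Label output can display.

    `score_fn` is a stand-in for real model inference: it takes an image
    and returns the probability that a tumor is present.
    """
    def predict(image) -> Dict[str, float]:
        p_tumor = min(max(float(score_fn(image)), 0.0), 1.0)  # clamp to [0, 1]
        return {"tumor": p_tumor, "no_tumor": 1.0 - p_tumor}

    return predict


def build_demo(score_fn: Callable[[object], float]):
    """Build the web UI. Gradio is imported here so the helper above
    stays usable without it."""
    import gradio as gr

    return gr.Interface(
        fn=make_predict_fn(score_fn),
        inputs=gr.Image(type="pil", label="MRI slice"),
        outputs=gr.Label(num_top_classes=2),
        title="MRI tumor classifier (sketch)",
    )


# Usage, assuming `my_score_fn` wraps a trained model:
#   build_demo(my_score_fn).launch()
```

Keeping the prediction function separate from the interface makes it possible to test the model wrapper without starting a web server.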

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.

Very interesting series with innovative materials about transformers.

Mikael Dautrey, infrastructure consultant

project author

Anuradha Kar
Anuradha Kar is a researcher at the Institut Pasteur in Paris, working on deep learning applications in drug discovery. Before this, she worked at the Paris Brain Institute on applying attention-based deep learning models to understanding the evolution of Alzheimer's disease and at École normale supérieure de Lyon in France on deep learning-based analysis of 3D bio-image datasets. She has a Ph.D. in electrical engineering from the National University of Ireland, Galway. In 2021, she published a liveProject series with Manning Publications titled Transfer Learning for Dicom Image Classification.


prerequisites

This liveProject series is aimed at intermediate-level Python programmers who already know the basics of deep learning and computer vision.

  • Intermediate Python
  • Intermediate Jupyter Notebook
  • Intermediate TensorFlow
  • Intermediate PyTorch
  • Intermediate OpenCV

  • Intermediate levels of deep learning and image classification
  • Intermediate levels of data science

you will learn

In these liveProjects, you'll train a wide range of Transformer models for computational imaging tasks like classification, segmentation, and object detection. Additionally, you'll deploy these models for real-world web applications and implement explainability techniques to interpret predictions, fostering insight and trust.

  • Train transformers for classification, segmentation, and object detection tasks
  • Implement end-to-end transformer-based image analysis pipelines, including model deployment and explanations
  • Gain proficiency with the Hugging Face Transformers library, fine-tuning models for computer vision
  • Access open-source datasets on Roboflow, learn how to create and merge datasets on this platform, and facilitate cross-platform data sharing with Hugging Face
  • Compare the performance of transformer-based models for classification, segmentation, and object detection tasks
  • Understand two deep learning model explainers
  • Use Python libraries to explain decisions of vision transformer models
  • Display and interpret deep learning model explanations
  • Deploy transformer models in real-world web applications using the Gradio library
  • Explore characteristics of medical image datasets (MRI)
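The list above does not name the two explainers the series covers. As one illustration of what a model explanation looks like, here is a minimal sketch of occlusion sensitivity, a simple model-agnostic technique: mask one patch of the image at a time and record how much the model's score drops. The `score_fn` callable, the nested-list image format, and the patch size are assumptions for this example, and the technique is not necessarily one of the methods used in the projects.

```python
from typing import Callable, List

Image = List[List[float]]  # assumed layout: rows of grayscale pixel values


def occlusion_map(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    baseline: float = 0.0,
) -> List[List[float]]:
    """Return a heat map where each pixel holds the score drop caused by
    occluding the patch that contains it. Larger values mean the region
    mattered more to the model's output."""
    height, width = len(image), len(image[0])
    base_score = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]

    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))

            # Copy the image and blank out one patch.
            occluded = [row[:] for row in image]
            for y in rows:
                for x in cols:
                    occluded[y][x] = baseline

            drop = base_score - score_fn(occluded)
            for y in rows:
                for x in cols:
                    heat[y][x] = drop
    return heat
```

The resulting heat map can be overlaid on the input image to show which regions drive a prediction.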


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.