Vector Database and Document Retrieval

prerequisites
intermediate Python • basics of embeddings and similarity • basics of APIs and databases • optional basic Docker
skills learned
Qdrant setup • semantic chunking with LangChain • embedding generation with sentence-transformers • semantic search with metadata filtering • packaging a reusable `retrieve(query, k)` API
1 week · 5-8 hours per week · INTERMEDIATE

Step into the role of an AI engineer building a semantic search system for a compliance team drowning in regulatory text. Working with the EU AI Act, a dense, hundred-page policy document, you’ll create a retrieval pipeline that lets analysts ask plain-language questions and instantly surface the most relevant passages by meaning, not just keyword match. You’ll set up a vector database, chunk the text in ways that preserve legal context, generate and store embeddings, and implement metadata-filtered retrieval. Along the way, you’ll validate chunk quality and package your retrieval logic into reusable functions.

This project is a part of the series Building an Agentic RAG Application.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
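To make the pipeline described above concrete, here is a minimal sketch of the chunk, embed, store, and filtered-retrieve flow. It is an illustration only, not the project's solution: a toy hashed bag-of-words function stands in for a sentence-transformers model, a plain Python list stands in for a Qdrant collection, and the sample passages are invented placeholders rather than text quoted from the EU AI Act. The names `embed`, `Chunk`, `InMemoryIndex`, and the `section` metadata key are hypothetical; only the `retrieve(query, k)` shape comes from the project description.

```python
from __future__ import annotations

import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # toy vector size (assumption; real embedding models define their own)


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalized.

    Stand-in for a real embedding model; it matches shared words,
    not meaning, so it only illustrates the data flow.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Chunk:
    text: str
    metadata: dict
    vector: list[float] = field(default_factory=list)


class InMemoryIndex:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, text: str, **metadata: str) -> None:
        # Store the chunk text, its metadata, and its vector together.
        self.chunks.append(Chunk(text, metadata, embed(text)))

    def retrieve(self, query: str, k: int = 3, **filters: str) -> list[tuple[float, Chunk]]:
        """Return the top-k (score, chunk) pairs by cosine similarity,
        considering only chunks whose metadata matches every filter."""
        q = embed(query)
        candidates = [
            c for c in self.chunks
            if all(c.metadata.get(key) == value for key, value in filters.items())
        ]
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, c.vector)), c) for c in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Placeholder passages (invented for illustration, not quoted from any regulation).
index = InMemoryIndex()
index.add("Providers of high-risk systems must keep technical documentation up to date.",
          section="obligations")
index.add("Penalties may include administrative fines for non-compliance.",
          section="penalties")
index.add("Transparency rules require telling users they are interacting with an AI system.",
          section="transparency")

hits = index.retrieve("fines for non-compliance", k=2)
filtered = index.retrieve("fines for non-compliance", k=2, section="transparency")

for score, chunk in hits:
    print(f"{score:.2f}  [{chunk.metadata['section']}]  {chunk.text}")
```

Filtering before scoring keeps the ranking restricted to the requested section, which mirrors the idea of metadata-filtered retrieval; a vector database would typically apply the filter inside the search call itself.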

project author

Matteus Tanha
Dr. Matteus Tanha is an AI engineer and architect with over a decade of experience building production machine learning and agentic AI systems. He is co-founder of Alpha Quants, a boutique AI consultancy serving finance and enterprise clients, and has led AI initiatives at organizations including the Financial Times and Zurich Insurance. At the Financial Times, he architected AskFT, a retrieval-augmented research assistant combining semantic search and LLM orchestration to serve over a million monthly users. His work spans hybrid retrieval systems, knowledge graphs, and multi-agent orchestration, with deep expertise in RAG architectures and vector and graph databases. Matteus holds a Ph.D. in Computational Chemistry from Carnegie Mellon University, where his research applied machine learning methods to quantum chemical computation.

prerequisites

This liveProject is for intermediate Python programmers who want to build production-ready Retrieval-Augmented Generation (RAG) systems using vector databases and semantic search.


TOOLS
  • Python (intermediate)
  • Jupyter Notebooks (basics)
  • Command line / Terminal (basics)
TECHNIQUES
  • Machine Learning Fundamentals
  • Data Processing
  • API Usage
  • Database Concepts

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from fellow participants and even more help with paid sessions with our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.