LangChain

Augment with Embeddings

This project is part of the liveProject series Build a Custom Chatbot Using LangChain and ChatGPT
prerequisites
intermediate Python • managing file operations with the os and shutil modules
skills learned
use of embeddings with the OpenAI API for similarity search


In this liveProject, you’ll become a software engineer at InfoHub, an up-and-coming AI startup looking to revolutionize how companies interact with their knowledge bases. InfoHub wants to use large language models to build a system that answers a user’s questions about company data through a Q&A-style language interface. You’ll begin by helping them build this tool: scraping data with BeautifulSoup, tokenizing it with tiktoken, and converting it into embeddings.
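To make the idea of answering questions from embeddings more concrete, here is a minimal sketch of similarity search over embedding vectors. It is not the project's solution code: the function names `cosine_similarity` and `most_similar`, the document ids, and the tiny three-dimensional vectors are all made up for illustration. Real embeddings returned by an embeddings API have many more dimensions, but the ranking logic would look roughly the same.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, doc_vecs, top_k=1):
    """Return the top_k (doc_id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-reports": [0.1, 0.9, 0.1],
    "office-hours": [0.0, 0.2, 0.9],
}

# A hypothetical embedding of the user's question.
query = [0.8, 0.2, 0.1]

best = most_similar(query, docs, top_k=2)
for doc_id, score in best:
    print(f"{doc_id}: {score:.3f}")
```

In a fuller pipeline, the text of the best-matching documents would typically be passed to the language model as context for answering the question.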

This project is designed for learning purposes and is not a complete, production-ready application or solution.

project author

Pablo Elgueta
Over the past seven years, Pablo has been deeply involved in the tech industry, specifically focusing on software product development for the last four years. He worked at Microsoft, where he expanded his expertise in large-scale technology solutions. His academic journey led him to earn an MSc in Data Science and Machine Learning, where he researched the application of OpenAI’s GPT-3 to predict cryptocurrency prices for his dissertation. Pablo has also spoken at the API Days conference, discussing the practical implementation of Large Language Models (LLMs). He has developed “FanPods,” an innovative app that turns text into AI-driven audiobooks. As an LLM Engineer at AffiliateAI, Pablo is crafting solutions for chat-based data retrieval using LLMs.

prerequisites

This liveProject is for intermediate-level Python developers. No special tools are required—you can perform everything you need using a normal IDE or Jupyter Notebook.


TOOLS
  • Intermediate Python
  • Basics of pandas
  • Basics of the OpenAI API

TECHNIQUES
  • Basics of tokens
  • Basics of embeddings
  • Basics of LLMs

you will learn

In this liveProject, you’ll learn to implement embeddings and tokenization, vital skills for creating LLM-powered apps.


  • Scrape and parse web pages using BeautifulSoup to extract text, links, and other relevant data
  • Tokenize texts for large language models utilizing tiktoken, which is crucial for obeying token limits
  • Store and retrieve embeddings and processed data efficiently using pandas DataFrames
  • Derive insights from extensive datasets leveraging sophisticated OpenAI large language models
  • Traverse directories and subdirectories thoroughly using the os module's os.walk function
  • Protect API keys by loading them from environment variables instead of hard-coding them
  • Potentially expedite processing and improve performance by employing parallel processing techniques
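Three of the skills above (directory traversal, API key handling, and parallel processing) can be sketched with the Python standard library alone. This is an illustrative sketch, not the project's solution: the helper names `find_html_files`, `count_words`, `process_in_parallel`, and `load_api_key` are hypothetical, word counting stands in for the real parsing and tokenizing work, and `OPENAI_API_KEY` is assumed as the variable name.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_html_files(root):
    """Walk root and every subdirectory, collecting paths ending in .html."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".html"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def count_words(path):
    """Stand-in for real per-file work such as parsing and tokenizing."""
    with open(path, encoding="utf-8") as handle:
        return len(handle.read().split())


def process_in_parallel(paths, max_workers=4):
    """Apply count_words to each path with a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_words, paths))


def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the key from the environment rather than from source code."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

A thread pool tends to help when the per-file work is dominated by I/O such as reading files or calling a web API. For CPU-heavy work, a process pool may be the better fit, and whether either one speeds things up depends on the workload.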

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.
