Building LLM Applications with DSPy

Replacing manual prompts with systematic optimization

Serj Smorodinsky and Brett Kennedy

MEAP began May 2026
Last updated May 2026
Publication in Fall 2026 (estimated)

ISBN 9781633435018
250 pages (estimated)

Included with a Manning Online subscription

printed in black & white


resources: Source code, Book forum, Source code on GitHub

table of contents

1 Introduction to prompt programming and DSPy

1.1 What makes prompt programming different?

1.1.1 The goals of prompt programming

1.1.2 Prompt programming vs prompt engineering

1.2 Introducing DSPy

1.2.1 DSPy code example

1.2.2 Advantages to working with code

1.2.3 A methodical search for optimal prompts

1.2.4 The bitter lesson

1.2.5 A data-driven approach to tuning prompts

1.2.6 Adapting to change

1.2.7 Making the best use of LMs

1.3 Working with complex LM-based applications in DSPy

1.4 Where is DSPy useful?

1.5 Building LLM applications through baselines and optimization

1.6 Summary

2 Basic prompting and DSPy

2.1 Prompts

2.1.1 Classification prompt example

2.1.2 The components of an effective prompt

2.2 A full DSPy application

2.3 The main concepts in DSPy

2.4 The LM (language model)

2.4.1 Storing API keys in .env files

2.4.2 Calling language models directly

2.4.3 Using LiteLLM to access LMs

2.4.4 LM caching

2.4.5 Setting language model parameters

2.4.6 Switching between LMs

2.5 Signatures

2.5.1 Example asking for a confidence score

2.5.2 Summarization example

2.5.3 Translation example

2.5.4 Entailment example

2.5.5 Style transfer example

2.6 Modules

2.7 Predictions

2.8 Code readability and code history

2.9 Summary

3 Classifying user intent

3.1 Creating a baseline classifier

3.1.1 Utterances

3.1.2 Multi-class, multi-label, and dynamic classification

3.2 Download and prepare the ATIS dataset

3.3 Building a DSPy intent classifier

3.3.1 The signature

3.3.2 The module

3.4 Using the OpenAI API directly

3.4.1 Classification using the OpenAI API

3.4.2 System and user roles

3.4.3 A comparison of DSPy and the direct use of LM APIs

3.5 Dynamic labels vs static labels

3.5.1 Dynamic intent classification

3.5.2 Static intent classification

3.6 Handling multiple intents

3.7 Viewing the history

3.8 Summary

4 Evaluating DSPy programs

4.1 Creating a dataset for evaluation

4.1.1 Using the Example class

4.1.2 Dividing the data into train and test sets

4.1.3 The sizes of the sets

4.1.4 Splitting the ATIS data

4.1.5 Using DSPy to generate examples

4.2 Evaluating a module with a test set

4.2.1 Defining a metric for evaluation

4.2.2 Defining a metric for the baseline model

4.2.3 Calculating a final evaluation for the module

4.2.4 Evaluation for tuning versus for a final evaluation

4.2.5 Testing the metric function

4.3 Evaluating the DSPy baseline model

4.3.1 Using custom Python code for evaluation

4.3.2 Using DSPy Evaluate

4.3.3 Rate limits

4.4 Evaluating a manually created prompt using the OpenAI API

4.5 Evaluating the per-class performance

4.6 Evaluating the consistency of responses

4.7 Evaluating other LMs and modules

4.8 Summary

5 Optimizing prompt examples

5.1 General approaches to optimization

5.2 The LabeledFewShot optimizer

5.2.1 Executing the LabeledFewShot optimizer

5.2.2 Executing the LabeledFewShot optimizer repeatedly in a loop

5.2.3 Working with ChainOfThought

5.3 BootstrapFewShot

5.3.1 Working with ChainOfThought

5.3.2 Specifying a teacher

5.4 BootstrapFewShotWithRandomSearch

5.5 KNN: Finding Examples Dynamically

5.6 Evaluation Results

5.6.1 gpt-4o-mini

5.6.2 gpt-4.1-nano

5.7 Summary

6 Optimizing prompt instructions

6.1 COPRO

6.1.1 What are the prefixes to the output fields?

6.1.2 What are the prompt instructions?

6.1.3 Executing the COPRO optimizer

6.1.4 Using a prompt LM

6.1.5 How COPRO generates candidate instructions

6.1.6 Executing multiple optimizers

6.2 MIPROv2

6.2.1 Grounding

6.2.2 Using Bayesian hyperparameter tuning for trial optimization

6.3 InferRules

6.4 SIMBA

6.4.1 Optimizing the airline classifier with SIMBA

6.4.2 Taking advantage of feedback

6.5 GEPA

6.5.1 Genetic Algorithms

6.5.2 The Pareto front

6.5.3 Generating new candidates

6.5.4 Using GEPA for the airline classifier

6.6 Ensemble

6.7 Saving the programs generated by DSPy

6.8 Comparing optimizers

6.9 Summary

7 Custom models

8 Summarization & LLM as a judge

9 Agentic RAG based chatbot

Overview

1 Introduction to prompt programming and DSPy

Building LLM applications often begins with simple prompts, but quickly becomes difficult because language models can respond very differently to small wording changes. Traditional prompt engineering relies on manually rewriting and testing prompts, which is slow, inconsistent, and hard to maintain. Prompt programming offers a higher-level alternative: instead of focusing on exact prompt wording, developers describe the task, inputs, expected outputs, and evaluation criteria in code.

DSPy, short for Declarative Self-improving Python, is presented as a framework for prompt programming. It lets developers build LM-based applications using modular Python components rather than hand-crafted prompts. DSPy can automatically generate, evaluate, and optimize prompts, making it easier to improve quality, reduce development time, switch between models, test different prompting strategies, and rerun optimization when requirements or models change.

The chapter emphasizes that DSPy is especially valuable for complex or long-running applications such as RAG systems, agents, chatbots, summarizers, classifiers, and workflows involving many LM calls. Its data-driven approach encourages developers to create baselines, evaluate performance rigorously, and then optimize prompts automatically. While casual one-off prompting may not require DSPy, the framework can help teams build more reliable, maintainable, cost-effective LLM applications, often enabling smaller or cheaper models to perform well through better prompts.

Layers of code when working with DSPy. We usually need to work only at the top level, which is the source code we create using classes and functions provided by the DSPy layer below.

Prompt optimization of a customer service intent classifier. The optimization process generates three candidate prompts, evaluates each one, and then selects the best. In this case, that is the second prompt, which has the highest score (90%). DSPy also supports processes that modify and re-evaluate the prompts over several iterations.
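The selection step in this figure can be sketched in plain Python under some clearly stated assumptions. This is not DSPy's API. `fake_lm` is a hypothetical, deterministic stand-in for an LM call, and the candidate prompts and five labeled utterances are invented, so the scores come from this toy data rather than from the figure. The metric is exact-match accuracy, and every candidate is scored on the same examples so that the comparison is fair.

```python
# Framework-free sketch of best-of-N prompt selection; this is NOT DSPy's API.
# fake_lm is a hypothetical, deterministic stand-in for a real LM call.

KEYWORDS = ("refund", "money back", "charge")

def fake_lm(prompt: str, utterance: str) -> str:
    """Label an utterance 'refund' only if it mentions a keyword the prompt covers."""
    covered = [k for k in KEYWORDS if k in prompt]
    text = utterance.lower()
    return "refund" if any(k in text for k in covered) else "other"

# Tiny, invented labeled dataset of (utterance, expected intent) pairs.
DEV_SET = [
    ("I want a refund", "refund"),
    ("Please give me my money back", "refund"),
    ("Why was my card charged twice?", "refund"),
    ("Where is my order?", "other"),
    ("Change my delivery address", "other"),
]

# Invented candidate prompts, standing in for ones a prompt LM might propose.
CANDIDATES = [
    "Classify the intent. Mentions of a refund mean refund.",
    "Label as refund if the user asks for a refund, their money back, or disputes a charge.",
    "Label as refund if the user wants their money back.",
]

def accuracy(prompt: str, dataset) -> float:
    """Exact-match metric: fraction of examples labeled correctly."""
    return sum(fake_lm(prompt, x) == y for x, y in dataset) / len(dataset)

def select_best(candidates, dataset):
    """Score every candidate on the same data; return the best index and all scores."""
    scores = [accuracy(c, dataset) for c in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return best_index, scores

best_index, scores = select_best(CANDIDATES, DEV_SET)
print(best_index, scores)  # 1 [0.6, 1.0, 0.8]
```

As in the figure, the second candidate wins. It is the only prompt that covers all three ways of asking for a refund in this toy data.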

An example control flow in which several LM calls are made. The specific set of LM calls executed and their specific content are determined by previous LM calls and calls to tools.
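The following is a minimal sketch of such a control flow. It uses hypothetical stubs in place of real LM calls and tools: `route_lm`, `flight_status_tool`, and `answer_lm` are invented for illustration and are not part of DSPy. The point is that which calls run, and in what order, depends on the results of earlier calls.

```python
# Framework-free sketch of a control flow with several LM calls (NOT DSPy's API).
# route_lm and answer_lm are hypothetical stand-ins for LM calls, and
# flight_status_tool stands in for an external tool such as a database lookup.
from typing import List, Optional, Tuple

FLIGHTS = {"UA100": "on time", "DL200": "delayed 40 minutes"}

def route_lm(question: str) -> str:
    """LM call #1 (stub): decide whether the question needs the flight tool."""
    return "lookup" if "flight" in question.lower() else "direct"

def flight_status_tool(question: str) -> Optional[str]:
    """Tool (stub): return the status of a flight code found in the question."""
    for code, status in FLIGHTS.items():
        if code in question.upper():
            return f"{code} is {status}"
    return None

def answer_lm(context: Optional[str]) -> str:
    """LM call #2 (stub): compose the reply from whatever context is available."""
    if context:
        return f"Based on our records, {context}."
    return "Here is our general policy on that."

def run(question: str) -> Tuple[str, List[str]]:
    """Return the reply and the sequence of steps that were actually executed."""
    steps = ["route_lm"]
    context = None
    if route_lm(question) == "lookup":
        steps.append("flight_status_tool")
        context = flight_status_tool(question)
        if context is None:
            # The tool's result changes the flow: ask a follow-up question instead.
            return "Which flight number do you mean?", steps
    steps.append("answer_lm")
    return answer_lm(context), steps

for q in ["Is flight DL200 on time?", "Is my flight late?", "Can I bring a pet?"]:
    print(run(q))
```

In a real application, each LM call would have its own prompt. The quality of the whole flow therefore depends on several prompts at once.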

The three main stages of building an LM-based application with DSPy

Example creating a simple, baseline application in DSPy

Once we have a baseline application, or any other version of the application, we can evaluate it.

DSPy supports automatically optimizing the prompts used by an application.

Summary

To get good results from an LM, it’s necessary to give it a good prompt. Manually creating and tuning prompts is often slow and ad hoc, and the work must be repeated for each LM under consideration.
Prompt programming, which supports automatically creating, evaluating, and optimizing prompts, provides a more modern alternative.
DSPy is the state of the art in prompt programming. It allows for clean, understandable, and simple code that can be easily re-executed to test new LMs or new prompting techniques.
Working with DSPy allows us to develop quickly, as much of the work interacting with LMs is now handled automatically by the framework.
Using DSPy, it’s recommended to first create a baseline application, then evaluate it, then optimize it. Optimization works by having another LM suggest many candidate prompts and by carefully evaluating each of these.
Optimization may run over multiple iterations, using established optimization techniques to identify progressively stronger candidate prompts with each iteration.
DSPy supports defining complex workflows. Where a workflow contains multiple prompts, DSPy can optimize the full set of prompts together, which helps produce very effective applications.
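A tiny two-stage pipeline shows why optimizing prompts together can matter. The following sketch is a brute-force illustration, not how DSPy's optimizers search. `stage_one` and `stage_two` are hypothetical stubs whose behavior depends on their instruction text, the candidate instructions and data are invented, and only the pipeline's final output is scored.

```python
# Brute-force sketch of optimizing two prompts together (NOT how DSPy searches).
# stage_one and stage_two are hypothetical stand-ins for two chained LM calls
# whose behavior depends on the instruction text they are given.
from itertools import product

def stage_one(instruction: str, text: str) -> str:
    """LM call #1 (stub): normalize the utterance."""
    return text.lower() if "lowercase" in instruction else text

def stage_two(instruction: str, text: str) -> str:
    """LM call #2 (stub): label as refund if the quoted keyword appears."""
    keyword = instruction.split("'")[1]
    return "refund" if keyword in text else "other"

STAGE_ONE = ["Repeat the text.", "Rewrite the text in lowercase."]
STAGE_TWO = ["Answer refund if the text contains 'Refund'.",
             "Answer refund if the text contains 'refund'."]
DATASET = [("REFUND please", "refund"), ("Refund my ticket", "refund"),
           ("Where is my bag?", "other")]

def evaluate(pair, dataset) -> float:
    """End-to-end metric: only the final output of the pipeline is scored."""
    first, second = pair
    hits = sum(stage_two(second, stage_one(first, x)) == y for x, y in dataset)
    return hits / len(dataset)

# Try every combination of instructions and keep the best-scoring pair.
best = max(product(STAGE_ONE, STAGE_TWO), key=lambda pair: evaluate(pair, DATASET))
print(best, evaluate(best, DATASET))
```

Tuning one stage at a time can get stuck. With stage one fixed at "Repeat the text.", the `'Refund'` instruction wins for stage two (2/3). Re-tuning stage one with that choice then keeps "Repeat the text." (2/3 versus 1/3). Only evaluating the pair together finds the combination that scores 1.0.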

FAQ

What is prompt programming?

Prompt programming is an approach to building LM-based applications where developers specify, in code, what they want the language model to do rather than manually writing and tuning exact prompt text. Developers define the inputs, expected outputs, and evaluation criteria, while a framework such as DSPy generates and optimizes the prompts automatically.
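To make the idea concrete, here is a hypothetical sketch of declaring a task's inputs, outputs, and instructions in code and rendering the prompt text from that declaration. `TaskSpec` is invented for illustration. In DSPy this role is played by signatures, and the compact `"utterance -> intent"` string is loosely modeled on the inline signature strings DSPy accepts (such as `"question -> answer"`).

```python
# Hypothetical sketch of declaring a task in code and rendering a prompt from it.
# TaskSpec is invented for illustration; DSPy's own mechanism is the signature.
from dataclasses import dataclass
from typing import List

@dataclass
class TaskSpec:
    instructions: str
    inputs: List[str]
    outputs: List[str]

    @classmethod
    def from_string(cls, spec: str, instructions: str) -> "TaskSpec":
        """Parse a compact 'in1, in2 -> out1' declaration."""
        left, right = spec.split("->")
        return cls(instructions,
                   [name.strip() for name in left.split(",")],
                   [name.strip() for name in right.split(",")])

    def render(self, **values: str) -> str:
        """Build prompt text: instructions, filled-in inputs, then empty output slots."""
        lines = [self.instructions, ""]
        lines += [f"{name}: {values[name]}" for name in self.inputs]
        lines += [f"{name}:" for name in self.outputs]
        return "\n".join(lines)

spec = TaskSpec.from_string("utterance -> intent", "Classify the customer's intent.")
print(spec.render(utterance="I lost my bag"))
```

The calling code only supplies values for the declared fields. An optimizer could therefore swap in different `instructions` text without any change to that code.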

How is prompt programming different from prompt engineering?

Prompt engineering is the manual process of repeatedly rewriting prompts to improve LM responses. Prompt programming works at a higher level: developers describe the task in code, and tools like DSPy automatically generate, test, and optimize candidate prompts. This makes the process more systematic, repeatable, and easier to maintain.

What does DSPy stand for?

DSPy stands for Declarative Self-improving Python. “Declarative” means developers declare what they want the LM to do instead of writing the exact prompt. “Self-improving” refers to DSPy’s ability to optimize prompts automatically. “Python” reflects that DSPy is developed and used in Python.

Why can prompt engineering be difficult and time-consuming?

Language models are sensitive to small changes in prompt wording, so different prompts can produce very different results. Developers often have to test many variations manually, and the process can become ad hoc, slow, and hard to track. Prompts may also grow long and messy, making it difficult to know why each instruction was added or how to improve it later.

What are the main goals of prompt programming tools like DSPy?

The main goals include improving prompt quality, reducing development time, making it easier to switch between language models, supporting experimentation with different prompting techniques, and allowing prompt development code to be re-executed whenever needed.

How does DSPy optimize prompts?

DSPy can generate many candidate prompts, evaluate each one against a defined dataset and evaluation function, and select the best-performing prompt. It can also refine prompts over multiple iterations using established optimization approaches such as hill climbing, genetic algorithms, and Bayesian optimization.
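Hill climbing, one of the approaches mentioned, can be sketched generically. The loop keeps the current best prompt, generates variants, and accepts a variant only if it scores higher. This is not the algorithm of any particular DSPy optimizer. `fake_lm` and `propose` are hypothetical stubs, and the hints and data are invented. In practice, an LM would answer the task and another LM would suggest the rewrites.

```python
# Generic hill-climbing sketch over prompt text (NOT the algorithm of any specific
# DSPy optimizer). fake_lm and propose are hypothetical stand-ins: a real system
# would call an LM to answer and another LM to suggest prompt rewrites.

KEYWORDS = ("refund", "money back", "charge")
HINTS = ["Treat 'money back' as refund.",
         "Treat a disputed charge as refund.",
         "Be concise."]
DEV_SET = [("I want a refund", "refund"), ("Please give me my money back", "refund"),
           ("Why was my card charged twice?", "refund"), ("Where is my order?", "other"),
           ("Change my delivery address", "other")]

def fake_lm(prompt: str, utterance: str) -> str:
    """Stub LM: recognizes only the refund phrasings that the prompt mentions."""
    covered = [k for k in KEYWORDS if k in prompt]
    return "refund" if any(k in utterance.lower() for k in covered) else "other"

def score(prompt: str) -> float:
    """Exact-match accuracy of the stub LM on the invented dev set."""
    return sum(fake_lm(prompt, x) == y for x, y in DEV_SET) / len(DEV_SET)

def propose(prompt: str) -> list:
    """Stub proposer: one variant per hint not yet in the prompt."""
    return [f"{prompt} {hint}" for hint in HINTS if hint not in prompt]

def hill_climb(seed: str, max_iters: int = 5):
    best, best_score = seed, score(seed)
    history = [best_score]
    for _ in range(max_iters):
        candidates = propose(best)
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= best_score:
            break  # no candidate improves on the current prompt: stop
        best, best_score = top, score(top)
        history.append(best_score)
    return best, history

best, history = hill_climb("Label refund requests as refund.")
print(history)  # [0.6, 0.8, 1.0]
print(best)
```

Hill climbing stops at the first point where no neighbor improves. This is one reason DSPy also offers optimizers based on other strategies, such as genetic algorithms and Bayesian optimization.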

Why is evaluation important in DSPy?

Evaluation is central because it determines which prompts actually perform best. Since LMs are stochastic, the same prompt can produce different results on different runs, and one prompt may work better for some inputs than others. DSPy encourages a data-driven process where prompts are tested consistently against examples and judged with a defined evaluation function.
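The effect of stochastic responses on evaluation can be simulated with the standard library. As an assumption for illustration, each prompt is reduced to a hidden probability of answering any single example correctly. No LM or DSPy code is involved, and the probabilities 0.85 and 0.80 are invented.

```python
# Sketch of why repeated evaluation matters with stochastic LMs (NOT DSPy's API).
# Assumption for illustration: each prompt is modeled only by a hidden probability
# that the LM answers any single example correctly.
import random
import statistics

def run_eval(p_correct: float, n_examples: int, rng: random.Random) -> float:
    """One evaluation pass: fraction of simulated examples answered correctly."""
    hits = sum(rng.random() < p_correct for _ in range(n_examples))
    return hits / n_examples

def repeated_eval(p_correct: float, n_examples: int = 20, n_runs: int = 5, seed: int = 0):
    """Repeat the pass n_runs times; return mean, standard deviation, and all scores."""
    rng = random.Random(seed)
    scores = [run_eval(p_correct, n_examples, rng) for _ in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores), scores

for name, p in [("prompt A", 0.85), ("prompt B", 0.80)]:
    mean, spread, scores = repeated_eval(p)
    print(f"{name}: mean={mean:.2f} stdev={spread:.2f} runs={scores}")
```

When two prompts are close, the scores from their individual runs can overlap, so a single run may rank them the wrong way. Comparing means over several runs, and over more examples, gives a steadier basis for choosing between them.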

What are the advantages of working with DSPy code instead of raw prompts?

DSPy code is modular, readable, testable, and easier to maintain. It lets developers use established software engineering practices such as debugging, testing, code review, and monitoring. DSPy also makes it easier to swap language models, change prompting techniques, and build complex workflows from smaller components.

When is DSPy especially useful?

DSPy is especially useful for large, complex, or long-running LM applications, such as RAG systems, agents, chatbots, summarization tools, and applications with many LM calls. It is also valuable when prompt quality matters, when prompts need to be optimized across different models, or when using smaller, cheaper, or faster LMs effectively is important.

What workflow does the chapter recommend for building LLM applications with DSPy?

The recommended workflow is to first create a simple baseline version of the application, then evaluate that baseline, and finally optimize the application by letting DSPy discover stronger prompts for the LM calls. Once the optimized version performs satisfactorily, it can be put into production.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

Introductory offer
Save 50% for a limited time!

eBook

pdf, ePub, online

$47.99 $23.99

you save $24.00 (50%)
