1 Introduction to prompt programming and DSPy
Building LLM applications often begins with simple prompts, but quickly becomes difficult because language models can respond very differently to small wording changes. Traditional prompt engineering relies on manually rewriting and testing prompts, which is slow, inconsistent, and hard to maintain. Prompt programming offers a higher-level alternative: instead of focusing on exact prompt wording, developers describe the task, inputs, expected outputs, and evaluation criteria in code.
DSPy, short for Declarative Self-improving Python, is presented as a framework for prompt programming. It lets developers build LM-based applications using modular Python components rather than hand-crafted prompts. DSPy can automatically generate, evaluate, and optimize prompts, making it easier to improve quality, reduce development time, switch between models, test different prompting strategies, and rerun optimization when requirements or models change.
The chapter emphasizes that DSPy is especially valuable for complex or long-running applications such as RAG systems, agents, chatbots, summarizers, classifiers, and workflows involving many LM calls. Its data-driven approach encourages developers to create baselines, evaluate performance rigorously, and then optimize prompts automatically. While casual one-off prompting may not require DSPy, the framework can help teams build more reliable, maintainable, cost-effective LLM applications, often enabling smaller or cheaper models to perform well through better prompts.
Layers of code when working with DSPy. We usually need to work only at the top level: the source code we create, which uses classes and functions provided by the DSPy layer below.
Prompt optimization of a customer service intent classifier. The optimization process generated three candidate prompts, each of which is evaluated. The best is then selected: in this case the second prompt, which has the highest score, 90%. DSPy also supports processes that modify and re-evaluate the prompts over several iterations.
An example control flow in which several LM calls are made. The specific set of LM calls executed and their specific content are determined by previous LM calls and calls to tools.
The three main stages of building an LM-based application with DSPy
Example creating a simple, baseline application in DSPy
Once we have a baseline application, or any other version of the application, we can evaluate it.
DSPy supports automatically optimizing the prompts used by an application.
Summary
- To get good results from an LM, it’s necessary to ensure that it’s given a good prompt. Manually creating and tuning prompts is often slow and ad hoc. It must also be repeated for each LM that’s considered.
- Prompt programming, which supports automatically creating, evaluating, and optimizing prompts, provides a more systematic alternative.
- DSPy is the state of the art in prompt programming. It allows for clean, understandable, and simple code that can be easily re-executed to test new LMs or new prompting techniques.
- Working with DSPy allows us to develop quickly, as much of the work interacting with LMs is now handled automatically by the framework.
- With DSPy, the recommended workflow is to first create a baseline application, then evaluate it, then optimize it. Optimization works by having another LM suggest many candidate prompts and by carefully evaluating each of these.
- Optimization may run over multiple iterations, using established optimization techniques to identify progressively stronger candidate prompts in each iteration.
- DSPy supports defining complex workflows. Where workflows contain multiple prompts, DSPy allows us to optimize the full set of prompts together, allowing us to create very effective applications.
FAQ
What is prompt programming?
Prompt programming is an approach to building LM-based applications where developers specify, in code, what they want the language model to do rather than manually writing and tuning exact prompt text. Developers define the inputs, expected outputs, and evaluation criteria, while a framework such as DSPy generates and optimizes the prompts automatically.
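To make the idea of declaring a task in code concrete, here is a minimal sketch in plain Python. It is not DSPy's API: the `TaskSpec` class and `render_prompt` function are hypothetical names invented for this illustration, and the prompt layout is an assumption. The point is only that the developer states the instructions, inputs, and outputs, and the prompt text is produced by code.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical declarative description of an LM task (not DSPy's API)."""
    instructions: str
    inputs: list[str]
    outputs: list[str]


def render_prompt(spec: TaskSpec, values: dict[str, str]) -> str:
    """Turn the declared task plus concrete input values into prompt text."""
    missing = [name for name in spec.inputs if name not in values]
    if missing:
        raise ValueError(f"missing input fields: {missing}")
    lines = [spec.instructions, ""]
    for name in spec.inputs:
        lines.append(f"{name}: {values[name]}")
    lines.append("")
    lines.append("Respond with these fields: " + ", ".join(spec.outputs))
    return "\n".join(lines)


# The developer declares the task once; the wording of the prompt lives in code.
spec = TaskSpec(
    instructions="Classify the intent of a customer service message.",
    inputs=["message"],
    outputs=["intent"],
)
prompt = render_prompt(spec, {"message": "Where is my refund?"})
print(prompt)
```

Because the prompt is generated from the specification, a framework is free to rewrite the instructions or the layout without the application code changing.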
How is prompt programming different from prompt engineering?
Prompt engineering is the manual process of repeatedly rewriting prompts to improve LM responses. Prompt programming works at a higher level: developers describe the task in code, and tools like DSPy automatically generate, test, and optimize candidate prompts. This makes the process more systematic, repeatable, and easier to maintain.
What does DSPy stand for?
DSPy stands for Declarative Self-improving Python. “Declarative” means developers declare what they want the LM to do instead of writing the exact prompt. “Self-improving” refers to DSPy’s ability to optimize prompts automatically. “Python” reflects that DSPy is developed and used in Python.
Why can prompt engineering be difficult and time-consuming?
Language models are sensitive to small changes in prompt wording, so different prompts can produce very different results. Developers often have to test many variations manually, and the process can become ad hoc, slow, and hard to track. Prompts may also grow long and messy, making it difficult to know why each instruction was added or how to improve it later.
What are the main goals of prompt programming tools like DSPy?
The main goals include improving prompt quality, reducing development time, making it easier to switch between language models, supporting experimentation with different prompting techniques, and allowing prompt development code to be re-executed whenever needed.
How does DSPy optimize prompts?
DSPy can generate many candidate prompts, evaluate each one against a defined dataset and evaluation function, and select the best-performing prompt. It can also refine prompts over multiple iterations using established optimization approaches such as hill climbing, genetic algorithms, and Bayesian optimization.
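The iterative refinement described above can be illustrated with a toy hill-climbing loop. This is a sketch under heavy assumptions, not how DSPy's optimizers are implemented: the fragment pool, the `toy_score` function, and its hard-coded gains are all invented for illustration, and a real score would come from running an LM on a dataset.

```python
from typing import Callable

# Hypothetical pool of instruction fragments an optimizer might try adding.
FRAGMENTS = [
    "Answer with a single label.",
    "Think step by step.",
    "Labels: refund, shipping, account.",
    "Write in a cheerful tone.",
]

Prompt = tuple[str, ...]


def toy_score(prompt: Prompt) -> float:
    """Stand-in for evaluation on a dataset (assumed gains, not real results)."""
    gains = {
        FRAGMENTS[0]: 0.2,
        FRAGMENTS[1]: 0.05,
        FRAGMENTS[2]: 0.3,
        FRAGMENTS[3]: -0.1,
    }
    return round(0.4 + sum(gains[f] for f in prompt), 2)


def hill_climb(score: Callable[[Prompt], float], max_iters: int = 10):
    """Greedy hill climbing: each iteration keeps the best one-fragment addition."""
    current: Prompt = ()
    current_score = score(current)
    for _ in range(max_iters):
        neighbors = [current + (f,) for f in FRAGMENTS if f not in current]
        if not neighbors:
            break
        best = max(neighbors, key=score)
        if score(best) <= current_score:
            break  # no neighbor improves the score: local optimum reached
        current, current_score = best, score(best)
    return current, current_score


best_prompt, best_score = hill_climb(toy_score)
print(best_score)
print("\n".join(best_prompt))
```

In this toy setup the loop adds the three helpful fragments one at a time and stops before adding the fragment that would lower the score.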
Why is evaluation important in DSPy?
Evaluation is central because it determines which prompts actually perform best. Since LMs are stochastic, the same prompt can produce different results on different runs, and one prompt may work better for some inputs than others. DSPy encourages a data-driven process where prompts are tested consistently against examples and judged with a defined evaluation function.
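A small sketch of such an evaluation, assuming a labeled dataset and an exact-match evaluation function. The stub LM, its keyword table, and its `error_rate` parameter are hypothetical stand-ins for a real, stochastic model; averaging over repeated runs is one simple way to smooth out run-to-run variation.

```python
import random
from statistics import mean

# Tiny hypothetical labeled dataset: (input message, expected intent).
DATASET = [
    ("Where is my refund?", "refund"),
    ("My parcel has not arrived.", "shipping"),
    ("I cannot log in.", "account"),
    ("Please send my money back.", "refund"),
]

KEYWORDS = {"refund": "refund", "money": "refund", "parcel": "shipping", "log in": "account"}


def exact_match(predicted: str, expected: str) -> float:
    """Evaluation function: 1.0 for a correct label, otherwise 0.0."""
    return float(predicted.strip().lower() == expected)


def make_stub_lm(error_rate: float, seed: int = 0):
    """Build a stand-in for a stochastic LM that fails with probability error_rate."""
    rng = random.Random(seed)

    def lm(message: str) -> str:
        if rng.random() < error_rate:
            return "unknown"
        for keyword, label in KEYWORDS.items():
            if keyword in message.lower():
                return label
        return "unknown"

    return lm


def evaluate(lm, dataset, metric, runs: int = 5) -> float:
    """Average the metric over every example and over several repeated runs."""
    per_run = [mean(metric(lm(x), y) for x, y in dataset) for _ in range(runs)]
    return mean(per_run)


reliable = evaluate(make_stub_lm(error_rate=0.0), DATASET, exact_match)
noisy = evaluate(make_stub_lm(error_rate=0.3), DATASET, exact_match)
print(f"reliable={reliable:.2f} noisy={noisy:.2f}")
```

Using the same dataset, metric, and number of runs for every prompt keeps the comparison between prompts consistent.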
What are the advantages of working with DSPy code instead of raw prompts?
DSPy code is modular, readable, testable, and easier to maintain. It lets developers use established software engineering practices such as debugging, testing, code review, and monitoring. DSPy also makes it easier to swap language models, change prompting techniques, and build complex workflows from smaller components.
When is DSPy especially useful?
DSPy is especially useful for large, complex, or long-running LM applications, such as RAG systems, agents, chatbots, summarization tools, and applications with many LM calls. It is also valuable when prompt quality matters, when prompts need to be optimized across different models, or when using smaller, cheaper, or faster LMs effectively is important.
What workflow does the chapter recommend for building LLM applications with DSPy?
The recommended workflow is to first create a simple baseline version of the application, then evaluate that baseline, and finally optimize the application by letting DSPy discover stronger prompts for the LM calls. Once the optimized version performs satisfactorily, it can be put into production.
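The three stages can be sketched end to end in plain Python. Everything here is an assumption made for illustration: `stub_lm` stands in for a real model call, the candidate prompts are hand-written rather than generated by an optimizer, and none of the names come from DSPy.

```python
# Small hypothetical development set: (message, expected intent).
DEVSET = [
    ("Where is my refund?", "refund"),
    ("My parcel has not arrived.", "shipping"),
    ("I cannot log in.", "account"),
]


def stub_lm(prompt: str, message: str) -> str:
    """Stand-in for an LM call; a real application would call a model here.

    It answers with a clean label only when the prompt lists the labels.
    """
    text = message.lower()
    if "refund" in text:
        label = "refund"
    elif "parcel" in text:
        label = "shipping"
    else:
        label = "account"
    return label if "Labels:" in prompt else f"This looks like a {label} question."


def evaluate(prompt: str) -> float:
    """Stage 2: fraction of examples answered with exactly the expected label."""
    hits = sum(stub_lm(prompt, msg) == expected for msg, expected in DEVSET)
    return hits / len(DEVSET)


# Stage 1: a simple baseline prompt.
baseline = "Classify the customer message."
baseline_score = evaluate(baseline)

# Stage 3: score candidate prompts and pick the best one.
candidates = [
    baseline + " Be concise.",
    baseline + " Labels: refund, shipping, account. Reply with one label.",
]
best = max(candidates, key=evaluate)

# Keep the optimized prompt only if it actually beats the baseline.
chosen = best if evaluate(best) > baseline_score else baseline
print(baseline_score, evaluate(chosen))
```

Recording the baseline score first gives the optimized version something concrete to beat before it goes to production.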