As large language models (LLMs) evolve, so does the need for modular, optimizable, and reproducible pipelines. Enter DSPy — a framework from Stanford designed to declaratively build, compose, and optimize LLM pipelines. In this blog, we’ll walk through:
- What is DSPy?
- Core concepts: Module, Signature, Optimizer
- Building your first DSPy pipeline
- Using optimizers like BootstrapFewShot
- Full working example: question answering pipeline
What is DSPy?
DSPy (Declarative Self-improving Python) lets you express LLM applications in terms of what should be done (Signature) and how it should be done (Module). The optimizer then figures out the best prompt or few-shot examples for your task.
- Think of it as “scikit-learn for LLM pipelines”.
Installing DSPy
pip install dspy-ai
Core Concepts
1. Signature: Define Inputs and Outputs
A Signature defines the interface — the inputs the module receives and the outputs it must produce.
import dspy

class QA(dspy.Signature):
    """Answer questions using a passage."""

    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Answer to the question based on the context")
2. Modules: Chains, Programs, or Generators
Modules implement logic — LLMs, chains, or multi-step reasoning.
qa_module = dspy.Predict(QA) # Use an LLM to predict answer given context + question
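A Predict-style module renders the Signature's input fields into a prompt, calls the configured LM, and parses the completion back into the declared output field. The loop below is a rough pure-Python sketch of that contract, not DSPy's actual implementation; fake_lm is a stand-in for a real model backend:

```python
def predict(signature_fields, lm, **inputs):
    """Sketch of what a Predict-style module does: render the input
    fields into a prompt, call the LM, and map the completion onto
    the declared output field."""
    prompt = "\n".join(
        f"{name.capitalize()}: {inputs[name]}" for name in signature_fields["inputs"]
    )
    prompt += f"\n{signature_fields['output'].capitalize()}:"
    completion = lm(prompt)
    return {signature_fields["output"]: completion.strip()}

def fake_lm(prompt):
    # Stand-in for a real model call (GPT-4, a local model, etc.)
    return " Paris"

result = predict(
    {"inputs": ["context", "question"], "output": "answer"},
    fake_lm,
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(result["answer"])  # Paris
```

In real DSPy the prompt formatting, demos, and output parsing are handled for you; this only illustrates the division of labor between Signature (the interface) and Module (the execution).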
3. Optimizers: Self-improvement via Examples
DSPy can learn from examples using optimizers like BootstrapFewShot.
from dspy.teleprompt import BootstrapFewShot
trainset = [
    dspy.Example(
        context="Paris is the capital of France.",
        question="What is the capital of France?",
        answer="Paris",
    ).with_inputs("context", "question")
]
devset = [  # held-out example for evaluating the compiled module
    dspy.Example(
        context="Berlin is the capital of Germany.",
        question="What is the capital of Germany?",
        answer="Berlin",
    ).with_inputs("context", "question")
]
optimizer = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
optimized_qa_module = optimizer.compile(
    qa_module,
    trainset=trainset
)
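For intuition, answer_exact_match boils down to comparing the predicted answer against the gold answer after light normalization. A minimal plain-Python sketch of that behavior (DSPy's real metric lives in dspy.evaluate and handles more edge cases):

```python
def exact_match(gold_answer: str, predicted_answer: str) -> bool:
    """Return True when the prediction matches the gold answer,
    ignoring case and surrounding whitespace."""
    return gold_answer.strip().lower() == predicted_answer.strip().lower()

print(exact_match("Paris", " paris "))  # True
print(exact_match("Paris", "Lyon"))     # False
```

Any callable with this shape, taking a gold example and a prediction and returning a score, can serve as a metric, so you can swap in fuzzy matching or an LLM-as-judge later.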
Full Example: QA Pipeline with DSPy
Step 1: Set the backend LLM
import dspy

dspy.settings.configure(lm=dspy.OpenAI(model='gpt-4'))
You can also use local models, Hugging Face models, or Gemini by configuring one of DSPy's other language model clients the same way.
Step 2: Define signature
class AnswerQuestion(dspy.Signature):
    """Answer a question directly from a passage."""

    passage = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Direct answer to the question")
Step 3: Create a module and run
predictor = dspy.Predict(AnswerQuestion)

sample = predictor(
    passage="Mars is the fourth planet from the Sun.",
    question="Which planet is fourth from the Sun?"
)
print(sample.answer)
Step 4: Optimize it with examples
from dspy.teleprompt import BootstrapFewShot
trainset = [
    dspy.Example(
        passage="Mars is red and is the fourth planet.",
        question="Which planet is fourth?",
        answer="Mars",
    ).with_inputs("passage", "question"),
    dspy.Example(
        passage="Earth is the third planet from the Sun.",
        question="Which planet is third?",
        answer="Earth",
    ).with_inputs("passage", "question")
]
devset = [  # held-out example for evaluating the compiled module
    dspy.Example(
        passage="Jupiter is the fifth planet.",
        question="Which planet is fifth?",
        answer="Jupiter",
    ).with_inputs("passage", "question")
]
optimizer = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
optimized_module = optimizer.compile(
    predictor,
    trainset=trainset
)
# Try inference again
result = optimized_module(
    passage="Venus is the second planet from the Sun.",
    question="Which planet is second?"
)
print("Optimized Answer:", result.answer)
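Conceptually, BootstrapFewShot runs your unoptimized module over the training set, keeps the traces the metric accepts, and attaches them to the module as few-shot demonstrations. The toy loop below simulates only that selection step in plain Python; every name here is illustrative, not DSPy internals:

```python
def bootstrap_demos(module, trainset, metric):
    """Keep only the training examples the current module already
    answers correctly; these become the few-shot demos."""
    demos = []
    for example in trainset:
        prediction = module(example["passage"], example["question"])
        if metric(example["answer"], prediction):
            demos.append({**example, "prediction": prediction})
    return demos

def toy_module(passage, question):
    # Pretend LM that only gets the "fourth planet" question right.
    return "Mars" if "fourth" in question else "unknown"

def metric(gold, pred):
    return gold.strip().lower() == pred.strip().lower()

trainset = [
    {"passage": "Mars is red and is the fourth planet.",
     "question": "Which planet is fourth?", "answer": "Mars"},
    {"passage": "Earth is the third planet from the Sun.",
     "question": "Which planet is third?", "answer": "Earth"},
]

demos = bootstrap_demos(toy_module, trainset, metric)
print(len(demos))  # 1: only the Mars example survives as a demo
```

The compiled module then carries those surviving demos in its prompt, which is why it often answers better than the raw predictor without any weight updates.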
Why Use DSPy?
| Feature | Benefit |
| --- | --- |
| Declarative Interface | Write clean, intention-first code |
| Optimization | Automatically improve your prompts |
| Composability | Chain modules together for complex pipelines |
| Backend Agnostic | Use OpenAI, Hugging Face, Groq, Gemini, etc. |
Thinking about developing an AI-powered application? Jellyfish Technologies is here to help you build custom, scalable, and innovative solutions tailored to your business needs. Let’s connect and bring your AI vision to life.
