Building Multi-Agent Systems with LangGraph: A Practical Guide
How I built a system where AI agents collaborate to solve complex problems - from architecture decisions to production deployment.
Single-agent systems hit a wall when tasks get complex. That's when you need multiple specialized agents working together. Here's how I built a multi-agent system using LangGraph that actually works in production.
Why Multi-Agent?
The pitch is simple: instead of one general-purpose agent trying to do everything, you have specialists that collaborate.
- Research Agent: Gathers information
- Analysis Agent: Processes and synthesizes
- Writer Agent: Produces final output
- Critic Agent: Reviews and improves
Each agent can have its own model, tools, and prompts optimized for its specific task.
LangGraph Fundamentals
LangGraph models agent workflows as graphs. Nodes are agents or tools, edges define the flow.
```python
from typing import List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, END

# Define the state that flows through the system
class AgentState(TypedDict):
    messages: List[BaseMessage]
    current_agent: str
    research_results: Optional[dict]
    draft: Optional[str]
    feedback: Optional[str]

# Create the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("researcher", research_agent)
workflow.add_node("analyzer", analysis_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("critic", critic_agent)
```
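In LangGraph, a node is just a callable that receives the state and returns a partial state update; the framework merges the returned keys back into the shared state. A minimal researcher node might look like this (the model call is stubbed out here to show the shape; the real agent would invoke an LLM with search tools):

```python
def research_agent(state: dict) -> dict:
    # Real version: call a model with search tools on the latest message.
    # Stubbed here to show the contract: take state, return only the
    # fields you changed.
    query = state["messages"][-1]
    return {
        "current_agent": "researcher",
        "research_results": {"query": query, "sources": []},
    }
```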
The Routing Challenge
The hard part isn't building agents - it's deciding which agent should run next. I've tried several approaches:
1. Fixed Routing
Simple but inflexible. Works for predictable workflows.
```python
workflow.add_edge("researcher", "analyzer")
workflow.add_edge("analyzer", "writer")
workflow.add_edge("writer", "critic")
```
2. Conditional Routing
More flexible. Decisions based on state.
```python
def route_after_critic(state: AgentState) -> str:
    if state["feedback"] == "approved":
        return END
    return "writer"  # Send back for revision

workflow.add_conditional_edges("critic", route_after_critic)
```
3. LLM-Based Routing
Most flexible but adds latency and cost.
```python
def supervisor_route(state: AgentState) -> str:
    # Use a fast model to decide the next agent
    decision = supervisor_llm.invoke(
        "Given the current state, which agent should act next? "
        "Options: researcher, analyzer, writer, critic, end"
    )
    return decision.content.strip().lower()
```
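One caveat with LLM routing: the model can reply with anything, so validate the answer before using it as an edge. A small guard (the helper names are my own, not a LangGraph API) keeps an off-menu reply from crashing the graph:

```python
VALID_ROUTES = {"researcher", "analyzer", "writer", "critic", "end"}

def parse_route(raw: str, default: str = "end") -> str:
    """Normalize an LLM routing reply; fall back if it's off-menu."""
    choice = raw.strip().lower().rstrip(".")
    return choice if choice in VALID_ROUTES else default
```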
Memory Management
Each agent needs context, but passing everything everywhere is expensive. Here's my approach:
```python
class SharedMemory:
    def __init__(self):
        self.facts = []      # Verified information
        self.drafts = []     # Version history
        self.decisions = []  # Key decisions made

    def summarize_for_agent(self, agent_type: str) -> str:
        # Each agent gets a summary relevant to its task
        if agent_type == "writer":
            return self._format_facts_and_latest_draft()
        elif agent_type == "critic":
            return self._format_latest_draft_with_requirements()
        # etc.
```
Error Handling and Recovery
Agents fail. Models hallucinate. APIs time out. Plan for it.
```python
import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential())
async def run_agent(agent, state, timeout=30):
    return await asyncio.wait_for(agent.invoke(state), timeout=timeout)

async def run_agent_safe(agent, state):
    try:
        return await run_agent(agent, state)
    except Exception as e:
        # Retries exhausted: log, alert, and fall back
        return create_fallback_response(agent.name, e)
```

Note the split: catching the exception inside the retried function would hide failures from the retry decorator, so the fallback lives one level up and only fires after all attempts are exhausted.
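The snippet above leans on create_fallback_response, which I haven't shown. A sketch of what such a helper could return (the shape here is my own; adapt it to your state schema):

```python
def create_fallback_response(agent_name: str, error: Exception) -> dict:
    # Return a degraded-but-valid state update so the graph keeps moving
    return {
        "current_agent": agent_name,
        "ok": False,
        "error": type(error).__name__,
        "feedback": f"{agent_name} failed; a human should review this step.",
    }
```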
The Production Architecture
Here's what runs in production:
```
┌─────────────────────────────────────────────┐
│                 Supervisor                  │
│         (Routes, monitors, retries)         │
└─────────────────────┬───────────────────────┘
                      │
     ┌────────────────┼────────────────┐
     │                │                │
     ▼                ▼                ▼
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Research │     │ Analysis │     │ Writing  │
│  Agent   │─────│  Agent   │─────│  Agent   │
└──────────┘     └──────────┘     └──────────┘
     │                │                │
     └────────────────┼────────────────┘
                      │
                      ▼
                ┌──────────┐
                │  Critic  │
                │  Agent   │
                └──────────┘
```
Performance Optimizations
Multi-agent systems can be slow. Here's how we got latency down:
- Parallel execution: Run independent agents concurrently
- Streaming: Start processing partial results
- Model selection: Use GPT-4 for complex reasoning, GPT-3.5-turbo for simple tasks
- Caching: Store intermediate results
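On the caching point: even a small in-process cache keyed on the agent name plus a hash of its input pays off on retries and re-runs. A minimal sketch (a production deployment would more likely use Redis with TTLs):

```python
import hashlib
import json

_cache: dict = {}

def cache_key(agent: str, payload: dict) -> str:
    """Stable key: agent name plus a hash of its canonicalized input."""
    blob = json.dumps(payload, sort_keys=True)
    return f"{agent}:{hashlib.sha256(blob.encode()).hexdigest()}"

def cached_run(agent: str, payload: dict, run_fn):
    """Invoke the agent only if this exact input hasn't been seen before."""
    key = cache_key(agent, payload)
    if key not in _cache:
        _cache[key] = run_fn(payload)
    return _cache[key]
```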
For the parallel-execution point, the research step fans out across independent sources:

```python
import asyncio

# Parallel research with multiple sources
async def parallel_research(query: str):
    tasks = [
        search_web(query),
        search_docs(query),
        search_database(query),
    ]
    results = await asyncio.gather(*tasks)
    return merge_results(results)
```
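merge_results does the boring-but-important part: the three sources overlap, so deduplicate while preserving order. A minimal version, assuming each source returns a list of hashable items:

```python
def merge_results(result_sets):
    """Flatten results from several sources, dropping duplicates in order."""
    seen = set()
    merged = []
    for results in result_sets:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```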
Lessons Learned
After six months in production:
- Start simple: Begin with 2-3 agents, add more only when needed
- Observability is critical: Log every agent decision and handoff
- Human-in-the-loop: For high-stakes decisions, add approval checkpoints
- Test the edges: Edge cases in routing cause the worst bugs
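On observability: the cheapest version that still helps is wrapping every node so each invocation and its latency get logged. A sketch (synchronous and deliberately minimal; the decorator name is my own):

```python
import functools
import logging
import time

logger = logging.getLogger("agents")

def logged_handoff(agent_name: str):
    """Wrap a node so every handoff to it and its latency are logged."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.perf_counter()
            logger.info("handoff -> %s", agent_name)
            result = fn(state)
            logger.info("%s done in %.2fs", agent_name,
                        time.perf_counter() - start)
            return result
        return wrapper
    return decorator
```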
What's Next
I'm exploring:
- Dynamic agent spawning based on task complexity
- Learning from feedback to improve routing
- Cost optimization with smarter model selection
The field is moving fast. What was cutting-edge six months ago is now table stakes.
Building something similar? I'd love to hear about your approach. Connect with me on LinkedIn.