Building Multi-Agent Systems with LangGraph: A Practical Guide
How I built a system where AI agents collaborate to solve complex problems - from architecture decisions to production deployment.
Single-agent systems hit a wall when tasks get complex. That's when you need multiple specialized agents working together. Here's how I built a multi-agent system using LangGraph that actually works in production.
Why Multi-Agent?
The pitch is simple: instead of one general-purpose agent trying to do everything, you have specialists that collaborate.
- Research Agent: Gathers information
- Analysis Agent: Processes and synthesizes
- Writer Agent: Produces final output
- Critic Agent: Reviews and improves
Each agent can have its own model, tools, and prompts optimized for its specific task.
LangGraph Fundamentals
LangGraph models agent workflows as graphs. Nodes are agents or tools, edges define the flow.
```python
from typing import List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, END

# Define the state that flows through the system
class AgentState(TypedDict):
    messages: List[BaseMessage]
    current_agent: str
    research_results: Optional[dict]
    draft: Optional[str]
    feedback: Optional[str]

# Create the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("researcher", research_agent)
workflow.add_node("analyzer", analysis_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("critic", critic_agent)
```
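In LangGraph, a node is just a callable that receives the state and returns a partial state update; the framework merges the returned keys back into the shared state. A minimal researcher node might look like this (the model call is stubbed out here to show the shape; the real agent would invoke an LLM with search tools):

```python
def research_agent(state: dict) -> dict:
    # Real version: call a model with search tools on the latest message.
    # Stubbed here to show the contract: take state, return only the
    # fields you changed.
    query = state["messages"][-1]
    return {
        "current_agent": "researcher",
        "research_results": {"query": query, "sources": []},
    }
```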
The Routing Challenge
The hard part isn't building agents - it's deciding which agent should run next. I've tried several approaches:
1. Fixed Routing
Simple but inflexible. Works for predictable workflows.
```python
workflow.add_edge("researcher", "analyzer")
workflow.add_edge("analyzer", "writer")
workflow.add_edge("writer", "critic")
```
2. Conditional Routing
More flexible. Decisions based on state.
```python
def route_after_critic(state: AgentState) -> str:
    if state["feedback"] == "approved":
        return END
    return "writer"  # Send back for revision

workflow.add_conditional_edges("critic", route_after_critic)
```
3. LLM-Based Routing
Most flexible but adds latency and cost.
```python
def supervisor_route(state: AgentState) -> str:
    # Use a fast model to decide the next agent
    decision = supervisor_llm.invoke(
        "Given the current state, which agent should act next? "
        "Options: researcher, analyzer, writer, critic, end"
    )
    return decision.content.strip().lower()
```
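One caveat with LLM routing: the model can reply with anything, so validate the answer before using it as an edge. A small guard (the helper names are my own, not a LangGraph API) keeps an off-menu reply from crashing the graph:

```python
VALID_ROUTES = {"researcher", "analyzer", "writer", "critic", "end"}

def parse_route(raw: str, default: str = "end") -> str:
    """Normalize an LLM routing reply; fall back if it's off-menu."""
    choice = raw.strip().lower().rstrip(".")
    return choice if choice in VALID_ROUTES else default
```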
Memory Management
Each agent needs context, but passing everything everywhere is expensive. Here's my approach:
```python
class SharedMemory:
    def __init__(self):
        self.facts = []      # Verified information
        self.drafts = []     # Version history
        self.decisions = []  # Key decisions made

    def summarize_for_agent(self, agent_type: str) -> str:
        # Each agent gets a summary relevant to its task
        if agent_type == "writer":
            return self._format_facts_and_latest_draft()
        elif agent_type == "critic":
            return self._format_latest_draft_with_requirements()
        # etc.
```
Error Handling and Recovery
Agents fail. Models hallucinate. APIs time out. Plan for it.
```python
import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential())
async def run_agent(agent, state, timeout=30):
    return await asyncio.wait_for(agent.invoke(state), timeout=timeout)

async def run_agent_safe(agent, state):
    try:
        return await run_agent(agent, state)
    except Exception as e:
        # Retries exhausted: log, alert, and fall back
        return create_fallback_response(agent.name, e)
```

Note the split: catching the exception inside the retried function would hide failures from the retry decorator, so the fallback lives one level up and only fires after all attempts are exhausted.
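The snippet above leans on create_fallback_response, which I haven't shown. A sketch of what such a helper could return (the shape here is my own; adapt it to your state schema):

```python
def create_fallback_response(agent_name: str, error: Exception) -> dict:
    # Return a degraded-but-valid state update so the graph keeps moving
    return {
        "current_agent": agent_name,
        "ok": False,
        "error": type(error).__name__,
        "feedback": f"{agent_name} failed; a human should review this step.",
    }
```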
The Production Architecture
Here's what runs in production:
```
┌─────────────────────────────────────────────┐
│                 Supervisor                  │
│         (Routes, monitors, retries)         │
└─────────────────────┬───────────────────────┘
                      │
     ┌────────────────┼────────────────┐
     │                │                │
     ▼                ▼                ▼
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Research │     │ Analysis │     │ Writing  │
│  Agent   │─────│  Agent   │─────│  Agent   │
└──────────┘     └──────────┘     └──────────┘
     │                │                │
     └────────────────┼────────────────┘
                      │
                      ▼
                ┌──────────┐
                │  Critic  │
                │  Agent   │
                └──────────┘
```
Performance Optimizations
Multi-agent systems can be slow. Here's how we got latency down:
- Parallel execution: Run independent agents concurrently
- Streaming: Start processing partial results
- Model selection: Use GPT-4 for complex reasoning, GPT-3.5-turbo for simple tasks
- Caching: Store intermediate results
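On the caching point: even a small in-process cache keyed on the agent name plus a hash of its input pays off on retries and re-runs. A minimal sketch (a production deployment would more likely use Redis with TTLs):

```python
import hashlib
import json

_cache: dict = {}

def cache_key(agent: str, payload: dict) -> str:
    """Stable key: agent name plus a hash of its canonicalized input."""
    blob = json.dumps(payload, sort_keys=True)
    return f"{agent}:{hashlib.sha256(blob.encode()).hexdigest()}"

def cached_run(agent: str, payload: dict, run_fn):
    """Invoke the agent only if this exact input hasn't been seen before."""
    key = cache_key(agent, payload)
    if key not in _cache:
        _cache[key] = run_fn(payload)
    return _cache[key]
```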
For the parallel-execution point, the research step fans out across independent sources:

```python
import asyncio

# Parallel research with multiple sources
async def parallel_research(query: str):
    tasks = [
        search_web(query),
        search_docs(query),
        search_database(query),
    ]
    results = await asyncio.gather(*tasks)
    return merge_results(results)
```
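merge_results does the boring-but-important part: the three sources overlap, so deduplicate while preserving order. A minimal version, assuming each source returns a list of hashable items:

```python
def merge_results(result_sets):
    """Flatten results from several sources, dropping duplicates in order."""
    seen = set()
    merged = []
    for results in result_sets:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```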
Lessons Learned
After six months in production:
- Start simple: Begin with 2-3 agents, add more only when needed
- Observability is critical: Log every agent decision and handoff
- Human-in-the-loop: For high-stakes decisions, add approval checkpoints
- Test the edges: Edge cases in routing cause the worst bugs
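On observability: the cheapest version that still helps is wrapping every node so each invocation and its latency get logged. A sketch (synchronous and deliberately minimal; the decorator name is my own):

```python
import functools
import logging
import time

logger = logging.getLogger("agents")

def logged_handoff(agent_name: str):
    """Wrap a node so every handoff to it and its latency are logged."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.perf_counter()
            logger.info("handoff -> %s", agent_name)
            result = fn(state)
            logger.info("%s done in %.2fs", agent_name,
                        time.perf_counter() - start)
            return result
        return wrapper
    return decorator
```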
What's Next
I'm exploring:
- Dynamic agent spawning based on task complexity
- Learning from feedback to improve routing
- Cost optimization with smarter model selection
The field is moving fast. What was cutting-edge six months ago is now table stakes.
Building something similar? I'd love to hear about your approach. Connect with me on LinkedIn.