
ReAct Agents Explained: A Step-by-Step Implementation Using LangGraph

  • Writer: Nagesh Singh Chauhan
  • Jan 4
  • 11 min read

A hands-on walkthrough of a hotel cancellation agent using ReAct and LangGraph. Learn how reasoning, tools, memory, and safety come together in a real production scenario.



Introduction


Large Language Models (LLMs) have rapidly evolved from passive text generators into systems capable of reasoning, planning, and interacting with external tools. However, a raw LLM on its own is fundamentally limited: it can think, but it cannot act; it can generate reasoning, but it lacks persistent state and structured control over execution.

This gap is where AI agents emerge—and among agentic patterns, ReAct (Reason + Act) has become one of the most influential.


ReAct agents interleave reasoning steps with actions taken in the external world, allowing models to think through problems, invoke tools when needed, observe results, and iteratively refine their approach. Rather than a single prompt-response interaction, ReAct transforms an LLM into a decision-making loop.


To build such agents reliably in production, we need more than prompt engineering—we need explicit orchestration, state management, and control flow. This is exactly what LangGraph provides.


This article first builds a deep understanding of ReAct and of why LangGraph is the right abstraction, then walks through building a ReAct agent completely from scratch.


What Is a ReAct Agent?


A ReAct agent operates on a simple but powerful loop:

Reason → Act → Observe → Repeat

Instead of forcing the LLM to answer everything directly, we allow it to:


  1. Reason about what it knows and what it needs

  2. Act by invoking tools (search, APIs, databases, calculators)

  3. Observe the result of those actions

  4. Refine its reasoning based on new information


This mirrors how humans solve complex problems.
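
Before introducing any framework, it helps to see the loop as plain code. The sketch below is illustrative only; call_llm and run_tool are hypothetical helpers standing in for a model call and a tool executor:

# Minimal ReAct loop sketch. `call_llm` and `run_tool` are hypothetical
# placeholders for a model call and a tool executor.
def react_loop(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(scratchpad)              # Reason: emit a Thought, maybe an Action
        scratchpad += step
        if "Final Answer:" in step:              # the model decided it is done
            return step.split("Final Answer:")[-1].strip()
        observation = run_tool(step)             # Act: execute the requested tool
        scratchpad += f"\nObservation: {observation}\n"  # Observe: feed the result back
    return "Stopped: step limit reached."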


Why ReAct Is Better Than Pure Chain-of-Thought


Traditional Chain-of-Thought (CoT) prompting improves reasoning but has a major limitation:


  • The model reasons only within its internal knowledge

  • It cannot validate assumptions or fetch new data


ReAct solves this by grounding reasoning in reality through tools.

Approach            Limitation
Direct Prompting    Hallucination, no verification
Chain-of-Thought    Reasoning without action
Tool-Only Agents    Reactive, no planning
ReAct               Reasoning + grounded action

This makes ReAct agents:


  • More accurate

  • More interpretable

  • More adaptable to real-world tasks



A ReAct agent’s internal dialogue looks like this:


Thought: I don’t know the answer yet. I should search.
Action: Search("Paris weather this week")
Observation: It will rain on Thursday.
Thought: I should suggest indoor activities.
Final Answer: ...

Each step is explicit, auditable, and controllable.


ReAct Prompting


ReAct prompting is a specialized prompting strategy designed to guide a large language model (LLM) to operate according to the ReAct paradigm—an iterative loop of reasoning (thought), tool usage (action), and context updating (observation). While it is not strictly mandatory to use classical ReAct prompting to build a ReAct agent, most ReAct-based systems either implement it directly or draw heavy inspiration from its structure.



Originally introduced in the research paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022), this prompting technique serves as the behavioral blueprint for how an LLM should think, act, and adapt while solving a task. Rather than producing a single answer in one step, the model is explicitly instructed to reason step by step, decide when to invoke tools, and incorporate the results of those actions into subsequent reasoning.


Purpose of ReAct Prompting


At its core, ReAct prompting aims to:


  • Enforce a structured reasoning loop

  • Explicitly define which tools (actions) the model can use

  • Teach the model when to stop reasoning and produce a final answer


This structure transforms an LLM from a passive text generator into an interactive problem-solving agent.


Key Elements of ReAct Prompting


A well-designed ReAct prompt typically ensures the following behaviors:


1. Chain-of-Thought Reasoning


The model is encouraged to reason explicitly and incrementally. Instead of jumping to conclusions, it “thinks out loud,” breaking complex tasks into manageable steps. These reasoning steps are interleaved with actions rather than isolated at the beginning.


2. Explicit Action Space


The prompt defines a fixed set of actions the model is allowed to take. These actions usually correspond to external tools such as:


  • Search engines

  • Knowledge bases

  • Calculators

  • APIs or databases


By constraining the action space, the model learns how and when to seek external information rather than hallucinating answers.


3. Observation Integration


After each action, the model is instructed to observe the result and reassess its context. These observations update the agent’s internal state and directly influence the next reasoning step.


This grounding step is critical—it ensures that reasoning is based on real outcomes, not assumptions.


4. Iterative Looping


The prompt explicitly allows (and often encourages) the agent to repeat the Thought → Action → Observation cycle multiple times. Termination can be governed by:


  • A maximum number of iterations

  • The agent’s own determination that it has enough information

  • An explicit “final answer” signal


This looping behavior is what enables ReAct agents to handle complex, multi-step tasks.


5. Final Answer Generation


Once the termination condition is met, the model is instructed to stop reasoning and present a concise, user-facing answer. Importantly, many ReAct prompts ask the model to perform its reasoning in a scratchpad, ensuring that intermediate thoughts do not leak into the final output unless desired.


Canonical ReAct Prompt Structure


A classic ReAct prompt follows a rigid but powerful format:

Question: <user question>

Thought: <reason about what to do next>
Action: <selected tool>
Action Input: <tool input>
Observation: <tool output>

... (repeat as needed)

Thought: I now know the final answer
Final Answer: <answer to the user>

This format teaches the model:


  • When to think

  • When to act

  • How to interpret results

  • When to conclude


[Figure: Workflow diagram of the ReAct-agent-based prompt optimization loop]


Zero-Shot ReAct Prompting


One of the most widely used demonstrations of ReAct prompting is the zero-shot ReAct system prompt, where the model is guided entirely by instructions—without any example demonstrations.


In this setup:


  • The available tools are declared upfront

  • The expected Thought/Action/Observation format is enforced

  • The model learns to behave as a ReAct agent purely from instructions


This approach proves that ReAct behavior is not dependent on few-shot examples, but rather on clear structural guidance.
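
As a rough illustration, a zero-shot ReAct system prompt might look like the following. The tool names here are placeholders, not part of any library:

ZERO_SHOT_REACT_PROMPT = """You solve tasks step by step.

You have access to the following tools:
- Search: look up current information. Input: a search query.
- Calculator: evaluate arithmetic expressions. Input: an expression.

Use this exact format:

Question: the input question
Thought: reason about what to do next
Action: the tool to use, one of [Search, Calculator]
Action Input: the input to the tool
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Begin!
"""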


ReAct Prompting vs ReAct Agents


It’s important to distinguish between the two:


  • ReAct prompting defines how the model should behave

  • ReAct agents define how the system executes that behavior


Prompting alone relies on the LLM to manage loops, termination, and safety. Frameworks like LangGraph externalize these concerns into explicit control flow and state management, making the system more robust and production-ready.


ReAct prompting is the cognitive instruction manual for ReAct agents. It teaches the model how to think and act, but it does not enforce how execution happens. For reliable, scalable agents, ReAct prompting is most powerful when paired with structured orchestration—where reasoning remains flexible, but control remains deterministic.


Why LangGraph for ReAct Agents?


[Figure: LangGraph workflow]


ReAct agents are not just prompt patterns—they are iterative, stateful decision systems. As soon as an agent reasons, acts, observes, and loops, it stops being a single LLM call and starts behaving like a program with memory, control flow, and side effects.


This is precisely where most naïve ReAct implementations fail.


The Core Problem: ReAct Is a State Machine, Not a Prompt


At a conceptual level, a ReAct agent is a finite (or semi-infinite) state machine:


  • The agent has state (messages, tool outputs, context)

  • It transitions between modes (reasoning, acting, observing)

  • It loops until a termination condition is met


However, many implementations attempt to encode this logic using:


  • While loops

  • Prompt heuristics

  • Regex-based parsing

  • Implicit control flow


These approaches work in demos but collapse under real-world complexity.


What Breaks Without LangGraph


Let’s examine what typically goes wrong when ReAct is implemented without a proper orchestration framework.


1. Implicit Control Flow


Most ReAct loops look like this:


# Naive ReAct loop: all control flow is hidden in string parsing.
# `llm` and `call_tool` are placeholders for a model call and a tool executor.
while True:
    llm_output = llm(prompt)
    if "Action:" in llm_output:
        tool_result = call_tool(...)  # note: the result is never fed back into the prompt
    else:
        break

Problems:


  • Control flow is hidden inside string parsing

  • Hard to reason about execution paths

  • No guarantees about termination

  • Debugging becomes guesswork


ReAct appears simple, but the logic is brittle.


2. Fragile State Management


Without a structured state:

  • Messages are concatenated blindly

  • Tool outputs get mixed with reasoning

  • Memory grows unbounded

  • Context windows overflow silently


You lose:


  • Reproducibility

  • Observability

  • Safety
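
Even a crude guard against unbounded growth, sketched below, is something ad-hoc implementations rarely include (and a real system should trim by token count, not message count):

def trim_history(messages: list, keep_last: int = 20) -> list:
    # Keep the system message, drop the oldest turns beyond a budget.
    system, rest = messages[:1], messages[1:]
    return system + rest[-keep_last:]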


3. No First-Class Loop Semantics


ReAct requires looping, but ad-hoc loops:


  • Can spin forever

  • Are difficult to interrupt

  • Cannot be conditionally branched cleanly

  • Cannot be inspected mid-execution


Agents become black boxes.
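
LangGraph counters this with a hard step budget: every invocation accepts a recursion_limit, and exceeding it raises GraphRecursionError instead of spinning forever. A sketch, assuming the compiled graph built later in this article:

from langchain_core.messages import HumanMessage
from langgraph.errors import GraphRecursionError

try:
    result = graph.invoke(
        {"messages": [HumanMessage(content="...")]},
        config={"recursion_limit": 10},  # hard cap on graph steps
    )
except GraphRecursionError:
    print("Agent exceeded its step budget; aborting safely.")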


4. Poor Production Readiness


In real systems, you need:


  • Step-level logging

  • Retry semantics

  • Human-in-the-loop checkpoints

  • Tool safety validation

  • Partial execution recovery


Prompt-only ReAct agents cannot support this reliably.
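
LangGraph supports these concerns directly. For instance, compiling the graph with a checkpointer and an interrupt pauses execution before every tool call for human review. A sketch, assuming the builder we construct later in this article (MemorySaver is in-memory and for demos only):

from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(
    checkpointer=MemorySaver(),   # persists state between invocations
    interrupt_before=["tools"],   # pause for human approval before any tool runs
)

config = {"configurable": {"thread_id": "session-1"}}  # each thread is resumable
graph.invoke({"messages": [HumanMessage(content="...")]}, config)  # runs until the interrupt
graph.invoke(None, config)  # passing None resumes from the checkpoint after approval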


Key Concepts in LangGraph (Agent-First View)


LangGraph is built on a fundamental realization:

An agent is not a chain—it is a graph.


1. State: The Agent’s Memory


The state is a structured object that persists across steps.


Typical ReAct state includes:


  • Conversation history

  • Tool outputs

  • Intermediate reasoning

  • Execution metadata


Unlike stateless LLM calls, LangGraph ensures memory continuity.
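
In code, the state is typically a TypedDict whose messages field carries a reducer: each node returns only its delta, and LangGraph merges it into the shared state. This is the same pattern our agent uses later:

from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages appends returned messages instead of overwriting the list
    messages: Annotated[list[BaseMessage], add_messages]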


2. Nodes: Cognitive & Operational Units


Each node does one thing well:


  • Reasoning node → calls the LLM

  • Action node → executes tools

  • Validation node → checks outputs

  • Memory node → updates long-term context


This separation improves:


  • Debuggability

  • Testability

  • Safety


3. Edges: Explicit Control Flow


Edges define:


  • Loops (ReAct cycles)

  • Conditional branching

  • Termination logic


This prevents runaway agents and enables deterministic behavior.
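
In practice, the whole ReAct cycle reduces to one conditional edge plus one ordinary edge: route to the tool node when the model emitted a tool call, terminate otherwise, and always feed tool results back to the agent. A sketch using the prebuilt tools_condition router, against a builder like the one we construct later:

from langgraph.graph import END
from langgraph.prebuilt import tools_condition

builder.add_conditional_edges("agent", tools_condition, {"tools": "tools", END: END})
builder.add_edge("tools", "agent")  # observations flow back into reasoning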


4. Deterministic Execution with Flexibility


LangGraph gives you:


  • Predictable, inspectable execution paths

  • Room for probabilistic LLM reasoning inside each node


This balance is critical for enterprise-grade agents.


ReAct + LangGraph: A Natural Fit

ReAct Requirement      LangGraph Capability
Iterative reasoning    Graph loops
Tool grounding         Tool nodes
State tracking         Shared state
Safety controls        Conditional edges
Observability          Step-level tracing

Together, they form a robust agentic architecture, not just a prompt pattern.


Use case: Hotel cancellation assistant (policy + refund calculation)


Problem Statement


Hotel cancellations involve complex rate-plan rules and time-based penalties, making refund calculations error-prone and inconsistent when handled manually. The goal of this use case is to build an AI-powered hotel cancellation assistant that accurately interprets cancellation policies, calculates refunds using external tools, and delivers transparent, auditable decisions through a structured reasoning and action loop.


Goal: A guest asks:

“I booked a Non-Refundable rate for Jan 20–22 at $120/night. I cancelled on Jan 18. How much refund do I get?”

A ReAct agent should:


  1. Reason what it needs (rate plan policy + refund math)

  2. Act by calling tools (get_policy, calculate_refund)

  3. Observe tool outputs

  4. Answer clearly


Step 1: Install dependencies

pip install -U langgraph langchain langchain-openai

Set your key:

export OPENAI_API_KEY="..."

Step 2: Define tools (your “Actions”)


These are the functions the agent can call.

from typing import TypedDict, Annotated
from datetime import datetime

from pydantic import BaseModel

from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    BaseMessage,
    HumanMessage,
    ToolMessage,
    SystemMessage
)
from langchain_core.tools import tool

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import tools_condition


@tool
def get_cancellation_policy(rate_plan: str) -> str:
    """
    Returns cancellation policy text for a given rate plan.
    """
    policies = {
        "flexible": "Free cancellation until 24 hours before check-in. After that, first night is charged.",
        "semi-flex": "Free cancellation until 72 hours before check-in. After that, 50% of the stay is charged.",
        "non-refundable": "No refund after booking. Full stay amount is charged on cancellation."
    }
    key = rate_plan.strip().lower()
    return policies.get(key, "Policy not found. Supported: flexible, semi-flex, non-refundable.")

@tool
def calculate_refund(
    rate_plan: str,
    check_in: str,
    cancel_date: str,
    nightly_rate: float,
    nights: int
) -> str:
    """
    Calculates refund amount based on a simplified policy model.
    Dates format: YYYY-MM-DD
    """
    rp = rate_plan.strip().lower()
    ci = datetime.strptime(check_in, "%Y-%m-%d").date()
    cd = datetime.strptime(cancel_date, "%Y-%m-%d").date()

    total = nightly_rate * nights
    days_before = (ci - cd).days

    if rp == "non-refundable":
        refund = 0.0
        charged = total
        rule = "Non-refundable: no refund."
    elif rp == "flexible":
        if days_before >= 1:
            refund = total
            charged = 0.0
            rule = "Flexible: cancelled >= 24h before check-in, full refund."
        else:
            charged = nightly_rate  # 1 night penalty
            refund = max(total - charged, 0.0)
            rule = "Flexible: late cancel, 1 night charged."
    elif rp == "semi-flex":
        if days_before >= 3:
            refund = total
            charged = 0.0
            rule = "Semi-flex: cancelled >= 72h before check-in, full refund."
        else:
            charged = 0.5 * total
            refund = total - charged
            rule = "Semi-flex: late cancel, 50% charged."
    else:
        return "Unsupported rate plan. Use: flexible, semi-flex, non-refundable."

    return (
        f"Rule: {rule}\n"
        f"Days before check-in: {days_before}\n"
        f"Total: ${total:.2f}\n"
        f"Charged: ${charged:.2f}\n"
        f"Refund: ${refund:.2f}"
    )
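
Since @tool-decorated functions are ordinary LangChain runnables, you can sanity-check them in isolation before wiring up any graph:

# Quick standalone check of the tools, with no LLM involved.
print(get_cancellation_policy.invoke({"rate_plan": "non-refundable"}))
print(calculate_refund.invoke({
    "rate_plan": "non-refundable",
    "check_in": "2026-01-20",
    "cancel_date": "2026-01-18",
    "nightly_rate": 120.0,
    "nights": 2,
}))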

Step 3: Build a ReAct loop in LangGraph (Reason → Tool → Reason)


LangGraph will:


  • Keep state (messages)

  • Decide if the model wants to call a tool

  • Route execution to tools

  • Loop until the model returns a final answer

from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import tools_condition

# 1) Define state
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    booking_id: str

# 2) Define the structured output schema for the final answer
class RefundDecision(BaseModel):
    booking_id: str
    rate_plan: str
    total_amount: float
    charged_amount: float
    refund_amount: float
    policy_summary: str
    explanation: str

# 3) Choose the model and system prompt
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

SYSTEM_PROMPT = SystemMessage(
    content="""
You are a hotel cancellation assistant.

Rules:
- Use tools when needed.
- Never guess policy or refund.
- Final answer MUST be valid JSON with this schema:

{
  "booking_id": "...",
  "rate_plan": "...",
  "total_amount": number,
  "charged_amount": number,
  "refund_amount": number,
  "policy_summary": "...",
  "explanation": "..."
}
"""
)

# 4) Register tools and an allowlist for safe execution
tools = [get_cancellation_policy, calculate_refund]
ALLOWED_TOOLS = {t.name for t in tools}

def safe_tool_node(state):
    """Execute every tool call from the last message, enforcing the allowlist."""
    last_msg = state["messages"][-1]

    if not getattr(last_msg, "tool_calls", None):
        return {}

    tool_messages = []
    for tool_call in last_msg.tool_calls:
        tool_name = tool_call["name"]

        if tool_name not in ALLOWED_TOOLS:
            tool_messages.append(
                ToolMessage(
                    content=f"Tool '{tool_name}' is not allowed.",
                    tool_call_id=tool_call["id"]
                )
            )
            continue

        for tool in tools:
            if tool.name == tool_name:
                tool_messages.append(
                    ToolMessage(
                        content=tool.invoke(tool_call["args"]),
                        tool_call_id=tool_call["id"]
                    )
                )
                break

    return {"messages": tool_messages}

# 5) Before reasoning, check whether we already processed this booking.
REFUND_MEMORY = {}

def memory_lookup_node(state):
    booking_id = state["booking_id"]
    if booking_id in REFUND_MEMORY:
        return {
            "messages": [
                HumanMessage(
                    content=f"Cached decision found:\n{REFUND_MEMORY[booking_id]}"
                )
            ]
        }
    return {}

# 6) Reasoning node: LLM decides the next action (tool call) or final answer
def agent_node(state: AgentState):
    # Bind tools so the model can produce tool calls
    llm_with_tools = llm.bind_tools(tools)
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

# 7) After the final decision, store it.
def memory_write_node(state):
    booking_id = state["booking_id"]
    final_answer = state["messages"][-1].content
    REFUND_MEMORY[booking_id] = final_answer
    return {}

Build and Compile the graph


# 8) Build the graph
builder = StateGraph(AgentState)

# Nodes
builder.add_node("memory_lookup", memory_lookup_node)
builder.add_node("agent", agent_node)
builder.add_node("tools", safe_tool_node)
builder.add_node("memory_write", memory_write_node)

# Flow
builder.add_edge(START, "memory_lookup")
builder.add_edge("memory_lookup", "agent")

builder.add_conditional_edges(
    "agent",
    tools_condition,
    {
        "tools": "tools",   # model wants to act
        END: "memory_write" # model finished reasoning
    }
)

builder.add_edge("tools", "agent")
builder.add_edge("memory_write", END)

graph = builder.compile()
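
Optionally, you can render the compiled topology to verify the wiring before running anything (draw_mermaid returns Mermaid markup):

# Expected shape: START -> memory_lookup -> agent <-> tools, agent -> memory_write -> END
print(graph.get_graph().draw_mermaid())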

Step 4: Run the agent on the use case

query = """
Booking details:
Rate plan: Non-Refundable
Check-in: 2026-01-20
Nights: 2
Nightly rate: 120
Cancelled on: 2026-01-18
"""

result = graph.invoke({
    "booking_id": "BKG-12345",
    "messages": [
        SYSTEM_PROMPT,
        HumanMessage(content=query)
    ]
})

final_output = result["messages"][-1].content
print(final_output)

Output


{
  "booking_id": "BKG-12345",
  "rate_plan": "Non-Refundable",
  "total_amount": 240.0,
  "charged_amount": 240.0,
  "refund_amount": 0.0,
  "policy_summary": "Non-refundable bookings do not allow refunds after confirmation.",
  "explanation": "The booking was made under a non-refundable rate plan, which charges the full stay amount regardless of cancellation timing."
}
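
Because the system prompt demands JSON, the final message can be validated against the RefundDecision schema defined earlier. A defensive sketch (models occasionally wrap JSON in a code fence, so we strip one if present):

raw = final_output.strip()
if raw.startswith("```"):
    # Remove an optional ```json ... ``` fence around the payload
    raw = raw.split("```")[1].removeprefix("json").strip()

decision = RefundDecision.model_validate_json(raw)  # raises if the schema is violated
print(decision.refund_amount)  # 0.0 for this non-refundable booking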


What happens internally (ReAct behavior)




1) Thought (Reason)


The model realizes:


  • It needs the policy for “non-refundable”

  • It should calculate refund using the tool


2) Action (Tool call)


It calls:


  • get_cancellation_policy(rate_plan="non-refundable")

  • calculate_refund(rate_plan="non-refundable", ...)


3) Observation (Tool outputs)


The tool returns:


  • The policy text

  • Refund breakdown (refund $0, charged full stay)


4) Final Answer


The model produces a human-friendly explanation:


  • One-line policy summary

  • Refund amount and rationale


Why this is “ReAct” (not just tools)


Because the model is not forced to call tools—it decides:


  • what it needs

  • which tool to use

  • when to stop


LangGraph makes this safe and structured using:


  • State (messages)

  • A guarded tool node (executes only allowlisted tools)

  • Conditional edges (tools_condition)

  • Loop (tools → agent)


Conclusion


Building reliable AI agents requires moving beyond prompt engineering into structured system design. In this blog, we demonstrated how the ReAct paradigm transforms large language models from passive text generators into active decision-makers—capable of reasoning, invoking tools, observing outcomes, and iterating toward correct answers. However, as the use case showed, ReAct alone is not enough; without explicit control flow, safety checks, and memory, agent behavior quickly becomes brittle and opaque.


LangGraph provides the missing execution layer for ReAct agents. By modeling agents as stateful graphs with deterministic transitions, we gained fine-grained control over reasoning loops, tool usage, termination conditions, and memory persistence. Enhancements such as tool allowlists, structured JSON outputs, and booking-level memory turned a conceptual agent into a production-ready system. With observability tooling such as LangSmith, every reasoning step, tool call, and state transition becomes traceable and auditable—critical for trust in real-world applications like refunds and policy enforcement.


Together, ReAct and LangGraph form a powerful foundation for building safe, explainable, and scalable AI agents. The hotel cancellation assistant is just one example, but the same architecture applies to pricing engines, customer support automation, compliance workflows, and autonomous business operations. As AI systems increasingly take action in the real world, designing agents as transparent, governed systems will be not just an advantage—but a necessity.
