
ReAct Agents Explained: A Step-by-Step Implementation Using LangGraph

  • Writer: Nagesh Singh Chauhan
  • Jan 4
  • 11 min read

A hands-on walkthrough of a hotel cancellation agent using ReAct and LangGraph. Learn how reasoning, tools, memory, and safety come together in a real production scenario.



Introduction


Large Language Models (LLMs) have rapidly evolved from passive text generators into systems capable of reasoning, planning, and interacting with external tools. However, a raw LLM on its own is fundamentally limited: it can think, but it cannot act; it can generate reasoning, but it lacks persistent state and structured control over execution.

This gap is where AI agents emerge—and among agentic patterns, ReAct (Reason + Act) has become one of the most influential.


ReAct agents interleave reasoning steps with actions taken in the external world, allowing models to think through problems, invoke tools when needed, observe results, and iteratively refine their approach. Rather than a single prompt-response interaction, ReAct transforms an LLM into a decision-making loop.


To build such agents reliably in production, we need more than prompt engineering—we need explicit orchestration, state management, and control flow. This is exactly what LangGraph provides.


This article first builds a deep understanding of ReAct and of why LangGraph is the right abstraction, then walks through building a ReAct agent completely from scratch.


What Is a ReAct Agent?


A ReAct agent operates on a simple but powerful loop:

Reason → Act → Observe → Repeat

Instead of forcing the LLM to answer everything directly, we allow it to:


  1. Reason about what it knows and what it needs

  2. Act by invoking tools (search, APIs, databases, calculators)

  3. Observe the result of those actions

  4. Refine its reasoning based on new information


This mirrors how humans solve complex problems.
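
Before introducing any framework, it helps to see the loop as plain code. The sketch below is illustrative only; call_llm and run_tool are hypothetical helpers standing in for a model call and a tool executor:

# Minimal ReAct loop sketch. `call_llm` and `run_tool` are hypothetical
# placeholders for a model call and a tool executor.
def react_loop(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(scratchpad)              # Reason: emit a Thought, maybe an Action
        scratchpad += step
        if "Final Answer:" in step:              # the model decided it is done
            return step.split("Final Answer:")[-1].strip()
        observation = run_tool(step)             # Act: execute the requested tool
        scratchpad += f"\nObservation: {observation}\n"  # Observe: feed the result back
    return "Stopped: step limit reached."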


Why ReAct Is Better Than Pure Chain-of-Thought


Traditional Chain-of-Thought (CoT) prompting improves reasoning but has a major limitation:


  • The model reasons only within its internal knowledge

  • It cannot validate assumptions or fetch new data


ReAct solves this by grounding reasoning in reality through tools.

Approach            Limitation
Direct Prompting    Hallucination, no verification
Chain-of-Thought    Reasoning without action
Tool-Only Agents    Reactive, no planning
ReAct               Reasoning + grounded action

This makes ReAct agents:


  • More accurate

  • More interpretable

  • More adaptable to real-world tasks



A ReAct agent’s internal dialogue looks like this:


Thought: I don’t know the answer yet. I should search.
Action: Search("Paris weather this week")
Observation: It will rain on Thursday.
Thought: I should suggest indoor activities.
Final Answer: ...

Each step is explicit, auditable, and controllable.


ReAct Prompting


ReAct prompting is a specialized prompting strategy designed to guide a large language model (LLM) to operate according to the ReAct paradigm—an iterative loop of reasoning (thought), tool usage (action), and context updating (observation). While it is not strictly mandatory to use classical ReAct prompting to build a ReAct agent, most ReAct-based systems either implement it directly or draw heavy inspiration from its structure.



Originally introduced in the research paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022), this prompting technique serves as the behavioral blueprint for how an LLM should think, act, and adapt while solving a task. Rather than producing a single answer in one step, the model is explicitly instructed to reason step by step, decide when to invoke tools, and incorporate the results of those actions into subsequent reasoning.


Purpose of ReAct Prompting


At its core, ReAct prompting aims to:


  • Enforce a structured reasoning loop

  • Explicitly define which tools (actions) the model can use

  • Teach the model when to stop reasoning and produce a final answer


This structure transforms an LLM from a passive text generator into an interactive problem-solving agent.


Key Elements of ReAct Prompting


A well-designed ReAct prompt typically ensures the following behaviors:


1. Chain-of-Thought Reasoning


The model is encouraged to reason explicitly and incrementally. Instead of jumping to conclusions, it “thinks out loud,” breaking complex tasks into manageable steps. These reasoning steps are interleaved with actions rather than isolated at the beginning.


2. Explicit Action Space


The prompt defines a fixed set of actions the model is allowed to take. These actions usually correspond to external tools such as:


  • Search engines

  • Knowledge bases

  • Calculators

  • APIs or databases


By constraining the action space, the model learns how and when to seek external information rather than hallucinating answers.


3. Observation Integration


After each action, the model is instructed to observe the result and reassess its context. These observations update the agent’s internal state and directly influence the next reasoning step.


This grounding step is critical—it ensures that reasoning is based on real outcomes, not assumptions.


4. Iterative Looping


The prompt explicitly allows (and often encourages) the agent to repeat the Thought → Action → Observation cycle multiple times. Termination can be governed by:


  • A maximum number of iterations

  • The agent’s own determination that it has enough information

  • An explicit “final answer” signal


This looping behavior is what enables ReAct agents to handle complex, multi-step tasks.


5. Final Answer Generation


Once the termination condition is met, the model is instructed to stop reasoning and present a concise, user-facing answer. Importantly, many ReAct prompts ask the model to perform its reasoning in a scratchpad, ensuring that intermediate thoughts do not leak into the final output unless desired.


Canonical ReAct Prompt Structure


A classic ReAct prompt follows a rigid but powerful format:

Question: <user question>

Thought: <reason about what to do next>
Action: <selected tool>
Action Input: <tool input>
Observation: <tool output>

... (repeat as needed)

Thought: I now know the final answer
Final Answer: <answer to the user>

This format teaches the model:


  • When to think

  • When to act

  • How to interpret results

  • When to conclude


[Figure: Workflow diagram of the ReAct-agent-based prompt optimization loop]


Zero-Shot ReAct Prompting


One of the most widely used demonstrations of ReAct prompting is the zero-shot ReAct system prompt, where the model is guided entirely by instructions—without any example demonstrations.


In this setup:


  • The available tools are declared upfront

  • The expected Thought/Action/Observation format is enforced

  • The model learns to behave as a ReAct agent purely from instructions


This approach proves that ReAct behavior is not dependent on few-shot examples, but rather on clear structural guidance.
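
As a rough illustration, a zero-shot ReAct system prompt might look like the following. The tool names here are placeholders, not part of any library:

ZERO_SHOT_REACT_PROMPT = """You solve tasks step by step.

You have access to the following tools:
- Search: look up current information. Input: a search query.
- Calculator: evaluate arithmetic expressions. Input: an expression.

Use this exact format:

Question: the input question
Thought: reason about what to do next
Action: the tool to use, one of [Search, Calculator]
Action Input: the input to the tool
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Begin!
"""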


ReAct Prompting vs ReAct Agents


It’s important to distinguish between the two:


  • ReAct prompting defines how the model should behave

  • ReAct agents define how the system executes that behavior


Prompting alone relies on the LLM to manage loops, termination, and safety. Frameworks like LangGraph externalize these concerns into explicit control flow and state management, making the system more robust and production-ready.


ReAct prompting is the cognitive instruction manual for ReAct agents. It teaches the model how to think and act, but it does not enforce how execution happens. For reliable, scalable agents, ReAct prompting is most powerful when paired with structured orchestration—where reasoning remains flexible, but control remains deterministic.


Why LangGraph for ReAct Agents?


[Figure: LangGraph workflow]


ReAct agents are not just prompt patterns—they are iterative, stateful decision systems. As soon as an agent reasons, acts, observes, and loops, it stops being a single LLM call and starts behaving like a program with memory, control flow, and side effects.


This is precisely where most naïve ReAct implementations fail.


The Core Problem: ReAct Is a State Machine, Not a Prompt


At a conceptual level, a ReAct agent is a finite (or semi-infinite) state machine:


  • The agent has state (messages, tool outputs, context)

  • It transitions between modes (reasoning, acting, observing)

  • It loops until a termination condition is met


However, many implementations attempt to encode this logic using:


  • While loops

  • Prompt heuristics

  • Regex-based parsing

  • Implicit control flow


These approaches work in demos but collapse under real-world complexity.


What Breaks Without LangGraph


Let’s examine what typically goes wrong when ReAct is implemented without a proper orchestration framework.


1. Implicit Control Flow


Most ReAct loops look like this:


# Naive ReAct loop: all control flow is hidden in string parsing.
# `llm` and `call_tool` are placeholders for a model call and a tool executor.
while True:
    llm_output = llm(prompt)
    if "Action:" in llm_output:
        tool_result = call_tool(...)  # note: the result is never fed back into the prompt
    else:
        break

Problems:


  • Control flow is hidden inside string parsing

  • Hard to reason about execution paths

  • No guarantees about termination

  • Debugging becomes guesswork


ReAct appears simple, but the logic is brittle.


2. Fragile State Management


Without a structured state:

  • Messages are concatenated blindly

  • Tool outputs get mixed with reasoning

  • Memory grows unbounded

  • Context windows overflow silently


You lose:


  • Reproducibility

  • Observability

  • Safety
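
Even a crude guard against unbounded growth, sketched below, is something ad-hoc implementations rarely include (and a real system should trim by token count, not message count):

def trim_history(messages: list, keep_last: int = 20) -> list:
    # Keep the system message, drop the oldest turns beyond a budget.
    system, rest = messages[:1], messages[1:]
    return system + rest[-keep_last:]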


3. No First-Class Loop Semantics


ReAct requires looping, but ad-hoc loops:


  • Can spin forever

  • Are difficult to interrupt

  • Cannot be conditionally branched cleanly

  • Cannot be inspected mid-execution


Agents become black boxes.
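
LangGraph counters this with a hard step budget: every invocation accepts a recursion_limit, and exceeding it raises GraphRecursionError instead of spinning forever. A sketch, assuming the compiled graph built later in this article:

from langchain_core.messages import HumanMessage
from langgraph.errors import GraphRecursionError

try:
    result = graph.invoke(
        {"messages": [HumanMessage(content="...")]},
        config={"recursion_limit": 10},  # hard cap on graph steps
    )
except GraphRecursionError:
    print("Agent exceeded its step budget; aborting safely.")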


4. Poor Production Readiness


In real systems, you need:


  • Step-level logging

  • Retry semantics

  • Human-in-the-loop checkpoints

  • Tool safety validation

  • Partial execution recovery


Prompt-only ReAct agents cannot support this reliably.
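
LangGraph supports these concerns directly. For instance, compiling the graph with a checkpointer and an interrupt pauses execution before every tool call for human review. A sketch, assuming the builder we construct later in this article (MemorySaver is in-memory and for demos only):

from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(
    checkpointer=MemorySaver(),   # persists state between invocations
    interrupt_before=["tools"],   # pause for human approval before any tool runs
)

config = {"configurable": {"thread_id": "session-1"}}  # each thread is resumable
graph.invoke({"messages": [HumanMessage(content="...")]}, config)  # runs until the interrupt
graph.invoke(None, config)  # passing None resumes from the checkpoint after approval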


Key Concepts in LangGraph (Agent-First View)


LangGraph is built on a fundamental realization:

An agent is not a chain—it is a graph.


1. State: The Agent’s Memory


The state is a structured object that persists across steps.


Typical ReAct state includes:


  • Conversation history

  • Tool outputs

  • Intermediate reasoning

  • Execution metadata


Unlike stateless LLM calls, LangGraph ensures memory continuity.
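
In code, the state is typically a TypedDict whose messages field carries a reducer: each node returns only its delta, and LangGraph merges it into the shared state. This is the same pattern our agent uses later:

from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages appends returned messages instead of overwriting the list
    messages: Annotated[list[BaseMessage], add_messages]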


2. Nodes: Cognitive & Operational Units


Each node does one thing well:


  • Reasoning node → calls the LLM

  • Action node → executes tools

  • Validation node → checks outputs

  • Memory node → updates long-term context


This separation improves:


  • Debuggability

  • Testability

  • Safety


3. Edges: Explicit Control Flow


Edges define:


  • Loops (ReAct cycles)

  • Conditional branching

  • Termination logic


This prevents runaway agents and enables deterministic behavior.
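
In practice, the whole ReAct cycle reduces to one conditional edge plus one ordinary edge: route to the tool node when the model emitted a tool call, terminate otherwise, and always feed tool results back to the agent. A sketch using the prebuilt tools_condition router, against a builder like the one we construct later:

from langgraph.graph import END
from langgraph.prebuilt import tools_condition

builder.add_conditional_edges("agent", tools_condition, {"tools": "tools", END: END})
builder.add_edge("tools", "agent")  # observations flow back into reasoning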


4. Deterministic Execution with Flexibility


LangGraph gives you:


  • Predictable, inspectable execution paths

  • Room for probabilistic LLM reasoning inside each node


This balance is critical for enterprise-grade agents.


ReAct + LangGraph: A Natural Fit

ReAct Requirement      LangGraph Capability
Iterative reasoning    Graph loops
Tool grounding         Tool nodes
State tracking         Shared state
Safety controls        Conditional edges
Observability          Step-level tracing

Together, they form a robust agentic architecture, not just a prompt pattern.


Use case: Hotel cancellation assistant (policy + refund calculation)


Problem Statement


Hotel cancellations involve complex rate-plan rules and time-based penalties, making refund calculations error-prone and inconsistent when handled manually. The goal of this use case is to build an AI-powered hotel cancellation assistant that accurately interprets cancellation policies, calculates refunds using external tools, and delivers transparent, auditable decisions through a structured reasoning and action loop.


Goal: A guest asks:

“I booked a Non-Refundable rate for Jan 20–22 at $120/night. I cancelled on Jan 18. How much refund do I get?”

A ReAct agent should:


  1. Reason what it needs (rate plan policy + refund math)

  2. Act by calling tools (get_policy, calculate_refund)

  3. Observe tool outputs

  4. Answer clearly


Step 1: Install dependencies

pip install -U langgraph langchain langchain-openai

Set your key:

export OPENAI_API_KEY="..."

Step 2: Define tools (your “Actions”)


These are the functions the agent can call.

from typing import TypedDict, Annotated
from datetime import datetime

from pydantic import BaseModel

from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    BaseMessage,
    HumanMessage,
    ToolMessage,
    SystemMessage
)
from langchain_core.tools import tool

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import tools_condition


@tool
def get_cancellation_policy(rate_plan: str) -> str:
    """
    Returns cancellation policy text for a given rate plan.
    """
    policies = {
        "flexible": "Free cancellation until 24 hours before check-in. After that, first night is charged.",
        "semi-flex": "Free cancellation until 72 hours before check-in. After that, 50% of the stay is charged.",
        "non-refundable": "No refund after booking. Full stay amount is charged on cancellation."
    }
    key = rate_plan.strip().lower()
    return policies.get(key, "Policy not found. Supported: flexible, semi-flex, non-refundable.")

@tool
def calculate_refund(
    rate_plan: str,
    check_in: str,
    cancel_date: str,
    nightly_rate: float,
    nights: int
) -> str:
    """
    Calculates refund amount based on a simplified policy model.
    Dates format: YYYY-MM-DD
    """
    rp = rate_plan.strip().lower()
    ci = datetime.strptime(check_in, "%Y-%m-%d").date()
    cd = datetime.strptime(cancel_date, "%Y-%m-%d").date()

    total = nightly_rate * nights
    days_before = (ci - cd).days

    if rp == "non-refundable":
        refund = 0.0
        charged = total
        rule = "Non-refundable: no refund."
    elif rp == "flexible":
        if days_before >= 1:
            refund = total
            charged = 0.0
            rule = "Flexible: cancelled >= 24h before check-in, full refund."
        else:
            charged = nightly_rate  # 1 night penalty
            refund = max(total - charged, 0.0)
            rule = "Flexible: late cancel, 1 night charged."
    elif rp == "semi-flex":
        if days_before >= 3:
            refund = total
            charged = 0.0
            rule = "Semi-flex: cancelled >= 72h before check-in, full refund."
        else:
            charged = 0.5 * total
            refund = total - charged
            rule = "Semi-flex: late cancel, 50% charged."
    else:
        return "Unsupported rate plan. Use: flexible, semi-flex, non-refundable."

    return (
        f"Rule: {rule}\n"
        f"Days before check-in: {days_before}\n"
        f"Total: ${total:.2f}\n"
        f"Charged: ${charged:.2f}\n"
        f"Refund: ${refund:.2f}"
    )
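
Since @tool-decorated functions are ordinary LangChain runnables, you can sanity-check them in isolation before wiring up any graph:

# Quick standalone check of the tools, with no LLM involved.
print(get_cancellation_policy.invoke({"rate_plan": "non-refundable"}))
print(calculate_refund.invoke({
    "rate_plan": "non-refundable",
    "check_in": "2026-01-20",
    "cancel_date": "2026-01-18",
    "nightly_rate": 120.0,
    "nights": 2,
}))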

Step 3: Build a ReAct loop in LangGraph (Reason → Tool → Reason)


LangGraph will:


  • Keep state (messages)

  • Decide if the model wants to call a tool

  • Route execution to tools

  • Loop until the model returns a final answer

from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import tools_condition

# 1) Define state
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    booking_id: str

# 2) Define the structured output schema for the final answer
class RefundDecision(BaseModel):
    booking_id: str
    rate_plan: str
    total_amount: float
    charged_amount: float
    refund_amount: float
    policy_summary: str
    explanation: str

# 3) Choose the model and system prompt
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

SYSTEM_PROMPT = SystemMessage(
    content="""
You are a hotel cancellation assistant.

Rules:
- Use tools when needed.
- Never guess policy or refund.
- Final answer MUST be valid JSON with this schema:

{
  "booking_id": "...",
  "rate_plan": "...",
  "total_amount": number,
  "charged_amount": number,
  "refund_amount": number,
  "policy_summary": "...",
  "explanation": "..."
}
"""
)

# 4) Register tools and an allowlist for safe execution
tools = [get_cancellation_policy, calculate_refund]
ALLOWED_TOOLS = {t.name for t in tools}

def safe_tool_node(state):
    """Execute every tool call from the last message, enforcing the allowlist."""
    last_msg = state["messages"][-1]

    if not getattr(last_msg, "tool_calls", None):
        return {}

    tool_messages = []
    for tool_call in last_msg.tool_calls:
        tool_name = tool_call["name"]

        if tool_name not in ALLOWED_TOOLS:
            tool_messages.append(
                ToolMessage(
                    content=f"Tool '{tool_name}' is not allowed.",
                    tool_call_id=tool_call["id"]
                )
            )
            continue

        for tool in tools:
            if tool.name == tool_name:
                tool_messages.append(
                    ToolMessage(
                        content=tool.invoke(tool_call["args"]),
                        tool_call_id=tool_call["id"]
                    )
                )
                break

    return {"messages": tool_messages}

# 5) Before reasoning, check whether we already processed this booking.
REFUND_MEMORY = {}

def memory_lookup_node(state):
    booking_id = state["booking_id"]
    if booking_id in REFUND_MEMORY:
        return {
            "messages": [
                HumanMessage(
                    content=f"Cached decision found:\n{REFUND_MEMORY[booking_id]}"
                )
            ]
        }
    return {}

# 6) Reasoning node: LLM decides the next action (tool call) or final answer
def agent_node(state: AgentState):
    # Bind tools so the model can produce tool calls
    llm_with_tools = llm.bind_tools(tools)
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

# 7) After the final decision, store it.
def memory_write_node(state):
    booking_id = state["booking_id"]
    final_answer = state["messages"][-1].content
    REFUND_MEMORY[booking_id] = final_answer
    return {}

Build and Compile the graph


# 8) Build the graph
builder = StateGraph(AgentState)

# Nodes
builder.add_node("memory_lookup", memory_lookup_node)
builder.add_node("agent", agent_node)
builder.add_node("tools", safe_tool_node)
builder.add_node("memory_write", memory_write_node)

# Flow
builder.add_edge(START, "memory_lookup")
builder.add_edge("memory_lookup", "agent")

builder.add_conditional_edges(
    "agent",
    tools_condition,
    {
        "tools": "tools",   # model wants to act
        END: "memory_write" # model finished reasoning
    }
)

builder.add_edge("tools", "agent")
builder.add_edge("memory_write", END)

graph = builder.compile()
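
Optionally, you can render the compiled topology to verify the wiring before running anything (draw_mermaid returns Mermaid markup):

# Expected shape: START -> memory_lookup -> agent <-> tools, agent -> memory_write -> END
print(graph.get_graph().draw_mermaid())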

Step 4: Run the agent on the use case

query = """
Booking details:
Rate plan: Non-Refundable
Check-in: 2026-01-20
Nights: 2
Nightly rate: 120
Cancelled on: 2026-01-18
"""

result = graph.invoke({
    "booking_id": "BKG-12345",
    "messages": [
        SYSTEM_PROMPT,
        HumanMessage(content=query)
    ]
})

final_output = result["messages"][-1].content
print(final_output)

Output


{
  "booking_id": "BKG-12345",
  "rate_plan": "Non-Refundable",
  "total_amount": 240.0,
  "charged_amount": 240.0,
  "refund_amount": 0.0,
  "policy_summary": "Non-refundable bookings do not allow refunds after confirmation.",
  "explanation": "The booking was made under a non-refundable rate plan, which charges the full stay amount regardless of cancellation timing."
}
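
Because the system prompt demands JSON, the final message can be validated against the RefundDecision schema defined earlier. A defensive sketch (models occasionally wrap JSON in a code fence, so we strip one if present):

raw = final_output.strip()
if raw.startswith("```"):
    # Remove an optional ```json ... ``` fence around the payload
    raw = raw.split("```")[1].removeprefix("json").strip()

decision = RefundDecision.model_validate_json(raw)  # raises if the schema is violated
print(decision.refund_amount)  # 0.0 for this non-refundable booking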


What happens internally (ReAct behavior)




1) Thought (Reason)


The model realizes:


  • It needs the policy for “non-refundable”

  • It should calculate refund using the tool


2) Action (Tool call)


It calls:


  • get_cancellation_policy(rate_plan="non-refundable")

  • calculate_refund(rate_plan="non-refundable", ...)


3) Observation (Tool outputs)


The tool returns:


  • The policy text

  • Refund breakdown (refund $0, charged full stay)


4) Final Answer


The model produces a human-friendly explanation:


  • One-line policy summary

  • Refund amount and rationale


Why this is “ReAct” (not just tools)


Because the model is not forced to call tools—it decides:


  • what it needs

  • which tool to use

  • when to stop


LangGraph makes this safe and structured using:


  • State (messages)

  • A guarded tool node (executes only allowlisted tools)

  • Conditional edges (tools_condition)

  • Loop (tools → agent)


Conclusion


Building reliable AI agents requires moving beyond prompt engineering into structured system design. In this blog, we demonstrated how the ReAct paradigm transforms large language models from passive text generators into active decision-makers—capable of reasoning, invoking tools, observing outcomes, and iterating toward correct answers. However, as the use case showed, ReAct alone is not enough; without explicit control flow, safety checks, and memory, agent behavior quickly becomes brittle and opaque.


LangGraph provides the missing execution layer for ReAct agents. By modeling agents as stateful graphs with deterministic transitions, we gained fine-grained control over reasoning loops, tool usage, termination conditions, and memory persistence. Enhancements such as tool allowlists, structured JSON outputs, and booking-level memory turned a conceptual agent into a production-ready system. With observability tooling such as LangSmith, every reasoning step, tool call, and state transition becomes traceable and auditable—critical for trust in real-world applications like refunds and policy enforcement.


Together, ReAct and LangGraph form a powerful foundation for building safe, explainable, and scalable AI agents. The hotel cancellation assistant is just one example, but the same architecture applies to pricing engines, customer support automation, compliance workflows, and autonomous business operations. As AI systems increasingly take action in the real world, designing agents as transparent, governed systems will be not just an advantage—but a necessity.
