ReAct Agents Explained: A Step-by-Step Implementation Using LangGraph
- Nagesh Singh Chauhan
- Jan 4
- 11 min read
A hands-on walkthrough of a hotel cancellation agent using ReAct and LangGraph. Learn how reasoning, tools, memory, and safety come together in a real production scenario.

Introduction
Large Language Models (LLMs) have rapidly evolved from passive text generators into systems capable of reasoning, planning, and interacting with external tools. However, a raw LLM on its own is fundamentally limited: it can think, but it cannot act; it can generate reasoning, but it lacks persistent state and structured control over execution.
This gap is where AI agents emerge—and among agentic patterns, ReAct (Reason + Act) has become one of the most influential.
ReAct agents interleave reasoning steps with actions taken in the external world, allowing models to think through problems, invoke tools when needed, observe results, and iteratively refine their approach. Rather than a single prompt-response interaction, ReAct transforms an LLM into a decision-making loop.
To build such agents reliably in production, we need more than prompt engineering—we need explicit orchestration, state management, and control flow. This is exactly what LangGraph provides.
This article focuses on understanding ReAct deeply and on why LangGraph is the right abstraction, before building a ReAct agent completely from scratch.
What Is a ReAct Agent?
A ReAct agent operates on a simple but powerful loop:
Reason → Act → Observe → Repeat

Instead of forcing the LLM to answer everything directly, we allow it to:
Reason about what it knows and what it needs
Act by invoking tools (search, APIs, databases, calculators)
Observe the result of those actions
Refine its reasoning based on new information
This mirrors how humans solve complex problems.
Why ReAct Is Better Than Pure Chain-of-Thought
Traditional Chain-of-Thought (CoT) prompting improves reasoning but has a major limitation:
The model reasons only within its internal knowledge
It cannot validate assumptions or fetch new data
ReAct solves this by grounding reasoning in reality through tools.
| Approach | Limitation |
| --- | --- |
| Direct Prompting | Hallucination, no verification |
| Chain-of-Thought | Reasoning without action |
| Tool-Only Agents | Reactive, no planning |
| ReAct | Reasoning + grounded action |
This makes ReAct agents:
More accurate
More interpretable
More adaptable to real-world tasks

A ReAct agent’s internal dialogue looks like this:
Thought: I don’t know the answer yet. I should search.
Action: Search("Paris weather this week")
Observation: It will rain on Thursday.
Thought: I should suggest indoor activities.
Final Answer: ...
Each step is explicit, auditable, and controllable.
ReAct Prompting
ReAct prompting is a specialized prompting strategy designed to guide a large language model (LLM) to operate according to the ReAct paradigm—an iterative loop of reasoning (thought), tool usage (action), and context updating (observation). While it is not strictly mandatory to use classical ReAct prompting to build a ReAct agent, most ReAct-based systems either implement it directly or draw heavy inspiration from its structure.

Originally introduced in the ReAct research paper, this prompting technique serves as the behavioral blueprint for how an LLM should think, act, and adapt while solving a task. Rather than producing a single answer in one step, the model is explicitly instructed to reason step by step, decide when to invoke tools, and incorporate the results of those actions into subsequent reasoning.
Purpose of ReAct Prompting
At its core, ReAct prompting aims to:
Enforce a structured reasoning loop
Explicitly define which tools (actions) the model can use
Teach the model when to stop reasoning and produce a final answer
This structure transforms an LLM from a passive text generator into an interactive problem-solving agent.
Key Elements of ReAct Prompting
A well-designed ReAct prompt typically ensures the following behaviors:
1. Chain-of-Thought Reasoning
The model is encouraged to reason explicitly and incrementally. Instead of jumping to conclusions, it “thinks out loud,” breaking complex tasks into manageable steps. These reasoning steps are interleaved with actions rather than isolated at the beginning.
2. Explicit Action Space
The prompt defines a fixed set of actions the model is allowed to take. These actions usually correspond to external tools such as:
Search engines
Knowledge bases
Calculators
APIs or databases
By constraining the action space, the model learns how and when to seek external information rather than hallucinating answers.
3. Observation Integration
After each action, the model is instructed to observe the result and reassess its context. These observations update the agent’s internal state and directly influence the next reasoning step.
This grounding step is critical—it ensures that reasoning is based on real outcomes, not assumptions.
4. Iterative Looping
The prompt explicitly allows (and often encourages) the agent to repeat the Thought → Action → Observation cycle multiple times. Termination can be governed by:
A maximum number of iterations
The agent’s own determination that it has enough information
An explicit “final answer” signal
This looping behavior is what enables ReAct agents to handle complex, multi-step tasks.
5. Final Answer Generation
Once the termination condition is met, the model is instructed to stop reasoning and present a concise, user-facing answer. Importantly, many ReAct prompts ask the model to perform its reasoning in a scratchpad, ensuring that intermediate thoughts do not leak into the final output unless desired.
Canonical ReAct Prompt Structure
A classic ReAct prompt follows a rigid but powerful format:
Question: <user question>
Thought: <reason about what to do next>
Action: <selected tool>
Action Input: <tool input>
Observation: <tool output>
... (repeat as needed)
Thought: I now know the final answer
Final Answer: <answer to the user>
This format teaches the model:
When to think
When to act
How to interpret results
When to conclude

Workflow diagram of the ReAct-agent-based prompt optimization loop for LLM adaptation. Image Credits
Zero-Shot ReAct Prompting
One of the most widely used demonstrations of ReAct prompting is the zero-shot ReAct system prompt, where the model is guided entirely by instructions—without any example demonstrations.
In this setup:
The available tools are declared upfront
The expected Thought/Action/Observation format is enforced
The model learns to behave as a ReAct agent purely from instructions
This approach proves that ReAct behavior is not dependent on few-shot examples, but rather on clear structural guidance.
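For illustration, a simplified zero-shot ReAct system prompt might look like the following. This is loosely modeled on widely used zero-shot ReAct templates rather than the exact wording from the paper, and the tool names are placeholders:
Answer the following question as well as you can. You have access to the following tools:
Search: look up current information. Input: a search query.
Calculator: evaluate arithmetic. Input: a math expression.
Use the following format:
Question: the input question
Thought: reason about what to do next
Action: the tool to use, one of [Search, Calculator]
Action Input: the input to the tool
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question
Begin!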
ReAct Prompting vs ReAct Agents
It’s important to distinguish between the two:
ReAct prompting defines how the model should behave
ReAct agents define how the system executes that behavior
Prompting alone relies on the LLM to manage loops, termination, and safety. Frameworks like LangGraph externalize these concerns into explicit control flow and state management, making the system more robust and production-ready.
ReAct prompting is the cognitive instruction manual for ReAct agents. It teaches the model how to think and act, but it does not enforce how execution happens. For reliable, scalable agents, ReAct prompting is most powerful when paired with structured orchestration—where reasoning remains flexible, but control remains deterministic.
Why LangGraph for ReAct Agents?

LangGraph workflow. Image Credits
ReAct agents are not just prompt patterns—they are iterative, stateful decision systems. As soon as an agent reasons, acts, observes, and loops, it stops being a single LLM call and starts behaving like a program with memory, control flow, and side effects.
This is precisely where most naïve ReAct implementations fail.
The Core Problem: ReAct Is a State Machine, Not a Prompt
At a conceptual level, a ReAct agent is a finite (or semi-infinite) state machine:
The agent has state (messages, tool outputs, context)
It transitions between modes (reasoning, acting, observing)
It loops until a termination condition is met
However, many implementations attempt to encode this logic using:
While loops
Prompt heuristics
Regex-based parsing
Implicit control flow
These approaches work in demos but collapse under real-world complexity.
What Breaks Without LangGraph
Let’s examine what typically goes wrong when ReAct is implemented without a proper orchestration framework.
1. Implicit Control Flow
Most ReAct loops look like this:
while True:
    llm_output = llm(prompt)
    if "Action:" in llm_output:
        tool_result = call_tool(...)
    else:
        break
Problems:
Control flow is hidden inside string parsing
Hard to reason about execution paths
No guarantees about termination
Debugging becomes guesswork
ReAct appears simple, but the logic is brittle.
2. Fragile State Management
Without a structured state:
Messages are concatenated blindly
Tool outputs get mixed with reasoning
Memory grows unbounded
Context windows overflow silently
You lose:
Reproducibility
Observability
Safety
3. No First-Class Loop Semantics
ReAct requires looping, but ad-hoc loops:
Can spin forever
Are difficult to interrupt
Cannot be conditionally branched cleanly
Cannot be inspected mid-execution
Agents become black boxes.
4. Poor Production Readiness
In real systems, you need:
Step-level logging
Retry semantics
Human-in-the-loop checkpoints
Tool safety validation
Partial execution recovery
Prompt-only ReAct agents cannot support this reliably.
Key Concepts in LangGraph (Agent-First View)
LangGraph is built on a fundamental realization:
An agent is not a chain—it is a graph.

1. State: The Agent’s Memory
The state is a structured object that persists across steps.
Typical ReAct state includes:
Conversation history
Tool outputs
Intermediate reasoning
Execution metadata
Unlike stateless LLM calls, LangGraph ensures memory continuity.
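In code, this state is typically a typed dictionary whose message list is merged rather than overwritten on every step. A minimal sketch of the pattern used later in this article:
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages appends new messages instead of replacing the list
    messages: Annotated[list[BaseMessage], add_messages]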
2. Nodes: Cognitive & Operational Units
Each node does one thing well:
Reasoning node → calls the LLM
Action node → executes tools
Validation node → checks outputs
Memory node → updates long-term context
This separation improves:
Debuggability
Testability
Safety
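Concretely, each node is a plain function that receives the current state and returns a partial state update. A minimal sketch, where llm_with_tools and run_requested_tools are placeholders (validation and memory nodes follow the same pattern):
def reasoning_node(state: AgentState) -> dict:
    # Ask the LLM what to do next; it may emit a tool call
    response = llm_with_tools.invoke(state["messages"])  # placeholder LLM handle
    return {"messages": [response]}

def action_node(state: AgentState) -> dict:
    # Execute the tool(s) requested by the last AI message and return observations
    return {"messages": run_requested_tools(state["messages"][-1])}  # placeholder helper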
3. Edges: Explicit Control Flow
Edges define:
Loops (ReAct cycles)
Conditional branching
Termination logic
This prevents runaway agents and enables deterministic behavior.
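In LangGraph this control flow is written down explicitly rather than hidden inside a while loop. A minimal sketch, reusing the node functions sketched above (the full wiring for the hotel use case appears later in this article):
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import tools_condition

builder = StateGraph(AgentState)
builder.add_node("agent", reasoning_node)
builder.add_node("tools", action_node)

builder.add_edge(START, "agent")
# Conditional edge: go to the tool node if the LLM requested a tool, otherwise finish
builder.add_conditional_edges("agent", tools_condition, {"tools": "tools", END: END})
# Loop: after acting, return to the reasoning node
builder.add_edge("tools", "agent")

graph = builder.compile()
# A recursion limit acts as a hard cap against runaway loops:
# result = graph.invoke({"messages": [...]}, config={"recursion_limit": 10})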
4. Deterministic Execution with Flexibility
LangGraph gives you predictable execution paths while still allowing probabilistic reasoning inside the LLM.
This balance is critical for enterprise-grade agents.
ReAct + LangGraph: A Natural Fit
| ReAct Requirement | LangGraph Capability |
| --- | --- |
| Iterative reasoning | Graph loops |
| Tool grounding | Tool nodes |
| State tracking | Shared state |
| Safety controls | Conditional edges |
| Observability | Step-level tracing |
Together, they form a robust agentic architecture, not just a prompt pattern.
Use case: Hotel cancellation assistant (policy + refund calculation)
Problem Statement
Hotel cancellations involve complex rate-plan rules and time-based penalties, making refund calculations error-prone and inconsistent when handled manually. The goal of this use case is to build an AI-powered hotel cancellation assistant that accurately interprets cancellation policies, calculates refunds using external tools, and delivers transparent, auditable decisions through a structured reasoning and action loop.
Goal: A guest asks:
“I booked a Non-Refundable rate for Jan 20–22 at $120/night. I cancelled on Jan 18. How much refund do I get?”
A ReAct agent should:
Reason what it needs (rate plan policy + refund math)
Act by calling tools (get_policy, calculate_refund)
Observe tool outputs
Answer clearly
Step 1: Install dependencies
pip install -U langgraph langchain langchain-openai
Set your key:
export OPENAI_API_KEY="..."
Step 2: Define tools (your “Actions”)
These are the functions the agent can call.
from typing import TypedDict, Annotated
from datetime import datetime
import json
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    BaseMessage,
    HumanMessage,
    ToolMessage,
    SystemMessage
)
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import tools_condition
@tool
def get_cancellation_policy(rate_plan: str) -> str:
    """
    Returns cancellation policy text for a given rate plan.
    """
    policies = {
        "flexible": "Free cancellation until 24 hours before check-in. After that, first night is charged.",
        "semi-flex": "Free cancellation until 72 hours before check-in. After that, 50% of the stay is charged.",
        "non-refundable": "No refund after booking. Full stay amount is charged on cancellation."
    }
    key = rate_plan.strip().lower()
    return policies.get(key, "Policy not found. Supported: flexible, semi-flex, non-refundable.")
@tool
def calculate_refund(
    rate_plan: str,
    check_in: str,
    cancel_date: str,
    nightly_rate: float,
    nights: int
) -> str:
    """
    Calculates refund amount based on a simplified policy model.
    Dates format: YYYY-MM-DD
    """
    rp = rate_plan.strip().lower()
    ci = datetime.strptime(check_in, "%Y-%m-%d").date()
    cd = datetime.strptime(cancel_date, "%Y-%m-%d").date()
    total = nightly_rate * nights
    days_before = (ci - cd).days
    if rp == "non-refundable":
        refund = 0.0
        charged = total
        rule = "Non-refundable: no refund."
    elif rp == "flexible":
        if days_before >= 1:
            refund = total
            charged = 0.0
            rule = "Flexible: cancelled >= 24h before check-in, full refund."
        else:
            charged = nightly_rate  # 1 night penalty
            refund = max(total - charged, 0.0)
            rule = "Flexible: late cancel, 1 night charged."
    elif rp == "semi-flex":
        if days_before >= 3:
            refund = total
            charged = 0.0
            rule = "Semi-flex: cancelled >= 72h before check-in, full refund."
        else:
            charged = 0.5 * total
            refund = total - charged
            rule = "Semi-flex: late cancel, 50% charged."
    else:
        return "Unsupported rate plan. Use: flexible, semi-flex, non-refundable."
    return (
        f"Rule: {rule}\n"
        f"Days before check-in: {days_before}\n"
        f"Total: ${total:.2f}\n"
        f"Charged: ${charged:.2f}\n"
        f"Refund: ${refund:.2f}"
    )
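Before wiring these tools into a graph, it is worth sanity-checking them directly. Tools created with the @tool decorator expose an invoke method that accepts the arguments as a dict; for the booking in our use case, a quick check might look like this (the commented results reflect the tool logic above):
# Quick standalone check of the tools before adding them to the graph
print(get_cancellation_policy.invoke({"rate_plan": "Non-Refundable"}))
# -> "No refund after booking. Full stay amount is charged on cancellation."

print(calculate_refund.invoke({
    "rate_plan": "non-refundable",
    "check_in": "2026-01-20",
    "cancel_date": "2026-01-18",
    "nightly_rate": 120,
    "nights": 2
}))
# -> Total: $240.00, Charged: $240.00, Refund: $0.00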
Step 3: Build a ReAct loop in LangGraph (Reason → Tool → Reason)
LangGraph will:
Keep state (messages)
Decide if the model wants to call a tool
Route execution to tools
Loop until the model returns a final answer
from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import tools_condition
# 1) Define state
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    booking_id: str

# 2) Define the structured output schema
# (Documents the JSON shape the agent must return; it is enforced via the system prompt below.)
class RefundDecision(BaseModel):
    booking_id: str
    rate_plan: str
    total_amount: float
    charged_amount: float
    refund_amount: float
    policy_summary: str
    explanation: str
# 3) Choose the model and system prompt
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

SYSTEM_PROMPT = SystemMessage(
    content="""
You are a hotel cancellation assistant.
Rules:
- Use tools when needed.
- Never guess policy or refund.
- Final answer MUST be valid JSON with this schema:
{
  "booking_id": "...",
  "rate_plan": "...",
  "total_amount": number,
  "charged_amount": number,
  "refund_amount": number,
  "policy_summary": "...",
  "explanation": "..."
}
"""
)
# 4) Register tools and a safety-checked tool execution node
tools = [get_cancellation_policy, calculate_refund]
ALLOWED_TOOLS = {t.name for t in tools}  # tool allowlist

def safe_tool_node(state):
    last_msg = state["messages"][-1]
    if not hasattr(last_msg, "tool_calls") or not last_msg.tool_calls:
        return {}
    tool_messages = []
    for tool_call in last_msg.tool_calls:
        tool_name = tool_call["name"]
        if tool_name not in ALLOWED_TOOLS:
            tool_messages.append(
                ToolMessage(
                    content=f"Tool '{tool_name}' is not allowed.",
                    tool_call_id=tool_call["id"]
                )
            )
            continue
        for t in tools:
            if t.name == tool_name:
                result = t.invoke(tool_call["args"])
                tool_messages.append(
                    ToolMessage(
                        content=result,
                        tool_call_id=tool_call["id"]
                    )
                )
    return {"messages": tool_messages}
# 5) Before reasoning, check if we already processed this booking
REFUND_MEMORY = {}

def memory_lookup_node(state):
    booking_id = state["booking_id"]
    if booking_id in REFUND_MEMORY:
        return {
            "messages": [
                HumanMessage(
                    content=f"Cached decision found:\n{REFUND_MEMORY[booking_id]}"
                )
            ]
        }
    return {}
# 6) Reasoning node: LLM decides the next action (tool call) or final answer
def agent_node(state: AgentState):
    # Bind tools so the model can produce tool calls
    llm_with_tools = llm.bind_tools(tools)
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}
# 7) After the final decision, store it
def memory_write_node(state):
    booking_id = state["booking_id"]
    final_answer = state["messages"][-1].content
    REFUND_MEMORY[booking_id] = final_answer
    return {}
Build and Compile the graph
# 8) Build the graph
builder = StateGraph(AgentState)

# Nodes
builder.add_node("memory_lookup", memory_lookup_node)
builder.add_node("agent", agent_node)
builder.add_node("tools", safe_tool_node)
builder.add_node("memory_write", memory_write_node)

# Flow
builder.add_edge(START, "memory_lookup")
builder.add_edge("memory_lookup", "agent")
builder.add_conditional_edges(
    "agent",
    tools_condition,
    {
        "tools": "tools",     # model wants to act
        END: "memory_write"   # model finished reasoning
    }
)
builder.add_edge("tools", "agent")
builder.add_edge("memory_write", END)
graph = builder.compile()
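Once compiled, it can help to confirm that the topology matches the intended ReAct loop (memory_lookup → agent ⇄ tools → memory_write). Recent LangGraph versions expose the structure of a compiled graph; a minimal sketch, assuming such a version is installed:
# Optional: render the compiled graph as Mermaid to verify the loop structure
print(graph.get_graph().draw_mermaid())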
query = """
Booking details:
Rate plan: Non-Refundable
Check-in: 2026-01-20
Nights: 2
Nightly rate: 120
Cancelled on: 2026-01-18
"""
result = graph.invoke({
    "booking_id": "BKG-12345",
    "messages": [
        SYSTEM_PROMPT,
        HumanMessage(content=query)
    ]
})

final_output = result["messages"][-1].content
print(final_output)
Output
{
  "booking_id": "BKG-12345",
  "rate_plan": "Non-Refundable",
  "total_amount": 240.0,
  "charged_amount": 240.0,
  "refund_amount": 0.0,
  "policy_summary": "Non-refundable bookings do not allow refunds after confirmation.",
  "explanation": "The booking was made under a non-refundable rate plan, which charges the full stay amount regardless of cancellation timing."
}
What happens internally (ReAct behavior)

1) Thought (Reason)
The model realizes:
It needs the policy for “non-refundable”
It should calculate refund using the tool
2) Action (Tool call)
It calls:
get_cancellation_policy(rate_plan="non-refundable")
calculate_refund(rate_plan="non-refundable", ...)
3) Observation (Tool outputs)
The tools return:
The policy text
Refund breakdown (refund $0, charged full stay)
4) Final Answer
With both observations in context, the model stops calling tools and produces the structured JSON answer required by the system prompt, containing:
A one-line policy summary
The refund amount and the rationale behind it
Why this is “ReAct” (not just tools)
Because the model is not forced to call tools—it decides:
what it needs
which tool to use
when to stop
LangGraph makes this safe and structured using:
State (messages)
A tool-execution node (safe_tool_node) that runs only the allowed tools
Conditional edges (tools_condition)
Loop (tools → agent)
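To watch this loop unfold step by step instead of only seeing the final state, LangGraph can stream intermediate state updates. A minimal sketch, assuming the graph and query defined above (the exact streaming behavior may vary slightly by LangGraph version):
# Stream intermediate steps to observe the Reason → Act → Observe loop
for step in graph.stream(
    {"booking_id": "BKG-12345", "messages": [SYSTEM_PROMPT, HumanMessage(content=query)]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()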
Conclusion
Building reliable AI agents requires moving beyond prompt engineering into structured system design. In this blog, we demonstrated how the ReAct paradigm transforms large language models from passive text generators into active decision-makers—capable of reasoning, invoking tools, observing outcomes, and iterating toward correct answers. However, as the use case showed, ReAct alone is not enough; without explicit control flow, safety checks, and memory, agent behavior quickly becomes brittle and opaque.
LangGraph provides the missing execution layer for ReAct agents. By modeling agents as stateful graphs with deterministic transitions, we gained fine-grained control over reasoning loops, tool usage, termination conditions, and memory persistence. Enhancements such as tool allowlists, structured JSON outputs, and booking-level memory turned a conceptual agent into a production-ready system. Adding observability tooling such as LangSmith makes every reasoning step, tool call, and state transition traceable and auditable, which is critical for trust in real-world applications like refunds and policy enforcement.
Together, ReAct and LangGraph form a powerful foundation for building safe, explainable, and scalable AI agents. The hotel cancellation assistant is just one example, but the same architecture applies to pricing engines, customer support automation, compliance workflows, and autonomous business operations. As AI systems increasingly take action in the real world, designing agents as transparent, governed systems will be not just an advantage but a necessity.






