How Do Agents Decide What to Do?
A chatbot generates one response per message: the user asks, the model answers, conversation over. An agent, by contrast, must make a sequence of decisions. At each step it faces questions that a single-turn model never encounters: which tool should I call? What arguments should I pass? Did the previous action succeed or fail? Do I have enough information to answer the user, or do I need another step? Should I try a different approach entirely?
This requires something beyond pattern matching from training data. The agent needs to reason about its current state: what it knows, what it doesn't know, what it has tried, and what it should try next. Without explicit reasoning, an agent calling tools is like someone randomly pressing buttons on a calculator hoping to get the right answer. It might stumble onto a solution for simple tasks, but for anything that requires multiple coordinated steps, it will flounder.
The key insight comes from a paper called ReAct (Yao et al., 2023), which proposed a surprisingly simple idea: interleave reasoning (thinking in natural language) with acting (calling tools and observing results). Instead of going directly from user question to tool call, the model first writes down what it's thinking and why it's choosing a particular action. This "thinking out loud" turns out to be the difference between an agent that works and one that spins in circles.
ReAct: Think, Then Act, Then Observe
The ReAct pattern structures agent behaviour into a repeating cycle of three steps: Thought, Action, and Observation. Let's walk through a concrete example. Suppose a user asks: "What is France's GDP per capita?" An agent with access to a search tool might proceed like this:
Thought: I need to calculate GDP per capita, which is GDP / population.
I'll start by searching for France's population.
Action: search("population of France 2024")
Observation: France population: 68.4 million (2024 estimate)
Thought: Good, I have the population. Now I need France's GDP.
Action: search("GDP of France 2024")
Observation: France GDP: $3.13 trillion (2024)
Thought: I have both numbers. GDP per capita = $3.13T / 68.4M = ~$45,760.
I can now answer the user.
Answer: France's GDP per capita is approximately $45,760.
Notice what the Thought steps are doing. The first thought decomposes the problem (GDP per capita requires two pieces of data) and picks which one to get first. The second thought acknowledges the result and identifies what's still missing. The third thought performs the calculation and decides the agent has enough information to stop. Without these thoughts, the model would need to implicitly plan all of this in its hidden states, which works for trivial cases but breaks down as tasks get more complex.
Why do the explicit thoughts matter so much? Three reasons:
- Debuggability: when an agent goes wrong, the thoughts tell you exactly where its reasoning broke. Without thoughts, you see a sequence of tool calls with no explanation, and diagnosing the failure requires guessing what the model was "thinking".
- Course correction: the act of writing a thought forces the model to evaluate the current state. If the previous search returned irrelevant results, the thought step is where the model notices and adjusts ("That search didn't work, let me try a different query").
- Scratchpad reasoning: the thoughts serve as a working memory. The model can hold intermediate calculations, partial answers, and remaining sub-goals in the text, rather than relying on its limited implicit working memory.
It helps to compare ReAct with two alternatives. Chain-of-thought (CoT) prompting (Wei et al., 2022) gets a model to reason step by step, but it never takes actions: all the reasoning happens in one shot, using only whatever knowledge is already in the model's weights. CoT can decompose a problem beautifully, but if it needs external data (a stock price, a database query, a file's contents), it's stuck. The other extreme is acting without reasoning: the model calls tools directly based on pattern matching, with no explicit planning. This works for single-step tool calls but falls apart when later steps depend on earlier results, because the model has no mechanism to evaluate progress or change strategy.
The key finding of the ReAct paper is that the combination of reasoning and acting outperforms either one alone. On benchmarks like HotpotQA (multi-hop question answering) and FEVER (fact verification), ReAct agents consistently beat both pure CoT reasoning (which can't look things up) and pure acting (which can't plan or course-correct). The explicit thoughts are not just a nice debugging feature; they materially improve task success.
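In practice, this behaviour is usually elicited through the prompt: the system prompt names the available tools and spells out the Thought/Action/Observation format the model should follow. Below is a minimal sketch of such a prompt; the wording and tool list are illustrative, not taken from the ReAct paper.

# Sketch of a ReAct-style system prompt (illustrative wording, hypothetical tools)
REACT_SYSTEM_PROMPT = """You answer questions by interleaving reasoning and tool use.

Available tools:
- search(query): look up facts
- calculate(expression): evaluate an arithmetic expression

Use this format:
Thought: reason about what you know and what to do next
Action: tool_name(argument)
Observation: (the tool's result will be inserted here)

Repeat Thought / Action / Observation as many times as needed.
When you have enough information, end with:
Answer: the final answer to the user's question
"""

# A plain chain-of-thought prompt, by contrast, asks for reasoning only:
COT_PROMPT = "Think step by step, then give your final answer."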
Planning: Breaking Tasks into Steps
Simple questions need one tool call. Complex tasks need a plan: a decomposition of the user's request into a sequence of sub-goals, each of which may involve one or more tool calls. The way an agent plans has a dramatic effect on how robust it is when things go wrong.
There are three broad planning strategies:
- Sequential planning: generate the full plan upfront, then execute each step in order. For example: "Step 1: search for population. Step 2: search for GDP. Step 3: compute the ratio. Step 4: format the answer." This is simple and fast, but brittle. If step 2 returns unexpected data (say, GDP in euros instead of dollars), the pre-made plan has no way to insert a currency conversion step. The agent either ignores the problem or crashes.
- Dynamic planning: plan one step at a time, observe the result, then decide the next step. This is exactly what the ReAct loop does: each Thought step is a micro-planning decision based on everything observed so far. It's more robust because the agent can adapt ("The GDP was in euros, I need to convert to USD first"), but it's slower because each step requires a full model inference.
- Hierarchical planning: break the task into independent subtasks, plan and execute each one separately, then combine the results. For example, a research agent might split "Compare the economies of France and Germany" into two parallel subtasks (one per country), each with its own ReAct loop. This can be faster (subtasks can run in parallel) and more modular, but requires a higher-level orchestrator to manage the subtasks and merge results.
In practice, most production agents use dynamic planning. The reason is simple: real-world tool calls are unpredictable. APIs fail, searches return irrelevant results, computations produce unexpected types. An agent that commits to a full plan upfront cannot handle any of these surprises. Dynamic planning, where the agent reasons after each action and decides what to do next, is inherently more resilient.
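The difference shows up directly in code. Here is a minimal sketch contrasting the two loops; llm() is a hypothetical stand-in for a model call and execute() is a toy dispatcher, both illustrative rather than a real API.

# Sketch: sequential vs dynamic planning. llm() and execute() are hypothetical placeholders.

def execute(step, tools):
    # Toy dispatcher: a real agent would parse the step and call the named tool.
    return f"(result of: {step})"

def sequential_agent(task, llm, tools):
    # One model call produces the whole plan; execution never revisits it.
    plan = llm(f"List the steps needed to accomplish: {task}")
    results = [execute(step, tools) for step in plan.splitlines()]
    # If a step returned something unexpected, there is no point at which to adapt.
    return llm(f"Task: {task}\nStep results: {results}\nWrite the final answer.")

def dynamic_agent(task, llm, tools, max_steps=10):
    # The model sees every observation before choosing the next action.
    history = f"Task: {task}"
    for _ in range(max_steps):
        decision = llm(history + "\nWhat should I do next? Act, or give the final answer.")
        if decision.startswith("Answer:"):
            return decision
        history += f"\n{decision}\nObservation: {execute(decision, tools)}"
    return None  # iteration limit reached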
Dynamic planning also connects to a broader trend in recent language models. Extended thinking (sometimes called "thinking tokens") gives models a dedicated reasoning phase where they can think extensively before producing an output. Models like o1, DeepSeek-R1, and Claude with extended thinking use this to produce higher-quality plans. The intuition is straightforward: the more tokens a model spends reasoning, the better its plan tends to be. For agents, this means the quality of each Thought step improves when the model is allowed to think longer, leading to fewer wasted actions and faster task completion overall.
Self-Correction and Error Recovery
Agents make mistakes. Tools return errors, searches find nothing useful, calculations go wrong. The question that separates a useful agent from a frustrating one is: can it recover?
The most studied recovery mechanism is Reflexion (Shinn et al., 2023). After failing a task, a Reflexion agent doesn't just retry blindly. It first generates a natural-language reflection on what went wrong: "My search query was too specific and returned no results. I should try broader terms." or "I used the wrong formula. GDP per capita is GDP divided by population, not the other way around." This reflection is then added to the agent's context for the next attempt, giving it an explicit memory of past failures to avoid repeating them.
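A sketch of that outer loop, assuming hypothetical run_agent(), evaluate(), and llm() helpers, might look like this:

# Sketch of a Reflexion-style retry loop; run_agent(), evaluate(), and llm() are hypothetical helpers.

def reflexion_loop(task, run_agent, evaluate, llm, max_attempts=3):
    reflections = []  # natural-language memory of past failures
    for attempt in range(max_attempts):
        # Inject lessons from earlier attempts into the agent's context
        context = task
        if reflections:
            context += "\nLessons from previous attempts:\n" + "\n".join(reflections)

        result = run_agent(context)        # e.g. one full ReAct episode
        if evaluate(result):               # success check: unit test, judge model, heuristic
            return result

        # Failure: ask the model to diagnose what went wrong, in plain language
        reflections.append(llm(
            f"Task: {task}\nFailed attempt: {result}\n"
            "In one or two sentences, explain what went wrong and what to do differently."
        ))
    return None  # all attempts failed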
Beyond Reflexion, agents use several recovery strategies in practice:
- Self-debugging: the agent runs code, reads the error traceback, and fixes the bug. This is the core loop of code-generation agents: write code, execute it, see the error, revise. Each error message is an observation that feeds back into the reasoning loop, often leading to a correct solution within 2-3 iterations.
- Tool fallback: if one tool fails, try another. "The API returned a 429 rate limit error. Let me try searching the web for the same information instead." or "The calculator can't handle symbolic math. Let me write a Python snippet to compute this." This requires the agent to reason about tool capabilities, not just tool names.
- Query reformulation: a search that returns no results often just needs a different query. The agent tries "population France 2024" instead of "current demographic statistics French Republic." The Thought step is where the agent notices the mismatch and tries a simpler query.
- Human-in-the-loop: when the agent is truly stuck (ambiguous instructions, missing credentials, conflicting information), the best behaviour is to ask the user for help rather than guessing. An agent that says "I found conflicting GDP numbers from two sources. Which one should I trust?" is far more useful than one that silently picks the wrong number.
The common thread across all these strategies is that the agent must recognise failure. This sounds obvious, but it's where many agents break down. An agent that doesn't check whether a tool call succeeded will barrel ahead with garbage data, producing a confident but wrong answer. The best agents fail gracefully: they know when they're stuck and either try a different approach or ask for help, rather than hallucinating through the problem.
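To make the self-debugging strategy concrete, here is a minimal sketch of the write-run-fix loop; llm() is a hypothetical stand-in for a code-generation model, and the prompts are illustrative.

import subprocess
import sys
import tempfile

# Sketch of a self-debugging loop: write code, run it, feed the traceback back in.
# llm() is a hypothetical code-generation model call; prompts are illustrative.
def self_debug(task, llm, max_attempts=3):
    prompt = f"Write a Python script that does the following:\n{task}"
    for attempt in range(max_attempts):
        code = llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if proc.returncode == 0:
            return code, proc.stdout       # success: clean output is the observation
        # Failure: the traceback becomes the next observation for the model
        prompt = (
            f"This script failed:\n{code}\n\nError:\n{proc.stderr}\n"
            "Fix the bug and return the corrected script."
        )
    return None, None                      # could not fix it within the attempt budget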
ReAct in Practice
Let's build a minimal ReAct loop from scratch. The code below simulates an agent with two mock tools (a search engine and a calculator) that answers multi-step questions by thinking, acting, and observing in a loop. No API calls or external libraries are needed; we simulate the model's decisions with a hardcoded trace to show the full mechanics of the loop.
# A minimal ReAct loop - pure Python, no external dependencies
# We simulate the LLM's decisions to show the Thought-Action-Observation cycle

class MockSearchTool:
    """Simulates a search engine with canned results."""

    def __init__(self):
        self.data = {
            "population of France 2024": "France population: 68.4 million (2024 estimate)",
            "GDP of France 2024": "France GDP: $3.13 trillion (2024 estimate)",
            "capital of France": "The capital of France is Paris.",
        }

    def search(self, query):
        for key, value in self.data.items():
            if key.lower() in query.lower():
                return value
        return "No results found."


class MockCalculator:
    """Evaluates simple arithmetic expressions."""

    def calculate(self, expression):
        try:
            result = eval(expression)  # safe here: we control the input
            return str(result)
        except Exception as e:
            return f"Error: {e}"


# --- The ReAct Loop ---
def react_loop(question, planned_steps, max_iterations=10):
    """
    Runs a ReAct loop with pre-planned steps (simulating LLM decisions).
    In a real system, each 'thought' and 'action' would come from the LLM.
    """
    search = MockSearchTool()
    calc = MockCalculator()
    tools = {"search": search.search, "calculate": calc.calculate}

    print(f"Question: {question}")
    print("=" * 60)

    observations = []
    for i, step in enumerate(planned_steps):
        if i >= max_iterations:
            print(f"\n[Max iterations ({max_iterations}) reached]")
            break

        # THOUGHT
        print(f"\nThought {i+1}: {step['thought']}")

        # Check if the agent decided to answer
        if "answer" in step:
            print(f"\nAnswer: {step['answer']}")
            return step["answer"]

        # ACTION
        tool_name = step["tool"]
        tool_arg = step["arg"]
        print(f"Action {i+1}: {tool_name}({repr(tool_arg)})")

        # OBSERVATION
        if tool_name in tools:
            result = tools[tool_name](tool_arg)
        else:
            result = f"Error: tool '{tool_name}' not found"
        print(f"Observation {i+1}: {result}")
        observations.append(result)

    return None


# Simulate: "What is France's GDP per capita?"
planned_steps = [
    {
        "thought": "GDP per capita = GDP / population. I need both numbers. "
                   "Let me search for France's population first.",
        "tool": "search",
        "arg": "population of France 2024",
    },
    {
        "thought": "Got population: 68.4 million. Now I need France's GDP.",
        "tool": "search",
        "arg": "GDP of France 2024",
    },
    {
        "thought": "Got GDP: $3.13 trillion. Let me compute: "
                   "3.13e12 / 68.4e6 = GDP per capita.",
        "tool": "calculate",
        "arg": "3.13e12 / 68.4e6",
    },
    {
        "thought": "The calculation gives ~$45,760. I have enough to answer.",
        "answer": "France's GDP per capita is approximately $45,760.",
    },
]

react_loop("What is France's GDP per capita?", planned_steps)
In a real system, the planned steps wouldn't be hardcoded. Instead, each Thought and Action would be generated by an LLM, with the full conversation history (including all previous Thoughts, Actions, and Observations) included in the prompt. The loop runs until the model decides to output a final Answer instead of another Action, or until a maximum iteration limit is hit.
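A sketch of what that looks like is below; llm() and parse_step() are hypothetical placeholders for whatever model API and output parser you use, not a specific library.

# Sketch: the same loop driven by a model instead of a hardcoded trace.
# llm() and parse_step() are hypothetical placeholders, not a specific API.

def react_loop_llm(question, llm, parse_step, tools, max_iterations=10):
    history = f"Question: {question}"
    for i in range(max_iterations):
        # The model sees the full trace so far and produces the next
        # "Thought: ... / Action: ..." block, or a final "Answer: ...".
        step = parse_step(llm(history))
        history += f"\nThought: {step['thought']}"

        if "answer" in step:
            return step["answer"]

        if step["tool"] in tools:
            observation = tools[step["tool"]](step["arg"])
        else:
            observation = f"Error: tool '{step['tool']}' not found"
        history += f"\nAction: {step['tool']}({step['arg']!r})\nObservation: {observation}"
    return None  # iteration limit reached without a final answer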
Now let's see what happens when things go wrong. The next example shows an agent handling a failed search by reformulating its query, demonstrating the self-correction behaviour we discussed:
# ReAct with error recovery - the agent adapts when a search fails

class MockSearchToolV2:
    """Search that only matches specific canned queries - forces reformulation."""

    def __init__(self):
        self.data = {
            "population of France 2024": "68.4 million",
            "GDP of France 2024 USD": "$3.13 trillion",
        }

    def search(self, query):
        # Only exact substring match
        for key, value in self.data.items():
            if key.lower() in query.lower():
                return value
        return "No results found."


search = MockSearchToolV2()

# Simulate an agent that hits a dead end and recovers
steps_with_recovery = [
    {
        "thought": "I need France's GDP. Let me search for it.",
        "tool": "search",
        "arg": "French Republic gross domestic product",
    },
    {
        "thought": "That search returned nothing. The query was too formal. "
                   "Let me try simpler keywords.",
        "tool": "search",
        "arg": "GDP of France 2024 USD",
    },
    {
        "thought": "Got it: $3.13 trillion. Now I need population.",
        "tool": "search",
        "arg": "population of France 2024",
    },
    {
        "thought": "Population is 68.4 million. "
                   "GDP per capita = $3.13T / 68.4M = ~$45,760.",
        "answer": "France's GDP per capita is approximately $45,760.",
    },
]

print("Question: What is France's GDP per capita?")
print("(This time the first search FAILS)\n")
print("=" * 60)

tools = {"search": search.search}
for i, step in enumerate(steps_with_recovery):
    print(f"\nThought {i+1}: {step['thought']}")
    if "answer" in step:
        print(f"\nAnswer: {step['answer']}")
        break
    print(f"Action {i+1}: {step['tool']}({repr(step['arg'])})")
    result = tools[step["tool"]](step["arg"])
    print(f"Observation {i+1}: {result}")
    if result == "No results found.":
        print("  >> Agent notices failure - will reformulate next step")
The critical moment is between Observation 1 ("No results found") and Thought 2 ("The query was too formal. Let me try simpler keywords"). In a pure acting-without-reasoning agent, the model might repeat the same query, or give up, or hallucinate an answer. The explicit Thought step is where the agent diagnoses the problem and adjusts its strategy.
In production, ReAct loops encounter several common failure modes:
- Infinite loops: the agent keeps calling the same tool with the same (or very similar) arguments, getting the same unhelpful result each time. This happens when the model's reasoning isn't strong enough to recognise the repetition. Mitigation: set a maximum iteration count (typically 5-15 steps) and track recent actions to detect cycles.
- Hallucinated tools: the agent tries to call a tool that doesn't exist in its tool inventory. For example, it might emit email("user@example.com", "Here are your results") when no email tool was provided. Mitigation: validate each action against the tool list before execution, and return a clear error if the tool doesn't exist.
- Premature stopping: the agent answers before gathering enough information, producing a plausible but incomplete or wrong response. This often happens when the model is too eager to appear helpful. Mitigation: include instructions in the system prompt that the agent should verify it has all required data before answering.
- Tool misuse: the agent calls the right tool but with wrong arguments (passing a natural-language sentence to a calculator, or using the wrong parameter name for an API). Mitigation: provide clear tool descriptions with argument schemas, and validate inputs before execution.
A robust agent implementation addresses all of these: a maximum iteration limit prevents infinite loops, tool validation catches hallucinated tools and misused arguments, and explicit reasoning prompts encourage the agent to check its work before answering. The ReAct pattern doesn't eliminate these failure modes, but it makes them visible in the thought trace, which makes them far easier to detect, diagnose, and fix.
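A minimal sketch of those guards, applied to each proposed action before execution (the thresholds and error messages here are illustrative choices, not fixed conventions):

# Sketch: guard rails for a ReAct loop - tool validation, argument checks, and cycle detection.
# Thresholds and error messages are illustrative choices.

def guarded_step(step, tools, recent_actions, max_repeats=2):
    """Validate one proposed action before executing it; errors become observations."""
    # Hallucinated tool: the model asked for something not in the inventory
    if step["tool"] not in tools:
        return f"Error: unknown tool '{step['tool']}'. Available tools: {list(tools)}"

    # Tool misuse: catch obviously malformed arguments before calling the tool
    if not isinstance(step.get("arg"), str) or not step["arg"].strip():
        return "Error: the tool argument must be a non-empty string."

    # Cycle detection: the same tool + argument proposed too many times
    signature = (step["tool"], step["arg"])
    if recent_actions.count(signature) >= max_repeats:
        return "Error: this exact action has already been tried. Change your approach."
    recent_actions.append(signature)

    return tools[step["tool"]](step["arg"])

Whatever guarded_step returns, tool result or error string, goes back into the trace as the next Observation, so the model sees exactly why an action was rejected and can correct course in its next Thought.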
Quiz
Test your understanding of the ReAct pattern and agent reasoning loops.
What is the key difference between chain-of-thought (CoT) prompting and ReAct?
Why does dynamic planning (reasoning after each action) tend to work better than sequential planning (full plan upfront) for agents?
In the Reflexion framework, what does the agent do after failing a task?
What is the primary purpose of the explicit 'Thought' steps in a ReAct loop?