Which Business Tasks Actually Require ReAct Agents?

Term Update (2026-04-30): To align with the AAP repository, two of the four quadrant names have been renamed: (2) Classical AI Quadrant → (2) Algorithmic Search Quadrant, and (4) ReAct Quadrant → (4) Autonomous Agentic Loop Quadrant. The Scripting Quadrant and LLM Workflow Quadrant remain unchanged. While the terminology used in the body has been replaced with the new names, references to the ReAct pattern (Yao et al. 2022) and the ReAct loop are preserved.

A Sense of Discomfort

As someone who builds AI agents, I feel a persistent discomfort whenever I browse the READMEs of current agent products.

"Run on a $5 VPS." "Spawn isolated subagents." "Self-improving." "Cron scheduling, running unattended." "Voice memo transcription via Telegram."

The vocabulary is entirely centered on demos. Not a single word of business vocabulary appears. Audit. Approval workflows. Role-based access control. Change management. SLA. DR. These are terms that any practical deployment requires, yet they are absent from the READMEs of leading agent products.

I felt that these were not designed with production operations in mind. It seems as though production is not even on their radar.

At first, I thought, "Perhaps I’m just biased toward business-oriented perspectives." But that wasn't it. The source of the discomfort lay elsewhere: most current agent products lean too heavily on the assumption that the ReAct autonomous loop is the essence of an agent. Once you actually implement AI for business tasks, however, the domain where ReAct agents are genuinely necessary turns out to be extremely narrow.

Premise: How ReAct Works

First, let's clarify ReAct.

ReAct is a way of operating LLM agents proposed by Yao et al. in 2022. The paper is titled "ReAct: Synergizing Reasoning and Acting in Language Models" (arXiv:2210.03629). It repeats three elements as one set:

  1. Thought: The LLM thinks in language about how to interpret the current situation and what to do next.
  2. Action: It calls external tools such as search, browsers, or file operations.
  3. Observation: It reads the results returned by the tools.

This loop continues until the LLM itself decides that it has sufficient information. The key is that the LLM decides its own "next action" every turn. Since procedures are not pre-written, the number of tool calls, which tools to use, and when to finish are all determined at runtime.
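The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `react_loop`, `Step`, the `policy` callable (a stand-in for the LLM), and the `search` tool are hypothetical names, and `max_steps` is a safety cap added here that the pattern itself does not prescribe.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str] = None        # tool name; unused when finishing
    action_input: str = ""
    final_answer: Optional[str] = None  # set when the policy decides to stop


# Hypothetical stand-ins: the policy plays the role of the LLM.
Policy = Callable[[str, list[str]], Step]
Tool = Callable[[str], str]


def react_loop(question: str, policy: Policy, tools: dict[str, Tool],
               max_steps: int = 8) -> str:
    transcript: list[str] = []
    for _ in range(max_steps):
        step = policy(question, transcript)              # Thought
        transcript.append(f"Thought: {step.thought}")
        if step.final_answer is not None:                # the model decides when to stop
            return step.final_answer
        tool = tools.get(step.action or "")
        observation = tool(step.action_input) if tool else "unknown tool"  # Action
        transcript.append(f"Action: {step.action}[{step.action_input}]")
        transcript.append(f"Observation: {observation}")  # Observation
    raise RuntimeError("step budget exhausted without a final answer")


def stub_policy(question: str, transcript: list[str]) -> Step:
    """Demo policy: search once, then answer from the last observation."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return Step(thought="I need to look this up.",
                    action="search", action_input=question)
    return Step(thought="I have enough information.",
                final_answer=observations[-1].removeprefix("Observation: "))


def fake_search(query: str) -> str:
    return f"result for '{query}'"


answer = react_loop("capital of France", stub_policy, {"search": fake_search})
```

Note that nothing in `react_loop` fixes the number of tool calls or their order; only the policy does, which is exactly the property the rest of this article questions for business tasks.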

Note that the paper evaluated four types of tasks: HotpotQA (open-domain QA), Fever (fact verification), ALFWorld (interactive exploration in home environments), and WebShop (open-ended product search). All of these are tasks that require exploration in unknown environments or open-ended information integration; application to business workflows was not discussed in the paper.

This is the strength of ReAct, and also its cost. Precisely because the path is decided dynamically at runtime, most business applications, whose paths are known in advance, do not need it. The four quadrants below break this down.

Viewing Business AI via Four Quadrants

When introducing AI into a business, the nature of the work can be divided into four quadrants using two axes.

  • Horizontal axis: Processing can be written with deterministic logic / Requires semantic judgment (= LLM not needed / LLM needed)
  • Vertical axis: Workflow is predetermined / Exploratory (= The path is decided by human-written code beforehand / The model dynamically decides at runtime)

The vertical axis maps onto Anthropic's terminology. "Building Effective Agents" (2024) distinguishes workflows, in which LLMs and tools are orchestrated through predefined code paths, from agents, in which LLMs dynamically direct their own processes and tool usage. This is essentially the same distinction.

| | Workflow predeterminable | Exploratory |
|---|---|---|
| Can be written deterministically | (1) Scripting Quadrant / pipeline | (2) Algorithmic Search Quadrant (outside the scope of this article) |
| Requires semantic judgment | (3) LLM Workflow Quadrant: (3a) Dialogue → Specialized chat agent; (3b) Batch → Single-function LLM function | (4) Autonomous Agentic Loop Quadrant (= ReAct agent) |

(2) is the domain of Algorithmic Search / OR (Operations Research). Delivery route optimization, production scheduling, and combinatorial optimization fall here. These have historically been solved with A* search, dynamic programming, Monte Carlo Tree Search, and reinforcement learning. Since these are not problems that require LLMs, I will exclude them from the scope of this article. I will look at (1), (3), and (4) in order.
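The two axes reduce to two yes/no questions, which the following sketch encodes. The names `Quadrant` and `classify` are illustrative assumptions, not part of any library or of the AAP repository.

```python
from enum import Enum


class Quadrant(Enum):
    SCRIPTING = "(1) Scripting"
    ALGORITHMIC_SEARCH = "(2) Algorithmic Search"
    LLM_WORKFLOW = "(3) LLM Workflow"
    AUTONOMOUS_AGENTIC_LOOP = "(4) Autonomous Agentic Loop"


def classify(needs_semantic_judgment: bool, path_is_predetermined: bool) -> Quadrant:
    """Map the horizontal axis (semantic judgment) and vertical axis (path) to a quadrant."""
    if path_is_predetermined:
        return Quadrant.LLM_WORKFLOW if needs_semantic_judgment else Quadrant.SCRIPTING
    if needs_semantic_judgment:
        return Quadrant.AUTONOMOUS_AGENTIC_LOOP
    return Quadrant.ALGORITHMIC_SEARCH


# Invoice matching: needs semantic judgment at one step, but the path is fixed.
invoice_matching = classify(needs_semantic_judgment=True, path_is_predetermined=True)
```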

(1) Deterministic × Predeterminable — Scripting is Sufficient

Form transcription. Data formatting. Lookups. Verification. LLMs are not even needed here. Scripting and workflow engines are sufficient. There is no reason to introduce AI in this domain.

(3) Semantic Judgment × Predeterminable — Workflow + LLM Functions are Sufficient

This is the main battlefield for business AI. Anthropic outlines five workflow patterns corresponding to this domain in "Building Effective Agents": prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer. OpenAI also addresses the same domain in "A Practical Guide to Building Agents" (2025), presenting the "manager pattern" and "decentralized pattern."

Common to these is the fact that the path (what to do in what order) is determined beforehand, and the LLM is called as one step within that path. The LLM itself does not decide the next action.
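As a rough illustration of "the path is determined beforehand," here is a minimal prompt-chaining sketch. The `LLM` callable is a hypothetical stand-in for whatever model client is in use, and `summarize_then_reply` and `stub_llm` are invented for this example.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in: prompt in, completion out


def summarize_then_reply(email: str, llm: LLM) -> str:
    # Step 1: one LLM call with one job (summarize).
    summary = llm(f"Summarize in one sentence:\n{email}").strip()
    # Gate: a deterministic check written in code, not decided by the model.
    if not summary:
        return "ESCALATE: empty summary"
    # Step 2: a second LLM call; the next step is always the same.
    return llm(f"Draft a polite reply to this request:\n{summary}").strip()


def stub_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a model."""
    if prompt.startswith("Summarize"):
        return "Customer asks for an invoice copy."
    return "Thank you, we will send the invoice copy."


reply = summarize_then_reply("Hi, could you resend invoice 1042?", stub_llm)
```

The order of steps lives in the function body. The model fills in text at each step but never chooses which step comes next.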

Because I/O modality differs by business, (3) is further divided into variants. In this article, I will look at dialogue-based and batch-based forms.

Dialogue-based

Legal consultation, diagnostic assistance, internal FAQs, and expert knowledge support. These are tasks that consist entirely of decision-making.

I am skeptical about whether autonomous agents are necessary here. In many cases, an expert chat agent with specialized knowledge is sufficient. This is my intuition. RAG + system prompts + (if necessary, stateful) LLM calls are enough. The human bears the final judgment, while the AI performs knowledge retrieval and synthesis. At the very least, this division of labor appears sufficient for many scenarios.

However, there is a spectrum within dialogue-based tasks. Simple FAQs can be handled by a single LLM call. Legal consultations that narrow down conditions over multiple turns, or diagnostic assistance that calls tools along the way, are a different story: Anthropic's workflow patterns (prompt chaining, routing, orchestrator-workers) become useful more often. Even so, whether a loop in which the LLM itself determines the next action (ReAct) is necessary remains a separate question. In most decision-making tasks, it is enough for the human, who is the primary decision-maker, to decide the next action.
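A dialogue turn of the "RAG + system prompt + LLM call" kind might look like the sketch below. `Retriever`, `LLM`, `answer_turn`, and the stub functions are assumptions made for illustration; a real system would plug in its own search index and model client.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # hypothetical: query -> context snippets
LLM = Callable[[str], str]              # hypothetical: prompt -> completion

SYSTEM_PROMPT = "You are an internal FAQ assistant. Answer only from the provided context."


def answer_turn(question: str, history: list[str],
                retrieve: Retriever, llm: LLM) -> str:
    snippets = retrieve(question)        # one retrieval, at a fixed point in the flow
    prompt = "\n".join([
        SYSTEM_PROMPT,
        "Context:", *snippets,
        "Conversation so far:", *history,
        f"User: {question}",
    ])
    reply = llm(prompt)                  # exactly one LLM call per turn
    history.append(f"User: {question}")  # state is kept by the caller, not the model
    history.append(f"Assistant: {reply}")
    return reply


def stub_retrieve(query: str) -> list[str]:
    return ["Expenses are reimbursed within 30 days."] if "expense" in query.lower() else []


def stub_llm(prompt: str) -> str:
    return "Expenses are reimbursed within 30 days." if "30 days" in prompt else "I don't know."


history: list[str] = []
first_reply = answer_turn("How fast are expenses reimbursed?", history, stub_retrieve, stub_llm)
```

Each call handles one turn and returns. Whether there is a next turn, and what it asks, is decided by the human in the conversation.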

Tasks consisting solely of decision-making might be where autonomous agents are not needed at all. This is counterintuitive to me.

It is easy to conflate decision-making = thinking = agents. However, much of the thinking in decision-making tasks resembles knowledge retrieval and organization, which seems distinct from the type of reasoning that requires an agentic loop.

Batch-based

Invoice matching. Ticket routing. Address normalization. Threshold judgment. Patterns where semantic judgment is interspersed within a deterministic pipeline.

ReAct is not needed here either. A deterministic pipeline controls the flow, and a single-function LLM function is called at exception points. The output of the LLM function is probabilistically volatile—the exact same output is not returned for the same input every time—but the role it fulfills as a function remains fixed. The shape of "receiving a defined input and returning a judgment in a defined schema" does not change. The pipeline knows what to do next.

Example: Invoice Matching

Consider a process where an invoice is matched against a purchase order (PO) to decide whether to approve, send back, or request review. 80% can be processed mechanically with deterministic rules.

def process_invoice(invoice: Invoice) -> Action:
    po = lookup_po(invoice.po_id)              # Deterministic: PO lookup
    if po is None:
        return Action.REJECT_NO_PO
    if invoice.date > po.expiry:               # Deterministic: Expiry check
        return Action.REJECT_EXPIRED
    if is_duplicate(invoice):                  # Deterministic: Duplicate check
        return Action.REJECT_DUPLICATE

    if abs(invoice.amount - po.amount) <= 0.01 * abs(po.amount):
        return Action.APPROVE                  # Deterministic: Approve if amount within 1% of PO amount (no division, so a zero PO amount cannot raise)

    # Semantic judgment zone from here
    # Amount does not match, but might be substantially consistent due to line item expression differences
    verdict = match_line_items(invoice.lines, po.lines)  # ← Single-function LLM function
    if verdict == "MATCH":
        return Action.APPROVE
    elif verdict == "PARTIAL":
        return Action.ESCALATE_FOR_HUMAN_REVIEW
    else:
        return Action.REJECT_AMOUNT_MISMATCH

The core of process_invoice is a deterministic pipeline, and all judgments (PO existence / expiry / duplicate / amount) can be written as rules. The only point where semantic judgment is required is: "The amounts don't match, but are they substantially consistent despite differences in line item expressions?" There, we call a single-function LLM function match_line_items(invoice_lines, po_lines) -> Verdict.

This function does nothing but determine whether the line items on the two documents correspond semantically; it holds no other responsibilities. The prompt is a simple instruction: "Compare the line items of the invoice and the PO, and determine if they correspond in content even if the expressions differ. Output MATCH, PARTIAL, or NO_MATCH." The LLM returns its judgment according to the schema. Given an input, it returns a judgment (because the output is probabilistic, the judgment may occasionally differ between runs). However, at no point does the LLM decide what to do next. The calling pipeline already knows.
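One possible shape for such a function is sketched below. Several things here are assumptions rather than the author's implementation: the `llm` parameter (injected so the sketch runs without a real model client), the exact prompt template, and the choice to map out-of-schema output to PARTIAL so that it lands in human review.

```python
from enum import Enum
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in: prompt in, completion out


class Verdict(str, Enum):
    # str mixin so that comparisons like `verdict == "MATCH"` keep working
    MATCH = "MATCH"
    PARTIAL = "PARTIAL"
    NO_MATCH = "NO_MATCH"


PROMPT = (
    "Compare the line items of the invoice and the PO, and determine if they "
    "correspond in content even if the expressions differ. "
    "Output MATCH, PARTIAL, or NO_MATCH.\n\n"
    "Invoice lines:\n{invoice}\n\nPO lines:\n{po}"
)


def match_line_items(invoice_lines: list[str], po_lines: list[str], llm: LLM) -> Verdict:
    prompt = PROMPT.format(invoice="\n".join(invoice_lines), po="\n".join(po_lines))
    raw = llm(prompt).strip().upper()
    try:
        return Verdict(raw)          # enforce the output schema
    except ValueError:
        return Verdict.PARTIAL       # assumption: unparseable output goes to human review


verdict = match_line_items(
    ["A4 paper x10"], ["Copy paper, A4, 10 reams"],
    llm=lambda prompt: " match\n",   # canned response in place of a model
)
```

The function has one input shape and one output shape. Everything that happens after the verdict is the pipeline's responsibility.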

The difference from the ReAct loop is clear. The LLM is not the agent that "thinks → chooses a tool → observes the result → thinks again," but rather a component that just returns "input → judgment" as one step in a pipeline.

What this structure implies

Business automation has a structure that has been known for a long time: in almost any process, roughly 80% can be written as deterministic rules, and the remaining 20% are exceptions that do not fit. That 20% becomes the bottleneck for deployment. For many years, people tried to handle it with "more complex rules," "machine learning classifiers," or "natural language processing add-ons," but none of these addressed the core of the problem. Once an LLM enters here as a single-function component, the problem becomes solvable.

There is a point I want to emphasize here. This exception judgment is a role that humans were doing manually to begin with. Humans did not always make judgments based on the exact same criteria every time either. Even when looking at the same invoice, the judgment fluctuated depending on the situation of the day or the person in charge. Even if there was a manual, in the end, it was left to human probabilistic interpretation. Exception judgment was essentially a probabilistic job.

LLM functions take on exactly this role: a fluctuating mechanism taking over judgments that were probabilistic to begin with. Because this is an area that never required complete determinism, the probabilistic nature of LLMs is not a fundamental obstacle. On the contrary, LLMs appear well suited to it, in the sense that they can "take over what humans were doing probabilistically, with judgment quality equivalent to humans, and at a low cost." The counterargument that "LLMs are probabilistic and therefore unsuitable for business" likely overestimates what human business judgment was to begin with.

Crucially, a "general-purpose agent" is not needed. One single-function function per category is enough. If there are 50 categories, you have 50 functions. It does not need to be able to do everything generally.
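"One function per category" can be expressed as a plain registry. The sketch below is illustrative only: `REGISTRY`, the `judgment` decorator, and the two example categories are hypothetical, and the `LLM` callable stands in for a real model client.

```python
from typing import Callable

LLM = Callable[[str], str]               # hypothetical: prompt in, completion out
JudgmentFn = Callable[[str, LLM], str]

REGISTRY: dict[str, JudgmentFn] = {}


def judgment(category: str) -> Callable[[JudgmentFn], JudgmentFn]:
    """Register one single-purpose LLM function under a category name."""
    def register(fn: JudgmentFn) -> JudgmentFn:
        REGISTRY[category] = fn
        return fn
    return register


@judgment("address_normalization")
def normalize_address(text: str, llm: LLM) -> str:
    return llm(f"Normalize this postal address:\n{text}").strip()


@judgment("ticket_priority")
def ticket_priority(text: str, llm: LLM) -> str:
    raw = llm(f"Is this ticket LOW or HIGH priority?\n{text}").strip().upper()
    return raw if raw in {"LOW", "HIGH"} else "HIGH"  # assumption: fail toward HIGH


def run(category: str, text: str, llm: LLM) -> str:
    # Unknown categories raise KeyError: there is deliberately no general-purpose fallback.
    return REGISTRY[category](text, llm)


priority = run("ticket_priority", "Production server is down", lambda prompt: "high")
```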

(4) Semantic Judgment × Exploratory — Autonomous Agentic Loop Quadrant (= Proper Domain for ReAct Agents)

ReAct loops (Thought → Action → Observation) are necessary for tasks where the workflow cannot be determined in advance and the agent itself must judge the next action.

  • Coding (agent decides where to fix, how to test)
  • Exploratory tasks for automated browser operations (the target of the operation is dynamic)
  • Deep Research (the branching of information cannot be predicted in advance)

Unless the LLM chooses the next action itself, it cannot proceed. In terms of both research and practice, ReAct can be said to be a technique designed for this quadrant. The tasks evaluated in the Yao et al. paper (HotpotQA / Fever / ALFWorld / WebShop) all belong to this quadrant.

Categorical Error — The Ecosystem is Bringing (4) into All Quadrants

I should note upfront that in production implementations, hybrid patterns frequently appear on the boundary between (3) and (4). Examples include Plan-and-Execute (planning is done in a (4)-like manner, while execution follows deterministic (3) patterns), router agents (the LLM makes only the decision of "which branch to flow into" within a (3) workflow), and tiered handoff (handle the request with (3) first, and escalate to (4) only when necessary). These can be read as applications of one design principle, "use (3) as the base and limit (4) to where it is needed," which is an extension of the argument in this article.
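Of these hybrids, tiered handoff is the simplest to sketch. The following is an assumption-laden illustration: `tiered_handoff`, `faq_workflow`, and `exploratory_agent` are hypothetical names, and the agent is a stub rather than a real ReAct loop.

```python
from typing import Callable, Optional

Workflow = Callable[[str], Optional[str]]  # returns None when the request is out of scope
Agent = Callable[[str], str]


def tiered_handoff(request: str, workflow: Workflow, agent: Agent) -> tuple[str, str]:
    """Try the predetermined (3) path first; escalate to the (4) loop only on a miss."""
    result = workflow(request)
    if result is not None:
        return ("workflow", result)
    return ("agent", agent(request))


def faq_workflow(request: str) -> Optional[str]:
    known = {"reset password": "Use the self-service reset page."}
    return known.get(request.lower())


def exploratory_agent(request: str) -> str:
    # Placeholder for an autonomous loop; a real one would call tools repeatedly.
    return f"[agent investigated: {request}]"


handled_by, response = tiered_handoff("Reset password", faq_workflow, exploratory_agent)
```

Returning the tier alongside the response keeps a record of how often the expensive path is actually needed.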

The problem lies elsewhere. The buzz in the current agent ecosystem is attempting to bring the (4) architecture into every quadrant at all times. This is nothing but a categorical error — a type of mistake where things of different natures are treated as the same species.

Specifically, the following phenomena are observed:

  • Implementing customer support with autonomous agents, when most could be handled by (3) dialogue-based specialized chat agents.
  • Implementing sales support with multi-tool agents, when most could be handled by (3) batch-based single-function LLM functions.
  • Implementing advanced business automation with a ReAct base, when a (3) deterministic pipeline + LLM function would suffice.
  • Selling internal assistants as autonomous agents, when (3) chat agents would suffice.

To rephrase the argument: an architecture that assumes workflows cannot be predetermined is being brought into tasks where workflows can be predetermined.

This phenomenon is starting to be recognized in the industry. Thoughtworks criticizes this trend with the term "agentwashing." Gartner predicts that over 40% of agentic AI projects will be canceled by 2027. Anthropic itself stated in "Building Effective Agents" that "This might mean not building agentic systems at all," suggesting that you should not build an agent if a simple solution suffices. The 4 quadrants in this article re-cast this industry consensus from a business perspective.

Furthermore, I feel there is a marketing-side issue behind the mass production of these categorical errors. The hype around LLMs is premised on the idea that "the agent thinks." The vocabulary for (3)'s unassuming chat agents + deterministic pipelines does not carry press buzz. "Autonomous!" and "Self-improving!" are easier to sell. Therefore, marketing lumps all business together using the vocabulary of the (4) quadrant. The result is a structure where categorical errors occur in the field, with (4) architectures being superimposed onto (3) business processes.

As a result, the following occurs on the accountability side:

  • Unnecessary autonomy creates ambiguity in responsibility.
  • Unnecessary loops inflate costs.
  • Unnecessary black boxes destroy auditability and accountability.

And on the technical quality side, there is the issue of necessity. Since the workflows for (3) tasks are determined in advance, there is no technical reason to introduce the degree of freedom found in a ReAct loop. There is no necessity to cover a point-in-time semantic judgment with a mechanism that autonomously chooses the next action.

Accountability Framework Reorganized

When adopting the (3) architecture, the discussion surrounding accountability becomes much clearer.

  • Input, output, and judgment content are explicitly stated for each LLM invocation.
  • "What was done next" is fully traceable through pipeline logs.
  • The ambiguity of responsibility caused by an agent's autonomy disappears.
  • LLM usage is scoped as a product, and the deployer remains the responsible entity.
  • It aligns with the product liability model.
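The first two points can be made concrete with a thin logging wrapper around each LLM function. This is a sketch under assumptions: `audited`, `AUDIT_LOG`, and `match_line_items_stub` are hypothetical, and the in-memory list stands in for whatever append-only store a real deployment would use.

```python
import functools
import time
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # in-memory stand-in for a real audit store


def audited(step: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record the input and output of every call to the wrapped step."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            output = fn(*args, **kwargs)
            AUDIT_LOG.append({
                "step": step,
                "input": {"args": [repr(a) for a in args],
                          "kwargs": {k: repr(v) for k, v in kwargs.items()}},
                "output": repr(output),
                "timestamp": time.time(),
            })
            return output
        return wrapper
    return decorator


@audited("match_line_items")
def match_line_items_stub(invoice_line: str, po_line: str) -> str:
    # Placeholder for the LLM call; a fixed rule is used here for illustration.
    return "MATCH" if invoice_line.lower() == po_line.lower() else "PARTIAL"


verdict = match_line_items_stub("A4 Paper", "a4 paper")
```

Because the pipeline, not the model, chooses the next step, the sequence of log entries is the complete account of what happened.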

This does not conflict with current legal systems (as detailed in another article: Can we trace causality after an accident?). Single-function pipelines and specialized chat agents keep the responsibility with the human (the deployer), meaning no special legal status for AI is required.

It is impossible to introduce an agent as a "responsible entity" into a legal system that treats killing a pet as "property damage" (while special animal welfare laws exist, they do not grant independent legal personhood to animals). The architecture in quadrant (3) is inherently consistent with this legal reality from the start.

In quadrant (4) — the legitimate domain for ReAct agents — accountability issues arise more profoundly. However, this is a matter for the small minority of areas where autonomy is truly essential, not for business as a whole. Discussing all business using the vocabulary of (4) is, in itself, the root of the error.

Implementation Evidence: When to Use and When Not to Use ReAct Agents

To support the discussion so far, I will share some implementation experience from both quadrants.

Cases Where I Used ReAct Agents in (4)

In the past, I worked at a site where the official software knowledge base was so massive that humans had difficulty exploring the information they needed. I built a Copilot that accepted user questions and constructed optimal answers while exploring the knowledge space.

It worked powerfully—almost laughably so. Its behavior was identical to a Deep Research-style exploration agent, a mechanism that builds answers while repeatedly calling search tools. The foundation of the implementation was the ReAct pattern (Thought → Action [calling search tool] → Observation → Thought...) I learned in a Coursera prompt engineering course. I simply took that structure I encountered in the course and applied it to the context of knowledge exploration. The task of "exploring an unknown knowledge space to find an answer" is the definition of the (4) quadrant. What to search for next is not predetermined. It was necessary for the LLM to see the previous Observation and decide on the next Action.

However, the flip side of being too powerful was that I felt a lack of controllability. While the loop was running toward its goal, it was impossible to predict in advance what path the LLM would take to call tools. You don't know how many turns it will take. When the branching exploded halfway through, I had the sensation that the energy required to reach the goal was approaching an uncontrollable runaway state. It worked when it worked, but I felt the predictability of its operation was low. I believe this is the fundamental nature of the (4) quadrant. If you bring that nature into a business as is, you will naturally be hit by cost and accountability issues.

I know the power of ReAct agents. Knowing that, I have concluded that the role of (4) in business is limited.

Cases Where I Did Not Use ReAct Agents in (3)

Contemplative Agent, which I have published, is on the opposite side of this. It does not use the ReAct loop at all.

Contemplative Agent generates output based on a given constitution, skills, rules, and identity. Its essence lies in a generic structure that can take on any norms, roles, and skill definitions. Each step of the generation pipeline is ordered in a predetermined sequence. Each acts as a single-function LLM function that returns a judgment based on a defined schema for defined inputs. While the LLM output itself fluctuates probabilistically, the pipeline, not the LLM, decides which step to execute and what to do next. In terms of positioning, it falls under (3) — Semantic Judgment × Predetermined — batch-type.

There was never a scenario where I considered ReAct for Contemplative Agent (CA). When operating CA, the only issue is "what kind of comment to make based on what criteria." Since what to do next is decided in advance, there is no room to run a ReAct loop. The same applies to the distillation pipeline; it would be a problem if it were processed through different routes every time. While the LLM's own judgment fluctuates probabilistically, it is an operational requirement that the specific steps and judgments applied remain fixed.

So it wasn't even a choice between (4) or (3) in the 4-quadrant framework. Given the nature of the business, (3) was the only choice. The 4-quadrant framework in this article is close to a post-facto verbalization of these implementation-level givens. It wasn't that I had a framework and then implemented; it was that after I implemented, the framework was there. That was the order.

Dividing the Scope of Application

Use ReAct agents for tasks that require ReAct agents ((4)), and do not use them for tasks that do not ((3)). My argument is not a denial due to ignorance of ReAct agents, but rather a desire to narrow down their application domain after having understood them. The hole that the current agent ecosystem is falling into is "bringing (4) tools into all quadrants," not that "ReAct agents themselves are bad."

Conclusion

I realized that if you start from ReAct agents when bringing AI into a business, the choice of quadrants becomes invisible.

First, triage the task. If it is a judgment-based task, a specialized chat agent ((3) dialogue-type) is often sufficient. If it is exception handling, a single-function LLM function + deterministic pipeline ((3) batch-type) seems sufficient. If it is classical optimization, it is a problem for algorithmic search / OR ((2)), where LLMs have no role. ReAct agents are only needed for exploratory tasks where the workflow cannot be predetermined ((4)).

I feel that the majority of tasks targeted by the current agent ecosystem lie in quadrant (3), not (4). Having compared the experience of implementing in (4) with the experience of implementing in (3), that impression has not wavered.

Which business tasks really need ReAct agents? I have come to believe that if you start from this question, the architectural options will come into view. Conversely, if you skip this question and start from "let's do everything with agents," you will inevitably end up in a categorical error of covering (3) tasks with (4) architectures.

Key Technical Sources

Industry Criticism and Forecasts

  • Contemplative Agent — An implementation corresponding to (3) in this article. A deterministic pipeline structure that does not use a ReAct loop.
  • Agent Attribution Practice (AAP) — A research repo handling agent responsibility entities and attribution.