The content below is an AI-generated translation. This is an experimental feature, and may contain errors.

Analyzing Agent Work Design through Claude Code's Dynamic Workflows

I took a look at the prompts of the agents executed in Claude Code's Dynamic Workflows.

Initially, I was just curious to see "what kind of agents were running in the workflow."

However, what I actually found interesting was not just a set of procedures but the task design itself: how the agents were set up to move the work forward.

How to differentiate perspectives.
How to narrow down the scope of the investigation.
What to report as findings.
How to respond when nothing is found.
How to format the Outcome so that it can be passed to subsequent tasks.

In this article, rather than reproducing the full text of the prompts displayed for each agent, I will organize the points that seem useful for designing agent workflows around the Prompt / Activity / Outcome structure.

Dynamic Workflows themselves are introduced in the official Anthropic article, "Introducing dynamic workflows in Claude Code."

Loosely summarizing the official explanation, it is a system in which Claude assembles a workflow suited to the task and splits the work across multiple sub-agents.

What I Looked At

The environment I checked was Claude Code v2.1.158 as of May 30, 2026.

The workflow I targeted was product-defect-audit.

The request I entered was to look for security and UI/UX defects in the product I am building.

This article does not reproduce the full text of the prompts or the specific findings.

What I want to look at is not the full text itself but the structure that can be read from it.

  • How Prompt / Activity / Outcome are separated.
  • How the breakdown of phases and agents looks.
  • How the responsible perspectives and scope of investigation for each agent are specified.
  • How the reporting conditions are narrowed down.
  • How the Outcome is formed so it can be passed to subsequent work.
  • What can be brought back to CLAUDE.md or custom commands.

To state the conclusion up front: the prompts for each agent displayed in Dynamic Workflows read less like mere instructions and more like a "pattern of work."

What I Saw in the Dynamic Workflows Screen

When you run the /workflows command in Claude Code, you can see a list of workflows created in that session.

For example, it is displayed as follows:

Dynamic workflows
  2 completed

    ✔ audit-findings-to-requirements  26 agents · 1.2m tok · 9m 2s
  ❯ ✔ product-defect-audit  61 agents · 2.4m tok · 10m 6s

  ↑/↓ to select · Enter to view · s to save · Esc to close

Selecting a workflow from this list lets you inspect what was executed in it.

In the detail screen, I could see which agents ran in each phase, which models were used, the number of tokens, the number of tool executions, and the execution time.

product-defect-audit
Audit a product for security, UI/UX, and generation-quality defects...
61/61 agents · 10m05s · done

╭ Phases ──────────┬ Review · 9 agents ─────────────────────────────────────────╮
│ ❯ ✔ Review   9/9 │  ✔ find:sec-authz       Sonnet 4.6             118.4k tok │
│   ✔ Verify 52/52 │  ✔ find:sec-secrets     Sonnet 4.6                58k tok │
│                  │  ✔ find:ux-a11y         Sonnet 4.6             127.5k tok │
│                  │  ✔ find:gen-prompt      Opus 4.8 (1M context)   84.9k tok │
│                  │  ✔ find:gen-pipeline    Opus 4.8 (1M context)   84.6k tok │
╰──────────────────┴───────────────────────────────────────────────────────────╯

In other words, Dynamic Workflows lets you check not only that the workflow ran but also, as far as the screen shows, which work units were created inside it and which model each one was assigned to.
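To make the idea of phases and work units concrete, here is a toy Python sketch of a Review phase that fans out to narrow "finder" agents, followed by a Verify phase that checks each pooled finding. This is not how Claude Code implements Dynamic Workflows; the screen only showed agent counts per phase. The `Finding`, `run_review`, and `run_verify` names are my own, each "agent" is just a plain function, and pairing one check with each finding is an assumption on my part.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    agent: str
    description: str


# A "finder" stands in for one sub-agent: it takes nothing and returns findings.
Finder = Callable[[], list[Finding]]


def run_review(finders: dict[str, Finder]) -> list[Finding]:
    """Review phase: run every finder concurrently and pool their findings."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda finder: finder(), finders.values()))
    return [finding for batch in batches for finding in batch]


def run_verify(findings: list[Finding], check: Callable[[Finding], bool]) -> list[Finding]:
    """Verify phase: run one check per finding (assumed) and keep confirmed ones."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(check, findings))
    return [f for f, ok in zip(findings, verdicts) if ok]


# Toy run with two stub finders; one legitimately finds nothing.
finders = {
    "find:sec-authz": lambda: [
        Finding("find:sec-authz", "admin route skips role check"),
        Finding("find:sec-authz", "might be risky in general"),
    ],
    "find:sec-secrets": lambda: [],
}
pooled = run_review(finders)
confirmed = run_verify(pooled, check=lambda f: "role check" in f.description)
print(len(pooled), len(confirmed))  # 2 1
```

Splitting discovery from verification like this keeps each finder narrow, while a separate step decides what survives into the final report.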

Personally, what I found most helpful was the content displayed when selecting an individual agent.

Roughly speaking, it had a three-part structure:

Prompt · 20 lines
  You are auditing this codebase...
  FOCUS: Frontend & route-level security.
  Report each finding with file+line evidence.

Activity · last 3 of 50 tool calls
  Grep(...)
  Read(...)
  StructuredOutput

Outcome
  Four confirmed defects, each with concrete file+line evidence:
  ...

Prompt shows the instructions passed to that agent, including its assigned perspective and investigation scope.

This part was visible in full.

Activity is the execution history of what tool calls were actually made.

However, only the tail end is shown (for example, "last 3 of 50 tool calls"), so there did not seem to be a way to list every tool call.

As a result, I could not reconstruct the complete execution log from Activity. It is best treated as a clue to "what kind of work the agent was doing."

Outcome is the result finally returned by that agent.

In the example I saw this time, the detected defects were summarized here.

Rather than simply saying "there seems to be a problem," it read like an audit report, with information such as:

  • What the problem is.
  • Which file/line it is in.
  • What the severity or classification is.
  • Why it is a problem.
  • How to fix it.

In other words, the Outcome I saw this time was less an agent's work log than a deliverable that could be handed on to the subsequent fix work.

Because it was in this three-part structure, I felt it was easy to trace back "what was instructed, what kind of work was done, and what was returned."
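As a rough mental model of the three-part view, the sketch below records one agent's Prompt, Activity, and Outcome, keeping only the last few tool calls the way the screen did. `AgentView` and its methods are hypothetical names of my own, not a Claude Code API; the default of 3 visible calls simply mirrors the "last 3 of 50" example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentView:
    """Hypothetical model of one agent's detail view: Prompt / Activity / Outcome."""

    prompt: str
    visible_calls: int = 3  # how many trailing tool calls the view keeps
    outcome: str = ""
    total_calls: int = field(default=0, init=False)
    _tail: deque = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self) -> None:
        # A bounded deque silently drops the oldest entry once it is full.
        self._tail = deque(maxlen=self.visible_calls)

    def record(self, tool_call: str) -> None:
        self.total_calls += 1
        self._tail.append(tool_call)

    def activity(self) -> list[str]:
        return list(self._tail)

    def activity_header(self) -> str:
        return f"Activity · last {len(self._tail)} of {self.total_calls} tool calls"


view = AgentView(prompt="FOCUS: Frontend & route-level security.")
for _ in range(47):
    view.record("Grep(...)")
for call in ("Grep(...)", "Read(...)", "StructuredOutput"):
    view.record(call)
view.outcome = "Four confirmed defects, each with concrete file+line evidence: ..."
print(view.activity_header())  # Activity · last 3 of 50 tool calls
```

The total count survives even though the earlier calls do not, which matches why Activity works as a clue rather than a full log.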

What Was Interesting

What left the strongest impression was that Dynamic Workflows keeps the decomposition of the work visible, not just the final answer.

Usually, when you ask an AI agent to do a job, you tend to judge its quality only by looking at the final answer.

With Dynamic Workflows, however, you can check at least the following three things separately, within what the screen displays:

  • What prompt the agent was launched with.
  • What kind of work the agent performed.
  • What it ultimately returned as a deliverable.

This makes it easier to review the task design itself, not just the results.

For example, when an Outcome is not good, you can break it down in the following ways rather than simply saying "the AI's accuracy is low":

  • Was the focus of the Prompt vague?
  • Judging from the Activity, does it seem like sufficient investigation wasn't performed?
  • Was the way the Outcome was summarized poor?
  • Was the division of agents itself too coarse?

Of course, since the tool calls visible in Activity are only a portion, you cannot fully verify the execution content.

Even so, it is easier to think about where the failure might have occurred compared to just looking at the final answer.

Task Design Visible from Agent-Specific Prompts

The prompts I saw this time were not abstract work instructions but concrete instructions tailored to each agent.

There were several points that seemed useful when writing instructions for agents myself.

Specific Perspectives for Each Agent

Each agent was given a narrowly defined perspective.

For example, instead of vaguely covering security as a whole, the work was split into perspectives such as authorization, secrets, input validation, and the frontend.

This is close to how we often divide tasks in human reviews.

By dividing responsibilities by perspective rather than having one person look at everything, it becomes clearer what should be examined.

Agents, too, seemed to work better with a request like "focus on authorization boundaries" than with a broad one like "review the security."
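Here is a minimal sketch of splitting one broad security request into per-perspective agent specs. Only `find:sec-authz` and `find:sec-secrets` appeared on the screen; the other keys, the `AgentSpec` class, and the `FOCUS:` line format (borrowed from the prompt excerpt shown earlier) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str   # agent label, modeled on names like "find:sec-authz"
    focus: str  # the one perspective this agent is responsible for


# Illustrative split of "review the security" into narrow perspectives.
# Only sec-authz and sec-secrets appeared on screen; the rest are hypothetical.
PERSPECTIVES = {
    "sec-authz": "authorization boundaries",
    "sec-secrets": "handling of secrets and credentials",
    "sec-input": "input validation",
    "sec-frontend": "frontend and route-level security",
}


def split_by_perspective(perspectives: dict[str, str]) -> list[AgentSpec]:
    """Create one narrowly focused agent per perspective."""
    return [AgentSpec(f"find:{key}", focus) for key, focus in perspectives.items()]


def focus_line(spec: AgentSpec) -> str:
    return f"FOCUS: {spec.focus}."


specs = split_by_perspective(PERSPECTIVES)
for spec in specs:
    print(f"{spec.name}: {focus_line(spec)}")
```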

Narrowed Investigation Scope

The prompts specified which areas to look at and from what perspective.

When this is vague, the agent tends to look at everything broadly and superficially.

Conversely, when the investigation scope is narrowed, the Outcome tends to be more specific.

In the example this time, because the perspectives each agent should look at were divided, the Outcome was not just "a vague impression of the whole codebase," but a report focused on specific issues.
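Narrowing scope can be as simple as deciding up front which files an agent is allowed to look at. This sketch, with hypothetical paths and glob patterns, uses Python's `fnmatch.fnmatchcase` to build that list; I don't know how the actual workflow expressed scope, so treat this as one possible approach.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def files_in_scope(
    files: Sequence[str],
    include: Sequence[str],
    exclude: Sequence[str] = (),
) -> list[str]:
    """Keep files that match at least one include pattern and no exclude pattern.

    Note: in fnmatch patterns, "*" also matches "/" characters.
    """
    return [
        path
        for path in files
        if any(fnmatchcase(path, pattern) for pattern in include)
        and not any(fnmatchcase(path, pattern) for pattern in exclude)
    ]


# Hypothetical repository layout, for illustration only.
repo = [
    "app/api/users.py",
    "app/api/admin.py",
    "app/api/test_admin.py",
    "app/ui/button.tsx",
]
authz_scope = files_in_scope(repo, include=["app/api/*"], exclude=["*/test_*"])
print(authz_scope)  # ['app/api/users.py', 'app/api/admin.py']
```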

Strict Reporting Conditions

What impressed me was how strictly the reporting conditions were written.

The agents were instructed to return only defects that could be explained with actual files and line numbers, not "potential concerns."

I think this is important.

If you are not careful, agent reviews easily drift into general advice and speculative possibilities.

"This implementation might be dangerous."
"This design could become a problem."
"Generally, you should do this."

Sometimes these suggestions are useful.

However, when the results of a defect audit are meant to feed into fix work, evidence matters more than conjecture.

The prompts I saw this time appeared to control this explicitly in their instructions.
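The same reporting rule can also be enforced after the fact. Here is a minimal sketch, assuming each candidate finding carries an optional file and line (the `Candidate` type and the example paths are made up for illustration): anything without a concrete location is dropped as conjecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    description: str
    file: Optional[str] = None  # path the finding points at, if any
    line: Optional[int] = None  # 1-based line number, if any


def has_evidence(candidate: Candidate) -> bool:
    """A candidate counts only if it names a concrete file and line."""
    return bool(candidate.file) and candidate.line is not None and candidate.line > 0


def reportable(candidates: list[Candidate]) -> list[Candidate]:
    return [c for c in candidates if has_evidence(c)]


candidates = [
    Candidate("Admin route skips the role check", "app/api/admin.py", 42),
    Candidate("This design could become a problem"),  # conjecture: no location
    Candidate("Generally, you should do this", "app/api/users.py"),  # no line
]
kept = reportable(candidates)
print([c.description for c in kept])  # ['Admin route skips the role check']
```

A filter like this is a safety net, not a substitute for stating the rule in the prompt; the prompt shapes what the agent looks for in the first place.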

Allowing for "Empty Hits"

If nothing was found, the agents were allowed to return an empty result rather than forcing something out.

This struck me as important too, in a subtle way.

In my experience asking agents for reviews, when the task is framed so that they "must find something," they tend to manufacture weak points.

They report things that aren't actually major as if they were defects.

Conversely, by allowing for empty hits, it becomes easier to steer them in the direction of "only reporting what can be said with evidence."

This workflow appeared to use this approach to make the Outcome more reliable.
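If the agent returns structured output, "empty hits" can be made an explicitly valid answer. The sketch below assumes a hypothetical JSON shape, `{"findings": [...]}`, which I chose for illustration; an empty list parses cleanly, while a finding without file and line evidence is rejected.

```python
import json


def parse_outcome(raw: str) -> list[dict]:
    """Parse {"findings": [...]}; an empty list is a valid, complete answer."""
    data = json.loads(raw)
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        raise ValueError("outcome must contain a 'findings' list")
    for finding in findings:
        has_file = isinstance(finding, dict) and bool(finding.get("file"))
        has_line = isinstance(finding, dict) and isinstance(finding.get("line"), int)
        if not (has_file and has_line):
            raise ValueError(f"finding lacks file+line evidence: {finding!r}")
    return findings


def summarize(findings: list[dict]) -> str:
    # Report "nothing found" plainly instead of padding with weak suggestions.
    if not findings:
        return "No confirmed defects in scope."
    return f"{len(findings)} confirmed defect(s)."


print(summarize(parse_outcome('{"findings": []}')))  # No confirmed defects in scope.
```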

Outcome Is in a Fixable Form

The Outcome was not just a simple summary.

For found defects, it included what the problem was, where it was, why it was a problem, and even how to fix it.

This was very helpful.

When asking an agent for a review, I tend to just say "find problems."

But to actually connect it to the next step, a "list of problems" is not enough.

At the very least, having the following information makes it easier to enter the fix phase:

  • Target file
  • Target line
  • Description of the problem
  • Scope of impact
  • Severity
  • Classification
  • Fix plan

The Outcome I saw this time was in a form that is easy to pass on to subsequent fix work.

I wanted to incorporate this when writing CLAUDE.md or custom commands myself.
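As a sketch of such a format, here is a hypothetical `Finding` record holding the fields listed above, serialized to JSON with the most severe items first. The three-level severity scale and the example values are my own assumptions, not what the Outcome actually used.

```python
import json
from dataclasses import asdict, dataclass

# Assumed three-level scale; a lower rank means more urgent.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    file: str
    line: int
    description: str
    impact: str
    severity: str  # one of SEVERITY_RANK's keys
    classification: str
    fix_plan: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {self.severity!r}")


def to_handoff(findings: list[Finding]) -> str:
    """Serialize findings as JSON, most severe first, for the fix phase."""
    ranked = sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
    return json.dumps([asdict(f) for f in ranked], indent=2)


findings = [
    Finding("app/ui/form.tsx", 88, "Email input has no label",
            "Screen readers cannot announce the field", "low", "ux-a11y",
            "Associate a <label> with the input"),
    Finding("app/api/admin.py", 42, "Admin route skips the role check",
            "Any signed-in user can call admin endpoints", "high", "sec-authz",
            "Check the admin role before handling the request"),
]
print(to_handoff(findings))
```

Validating severity at construction time means a malformed finding fails early instead of surfacing later in the fix phase.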

The Value of Reading Prompts

The value of reading the prompts for each agent displayed in Dynamic Workflows is not about "knowing exactly what Claude Code is doing behind the scenes."

Since Activity only shows a part of the process, it is not possible to fully reconstruct the internal execution process.

Nevertheless, by looking at the displayed Prompt / Activity / Outcome, you can observe how Claude Code tries to break down complex tasks.

In the prompts I saw this time, at least the following points were explicitly stated:

  • Agent's responsible perspective
  • Scope of investigation targets
  • Types of defects to find
  • Files or perspectives to read
  • Conditions for reporting
  • How to handle cases with no results
  • Format of the deliverable

This structure is not exclusive to Dynamic Workflows.

It can be applied as-is when writing CLAUDE.md or custom commands.

When writing instructions for agents, "what you want them to do" is not enough.

Rather, what is important is "how you want them to proceed."

Taking It Back to CLAUDE.md or Custom Commands

From the prompts I saw, the following structure seems worth incorporating into my own instructions:

  • Responsibility: What perspective is this agent looking from?
  • Scope: Which files or areas should it look at?
  • Perspective: What is considered a defect?
  • Evidence: Grounds required for reporting, such as files and line numbers.
  • Exclusions: Do not output conjectures or general theories.
  • Empty hits: Return empty if nothing is found.
  • Format: Return in a form that can be passed to subsequent tasks.

For example, if you are writing a custom command for a review, rather than just writing:

Review the security.

I thought it would be better to write it like this:

Focus your review on authorization boundaries.

Report only issues that can be explained with actual files and line numbers.
Do not include suggestions based only on conjecture or general theories.
If there are no relevant findings, return an empty result.

Include the problem description, target file, target line, impact, severity, and fix plan for each finding.

This is not the exact prompt I saw this time.

However, it is a pattern that I thought I could take back to my own instructions after seeing the agent-specific prompts displayed in Dynamic Workflows.

The point is to write down not just "what to look at," but also "what counts as a finding," "what not to report," and "in what form to return it."

I think this alone can change agent output significantly.

Summary

This time, I looked at how work was broken down in Claude Code's Dynamic Workflows when I asked it to "look for security and UI/UX defects" in the product I am building.

The biggest takeaway was being able to see part of the flow leading up to the final findings, not just the findings themselves.

I was able to confirm which agents were launched, what prompts were passed, what perspectives were used to search for findings, and how the Outcome was summarized.

Of course, since the tool calls visible in Activity are only a portion, you cannot verify the entire execution process.

Still, by being able to see Prompt / Activity / Outcome separately, it becomes much easier to observe the agent's task design.

Beyond being a convenient feature, Dynamic Workflows is also interesting as material for studying how Claude Code decomposes complex tasks.

In the future, when I write CLAUDE.md or custom commands myself, I want to design them to include the responsible perspective, scope, reporting conditions, handling of empty hits, and the Outcome format, just as I saw this time.

In Closing

I occasionally write on X about things I've tried with Claude Code and agent development.

Feel free to follow me if you'd like.

https://x.com/yamk_fuu_k
