Capturing AI Agent Problem-Solving Outcomes in a Reusable Format
I want to reuse the steps executed by the agent
Suppose you ask an agent, "Extract articles matching specified keywords from the AWS What's New RSS feed."
The agent receives the request and returns the articles.
At this point, don't you want to reuse the procedure the agent just executed? Don't you want to turn the steps that led it to the final result into a script you can run again?
With that in mind, I tried out a mechanism that records and replays the steps the agent executed, and that captures the means by which the agent reached the final result as an artifact.
The idea I tried

Mechanism to record AI work processes and extract them as reusable Python scripts
- Artifacts as Python scripts - Have the agent deliver the implementation as a Python script, a directly reusable format.
- Record the rationale for the implementation - Have the agent record why the final implementation turned out the way it did.
- Tool: Python REPL only - To guarantee that the final result is obtained via a Python script, give the agent only a Python REPL: a single tool that receives a script, executes it, and returns the result.
- Separate work into "exploration" and "final function acquisition" - Clearly divide the agent's tasks so the artifact doesn't include unnecessary implementation.
- Execution records from trace logs - To let the agent focus on exploration and acquiring the final function, don't make the agent keep its own records; retrieve them from the trace logs the SDK provides.
- Clarify prompt roles - Describe only the requirements (functional/non-functional) in the system prompt, and only the problem to be solved with the artifact in the user prompt.
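The "Python REPL only" constraint above can be sketched in a framework-agnostic way. The tool below is a hypothetical minimal stand-in, not the Strands Agents implementation: it receives a Python script, executes it in a shared namespace (so state persists across calls), and returns the captured output.

```python
import io
import contextlib

def python_repl(code: str, namespace: dict) -> str:
    """Hypothetical minimal REPL tool: execute code in a shared
    namespace and return captured stdout (or the error message)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)
    except Exception as exc:
        return f"Error: {exc!r}"
    return buf.getvalue()

# State set in one call is visible in the next, like a real REPL session.
ns = {}
python_repl("x = 2 + 3", ns)
print(python_repl("print(x)", ns))  # prints "5"
```

Because every tool call is a script plus its result, the trace log naturally becomes a replayable sequence of code cells.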
The scenario I tried
I used a format where the system prompt contains "requirements (functional/non-functional)" and the user prompt contains the "specific problem to be solved."
Data prepared
- AWS What's New RSS feed (XML format)
System prompt (Excerpts of main elements)
- Available tools: python_repl only
- Data source location and format (RSS 2.0, description of each field)
- Expected output format (JSON)
- Specification of extraction conditions (period, keywords, deduplication, sort order, etc.)
- Finally obtain as a single reusable function
- Include rationale for design decisions as comments in the function
※ Refer to main.py for the actual system prompt.
User prompt (Problem)
I want to develop a function that automatically extracts articles related to generative AI and supporting the Tokyo region from the AWS news feed (XML format) for the second half of 2025.
Technology stack used
- AI agent implementation and trace recording: Strands Agents
- Destination for reproducing the agent's exploration process and artifacts: Jupyter Notebook
- Information extraction from trace records to Notebook: Custom implementation via Python script
Outputs obtained
- Trace file (trace.jsonl): A complete record including all of the agent's thoughts, executions, and failures.
- Jupyter Notebook (agent_replay.ipynb):
  - An execution history expressing the agent's "exploration" process chronologically as Python scripts.
  - The complete implementation of the "final function" the agent acquired, along with its design decisions and rationale.
This was exactly what I wanted. Refer to agent_replay.ipynb for details. No human intervention was involved. The recording of the agent's execution results into the Jupyter Notebook was performed mechanically by a Python script.
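The mechanical trace-to-Notebook step can be sketched roughly as follows. The record layout here (one JSON object per line, with a hypothetical `tool_input` field holding the executed code) is an assumption for illustration, not the actual Strands Agents trace schema; a real converter would map these dicts into a full nbformat document.

```python
import json

def trace_to_cells(trace_path: str) -> list:
    """Collect the code the agent executed from a JSONL trace file
    and shape each script as a notebook-style code cell dict.
    The 'tool_input' field name is an assumed example, not the
    real Strands Agents schema."""
    cells = []
    with open(trace_path) as f:
        for line in f:
            record = json.loads(line)
            code = record.get("tool_input")
            if code:  # skip records that carry no executed code
                cells.append({
                    "cell_type": "code",
                    "metadata": {},
                    "outputs": [],
                    "execution_count": None,
                    "source": code,
                })
    return cells
```

No LLM is involved in this step: it is a plain transformation from the trace file to notebook cells.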
Obtained execution history
How the agent proceeded with exploration
The agent reached the final implementation by taking the following steps:
- Confirmation of data source (number of files, size)
- Analysis of XML structure (element names, hierarchical structure)
- Investigation of date format (checking RFC 2822 format)
- Prototyping and verification of filtering logic
- Formatting into the final function and recording design decisions
Specific examples of exploration (excerpts)
For example, in the XML structure analysis phase, the following Python code was executed:
```python
# Check the structure of the XML file
import xml.etree.ElementTree as ET

xml_file = xml_files[0]
tree = ET.parse(xml_file)
root = tree.getroot()

# Check the root element
print(f"Root tag: {root.tag}")
print(f"Root attribs: {root.attrib}")

# Check the first item to understand the structure
for item in root.findall('.//item')[:1]:
    print("\nFirst item structure:")
    for child in item:
        text_preview = child.text[:100] if child.text else "(empty)"
        print(f"  <{child.tag}>: {text_preview}...")
```
Advantages of being written in Python
By recording the exploration process as Python code:
- No ambiguity: Actual executed code remains instead of natural language descriptions.
- Reproducible: Intermediate steps of the exploration can be re-executed as they are.
- Easy to verify: There is no room to doubt whether these steps were actually used for confirmation; the executed code itself is the record.
- Helpful for understanding: You can read from the code how the agent thought and what it verified.
Refer to agent_replay.ipynb for the detailed exploration process.
The acquired final function
Implementation design decisions and rationale
In the Notebook, the design decisions and their rationale are recorded just before the final function. For example:
### Implementation design decisions and rationale
- Library selection:
  - xml.etree.ElementTree (standard library, the standard choice for RSS/XML parsing)
  - email.utils.parsedate_to_datetime (standard for RFC 2822 date parsing)
  - json (standard for the output format)
- Data structure:
  - dict/list (JSON-serializable, memory efficient)
  - Deduplicate with a set, then convert to a list (guarantees order)
- Algorithm processing order:
  1. Parse the XML files in full (supports multiple files)
  2. Date filter (efficient narrowing down)
  3. Keyword matching (topic AND region conditions)
  4. Deduplication by guid
  5. Sort by pubDate in descending order
- Error handling:
  - Safely skip invalid date formats and empty elements
  - On XML parse error, skip the affected file and continue processing
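Two of these recorded decisions, safe RFC 2822 date parsing and order-preserving deduplication, can be illustrated in isolation. This is a sketch of the rationale, not the agent's exact code:

```python
from email.utils import parsedate_to_datetime

def parse_pub_date_safe(date_str):
    """Parse an RFC 2822 pubDate such as 'Mon, 07 Jul 2025 17:00:00 GMT'.
    Return None on invalid input so the caller can safely skip the item,
    matching the 'skip invalid date formats' decision above."""
    try:
        return parsedate_to_datetime(date_str)
    except (TypeError, ValueError):
        return None

def dedupe_keep_order(guids):
    """A bare set() would not guarantee order, so track seen guids in a
    set while keeping the first occurrence of each in list order."""
    seen = set()
    return [g for g in guids if not (g in seen or seen.add(g))]
```

Both helpers rely only on the standard library, consistent with the library-selection rationale.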
Function structure
The generated function has a structure like the following (excerpt):
```python
def extract_aws_articles(
    data_dir: Path,
    from_date: datetime,
    to_date: datetime,
    topic_keywords: List[str],
    region_keywords: List[str]
) -> Dict[str, Any]:
    """Filters and extracts AWS news articles from XML files"""

    # Helper function: Keyword matching
    def contains_keyword(text: str, keywords: List[str]) -> bool:
        ...

    # Helper function: Date conversion
    def parse_pub_date(date_str: str) -> tuple:
        ...

    # Parse XML files
    for xml_file in xml_files:
        for item in root.findall('.//item'):
            # Date filter, keyword matching, deduplication
            ...

    # Sort and return results
    return {"articles": [...], "summary": {...}}
```
Refer to agent_replay.ipynb for the detailed implementation.
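As one plausible reading of the elided `contains_keyword` helper in the skeleton above (an assumption for illustration; the actual implementation is in agent_replay.ipynb), a case-insensitive substring match would look like this:

```python
from typing import List

def contains_keyword(text: str, keywords: List[str]) -> bool:
    """Case-insensitive substring match against a keyword list.
    Empty or missing text never matches, so items with empty
    elements are skipped safely."""
    if not text:
        return False
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)
```

Case-insensitivity matters here because the keyword lists mix forms like 'all regions' and 'all aws regions' while feed titles use their own capitalization.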
How to execute the function
The Notebook also includes an execution example with initial values set to the parameters the agent used to obtain the final result:
```python
# Parameter settings
data_dir_path = Path('data/01_raw')
from_date = datetime(2025, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
to_date = datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
topic_keywords = [
    'Bedrock', 'SageMaker', 'Claude', 'LLM', 'foundation model', 'generative AI',
    'embedding', 'fine-tuning', 'inference', 'RAG', 'prompt', 'model training',
    'neural network', 'transformer', 'deep learning', 'machine learning'
]
region_keywords = [
    'Tokyo', 'ap-northeast-1', 'Asia Pacific (Tokyo)', 'available in Tokyo',
    'Tokyo region', 'all regions', 'all aws regions'
]

# Function execution
result = extract_aws_articles(
    data_dir=data_dir_path,
    from_date=from_date,
    to_date=to_date,
    topic_keywords=topic_keywords,
    region_keywords=region_keywords
)
print(json.dumps(result, indent=2, ensure_ascii=False))
```
Since it is a Jupyter Notebook, you can execute this code as it is.
Advantages of obtaining the final function this way
- Proven code: You can obtain code that has already successfully solved the problem.
- Reusable without an LLM: If the code can be reused as-is, the problem can be solved with nothing but a Python execution environment.
- Logic transparency: You can clearly explain the logic by which the agent arrived at the final result it presented.
Summary and reflection
I see real value in obtaining the solution the agent actually used, as Python code in a reusable state. Getting the design rationale together with the implementation as a set was a result beyond my expectations.
By providing only the Python REPL as a tool, I obtained a clearer exploration history than expected. Being able to explain and analyze the agent's problem-solving process was an unexpected gain. Since the exploration process remains clearly visible, it might even serve as educational material for problem-solving.
Reproduction steps and details are on GitHub. If you want to see the actual Notebook, please check the repository.
GitHub Repository