Translated by AI
Fully Automated Approval Workflow System with AI Agents (Developed with AI-DLC) [Microsoft Agent Hackathon 2026]
Introduction
I usually design and develop AWS serverless architectures, explore the application of AI-driven development, and propose, design, and develop advanced AI utilization systems using tools like n8n and workflow platforms.
When the opportunity arose to work on an Azure-based project, the Microsoft Agent Hackathon 2026 was taking place. I decided to dive in and get some hands-on experience with Azure.
I focused not just on "AI" or "AI Agents," but on a three-layer architecture of AI × Workflow × Rule-based systems, prioritizing "how to drive practical business operations" over pure technical novelty.
What I Built: An Automated Internal Approval System (Ringi)
In Japanese companies, drafting and checking approval documents (Ringi) takes an average of 1 to 2 hours per request, and the approval flow often stalls because it depends on specific individuals and passes through multiple stages.
There is significant room for improvement here, as evidenced by cases like Miyazaki Bank, which used Azure OpenAI to automate Ringi document generation and reduced the process time by 95% (from 2 hours to a few minutes).
However, simply automating document creation or providing AI-based review assistance does not significantly change the "total time to approval."
Ultimately, human interaction with approvers and waiting times remain bottlenecks.
What I built is an automation of the "entire review and approval flow," including those bottlenecks.
Requests that the rules can reject are sent back immediately, low-risk requests are auto-approved, and only the rest are routed to humans. Building this branching into the workflow reduces the number of requests that actually have to wait for a human approver.
Demo Video for the Hackathon
Flow
Applicant chats with AI → AI automatically generates the Ringi document
→ Rule engine checks format, content, and policies
→ AI reviews content validity
→ If conditions are met, auto-approve; otherwise, notify the approver
→ Approver approves/rejects from the screen
→ Email notification to the applicant
👀 From application to approval completion, everything except the approver's final decision is fully automated.
Capabilities by Role
Applicant
- Ringi documents are automatically generated just by chatting.
- The generated draft can be reviewed and corrected via a form.
- After applying, the progress can be tracked in real-time on a status screen.
- Notifications for returned (sent-back), approved, or denied requests are sent via email.
Approver
- Direct access to the approval screen via the URL sent in the email.
- A comprehensive view of the application content, rule evaluation results, AI review reasoning, and concerns.
- The flow is completed simply by inputting approval/rejection and comments.
Administrator
- Ability to view a list of statuses for all applications.
- Audit logs (who made what decision and when) can be tracked chronologically for each application.
Why AI Alone Wasn't Enough
This is the core of this design.
At first, I thought, "Why not let AI do everything?"
But as I worked through the design, three problems became apparent.
Problem 1: AI judgment accuracy is too "probabilistic"
When you have AI decide things like "whether a required field is empty" or "whether the amount is 100,000 yen or less," it occasionally makes mistakes. Writing these checks as explicit rules in code is deterministic: the same input always produces the same result.
Problem 2: It's hard to trace decisions with only AI Agents
It becomes difficult to investigate later "why this request was rejected" or "at which step and by whom the decision was made." It also carries the risk of hallucinations and does not meet the level required for audit trails.
Problem 3: The trade-off between automation level and control
If you rely only on AI, you tend to get stuck with a binary choice: "let the AI decide everything" or "send everything to humans." In reality, you want a separation where "obviously OK requests are auto-approved, while gray zones are judged by humans."
Solution: A Three-Layer Design (AI × WF × Rule-based)
[Applicant]
↓
[Rule Engine] ← Mechanically judges format, content, policy, and risk (0 or 1)
↓
[AI Review] ← Evaluates the validity of qualitative content (probabilistic)
↓
[Workflow] ← Branching, notifications, and state management based on results (deterministic)
↓
[Human (Approver)] ← Final decision (referencing AI's review reasons)
- Things with clear rules → Rule engine (Python)
- Things requiring qualitative judgment → AI (Azure OpenAI)
- State management, notifications, routing → Workflow (Azure Logic Apps)
By using these three layers:
✅ Rule-based parts are 100% accurate
✅ AI judgment is used as "support," and humans make the final decision
✅ Every step is traceable via Logic Apps execution logs + audit tables
The route map below shows which requests take which route.
For the hackathon, I kept both the rules and the application flow simple.
Route Map
| Condition | Rule Engine Result | AI Review Result | Final Decision |
|---|---|---|---|
| Required field missing | REJECTED | (Not called) | Immediate rejection |
| Amount negative or 0 yen | REJECTED | (Not called) | Immediate rejection |
| Business trip request over 300k yen | REJECTED | (Not called) | Immediate rejection |
| Purchase request but no vendor name | REJECTED | (Not called) | Immediate rejection |
| Amount ≤ 100k, no risk, no new vendor | APPROVED (Auto-approve) | Approval recommended or Review needed | Auto-approve (Section Chief level) |
| Amount 100k–1M, no risk | PENDING_REVIEW | Approval recommended | Notify Dept. Manager → Human judgment |
| Amount over 1M | PENDING_REVIEW | Review needed | Notify Executive → Human judgment |
| New vendor registration (any amount) | PENDING_REVIEW | Review needed | Notify Executive → Human judgment |
| Emergency request (within 7 days), high amount, sensitive keywords | PENDING_REVIEW | Review needed | Notify approver (with 'Attention' flag) |
| Vague description of purpose/effects | APPROVED or PENDING_REVIEW | Rejection recommended | Reject via AI judgment |
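The branching in the route map can be summarized as a small decision function. This is only a sketch in Python for illustration; in the system described here the branching lives in the Logic Apps workflow, and the enum and function names below (`RuleResult`, `AiRecommendation`, `FinalDecision`, `decide`) are hypothetical.

```python
from enum import Enum
from typing import Optional


class RuleResult(Enum):
    REJECTED = "REJECTED"
    APPROVED = "APPROVED"
    PENDING_REVIEW = "PENDING_REVIEW"


class AiRecommendation(Enum):
    APPROVE = "Approval recommended"
    REVIEW = "Review needed"
    REJECT = "Rejection recommended"


class FinalDecision(Enum):
    IMMEDIATE_REJECTION = "Immediate rejection"
    AI_REJECTION = "Reject via AI judgment"
    AUTO_APPROVE = "Auto-approve"
    HUMAN_REVIEW = "Notify approver"


def decide(rule: RuleResult, ai: Optional[AiRecommendation]) -> FinalDecision:
    """Combine the rule engine result and the AI review result (hypothetical helper)."""
    # A rule rejection short-circuits: the AI review is never called.
    if rule is RuleResult.REJECTED:
        return FinalDecision.IMMEDIATE_REJECTION
    if ai is None:
        raise ValueError("AI review result is required unless the rules rejected")
    # A rejection recommendation from the AI wins over either rule outcome.
    if ai is AiRecommendation.REJECT:
        return FinalDecision.AI_REJECTION
    # Rule-approved requests are auto-approved; everything else goes to a human.
    if rule is RuleResult.APPROVED:
        return FinalDecision.AUTO_APPROVE
    return FinalDecision.HUMAN_REVIEW
```

Keeping the rule rejection as the first check mirrors the "(Not called)" cells in the table: no AI cost is incurred for requests the rules can already reject.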
Types of Risk Flags
There are 6 types of risk flags detected by the rule engine.
If there is even one flag, it will not be auto-approved and will enter the notification route to the approver.
| Flag Name | Condition |
|---|---|
| HIGH_AMOUNT | Amount over 1 million yen |
| NEW_VENDOR | Application category is new vendor registration |
| VENDOR_REASON_MISSING | No vendor selection reason & amount over 500k yen |
| URGENT_REQUEST | Scheduled date is within 7 days from application date |
| NO_ATTACHMENT | No attached documents & amount over 300k yen |
| SENSITIVE_KEYWORD | Subject/Purpose contains "reimbursement," "private expense," "personal," or "entertainment expenses" |
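The six flag conditions translate almost directly into code. The following is a minimal sketch, not the actual RiskDetector: the `RiskInput` fields, the category string "New Vendor Registration", and the English keyword list (the original presumably matches Japanese terms) are all assumptions made for illustration, as is treating "within 7 days" as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: English stand-ins for the keywords named in the table.
SENSITIVE_KEYWORDS = ("reimbursement", "private expense", "personal", "entertainment expenses")


@dataclass(frozen=True)
class RiskInput:
    """Hypothetical subset of a Ringi document needed for risk detection."""
    amount: int
    category: str
    application_date: date
    scheduled_date: date
    subject: str = ""
    purpose: str = ""
    vendor_reason: Optional[str] = None
    has_attachment: bool = False


def detect_risks(req: RiskInput) -> List[str]:
    """Return the names of all risk flags raised by the request."""
    flags: List[str] = []
    if req.amount > 1_000_000:
        flags.append("HIGH_AMOUNT")
    if req.category == "New Vendor Registration":
        flags.append("NEW_VENDOR")
    if not req.vendor_reason and req.amount > 500_000:
        flags.append("VENDOR_REASON_MISSING")
    # Assumption: a scheduled date exactly 7 days out still counts as urgent.
    if (req.scheduled_date - req.application_date).days <= 7:
        flags.append("URGENT_REQUEST")
    if not req.has_attachment and req.amount > 300_000:
        flags.append("NO_ATTACHMENT")
    text = f"{req.subject} {req.purpose}".lower()
    if any(keyword in text for keyword in SENSITIVE_KEYWORDS):
        flags.append("SENSITIVE_KEYWORD")
    return flags
```

An empty list means the request remains an auto-approval candidate; any non-empty list sends it to the approver notification route.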
Approver Routing
Approvers are automatically determined based on the amount.
Up to 100,000 yen → Section Chief (Auto-approval candidate)
Up to 1,000,000 yen → Dept. Manager
Over 1,000,000 yen → Executive
New Vendor → Executive (Regardless of amount)
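As a sketch, the routing above fits in a few lines. The function name `route_approver` and the string return values are hypothetical; the thresholds are the ones listed above, with "up to" treated as inclusive.

```python
def route_approver(amount: int, is_new_vendor: bool = False) -> str:
    """Pick the approver level from the amount; new vendors always go to an executive."""
    if is_new_vendor or amount > 1_000_000:
        return "Executive"
    if amount > 100_000:
        return "Dept. Manager"
    # 100,000 yen or less: Section Chief level, which is also the auto-approval candidate.
    return "Section Chief"
```

Checking the new-vendor condition first keeps the "regardless of amount" rule from being shadowed by the amount tiers.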
Architecture Diagram

Technology Stack
| Layer | Technology |
|---|---|
| Frontend/Backend | Python Flask |
| AI Interaction | Azure AI Agent Service |
| AI Review/Generation | Azure OpenAI |
| Rule Engine | Azure Functions |
| Workflow | Azure Logic Apps |
| Data Persistence | Azure Table Storage |
| Secrets | Azure Key Vault |
| Monitoring | Application Insights + Log Analytics |
Operational Example Screenshots
Chat Screen (Collecting Application Information)


AI Auto-Rejection (Email Notification)

Approver Screen (Displaying AI Review Results and Decision Reasons)

Administrator Screen (Audit Log)

Logic Apps Workflow Execution Log

Implementation Details and Tips
Tip 1: Design the Rule Engine using MECE
I divided Ringi document checks into four independent categories.
RuleEvaluator
├─ FormChecker Format check (required fields, character counts, date formats)
├─ ContentChecker Content check (logical consistency for each application category)
├─ PolicyChecker Policy check (approver routing based on amount)
└─ RiskDetector Risk detection (high amount, emergency, new vendor, sensitive keywords)
Each checker is implemented as a pure function to make it easy to test.
# Good design: Depends only on arguments, no side effects
def check_form(doc: RingiDocument) -> CheckResult:
    messages = []
    if not doc.ApplicationDepartment:
        messages.append("Application department is required")
    status = CheckerStatus.FAIL if messages else CheckerStatus.PASS
    return CheckResult(checker="form", status=status, messages=messages)
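To show why a pure function is easy to test, here is a self-contained sketch that repeats the checker together with minimal stand-ins for the types it uses. The definitions of `RingiDocument`, `CheckResult`, and `CheckerStatus` are assumptions (the real ones surely carry more fields); only their names come from the snippet above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CheckerStatus(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


@dataclass(frozen=True)
class RingiDocument:
    # Assumption: only the field used by the checker is modeled here.
    ApplicationDepartment: str = ""


@dataclass(frozen=True)
class CheckResult:
    checker: str
    status: CheckerStatus
    messages: List[str] = field(default_factory=list)


def check_form(doc: RingiDocument) -> CheckResult:
    """Pure function: the result depends only on the document passed in."""
    messages = []
    if not doc.ApplicationDepartment:
        messages.append("Application department is required")
    status = CheckerStatus.FAIL if messages else CheckerStatus.PASS
    return CheckResult(checker="form", status=status, messages=messages)


# No mocks, no Azure connection: build an input, call the function, compare.
missing = check_form(RingiDocument())
filled = check_form(RingiDocument(ApplicationDepartment="Sales"))
print(missing.status.value, filled.status.value)
```

Because nothing is read from storage or the network, the same checks can run in CI before any Azure resources exist.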
Rule thresholds are centralized in rules_config.py, so business rules can be changed without touching the checker logic.
# rules_config.py
AUTO_APPROVAL_MAX_AMOUNT = 100_000  # Auto-approval candidate for 100k yen or less
CATEGORY_AMOUNT_LIMITS = {
    "Business Trip Request": 300_000,  # Business trips capped at 300k yen
}
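One way the checkers might consume these constants is sketched below. The two helper functions are hypothetical; only the constant names and values come from rules_config.py. Passing the thresholds as default arguments keeps the helpers pure while still allowing tests to try other limits.

```python
from typing import Mapping, Sequence

# Values copied from rules_config.py so the sketch is self-contained.
AUTO_APPROVAL_MAX_AMOUNT = 100_000
CATEGORY_AMOUNT_LIMITS = {"Business Trip Request": 300_000}


def exceeds_category_limit(
    category: str,
    amount: int,
    limits: Mapping[str, int] = CATEGORY_AMOUNT_LIMITS,
) -> bool:
    """True when the category has a cap and the amount is above it."""
    limit = limits.get(category)
    return limit is not None and amount > limit


def is_auto_approval_candidate(
    amount: int,
    risk_flags: Sequence[str],
    max_amount: int = AUTO_APPROVAL_MAX_AMOUNT,
) -> bool:
    """True for a positive amount within the cap and with no risk flags raised."""
    return 0 < amount <= max_amount and not risk_flags
```

Changing a business rule then means editing one number in the config file; the functions themselves stay untouched.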
Tip 2: Use Structured Outputs for Deterministic Document Generation
I used Structured Outputs from Azure OpenAI so the JSON generated for the Ringi document doesn't change format every time.
response = client.chat.completions.create(
    model=deployment_name,
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": RINGI_SCHEMA  # Forces fixed JSON schema
    }
)
This eliminates the problem where the JSON structure returned by the AI varies.
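For reference, the value passed as `json_schema` is an object with a `name`, a `schema`, and optionally `strict`. The sketch below shows what `RINGI_SCHEMA` might look like; the property names other than `ApplicationDepartment` are invented for illustration and the real schema is certainly larger. In strict mode every property must be listed in `required` and `additionalProperties` must be `False`.

```python
import json

# Hypothetical schema: field names other than ApplicationDepartment are assumptions.
RINGI_SCHEMA = {
    "name": "ringi_document",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "ApplicationDepartment": {"type": "string"},
            "Subject": {"type": "string"},
            "Amount": {"type": "integer"},
            "Purpose": {"type": "string"},
        },
        "required": ["ApplicationDepartment", "Subject", "Amount", "Purpose"],
        "additionalProperties": False,
    },
}


def parse_ringi(content: str) -> dict:
    """Parse the model's message content and confirm the expected keys are present."""
    payload = json.loads(content)
    expected = set(RINGI_SCHEMA["schema"]["required"])
    if not isinstance(payload, dict) or set(payload) != expected:
        raise ValueError("Response does not match the Ringi schema keys")
    return payload
```

The key check in `parse_ringi` is a cheap defensive step on the server side; with strict structured outputs it should rarely fail, but it documents what downstream code relies on.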
Tip 3: Implement AI Completion Detection with JSON Flags
Determining "whether all necessary information has been collected" in chat is surprisingly difficult.
I included the following in the system prompt:
If you determine that all necessary information has been collected, you must return ONLY the following JSON:
{"is_complete": true, "summary": "Summary of collected information (in Japanese)"}
Return other normal responses in plain text.
The server side attempts to parse the JSON, and if is_complete == true, it judges the collection as complete. It's simple but worked quite stably.
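The server-side check can be sketched as follows. The function name `extract_completion` is hypothetical; the `is_complete` and `summary` keys come from the prompt above. Anything that is not valid JSON, or is JSON without `is_complete` set to `true`, is treated as an ordinary chat reply.

```python
import json
from typing import Optional


def extract_completion(reply: str) -> Optional[str]:
    """Return the summary if the reply is the completion JSON, otherwise None."""
    try:
        payload = json.loads(reply.strip())
    except json.JSONDecodeError:
        # Plain-text replies land here: the conversation simply continues.
        return None
    # Guard against valid JSON that is not the expected object (e.g. a bare number).
    if isinstance(payload, dict) and payload.get("is_complete") is True:
        return str(payload.get("summary", ""))
    return None
```

Callers should compare the result with `None` rather than test its truthiness, since an empty summary is still a completion signal.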
Tip 4: Display AI Decision Reasons on the Approver Screen
"I approved it because the AI said so" is not sufficient for auditing. The approver screen displays all of the AI's review results (recommendation), decision reasons (reasoning), and concerns.
{
  "recommendation": "Review needed",
  "reasoning": "The amount exceeds 500,000 yen, and there is no description of the vendor selection reason. HIGH_AMOUNT and VENDOR_REASON_MISSING flags have been detected.",
  "confidence": 0.72,
  "concerns": ["No vendor selection reason", "High amount scale"]
}
The approver can make the final judgment while seeing why the AI reached its recommendation. This is how the system implements "AI explainability."
Tip 5: Use Idempotency Guard to Prevent Duplicate Execution
Logic Apps retries can cause the same Ringi document ID to be sent to the rule engine multiple times.
I cache the result in Table Storage using ringi_id as the key, so if the same request arrives, I skip evaluation and return the same result.
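def get_cached_result(ringi_id: str) -> dict | None:
    # Assuming `table` is an azure.data.tables TableClient: get_entity raises on a miss
    try:
        entity = table.get_entity("rule_evaluation", ringi_id)
    except ResourceNotFoundError:  # from azure.core.exceptions
        return None
    return json.loads(entity["result_json"])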
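The overall guard, lookup first and evaluate only on a miss, can be sketched without any Azure dependency. Here a plain dict stands in for Table Storage, and `evaluate_once` and its parameters are hypothetical names.

```python
import json
from typing import Callable, Dict


def evaluate_once(
    ringi_id: str,
    document: dict,
    evaluate: Callable[[dict], dict],
    cache: Dict[str, str],
) -> dict:
    """Run `evaluate` at most once per ringi_id; later calls return the stored result."""
    cached = cache.get(ringi_id)
    if cached is not None:
        # Retry from Logic Apps: skip evaluation and return the same result.
        return json.loads(cached)
    result = evaluate(document)
    # Store as JSON text, mirroring the result_json column used above.
    cache[ringi_id] = json.dumps(result)
    return result
```

Note that this sketch does not handle two retries arriving at the same moment; with real storage, a conditional insert would be needed to close that gap.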
Tip 6: Prompt Loading on Demand
I designed it to load prompts from files each time.
def _load_prompt(filename: str) -> str:
    prompt_path = os.path.join(os.path.dirname(__file__), "prompts", filename)
    with open(prompt_path, encoding="utf-8") as f:
        return f.read()
Prompts can be improved without deployment. Loading local files takes less than 1ms, so there's no performance impact.
Tip 7: No secrets in code with Managed Identity + Key Vault Reference
All secrets (API keys, connection strings, etc.) are centralized in Azure Key Vault and referenced from App Settings via KV Reference.
// Configure KV Reference with IaC
{
  name: 'AZURE_OPENAI_API_KEY'
  value: '@Microsoft.KeyVault(SecretUri=https://kv-bakuchiku-dev.vault.azure.net/secrets/aoai-api-key/)'
}
There is no longer any need to write secrets in the code.
Trouble Points
AI Foundry RBAC
Even after assigning the Azure AI Developer role to the Azure Functions Managed Identity, I could not call the Agents API. What was actually required was the Foundry User role (53ca6127-db72-4b80-b1b0-d745d6d5456d), assigned at the AI Foundry Project scope.
resource aiFuncFoundryUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(aiProjectId, aiFuncId, 'FoundryUser')
  scope: aiProject
  properties: {
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/53ca6127-db72-4b80-b1b0-d745d6d5456d'
    principalId: aiFunc.identity.principalId
  }
}
I could not find this in the documentation, so tracking it down took a lot of time.
Logic Apps Webhook Pattern
At first it was unclear to me when the WaitForApproval (HTTPWebhook) action sends its subscribe request. The design I settled on exposes an /approvals/register-callback endpoint on the WebApp and saves the received callback_url to Table Storage, so the approval screen can later call that URL to resume the workflow.
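The two halves of this pattern can be sketched with the standard library alone. In this sketch a dict stands in for Table Storage, the function names and the JSON payload shape (`ringi_id`, `decision`, `comment`) are assumptions, and the request is only built, not sent. What is not an assumption is the mechanism: a Logic Apps HTTP Webhook action pauses until its callback URL receives a request.

```python
import json
import urllib.request
from typing import Dict


def register_callback(store: Dict[str, str], ringi_id: str, callback_url: str) -> None:
    """Handler body for the subscribe request: remember where to resume the workflow."""
    store[ringi_id] = callback_url


def build_resume_request(
    store: Dict[str, str],
    ringi_id: str,
    decision: str,
    comment: str = "",
) -> urllib.request.Request:
    """Build the POST that resumes the paused workflow once the approver has decided."""
    callback_url = store[ringi_id]  # KeyError means the workflow never subscribed
    body = json.dumps(
        {"ringi_id": ringi_id, "decision": decision, "comment": comment}
    ).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

In the web app, the approve/reject handler would pass the returned request to `urllib.request.urlopen` (or an equivalent HTTP client); keeping construction separate from sending makes the logic testable offline.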
Trade-offs Made (Hackathon Scope)
There are parts I intentionally decided to cut due to hackathon constraints. These are points I would like to fix for production grade.
Authentication/Security
- User authentication is handled by 5 users hardcoded in config.py (ideally Azure AD / Entra ID integration).
- Stack traces are returned as-is in error responses (prioritizing debug efficiency; remove in production).
AI/Workflow
- No AI API retries (returns an error immediately on timeout).
- Logic Apps email connector authentication uses OAuth2 (dependent on a personal Google account; use SendGrid or a shared account in production).
Testing
- E2E workflow testing (pytest) procedures are documented, to be executed after connecting to the Azure environment.
- Property-based testing against a production-like environment is out of scope for this project.
Operations
- Approver email mapping is statically defined in Logic Apps parameters (would like to make it dynamic via DB or settings screen in production).
- No monitoring alerts (checking via Application Insights as needed).
Future Expansion Vision
Improving Accuracy
- Feed past request data into RAG so the AI can reference how similar cases were typically judged.
- Add a settings screen to customize AI review prompts.
- Add branching to require human review if the AI judgment confidence score is low.
Improving Application Experience
- Add a "pre-consultation" mode before chatting to check "Is this request likely to pass?"
- A feature to pull similar content from past request history to help complete drafts.
Horizontal Expansion to Other Tasks
- I believe the same three-layer structure can be applied to "expense settlement exception requests," "hiring approval," "IT system change requests," etc., in addition to Ringi documents.
Summary
The biggest takeaway this time is the importance of a "design that doesn't try to let AI do everything."
- Things with clear rules → Rule engine (100% accurate, easy to trace)
- Qualitative judgment → AI (flexible but probabilistic, always visualize reasoning)
- State management/orchestration → Workflow (Logic Apps)
With these three layers, it became a system where humans can trace "why this judgment was made" at any time while maintaining a high level of automation.
I believe that by reusing the skeleton of these three layers and simply plugging in company-specific rules and judgment criteria, it can be applied to various tasks. Credit risk rules for finance, procurement policies for manufacturing, change management processes for IT services, etc. Since the flow structure can be reused just by rewriting the thresholds in rules_config.py and the AI prompts, I feel it has significant versatility.