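How to Build an Agent CLI

Published 2025/09/15
Theme: "What I did with AI-driven development"
Are you using an agent CLI? With Claude Code, Codex, and Gemini CLI, the world has returned to the terminal, and the space has become fiercely competitive.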

Claude Code

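Known for editor integration and a conversation-led coding experience. Available as a CLI and as an IDE extension, it works through task decomposition, refactoring, test generation, and similar work in dialogue.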

Codex

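Starting from the autocomplete and code-transformation context of the earlier code model, it helped spread the CLI style in which terminal operations and script generation are done conversationally.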

Gemini CLI

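Toolchains that connect an LLM to the local working context (files, logs) and nimbly run a question → action → follow-up question loop are now widely used.

This article explains concretely how to build **an "Agent CLI" implemented in Swift alone**.
The example is JARDIS, a lightweight agent I am developing in Swift.

JARDIS prioritizes running nimbly on iOS / iPadOS / macOS, supporting on-device and local-first operation, and treating memory (a graph DB) as a given.
The name JARDIS stands for Just A Rather Dope Intelligent System.

It is an homage to Iron Man's JARVIS.

I created several libraries in the course of building JARDIS.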
https://github.com/1amageek/SwiftAgent
https://github.com/1amageek/OpenFoundationModels
https://github.com/1amageek/OpenFoundationModels-Ollama
https://github.com/1amageek/kuzu-swift-extension
https://github.com/1amageek/swift-memory

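First, understand the role of each library

JARDIS's Agent CLI works through the cooperation of the following public libraries. Understanding each one's role up front keeps the design stable.

SwiftAgent

Composes Agents and Steps with a declarative DSL. It provides primitives such as Agent, @StepBuilder, Transform, GenerateText, Relay, and @Session, so that input → processing → output pipelines can be assembled in a type-safe way. The goal is to be "the simplest way in the world to write an agent."

OpenFoundationModels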

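An abstraction layer that provides an interface compatible with Apple's Foundation Models while letting you work with multiple providers (OpenAI, Anthropic, Ollama, Apple, and others) in the same style. It absorbs the differences between models and vendors.

I built it while reverse-engineering FoundationModels, which taught me a great deal.
https://developer.apple.com/documentation/foundationmodels
For FoundationModels itself, the WWDC videos are very easy to follow.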

https://developer.apple.com/jp/videos/play/wwdc2025/286/

https://developer.apple.com/jp/videos/play/wwdc2025/301/

OpenFoundationModels-Ollama

An implementation that treats a local or remote Ollama as an OpenFoundationModels provider. I also built OpenFoundationModels-MLX, so take a look at that as well.
https://github.com/1amageek/OpenFoundationModels-MLX

swift-memory

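A memory layer for working with Kùzu (an embedded graph DB) from Swift. It persists conversations, knowledge, tasks, and dependencies as a graph, supporting an agent that "remembers and thinks." It provides APIs such as session.create / task.create / dependency.set / task.list(readyOnly).

kuzu-swift-extension

FFI/bindings and utilities for Kùzu. It provides SwiftPM integration and a high-level API, and serves as the foundation for swift-memory.
kuzudb is a lightweight DB that excels at handling embeddings and graphs, and it also runs on the client.

kuzu-swift-extension is designed to resemble SwiftData's interface, which makes kuzu easier to use.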
https://kuzudb.com/
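
The task and dependency model described for swift-memory can be pictured with a small in-memory sketch. This is not the swift-memory API: `TaskGraph`, `createTask`, `setDependency`, and `readyTasks` are hypothetical names chosen for illustration, and a dictionary stands in for the graph DB. It only shows the idea behind listing "ready" tasks, meaning unfinished tasks whose dependencies are all done.

```swift
import Foundation

// Hypothetical in-memory stand-in; not the swift-memory API.
struct TaskNode {
    let id: String
    var isDone: Bool = false
}

struct TaskGraph {
    private(set) var tasks: [String: TaskNode] = [:]
    // dependencies[a] holds the IDs of tasks that must finish before `a` can start.
    private(set) var dependencies: [String: Set<String>] = [:]

    mutating func createTask(_ id: String) {
        tasks[id] = TaskNode(id: id)
    }

    mutating func setDependency(task: String, dependsOn: String) {
        dependencies[task, default: []].insert(dependsOn)
    }

    mutating func complete(_ id: String) {
        tasks[id]?.isDone = true
    }

    /// Unfinished tasks whose dependencies are all done, sorted by ID.
    func readyTasks() -> [String] {
        tasks.values
            .filter { !$0.isDone }
            .filter { node in
                (dependencies[node.id] ?? []).allSatisfy { tasks[$0]?.isDone == true }
            }
            .map(\.id)
            .sorted()
    }
}

var graph = TaskGraph()
graph.createTask("design")
graph.createTask("implement")
graph.createTask("test")
graph.setDependency(task: "implement", dependsOn: "design")
graph.setDependency(task: "test", dependsOn: "implement")

print(graph.readyTasks())   // ["design"]
graph.complete("design")
print(graph.readyTasks())   // ["implement"]
```

In a graph DB the same question becomes a query over dependency edges, which is where persisting tasks as relationships pays off.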

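What is an "Agent CLI" in the first place?

Within the constraints of a CLI (command line), an agent becomes practical once it satisfies the following eight elements.

Perceive–Think–Act loop: cycles through input → thinking → action (tool execution or an answer).

Policy (role and discipline): makes the output format, handling of sources, safety policy, and approach to long tasks explicit.

Model abstraction: LLMs and other models are called the same way regardless of provider.

Tool calling: uses external capabilities such as web access, files, summarization, and planning in a type-safe way.

Memory: stores conversations, knowledge, tasks, and dependencies as **relationships (a graph)** and reuses them.

Context control: long text is handled by chunking → parallel summarization → merging.

Routing: separates ordinary messages from commands such as /compact.

Robustness: provides error handling, retries, fallbacks, and observability (logs/traces).

JARDIS is designed so that all of this can be done in Swift alone.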
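
Of the eight elements, robustness is the easiest to show in isolation. The following is a minimal sketch, assuming a synchronous operation for brevity: `withRetry` and `AttemptError` are hypothetical names, not part of JARDIS or SwiftAgent. It retries a throwing operation, logs each failure, and returns a fallback value if every attempt fails.

```swift
import Foundation

// Hypothetical error type used only to simulate failures in this sketch.
struct AttemptError: Error {
    let attempt: Int
}

/// Runs `operation` up to `maxAttempts` times and returns `fallback()` if every attempt throws.
func withRetry<T>(
    maxAttempts: Int,
    operation: (Int) throws -> T,
    fallback: () -> T
) -> T {
    for attempt in 1...max(1, maxAttempts) {
        do {
            return try operation(attempt)
        } catch {
            // Observability: record each failure so it can be traced later.
            print("attempt \(attempt) failed: \(error)")
        }
    }
    return fallback()
}

// Succeeds on the third attempt.
let recovered = withRetry(
    maxAttempts: 3,
    operation: { attempt -> String in
        if attempt < 3 { throw AttemptError(attempt: attempt) }
        return "ok on attempt \(attempt)"
    },
    fallback: { "fallback reply" }
)

// Never succeeds, so the fallback is used.
let degraded = withRetry(
    maxAttempts: 2,
    operation: { attempt -> String in throw AttemptError(attempt: attempt) },
    fallback: { "fallback reply" }
)

print(recovered)  // ok on attempt 3
print(degraded)   // fallback reply
```

A real agent would likely make the operation `async`, add a delay between attempts, and use a cheaper model or a cached answer as the fallback.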

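SwiftAgent essentials

SwiftAgent is a SwiftUI-like declarative DSL. Learn just the following primitives and you can write almost any agent a CLI needs.

Agent: the agent itself. Write the chain of Steps in body.

@StepBuilder: declarative composition of multiple Steps.

LanguageModelSession: bundles the model, tools, and Instructions (shared via @Session).

Transform<I,O>: describes pre- and post-processing in plain Swift.

GenerateText<I>: the LLM call Step. It turns the input I into a prompt.

Relay<T>: safely passes along shared state such as history and session information.

Loop / WaitForInput: implements a REPL (interactive loop) concisely.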
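
To see why typed Steps compose safely, here is a minimal stand-in for the idea. It does not use SwiftAgent: `MiniStep`, `MiniTransform`, `Chain`, and `then` are hypothetical names, and SwiftAgent's real `Step` protocol and builder may differ. The point is only that the compiler rejects a chain when one step's output type does not match the next step's input type.

```swift
import Foundation

// Hypothetical stand-in types; SwiftAgent's real Step protocol may differ.
protocol MiniStep {
    associatedtype Input
    associatedtype Output
    func run(_ input: Input) throws -> Output
}

// Wraps a plain Swift closure as a step.
struct MiniTransform<I, O>: MiniStep {
    let body: (I) throws -> O
    init(_ body: @escaping (I) throws -> O) { self.body = body }
    func run(_ input: I) throws -> O { try body(input) }
}

// Runs `first`, then feeds its output into `second`.
struct Chain<A: MiniStep, B: MiniStep>: MiniStep where A.Output == B.Input {
    let first: A
    let second: B
    func run(_ input: A.Input) throws -> B.Output {
        try second.run(first.run(input))
    }
}

extension MiniStep {
    // Only compiles when the next step accepts this step's output type.
    func then<Next: MiniStep>(_ next: Next) -> Chain<Self, Next> where Next.Input == Output {
        Chain(first: self, second: next)
    }
}

let trim = MiniTransform<String, String> { $0.trimmingCharacters(in: .whitespaces) }
let count = MiniTransform<String, Int> { $0.count }
let pipeline = trim.then(count)

let length = try pipeline.run("  hello  ")
print(length)  // 5
```

Writing `count.then(trim)` instead would fail to compile, because `Int` is not the `String` that `trim` expects.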

CLI project skeleton

mkdir AgentCLI && cd AgentCLI
swift package init --type executable
Add the dependencies to Package.swift.
// Package.swift (example)
dependencies: [
  .package(url: "https://github.com/1amageek/SwiftAgent", branch: "main"),
  .package(url: "https://github.com/1amageek/OpenFoundationModels", branch: "main"),
  .package(url: "https://github.com/1amageek/OpenFoundationModels-Ollama", branch: "main"),
  .package(url: "https://github.com/1amageek/swift-memory", branch: "main"),
  .package(url: "https://github.com/1amageek/kuzu-swift-extension", branch: "main"),
],
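targets: [
  .executableTarget(
    name: "AgentCLI",
    dependencies: [
      // Products are referenced explicitly because a product name can differ from its package name.
      .product(name: "SwiftAgent", package: "SwiftAgent"),
      .product(name: "OpenFoundationModels", package: "OpenFoundationModels"),
      .product(name: "OpenFoundationModelsOllama", package: "OpenFoundationModels-Ollama"),
      .product(name: "SwiftMemory", package: "swift-memory")
    ]
  )
]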

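Design and implementation: interpreting slash commands in a single CLI

Slash-command dispatch, the conversation loop, and tool execution are all handled inside the Agent.

This lets you operate the CLI while keeping the Perceive–Think–Act loop intact.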

Usage

$ jardis
Starting JARDIS. Press Ctrl+D / Ctrl+C to exit.
> Read the spec and tell me the key points
... (answer)
> /compact implementation details
✅ Conversation summarized (Focus: implementation details)

> /help
Available: /clear, /compact [focus], /status, /help

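Structure

AgentCLI.swift … the CLI entry point. It does nothing but launch GeneralAgent.

GeneralAgent.swift … expresses input normalization → routing → execution as Steps.

AgentCore.swift … collects the types and helper Steps (InputProcessingStep / CommandRoutingStep / ConditionalRoutingStep).

CompactAgent.swift … a minimal implementation of history summarization (a three-stage slice that is just enough: prompt → generate → format).

The dependencies (SwiftAgent / OpenFoundationModels / OpenFoundationModels-Ollama / swift-memory, and so on) are the same as before. This article shows only the essentials.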

AgentCLI.swift (the CLI only "launches and delegates")

// Sources/JardisCLI/AgentCLI.swift
import Foundation

@main
struct AgentCLI {
    static func main() async {
        print("Starting JARDIS. Press Ctrl+D / Ctrl+C to exit.")
        do {
            // An empty initial input is fine (the Loop inside the Agent watches standard input).
            try await GeneralAgent().run("")
        } catch {
            fputs("fatal: \(error)\n", stderr)
        }
    }
}

GeneralAgent.swift (slash commands are interpreted inside the Agent)

// Sources/JardisCLI/GeneralAgent.swift
import Foundation
import SwiftAgent
import OpenFoundationModels
import OpenFoundationModelsOllama
// As needed: import SwiftMemory, AgentTools, Render, TerminalUI

public struct GeneralAgent: Agent {

    // === Shared state (history and session information) ==============
    // NOTE: these placeholder closures return a fresh value on every read and discard writes.
    // Back them with real storage if history and /status should persist across turns.
    private let conversationHistory = Relay<[String]>(get: { [] }, set: { _ in })
    private let sessionContext      = Relay<SessionContext>(get: { SessionContext() }, set: { _ in })

    // === LLM session (model, tools, policy) ==========================
    @SwiftAgent.Session
    private var session = LanguageModelSession(
        model: OllamaLanguageModel(modelName: "gpt-oss:20b"),
        tools: [], // add memory tools, read tools, and so on here
        instructions: {
            Instructions(Self.instructions())
        }
    )

    // === Agent discipline (output format, safety, handling long tasks) ===
    static func instructions() -> String {
        """
        You are Jardis, a general-purpose AI agent.
        - Drive tasks through Situation→Research→Planning→Execution→Check.
        - Keep outputs concise and structured (headings + bullets).
        - Verify when recency matters; prefer primary sources.
        - Do not expose raw chain-of-thought; provide decision summaries only.
        - Advance autonomously; ask only decision-critical questions.
        """
    }

    // === Main pipeline ===============================================
    // Input → normalization → routing → execution (normal reply, delegation, or direct reply)
    @StepBuilder
    public var body: some Step<String, String> {
        Loop { _ in
            WaitForInput(prompt: "> ")
            InputProcessingStep()                                             // 1) Normalize
            CommandRoutingStep(                                               // 2) Route
                conversationHistory: conversationHistory,
                sessionContext: sessionContext,
                modelName: "gpt-oss:20b"
            )
            ConditionalRoutingStep(                                           // 3) Execute
                conversationHistory: conversationHistory,
                sessionContext: sessionContext,
                modelName: "gpt-oss:20b",
                workspace: URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            )
        }
        }
    }
}
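
One detail in the listing above is worth a closer look: a relay built as `get: { [] }, set: { _ in }` has no storage behind it, so appended history is lost. The sketch below illustrates the difference using a stand-in type. `MiniRelay` and `HistoryStore` are hypothetical names that only mirror the get/set closure shape seen above; they are not SwiftAgent's `Relay`, whose closures may carry extra requirements such as `Sendable`.

```swift
// Hypothetical stand-in that mirrors the get/set closure shape; not SwiftAgent's Relay.
struct MiniRelay<Value> {
    private let getter: () -> Value
    private let setter: (Value) -> Void

    init(get getter: @escaping () -> Value, set setter: @escaping (Value) -> Void) {
        self.getter = getter
        self.setter = setter
    }

    var wrappedValue: Value {
        get { getter() }
        nonmutating set { setter(newValue) }
    }
}

// Reference-type storage so every copy of the relay reads and writes the same history.
final class HistoryStore {
    var items: [String] = []
}

let store = HistoryStore()
let discarding = MiniRelay<[String]>(get: { [] }, set: { _ in })
let backed = MiniRelay<[String]>(get: { store.items }, set: { store.items = $0 })

discarding.wrappedValue.append("User: hello")
backed.wrappedValue.append("User: hello")

print(discarding.wrappedValue.count)  // 0
print(backed.wrappedValue.count)      // 1
```

If the relay is shared across concurrent tasks, the backing store would also need synchronization (for example an actor or a lock), which this sketch leaves out.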

AgentCore.swift (types and three helper Steps)

// Sources/JardisCLI/AgentCore.swift
import Foundation
import SwiftAgent

// ========== Type definitions ==========
public struct ProcessedInput: Sendable {
    public enum InputType: Sendable { case message, command, exit, invalid }
    public let type: InputType
    public let content: String
    public let command: String?
    public let arguments: String?
    public let timestamp: Date
    public init(type: InputType, content: String, command: String? = nil, arguments: String? = nil, timestamp: Date = .init()) {
        self.type = type; self.content = content; self.command = command; self.arguments = arguments; self.timestamp = timestamp
    }
}

public struct RoutingResult: Sendable {
    public enum Action: Sendable { case process, respond(String), delegate(AgentType), exit }
    public let action: Action
    public let context: ConversationContext?
    public init(action: Action, context: ConversationContext? = nil) { self.action = action; self.context = context }
}

public enum AgentType: Sendable { case compact(instructions: String) }

public struct ConversationContext: Sendable {
    public let input: String
    public let history: [String]
    public let sessionInfo: SessionContext
}

public struct SessionContext: Codable, Sendable {
    public var sessionID: UUID = .init()
    public var startTime: Date = .init()
    public var taskCount: Int = 0
    public var totalTokensUsed: Int = 0
    public var lastActivityTime: Date = .init()
    public var currentTaskID: UUID? = nil
    public mutating func updateActivity() { lastActivityTime = .init() }
    public var formattedDuration: String {
        let s = Int(Date().timeIntervalSince(startTime)); let h=s/3600, m=(s%3600)/60, ss=s%60
        return h>0 ? "\(h)h \(m)m \(ss)s" : (m>0 ? "\(m)m \(ss)s" : "\(ss)s")
    }
}

// ========== 1) Input normalization (slash-command detection) ==========
public struct InputProcessingStep: Step {
    public typealias Input = String
    public typealias Output = ProcessedInput
    public init() {}
    public func run(_ input: String) async throws -> ProcessedInput {
        let t = input.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !t.isEmpty else { return .init(type: .invalid, content: input) }
        if ["exit", "quit"].contains(t.lowercased()) { return .init(type: .exit, content: input) }
        if t.hasPrefix("/") {
            let comps = t.dropFirst().split(separator: " ", maxSplits: 1)
            let cmd = String(comps.first ?? "")
            let arg = comps.count>1 ? String(comps[1]) : nil
            return .init(type: .command, content: t, command: cmd.lowercased(), arguments: arg)
        }
        return .init(type: .message, content: t)
    }
}

// ========== 2) Command routing ==========
public struct CommandRoutingStep: Step {
    public typealias Input = ProcessedInput
    public typealias Output = RoutingResult
    private let conversationHistory: Relay<[String]>
    private let sessionContext: Relay<SessionContext>
    private let modelName: String
    public init(conversationHistory: Relay<[String]>, sessionContext: Relay<SessionContext>, modelName: String) {
        self.conversationHistory = conversationHistory; self.sessionContext = sessionContext; self.modelName = modelName
    }
    public func run(_ input: ProcessedInput) async throws -> RoutingResult {
        var ctx = sessionContext.wrappedValue; ctx.updateActivity(); sessionContext.wrappedValue = ctx
        switch input.type {
        case .exit:
            return .init(action: .exit)
        case .invalid:
            return .init(action: .respond("❌ The input is empty. Please try again."))
        case .message:
            let c = ConversationContext(input: input.content, history: conversationHistory.wrappedValue, sessionInfo: sessionContext.wrappedValue)
            return .init(action: .process, context: c)
        case .command:
            guard let cmd = input.command else { return .init(action: .respond("❌ The command format is invalid.")) }
            switch cmd {
            case "clear":
                conversationHistory.wrappedValue = []
                sessionContext.wrappedValue = SessionContext()
                return .init(action: .respond("✨ Conversation reset."))
            case "help":
                return .init(action: .respond("""
                📚 Available commands:
                • /clear        Clears the conversation history
                • /compact [focus]  Summarizes the conversation (optional focus)
                • /status       Shows session information
                • /help         Shows this help
                """))
            case "status":
                let s = sessionContext.wrappedValue
                return .init(action: .respond("""
                📊 Session Status
                • Session ID: \(s.sessionID.uuidString.prefix(8))...
                • Duration: \(s.formattedDuration)
                • Messages: \(conversationHistory.wrappedValue.count)
                """))
            case "compact":
                let focus = input.arguments ?? "key points and action items"
                return .init(action: .delegate(.compact(instructions: focus)))
            default:
                return .init(action: .respond("❓ Unknown command: /\(cmd)\nRun /help to see the list."))
            }
        }
    }
}

// ========== 3) Execution (normal reply / delegation / direct reply) ==========
import OpenFoundationModels
import OpenFoundationModelsOllama

public struct ConditionalRoutingStep: Step {
    public typealias Input = RoutingResult
    public typealias Output = String
    private let conversationHistory: Relay<[String]>
    private let sessionContext: Relay<SessionContext>
    private let modelName: String
    private let workspace: URL
    public init(conversationHistory: Relay<[String]>, sessionContext: Relay<SessionContext>, modelName: String, workspace: URL) {
        self.conversationHistory = conversationHistory; self.sessionContext = sessionContext; self.modelName = modelName; self.workspace = workspace
    }
    public func run(_ result: RoutingResult) async throws -> String {
        switch result.action {
        case .exit:
            exit(0)
        case .respond(let message):
            print(message); return message
        case .process:
            guard let ctx = result.context else { return "❌ Internal error: missing context" }
            conversationHistory.wrappedValue.append("User: \(ctx.input)")
            // Build a prompt that includes the history, then generate a normal reply
            let prompt = buildPrompt(from: ctx)
            let reply = try await generate(prompt: prompt)
            conversationHistory.wrappedValue.append("Assistant: \(reply)")
            trimHistoryIfNeeded()
            print(reply)
            return reply
        case .delegate(let kind):
            switch kind {
            case .compact(let focus):
                let agent = CompactAgent(modelName: modelName)
                let input = CompactInput(instructions: focus, history: conversationHistory.wrappedValue)
                let out = try await agent.run(input)
                if out.success {
                    conversationHistory.wrappedValue = ["[Summary] Focus: \(focus)\n\(out.summary)"]
                    let msg = "✅ Conversation summarized (Focus: \(focus))"
                    print(msg); return msg
                } else {
                    let msg = "❌ Summarization failed."
                    print(msg); return msg
                }
            }
        }
    }

    private func buildPrompt(from ctx: ConversationContext) -> String {
        let recent = ctx.history.suffix(10).joined(separator: "\n")
        return """
        ## Recent Conversation
        \(recent.isEmpty ? "[No previous context]" : recent)

        ## User
        \(ctx.input)

        Please respond concisely and helpfully, with clear structure.
        """
    }
    private func generate(prompt: String) async throws -> String {
        let session = LanguageModelSession(
            model: OllamaLanguageModel(modelName: modelName),
            tools: [],
            instructions: {
                Instructions("""
                You are a helpful assistant. Use headings and bullets when useful.
                """)
            }
        )
        let relay = Relay(get: { session }, set: { _ in })
        let step = GenerateText<String>(session: relay) { _ in prompt }
        return try await step.run(prompt)
    }
    private func trimHistoryIfNeeded() {
        if conversationHistory.wrappedValue.count > 100 {
            conversationHistory.wrappedValue = Array(conversationHistory.wrappedValue.suffix(80))
        }
    }
}
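
The parsing rule inside InputProcessingStep can be tried on its own, without SwiftAgent. The sketch below lifts the same logic into a free function; `parseSlashCommand` is a hypothetical helper name introduced for illustration and does not appear in the listing.

```swift
import Foundation

// Hypothetical helper mirroring the parsing rule used by InputProcessingStep.
// Returns nil for ordinary messages; otherwise the lowercased command and its optional argument text.
func parseSlashCommand(_ line: String) -> (command: String, arguments: String?)? {
    let trimmed = line.trimmingCharacters(in: .whitespacesAndNewlines)
    guard trimmed.hasPrefix("/") else { return nil }
    // Split once: everything after the first space is kept whole as the argument text.
    let parts = trimmed.dropFirst().split(separator: " ", maxSplits: 1)
    let command = String(parts.first ?? "").lowercased()
    let arguments = parts.count > 1 ? String(parts[1]) : nil
    return (command, arguments)
}

let compact = parseSlashCommand("/compact implementation details")
print(compact?.command ?? "-", compact?.arguments ?? "-")  // compact implementation details

let help = parseSlashCommand("  /HELP  ")
print(help?.command ?? "-", help?.arguments ?? "-")        // help -

print(parseSlashCommand("hello") == nil)                   // true
```

Limiting the split to one cut is what lets `/compact` receive a multi-word focus as a single argument string.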

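CompactAgent.swift (the "summary" agent that /compact delegates to)

It is kept small with a three-stage slice (prompt → generate → format). Because it is meant to be used only from within GeneralAgent, its inputs and outputs are minimal as well.
// Sources/JardisCLI/CompactAgent.swift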
import Foundation
import SwiftAgent
import OpenFoundationModels
import OpenFoundationModelsOllama

public struct CompactInput: Sendable {
    public let instructions: String
    public let history: [String]
    public init(instructions: String, history: [String]) { self.instructions = instructions; self.history = history }
}

public struct CompactOutput: Sendable {
    public let summary: String
    public let originalCount: Int
    public let success: Bool
    public init(summary: String, originalCount: Int, success: Bool) {
        self.summary = summary; self.originalCount = originalCount; self.success = success
    }
}

public struct CompactAgent: Agent {
    public typealias Input = CompactInput
    public typealias Output = CompactOutput
    let modelName: String
    public init(modelName: String) { self.modelName = modelName }

    @StepBuilder
    public var body: some Step<CompactInput, CompactOutput> {
        // 1) Prompt: build the summary prompt and carry the original message count forward
        Transform<CompactInput, (prompt: String, originalCount: Int)> { input in
            let prompt = """
            Please provide a concise and structured summary of the following conversation.

            Focus: \(input.instructions)

            Conversation:
            \(input.history.joined(separator: "\n"))

            Instructions:
            - Capture key decisions and open issues
            - Preserve important technical details and context
            - Keep it clear with headings and bullets
            - Emphasize: \(input.instructions)
            """
            return (prompt: prompt, originalCount: input.history.count)
        }
        // 2) Generate
        Transform<(prompt: String, originalCount: Int), (summary: String, originalCount: Int)> { draft in
            let session = LanguageModelSession(
                model: OllamaLanguageModel(modelName: modelName),
                tools: [],
                instructions: { Instructions("You summarize technical conversations.") }
            )
            let relay = Relay(get: { session }, set: { _ in })
            let step = GenerateText<String>(session: relay) { _ in draft.prompt }
            let summary = try await step.run(draft.prompt)
            return (summary: summary, originalCount: draft.originalCount)
        }
        // 3) Format: originalCount is the number of history entries, not the summary length
        Transform<(summary: String, originalCount: Int), CompactOutput> { result in
            CompactOutput(summary: result.summary, originalCount: result.originalCount, success: true)
        }
    }
}

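Summary

The inside of an agent is much simpler than you might think. All it does is repeat, in small steps, "read the input → think toward the goal → call the necessary tools and act." At the center is the LLM; you add tools for reading files, searching, summarizing, planning, and so on, and bundle them into a single flow. It is less like difficult magic and more a matter of choosing tools and sequencing the work.
Another important point is reproducing, in the agent, the way a human works. First get a rough bearing, form a hypothesis, verify it with tools, and feed the result into the next move. With this "advance in small steps and verify" rhythm, the CLI conversation also moves forward naturally. Invoking summaries and plans through slash commands is part of the same approach.
The hardest part is handling context. Carrying a long conversation or long text as-is gets heavy, so you split it into chunks and summarize them in parallel, or frequently apply "compression" that keeps only the key points. When to discard what, and what to keep: this is the crux of the design, and it needs tuning as you operate the system.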
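
The chunk → parallel summarization → merge flow can be sketched with Swift structured concurrency. This is an illustration under stated assumptions, not JARDIS's implementation: `chunk` and `summarizeInParallel` are hypothetical names, chunking is by character count rather than tokens, and the summarizer is a stub closure standing in for an LLM call.

```swift
import Foundation

/// Splits text into chunks of at most `size` characters.
func chunk(_ text: String, size: Int) -> [String] {
    precondition(size > 0, "chunk size must be positive")
    var chunks: [String] = []
    var start = text.startIndex
    while start < text.endIndex {
        let end = text.index(start, offsetBy: size, limitedBy: text.endIndex) ?? text.endIndex
        chunks.append(String(text[start..<end]))
        start = end
    }
    return chunks
}

/// Summarizes chunks concurrently and joins the partial summaries in their original order.
func summarizeInParallel(
    _ text: String,
    chunkSize: Int,
    summarize: @escaping @Sendable (String) async -> String
) async -> String {
    let parts = chunk(text, size: chunkSize)
    let partials = await withTaskGroup(of: (Int, String).self) { group -> [String] in
        for (index, part) in parts.enumerated() {
            group.addTask { (index, await summarize(part)) }
        }
        var results = [String](repeating: "", count: parts.count)
        for await (index, summary) in group {
            results[index] = summary   // tasks finish in any order; the index restores it
        }
        return results
    }
    return partials.joined(separator: "\n")
}

// Stub summarizer: keeps the first two characters of each chunk.
let merged = await summarizeInParallel("abcdefghij", chunkSize: 4) { part in
    String(part.prefix(2))
}
print(merged)
```

With the stub above, the chunks are "abcd", "efgh", and "ij", so the merged result is the three lines "ab", "ef", and "ij". In practice the merged text would usually go through one more summarization pass to produce the final compressed context.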
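The memory feature is still a work in progress. Being able to record "who, what, why, and what it connects to" in a graph is powerful, but write too much and it becomes noise; write nothing and there is nothing to reuse. Once retention periods, importance thresholds, and privacy handling are included, the best answer differs from project to project. It is an area with room to grow.
Next, I plan to add integration with MCP (Model Context Protocol). This makes it easier to extend tools and external data sources safely, and it widens the world the agent can touch. Run lightly on-device and borrow outside power only when needed: I intend to build that bridge on a standard protocol.
In the end, an Agent CLI is the work of "reconnecting an LLM and a small number of tools to match the way humans sequence their work." Rather than piling on heavy equipment, it works better to keep your local flow intact and iterate in small loops. Calling tools while conversing in the terminal is what agents look like today. I look forward to what comes next!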