Building a macOS Input Switcher in a Weekend with Claude Code

Introduction

I built a menu bar-resident application for macOS over the weekend. The tool is EnJaSwitcher, designed to switch between English and Japanese input modes on a US keyboard. This is a record of how I developed it using Claude Code, with a human verifying and refining the code generated by the AI.

While I had some familiarity with Swift, I was not well-versed in building actual applications, in low-level APIs like CGEventTap or IOHIDManager, or in the complexities of the TCC database and code signing. I want to record what I found, including the pitfalls I hit and how I got past them, as far as I could dig into them together with an AI, in case it is useful to others. Please point out any errors.

I have written about the process and my philosophy on work in this note article, so please read that if you are interested.

Topics covered:

  • Building a single binary with swiftc and manually assembling a .app bundle
  • Implementing CGEventTap to distinguish between left and right Command keys via bit flags
  • Why I used IOHIDManager to monitor CapsLock
  • Bypassing a known TISSelectInputSource bug by sending virtual keys
  • Using self-signed certificates to pin TCC permissions
  • Persistence via LaunchAgent and the pitfall of starting from the Terminal

Technical Specifications of the Deliverable

The app targets macOS 13 and later and supports both Apple Silicon and Intel. The .app bundle is distributed as a zip file on GitHub Releases. It is not notarized.

The application is a single main.swift file. It is built by passing -framework Carbon -framework Cocoa -framework IOKit to swiftc and has no external library dependencies. LSUIElement is set in Info.plist so that the app runs as a background agent that does not appear in the Dock or the app switcher.
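As a rough illustration of the Info.plist keys involved, the sketch below builds a minimal property list in Swift and serializes it as XML. Only CFBundleExecutable, LSUIElement, the icon name, and the macOS 13 minimum follow from the description in this article; the bundle identifier and version string are placeholders invented for the example.

```swift
import Foundation

// Minimal Info.plist contents for a menu bar agent.
// CFBundleIdentifier and CFBundleShortVersionString are hypothetical placeholders.
func makeInfoPlist(executableName: String) -> [String: Any] {
    return [
        "CFBundleExecutable": executableName,              // must match the file in Contents/MacOS/
        "CFBundleIdentifier": "com.example.enja-switcher", // placeholder
        "CFBundleName": "EnJaSwitcher",
        "CFBundlePackageType": "APPL",
        "CFBundleShortVersionString": "0.1.0",             // placeholder
        "CFBundleIconFile": "AppIcon",                     // Resources/AppIcon.icns
        "LSMinimumSystemVersion": "13.0",
        "LSUIElement": true                                // no Dock icon, not in the app switcher
    ]
}

// Serializes the dictionary in the XML plist format used by Info.plist.
func serializeInfoPlist(_ plist: [String: Any]) throws -> Data {
    return try PropertyListSerialization.data(fromPropertyList: plist,
                                              format: .xml,
                                              options: 0)
}
```

Writing the returned Data to EnJaSwitcher.app/Contents/Info.plist would complete the step; a hand-written XML file works just as well.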

The input switching implementation is a combination of two methods: CGEventTap for the left/right Command keys, and IOHIDManager for CapsLock. Input source switching itself does not use TISSelectInputSource but instead virtually sends the JIS keyboard "Eisu" (keyCode 102) and "Kana" (keyCode 104) keys as CGEvents (reasoning below).

1. Building with swiftc as a Single File and Manual .app Bundle Assembly

I chose to build directly with swiftc rather than using an Xcode project. It only requires source files and a build command, plus I wanted to use Zed.

Build command:

swiftc -O -o enja-switcher main.swift \
  -framework Carbon -framework Cocoa -framework IOKit

Bundle structure:

EnJaSwitcher.app/
  Contents/
    Info.plist
    MacOS/
      enja-switcher            # swiftc output binary
    Resources/
      AppIcon.icns

Here are some common pitfalls when assembling this manually:

Correspondence between CFBundleExecutable and the binary name. The value of CFBundleExecutable in Info.plist must exactly match the name of the binary in Contents/MacOS/. If they differ, the app may launch but do nothing, with very little information appearing in the logs.

Converting to .icns for icons. A single PNG is not enough. You can bundle multiple PNG sizes into an .icns file using iconutil -c icns MyIcon.iconset. The filenames inside the .iconset directory must follow a fixed format: icon_16x16.png, icon_16x16@2x.png, and so on.
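To make the naming scheme concrete, here is a small sketch that lists the file names iconutil expects inside an .iconset directory, together with the pixel size of each PNG. The function name is made up for illustration; producing the PNGs themselves is left to whatever image tool you prefer.

```swift
import Foundation

// File names expected inside an .iconset directory, paired with the
// pixel dimensions each PNG should have (the @2x variants are double size).
func iconsetEntries() -> [(fileName: String, pixels: Int)] {
    let pointSizes = [16, 32, 128, 256, 512]
    var entries: [(fileName: String, pixels: Int)] = []
    for size in pointSizes {
        entries.append((fileName: "icon_\(size)x\(size).png", pixels: size))
        entries.append((fileName: "icon_\(size)x\(size)@2x.png", pixels: size * 2))
    }
    return entries
}

// Print the list as a checklist.
for entry in iconsetEntries() {
    print("\(entry.fileName)  (\(entry.pixels)x\(entry.pixels) px)")
}
```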

Sign the entire .app bundle. Use codesign --force --sign "EnJaSwitcher Dev" EnJaSwitcher.app to sign the bundle as a whole. Signing only the binary inside will break bundle integrity, causing it to be rejected by Gatekeeper or spctl. The handling of the signing identity (why I use a fixed self-signed certificate instead of ad-hoc signing) is covered in detail in Section 5 on TCC.
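Because a CFBundleExecutable mismatch fails so quietly, it can be worth checking for it in the build script. The following is a minimal sketch of such a check, assuming the bundle layout shown earlier; the function name is hypothetical and not part of EnJaSwitcher.

```swift
import Foundation

// Returns nil when CFBundleExecutable names an existing file in Contents/MacOS,
// otherwise a short description of what is wrong.
func bundleExecutableProblem(bundleURL: URL) -> String? {
    let contents = bundleURL.appendingPathComponent("Contents")
    let plistURL = contents.appendingPathComponent("Info.plist")

    guard let data = try? Data(contentsOf: plistURL),
          let object = try? PropertyListSerialization.propertyList(from: data,
                                                                   options: [],
                                                                   format: nil),
          let plist = object as? [String: Any] else {
        return "Info.plist is missing or unreadable"
    }
    guard let name = plist["CFBundleExecutable"] as? String else {
        return "CFBundleExecutable is not set"
    }
    let binary = contents
        .appendingPathComponent("MacOS")
        .appendingPathComponent(name)
    if !FileManager.default.fileExists(atPath: binary.path) {
        return "CFBundleExecutable is \"\(name)\" but Contents/MacOS/\(name) does not exist"
    }
    return nil
}
```

Calling bundleExecutableProblem(bundleURL: URL(fileURLWithPath: "EnJaSwitcher.app")) right after assembling the bundle, and failing the build on a non-nil result, would catch the mismatch before launch.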

Summary

When the source is just a single main.swift, building directly with swiftc is sufficient. You can assemble the .app bundle by hand, but make sure CFBundleExecutable matches the binary name.

2. Distinguishing Left and Right Command Keys with CGEventTap

CGEventTap monitors key events, but a single press of the Command key arrives via flagsChanged, not keyDown. While it is possible to distinguish between left and right using the kVK_Command (55) and kVK_RightCommand (54) keycode constants, this implementation uses bit flags from event.flags.rawValue.

Left Command press bit:  0x08
Right Command press bit: 0x10

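These bits identify the left or right key and live in the raw rawValue, separate from CGEventFlags.maskCommand (the public flag indicating that a Command key is down). They match the device-dependent masks NX_DEVICELCMDKEYMASK (0x08) and NX_DEVICERCMDKEYMASK (0x10) defined in IOKit's IOLLEvent.h, but they are not part of the documented CGEventFlags API. In practice they have behaved stably for a long time.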

Trigger conditions are as follows: Pressing and releasing the Command key alone, with no other keys pressed between the press and release. I manage this condition with two flags: commandIsDown and otherKeyPressedDuringCommand. There is no time limit. Even for a long press, if no other key is touched, it will trigger the moment it is released.

Here is the relevant code. Please refer to the GitHub main.swift for the full code.

let leftCommandBit: UInt64 = 0x08
let rightCommandBit: UInt64 = 0x10

var commandIsDown = false
var otherKeyPressedDuringCommand = false

let callback: CGEventTapCallBack = { proxy, type, event, refcon in
    if type == .keyDown || type == .keyUp {
        if commandIsDown { otherKeyPressedDuringCommand = true }
        return Unmanaged.passUnretained(event)
    }

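    // flagsChanged
    let rawFlags = event.flags.rawValue
    let isLeft = (rawFlags & leftCommandBit) != 0
    let isRight = (rawFlags & rightCommandBit) != 0

    if isLeft || isRight {
        // Reset only on the initial press. Later flagsChanged events while
        // Command is held (e.g. Shift going up or down) must not clear the
        // "other key" flag, or Cmd+Shift+Z could trigger a switch.
        if !commandIsDown {
            commandIsDown = true
            otherKeyPressedDuringCommand = false
        }
    } else if commandIsDown {
        if !otherKeyPressedDuringCommand {
            // Pressed and released alone -> trigger switch
        }
        commandIsDown = false
    }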
    return Unmanaged.passUnretained(event)
}

One implementation note: the event tap stopped working when these constants (like leftCommandBit) and state variables were placed in the file's global scope. It works when the callback references them via closure captures. I suspect this is related to Swift's closure capture behavior, but I have not tracked down the root cause, so I left the code in its working form.
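One way to make the question of where the state lives less delicate is to move the press-and-release logic into a small value type that has no dependency on the event tap at all. The sketch below is not the code in the repository: the type and method names are invented, and it makes one extra assumption, namely that any other modifier changing while Command is held also counts as "another key".

```swift
enum CommandSide { case left, right }

struct SoloCommandDetector {
    static let leftCommandBit: UInt64 = 0x08
    static let rightCommandBit: UInt64 = 0x10

    private var heldSide: CommandSide?
    private var otherKeyPressed = false

    // Call for every keyDown / keyUp event.
    mutating func keyEvent() {
        if heldSide != nil { otherKeyPressed = true }
    }

    // Call for every flagsChanged event with event.flags.rawValue.
    // Returns the side to act on when a Command key was pressed and
    // released with nothing else happening in between.
    mutating func flagsChanged(rawFlags: UInt64) -> CommandSide? {
        let left = (rawFlags & Self.leftCommandBit) != 0
        let right = (rawFlags & Self.rightCommandBit) != 0

        if left || right {
            if heldSide == nil {
                // Initial press: remember which side and start clean.
                heldSide = left ? .left : .right
                otherKeyPressed = false
            } else {
                // Assumption: another modifier changed while Command was held,
                // so treat it as a key combination rather than a solo press.
                otherKeyPressed = true
            }
            return nil
        }

        // No Command bit set any more: this is the release.
        guard let side = heldSide else { return nil }
        heldSide = nil
        return otherKeyPressed ? nil : side
    }
}
```

The event tap callback would then only forward events to one SoloCommandDetector instance. Since the logic takes plain integers, it can be exercised without any permissions or a running event tap.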

Measures Against CGEventTap Event Loss

CGEventTap can be disabled by the OS at its own discretion, such as during high load, timeouts, or when the user performs certain permission operations. Since key inputs will be missed if it is disabled, I included a process to re-enable the event tap.

if type == .tapDisabledByTimeout || type == .tapDisabledByUserInput {
    CGEvent.tapEnable(tap: eventTap, enable: true)
    return Unmanaged.passUnretained(event)
}

If you handle this at the beginning of the callback, the tap recovers automatically even when the OS disables it. This is essential for a long-running menu bar app.
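The recovery snippet needs a reference to the tap inside the callback. A CGEventTapCallBack is a C function pointer, so it cannot capture local variables; the usual route is the refcon (userInfo) pointer. The sketch below shows one possible arrangement. TapContext and installEventTap are names invented here, and the .listenOnly option is an assumption based on the callback passing events through unchanged.

```swift
import Cocoa

// Hypothetical holder that lets the C callback reach the tap through refcon.
final class TapContext {
    var tap: CFMachPort?
}

// keyDown, keyUp and flagsChanged are the three event types of interest.
let eventMask: CGEventMask =
    (1 << CGEventMask(CGEventType.keyDown.rawValue)) |
    (1 << CGEventMask(CGEventType.keyUp.rawValue)) |
    (1 << CGEventMask(CGEventType.flagsChanged.rawValue))

let tapCallback: CGEventTapCallBack = { _, type, event, refcon in
    if type == .tapDisabledByTimeout || type == .tapDisabledByUserInput {
        // The system switched the tap off: turn it back on.
        if let refcon = refcon {
            let context = Unmanaged<TapContext>.fromOpaque(refcon).takeUnretainedValue()
            if let tap = context.tap {
                CGEvent.tapEnable(tap: tap, enable: true)
            }
        }
        return Unmanaged.passUnretained(event)
    }
    // ... normal keyDown / keyUp / flagsChanged handling goes here ...
    return Unmanaged.passUnretained(event)
}

// Returns false when the tap cannot be created (typically a missing permission).
// The caller must keep `context` alive for as long as the tap is installed.
func installEventTap(context: TapContext) -> Bool {
    guard let tap = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .headInsertEventTap,
        options: .listenOnly,            // assumption: events are observed, not modified
        eventsOfInterest: eventMask,
        callback: tapCallback,
        userInfo: Unmanaged.passUnretained(context).toOpaque()
    ) else {
        return false
    }

    context.tap = tap
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
    return true
}
```

Checking the return value of installEventTap at startup also gives a natural place to tell the user that a permission is missing, instead of running silently with no tap.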

Summary

A single press of a modifier key arrives via flagsChanged. Using bit flags in event.flags.rawValue for left/right identification is solid. Because CGEventTap can be silently disabled, always include recovery logic for .tapDisabledByTimeout and .tapDisabledByUserInput.

3. CapsLock and IOHIDManager

As the second input switching method, I implemented switching to English on a single press of CapsLock and to Japanese on a double press. Here, I adopted IOHIDManager instead of CGEventTap.

The reason is that if the user has set "Caps Lock key action" to "No Action" in the system settings, events do not arrive at CGEventTap. Many people remap CapsLock to Control or Escape or disable it entirely, and I cannot allow the switching to stop working in that state.

IOHIDManager captures input directly at the HID device level. Since it operates at a layer below the OS's Caps Lock key setting, it is unaffected by system settings.

Outline of the implementation:

let hidManager = IOHIDManagerCreate(kCFAllocatorDefault,
                                    IOOptionBits(kIOHIDOptionsTypeNone))

let matchingDict: [String: Any] = [
    kIOHIDDeviceUsagePageKey as String: kHIDPage_GenericDesktop,
    kIOHIDDeviceUsageKey as String: kHIDUsage_GD_Keyboard
]
IOHIDManagerSetDeviceMatching(hidManager, matchingDict as CFDictionary)

IOHIDManagerRegisterInputValueCallback(hidManager, { ctx, result, sender, value in
    let element = IOHIDValueGetElement(value)
    let usagePage = IOHIDElementGetUsagePage(element)
    let usage = IOHIDElementGetUsage(element)
    let pressed = IOHIDValueGetIntegerValue(value) == 1

    // Detect usagePage == 0x07 (Keyboard) and usage == 0x39 (CapsLock)
    // Determine single vs. double press based on timing (0.3 seconds)
}, nil)

IOHIDManagerScheduleWithRunLoop(hidManager, CFRunLoopGetCurrent(),
                                CFRunLoopMode.commonModes.rawValue)
IOHIDManagerOpen(hidManager, IOOptionBits(kIOHIDOptionsTypeNone))

Use matchingDict to filter by keyboard devices. If you don't do this, the inputValueCallback will also be called for the mouse and trackpad, causing unnecessary processing.
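The single versus double press decision is left as a comment in the outline. One possible way to fill it in is sketched below, under an assumption of my own for this example: the first press switches to English immediately, and a second press within 0.3 seconds switches to Japanese. That avoids delaying the single-press action, at the cost of briefly passing through English on a double press. The type name is hypothetical.

```swift
import Foundation

enum CapsLockAction { case english, japanese }

struct CapsLockPressClassifier {
    let doublePressWindow: TimeInterval
    private var lastPressTime: TimeInterval?

    init(doublePressWindow: TimeInterval = 0.3) {
        self.doublePressWindow = doublePressWindow
    }

    // Call once per CapsLock key-down with a monotonic timestamp in seconds.
    mutating func press(at time: TimeInterval) -> CapsLockAction {
        if let last = lastPressTime, time - last <= doublePressWindow {
            lastPressTime = nil          // a third quick press starts over
            return .japanese
        }
        lastPressTime = time
        return .english
    }
}
```

Inside the input value callback, this would be called only when usagePage is 0x07, usage is 0x39, and pressed is true, with ProcessInfo.processInfo.systemUptime as the timestamp.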

Permission Requirements

To capture raw input with IOHIDManager, the app needs macOS Input Monitoring permission. The user must allow the app under System Settings > Privacy & Security > Input Monitoring. Do not count on a permission dialog appearing by itself on first launch; it is best to give the user a way to open the settings screen from an app menu or similar.
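A menu item that opens the right settings pane could look roughly like this. CGPreflightListenEventAccess() is a Core Graphics call available since macOS 10.15 that reports the Input Monitoring status without showing any UI. The x-apple.systempreferences URL is a widely used convention rather than a formally documented interface, so treat it as an assumption that may change between macOS releases. The function names are invented for this sketch.

```swift
import Cocoa

// URL that opens the Input Monitoring pane of System Settings.
// Assumption: this undocumented URL scheme keeps working on the target macOS.
func inputMonitoringSettingsURL() -> URL? {
    return URL(string: "x-apple.systempreferences:com.apple.preference.security?Privacy_ListenEvent")
}

// Opens the settings pane, e.g. from a menu bar item.
func openInputMonitoringSettings() {
    if let url = inputMonitoringSettingsURL() {
        NSWorkspace.shared.open(url)
    }
}

// Call at startup: if this process may not listen to input events yet,
// take the user to the place where they can allow it.
func promptForInputMonitoringIfNeeded() {
    if !CGPreflightListenEventAccess() {
        openInputMonitoringSettings()
    }
}
```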

Summary

If you want to capture keyboard input without being affected by system settings, consider IOHIDManager, which is lower-level than CGEventTap. This requires Input Monitoring permission. Don't forget to filter by keyboard devices using the matching dictionary.

4. Bug in TISSelectInputSource and Workaround via Virtual Key Sending

This is the core of this article. To ensure that those who arrive here by searching for the same symptoms can find the solution as quickly as possible, I will write this in the order of symptoms → investigation → workaround.

Symptoms

  • When switching to Japanese input using TISSelectInputSource from a background agent, the menu bar input indicator changes to "あ", but the actual input state remains as ABC.
  • When keys are typed, half-width English characters are entered.
  • It works normally when called from a foreground app.
  • Even when called from the background, switching to English works.
  • Only switching to Japanese is asymmetrically broken.

Below is the implementation the AI suggested first. As a use of the official API, it is correct.

func switchToJapanese() {
    guard let sourcesCF = TISCreateInputSourceList(nil, false)?.takeRetainedValue() else { return }
    let sources = sourcesCF as! [TISInputSource]
    for source in sources {
        guard let idPtr = TISGetInputSourceProperty(source, kTISPropertyInputSourceID) else { continue }
        let id = Unmanaged<CFString>.fromOpaque(idPtr).takeUnretainedValue() as String
        if id == "com.apple.inputmethod.Kotoeri.RomajiTyping.Japanese" {
            TISSelectInputSource(source)
            break
        }
    }
}

Investigation Flow

I ruled out the hypotheses the AI initially proposed, one by one.

  1. Accessibility/Input Monitoring permissions not granted? → Both are already granted. The symptoms persist even after removing and re-granting them. Not applicable.
  2. Call timing too early? → The symptoms are the same even when the call is delayed by several hundred milliseconds to several seconds using DispatchQueue.main.asyncAfter. Not applicable.
  3. Input source ID is incorrect? → It is recognized by the same ID during manual switching. Not applicable.
  4. Focus on asymmetry → Switching to English works, only Japanese is broken. This asymmetry cannot be explained by the way the API is used. It is likely a bug on the macOS side.

Having reached this point in the investigation, I concluded that this corresponds to the known macOS bug openradar #5021326444232704. It is a long-standing issue where switching to Japanese via TISSelectInputSource from the background fails.

Workaround: Sending Virtual Keys

JIS keyboards have "Eisu" (keyCode 102) and "Kana" (keyCode 104) as physical keys. When the OS receives these key inputs, it switches the input source. This path is processed at a layer lower than TISSelectInputSource, so it is not affected by the bug.

US keyboards do not have these physical keys, but the OS switches the input source regardless of the presence of physical keys if a virtual event saying "keyCode 102 was pressed" is delivered. By synthesizing and sending this virtual event via CGEvent, the bug can be bypassed.

Here is the code you can use as is:

func sendVirtualKey(_ keyCode: CGKeyCode) {
    let source = CGEventSource(stateID: .hidSystemState)
    let keyDown = CGEvent(keyboardEventSource: source,
                          virtualKey: keyCode, keyDown: true)
    let keyUp = CGEvent(keyboardEventSource: source,
                        virtualKey: keyCode, keyDown: false)
    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)
}

// 102 and 104 are kVK_JIS_Eisu (0x66) and kVK_JIS_Kana (0x68) in Carbon's Events.h
func switchToEnglish()  { sendVirtualKey(102) }
func switchToJapanese() { sendVirtualKey(104) }

With this implementation, the menu bar display and the actual input state remain synchronized even when called from the background.

Consideration of Side Effects

One concern with virtual key sending is whether it breaks the state of other keys. Within the scope of my experiments, the following has been confirmed:

  • Modifier key states (such as Shift being held down) are not reset. By using CGEventSource(stateID: .hidSystemState), it does not contaminate the key input state of other apps.
  • Behavior may be unstable when waiting for dead key input. It is difficult to determine from the outside whether the IME is in the middle of dead key input, so I have given up on perfect handling.
  • If keyCode 102/104 is sent while other input sources (Chinese, Korean, etc.) are active, it may cause unintended behavior. I have decided to accept this by assuming that EnJaSwitcher is exclusively for switching between Eisu/Kana.
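If the last point ever becomes a problem, one option is to skip the virtual key when the current input source is neither English nor Japanese. The sketch below is only an idea, not something EnJaSwitcher does: the set of IDs treated as "English" and the Kotoeri prefix are assumptions based on a stock macOS setup (only the Kotoeri RomajiTyping ID appears earlier in this article), and the function names are invented.

```swift
import Foundation

// True when the input source ID belongs to the English/Japanese pair.
// Assumption: ABC or US for English, anything under Kotoeri for Japanese.
func isEnglishOrJapaneseSource(_ sourceID: String) -> Bool {
    let englishLayouts: Set<String> = [
        "com.apple.keylayout.ABC",
        "com.apple.keylayout.US"
    ]
    return englishLayouts.contains(sourceID)
        || sourceID.hasPrefix("com.apple.inputmethod.Kotoeri.")
}

#if canImport(Carbon)
import Carbon

// Reads the ID of the currently selected keyboard input source.
func currentInputSourceID() -> String? {
    guard let source = TISCopyCurrentKeyboardInputSource()?.takeRetainedValue(),
          let ptr = TISGetInputSourceProperty(source, kTISPropertyInputSourceID) else {
        return nil
    }
    return Unmanaged<CFString>.fromOpaque(ptr).takeUnretainedValue() as String
}
#endif
```

A caller would check isEnglishOrJapaneseSource(currentInputSourceID() ?? "") before sending keyCode 102 or 104. Whether the ID reported from a background agent is reliable enough for this, given the bug described above, is something I have not verified.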

Summary

If you find an asymmetric bug in an official OS API, consider going one layer below the API (in this case, virtual key events). AI can provide general solutions based on API documentation, but humans must observe symptoms and isolate edge-case bugs.

5. Self-Signed Certificates, TCC Database, and Permission Persistence

Since the app requires both Accessibility and Input Monitoring permissions, having to re-grant them after every build would be cumbersome. The countermeasure is to work with how the TCC (Transparency, Consent, and Control) database identifies apps.

macOS's TCC database identifies apps by their signing identity. Binaries signed with the same identity are treated as the "same app," and permissions persist. If the signature changes, TCC considers it a different app, and permissions must be re-granted.

Initially, I built with ad-hoc signing using codesign --sign -, but an ad-hoc signature carries no stable identity: it is tied to the hash of the code, which changes with every build. TCC therefore treated each build as a different app, forcing me to re-grant Accessibility and Input Monitoring permissions repeatedly. This was hindering development.

The solution is to sign with a fixed self-signed certificate created in Keychain. For personal distribution without Notarization, this operation is perfectly practical.

  1. Open Keychain Access → Certificate Assistant → Create a Certificate.
  2. Name: EnJaSwitcher Dev (any name works), Identity Type: Self-Signed Root, Certificate Type: Code Signing.
  3. After creation, sign with this certificate for every build.
codesign --force --sign "EnJaSwitcher Dev" EnJaSwitcher.app

You can verify the signing status with codesign.

codesign -d --verbose=4 EnJaSwitcher.app

If Authority=EnJaSwitcher Dev appears, it is signed with the intended certificate.
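If you want the build script to fail when the wrong identity was used, the Authority check can be automated. The sketch below only parses text that you have already captured from codesign -d --verbose=4 (note that codesign writes this information to stderr); running the process is left out, and the function name is made up for the example.

```swift
import Foundation

// Extracts the values of all "Authority=" lines from codesign's verbose output.
// An ad-hoc signed bundle has no Authority lines, so the result is empty.
func signingAuthorities(fromCodesignOutput output: String) -> [String] {
    return output
        .split(separator: "\n")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { $0.hasPrefix("Authority=") }
        .map { String($0.dropFirst("Authority=".count)) }
}

// True when the expected certificate name is among the authorities.
func isSigned(by expectedAuthority: String, codesignOutput: String) -> Bool {
    return signingAuthorities(fromCodesignOutput: codesignOutput)
        .contains(expectedAuthority)
}
```

For example, isSigned(by: "EnJaSwitcher Dev", codesignOutput: captured) returning false right after signing would be a good reason to stop before TCC sees the new build.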

If you ever recreate the certificate or accidentally build without signing, TCC will judge it to be a different app. In this case, you must manually delete the old entry from the permission list in System Settings using the minus button, then add the new .app using the plus button.

Summary

TCC identifies apps by their signing identity. By creating a self-signed certificate once and signing all builds with the same certificate, you can avoid the ordeal of re-granting permissions during development.

6. LaunchAgent and the Trap of Incorrect Permission Assignment

Automatic startup at login is achieved using a LaunchAgent. Place a plist in ~/Library/LaunchAgents/.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.enja-switcher</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Applications/EnJaSwitcher.app/Contents/MacOS/enja-switcher</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>

How to Load

launchctl bootstrap is the recommended way to load the job on current macOS, including the macOS 13 and later that this app targets. launchctl load still works but is a legacy subcommand.

# Recommended
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.local.enja-switcher.plist

# Legacy (works but deprecated)
launchctl load ~/Library/LaunchAgents/com.local.enja-switcher.plist

Check status.

launchctl list | grep enja-switcher

Behavior of KeepAlive

With KeepAlive = true, launchd will automatically restart the process if it crashes. This is helpful during development, but the process is difficult to stop if it enters a crash loop; in that case, unload the job with launchctl bootout gui/$(id -u)/com.local.enja-switcher. If you need fine-grained control, specify SuccessfulExit or Crashed in dictionary format.
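As an illustration of the dictionary form, the sketch below generates a LaunchAgent plist in which KeepAlive is { SuccessfulExit = false }, meaning launchd restarts the job only after an unsuccessful exit and leaves it alone after a clean quit. The function name is invented; the label and path are the ones used above.

```swift
import Foundation

// Builds the LaunchAgent dictionary with KeepAlive in dictionary form.
func makeLaunchAgentPlist(label: String, executablePath: String) -> [String: Any] {
    return [
        "Label": label,
        "ProgramArguments": [executablePath],
        "RunAtLoad": true,
        // Restart after a non-zero exit or crash, but not after a clean exit.
        "KeepAlive": ["SuccessfulExit": false]
    ]
}

// Serializes the dictionary as an XML plist and writes it to `url`.
func writeLaunchAgentPlist(_ plist: [String: Any], to url: URL) throws {
    let data = try PropertyListSerialization.data(fromPropertyList: plist,
                                                  format: .xml,
                                                  options: 0)
    try data.write(to: url)
}
```

With this setting, quitting the app from its own menu (exit status 0) stays quit until the next login, while a crash still brings it back.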

The Trap of Incorrect Permission Assignment When Launching from Terminal

Launching the binary directly from Terminal during development can cause issues.

# Doing this grants Accessibility/Input Monitoring permissions to Terminal.app instead
./EnJaSwitcher.app/Contents/MacOS/enja-switcher

macOS attributes permission requests to the process it considers responsible for the launch, which for a binary started from a shell is the terminal application. Permission requests from a process started in Terminal are therefore directed to Terminal.app. As a result, EnJaSwitcher.app is not granted permissions, and it will not work in production (when launched via LaunchAgent).

The correct way is to launch using the open command.

open /Applications/EnJaSwitcher.app

Since LaunchAgent launches the binary directly via launchd, the parent process issue does not occur. Permissions are correctly granted to EnJaSwitcher.app.
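To catch an accidental Terminal launch early, the app could print a warning at startup. The sketch below relies on a heuristic, not on any TCC API: processes started by launchd (a LaunchAgent, or an app started via open or Finder) normally have parent PID 1, while a binary run from a shell has the shell as its parent. Other launchers, such as a debugger, would also trip it. The function names are invented for this example.

```swift
import Foundation

// Heuristic: anything other than launchd (PID 1) as the parent suggests the
// binary was started directly from a shell or another tool.
func looksLaunchedFromShell(parentPID: Int32) -> Bool {
    return parentPID != 1
}

// Writes a warning to stderr when the heuristic fires.
func warnIfLaunchedFromShell() {
    if looksLaunchedFromShell(parentPID: getppid()) {
        let message = "warning: started directly from a shell; permissions may be " +
                      "attributed to the terminal app. Use `open` instead.\n"
        FileHandle.standardError.write(Data(message.utf8))
    }
}
```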

Summary

When running as a background service with LaunchAgent, launchctl bootstrap is the current recommendation. Running the binary directly from Terminal during development will assign permissions to Terminal.app, so be sure to use the open command.

Overall Learnings

What I strongly felt through this development is that your sense of the system's layered structure is what determines your ability to leverage AI.

The process of discovering IOHIDManager and the process of bypassing the TISSelectInputSource bug via virtual key sending followed the same structure. The intuition that "there must be something in a layer below the OS's public API" or "this process must be handled at a lower level" determines the quality of your questions to the AI. If the question is appropriate, the AI provides the knowledge. If the question is superficial, you will get a plausible general answer, and that is where it will stop.

You can leave the knowledge of specific APIs to the AI. What you cannot delegate is the overall map of where layers exist and what they are. The more foundation you have in computer science, OS mechanisms, and IO layer structures, the more you can extract from AI.

In an era where you can have AI write code, the importance of fundamentals does not diminish. I feel it actually increases.


Repository
