Translated by AI
Is Explaining UI to AI Difficult? Comparing 4 Different Methods
Nice to meet you. I usually develop web apps.
Recently I started using Claude Code and Cursor. They're very convenient, but there's one small thing I keep struggling with.
It's the "How should I convey the UI layout to the AI?" problem.
What's the Problem?
For example, when you want to modify an existing screen, you need to tell the AI the current state, right?
"There's a header, a form below it, and a button in the bottom right..."
Even when you explain it like that, the layout the AI pictures sometimes differs from what you intended.
On the other hand, pasting the entire HTML or, even worse, an image, eats up a lot of the context window...
Is anyone else having this same problem?
4 Methods I Tried
I was curious, so I wrote the same login form in four different formats and actually measured the token counts.
1. Natural Language Description
There's "Login" in the header, input fields for email and password below it,
and a login button in the bottom right.
Token count: 102
It's easy to write, but ambiguity remains (where exactly is "the bottom right"?),
and the AI sometimes interprets it differently than you intended.
2. ASCII art
```text
+------------------+
|      Login       |
+------------------+
| Email: [____]    |
| Pass:  [____]    |
|          [Login] |
+------------------+
```
Token count: 84
It's easy to grasp visually! However, once you edit it, the borders shift out of alignment and the layout falls apart.
When I had the AI modify it, it broke more often than not...
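This kind of breakage is easy to detect mechanically, even though it is tedious to fix by hand. Below is a minimal TypeScript sketch that flags rows whose width no longer matches the top border. `findMisalignedRows` is a hypothetical helper name. The sketch assumes plain ASCII, where every character takes up one column. Full-width characters such as Japanese text would need a display-width calculation instead.

```typescript
// Hypothetical helper: returns the indices of rows whose length differs from
// the first row (the top border), i.e. rows where the right edge has drifted.
// Assumes ASCII-only art, so string length equals display width.
function findMisalignedRows(art: string): number[] {
  const rows = art.split("\n");
  const expectedWidth = rows[0].length;
  const misaligned: number[] = [];
  rows.forEach((row, index) => {
    // Skip blank lines such as a trailing newline.
    if (row.length > 0 && row.length !== expectedWidth) {
      misaligned.push(index);
    }
  });
  return misaligned;
}

const original = [
  "+------------------+",
  "|      Login       |",
  "+------------------+",
].join("\n");

// Simulates an edit that swaps the title text without re-padding the row.
const edited = original.replace("Login", "Sign in");

console.log(findMisalignedRows(original)); // []
console.log(findMisalignedRows(edited)); // [ 1 ]
```

A check like this only tells you *that* a row drifted. Re-padding it correctly is still up to you (or the AI).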
3. Grid Format (Custom)
```text
grid: 4x3
A1..D1: { type: txt, value: "Login", align: center }
A2..D2: { type: input, label: "Email" }
A3..C3: { type: input, label: "Password" }
D3: { type: btn, value: "Login" }
```
Token count: 120
It's a notation modeled on Excel cell references.
For example, the line starting with A1..D1 means "merge cells A1 through D1 and place a text element there."
Because every position is pinned to coordinates, there's no ambiguity,
and the layout rarely breaks even when the AI edits it.
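To make "positions are determined by coordinates" concrete, here is a minimal TypeScript sketch. It shows how references like A1..D1 could be resolved into zero-based coordinates and then checked for overlaps or cells outside the grid. The parsing rules are my assumptions based on the example above: single-letter columns, ".." as the range separator, and "4x3" read as columns by rows. The names `parseArea` and `validateLayout` are hypothetical. This is not the actual ktr parser.

```typescript
// Sketch only: assumes single-letter columns (A-Z) and ".." as the range separator.
interface Area {
  col: number; // zero-based start column (A = 0)
  row: number; // zero-based start row (row "1" = 0)
  colSpan: number;
  rowSpan: number;
}

// Converts a cell reference such as "C3" into zero-based column/row indices.
function parseCell(ref: string): { col: number; row: number } {
  const match = /^([A-Z])([1-9][0-9]*)$/.exec(ref.trim());
  if (match === null) {
    throw new Error(`Invalid cell reference: ${ref}`);
  }
  return { col: match[1].charCodeAt(0) - 65, row: Number(match[2]) - 1 };
}

// Parses a merged range ("A1..D1") or a single cell ("D3") into an Area.
function parseArea(ref: string): Area {
  const [startRef, endRef = startRef] = ref.split("..");
  const start = parseCell(startRef);
  const end = parseCell(endRef);
  return {
    col: Math.min(start.col, end.col),
    row: Math.min(start.row, end.row),
    colSpan: Math.abs(end.col - start.col) + 1,
    rowSpan: Math.abs(end.row - start.row) + 1,
  };
}

// Reports areas that fall outside a cols x rows grid or claim an occupied cell.
function validateLayout(cols: number, rows: number, refs: string[]): string[] {
  const owner: (string | null)[][] = Array.from({ length: rows }, () =>
    new Array<string | null>(cols).fill(null),
  );
  const problems = new Set<string>();
  for (const ref of refs) {
    const area = parseArea(ref);
    if (area.col + area.colSpan > cols || area.row + area.rowSpan > rows) {
      problems.add(`${ref} is outside the ${cols}x${rows} grid`);
      continue;
    }
    for (let r = area.row; r < area.row + area.rowSpan; r++) {
      for (let c = area.col; c < area.col + area.colSpan; c++) {
        const previous = owner[r][c];
        if (previous !== null) {
          problems.add(`${ref} overlaps ${previous}`);
        }
        owner[r][c] = ref;
      }
    }
  }
  return Array.from(problems);
}

// The login form from this post (grid: 4x3).
const loginRefs = ["A1..D1", "A2..D2", "A3..C3", "D3"];
console.log(parseArea("A3..C3")); // { col: 0, row: 2, colSpan: 3, rowSpan: 1 }
console.log(validateLayout(4, 3, loginRefs)); // []
console.log(validateLayout(4, 3, [...loginRefs, "C3..D3"]));
// [ 'C3..D3 overlaps A3..C3', 'C3..D3 overlaps D3' ]
```

This kind of check is possible with the grid format but not really with ASCII art or prose. An edit either produces valid, non-overlapping coordinates or it gets reported.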
4. HTML
```html
<div class="header">Login</div>
<form>
  <input type="email" placeholder="Email">
  <input type="password" placeholder="Password">
  <button type="submit">Login</button>
</form>
```
Token count: 330
It's accurate, but as expected, it's long.
On complex screens, the gap becomes even larger.
Comparison Results

| Format | Token Count | Accuracy | Edit Resistance |
|---|---|---|---|
| Natural Language | 102 | △ | - |
| ASCII | 84 | ○ | ✕ |
| Grid | 120 | ○ | ○ |
| HTML | 330 | ○ | ○ |

○ = good, △ = fair, ✕ = poor, - = not rated
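To put the table's numbers in proportion, the short TypeScript snippet below expresses each format's token count as a percentage of the HTML version. It only reuses the counts measured above, so the ratios apply to this one small login form. `percentOfHtml` is just an illustrative name.

```typescript
// Token counts measured for the login-form example in this post.
const tokenCounts: Record<string, number> = {
  "Natural Language": 102,
  ASCII: 84,
  Grid: 120,
  HTML: 330,
};

// Expresses a token count as a whole-number percentage of the HTML count.
function percentOfHtml(tokens: number): number {
  return Math.round((tokens / tokenCounts["HTML"]) * 100);
}

for (const [format, tokens] of Object.entries(tokenCounts)) {
  console.log(`${format}: ${tokens} tokens (${percentOfHtml(tokens)}% of HTML)`);
}
// Natural Language: 102 tokens (31% of HTML)
// ASCII: 84 tokens (25% of HTML)
// Grid: 120 tokens (36% of HTML)
// HTML: 330 tokens (100% of HTML)
```

For this example, the grid version uses a little over a third of the tokens HTML needs (330 / 120 = 2.75).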
I Built a Tool to Visualize the Grid Format
While I was at it, I also built a CLI tool that converts the grid format into a PNG image.
```sh
npx ktr login.kui -o login.png
```
With this, you can view the "layout as the AI understands it" as an image.
It's handy because you can immediately see whether it has been interpreted correctly.
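The PNG rendering itself isn't shown here. As a dependency-free illustration of the same idea (turning coordinates back into something you can eyeball), here is a hypothetical TypeScript sketch that draws each element's cell span as characters in a text grid instead of an image. The `Placed` type and `renderPreview` function are my own names. The ranges are assumed to be already resolved to zero-based coordinates.

```typescript
// Hypothetical text preview: marks each element's cells with a single
// character. ktr outputs a PNG; this only illustrates the idea in plain text.
interface Placed {
  label: string; // one character used to mark the element's cells
  col: number; // zero-based start column
  row: number; // zero-based start row
  colSpan: number;
  rowSpan: number;
}

function renderPreview(cols: number, rows: number, items: Placed[]): string {
  // "." marks cells that no element occupies.
  const canvas: string[][] = Array.from({ length: rows }, () =>
    new Array<string>(cols).fill("."),
  );
  for (const item of items) {
    // Later items overwrite earlier ones; cells beyond the grid are clipped.
    for (let r = item.row; r < item.row + item.rowSpan && r < rows; r++) {
      for (let c = item.col; c < item.col + item.colSpan && c < cols; c++) {
        canvas[r][c] = item.label;
      }
    }
  }
  return canvas.map((line) => line.join("")).join("\n");
}

// The login form from this post, with ranges resolved by hand:
// T = title (A1..D1), E = email (A2..D2), P = password (A3..C3), B = button (D3)
const loginForm: Placed[] = [
  { label: "T", col: 0, row: 0, colSpan: 4, rowSpan: 1 },
  { label: "E", col: 0, row: 1, colSpan: 4, rowSpan: 1 },
  { label: "P", col: 0, row: 2, colSpan: 3, rowSpan: 1 },
  { label: "B", col: 3, row: 2, colSpan: 1, rowSpan: 1 },
];

console.log(renderPreview(4, 3, loginForm));
// TTTT
// EEEE
// PPPB
```

Even this crude preview makes the "button in the bottom right" placement unambiguous: B sits at the end of the last row.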

Summary
Personally, I like the grid format.
It's shorter than HTML and more resistant to editing than ASCII art.
Finally
It isn't suited to fine-grained adjustments, but since it's meant for mockups, this should be enough.
By the way, as a matter of design philosophy, I don't recommend endlessly enlarging the grid just to make minute adjustments.
If you want to do that, you should use a full-fledged design tool.
If you like it, please give it a star or buy me a coffee!
I'd also be happy if you could submit a PR if you have any good ideas~