The Real Reason Claude Code Miscounts Commas in CSV Editing


Introduction

I previously asked GPT-4 to edit a CSV file.
Specifically, it was a product editing CSV used by operational staff for an e-commerce site.

After a header row of over 100 columns, the data rows were full of empty cells: in other words, lines containing long runs of consecutive commas.

"Put this value in the 37th column."

It should have been a simple instruction, yet the cell positions shifted, or the comma count came up short.
No matter how I tweaked it, something always went wrong somewhere.

In the end, I asked it to "write an editing script" and executed that in the terminal.
With this method, the task was completed in one go.

At that moment, I thought: "Maybe AI can't count commas (or numbers)?"

Time has passed since then, and AI models have evolved.
I wonder if this problem is solved in Claude Opus 4.5?

Experiment 1: Editing a 100-column CSV

For verification, I prepared an empty 100-column CSV (99 commas) and had the AI input data in four locations.

Instructions:

  • 7th column: Tokyo
  • 23rd column: Engineer
  • 58th column: 090-1234-5678
  • 91st column: Active
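As an aside, producing the correct target row deterministically takes only a few lines of JavaScript (a sketch; `makeEmptyRow` and `setCell` are names I chose for illustration, not anything from the experiment itself):

```javascript
// Build an empty CSV row with `n` columns (n - 1 commas).
function makeEmptyRow(n) {
  return ",".repeat(n - 1);
}

// Return a copy of `row` with `value` placed in the 1-based column `col`.
function setCell(row, col, value) {
  const cells = row.split(",");
  cells[col - 1] = value;
  return cells.join(",");
}

let row = makeEmptyRow(100);
for (const [col, value] of [[7, "Tokyo"], [23, "Engineer"],
                            [58, "090-1234-5678"], [91, "Active"]]) {
  row = setCell(row, col, value);
}

// split/join preserves the cell count, so the comma count cannot drift.
console.log(row.split(",").length);          // 100 cells
console.log((row.match(/,/g) || []).length); // 99 commas
```

Because the row is split into cells and rejoined, the comma count is structurally guaranteed; this is exactly the property the AI's token-by-token output lacks.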

Results (1st attempt):

  • 7th column (Tokyo) → ✓ Correct
  • 23rd column (Engineer) → ✓ Correct
  • 58th column (090-1234-5678) → ✗ Entered in the 59th column
  • 91st column (Active) → ✓ Correct

Only the 58th column was off by one.

Surprising?
Please try it yourself with the AI you have on hand.
You might get lucky once, but if you do it twice, it will likely fail somewhere.

(If you keep giving correction instructions, the context—specifically the "edit carefully" context mentioned later—accumulates and it improves slightly.)

Experiment 2: Single-location input

Let's try an even simpler case with only one input.

"Put 'Test' in the 47th column."

Results:

  • Test in the 47th column → ✓ Position is correct
  • Total number of commas → ✗ 98 instead of the required 99

Even though the position is correct, one comma is missing.

Looking closer:

  • Commas before Test: 46 ✓ Accurate
  • Commas after Test: 52 ✗ (Should be 53)

It lost one comma in the "filling the rest" part.

This was exactly as I expected.
Memories of disappointment when opening such files in a CSV editor come flooding back.

Experiment 3: Verification in a Different Environment

I conducted the same test in a session separate from Experiments 1 and 2 (a fresh state of Claude with no shared context).

Results:

  • All positions → ✓ Correct
  • Total number of commas → ✗ 100 instead of the required 99

This time, there was one too many.

Interim Summary: Positions Match, but the End Fluctuates

Patterns observed from the experiments so far:

  1. Data insertion at specified positions is generally accurate.
  2. The number of trailing commas fluctuates (around ±1).
  3. It can either decrease or increase.

It's not so much that it "can't count commas," but rather it feels like "it processes accurately up to the specified position, but fills the rest 'roughly'."

This is Opus 4.5, right? Even with thinking enabled?
Isn't it strange that it can't reliably edit a single CSV row with 100 columns?

Similar Phenomenon: The Closing Brace Problem in Code

While verifying this, it suddenly occurred to me.

When you have an AI write code, sometimes the closing braces get messed up.

// Pattern 1: an extra closing brace
if (condition) {
  doSomething();
}
}  // ← Extra closing brace

// Pattern 2: a missing closing brace
function process() {
  if (condition) {
    doSomething();
  // ← Missing closing brace
}

Sometimes they are missing, sometimes they are duplicated.
Usually, a linter detects it, and I fix it manually afterward.

It's unclear if there is a causal relationship with the CSV comma problem, but both are phenomena related to "termination processing."
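A linter catches this precisely because brace balance is trivially checkable by a machine. A minimal sketch of that check (my own illustration, not any real linter's code):

```javascript
// Return the net brace depth of a code string: 0 means balanced,
// positive means closing braces are missing, negative means extras.
// (A real linter parses properly and ignores braces in strings/comments.)
function braceBalance(code) {
  let depth = 0;
  for (const ch of code) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
  }
  return depth;
}

console.log(braceBalance("if (c) { f(); } }"));             // -1: one extra
console.log(braceBalance("function p() { if (c) { f(); }")); // 1: one missing
```

A counter that a machine maintains exactly is, again, exactly what the model's generation process fluffs at the tail end.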

LLM Characteristics? Claude's Own Hypotheses

Why does this happen? I asked Claude directly. These are, of course, only hypotheses, but here is its answer as it was given.

Hypothesis 1: Impact of the Tokenizer

LLMs process input by splitting it into units called tokens. How repetitions like ,,,, are tokenized can change depending on position and length. This may lead to instability in counting.

Hypothesis 2: Compression Bias

Training data often consists of organized sentences with few redundant repetitions. As a result, the model may be optimized to "moderately trim repetitions."

Hypothesis 3: Characteristics of the Attention Mechanism

The Transformer's attention mechanism might be poor at accurately tracking the "N-th item in a sequence of identical things." Attention is easily directed toward meaningful positions (where data is placed), while meaningless repetitions (empty comma sequences) might be processed vaguely.

Hypothesis 4: Absence of Working Memory

Humans can count while pointing with a finger, but LLMs have no such working memory. In the middle of a long repetition, you can almost sense that "Wait, which one was I on?" moment in its output.


The important thing is that this happened with GPT-4 two years ago, and it still happens now with Claude Opus 4.5 (thinking).

In other words, it is highly likely a characteristic common to LLMs in general, rather than a specific model or performance issue.

Solutions

1. Prompts that direct focus to the end

I discovered during my experiments that simply adding a note like "accurately maintain the number of commas in the latter half" made it precise.

Make a 100-cell empty CSV and put "test" in the 49th cell.
⚠️ Important: Accurately maintain the number of commas in the latter half.

Even without telling it the specific number (e.g., "you need 51 commas at the back"), just directing its focus is effective.

2. Control with scripts

In fact, when I told Claude to "verify if the positions and numbers are correct," it spontaneously wrote and executed a script for verification.

const csv = ",,,,,,Tokyo,,,,,...";  // the AI's output (truncated here)
const cells = csv.split(",");
console.log(`7th column: "${cells[6]}"`);      // 1-based column 7 is index 6
console.log(`Comma count: ${cells.length - 1}`);

Why did it write a script when asked to verify? This is a significant clue.

If Claude trusted its own output, it should have been able to just look back at the conversation and count them directly.

In other words...
AI itself recognizes that scripts are more accurate than direct editing and verification.
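Taking that idea one step further, the entire check from Experiment 1 fits in a few lines (a sketch; `verifyRow` is a name I made up, and the expected values are simply the ones from Experiment 1):

```javascript
// Verify that a CSV row has the expected cell count and cell values.
// Returns an array of error messages; an empty array means all good.
function verifyRow(row, expectedCells, expectations) {
  const cells = row.split(",");
  const errors = [];
  if (cells.length !== expectedCells) {
    errors.push(`expected ${expectedCells} cells, got ${cells.length}`);
  }
  for (const [col, value] of expectations) {
    if (cells[col - 1] !== value) {
      errors.push(`column ${col}: expected "${value}", got "${cells[col - 1]}"`);
    }
  }
  return errors;
}

// Experiment 1's instructions, checked against a correctly built row.
const expectations = [[7, "Tokyo"], [23, "Engineer"],
                      [58, "090-1234-5678"], [91, "Active"]];
const cells = Array(100).fill("");
cells[6] = "Tokyo"; cells[22] = "Engineer";
cells[57] = "090-1234-5678"; cells[90] = "Active";
console.log(verifyRow(cells.join(","), 100, expectations)); // []
```

Pasting the AI's actual output into `verifyRow` turns "does it look right?" into a yes/no answer.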

3. Utilize the Skill feature

As you know, Claude has a feature called "Skills."
This is a mechanism where you can pre-define procedures or knowledge for executing specific tasks.

Scripts can be placed here and utilized via progressive loading as needed.

Having read this far, you can see why the ability to use scripts is necessary. By preparing a parser or similar as a CSV editing skill, you can ensure accurate processing.

Conversely, I do not recommend giving Claude a task like CSV editing in a situation where even a simple script (like just splitting by ",") is unavailable. A human counting commas while doing it would be much faster and more accurate.
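To make "a simple script" concrete: the core of such a CSV-editing skill can be as small as this (a sketch under my own assumptions; `editCell` is a name I invented, and it ignores quoted fields, which a real CSV parser must handle):

```javascript
// Edit one cell of a simple CSV (no quoted fields) by 1-based row/column.
// Splitting and rejoining guarantees the comma count is preserved.
function editCell(csv, rowIdx, colIdx, value) {
  const rows = csv.split("\n");
  const cells = rows[rowIdx - 1].split(",");
  if (colIdx > cells.length) {
    throw new Error(`row ${rowIdx} has only ${cells.length} columns`);
  }
  cells[colIdx - 1] = value;
  rows[rowIdx - 1] = cells.join(",");
  return rows.join("\n");
}

const csv = "a,b,c\n" + ",".repeat(99);  // header + an empty 100-column row
const edited = editCell(csv, 2, 47, "Test");
console.log(edited.split("\n")[1].split(",")[46]); // "Test"
```

Note the bounds check: unlike free-form generation, the script fails loudly instead of silently shifting a cell.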

Summary

The real reasons why AI makes mistakes with the number of commas in CSV editing:

  1. It is not that it "can't count."
  2. It can process accurately up to the specified position.
  3. The count at the end fluctuates (±1) during the "filling the rest" stage.
  4. It is a phenomenon similar to the closing brace problem in code (causality unknown).
  5. It is highly likely a characteristic of LLMs in general.

Countermeasures:

  • Make it aware of the end via prompts → Easy and effective.
  • Have it write a script → A method the AI itself trusts.
  • Standardize with the Skill feature → Into a reusable form.

Even looking at just the past six months, AI has achieved significant performance improvements.
However, it turns out that this particular weakness in direct CSV editing has seen no progress in over two years.

This is just one example, but it shows that "certain areas exist where AI is weak."

However, looking at recent trends, it seems we are in a phase where each camp is providing means to overcome their respective weaknesses. We need to identify what an AI (or a specific model) is bad at and build a work environment that can compensate for it.

I want to observe its daily behavior and maintain a good relationship with it.
