# Building a Terminal CSV Editor in Go for Production Environments
❗ This article is for Series 2, Day 12 of the Go Advent Calendar 2025.
Even now, with JSON and YAML being commonplace, CSV continues to be used for integration between various systems due to its good balance of readability and density. However, because it's a format that spread early on, files that can hardly be called "standard-compliant"—varying in character encodings, delimiters, and line breaks—are commonly circulating in the field.
In the era before RFC 4180 and similar standards were established, we would agree in advance with the team on the other system's side on details like "this character encoding, this line break, this delimiter," and run integration tests face-to-face. Even so, no matter how many rules were decided, exceptions always occurred in operation.
Long ago, for system data integration within a client's intranet, I used to do the gritty work of creating CSV data and then transferring it via FTP.
Until the system stabilized, occasionally inappropriate data would get into the CSV, causing errors. In those times, the system engineers stationed at the client's site—hostages, if you will—usually corrected the data with a text editor. That's because...
- In production environments, there was no GUI, and since you could only log in via a text terminal, graphical CSV editors were hard to use (Excel? You must be joking).
- General CSV editors normalize and save the entire dataset, so with "ill-behaved CSVs" that are non-standard in terms of BOM, double quotes, or line breaks, diffs would occur even in cells that weren't changed (which could invite unnecessary trouble).
For the past few years, I have been building a simple terminal CSV viewer in Go, with editing added as a bonus feature. I thought that if I made "modify only the bare minimum" its added value, it might be usable even in such severe situations, so I put real effort into it.
## Features
Quoted from:

- **Minimal diffs on save**
  For cells that have not been edited, it preserves the original text's representation (line breaks, double quotes, BOM, encoding, delimiters, etc.) as much as possible. Therefore, only the changes you actually made appear as diffs. It is ideal when you want to safely edit real data.
- **vi-like cursor movement, Emacs-like cell editing**
  Move with `h`/`j`/`k`/`l` etc., and edit with `Ctrl`-based keys.
- **Supports both files and standard input**
  You can open CSV files directly or read data via pipes.
- **Fast startup and background loading**
  Files open quickly while the loading process continues in the background.
- **Visual display of changes**
  Edited cells are displayed with an underline, and you can revert to the state before modification with the `u` key.
- **Displays syntax information of the original data**
  Details such as the presence of quotes, the delimiter, and the character encoding are displayed on the bottom line of the screen.
- **Support for various character encodings**
  - UTF-8 (default)
  - UTF-16
  - Current Windows code page (auto-detected)
  - Any encoding registered in the IANA registry (specified with `-iana NAME`)
- **Color settings**
  - The default color scheme is for dark backgrounds.
  - Switch to a scheme for light backgrounds with the `-rv` option.
  - Color display is suppressed if the environment variable `NO_COLOR` is defined (https://no-color.org/).

## Technical Topics

### Minimal diffs on save
To minimize diffs on save, it is enough to write back each unedited cell as exactly the byte sequence that was originally read.
Therefore, I designed it so that each cell holds both the "raw data as read" and the "normalized data for display."
```go
// Cell data
type Cell struct {
	// Original data (byte slice, including delimiters, double quotes, etc.)
	original []byte

	// Normalized data for display (UTF-8)
	text string

	// The display text converted back to the original encoding with delimiters
	// added, i.e. an original-compatible byte-slice representation.
	// When !bytes.Equal(original, source), we know a change has been made.
	source []byte
}

// Row data
type Row struct {
	// Cell data
	Cell []Cell

	// Line-break code; one of "", "\n", "\r\n"
	// ("" is only possible for the last row)
	Term string
}

// Entire-file information
type Mode struct {
	NonUTF8     bool     // true if non-UTF-8
	Comma       byte     // delimiter
	DefaultTerm string   // line-break code for newly created rows
	hasBom      tristate // does the first row have a BOM?
	endian      endian
	decoder     *encoding.Decoder
	encoder     *encoding.Encoder
}

// Type for distinguishing BOM presence
type tristate int

const (
	triNotSet tristate = iota // BOM not yet determined
	triFalse                  // does not have a BOM
	triTrue                   // has a BOM
)
```
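To make the minimal-diff idea concrete, here is a small runnable sketch of the save path under my reading of the design (the `Modified` and `writeCell` helpers are hypothetical names of mine, not csvi's actual code):

```go
package main

import (
	"bytes"
	"fmt"
)

// Cell mirrors the struct described above: the original bytes as read,
// the decoded display text, and the text re-encoded back to an
// original-compatible byte sequence.
type Cell struct {
	original []byte
	text     string
	source   []byte
}

// Modified reports whether this cell has been edited
// (a hypothetical helper; the name is mine, not csvi's).
func (c *Cell) Modified() bool {
	return !bytes.Equal(c.original, c.source)
}

// writeCell shows the minimal-diff idea at save time: an unedited cell is
// written back verbatim, so its quoting quirks survive untouched.
func writeCell(buf *bytes.Buffer, c *Cell) {
	if c.Modified() {
		buf.Write(c.source)
	} else {
		buf.Write(c.original)
	}
}

func main() {
	// An "ill-behaved" cell: quoted although it does not need to be.
	untouched := &Cell{original: []byte(`"abc"`), text: "abc", source: []byte(`"abc"`)}
	// An edited cell: its new value is re-serialized normally.
	edited := &Cell{original: []byte(`"def"`), text: "DEF", source: []byte(`DEF`)}

	var buf bytes.Buffer
	writeCell(&buf, untouched)
	buf.WriteByte(',')
	writeCell(&buf, edited)
	fmt.Println(buf.String()) // prints: "abc",DEF
}
```

Only the edited cell changes in the output; the unedited cell keeps its redundant quotes byte-for-byte, which is exactly what keeps diffs minimal.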
### Fast startup and background loading
This is simply achieved by having another goroutine handle the loading process while waiting for key input. This is where the Go language excels most.
Previously, when I was creating a binary editor, I made this utility package. I'm repurposing it here.
```go
package nonblock

type _Response struct {
	data string
	err  error
}

type NonBlock struct {
	chReq chan struct{}
	chRes chan _Response
}

func New(getter func() (string, error)) *NonBlock {
	chReq := make(chan struct{})
	chRes := make(chan _Response)
	// Goroutine running in the background. It calls the key-input function
	// only when signaled (chReq), but since that call may block for a long
	// time, it is designed to return a response (chRes) when finished.
	go func() {
		for range chReq {
			data, err := getter()
			chRes <- _Response{data: data, err: err}
		}
		close(chRes)
	}()
	return &NonBlock{
		chReq: chReq,
		chRes: chRes,
	}
}

func (w *NonBlock) GetOr(work func() bool) (string, error) {
	w.chReq <- struct{}{}
	for {
		select {
		case res := <-w.chRes:
			return res.data, res.err
		default:
			if cont := work(); !cont {
				res := <-w.chRes
				return res.data, res.err
			}
		}
	}
}

func (w *NonBlock) Close() {
	close(w.chReq)
}
```
Regarding usage, you first register the key-input function with an instance:

```go
keyWorker := nonblock.New(getkeyFunc)
```

Then, when it is time to actually read a key:

```go
key, err := keyWorker.GetOr(func() bool {
	// Processing to be done in the background
	// :
	return true // return false when there is no more background work to do
})
```

Calling it this way gives a mechanism where the loading process runs only while waiting for key input.
This is a fairly standard structure, but I think the design turned out quite well in the way it was sub-packaged to absorb minor complexities.
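Putting the pieces together, here is a self-contained, runnable version of the pattern (the simulated key getter and the background work function are stand-ins of mine, not csvi's code):

```go
package main

import (
	"fmt"
	"time"
)

type response struct {
	data string
	err  error
}

// NonBlock runs a blocking getter on its own goroutine; GetOr interleaves
// background work while waiting for the getter's result.
type NonBlock struct {
	chReq chan struct{}
	chRes chan response
}

func New(getter func() (string, error)) *NonBlock {
	w := &NonBlock{chReq: make(chan struct{}), chRes: make(chan response)}
	go func() {
		for range w.chReq { // call the getter only when signaled
			data, err := getter()
			w.chRes <- response{data: data, err: err}
		}
		close(w.chRes)
	}()
	return w
}

func (w *NonBlock) GetOr(work func() bool) (string, error) {
	w.chReq <- struct{}{}
	for {
		select {
		case res := <-w.chRes:
			return res.data, res.err
		default:
			if !work() { // no more background work: just wait
				res := <-w.chRes
				return res.data, res.err
			}
		}
	}
}

func (w *NonBlock) Close() { close(w.chReq) }

func main() {
	// Stand-in for a key-input function: blocks briefly, then returns "j".
	getKey := func() (string, error) {
		time.Sleep(50 * time.Millisecond)
		return "j", nil
	}
	keyWorker := New(getKey)
	defer keyWorker.Close()

	loaded := 0
	key, err := keyWorker.GetOr(func() bool {
		loaded++ // stand-in for loading one more chunk of the CSV
		time.Sleep(time.Millisecond)
		return true // more background work remains
	})
	fmt.Printf("key=%s err=%v workRan=%v\n", key, err, loaded > 0)
}
```

While the 50 ms "key press" is pending, the work function keeps getting called, which is the whole point: loading progresses exactly while the UI is idle waiting for input.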
### Limitations of nonblock
However, even with this sub-package, I recently discovered a case where the program stops. This tool, csvi, also supports data input from standard input:
```
pwsh -Command "Get-Content utf_ken_all.csv ; Start-Sleep 15" | csvi
```
If the output side of the pipeline goes to sleep without closing the pipeline like this, the data input for csvi enters a wait state, making it inoperable.
Therefore, in the latest version (unreleased), I am trying to separate it into three types:
- A goroutine for key input
- A goroutine for data input
- A goroutine for the main operation
The goal is to ensure that even if data input enters a wait state, key input and the main operation do not stop.
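A minimal sketch of that three-goroutine separation might look like this (the names and structure are my assumption, not csvi's actual code): the main loop selects over both channels, so a stalled data source never blocks key handling.

```go
package main

import "fmt"

// pump is the main-operation loop: it selects over key input and data
// input, so a stalled data source never blocks key handling.
// It returns the events it processed, stopping when the 'q' key arrives.
func pump(keyCh <-chan rune, dataCh <-chan string) []string {
	var events []string
	for {
		select {
		case k := <-keyCh:
			if k == 'q' {
				return append(events, "quit")
			}
			events = append(events, fmt.Sprintf("key:%c", k))
		case row := <-dataCh:
			events = append(events, "row:"+row)
		}
	}
}

func main() {
	keyCh := make(chan rune)
	dataCh := make(chan string) // never written to: simulates a pipe held open

	go func() { // key-input goroutine
		for _, k := range "jkq" {
			keyCh <- k
		}
	}()
	// The data-input goroutine would feed dataCh; here it is blocked forever,
	// like `Start-Sleep 15` holding the pipeline open without sending data.

	fmt.Println(pump(keyCh, dataCh)) // keys are still handled
}
```

Even though `dataCh` never delivers anything, the keys are processed immediately, which is exactly the property the three-goroutine split is meant to guarantee.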
As an aside, while code that fully utilizes channels and goroutines like this is the true essence of Go, it is honestly technically difficult. Just in case there were any dangerous parts, I had Chappy (ChatGPT) check the source. Then...
"In this source, if both key input and data input are blocked, the program may hang." (I think I was told something like that)
Well, if the user doesn't provide key input and no data comes in, isn't that just inevitable?
### Support for various character encodings
First, the challenge is determining the character encoding:
- If `\xEF\xBB\xBF` (the UTF-8 BOM) is present, it is confirmed as UTF-8, and the BOM flag is set.
- If `\xFF\xFE` (the UTF-16LE BOM) is present, it is confirmed as UTF-16LE, and the BOM flag is set.
- If `\xFE\xFF` (the UTF-16BE BOM) is present, it is confirmed as UTF-16BE, and the BOM flag is set.
- If `\x00` appears at an even position, it is confirmed as UTF-16LE (characters that also exist in ASCII have 0 in the upper byte, which is used for detection; for example, a comma is `\x2C\x00`).
- If `\x00` appears at an odd position, it is confirmed as UTF-16BE (similarly, a comma is `\x00\x2C`).
Until one of these definitive judgments is made, the input is read as provisional UTF-8. Then, at the point where `utf8.Valid(s)` fails for the read byte sequence `s`, the data is treated as non-UTF-8 from then on. If it is non-UTF-8, the specific character encoding is determined as follows:
- Use the character encoding specified in advance by command-line options.
- On Windows, use the character encoding of the current code page (so-called ANSI).
- On UNIX-like systems, guess from values such as the `LANG` environment variable (which is a bit unreliable).
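Read together with the fallback rules, the detection logic above could be sketched like this (a simplified, whole-buffer version of my own; csvi's real implementation works incrementally):

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// detectEncoding is a simplified sketch of the detection rules described
// above. Note: offsets here are 0-based, so the zero high byte of an ASCII
// character in UTF-16LE lands at odd offsets (e.g. a comma is \x2C\x00),
// and at even offsets in UTF-16BE (a comma is \x00\x2C).
func detectEncoding(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8 (with BOM)"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "utf-16le (with BOM)"
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "utf-16be (with BOM)"
	}
	for i, c := range b {
		if c == 0 {
			if i%2 == 1 {
				return "utf-16le"
			}
			return "utf-16be"
		}
	}
	if utf8.Valid(b) {
		return "utf-8 (provisional)"
	}
	// Non-UTF-8: fall back to the command-line option,
	// the Windows code page, or the LANG environment variable.
	return "non-utf-8"
}

func main() {
	fmt.Println(detectEncoding([]byte("a,b\r\n")))                // plain ASCII/UTF-8
	fmt.Println(detectEncoding([]byte{'a', 0x00, ',', 0x00}))     // "a," in UTF-16LE
	fmt.Println(detectEncoding([]byte{0x8A, 0xBF}))               // one kanji in Shift_JIS
}
```

The Shift_JIS example fails `utf8.Valid` because `0x8A` is a UTF-8 continuation byte and cannot start a sequence, which is the trigger for the non-UTF-8 fallback path.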
Regarding the validity of this:
- In the case of determining between UTF-8 and Shift_JIS, there are few overlapping codes, so the accuracy is quite high.
- In the case of Chinese or Korean, it's not quite as good (according to ChatGPT).
Well, even so, in the case of csvi, there is no need to worry even if the determination fails.
This is because it remembers the byte sequence before decoding, so it can be reinterpreted later. By pressing the `L` key, the user can explicitly redisplay the data with a specified character encoding. (*)

(*) Excluding UTF-16. Since delimiters are also 16-bit code units in UTF-16, the cell boundaries change, and it simply cannot be handled this way. Even so, the reason for supporting UTF-16 at all is that when I changed jobs to become a Windows programmer in the 2010s, I accidentally created data files in "Unicode", i.e. UTF-16 TSV. Checking the data files provided by customers was extremely tedious! I wanted to save the soul of my past self (an exaggeration).
## Summary
Until now, I have mostly been recognized for Windows-based terminal tools, such as the command-line shell nyagos and the terminal automation tool Expect-Lua for Windows. However, csvi became my effective Linux OSS debut as it was introduced on overseas Linux news sites for the first time (as mentioned below).
Riding on that momentum, I also provided binaries for macOS and FreeBSD. Although the numbers are small, there are people using them, and I find it very rewarding.
(With Go, as long as you support both Windows and UNIX systems, cross-building for most platforms becomes possible, so it's easy as long as you can manage the verification environments.)
I felt it was reasonably complete quite early on, so I rushed it to v1.0. However, as it began to be used more widely, I started receiving various issues. It seems that when a tool is used heavily by a diverse range of people, the areas where it failed to meet needs start to surface.
When I use it only by myself, I tend to weigh the "patience when using it" against the "tediousness of making it" and settle for "this is good enough." But if I recklessly proceed with implementation based only on what "seems convenient" without considering the "tediousness of making it," the specifications will become unnecessarily large and complex with unused features, and maintainability will deteriorate.
These days, I truly feel that user feedback is necessary as a trigger to proceed with well-balanced updates.
## References
- csvi - A simple cross-platform terminal CSV editor. - Terminal Trove
- csvi - terminal CSV editor - LinuxLinks
- YouTube - Command-line database tool SQL-Bless and CSV text file ...
- CLI-Werkzeuge im Kurztest (German: "CLI tools briefly tested")
- 命令行資料庫工具 SQL-Bless 與 CSV文字檔編輯 CSVI – 簡睿隨筆 (Chinese: "Command-line database tool SQL-Bless and CSV text-file editing CSVI")
- Notes on Address Data [text.Baldanders.info] > [Extra 2] CSV File Editor CSVI