Developing a Go Package for Reading CSV Data

goark/csvdata Package

The standard library's encoding/csv package reads and writes CSV data in the format described by RFC 4180. However, it provides only basic functionality, and I've grown tired of writing the same messy boilerplate code (and tests) around it every time.

So I wrote a small package, goark/csvdata, that adds a few conveniences on top of the standard encoding/csv package.

For example, suppose you have a CSV file like this:

sample.csv
"order", name ,"mass","distance","habitable"
1, Mercury, 0.055, 0.4,false
2, Venus, 0.815, 0.7,false
3, Earth, 1.0, 1.0,true
4, Mars, 0.107, 1.5,false

The code to read it can be written as follows:

sample.go
package main

import (
    _ "embed"
    "errors"
    "fmt"
    "io"
    "os"
    "strings"

    "github.com/goark/csvdata"
)

//go:embed sample.csv
var planets string

func main() {
    rc := csvdata.New(strings.NewReader(planets), true)
    for {
        if err := rc.Next(); err != nil {
            if errors.Is(err, io.EOF) {
                break
            }
            fmt.Fprintln(os.Stderr, err)
            return
        }
        order, err := rc.ColumnInt64("order", 10)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        fmt.Println("    Order =", order)
        fmt.Println("     Name =", rc.Column("name"))
        mass, err := rc.ColumnFloat64("mass")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        fmt.Println("     Mass =", mass)
        habitable, err := rc.ColumnBool("habitable")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        fmt.Println("Habitable =", habitable)
    }
}

Running this will produce the following output:

$ go run sample.go
    Order = 1
     Name = Mercury
     Mass = 0.055
Habitable = false
    Order = 2
     Name = Venus
     Mass = 0.815
Habitable = false
    Order = 3
     Name = Earth
     Mass = 1
Habitable = true
    Order = 4
     Name = Mars
     Mass = 0.107
Habitable = false

By the way, you can also handle TSV and other delimited formats by specifying the separator with the WithComma() method:

rt := csvdata.New(tsvReader, true).WithComma('\t')

The embed standard package and the //go:embed directive, both introduced in Go 1.16, are truly wonderful; they make preparing test data significantly easier. I expect it will become more common to prepare CSV or JSON files as test data and feed them straight into tests with a package like this one.

For now, I'll start replacing my existing code for reading COVID-19 related CSV data with the goark/csvdata package.

[Appendix] Reading Shift-JIS Encoded CSV Data

For CSV files exported from Excel or similar tools, the character encoding may be Shift-JIS. In such cases, you can use the golang.org/x/text/encoding/japanese package to read the data while converting it to UTF-8 encoding.

Specifically, you can replace the csvdata.New() call in the sample.go code above with the following:

// Reading CSV data from os.Stdin
rc := csvdata.New(japanese.ShiftJIS.NewDecoder().Reader(os.Stdin), true)

Because the decoder wraps the reader, the data is converted to UTF-8 as it is read, so you can process the CSV while reading only as much input as needed.

[Update] Support for Excel Files Added

https://zenn.dev/spiegel/articles/20211003-excel-as-a-csv

To support this, I changed the package's internal structure.

References

https://zenn.dev/koya_iwamura/articles/53a4469271022e
https://text.baldanders.info/golang/embeded-filesystem/
