Translated by AI
Developing a Go Package for Reading CSV Data
goark/csvdata Package
There is a standard package called encoding/csv that processes data according to RFC 4180. However, since encoding/csv itself only provides basic functionality, I've found it increasingly tedious to write messy boilerplate code (and tests) every time.
So, I wrote a small package called goark/csvdata that adds some extra features to the encoding/csv standard package.
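To make the boilerplate concrete, here is a rough sketch of what reading typed columns by name looks like with encoding/csv alone. The planet type, the parsePlanets helper, and the inline sample data are hypothetical names made up for this illustration; they are not part of any package.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// planet is a hypothetical record type used only for this illustration.
type planet struct {
	Order     int64
	Name      string
	Mass      float64
	Habitable bool
}

// parsePlanets reads CSV text whose first row is a header and converts
// each following row into a planet, looking columns up by name.
func parsePlanets(data string) ([]planet, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.TrimLeadingSpace = true // tolerate spaces after the delimiter

	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	// Map each trimmed column name to its position in the row.
	index := make(map[string]int, len(header))
	for i, name := range header {
		index[strings.TrimSpace(name)] = i
	}
	for _, name := range []string{"order", "name", "mass", "habitable"} {
		if _, ok := index[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}

	var planets []planet
	for {
		row, err := r.Read()
		if errors.Is(err, io.EOF) {
			return planets, nil
		}
		if err != nil {
			return nil, err
		}
		col := func(name string) string { return strings.TrimSpace(row[index[name]]) }

		var p planet
		if p.Order, err = strconv.ParseInt(col("order"), 10, 64); err != nil {
			return nil, err
		}
		p.Name = col("name")
		if p.Mass, err = strconv.ParseFloat(col("mass"), 64); err != nil {
			return nil, err
		}
		if p.Habitable, err = strconv.ParseBool(col("habitable")); err != nil {
			return nil, err
		}
		planets = append(planets, p)
	}
}

func main() {
	const sample = "order,name,mass,habitable\n1, Mercury, 0.055,false\n3, Earth, 1.0,true\n"
	planets, err := parsePlanets(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, p := range planets {
		fmt.Printf("%d %s %g %t\n", p.Order, p.Name, p.Mass, p.Habitable)
	}
}
```

The header lookup, whitespace trimming, and per-type strconv calls are the parts that end up repeated, and re-tested, for every new CSV layout.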
For example, suppose you have a CSV file like this:
"order", name ,"mass","distance","habitable"
1, Mercury, 0.055, 0.4,false
2, Venus, 0.815, 0.7,false
3, Earth, 1.0, 1.0,true
4, Mars, 0.107, 1.5,false
You can write the reading process as follows:
package main

import (
	_ "embed"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/goark/csvdata"
)

// planets holds the contents of sample.csv, embedded at build time.
//
//go:embed sample.csv
var planets string

func main() {
	// The second argument is true because the first row is a header.
	rc := csvdata.New(strings.NewReader(planets), true)
	for {
		// Advance to the next row; io.EOF signals the end of the data.
		if err := rc.Next(); err != nil {
			if errors.Is(err, io.EOF) {
				break
			}
			fmt.Fprintln(os.Stderr, err)
			return
		}
		// Columns are looked up by header name and converted to the requested type.
		order, err := rc.ColumnInt64("order", 10)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println(" Order =", order)
		fmt.Println(" Name =", rc.Column("name"))
		mass, err := rc.ColumnFloat64("mass")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println(" Mass =", mass)
		habitable, err := rc.ColumnBool("habitable")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println("Habitable =", habitable)
	}
}
Running this will produce the following output:
$ go run sample.go
Order = 1
Name = Mercury
Mass = 0.055
Habitable = false
Order = 2
Name = Venus
Mass = 0.815
Habitable = false
Order = 3
Name = Earth
Mass = 1
Habitable = true
Order = 4
Name = Mars
Mass = 0.107
Habitable = false
By the way, you can also handle TSV and other delimited formats by specifying the separator with the WithComma() method:
rt := csvdata.New(tsvReader, true).WithComma('\t')
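For reference, the standard encoding/csv reader exposes its delimiter as the Comma field. The sketch below uses only the standard library to show the effect of switching the delimiter to a tab. That WithComma() sets this field internally is an assumption on my part, and readTSV is a hypothetical helper name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// readTSV parses tab-separated text and returns all records.
// It is a hypothetical helper for illustration only.
func readTSV(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = '\t' // field delimiter; the default is ','
	return r.ReadAll()
}

func main() {
	records, err := readTSV("order\tname\n1\tMercury\n2\tVenus\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, rec := range records {
		fmt.Println(strings.Join(rec, " | "))
	}
}
```

With the delimiter set to a tab, a comma inside a field is treated as ordinary data and no longer splits the field.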
The embed standard package and //go:embed directive introduced in Go 1.16 are truly wonderful; they make preparing test data significantly easier. I imagine that cases where we prepare CSV or JSON files as test data and quickly feed them into tests using a package like this one will increase in the future.
For now, I'll start replacing my existing code for reading COVID-19 CSV data with the goark/csvdata package.
[Appendix] Reading Shift-JIS Encoded CSV Data
For CSV files exported from Excel or similar tools, the character encoding may be Shift-JIS. In such cases, you can use the golang.org/x/text/encoding/japanese package to read the data while converting it to UTF-8 encoding.
In other words, you can rewrite the csvdata.New() function call in the previous sample.go code like this:
// Reading CSV data from os.Stdin
rc := csvdata.New(japanese.ShiftJIS.NewDecoder().Reader(os.Stdin), true)
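Because the decoder wraps the reader, the text is converted to UTF-8 as it is read, so you can process the CSV data while reading only as much as needed instead of converting the whole input first.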
[Update] Support for Excel Files Added
Accordingly, I have changed the internal structure.