😀

csvtojson

2020/05/11に公開

csvtojson

csvtojsonの使い方。

インストール

npm i -g csvtojson

使い方

ヘルプ

csvtojsonのヘルプを以下に示す。

help
Option --help not supported.
csvtojson: Convert csv to JSON format
version: 2.0.10
Usage: csvtojson [<command>] [<options>] filepath

Commands: 
	parse: (Default)Parse a csv file to json
	version: Show version of current csvtojson
Options: 
	--output: The format to be converted to. "json" (default) -- convert csv to json. "csv" -- convert csv to csv row array. "line" -- convert csv to csv line string
	--delimiter: delimiter to separate columns. Possible to give an array or just use 'auto'. default comma (,). e.g. --delimiter=# --delimiter='[",",";"]' --delimiter=auto
	--quote: quote surrounding a column content containing delimiters. To turn off quote, please use 'off' --quote=off. default double quote ("). e.g. chage to hash: --quote=# 
	--trim: Indicate if parser trim off spaces surrounding column content. e.g. " content " will be trimmed to "content". Default: true
	--checkType: This parameter turns on and off whether check field type. default is false.
	--ignoreEmpty: This parameter turns on and off whether ignore empty column values while parsing. default is false
	--noheader: Indicating csv data has no header row and first row is data row. Default is false
	--headers: An array to specify the headers of CSV data. If --noheader is false, this value will override CSV header. Default: null. Example: --headers='["my field","name"]'
	--flatKeys: Don't interpret dots (.) and square brackets in header fields as nested object or array identifiers at all (treat them like regular characters for JSON field identifiers). Default: false.
	--maxRowLength: the max character a csv row could have. 0 means infinite. If max number exceeded, parser will emit "error" of "row_exceed". if a possibly corrupted csv data provided, give it a number like 65535 so the parser wont consume memory. default: 10240
	--checkColumn: whether check column number of a row is the same as headers. If column number mismatched headers number, an error of "mismatched_column" will be emitted.. default: false
	--eol: Explicitly specify the end of line character to use.
	--quiet: If any error happens, quit the process quietly rather than log out the error. Default is false.
	--escape: escape character used in quoted column. Default is double quote (") according to RFC4108. Change to back slash (\) or other chars for your own case.
	--ignoreColumns: RegExp matched columns to ignore from input. e.g. --ignoreColumns=/(name|age)/ 
	--includeColumns: RegExp matched columns to include from input. e.g. --includeColumns=/(name|age)/ 
	--colParser: Specific parser for columns. e.g. --colParser='{"col1":"number","col2":"string"}'
	--alwaysSplitAtEOL: Always interpret each line (as defined by eol) as a row. This will prevent eol characters from being used within a row (even inside a quoted field). This ensures that misplaced quotes only break on row, and not all ensuing rows.
	--nullObject: How to parse if a csv cell contains 'null'. Default false will keep 'null' as string. Change to true if a null object is needed.
	--downstreamFormat: Option to set what JSON array format is needed by downstream. 'line' is also called ndjson format. This format will write lines of JSON (without square brackets and commas) to downstream. 'array' will write complete JSON array string to downstream (suitable for file writable stream etc). Default 'line'
Examples: 
	csvtojson < csvfile
	csvtojson <path to csvfile>
	cat <csvfile> | csvtojson
	csvtojson <csvfilepath> --checkType=false --trim=false --delimiter=#

サンプルデータ

このcsvファイルをjsonに変換することを考える。

サンプルデータ
NumberType,MixedType,StringType
1,1.0E+10,全角
2,2.0E-10,hankaku
3,-3.0E+10,with space
4,-4.0E-10,1

デフォルト

デフォルトの設定でcsvからjsonに変換すると、すべてのカラムが文字列扱いになる。

$ csvtojson sample.csv | jq
[
  {
    "NumberType": "1",
    "MixedType": "1.0E+10",
    "StringType": "全角"
  },
  {
    "NumberType": "2",
    "MixedType": "2.0E-10",
    "StringType": "hankaku"
  },
  {
    "NumberType": "3",
    "MixedType": "-3.0E+10",
    "StringType": "with space"
  },
  {
    "NumberType": "4",
    "MixedType": "-4.0E-10",
    "StringType": "whithInSpace"
  }
]

型付け

JSONSchemaに基づく型チェックができることを確認する。

静的型付け

--colParserオプションを使ってMixedTypeカラムをnumber型に変換する。

$ csvtojson sample.csv --colParser='{"MixedType": "number"}' | jq
[
  {
    "NumberType": "1",
    "MixedType": 10000000000,
    "StringType": "全角"
  },
  {
    "NumberType": "2",
    "MixedType": 2e-10,
    "StringType": "hankaku"
  },
  {
    "NumberType": "3",
    "MixedType": -30000000000,
    "StringType": "with space"
  },
  {
    "NumberType": "4",
    "MixedType": -4e-10,
    "StringType": "whithInSpace"
  }
]

動的型付け

--checkTypeオプションを使って、自動的にデータ型を識別する。
NumberTypeカラムをInteger型に、MixedTYpeNumber型に変換できた。

$ csvtojson sample.csv --checkType=true | jq
[
  {
    "NumberType": 1,
    "MixedType": 10000000000,
    "StringType": "全角"
  },
  {
    "NumberType": 2,
    "MixedType": 2e-10,
    "StringType": "hankaku"
  },
  {
    "NumberType": 3,
    "MixedType": -30000000000,
    "StringType": "with space"
  },
  {
    "NumberType": 4,
    "MixedType": -4e-10,
    "StringType": "whithInSpace"
  }
]

トリミング

デフォルトだと、_whithInSpace_のような前後に半角スペースを持つ文字列は自動的にトリミングされる。
--trimオプションでこれをオフにできる。

$ csvtojson sample.csv --checkType=true --trim=false | jq
[
  {
    "NumberType": 1,
    "MixedType": 10000000000,
    "StringType": "全角"
  },
  {
    "NumberType": 2,
    "MixedType": 2e-10,
    "StringType": "hankaku"
  },
  {
    "NumberType": 3,
    "MixedType": -30000000000,
    "StringType": "with space"
  },
  {
    "NumberType": 4,
    "MixedType": -4e-10,
    "StringType": " whithInSpace "
  }
]

ヘッダ有無

ヘッダのないcsvファイルの場合は、--noheaderオプションを使ってヘッダを飛ばす。

$ csvtojson sample.csv --checkType=true --trim=false --noheader=true | jq
[
  {
    "field1": "NumberType",
    "field2": "MixedType",
    "field3": "StringType"
  },
  {
    "field1": 1,
    "field2": 10000000000,
    "field3": "全角"
  },
  {
    "field1": 2,
    "field2": 2e-10,
    "field3": "hankaku"
  },
  {
    "field1": 3,
    "field2": -30000000000,
    "field3": "with space"
  },
  {
    "field1": 4,
    "field2": -4e-10,
    "field3": " whithInSpace "
  }
]

列フィルタ

--includeColumnsオプションを使って、出力する列をフィルタリングする。

$ csvtojson sample.csv --checkType=true --trim=false --includeColumns="/(NumberType|MixedType)/" | jq
[
  {
    "NumberType": 1,
    "MixedType": 10000000000
  },
  {
    "NumberType": 2,
    "MixedType": 2e-10
  },
  {
    "NumberType": 3,
    "MixedType": -30000000000
  },
  {
    "NumberType": 4,
    "MixedType": -4e-10
  }
]

Discussion