🌲

衛星画像から教師データを作ろう（樹種判定編）

2021/03/08に公開

Python

はじめに

衛星画像のピクセルがどのクラスに属するか判別するための教師データを作ることを目標とします（こういうのをピクセルベース分類といいます）。
手法として最尤法^[1]が使われることが多いので、クラスごとに特徴ベクトル（次元:バンド数）を求めます。

グランドトゥルースとして利用できるポリゴンを用意する
各ポリゴンの領域を撮像した衛星画像を取得する
分類するクラスごとに衛星画像からピクセルの情報を集計する

樹種判定に興味がなくても、衛星画像からピクセル値を集計する部分は別の課題においても役立つのではないかと思います。

樹種判定

衛星画像を使って何かをしてみようと考えたとき、だいたい初めに思いつくのが樹種判定ではないかと思います^[2]。

樹種判定とは画像に含まれる植物が、広葉樹か針葉樹か、杉かヒノキかなどその種類を判別することですが、やり方を調べてみると、QGISやArcGIS（有償^[3]）といったGUIを利用する資料がほとんどで、個人的にはワクワクしません。

そこで衛星画像の樹種判定の教師データを作る部分をPythonとフリーのライブラリだけでやってみることにします。

ソースを選ぶ

教師データを作るにはグランドトゥルース、実際に確認された正しい地上の情報が必要です。
プロの方たちは現場に行ってデータを取ったり知識や経験、資料^[4]を元にして、「ここは杉、こっちはヒノキ」なんてポチポチ画像に印をつけていきます。
訓練データが正確であるほど、数が多いほど学習の精度があがるわけですが、「とりあえず試してみよう」という段階ではそこまで時間や手間をかけたくありません。

そこで国土数値情報ダウンロードサービスに国有林野のポリゴンデータがあるので、これを利用していくことにします。

次に衛星画像のほうは、AWS上で公開されているLandsat-8を今回は採用します。
採用理由は無料で大量のシーンを利用できることはもちろんですが、後述するSTACで条件の絞り込みがやりやすいこと、COG化されておりファイルのほしい部分だけ取得できることも大きいです。

国有林野データ

国有林野データは執筆時点では平成30年のデータが公開されています。

都道府県ごとにファイルがわけられており、各ポリゴンにはその範囲内で生育面積上位3種の樹種コード(134種別)が割り振られています。

以降本記事では樹種コード一覧のページから抜き出して作成した配列をA45_trees.jsonとして利用します。

47都道府県分のGeoJSONは事前にダウンロードしてsrc配下におきました。
そして県ごとのままでは使いづらいでの、面積歩合が一番大きい樹種毎にポリゴンをまとめ直します。

import json

with open("data/A45_trees.json", encoding="utf-8") as f:
    trees = json.load(f) # 樹種コード

prefectures = range(1, 48)  # 実行したい都道府県のIDの配列
for tree in trees:
    features = []
    for id in prefectures:
        prefecture_id = str(id).zfill(2)
        try:
	    src_path = "src/A45-19_{id}.geojson".format(id=prefecture_id)

	    with open(src_path) as f:
                src = json.load(f)
	        for feature in src["features"]:
		    if feature["properties"]["A45_015"] == tree:
		        features.append(feature)
        except Exception as e:
	    print(prefecture_id, e)
    dst_path = "data/A45-19_{id}.geojson".format(id=tree)
export_geojson = {
    "type": "FeatureCollection",
    "features": features
}
with open(dst_path, "w", encoding="utf-8") as f:
    json.dump(export_geojson, f, indent=4, ensure_ascii=False)
print("done: " + tree)

このポリゴンをすべて学習に使えるといいのですが、範囲が狭すぎたり、ポリゴン内に様々な樹種が混在していて教師データとしての質がよくなかったりなど考えられるため、クレンジングしてやる必要があります。

ここの条件は工夫のしどころで各自考えてほしいですが、
今回は

単層林or竹林のポリゴンを選ぶ^[5]
Landsat-8の1ピクセルあたりの分解能が30m * 30mであることから、ポリゴンの面積が0.1ヘクタール(> 30 * 30 / 10000 = 0.09ヘクタール)以上のポリゴンを選ぶ

ことにします（樹種の条件）。

上記条件を満たすポリゴンがどれくらいあるのかまとめます。

樹種コード	全ポリゴン数	単層林or竹林	0.1ha以上	両条件を満たすポリゴン数	両条件を満たすポリゴン面積
ハンノキ	611	345	601	336	687.14
ウルシ	40	36	40	36	108.7
天アカマツ	28	2	27	1	0.34
トドマツ	126407	106961	126018	106629	630597.74
ポプラ	23	20	23	20	65.56
ウラジロガシ	2	2	2	2	0.95
イチョウ	10	10	10	10	11.41
シナノキ	3745	7	3739	7	26.08
シベリアカラマツ	3	2	3	2	5.22
アベマキ	3	1	3	1	0.39
天スギ	1130	0	1126	0	0
ブナ	69271	107	69126	107	161.1
ハチク	6	6	6	6	2.37
ギンドロ	1	1	1	1	0.1
コナラ	5340	825	5277	820	2574.6
カエデ	92	68	89	65	67.08
ミズナラ	16735	602	16636	574	1408.5
ツバキ	1	1	1	1	1.02
ヤシャブシ	16	7	15	6	3.56
エノキ	0	0	0	0	0
ナツツバキ	0	0	0	0	0
サワラ	1292	216	1287	212	534.13
カヤ	12	8	12	8	10.26
モウソウチク	86	86	83	83	83.9
バンクシャマツ	140	113	140	113	627.64
ヤチダモ	3119	2824	3030	2737	6336.09
ケヤマハンノキ	79	19	76	18	22.94
エゴノキ	0	0	0	0	0
ダケカンバ	15765	398	15730	396	1315.85
オニグルミ	161	129	157	125	230.76
天エゾマツ	8190	1	8187	1	4.25
スギ	200729	198141	197953	195390	818885.83
フランスカイガンショウ	0	0	0	0	0
米マツ	0	0	0	0	0
シラベ	1125	111	1113	111	410.32
シイ	436	9	434	9	6.67
コメツガ	4287	0	4278	0	0
カラマツ	64207	61387	63973	61156	413522.94
アカマツ	39663	25676	39407	25510	133448.49
ヤマモモ	194	181	191	179	427.29
オヒョウ	0	0	0	0	0
欧州シラカバ	3	3	3	3	1.55
キリ	58	53	56	51	39.67
欧州アカマツ	199	161	194	156	1031.61
イロハモミジ	2	2	2	2	1.71
シデ	17	0	17	0	0
シオジ	14	1	12	1	0.17
ヤマサクラ	27	25	27	25	43.07
その他外来広葉樹	72	11	67	6	3.22
琉球マツ	290	184	289	183	2307.85
チョウセンゴヨウ	1	0	1	0	0
サワグルミ	46	8	46	8	6
タブ	71	24	71	24	50.19
ヤマグルマ	1	0	1	0	0
センダン	4	2	2	0	0
ツクバネカシ	0	0	0	0	0
イジュ	2	1	2	1	0.84
アキグミ	2	2	2	2	2.85
クス	179	142	176	139	320.83
他Ｌ	254622	11524	253005	11415	65491.85
イチイガシ	175	153	172	150	356.82
イヌエンジュ	49	45	47	43	49.65
ストローブマツ	1035	872	1027	864	6005.31
天アカエゾ	1113	0	1108	0	0
ヒバ	8436	518	8379	512	1871.05
アカエゾマツ	28232	23985	28113	23871	129392.06
ニセアカシヤ	171	88	163	81	261.98
エゾマツ	6534	2263	6517	2248	16262.56
トウヒ	459	132	455	131	407.27
天ヒノキ	2522	0	2516	0	0
ネズコ	195	0	194	0	0
アラカシ	62	34	62	34	83.51
グイマツ	703	588	696	582	2684.85
天トドマツ	28667	2	28647	2	8.16
天カラマツ	94	0	94	0	0
イチイ	23	14	23	14	20.17
アサダ	1	0	1	0	0
ニレ	254	7	252	6	4.19
ウラジロモミ	834	746	832	745	3017.14
ケヤキ	1233	960	1215	948	1642.73
ダフリカカラマツ	2	2	2	2	3.38
ウダイカンバ	1134	359	1127	354	1211.49
ヒメバラモミ	0	0	0	0	0
アカガシ	77	6	77	6	14.92
イタヤカエデ	2837	28	2834	26	34.97
ホオノキ	21	4	20	3	3.49
アオトド	1033	41	1033	41	120.77
ツゲ	1	0	1	0	0
ソレンアカマツ	0	0	0	0	0
キハダ	105	72	104	71	174.51
ヤナギ	49	25	48	24	23.67
スダジイ	5	1	5	1	1.03
ツガ	1216	3	1216	3	4.32
外国針葉樹	135	135	114	114	162.18
コバノヤマハンノキ	97	80	97	80	137.53
ミズキ	20	13	20	13	56.77
ヤブツバキ	0	0	0	0	0
アイグロマツ	3	3	3	3	11.72
マダケ	49	49	46	46	46.52
コーカサスモミ	2	1	2	1	0.35
他Ｎ	936	388	907	364	1315.76
イヌブナ	66	0	66	0	0
ハリモミ	1	0	1	0	0
サクラ	236	206	223	195	332.23
シラカバ	616	256	604	248	423.14
ヒルギ	23	0	23	0	0
クリ	402	224	393	218	576.3
センノキ	29	3	29	3	11.55
トガサワラ	0	0	0	0	0
コブシ	4	3	4	3	0.49
ムクノキ	2	2	2	2	0.69
クヌギ	2170	1457	2160	1450	4875.4
レジノーサマツ	20	20	20	20	88.79
クロマツ	5060	4283	4772	4019	18488.3
カンバ	16933	433	16908	426	2016.05
アオダモ	27	16	27	16	13.34
カシワ	596	89	476	84	98.23
シウリザクラ	4	0	4	0	0
ドイツトウヒ	638	581	623	566	2664.79
イヌマキ	32	32	31	31	58.68
カツラ	122	30	120	28	69.68
ヒメコマツ	423	84	418	79	127.7
ウバメガシ	38	2	38	2	4.05
ミズメ	79	12	78	11	23.78
トチノキ	62	34	54	27	34.79
コウヤマキ	40	2	40	2	2.86
モミ	1347	516	1335	510	1540.02
イス	147	1	147	1	2.05
トネリコ	1	1	1	1	0.34
シラカシ	9	9	9	9	16.78
スラッシュマツ	16	16	16	16	72.43
ドロノキ	278	234	271	227	453.4
ヒノキ	100808	98863	100030	98090	454748.54
リキダマツ	18	17	18	17	27.09

この条件で絞り込むと該当するポリゴンが一つもない樹種が27ありました。
そもそも面積歩合が一番大きいポリゴンが一つもない樹種コードが11もあり、全樹種分の教師データを作ることは諦めざるおえなさそうです。

Landsat-8

AWS上のLandsat-8のデータのありかはSTACという形式で公開されています。

Landsat-8は地球上を毎日撮像しているためデータ（シーン）の数が多いので、path/rowの範囲でSTAC Catalogが分けられています。
https://landsat-stac.s3.amazonaws.com/landsat-8-l1/{path}/{row}/catalog.json
STAC Catalog サンプル: https://landsat-stac.s3.amazonaws.com/landsat-8-l1/208/124/catalog.json

{
  "id": "124", 
  "stac_version": "0.6.0", 
  "description": "eo:row catalog", 
  "links": [
    {"rel": "self", "href": "https://landsat-stac.s3.amazonaws.com/landsat-8-l1/208/124/catalog.json"}, 
    {"rel": "root", "href": "../../../catalog.json"}, 
    {"rel": "parent", "href": "../catalog.json"}, 
    {"rel": "item", "href": "2017-01-02/LC82081242017002LGN00.json"}, 
    ...
  ]
}

それぞれの画像のメタ情報（撮影された時期や雲量など）、ファイルのURLが記述されたSTAC ItemのJSONには、catalog.jsonのlinksのうちrelがitemとなっているオブジェクトのhrefから辿っていきます。

STAC Item サンプル: https://landsat-stac.s3.amazonaws.com/landsat-8-l1/208/124/2017-01-02/LC82081242017002LGN00.json

ようは

欲しい地点の緯度経度からpath/rowを求める
path/rowよりパスを組み立てSTAC CatalogのJSONを取得し、linksから各シーンのSTAC ItemのJSONを取得する
STAC Itemのメタ情報が条件を満たすシーンはassetsに記載されてパスから画像を取得する

ことで、欲しい緯度経度範囲を含んだ画像を手に入れることができます。

緯度経度からpath/rowを求める

path/rowは地球上に割り当てられたフレームで、どのような位置形状かはNASAがshapefileで配布されています。

この中から「Descending (daytime)」をダウンロードして解凍します^[6]。

import geopandas as gpd
from shapely.geometry import Polygon

df_shp = gpd.read_file("src/WRS2_descending_0/WRS2_descending.shp")

# テストポリゴン
target = Polygon([[135, 35], [134, 35], [134, 34], [135, 34], [135, 35]])

filtered = df_shp[df_shp.geometry.intersects(target)]
print("all: {}".format(len(filtered.index)))
for i,row in filtered.iterrows():
    print("{}, {}/{}".format(i, row.ROW, row.PATH))
    
# all: 3
# 15287, 36/110
# 15288, 37/110
# 27935, 36/111

28892個あるrow/pathのポリゴンの中から、指定したポリゴンを含むpath/rowを探すサンプルコードです。事前にgeopandasとshapelyをinstallしてください。

STAC Itemの取得

path/rowからcatalog.jsonを取得し、itemのassetsを表示してみます。

import requests
base_path = "https://landsat-stac.s3.amazonaws.com/landsat-8-l1"

def get_catalog(path, row):
    response = requests.get("{}/{}/{}/catalog.json".format(base_path, path, row))
    return response.json()

def get_item(path, row, item_href):
    response = requests.get("{}/{}/{}/{}".format(base_path, path, row, item_href))
    return response.json()

catalog = get_catalog(208, 124)
item_links = list(filter(lambda l: l["rel"] == "item", catalog["links"]))
print("links:")
print(item_links)

item = get_item(208, 124, item_links[0]["href"])
print("assets:")
print(item["assets"])


# links:
# [{'rel': 'item', 'href': '2017-01-02/LC82081242017002LGN00.json'}, {'rel': 'item', 'href': '2017-01-18/LC82081242017018LGN00.json'}, {'rel': 'item', 'href': '2017-02-03/LC82081242017034LGN00.json'}, {'rel': 'item', 'href': '2017-02-19/LC82081242017050LGN00.json'}, {'rel': 'item', 'href': '2018-01-05/LC82081242018005LGN00.json'}, ...]

# assets of LC82081242017002LGN00.json:
# {'index': {'type': 'text/html', 'title': 'HTML index page', 'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/L8/208/124/LC82081242017002LGN00/index.html'}, 'thumbnail': {'title': 'Thumbnail image', 'type': 'image/jpeg', 'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/L8/208/124/LC82081242017002LGN00/LC82081242017002LGN00_thumb_large.jpg'}, 'B1': {'type': 'image/x.geotiff', 'eo:bands': [0], 'title': 'Band 1 (coastal)', 'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/L8/208/124/LC82081242017002LGN00/LC82081242017002LGN00_B1.TIF'}, 'B2': {'type': 'image/x.geotiff', 'eo:bands': [1], 'title': 'Band 2 (blue)', 'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/L8/208/124/LC82081242017002LGN00/LC82081242017002LGN00_B2.TIF'}, ... }}

事前にrequestsをinstallしてください。

今回はこの後さらに、

5月から9月に撮像されたもの^[7]
雲量eo:cloud_coverが3%以下^[8]

のものに絞り込みます(シーン条件)。
ただしは雲量かなり厳しい条件にしているので、該当するほとんどシーンはほとんど存在しないかもしれません。

画像を取得する

画像のURLを取得できるところまで来たので、次は実際に画像を取得します。
画像全体を取ろうとするとサイズが大きく時間がかかってしまうので、Polygonで指定した部分だけ取得しましょう。

import rasterio
from rasterio.mask import mask
from shapely.geometry import Polygon

def get_masked_image(image_href, poly):
    data = rasterio.open(image_href)
    masked, mask_transform = mask(dataset=data, shapes=poly.geometry.to_crs(crs=data.crs.data), crop=True, all_touched=True)
    return masked[0,:,:], mask_transform

B3 = 'http://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_B3.TIF'

mask_polygon = gpd.GeoDataFrame(crs={'init': 'epsg:4326'}, geometry=[Polygon([
  [-2.4, 51.0],
  [-2.4, 50.5],
  [-0.5, 50.5],
  [-2.4, 51.0]
])]) # path/row = 202/025 の範囲に含まれるポリゴン

masked, _ = get_masked_image(B3, mask_polygon)

事前にrasterioをinstallしてください。to_crsがwarningを出すかもですが無視します。

取得した画像の例ですが、三角形のポリゴンで指定した範囲で切り取られています^[9]。

またピクセルの値（DN値）をそのまま使うのでなく、TOA(上端大気反射率)に変換するならmlt.txtから変換係数を拾う必要があります。

def get_mtl(mtl_href):
    response = requests.get(mtl_href)

    return {
        "reflectance_mult": [float(val) for val in re.findall('REFLECTANCE_MULT_BAND_\d+\s+=\s+(.*)\n', response.text)],
        "reflectance_add": [float(val) for val in re.findall('REFLECTANCE_ADD_BAND_\d+\s+=\s+(.*)\n', response.text)],
        "sun_elevation": float(re.findall('SUN_ELEVATION\s+=\s+(.*)\n', response.text)[0])
    }

# {'reflectance_mult': [2e-05, 2e-05, 2e-05, 2e-05, 2e-05, 2e-05, 2e-05, 2e-05, 2e-05], 'reflectance_add': [-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1], 'sun_elevation': 17.22616125}

STAC ItemのJSONを引数として渡すと、そのシーンの各バンド（バンド8は他のバンドと解像度や性質が異なるため今回は除く）のピクセル値をTOAに変換した値の配列を返すメソッドです。

def sum_up(item, target_polygon, bands=[1, 2, 3, 4, 5, 6, 7, 9], max_len=None, nodata=0):
    mtl = get_mtl(item["assets"]["MTL"]["href"])
    ret = []

    for band in bands:
        href = item["assets"]["B{}".format(band)]["href"]
        masked, _ = get_masked_image(href, target_polygon)
        mult = mtl["reflectance_mult"][band - 1]
        add = mtl["reflectance_add"][band - 1]
        denominator = math.sin(math.radians(mtl["sun_elevation"]))
        values = masked.flatten()
        values = values[values != 0]
        if not max_len is None:
            values = values[:max_len]
        ret.append(list((values * mult + add) / denominator))
    return ret

item =get_item("202", "025", "2019-04-20/LC82020252019110.json")
mask_polygon = gpd.GeoDataFrame(crs={'init': 'epsg:4326'}, geometry=[Polygon([
  [-2.4, 51.0],
  [-2.4, 50.5],
  [-0.5, 50.5],
  [-2.4, 51.0]
])])
print(sum_up(item, mask_polygon, 100))

# (9, 100) の配列
# [[0.12934585334705642, 0.1304886254230629, 0.1312061799824158, 0.12876117926165778, ...], ...]

本当はTOAを計算する前にDEMを使って地形の補正をすべきなのですが省略しています。精度にこだわりたいときは試してみてください。

ここまでの処理をまとめる

landsat8.py: Landsat8のSTACや画像をいい感じに取得するメソッドたち

import re
import requests
import rasterio
import math
import geopandas as gpd
from shapely.geometry import Polygon
from rasterio.mask import mask

base_path = "https://landsat-stac.s3.amazonaws.com/landsat-8-l1"


def get_pathrow(target_polygon):
    df_shp = gpd.read_file("src/WRS2_descending_0/WRS2_descending.shp")
    filtered = df_shp[df_shp.geometry.intersects(target_polygon)]
    ret = []
    for i, row in filtered.iterrows():
        ret.append({
            'row': row.ROW,
            'path': row.PATH,
            'intersect': row.geometry.intersection(target_polygon)
        })
    return ret


def get_catalog(path, row):
    response = requests.get(
        "{}/{}/{}/catalog.json".format(base_path, str(path).zfill(3), str(row).zfill(3)))
    return response.json()


def get_item(path, row, item_href):
    response = requests.get(
        "{}/{}/{}/{}".format(base_path, str(path).zfill(3), str(row).zfill(3), item_href))
    return response.json()


def get_mtl(mtl_href):
    response = requests.get(mtl_href)

    return {
        "reflectance_mult": [float(val) for val in re.findall('REFLECTANCE_MULT_BAND_\d+\s+=\s+(.*)\n', response.text)],
        "reflectance_add": [float(val) for val in re.findall('REFLECTANCE_ADD_BAND_\d+\s+=\s+(.*)\n', response.text)],
        "sun_elevation": float(re.findall('SUN_ELEVATION\s+=\s+(.*)\n', response.text)[0])
    }


def get_masked_image(image_href, mask_polygon):
    data = rasterio.open(image_href)
    masked, mask_transform = mask(dataset=data, shapes=mask_polygon.geometry.to_crs(
        crs=data.crs.data), crop=True, all_touched=True)
    return masked[0, :, :], mask_transform


def sum_up(item, target_polygon, bands=[1, 2, 3, 4, 5, 6, 7, 9], max_len=None, nodata=0):
    mtl = get_mtl(item["assets"]["MTL"]["href"])
    ret = []

    for band in bands:
        href = item["assets"]["B{}".format(band)]["href"]
        masked, _ = get_masked_image(href, target_polygon)
        mult = mtl["reflectance_mult"][band - 1]
        add = mtl["reflectance_add"][band - 1]
        denominator = math.sin(math.radians(mtl["sun_elevation"]))
        values = masked.flatten()
        values = values[values != 0]
        if not max_len is None:
            values = values[:max_len]
        ret.append(list((values * mult + add) / denominator))
    return ret

main.py: 樹種毎に、ポリゴンを含んだシーンの情報を取得し、条件に合うシーンは画像を取得しピクセルのTOAを計算する。バンド毎に100ピクセル集まったら終了し、各バンドの平均値（とついでに分散）を保存する。100ピクセル集計することができなかった樹種はNODATAとする。

import json
import csv
import statistics
import landsat8
import geopandas as gpd
from shapely.geometry import Polygon

MAX_CLOUD_COVER = 3
MIN_MONTH = 5
MAX_MONTH = 9
MAX_PIXELS = 100

def is_valid_item(link):
    if link["rel"] == "item":
        month = int(link["href"][5:7])
        return month >= MIN_MONTH and month <= MAX_MONTH
    return False

if __name__ == "__main__":
    df_shp = gpd.read_file("src/WRS2_descending_0/WRS2_descending.shp")

    with open("data/A45_trees.json", encoding="utf-8") as f:
        trees = json.load(f)

    with open('data/A45-19_values.csv', 'w', encoding='utf-8') as f:
        writer = csv.writer(f)
        for tree in trees:
            band_values = [[], [], [], [], [], [], [], []]
            filled = False
            with open("data/A45-19_{id}.geojson".format(id=tree), encoding="utf-8") as f:
                geojson = json.load(f)
                features = geojson["features"]
            print(tree)
            for feature in features:
                props = feature["properties"]
                if not props["A45_025"] in ["単", "竹林"] or props["A45_027"] < 0.1:
                    continue  # 樹種の条件

                try:
                    scenes = landsat8.get_pathrow(Polygon(feature["geometry"]["coordinates"][0]))  # シーンの条件1
                except Exception as e:
                    scenes = []

                for scene in scenes:
                    catalog = landsat8.get_catalog(scene["path"], scene["row"])
                    item_links = list(
                        filter(is_valid_item, catalog["links"]))
                    
                    for item_link in item_links:
                        item = landsat8.get_item(scene["path"], scene["row"],
                                        item_link["href"])
                        item_props = item["properties"]
                        cloud_cover = item_props["eo:cloud_cover"]
                        if cloud_cover > MAX_CLOUD_COVER:
                            continue # シーンの条件2

                        try:
                            mask_polygon = gpd.GeoDataFrame(crs={'init': 'epsg:4326'}, geometry=[scene["intersect"]])  # すべてPolygonの想定
                            values = landsat8.sum_up(item, mask_polygon, max_len=MAX_PIXELS)
                            check = 0
                            for i in range(len(band_values)):
                                band_values[i].extend(values[i])
                                if len(band_values[i]) >= MAX_PIXELS:
                                    band_values = band_values
                                    check += 1

                            if len(band_values) <= check:
                                filled = True
                                break
                        except Exception as e:
                            print(tree, e)
                    else:
                        continue
                    break
                else:
                    continue
                break
            if filled:
                mean = [statistics.mean(values[:MAX_PIXELS]) for values in band_values]
                variance = [statistics.variance(values[:MAX_PIXELS]) for values in band_values]
                writer.writerow([tree] + mean + variance)

data/A45-19_{id}.geojsonは生成済みとする。

そうして集計できた樹種ごとの特徴ベクトルがこちら（一部のみ表示）。

樹種コード	バンド1	バンド2	バンド3	バンド4	バンド5	バンド6	バンド7	バンド9	バンド1(分散)	バンド2(分散)	バンド3(分散)	バンド4(分散)	バンド5(分散)	バンド6(分散)	バンド7(分散)	バンド9(分散)
ハンノキ	0.12412561268106592	0.10001798528705498	0.08312223514492063	0.05849279663722897	0.2611220367203851	0.13093231416670076	0.06363007485754517	0.001920034282376625	5.093785135424055e-06	4.609879502665581e-06	4.7344535308526665e-05	1.545760164328514e-05	0.003908169576869823	0.0002686370278722432	0.00025693016902138705	7.673056482899172e-08
ウルシ	0.11333166343038814	0.08920966424817571	0.07697857649150924	0.04708287937950255	0.3839477616165485	0.1703240658721925	0.06972000537924883	0.0014122765934026984	6.848026435668064e-05	7.226565801010066e-05	0.00016287991597427207	7.210032937847197e-05	0.00819596748594126	0.0014271322455686962	0.00025175522713771764	5.0732803871881825e-08
トドマツ	0.1310720656110398	0.10719486822045156	0.0937022481742604	0.05818641800967698	0.3586468108684962	0.14105503940563185	0.059256311453110375	0.002789413233667782	3.2957832881030394e-07	5.573364130827411e-07	1.2716590589234926e-05	5.534314997871114e-06	0.0014595957613806993	0.00039149091177742936	7.786802437155692e-05	3.193508283695392e-08
ポプラ	0.12118173637577959	0.09681671737221566	0.0803778917290018	0.05093046446586008	0.3442040411213453	0.14586820899552838	0.061084398963265625	0.0016541864944345445	0.0004751044107334024	0.00043514543492540473	0.000353024842050845	0.00029768951677585894	0.008799470548947814	0.0010213594652468876	0.00030276596590938517	1.1093467256943756e-07

表を全てみる

102の樹種でベクトルが計算できました^[10]。

まとめ

衛星画像のピクセルをクラス分類するための教師データを作りました。
例題ではグランドトゥルースとして国有林野ポリゴンとLandsat-8の画像を利用しました。

この教師データを使って分類をしてみるのは別の機会としますが、うまくいかなかったら学習に利用するポリゴンやLandsat-8の画像の絞り込み条件を変えたり、サンプル数を100より増やすなどするとよいのではないでしょうか。

脚注

http://rs.aoyaman.com/img_pro/b7.html ↩︎
あとは車などの物体検出がよく話題に出ます。 ↩︎
個人なら機能制限された無償版や期限付きの体験版の利用ができます。 ↩︎
林地台帳など ↩︎
単層林で検索すると「単一の樹冠層を構成する森林」とあり、面積歩合が一番大きい樹種以外による反射光が少ないことを期待できそうなため。 ↩︎
Ascendingもありますがこちらは地球の夜側を撮像しているため一般的には使われません。 ↩︎
冬はピクセルが雪か否かの判断がめんどうなのと、紅葉の影響を除くため ↩︎
ピクセルが雲か否かの判断がめんどうなため ↩︎
今回は日本国内のみを対象とするため問題はありませんが、rasterioは180度線をまたぐときの処理が甘いケースがあるため海外の一部シーンではうまくいかないかもしれません。 ↩︎
手元では集計に数時間かかりました。IO周りを最適化するともっと速くなると思います。 ↩︎