
Reading Wasm (Binary)

Published on 2022/11/22

Rust

WebAssembly

tech


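A little while ago I got started with Wasm and got hooked, so I'm going to dig a bit deeper.
This time I'll read, by sheer force of will, the Wasm module I wrote in that intro (which just printed a hello-world message locally), and get a glimpse of the spec along the way.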

Wasm modules

I touched on this briefly in the intro, but let's look a little closer.

The official description goes like this:

WebAssembly programs are organized into modules, which are the unit of deployment, loading, and compilation.

In short, a module bundles a WebAssembly program into a meaningful unit.
It's the unit you emit after compiling, load for execution, and deploy. (A lazy, literal translation.)

Looking at the official documentation

The official documentation's description of the Wasm module binary format had exactly the information I was looking for.

magic   ::= 0x00 0x61 0x73 0x6d
version ::= 0x01 0x00 0x00 0x00
... (and much more follows)

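I see, so the magic number literally spells out `\0asm` (which hexdump displays as `.asm`).
I remembered that Linux ELF files have something similar, and sure enough, they do: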

$ file ./target/debug/hello-world
./target/debug/rust: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, (以下省略)

$ hexdump -C ./target/debug/hello-world | head -5
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  50 78 00 00 00 00 00 00  |..>.....Px......|
00000020  40 00 00 00 00 00 00 00  78 6c 3b 00 00 00 00 00  |@.......xl;.....|
00000030  00 00 00 00 40 00 38 00  0e 00 40 00 2c 00 2b 00  |....@.8...@.,.+.|
00000040  06 00 00 00 04 00 00 00  40 00 00 00 00 00 00 00  |........@.......|

Looking at the actual binary

For reference, here's the source:

fn main() {
    println!("Hello, Wasm!");
}

Checking the size:

$ du -h ./hello-wasm.wasm 
2.0M	./hello-wasm.wasm

Hexdumping it:

$ hexdump -C ./hello-wasm.wasm | head -10
00000000  00 61 73 6d 01 00 00 00  01 74 11 60 00 00 60 01  |.asm.....t.`..`.|
00000010  7f 00 60 01 7f 01 7e 60  02 7f 7f 00 60 01 7f 01  |..`...~`....`...|
00000020  7f 60 02 7f 7f 01 7f 60  03 7f 7f 7f 00 60 03 7f  |.`.....`.....`..|
00000030  7f 7f 01 7f 60 04 7f 7f  7f 7f 01 7f 60 00 01 7f  |....`.......`...|
00000040  60 05 7f 7f 7f 7f 7f 00  60 04 7f 7f 7f 7f 00 60  |`.......`......`|
00000050  05 7f 7f 7f 7f 7f 01 7f  60 07 7f 7f 7f 7f 7f 7f  |........`.......|
000000e0  11 65 6e 76 69 72 6f 6e  5f 73 69 7a 65 73 5f 67  |.environ_sizes_g|
000000f0  65 74 00 05 16 77 61 73  69 5f 73 6e 61 70 73 68  |et...wasi_snapsh|
00000100  6f 74 5f 70 72 65 76 69  65 77 31 09 70 72 6f 63  |ot_preview1.proc|
...

Magic number

Seeing `00 61 73 6d` spell out `.asm` in a real file made it sink in that I'm actually looking at Wasm, and I was moved all over again.

Version number

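`01 00 00 00` is the version number (a little-endian u32, so version 1). I wonder whether this will change as new versions come out, and whether runtimes will use it for compatibility checks.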
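To make the 8-byte preamble concrete, here is a minimal Rust sketch that checks it. `read_preamble` is a hypothetical helper written for illustration, not part of any crate; it only validates the magic number and returns the version field decoded as a little-endian u32.

```rust
/// The 4-byte magic number `\0asm` that starts every Wasm binary.
const MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

/// Hypothetical helper: validates the magic number and returns the
/// version field (a little-endian u32) from the 8-byte preamble.
fn read_preamble(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() < 8 {
        return Err(format!("need 8 bytes, got {}", bytes.len()));
    }
    if bytes[..4] != MAGIC[..] {
        return Err("not a Wasm binary (bad magic number)".to_string());
    }
    // `01 00 00 00` read as little-endian is version 1.
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Ok(version)
}
```

An ELF file like the one dumped earlier fails the magic check, since it starts with `7f 45 4c 46`.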

A terrifying field of `0x7f`

Right after the version number, `7f` shows up everywhere like crazy. (Scary.)
And `60` is multiplying all over the place too. Also scary.

The official documentation says this:

The preamble is followed by a sequence of sections.

So apparently the magic number and version number are followed by regions called "sections."
Is this `0x7f` field part of a section too?

Sections

From here on I'm just bouncing furiously back and forth between the spec and the binary, but here's what the spec says:

Each section consists of

a one-byte section id,

the u32 size of the contents, in bytes,

the actual contents, whose structure is dependent on the section id.

Every section is optional; an omitted section is equivalent to the section being present with empty contents.

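So a section is a section ID (1 byte), followed by the size of its contents in bytes (a u32), followed by the contents themselves.
The value of the section ID determines the section's role (1 means the type section, for example).

The size is a u32, but it is stored in a variable-length encoding (unsigned LEB128), so depending on how big the section is, the size field takes 1 byte in some sections and 3 bytes in others (up to 5 bytes for a u32).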

The implementation in the wasmparser crate, which wasmer among others uses, makes it clear what's going on.

`TypeSectionReader` implementation

impl<'a> TypeSectionReader<'a> {
    /// Constructs a new `TypeSectionReader` for the given data and offset.
    pub fn new(data: &'a [u8], offset: usize) -> Result<Self> {
        let mut reader = BinaryReader::new_with_offset(data, offset);
        let count = reader.read_var_u32()?;  // reads the number of entries (function types) in the section
        Ok(Self { reader, count })
    }
}

`read_var_u32` implementation

pub fn read_var_u32(&mut self) -> Result<u32> {
    // Optimization for single byte i32.
    let byte = self.read_u8()?;
    if (byte & 0x80) == 0 {  // top (continuation) bit clear: this byte ends the value
        Ok(u32::from(byte))
    } else {
        self.read_var_u32_big(byte)
    }
}

It reads 8 bits, and if the top bit is 0, no further bytes follow, so the value is settled right there.

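When the top bit is 1, `read_var_u32_big` is called instead, and it looks like this.
It repeats the same steps: read one byte, check its top bit, shift its lower 7 bits into place, and OR them into the value so far.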

fn read_var_u32_big(&mut self, byte: u8) -> Result<u32> {
    let mut result = (byte & 0x7F) as u32;
    let mut shift = 7;
    loop {
        let byte = self.read_u8()?;
        result |= ((byte & 0x7F) as u32) << shift;
        if shift >= 25 && (byte >> (32 - shift)) != 0 {
            let msg = if byte & 0x80 != 0 {
                "invalid var_u32: integer representation too long"
            } else {
                "invalid var_u32: integer too large"
            };
            // The continuation bit or unused bits are set.
            return Err(BinaryReaderError::new(msg, self.original_position() - 1));
        }
        shift += 7;
        if (byte & 0x80) == 0 {
            break;
        }
    }
    Ok(result)
}

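I took this as a trick for not wasting even a single byte, and was impressed.
Also note that the sections themselves are optional: an omitted section is treated as if it were present with empty contents.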
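The wasmparser code above is tied to its own `BinaryReader`. As a self-contained sketch of the same unsigned LEB128 decoding (the function name is mine, not wasmparser's API), the version below returns the value together with the number of bytes consumed, and rejects input that is truncated, runs past 5 bytes, or sets bits that don't fit in 32 bits.

```rust
/// Decodes an unsigned LEB128 `u32` from the start of `bytes`.
/// Returns `(value, bytes_consumed)`, or `None` if the input is truncated,
/// longer than 5 bytes, or sets bits that don't fit in 32 bits.
fn read_var_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        // The 5th byte sits at shift 28, so only its low 4 bits fit in a u32.
        if i == 4 && byte & 0x70 != 0 {
            return None;
        }
        // The lower 7 bits are payload: shift them into place and OR them in.
        result |= u32::from(byte & 0x7f) << (7 * i);
        // A clear top bit (0x80) means this is the last byte.
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

For example, `96 01` decodes as `0x16 + (0x01 << 7)` = 22 + 128 = 150.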

Now that I know how the section size is computed, let's look at the actual binary.
Right after the version number come the bytes `01 74 11 60 00`, so looking at just the very first section:

Section ID
- 1 (= type section)
Size
- 74 (= 116 bytes)
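Since every section is just an ID byte, a LEB128 size, and a payload, you can hop from section to section without understanding what's inside. A minimal sketch, assuming the whole module is in memory; `list_sections` and `section_name` are hypothetical helpers, the ID-to-name table follows the section IDs in the spec (ID 0 is reserved for custom sections), and the LEB128 decoder is simplified (no overflow check).

```rust
/// Section IDs as listed in the spec (0 = custom section).
fn section_name(id: u8) -> &'static str {
    match id {
        0 => "custom",
        1 => "type",
        2 => "import",
        3 => "function",
        4 => "table",
        5 => "memory",
        6 => "global",
        7 => "export",
        8 => "start",
        9 => "element",
        10 => "code",
        11 => "data",
        12 => "data count",
        _ => "unknown",
    }
}

/// Simplified unsigned LEB128 decoder: returns (value, bytes consumed).
fn read_var_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result = 0u32;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// Walks the sections after the 8-byte preamble and returns
/// (id, name, payload size, payload offset) for each one.
fn list_sections(module: &[u8]) -> Option<Vec<(u8, &'static str, u32, usize)>> {
    let mut pos = 8; // skip magic number + version
    let mut sections = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        let (size, len) = read_var_u32(&module[pos + 1..])?;
        let payload = pos + 1 + len;
        let end = payload.checked_add(size as usize)?;
        if end > module.len() {
            return None; // the section claims more bytes than the file has
        }
        sections.push((id, section_name(id), size, payload));
        pos = end; // jump straight to the next section
    }
    Some(sections)
}
```

Run against the real module, the first entry would be the type section with a payload of 116 bytes.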

Type section

The first one out of the gate is something called the Type Section, which I'd never heard of, so back to the documentation to follow the spec.

The type section has the id 1. It decodes into a vector of function types that represent the types component of a module.

This section holds the type information used within the module, and it apparently consists of a vector of Function Types.

Function Type

I didn't know Function Type either, so deeper we go.

Function types are encoded by the byte 0x60 followed by the respective vectors of parameter and result types.

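A function type is the byte `0x60` followed by two Result Types (each one a vector): one for the parameters and one for the results.
In the binary it looks like `60 {parameter Result Type} {return Result Type}`.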

Result Type

Deeper, deeper.

Result types are encoded by the respective vectors of value types.

A vector of Value Types.

Value Type

I'm about to suffocate from diving this deep.

Value types are encoded with their respective encoding as a number type, vector type, or reference type.

Each one is a Number Type, a Vector Type, or a Reference Type.

Number Type/Vector Type/Reference Type

One last push.

Number Type

Number types are encoded by a single byte.

Each type corresponds to a one-byte value.

0x7f => i32
0x7e => i64
0x7d => f32
0x7c => f64

Vector Type

Vector types are also encoded by a single byte.

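A composite type spanning 128 bits, into which you can pack values in any combination that fits that size (`v128`, used by the SIMD instructions).
It is encoded as `0x7B`.
⚠️⚠️⚠️ Note that this Vector Type is a different thing from Vectors! (More on Vectors below.) ⚠️⚠️⚠️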

Reference Type

Reference types are also encoded by a single byte.

A type that represents a reference to a function or to a value injected from outside. Each kind of referent corresponds to a one-byte value.

0x70 => funcref (reference to a function)
0x6f => externref (reference to a value injected from outside)
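Because every value type above is a single byte, a decoder can tell them apart with one `match`. A small sketch: `valtype_name` is a hypothetical helper that only knows the types listed here (using the spec's text-format names) and returns `None` for anything else, including `0x60`, which introduces a function type rather than being a value type.

```rust
/// Maps a one-byte value type to its text-format name.
/// Only the number, vector, and reference types listed above are known.
fn valtype_name(byte: u8) -> Option<&'static str> {
    match byte {
        0x7f => Some("i32"),
        0x7e => Some("i64"),
        0x7d => Some("f32"),
        0x7c => Some("f64"),
        0x7b => Some("v128"),      // vector type
        0x70 => Some("funcref"),   // reference to a function
        0x6f => Some("externref"), // reference to a value from outside
        _ => None,                 // e.g. 0x60 introduces a function type
    }
}
```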

Vectors

Vectors are encoded with their u32 length followed by the encoding of their element sequence.

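Whereas the Vector Type above is a 128-bit composite type, this one is a length-prefixed sequence.
A u32 length comes first, followed by that many elements.
Every "vector" mentioned so far refers to these Vectors.
Personally, this is what I picture when someone says "vector."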
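A generic reader makes the "length, then that many elements" shape explicit. A minimal sketch with hypothetical helper names: `read_vec` takes a closure that decodes one element, and `read_name` uses it to read a byte vector as UTF-8, which is how the spec encodes names such as import and export names. The LEB128 helper is simplified and does not reject over-long encodings.

```rust
/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Reads a vector: a u32 length, then that many elements decoded by `elem`.
fn read_vec<T>(
    bytes: &[u8],
    pos: &mut usize,
    mut elem: impl FnMut(&[u8], &mut usize) -> Option<T>,
) -> Option<Vec<T>> {
    let len = read_var_u32(bytes, pos)?;
    let mut items = Vec::new();
    for _ in 0..len {
        items.push(elem(bytes, &mut *pos)?);
    }
    Some(items)
}

/// A name is a vector of bytes that must be valid UTF-8.
fn read_name(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let raw = read_vec(bytes, pos, |b, p| {
        let byte = *b.get(*p)?;
        *p += 1;
        Some(byte)
    })?;
    String::from_utf8(raw).ok()
}
```

Nesting works naturally: `read_vec(bytes, &mut pos, read_name)` reads a vector of names.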

That seems to be enough tools, so let's head back up.

Back to the binary once more

I nearly suffocated, but let's go back to the binary.

$ hexdump -C ./hello-wasm.wasm | head -10
00000000  00 61 73 6d 01 00 00 00  01 74 11 60 00 00 60 01  |.asm.....t.`..`.|
00000010  7f 00 60 01 7f 01 7e 60  02 7f 7f 00 60 01 7f 01  |..`...~`....`...|
00000020  7f 60 02 7f 7f 01 7f 60  03 7f 7f 7f 00 60 03 7f  |.`.....`.....`..|
00000030  7f 7f 01 7f 60 04 7f 7f  7f 7f 01 7f 60 00 01 7f  |....`.......`...|
00000040  60 05 7f 7f 7f 7f 7f 00  60 04 7f 7f 7f 7f 00 60  |`.......`......`|
00000050  05 7f 7f 7f 7f 7f 01 7f  60 07 7f 7f 7f 7f 7f 7f  |........`.......|
000000e0  11 65 6e 76 69 72 6f 6e  5f 73 69 7a 65 73 5f 67  |.environ_sizes_g|
000000f0  65 74 00 05 16 77 61 73  69 5f 73 6e 61 70 73 68  |et...wasi_snapsh|
00000100  6f 74 5f 70 72 65 76 69  65 77 31 09 70 72 6f 63  |ot_preview1.proc|
...

What the `7f` field really is (and `60` too)

Sharp readers may have noticed already: the `7f` that shows up everywhere is the number type i32.
And `60` appears wherever a function type is defined.

Taking apart the `Type Section`

Taking apart the very first section, the Type Section, gives this:

01 ------------------------------------------ Section ID (= Type Section)
   74 --------------------------------------- section size (= 116 bytes)
      11 ------------------------------------ number of function types defined (= 17)
         60 00 00 --------------------------- (void) -> void
         60 01 7f 00 ------------------------ (i32) -> void
         60 01 7f 01 7e --------------------- (i32) -> i64
         60 02 7f 7f 00 --------------------- (i32, i32) -> void
         60 01 7f 01 7f --------------------- (i32) -> i32
         60 02 7f 7f 01 7f ------------------ (i32, i32) -> i32
         60 03 7f 7f 7f 00 ------------------ (i32, i32, i32) -> void
         60 03 7f 7f 7f 01 7f --------------- (i32, i32, i32) -> i32
         60 04 7f 7f 7f 7f 01 7f ------------ (i32, i32, i32, i32) -> i32
         60 00 01 7f ------------------------ (void) -> i32
         60 05 7f 7f 7f 7f 7f 00 ------------ (i32, i32, i32, i32, i32) -> void
         60 04 7f 7f 7f 7f 00 --------------- (i32, i32, i32, i32) -> void
         60 05 7f 7f 7f 7f 7f 01 7f --------- (i32, i32, i32, i32, i32) -> i32
         60 07 7f 7f 7f 7f 7f 7f 7f 00 ------ (i32, i32, i32, i32, i32, i32, i32) -> void 
         60 06 7f 7f 7f 7f 7f 7f 01 7f ------ (i32, i32, i32, i32, i32, i32) -> i32
         60 07 7f 7f 7f 7f 7f 7f 7f 01 7f --- (i32, i32, i32, i32, i32, i32, i32) -> i32
         60 03 7e 7f 7f 01 7f --------------- (i64, i32, i32) -> i32

I'd been cowering before this onslaught of `7f`s, but now it all makes sense.
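The hand decoding above fits in a short loop. A minimal sketch, assuming every value type is a single byte (true for all the types covered in this article); `decode_type_section` and `FuncType` are hypothetical names, and the input is the section payload, i.e. the bytes after the ID and size. The test uses the first three entries from the dump, with the count changed to 3.

```rust
/// One decoded function type: raw value-type bytes for params and results.
#[derive(Debug, PartialEq)]
struct FuncType {
    params: Vec<u8>,
    results: Vec<u8>,
}

/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Reads a result type: a vector of single-byte value types.
fn read_result_type(bytes: &[u8], pos: &mut usize) -> Option<Vec<u8>> {
    let len = read_var_u32(bytes, pos)? as usize;
    let end = (*pos).checked_add(len)?;
    let types = bytes.get(*pos..end)?.to_vec();
    *pos = end;
    Some(types)
}

/// Decodes a Type Section payload (the bytes after the section ID and size).
fn decode_type_section(payload: &[u8]) -> Option<Vec<FuncType>> {
    let mut pos = 0;
    let count = read_var_u32(payload, &mut pos)?;
    let mut types = Vec::new();
    for _ in 0..count {
        // Every function type starts with the marker byte 0x60.
        if *payload.get(pos)? != 0x60 {
            return None;
        }
        pos += 1;
        let params = read_result_type(payload, &mut pos)?;
        let results = read_result_type(payload, &mut pos)?;
        types.push(FuncType { params, results });
    }
    Some(types)
}
```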

Import Section

After the Type Section comes the Import Section, with Section ID 2.

The import section has the id 2. It decodes into a vector of imports that represent the imports component of a module.

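This section defines what the module brings in from outside (other Wasm modules or the host), as a list saying which module to import from, under what name, and what kind of thing to import.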

Breaking this down too

02 ----------------------------- Section ID (= Import Section)
   96 01 ----------------------- section size (= 150 bytes)
         04 -------------------- number of imports (= 4)
            16 ----------------- module name length (= 22 bytes)
               77 61 73 ~ 31 ---   module name (= "wasi_snapshot_preview1")
               08 --------------   imported name length (= 8 bytes)
               66 64 5f ~ 65 ---   imported name (= "fd_write")
               00 --------------   import kind (= function)
               08 --------------   type index (= Type Section entry 8, counting from 0)
            16 ----------------- module name length (= 22 bytes)
               77 61 73 ~ 31 ---   module name (= "wasi_snapshot_preview1")
               0b --------------   imported name length (= 11 bytes)
               65 6e 76 ~ 74 ---   imported name (= "environ_get")
               00 --------------   import kind (= function)
               05 --------------   type index (= Type Section entry 5)
            16 ----------------- module name length (= 22 bytes)
               77 61 73 ~ 31 ---   module name (= "wasi_snapshot_preview1")
               11 --------------   imported name length (= 17 bytes)
               65 6e 76 ~ 74 ---   imported name (= "environ_sizes_get")
               00 --------------   import kind (= function)
               05 --------------   type index (= Type Section entry 5)
            16 ----------------- module name length (= 22 bytes)
               77 61 73 ~ 31 ---   module name (= "wasi_snapshot_preview1")
               09 --------------   imported name length (= 9 bytes)
               70 72 6f ~ 74 ---   imported name (= "proc_exit")
               00 --------------   import kind (= function)
               01 --------------   type index (= Type Section entry 1)
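The same layout can be decoded in code. A minimal sketch that handles only function imports (kind `0x00`, followed by a type index); table, memory, and global imports carry different descriptors and are simply rejected here. `decode_func_imports` is a hypothetical helper that takes the section payload.

```rust
/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// A name: a u32 byte length followed by UTF-8 bytes.
fn read_name(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_var_u32(bytes, pos)? as usize;
    let end = (*pos).checked_add(len)?;
    let name = std::str::from_utf8(bytes.get(*pos..end)?).ok()?.to_string();
    *pos = end;
    Some(name)
}

/// Decodes an Import Section payload into (module, name, type index) triples.
fn decode_func_imports(payload: &[u8]) -> Option<Vec<(String, String, u32)>> {
    let mut pos = 0;
    let count = read_var_u32(payload, &mut pos)?;
    let mut imports = Vec::new();
    for _ in 0..count {
        let module = read_name(payload, &mut pos)?;
        let name = read_name(payload, &mut pos)?;
        let kind = *payload.get(pos)?;
        pos += 1;
        if kind != 0x00 {
            return None; // only function imports are handled in this sketch
        }
        let type_index = read_var_u32(payload, &mut pos)?;
        imports.push((module, name, type_index));
    }
    Some(imports)
}
```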

Function Section

The section with Section ID 3, following the Import Section.

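It holds a vector of function signatures: for each function defined in the module, the index of its type in the Type Section.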

Breakdown, breakdown

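The byte right after the Section ID may look like an element count, but the section size is present here too, as in every other section: `90 02` is the size in LEB128 (0x10 + 2 × 128 = 272 bytes), and the number of functions, `8e 02` (0x0e + 2 × 128 = 270), follows it.
Each function then gets one Type Section index, and since there are only 17 types, every index fits in a single byte: 2 bytes of count plus 270 one-byte indices is exactly 272.
Most of it is the same pattern repeated, so here's just an excerpt.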

03 --------------- Section ID (= Function Section)
   90 02 --------- section size (= 272 bytes)
         8e 02 --- number of functions (= 270), followed by one Type Section index per function
(rest omitted)
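Decoding the Function Section is just reading a vector of u32 type indices. One more detail is worth sketching: function indices (used by `call` and by exports) count imported functions first and then the functions declared here, so entry `i` of this section is function `imports + i`. The helper names below are hypothetical.

```rust
/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Decodes a Function Section payload into one type index per function.
fn decode_function_section(payload: &[u8]) -> Option<Vec<u32>> {
    let mut pos = 0;
    let count = read_var_u32(payload, &mut pos)?;
    let mut type_indices = Vec::new();
    for _ in 0..count {
        type_indices.push(read_var_u32(payload, &mut pos)?);
    }
    Some(type_indices)
}

/// Converts a function index into a position within this section,
/// or None if the index refers to an imported function.
fn defined_position(func_index: u32, imported_funcs: u32) -> Option<u32> {
    func_index.checked_sub(imported_funcs)
}
```

For instance, in a module with 4 imported functions, function index 272 is entry 268 of this section.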

Table Section

Following the Function Section, the section with Section ID 4.

The table section has the id 4. It decodes into a vector of tables that represent the tables component of a module.

This section defines a vector of Tables, data structures that hold references to functions or to values injected from outside.

Each Table defines a minimum and (optionally) a maximum number of entries, plus one of the Reference Types described above.
Code accesses a particular table in the vector by its 0-based index.

Breakdown, breakdown, breakdown

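04 ------------------ Section ID (= Table Section)
   05 --------------- section size (= 5 bytes)
      01 ------------ number of Tables (= 1)
         70 ---------   Reference Type (= funcref, reference to a function)
         01 ---------   limits flag (= both minimum and maximum present)
            60 ------     minimum number of entries (= 96)
            60 ------     maximum number of entries (= 96)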
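Tables (and memories) share the same "limits" encoding: a flag byte, then a minimum, then a maximum only when the flag is `0x01`. A minimal sketch with hypothetical helper names; flag values introduced by later proposals are rejected rather than handled.

```rust
/// Decoded limits: a minimum and an optional maximum.
#[derive(Debug, PartialEq)]
struct Limits {
    min: u32,
    max: Option<u32>,
}

/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Decodes limits: `0x00 min` or `0x01 min max`.
fn read_limits(bytes: &[u8], pos: &mut usize) -> Option<Limits> {
    let flag = *bytes.get(*pos)?;
    *pos += 1;
    if flag > 0x01 {
        return None; // flags from later proposals are not handled
    }
    let min = read_var_u32(bytes, pos)?;
    let max = if flag == 0x01 { Some(read_var_u32(bytes, pos)?) } else { None };
    Some(Limits { min, max })
}

/// Decodes a table type: a reference-type byte followed by limits.
fn read_table_type(bytes: &[u8], pos: &mut usize) -> Option<(u8, Limits)> {
    let reftype = *bytes.get(*pos)?;
    *pos += 1;
    let limits = read_limits(bytes, pos)?;
    Some((reftype, limits))
}
```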

Memory Section

The section after the Table Section. Section ID 5.

The memory section has the id 5. It decodes into a vector of memories that represent the mems component of a module.

This section defines a vector of information about Linear Memory, a region of memory that can be allocated dynamically, like a heap.
The spec shortens it to just Memory.

Each Memory defines an initial size and an optional maximum size, both counted in 64 KiB pages.
Like Tables, code refers to a particular Memory in the vector by its 0-based index.

Breakdown, breakdown, breakdown, breakdown

05 --------------- Section ID (= Memory Section)
   03 ------------ section size (= 3 bytes)
      01 --------- number of Memories (= 1)
         00 ------   limits flag (= minimum only, no maximum)
            11 ---     minimum size (= 17 pages)
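Memory limits are counted in pages, and one Wasm page is 64 KiB (65,536 bytes). A small sketch, with a hypothetical helper name, that decodes a memory type and converts pages to bytes:

```rust
/// Size of one Wasm page in bytes (64 KiB).
const PAGE_SIZE: u64 = 65_536;

/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Decodes a memory type (limits) and returns (initial bytes, maximum bytes).
fn read_memory_bytes(bytes: &[u8], pos: &mut usize) -> Option<(u64, Option<u64>)> {
    let flag = *bytes.get(*pos)?;
    *pos += 1;
    if flag > 0x01 {
        return None; // only the 0x00 / 0x01 limit forms are handled
    }
    let min_pages = read_var_u32(bytes, pos)?;
    let max_pages = if flag == 0x01 { Some(read_var_u32(bytes, pos)?) } else { None };
    // Widen to u64 before multiplying so large page counts cannot overflow.
    Some((u64::from(min_pages) * PAGE_SIZE, max_pages.map(|p| u64::from(p) * PAGE_SIZE)))
}
```

A minimum of 17 pages is 17 × 65,536 = 1,114,112 bytes (1,088 KiB).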

Global Section

The section after the Memory Section. Section ID 6.

The global section has the id 6. It decodes into a vector of globals that represent the globals component of a module.

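This section defines a vector of information about the module's global variables.
Each global variable consists of its type (a value type), its mutability, and an expression that determines its initial value.
Mutability and the initial-value expression look like this: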

Mutability

{Value Type} 0x00 => const {Value Type}
{Value Type} 0x01 => var {Value Type}

Expression
A sequence of instructions terminated by `0x0b` (`end`). There are a lot of instructions, so I'll skip them here.

No choice but to break this down too

06 ------------------------------ Section ID (= Global Section)
   19 --------------------------- section size (= 25 bytes)
      03 ------------------------ number of globals (= 3)
         7f --------------------- variable type (= i32)
            01 ------------------   mutability (= var)
            41 80 80 c0 00 0b ---   initial value (= i32.const 1048576 (0x100000); end)
         7f --------------------- variable type (= i32)
            00 ------------------   mutability (= const)
            41 90 da c0 00 0b ---   initial value (= i32.const 1060112 (0x102d10); end)
         7f --------------------- variable type (= i32)
            00 ------------------   mutability (= const)
            41 84 da c0 00 0b ---   initial value (= i32.const 1060100 (0x102d04); end)
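The `i32.const` immediates in the initial-value expressions are signed LEB128 rather than unsigned. The decoding is the same, except that when bit 6 (`0x40`) of the last byte is set, the value is sign-extended. A minimal sketch (the function name is hypothetical, and over-long encodings are not rejected):

```rust
/// Decodes a signed LEB128 `i32` (the immediate of `i32.const`).
/// Returns `(value, bytes_consumed)`, or `None` if truncated or over 5 bytes.
fn read_var_i32(bytes: &[u8]) -> Option<(i32, usize)> {
    let mut result: i32 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        // The lower 7 bits are payload, exactly as in the unsigned case.
        result |= i32::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the last byte is the sign bit: fill the upper bits with 1s.
            if shift < 32 && byte & 0x40 != 0 {
                result |= -1i32 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None
}
```

For example, `80 80 c0 00` is `0x40 << 14` = 1,048,576 (0x100000), and because the final `00` has bit 6 clear, the value stays positive.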

Export Section

After the Global Section, the section with Section ID 7.

The export section has the id 7. It decodes into a vector of exports that represent the exports component of a module.

It consists of a vector of items that other Wasm modules or the runtime can reference.
For example, exporting the entry-point function so the runtime can call it, or exporting global variables so other Wasm modules can use them?

Gotta break it down (sense of duty)

07 ------------------------------ Section ID (= Export Section)
   37 --------------------------- section size (= 55 bytes)
      05 ------------------------ number of exports (= 5)
         06 --------------------- export name length (= 6 bytes)
            6d 65 6d 6f 72 79 ---   export name (= "memory")
            02 ------------------   export kind (= memory)
               00 ---------------     memory index (= 0)
         0b --------------------- export name length (= 11 bytes)
            5f 5f 68 ~ 65 -------   export name (= "__heap_base")
            03 ------------------   export kind (= global)
               01 ---------------     global index (= 1)
         0a --------------------- export name length (= 10 bytes)
            5f 5f 64 ~ 64 -------   export name (= "__data_end")
            03 ------------------   export kind (= global)
               02 ---------------     global index (= 2)
         06 --------------------- export name length (= 6 bytes)
            5f 73 74 61 72 74 ---   export name (= "_start")
            00 ------------------   export kind (= function)
               90 02 ------------     function index (= 272; imported functions are counted first)
         04 --------------------- export name length (= 4 bytes)
            6d 61 69 6e ---------   export name (= "main")
            00 ------------------   export kind (= function)
               91 02 ------------     function index (= 273)
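Each export is a name, a one-byte kind (`0x00` function, `0x01` table, `0x02` memory, `0x03` global), and an index into that kind's index space. A minimal sketch with hypothetical helper names that decodes an Export Section payload:

```rust
/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_var_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// A name: a u32 byte length followed by UTF-8 bytes.
fn read_name(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_var_u32(bytes, pos)? as usize;
    let end = (*pos).checked_add(len)?;
    let name = std::str::from_utf8(bytes.get(*pos..end)?).ok()?.to_string();
    *pos = end;
    Some(name)
}

/// Maps an export kind byte to a readable label.
fn export_kind(byte: u8) -> Option<&'static str> {
    match byte {
        0x00 => Some("func"),
        0x01 => Some("table"),
        0x02 => Some("memory"),
        0x03 => Some("global"),
        _ => None,
    }
}

/// Decodes an Export Section payload into (name, kind, index) triples.
fn decode_exports(payload: &[u8]) -> Option<Vec<(String, &'static str, u32)>> {
    let mut pos = 0;
    let count = read_var_u32(payload, &mut pos)?;
    let mut exports = Vec::new();
    for _ in 0..count {
        let name = read_name(payload, &mut pos)?;
        let kind = export_kind(*payload.get(pos)?)?;
        pos += 1;
        let index = read_var_u32(payload, &mut pos)?;
        exports.push((name, kind, index));
    }
    Some(exports)
}
```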

Start Section

The section with Section ID 8.

The start section has the id 8. It decodes into an optional start function that represents the start component of a module.

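This section is optional, and the binary I examined didn't contain it.
Apparently it lets a function be called automatically when the Wasm module is loaded (more precisely, right after its tables and linear memory have been initialized), and this section specifies which function that is.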

Element Section

The section with Section ID 9.

The element section has the id 9. It decodes into a vector of element segments that represent the elems component of a module.

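It seems to be a section defining a vector of Elements, which (roughly) hold initialization data for tables.
As in every other section, the Section ID is followed by the section size, and then by the number of element segments.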

Each Element has one of three modes (passive, active, or declarative), and they differ in when initialization happens.

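passive
- When the table.init instruction is executed, the defined chunk of data is copied into the target table to initialize it.
active
- The specified target is initialized automatically when the module is instantiated.
declarative
- Apparently it can't be used at runtime; it seems to be used to forward-declare (?) references for instructions like ref.func. (I couldn't quite picture this one.)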

Breakdown

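09 ------------------ Section ID (= Element Section)
   83 01 ------------ section size (= 131 bytes)
         01 --------- number of element segments (= 1)
            00 ------ segment flag (= 0: active, table 0, function indices)
            41 ------ offset expression: i32.const
               01 --- offset value (= 1)
               0b --- end of the expression
            5f ------ number of function indices (= 95; starting at offset 1, they fill slots 1 to 95 of the 96-entry table)
               0e --- from here on, one function index per table slot
               0b
               08
               12
               e5 01
               4b
(rest omitted)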
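The spec defines several element segment encodings, selected by a leading flag. A minimal sketch that handles only flag `0`: an active segment for table 0, whose offset expression is assumed to be a single `i32.const` followed by `end`, then a vector of function indices. `decode_active_elem` and `read_leb` are hypothetical names, and other flags are rejected.

```rust
/// Simplified LEB128 reader (signed or unsigned) that advances `pos`.
/// Rejects inputs longer than 5 bytes, but not other over-long encodings.
fn read_leb(bytes: &[u8], pos: &mut usize, signed: bool) -> Option<i64> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend signed values whose last byte has bit 6 set.
            if signed && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some(result);
        }
        if shift >= 35 {
            return None; // more than 5 bytes: too long for a 32-bit value
        }
    }
}

/// Decodes one flag-0 element segment: returns (table offset, function indices).
fn decode_active_elem(bytes: &[u8], pos: &mut usize) -> Option<(i32, Vec<u32>)> {
    // Flag 0: active, table 0, offset expression, vector of function indices.
    if *bytes.get(*pos)? != 0x00 {
        return None;
    }
    *pos += 1;
    // Offset expression, assumed to be `i32.const N` (0x41) then `end` (0x0b).
    if *bytes.get(*pos)? != 0x41 {
        return None;
    }
    *pos += 1;
    let offset = read_leb(bytes, pos, true)? as i32;
    if *bytes.get(*pos)? != 0x0b {
        return None;
    }
    *pos += 1;
    let count = read_leb(bytes, pos, false)?;
    let mut funcs = Vec::new();
    for _ in 0..count {
        funcs.push(read_leb(bytes, pos, false)? as u32);
    }
    Some((offset, funcs))
}
```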

Code Section

Section ID 10.

The code section has the id 10. It decodes into a vector of code entries that are pairs of value type vectors and expressions. They represent the locals and body field of the functions in the funcs component of a module. The type fields of the respective functions are encoded separately in the function section.

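This is the section where the actual processing is written as Code, and it's quite hefty.
It's a vector of units called Code, and each Code defines its local variable declarations and the expression that forms the function body.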

Breakdown, obviously

I was going to take apart the whole section as before, but it was enormous, so here's an excerpt.

0a --------------------------------------------- Section ID (= Code Section)
   a3 e1 03 ------------------------------------ section size (= 61603 bytes)
            8e 02 ------------------------------ number of code entries (= 270)
                  05 --------------------------- code size (= 5 bytes)
                     00 ------------------------   number of local declarations (= 0)
                     10 ------------------------   function call (= call)
                        8e 01 ------------------     index of the function to call (= 142)
                     0b ------------------------   end of the code
                  1b --------------------------- code size (= 27 bytes)
                     01 ------------------------   number of local declarations (= 1)
                        01 ---------------------     number of locals of this type (= 1)
                        7f ---------------------     local variable type (= i32)
                     02 ------------------------   start of a block (= block)
                        40 ---------------------     block result type (= none)
                        10 ---------------------     function call (= call)
                           8f 80 80 80 00 ------       index of the function to call (= 15, padded to 5 bytes)
                        22 ---------------------     store the top of the stack in a local and keep it on the stack (= local.tee)
                           00 ------------------       local variable index (= 0)
                        45 ---------------------     test whether the value is zero (= i32.eqz)
                        0d ---------------------     if the previous result is true (non-zero), jump to the given label (= br_if)
                           00 ------------------       label index (= 0)
                        20 ---------------------     read a local variable (= local.get)
                           00 ------------------       local variable index (= 0)
                        10 ---------------------     function call (= call)
                           9a 81 80 80 00 ------       index of the function to call (= 154)
                        00 ---------------------     trap unconditionally (= unreachable)
                        0b ---------------------     end of the block
                     0b ------------------------   end of the function body
(rest omitted)
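To check a breakdown like this mechanically, a tiny disassembler for just the opcodes seen above is enough. A minimal sketch: `disassemble` is a hypothetical helper that takes one code entry's body (the bytes after its size), assumes block types use the single-byte form (such as `0x40` for no result), and gives up on any opcode it doesn't know. Tools like wasm2wat do this properly.

```rust
/// Simplified unsigned LEB128 reader that advances `pos`.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Disassembles one function body: local declarations, then instructions.
fn disassemble(body: &[u8]) -> Option<Vec<String>> {
    let mut pos = 0;
    let mut out = Vec::new();
    // Locals are declared in groups of (count, value type).
    let groups = read_u32(body, &mut pos)?;
    for _ in 0..groups {
        let count = read_u32(body, &mut pos)?;
        let ty = *body.get(pos)?;
        pos += 1;
        out.push(format!("locals {} x 0x{:02x}", count, ty));
    }
    while pos < body.len() {
        let op = body[pos];
        pos += 1;
        let line = match op {
            0x00 => "unreachable".to_string(),
            0x02 => {
                let block_type = *body.get(pos)?; // single-byte form only
                pos += 1;
                format!("block 0x{:02x}", block_type)
            }
            0x0b => "end".to_string(),
            0x0d => format!("br_if {}", read_u32(body, &mut pos)?),
            0x10 => format!("call {}", read_u32(body, &mut pos)?),
            0x20 => format!("local.get {}", read_u32(body, &mut pos)?),
            0x22 => format!("local.tee {}", read_u32(body, &mut pos)?),
            0x45 => "i32.eqz".to_string(),
            _ => return None, // opcode not covered by this sketch
        };
        out.push(line);
    }
    Some(out)
}
```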

Data Section

The section with Section ID 11.

The data section has the id 11. It decodes into a vector of data segments that represent the datas component of a module.

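This region corresponds to the data segment in ELF and similar formats; static values and the like are written here.
Under the hood it's a vector of data segments.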

Like element segments, each data segment has a mode, either active or passive, with these characteristics:

active
- Copied into Linear Memory automatically during initialization, at the memory location the segment specifies.
passive
- Copied into Linear Memory on demand with the memory.init instruction.

Breakdown

This one was also huge, so here's an excerpt. The `Hello, Wasm!` I printed shows up.

# Data Section
0b --------------------------------------------------------- Section ID (= Data Section)
   99 55 --------------------------------------------------- section size (= 10905 bytes)
         02 ------------------------------------------------ number of data segments (= 2)
            00 ---------------------------------------------   segment flag (= 0: active, Linear Memory 0)
            41 --------------------------------------------- offset expression: i32.const
               80 80 c0 00 ---------------------------------   offset value (= 1048576 = 0x100000)
               0b ------------------------------------------ end of the expression
            f7 54 ------------------------------------------ data size (= 10871 bytes)
                  48 65 6c 6c 6f 2c 20 57 61 73 6d 21 0a --- Hello, Wasm!\n
(rest omitted)
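Data segments use a flag scheme similar to element segments. A minimal sketch that handles only flag `0` (active, memory 0, offset expression, then a byte vector), again assuming the offset expression is a single `i32.const` plus `end`; `decode_active_data` and `read_leb` are hypothetical names. The test builds a tiny segment holding a `Hello, Wasm!\n` string.

```rust
/// Simplified LEB128 reader (signed or unsigned) that advances `pos`.
/// Rejects inputs longer than 5 bytes, but not other over-long encodings.
fn read_leb(bytes: &[u8], pos: &mut usize, signed: bool) -> Option<i64> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend signed values whose last byte has bit 6 set.
            if signed && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some(result);
        }
        if shift >= 35 {
            return None; // more than 5 bytes: too long for a 32-bit value
        }
    }
}

/// Decodes one flag-0 data segment: returns (memory offset, data bytes).
fn decode_active_data(bytes: &[u8], pos: &mut usize) -> Option<(i32, Vec<u8>)> {
    // Flag 0: active, memory 0, offset expression, then a vector of bytes.
    if *bytes.get(*pos)? != 0x00 {
        return None;
    }
    *pos += 1;
    // Offset expression, assumed to be `i32.const N` (0x41) then `end` (0x0b).
    if *bytes.get(*pos)? != 0x41 {
        return None;
    }
    *pos += 1;
    let offset = read_leb(bytes, pos, true)? as i32;
    if *bytes.get(*pos)? != 0x0b {
        return None;
    }
    *pos += 1;
    let len = read_leb(bytes, pos, false)? as usize;
    let end = (*pos).checked_add(len)?;
    let data = bytes.get(*pos..end)?.to_vec();
    *pos = end;
    Some((offset, data))
}
```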

Data Count Section

Section ID 12.

The data count section has the id 12. It decodes into an optional u32 that represents the number of data segments in the data section. If this count does not match the length of the data segment vector, the module is malformed.

This section is optional too (and it wasn't in this binary either).
It gives the number of data segments in the Data Section as a u32, and apparently it exists to simplify something called single-pass validation.

Wrapping up

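I've read this binary with my own eyes the whole way, but tools like wasm2wat and wasm-decompile let you read it in far more human-friendly formats (S-expressions or a C-like syntax).
For debugging more complex programs, using those is probably wiser, and healthier.
(In fact, from around the midpoint I'd reached a slightly deranged state where I could glance at frequent bytes like `7f` or `0b` in hex and think "ah, that one.")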

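Still, this time it taught me the fundamentals of Wasm, and I had a great time reading while researching, so it's all good.
My Wasm fever is rising and I want to keep digging, and I'd also like to research and write up things like "why Wasm is secure" at the spec level.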
