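A log of implementing an x86 emulator in Rust

nozo, updated 2021/02/12

To study Rust, I am implementing an x86 emulator in Rust based on the book 「自作エミュレータで学ぶx86アーキテクチャ」 (Learning the x86 Architecture by Building Your Own Emulator). This is the log.

The working repository is below.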

https://github.com/isNozo/x86_emu

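nozo, updated 2021/02/12

I start implementing from section 2.3 of the book, 「初めてのエミュレータ」 (Your First Emulator). First I implement the file-loading part. The Rust tutorial looks like a useful reference here.

Writing the program along the lines of the tutorial, I found that the following program fails to compile. Rust's initiation rite arrives right away.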

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    let filename = args[1];

    println!("filename = {}", filename);
}

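This is a perfectly ordinary program: it collects the command-line arguments with env::args().collect() and tries to print the element at index 1 (the first argument after the program name) with println!.

The error message is below. The compiler complains about the line that binds the value args[1] to the variable filename.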

$ cargo run hoge
   Compiling x86_emu v0.1.0 (/workspaces/rust_prj/x86_emu)
error[E0507]: cannot move out of index of `Vec<String>`
 --> src/main.rs:5:20
  |
5 |     let filename = args[1];
  |                    ^^^^^^^
  |                    |
  |                    move occurs because value has type `String`, which does not implement the `Copy` trait
  |                    help: consider borrowing here: `&args[1]`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0507`.
error: could not compile `x86_emu`

To learn more, run the command again with --verbose.

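This seems to be an error caused by the mechanism called "ownership" that I keep hearing about. Binding args[1] to a variable tries to move the String out of the Vec, which is not allowed because String does not implement Copy.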

To get a feel for ownership, I try running a simple example. The following program also fails to compile.

fn main() {
    let s1 = String::from("hello");
    let s2 = s1;
    
    println!("{}, world!", s1);
}

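The error message is below. The assignment let s2 = s1 moves ownership of the String to s2, so s1 can no longer be used afterwards.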

error[E0382]: borrow of moved value: `s1`
 --> src/main.rs:5:28
  |
2 |     let s1 = String::from("hello");
  |         -- move occurs because `s1` has type `String`, which does not implement the `Copy` trait
3 |     let s2 = s1;
  |              -- value moved here
4 |     
5 |     println!("{}, world!", s1);
  |                            ^^ value borrowed here after move
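Both errors above can be avoided by borrowing instead of moving, or by cloning when a second owned value is really needed. The following is a minimal sketch of that idea, not the code from the repository; the helper name filename_from_args is hypothetical and introduced only for illustration.

```rust
use std::env;

/// Hypothetical helper: returns the first argument after the program name.
/// It hands out a borrowed &str, so the String stays owned by the vector.
fn filename_from_args(args: &[String]) -> Option<&str> {
    // get(1) returns None instead of panicking when no argument was given
    args.get(1).map(|s| s.as_str())
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match filename_from_args(&args) {
        Some(filename) => println!("filename = {}", filename),
        None => println!("no filename given"),
    }

    let s1 = String::from("hello");
    let s2 = s1.clone(); // deep copy: s1 remains valid
    let s3 = &s1; // borrow: no move occurs
    println!("{}, world! ({} / {})", s1, s2, s3);
}
```

Writing let filename = &args[1]; as the compiler suggests works as well, but it panics when the argument is missing, which is why this sketch uses get.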

nozo 2021/02/15

Next, I try using a struct.
I defined the Emulator struct as follows.

const REGISTER_COUNT: usize = 8;

enum Register { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI }

struct Emulator {
    // General-purpose Registers
    registers: [u32; REGISTER_COUNT],
    // EFLAGS Register
    eflags: u32,
    // Instruction Pointer
    eip: u32,
    // Memory
    memory: Vec<u8>
}

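One thing tripped me up a little here.

Following the C implementation, I defined the registers field of the Emulator struct as an array and Register as an enum in Rust as well.
The intent was to index the registers array with the Register enum, as in registers[EAX], but this does not work when I actually try it.

I try using a Register variant as the index of an arbitrary array.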

    let arr = [1, 2, 3];
    println!("arr[EAX]={}", arr[Register::EAX]);

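The compiler says, in effect, that the index type is not usize.
That makes sense: any language without implicit type conversions, not just Rust, would behave this way.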

error[E0277]: the type `[{integer}]` cannot be indexed by `Register`
  --> src/main.rs:33:29
   |
33 |     println!("arr[EAX]={}", arr[Register::EAX]);
   |                             ^^^^^^^^^^^^^^^^^^ slice indices are of type `usize` or ranges of `usize`
   |
   = help: the trait `SliceIndex<[{integer}]>` is not implemented for `Register`
   = note: required because of the requirements on the impl of `Index<Register>` for `[{integer}]`

Searching around, I found examples that cast with the as keyword.

It can be written as follows.

    let arr = [1, 2, 3];
    println!("arr[EAX]={}", arr[Register::EAX as usize]);

Example run:

$ cargo run
arr[EAX]=1
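As an alternative to casting at every use site, the standard Index and IndexMut traits can be implemented so that the enum itself is accepted as an index. The sketch below assumes a wrapper type named Registers, which is hypothetical and not part of the original code.

```rust
use std::ops::{Index, IndexMut};

const REGISTER_COUNT: usize = 8;

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Register { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI }

// Hypothetical newtype around the register array
struct Registers([u32; REGISTER_COUNT]);

impl Index<Register> for Registers {
    type Output = u32;
    fn index(&self, reg: Register) -> &u32 {
        // The cast to usize happens in exactly one place
        &self.0[reg as usize]
    }
}

impl IndexMut<Register> for Registers {
    fn index_mut(&mut self, reg: Register) -> &mut u32 {
        &mut self.0[reg as usize]
    }
}

fn main() {
    let mut regs = Registers([0; REGISTER_COUNT]);
    regs[Register::ESP] = 0x7c00;
    println!("ESP = {:#010x}", regs[Register::ESP]);
}
```

The as usize cast is still there, but it is confined to the two trait implementations.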

The function that creates an instance of the Emulator struct can be written as follows.

fn create_emu(eip: u32, esp: u32) -> Emulator {
    let mut emu = Emulator {
        // Clear all registers to 0
        registers: [0; REGISTER_COUNT],
        // Clear eflags to 0
        eflags: 0,
        // Init EIP register
        eip,
        // Init memory
        memory: Vec::new(),
    };

    // Init ESP register
    emu.registers[Register::ESP as usize] = esp;

    emu
}

The calling side is as follows.

    // Create emulator with EIP=0x0000 and ESP=0x7c00
    let mut emu = create_emu(0x0000, 0x7c00);
    
    // Read binary file into memory
    file.read_to_end(&mut emu.memory)
        .expect("something went wrong reading the file");
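The snippet above does not show where file comes from. The following sketch shows one way the loading step could be isolated; the function name load_program is hypothetical. It is generic over std::io::Read, so it would accept a std::fs::File as well as the in-memory byte slice used here as a stand-in.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads everything from `reader` and appends it to `memory`.
/// Returns the number of bytes read.
fn load_program<R: Read>(reader: &mut R, memory: &mut Vec<u8>) -> io::Result<usize> {
    reader.read_to_end(memory)
}

fn main() -> io::Result<()> {
    // Stand-in for a file opened with std::fs::File::open:
    // a byte slice also implements Read.
    let mut image: &[u8] = &[0xb8, 0x29, 0x00, 0x00, 0x00, 0xeb, 0xf9];
    let mut memory = Vec::new();
    let size = load_program(&mut image, &mut memory)?;
    println!("loaded {} bytes", size);
    Ok(())
}
```

Returning io::Result instead of calling expect leaves the decision about how to report a read failure to the caller.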

nozo 2021/02/15

Next, the book's C implementation defines a function for each x86 instruction and keeps them in an array of function pointers, as shown below.

typedef void instruction_func_t(Emulator*);
instruction_func_t* instructions[256];
void init_instructions(void)
{
    int i;
    memset(instructions, 0, sizeof(instructions));
    for (i = 0; i < 8; i++) {
        instructions[0xB8 + i] = mov_r32_imm32;
    }
    instructions[0xEB] = short_jump;
}

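How should an array of function pointers be implemented in Rust?

It turns out that in Rust, function pointer types are written with the fn keyword.

For example, the type of the function add below, which takes two i32 arguments and returns an i32, can be written as fn(i32, i32) -> i32.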

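fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Type alias for "a function taking two i32s and returning an i32"
type Binop = fn(i32, i32) -> i32;

fn main() {
    // Call the function directly
    let x = add(5, 7);
    // Call it through a function pointer
    let bo: Binop = add;
    let y = bo(5, 7);
    println!("{} {}", x, y);
}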

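In this case, each function to store in the array corresponds to an x86 instruction and takes a single mutable reference to the Emulator, as shown below. The reference is mutable because executing an instruction may modify the Emulator's registers and memory.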

fn something_instruction(emu: &mut Emulator) {
	//--snip--
}

So defining an array of type [Option<fn(&mut Emulator)>; INSTRUCTIONS_COUNT], as follows, looks like the way to go.

    // Initialize the x86 instructions table
    // The None value in the instructions table indicates that instruction is not implemented
    let mut instructions: [Option<fn(&mut Emulator)>; INSTRUCTIONS_COUNT]
        = [None; INSTRUCTIONS_COUNT];

I used Option because representing unimplemented instructions as None makes it easy to branch on the undefined case, as shown below.

        match instructions[code as usize] {
            // Execute the instruction
            Some(inst) => inst(&mut emu),
            // Stop the program if the instruction is not implemented
            None => {
                println!("\nNot Implemented: {:#04x}\n", code);
                break;
            }
        }

After that, it should be enough to initialize the mutably declared instructions as follows.

// Initialize the instructions table
const INSTRUCTIONS_COUNT: usize = 256;
fn init_instructions(instructions: &mut [Option<fn(&mut Emulator)>; INSTRUCTIONS_COUNT]) {
    //--snip--
    // Register each instruction
}
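The snipped part can be filled in by mirroring the C init_instructions shown earlier. The following is a self-contained sketch under stated assumptions: Emulator is reduced to a single eip field, the two instruction bodies are placeholders that only advance EIP, and the Instruction type alias is introduced here for readability.

```rust
const INSTRUCTIONS_COUNT: usize = 256;

// Reduced stand-in for the Emulator struct: only what this sketch needs
struct Emulator {
    eip: u32,
}

// Alias introduced for this sketch
type Instruction = fn(&mut Emulator);

// Placeholder bodies: they only advance EIP by the instruction length
fn mov_r32_imm32(emu: &mut Emulator) {
    emu.eip += 5;
}

fn short_jump(emu: &mut Emulator) {
    emu.eip += 2;
}

fn init_instructions(instructions: &mut [Option<Instruction>; INSTRUCTIONS_COUNT]) {
    // 0xB8 to 0xBF: mov r32, imm32 (one opcode per register)
    for i in 0..8 {
        instructions[0xB8 + i] = Some(mov_r32_imm32);
    }
    // 0xEB: short jump
    instructions[0xEB] = Some(short_jump);
}

fn main() {
    let mut instructions: [Option<Instruction>; INSTRUCTIONS_COUNT] = [None; INSTRUCTIONS_COUNT];
    init_instructions(&mut instructions);

    let mut emu = Emulator { eip: 0 };
    // Dispatch a single opcode through the table
    if let Some(inst) = instructions[0xB8] {
        inst(&mut emu);
    }
    println!("EIP = {:#010x}", emu.eip);
}
```

The [None; INSTRUCTIONS_COUNT] initializer works because function pointers are Copy, and therefore so is Option<fn(&mut Emulator)>.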

nozo 2021/02/15

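In the previous post I defined the function pointer array that holds the x86 instruction functions. This time I define a few x86 instruction functions and run a simple binary on the emulator.

I implemented the following two instructions.
mov_r32_imm32 is the mov instruction that copies a 32-bit immediate value into a register, and short_jump is the jmp instruction that jumps by a signed 8-bit offset (-128 to +127).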

fn mov_r32_imm32(emu: &mut Emulator) {
    // Get the target register from the opcode
    let reg = get_code8(emu, 0) - 0xB8;
    // Get 32bit immediate data from operand
    let imm = get_code32(emu, 1);
    // Set immediate data to the target register
    emu.registers[reg as usize] = imm;
    // Count up the EIP register
    emu.eip += 5;
}

fn short_jump(emu: &mut Emulator) {
    // Get the signed 8-bit jump offset
    let diff = get_sign_code8(emu, 1);
    // Add the diff to the EIP register
    emu.eip += (diff + 2) as u32;
}

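The get_code* functions read 8-bit and 32-bit values from memory (an array of u8). Because machine code is stored in memory in little-endian order, the 32-bit version reads one byte at a time and reassembles the bytes.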

fn get_code8(emu: &Emulator, offset: usize) -> u8 {
    emu.memory[emu.eip as usize + offset]
}

fn get_sign_code8(emu: &Emulator, offset: usize) -> i8 {
    emu.memory[emu.eip as usize + offset] as i8
}

fn get_code32(emu: &Emulator, offset: usize) -> u32 {
    let mut ret: u32 = 0x0000_0000;
    
    // Get a 32bit data as little endian
    for i in 0..4 {
        ret |= (get_code8(emu, offset + i) as u32) << (i * 8);
    }
    ret
}
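The manual shift-and-or loop can also be expressed with the standard u32::from_le_bytes. The sketch below is an alternative under assumptions, not the author's code: the function name read_u32_le is hypothetical, it takes the memory slice directly instead of an Emulator, and it returns None rather than panicking when the read would run past the end of memory.

```rust
use std::convert::TryInto;

/// Hypothetical helper: reads a little-endian u32 starting at `addr`.
fn read_u32_le(memory: &[u8], addr: usize) -> Option<u32> {
    let end = addr.checked_add(4)?;
    // get() yields None if the 4-byte range is out of bounds
    let bytes: [u8; 4] = memory.get(addr..end)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes))
}

fn main() {
    // Same bytes as short_jmp.bin: b8 29 00 00 00 eb f9
    let memory = [0xb8, 0x29, 0x00, 0x00, 0x00, 0xeb, 0xf9];
    // The immediate operand of mov starts at offset 1
    println!("imm32 = {:?}", read_u32_le(&memory, 1));
}
```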

With the implementation above, I load a machine-code file and run it on the emulator.
The machine-code file to load is as follows.

$ objdump -b binary -D -m i386 short_jmp.bin 

short_jmp.bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   b8 29 00 00 00          mov    $0x29,%eax
   5:   eb f9                   jmp    0x0

However, actually running it produced a runtime error.

$ cargo run test/short_jmp.bin 
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/x86_emu test/short_jmp.bin`
EIP = 0x00000000, Code = 0xb8
EIP = 0x00000005, Code = 0xeb
thread 'main' panicked at 'attempt to add with overflow', src/main.rs:84:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

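According to the log, the mov instruction executed and the jmp instruction appears to be where it panics.
The error message reads panicked at 'attempt to add with overflow', which says that the computation on the following line overflowed.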

    // Add the diff to the EIP register
    emu.eip += (diff + 2) as u32;

What is the overflow that the error message refers to? Looking it up, I found that the Rust tutorial has a section on integer overflow.

This is what it says.

Let’s say you have a variable of type u8 that can hold values between 0 and 255. If you try to change the variable to a value outside of that range, such as 256, integer overflow will occur. Rust has some interesting rules involving this behavior. When you’re compiling in debug mode, Rust includes checks for integer overflow that cause your program to panic at runtime if this behavior occurs.

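When a program is compiled in debug mode, Rust checks at runtime whether integer values go outside the range of their type, and panics if they do.

When the panic occurred in the program above, the operand of the jmp instruction was 0xF9, which is -7 in two's complement. So with EIP=5, the emulator advances by the 2 bytes of the jmp instruction and then subtracts 7, which is why the disassembly shows a jump to address 0x00.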

   5:   eb f9                   jmp    0x0

I had implemented this behavior as follows.

    // Add the diff to the EIP register
    emu.eip += (diff + 2) as u32;

Plugging in the concrete numbers shows that the final u32 addition overflows.

0x0000_0005 + ((0xF9 + 0x02) as u32)  // i8 addition (no overflow here)
0x0000_0005 + (0xFB as u32)           // the i8-to-u32 cast sign-extends, filling the upper bits with F
0x0000_0005 + 0xFFFF_FFFB             // u32 addition (overflows here)
0x0000_0000
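The steps above can be checked with a small program. This is a sketch for verification only; the helper name sign_extend_to_u32 is hypothetical. It uses checked_add, which returns None on overflow instead of panicking, so it runs the same way in debug and release builds.

```rust
/// Hypothetical helper: casts an i8 to u32 the way `as` does (sign-extending).
fn sign_extend_to_u32(value: i8) -> u32 {
    value as u32
}

fn main() {
    let eip: u32 = 0x0000_0005;
    let diff: i8 = 0xF9u8 as i8; // -7 in two's complement
    let step = sign_extend_to_u32(diff + 2); // -5 becomes 0xFFFF_FFFB

    println!("step         = {:#010x}", step);
    // checked_add returns None instead of panicking when the sum overflows
    println!("checked_add  = {:?}", eip.checked_add(step));
    // wrapping_add discards the carry out of bit 31
    println!("wrapping_add = {:#010x}", eip.wrapping_add(step));
}
```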

To write this so that the addition does not overflow for this case, one option is the following.

    emu.eip = (emu.eip as i32 + (diff + 2) as i32) as u32;

However, with casts everywhere it feels rather roundabout.
The explanation of integer overflow quoted earlier continues as follows.

When you’re compiling in release mode with the --release flag, Rust does not include checks for integer overflow that cause panics. Instead, if overflow occurs, Rust performs two’s complement wrapping. In short, values greater than the maximum value the type can hold “wrap around” to the minimum of the values the type can hold. In the case of a u8, 256 becomes 0, 257 becomes 1, and so on. The program won’t panic, but the variable will have a value that probably isn’t what you were expecting it to have. Relying on integer overflow’s wrapping behavior is considered an error. If you want to wrap explicitly, you can use the standard library type Wrapping.

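When compiled in release mode, the runtime integer overflow checks are not performed, and on overflow the value wraps around so that it fits in the range of the integer type (for u8, 256 becomes 0).
Rust's position is that relying on this wrapping behavior on overflow is an error, and it says to use the standard library type Wrapping explicitly when wrapping is wanted.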
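For reference, the Wrapping type mentioned in the quote lives in std::num. A minimal sketch of how it could be used for the EIP addition follows; the function name wrapping_sum is hypothetical.

```rust
use std::num::Wrapping;

/// Hypothetical helper: adds two u32 values with explicit wrap-around semantics.
fn wrapping_sum(a: u32, b: u32) -> u32 {
    // Arithmetic on Wrapping<T> wraps in both debug and release builds
    (Wrapping(a) + Wrapping(b)).0
}

fn main() {
    // Same numbers as the failing jmp: 5 + 0xFFFF_FFFB
    let eip = wrapping_sum(0x0000_0005, 0xFFFF_FFFB);
    println!("EIP = {:#010x}", eip);
}
```

Wrapping is convenient when a value should wrap in every operation; for a single addition, a method on the primitive type is lighter.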

Primitive integer types such as u32 each also have a method called wrapping_add.

Using it, an addition that overflows can be written so that it wraps, as follows.

    emu.eip = emu.eip.wrapping_add((diff + 2) as u32);

The result of running with wrapping_add is as follows.
No runtime error occurs and the program exits normally.

$ cargo run test/short_jmp.bin
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/x86_emu test/short_jmp.bin`
EIP = 0x00000000, Code = 0xb8
EIP = 0x00000005, Code = 0xeb

end of program.

EAX = 0x00000029
ECX = 0x00000000
EDX = 0x00000000
EBX = 0x00000000
ESP = 0x00007c00
EBP = 0x00000000
ESI = 0x00000000
EDI = 0x00000000
EIP = 0x00000000
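One remaining caveat: diff + 2 is still computed as an i8, so an offset of 126 or 127 would overflow that addition and panic in debug mode. The sketch below widens to i32 first. It assumes Rust 1.66 or later for u32::wrapping_add_signed, and the function name next_eip_after_short_jump is hypothetical.

```rust
/// Hypothetical helper: computes EIP after a 2-byte short jump with offset `diff`.
fn next_eip_after_short_jump(eip: u32, diff: i8) -> u32 {
    // Widen to i32 before adding 2 so the sum cannot overflow i8 (e.g. diff = 127)
    eip.wrapping_add_signed(i32::from(diff) + 2)
}

fn main() {
    // The jmp from short_jmp.bin: EIP = 5, operand 0xF9 (-7)
    println!("EIP = {:#010x}", next_eip_after_short_jump(5, -7));
    // A forward jump with the largest offset
    println!("EIP = {:#010x}", next_eip_after_short_jump(0, 127));
}
```

On older toolchains, eip.wrapping_add((i32::from(diff) + 2) as u32) should give the same result.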

This scrap was closed on 2021/07/29.