
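Notes on using inkwell

Published on 2025/01/12

inkwell is a Rust crate for emitting LLVM IR. I have been playing with it recently, and because the documentation is sparse, it sometimes took me a while to work out how to do what I wanted. These are my notes on what I found.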

Prerequisites

Versions
- LLVM: 18
Terminology
- https://llvm.org/docs/LangRef.html
  - Basic terminology for LLVM IR
- https://www.npopov.com/2023/04/07/LLVM-middle-end-pipeline.html
  - The terms "middle end" and "back end" follow this blog post
  - My rough understanding is as follows
    - Middle end: the layer that manipulates LLVM IR (before instruction selection)
    - Back end: the layer that handles target-specific work such as register allocation (after instruction selection)

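Snippets

These are things I could not find quickly by searching. Even when something is missing from the documentation, the comments in the inkwell source code often explain it. Also, inkwell is basically a wrapper around the LLVM API, so my impression is that knowing how to use the LLVM API usually gets you there.

Defining global variables

For some reason, strings are the only thing that is easy to define. For anything else, such as the i32 array below, you build the initializer element by element.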

use inkwell::context::Context;

let ctx = Context::create();
let module = ctx.create_module("ModA");
let arr = [1, 2, 3, 4, 5];

let i32_type = ctx.i32_type();
let arr_type = i32_type.array_type(arr.len() as u32);
let glb = module.add_global(arr_type, None, "my_array");
let arr = arr
    .into_iter()
    .map(|x| i32_type.const_int(x as u64, false))
    .collect::<Vec<_>>();
let arr = i32_type.const_array(&arr);
glb.set_initializer(&arr);

Calling external functions

You only need to declare the function, on the assumption that it will be linked in appropriately at the end.

use inkwell::context::Context;
use inkwell::AddressSpace;

let ctx = Context::create();
let module = ctx.create_module("ModA");

// Assume a function like "void pow2_array(float *, float *, int)"
let i32_type = ctx.i32_type();
let ptr_type = ctx.ptr_type(AddressSpace::default());
let void_type = ctx.void_type();
let fn_type = void_type.fn_type(&[ptr_type.into(), ptr_type.into(), i32_type.into()], false);
let fn_value = module.add_function("pow2_array", fn_type, None);

Creating a `TargetMachine`

The inkwell examples show how to create a generic target, but you can also create one for the host machine. If you want a binary tuned for the native machine, this is the better choice.

use inkwell::targets::{CodeModel, InitializationConfig, RelocMode, Target, TargetMachine};
use inkwell::OptimizationLevel;

Target::initialize_native(&InitializationConfig::default()).unwrap();
let target_triple = TargetMachine::get_default_triple();
let cpu = TargetMachine::get_host_cpu_name().to_string();
let features = TargetMachine::get_host_cpu_features().to_string();
let target_machine = Target::from_triple(&target_triple)
    .unwrap()
    .create_target_machine(
        &target_triple,
        &cpu,
        &features,
        OptimizationLevel::Aggressive,
        RelocMode::PIC,
        CodeModel::Default,
    )
    .unwrap();

Getting and applying attributes

There appear to be passes that infer attributes reasonably well, so you may not need to work hard at adding them by hand.

use inkwell::attributes::{Attribute, AttributeLoc};
use inkwell::context::Context;

let ctx = Context::create();

let noalias = ctx.create_enum_attribute(
    Attribute::get_named_enum_kind_id("noalias"),
    0
);
let dereferenceable_8 = ctx.create_enum_attribute(
    Attribute::get_named_enum_kind_id("dereferenceable"),
    8
);

// `target_machine` is the `TargetMachine` created in the previous snippet.
let target_cpu = ctx.create_string_attribute(
    "target-cpu",
    target_machine.get_cpu().to_str().unwrap()
);
let target_features = ctx.create_string_attribute(
    "target-features",
    target_machine.get_feature_string().to_str().unwrap(),
);

// Add the attributes to the i-th argument.
// fn_value.add_attribute(AttributeLoc::Param(i), noalias);
// fn_value.add_attribute(AttributeLoc::Param(i), dereferenceable_8);
//
// Add the attributes to the function itself.
// fn_value.add_attribute(AttributeLoc::Function, target_cpu);
// fn_value.add_attribute(AttributeLoc::Function, target_features);

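Aside

Specifying target-cpu or target-features appears to override the TargetMachine settings for that particular function. I initially misunderstood this and attached them to every function, but they are not really needed as long as the TargetMachine is configured properly.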

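Getting an intrinsic

After looking the intrinsic up by name, you need to resolve the overload by specifying the argument types. Using it is the same as an ordinary function call.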

use inkwell::context::Context;
use inkwell::intrinsics::Intrinsic;

let ctx = Context::create();
let module = ctx.create_module("ModA");

let sqrt_f32 = Intrinsic::find("llvm.sqrt").unwrap();
let sqrt_f32 = sqrt_f32.get_declaration(&module, &[ctx.f32_type().into()]).unwrap();

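Running optimizations (middle end)

Optimization passes at the LLVM IR level have to be applied explicitly. There is a method called inkwell::module::Module::run_passes, so calling it is enough. You have to specify the passes yourself, but apparently passing default<O3> applies the whole O3-equivalent pipeline.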

use inkwell::passes::PassBuilderOptions;

// `run_passes` returns a `Result`, so the error case has to be handled.
module.run_passes(
    "default<O3>",
    &target_machine,
    PassBuilderOptions::create()
).unwrap();

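This made me wonder what the OptimizationLevel passed when creating the TargetMachine was for. It seems to be used to select the passes that run in the back end. If you look at the LLVM side, it is actually named CodeGenOptLevel, so perhaps inkwell's name is not the best choice.

Some optimization passes can apparently be switched on or off explicitly through the third argument, PassBuilderOptions, but unless you have a specific reason, the defaults seem fine (at least for O3).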

Miscellaneous

When it segfaults

More often than not, the generated LLVM IR is at fault. You can tell whether the generated IR is broken by running the verify pass on it.

module.print_to_file("out.ll").unwrap();

$ opt --passes=verify out.ll -o /dev/null

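If something is wrong, you get some kind of error. If the IR is valid, the command exits successfully without printing anything. Mistakes I often make include putting a PHI somewhere other than the start of a basic block and forgetting the jump at the end of a basic block. The PassBuilderOptions passed to run_passes apparently lets you turn verify on or off, but the error message seems to get swallowed, so I am not sure how useful that is. It may still be worthwhile in the sense that invalid IR is reliably rejected.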

A working sample

Since we have come this far, here is one working sample to finish. It is a memcpy look-alike^[1].

use inkwell::context::Context;
use inkwell::attributes::{Attribute, AttributeLoc};

fn main() {
    let ctx = Context::create();
    let module = ctx.create_module("sample");
    let builder = ctx.create_builder();
    let noalias = ctx.create_enum_attribute(Attribute::get_named_enum_kind_id("noalias"), 0);
    let i8_type = ctx.i8_type();
    let i64_type = ctx.i64_type();
    let ptr_type = ctx.ptr_type(inkwell::AddressSpace::default());

    let fn_type = ptr_type.fn_type(&[ptr_type.into(), ptr_type.into(), i64_type.into()], false);
    let func = module.add_function("my_memcpy", fn_type, None);
    func.add_attribute(AttributeLoc::Param(0), noalias);
    func.add_attribute(AttributeLoc::Param(1), noalias);

    let entry = ctx.append_basic_block(func, "entry");
    let latch = ctx.append_basic_block(func, "latch");
    let exit = ctx.append_basic_block(func, "exit");

    let dst = func.get_nth_param(0).unwrap().into_pointer_value();
    let src = func.get_nth_param(1).unwrap().into_pointer_value();
    let n = func.get_nth_param(2).unwrap().into_int_value();

    builder.position_at_end(entry);
    let cond = builder
        .build_int_compare(inkwell::IntPredicate::EQ, n, i64_type.const_zero(), "cond")
        .unwrap();
    builder.build_conditional_branch(cond, exit, latch).unwrap();

    builder.position_at_end(latch);
    let ind = builder.build_phi(i64_type, "ind").unwrap();
    let ind_int = ind.as_basic_value().into_int_value();
    let src =
        unsafe { builder.build_in_bounds_gep(i8_type, src, &[ind_int], "gep.src") }.unwrap();
    let val = builder
        .build_load(i8_type, src, "val")
        .unwrap()
        .into_int_value();
    let dst =
        unsafe { builder.build_in_bounds_gep(i8_type, dst, &[ind_int], "gep.dst") }.unwrap();
    builder.build_store(dst, val).unwrap();
    let ind_next = builder
        .build_int_add(ind_int, i64_type.const_int(1, false), "ind.next")
        .unwrap();
    let cond = builder
        .build_int_compare(inkwell::IntPredicate::ULT, ind_next, n, "cond")
        .unwrap();
    builder.build_conditional_branch(cond, latch, exit).unwrap();
    ind.add_incoming(&[(&ind_next, latch), (&i64_type.const_zero(), entry)]);

    builder.position_at_end(exit);
    let dst = func.get_nth_param(0).unwrap().into_pointer_value();
    builder.build_return(Some(&dst)).unwrap();

    let engine = module
        .create_jit_execution_engine(inkwell::OptimizationLevel::Aggressive)
        .unwrap();
    type MyMemcpy = unsafe extern "C" fn(*mut u8, *const u8, i64) -> *mut u8;
    let my_memcpy = unsafe { engine.get_function::<MyMemcpy>("my_memcpy").unwrap() };
    let src = [3, 1, 4, 1, 5];
    let mut dst = vec![0; src.len()];
    let len: i64 = (std::mem::size_of::<u8>() * src.len()).try_into().unwrap();
    unsafe { my_memcpy.call(dst.as_mut_ptr(), src.as_ptr(), len) };
    assert_eq!(dst, src);
}

Incidentally, for a loop this simple, LLVM will replace it with memcpy^[2] on its own.

https://godbolt.org/z/dqn531nz9

The pass that actually performs the replacement is loop-idiom. The string given to --passes^[3] can be passed as-is as the first argument of the module's run_passes, so try it if you are interested.

Footnotes

[1] In practice there is a build_memcpy method, so there is no need to write the loop yourself (https://thedan64.github.io/inkwell/inkwell/builder/struct.Builder.html#method.build_memcpy)
[2] Whether this replacement always yields a good result is apparently debatable. For example, ICC runs the loop in place when the length is 96 or less (https://godbolt.org/z/WMfYcKfnb)
[3] On the arguments to opt: https://llvm.org/docs/NewPassManager.html#invoking-opt
