Code that builds a backtrace with libunwind looks like this, as shown in

Programmatic access to the call stack in C++ - Eli Bendersky's website:

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

// Call this function to get a backtrace.
void backtrace() {
  unw_cursor_t cursor;
  unw_context_t context;

  // Initialize cursor to current frame for local unwinding.
  unw_getcontext(&context);
  unw_init_local(&cursor, &context);

  // Unwind frames one by one, going up the frame stack.
  while (unw_step(&cursor) > 0) {
    unw_word_t offset, pc;
    unw_get_reg(&cursor, UNW_REG_IP, &pc);
    if (pc == 0) {
      break;
    }
    printf("0x%lx:", pc);

    char sym[256];
    if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
      printf(" (%s+0x%lx)\n", sym, offset);
    } else {
      printf(" -- error: unable to obtain symbol name for this frame\n");
    }
  }
}

void foo() {
  backtrace(); // <-------- backtrace here!
}

void bar() {
  foo();
}

int main(int argc, char **argv) {
  bar();

  return 0;
}

Build it like this:

gcc -o libunwind_backtrace -Wall -g libunwind_backtrace.c -lunwind

and run it like so:

$ LD_LIBRARY_PATH=/usr/local/lib ./libunwind_backtrace
0x400958: (foo+0xe)
0x400968: (bar+0xe)
0x400983: (main+0x19)
0x7f6046b99ec5: (__libc_start_main+0xf5)
0x400779: (_start+0x29)

We can obtain the function symbol names and the address of the instruction where the call was made (more precisely, the return address which is the next instruction).

That's nice and all, but I want line numbers and file names too!

Fortunately, it's all in the DWARF information of the binary, and given the address we can extract the exact call location in a number of ways. The simplest is probably to call addr2line

But I don't want the runtime shelling out to addr2line, and copy-pasting code equivalent to addr2line doesn't feel great either.

For now, writing something like https://github.com/eliben/pyelftools/blob/master/examples/dwarf_decode_address.py would probably do. There is also libbacktrace, though, which feels more production-oriented, so even if I end up doing something like dwarf_decode_address, it's worth reading through how libbacktrace works!

C/C++: printing stacktrace containing file name, function name, and line numbers using libbacktrace - Jiyang Tang

tanishiking

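As for how to use libbacktrace, reading its tests seems like the way to go:

backtrace_create_state initializes some kind of state, and

backtrace_full obtains the backtrace.

My guess: backtrace_create_state reads the DWARF information from the executable and builds a cache, and backtrace_full then reads from that cache... something like that?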

tanishiking

...or so I thought, but backtrace_create_state really seems to do nothing more than allocate memory for the state.

tanishiking

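backtrace_full looks like this:

It calls _Unwind_Backtrace (unwind, &bdata);, and the unwind callback seems to check whether bdata->can_alloc is nonzero and to stash various data in bdata.

bdata comes in as a void pointer, so it's hard to tell what data it holds... oh, it's this.

No wait! bdata->data is a void* after all.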

tanishiking

I don't really understand the mechanism, but _Unwind_Backtrace seems to end up here,

and this backtrace_pcinfo thing is what gets the results into bdata->data, by handing them to the callback:

  if (!bdata->can_alloc)
    bdata->ret = bdata->callback (bdata->data, pc, NULL, 0, NULL);
  else
    bdata->ret = backtrace_pcinfo (bdata->state, pc, bdata->callback,
				   bdata->error_callback, bdata->data);
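
To pin down what bdata->data is for, here is a minimal sketch of the same dispatch with mock types. Everything prefixed mock_, plus frame_counter and count_frame, is a made-up name for illustration and not libbacktrace code; only the callback shape follows backtrace_full_callback. The point is that data is an opaque pointer owned by the caller, and the library only passes it through to the callback.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as libbacktrace's backtrace_full_callback. */
typedef int (*full_callback)(void *data, uintptr_t pc, const char *filename,
                             int lineno, const char *function);

/* Hypothetical, trimmed-down stand-in for the bdata struct. */
struct mock_bdata {
  full_callback callback; /* user callback */
  void *data;             /* opaque user pointer, passed through untouched */
  int can_alloc;          /* 0: skip file/line resolution */
  int ret;                /* last callback result */
};

/* Hypothetical resolver standing in for backtrace_pcinfo: it always
   reports the same fake location. */
static int mock_pcinfo(uintptr_t pc, full_callback cb, void *data) {
  return cb(data, pc, "example.c", 42, "foo");
}

/* One per-frame step, mirroring the if/else above. */
static void mock_unwind_step(struct mock_bdata *bdata, uintptr_t pc) {
  if (!bdata->can_alloc)
    bdata->ret = bdata->callback(bdata->data, pc, NULL, 0, NULL);
  else
    bdata->ret = mock_pcinfo(pc, bdata->callback, bdata->data);
}

/* Example user state: the caller decides what data points to. */
struct frame_counter {
  int frames;    /* frames seen */
  int with_line; /* frames that came with a file name */
};

/* Example user callback: counts frames; returning 0 keeps unwinding. */
static int count_frame(void *data, uintptr_t pc, const char *filename,
                       int lineno, const char *function) {
  struct frame_counter *c = data;
  (void)pc;
  (void)lineno;
  (void)function;
  c->frames++;
  if (filename != NULL)
    c->with_line++;
  return 0;
}
```

With can_alloc set to 0 the callback still fires for every frame, just without file, line, or function information.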

tanishiking

And here is exactly that function.

Let's look at it in two parts: fileline_initialize and fileline_fn.

tanishiking

fileline_initialize

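There seem to be several ways for it to locate and read the running executable itself:

state->filename
This is whatever was passed as the filename argument to libbacktrace's backtrace_create_state.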

FILENAME is the path name
of the executable file; if it is NULL the library will try
system-specific path names. If not NULL, FILENAME must point to a
permanent buffer.

The rest mostly try out various OS-specific interfaces.

c++ - Finding current executable's path without /proc/self/exe - Stack Overflow
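
The "try one candidate after another" idea can be sketched roughly as below. This is a simplified illustration, not the actual fileline_initialize: open_self_executable is a made-up name, and the candidate list (an explicit path, then /proc/self/exe, then /proc/curproc/file) is my assumption of a reasonable subset.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Try candidate paths in order and return a read-only descriptor for the
   first one that opens, or -1 if none does. filename_hint plays the role
   of state->filename and may be NULL. */
static int open_self_executable(const char *filename_hint) {
  const char *candidates[3];
  size_t i;

  candidates[0] = filename_hint;         /* explicit path from the caller */
  candidates[1] = "/proc/self/exe";      /* Linux */
  candidates[2] = "/proc/curproc/file";  /* some BSDs */

  for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
    int fd;
    if (candidates[i] == NULL)
      continue;
    fd = open(candidates[i], O_RDONLY);
    if (fd >= 0)
      return fd;
  }
  return -1;
}
```

The caller owns the returned descriptor and should close() it once the debug data has been read.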

Then, using the descriptor it obtained, it calls

backtrace_initialize

Read initial debug data from a descriptor, and set the
fileline_data, syminfo_fn, and syminfo_data fields of STATE.
Return the fileln_fn field in *FILELN_FN--this is done this way so
that the synchronization code is only implemented once.

I see, I see.

It is implemented separately for each executable format, for example ELF.

This elf_add thing parses the ELF file and loads it in.

tanishiking

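This is getting long, so...

elf_add looks like this, and it is pretty long.

PIE check

Making an executable a position-independent executable (PIE: Position Independent Executable) makes it position independent, so it can be loaded at an arbitrary memory address. By randomizing the address at which a PIE executable is loaded, it becomes possible to counter code-reuse attacks such as ROP.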
位置独立実行形式(PIE)によって確保できるエントロピー(32bitの場合) - /dev/null

But with this the addresses are not uniquely determined, which is a problem (?), so it kills me (I am the one who dies, not the process).
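
For reference, telling a PIE apart from a fixed-address executable comes down to the e_type field of the ELF header. A minimal sketch, assuming a 64-bit ELF file in the host's byte order; elf64_file_type and needs_load_base are made-up names, not libbacktrace functions.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns e_type (ET_EXEC, ET_DYN, ...) from a buffer holding the start of
   a 64-bit ELF file, or -1 if the buffer is too short, lacks the ELF magic,
   or is not ELFCLASS64. Assumes the file uses the host's byte order. */
static int elf64_file_type(const unsigned char *buf, size_t len) {
  Elf64_Ehdr ehdr;

  if (len < sizeof(ehdr))
    return -1;
  memcpy(&ehdr, buf, sizeof(ehdr));
  if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
    return -1;
  if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  return ehdr.e_type;
}

/* ET_DYN covers both shared libraries and PIE executables: their link-time
   addresses are relative to a load base that is only known at run time. */
static int needs_load_base(int e_type) {
  return e_type == ET_DYN;
}
```

For an ET_EXEC file the addresses in the debug info can be compared with the runtime pc directly; for ET_DYN a load base has to be taken into account first.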

Read debug_info and debug_line

Yep, yep.

It even follows debuglink. Nice.

elfファイルのdebugセクション分割とgdbの分割されたデバッグ情報のサポート機能めも - φ(・・*)ゞｳｰﾝ　カーネルとか弄ったりのメモ

After a lot of hard work, it finally

ends up at something called backtrace_dwarf_add.

backtrace_dwarf_add

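It reads debug_info and friends, and builds a map.

For some reason it apparently does not use debug_aranges.
There seems to be a "can't we do without it?" sentiment, and clang apparently stopped emitting debug_aranges by default as well, so maybe there is no particularly deep reason.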
Consider emitting DWARF aranges · Issue #45246 · rust-lang/rust

Then it sets fileline_fn to a pointer to a function called dwarf_fileline.

dwarf_fileline pulls the DIE and line information out of the debug_info mapping built earlier and returns them.
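
Conceptually, the lookup is "find the row of a sorted address table that covers this pc". A minimal sketch of that idea follows; line_entry, find_line, and find_call_site are made-up names, and the real mapping in libbacktrace is considerably richer (compilation units, function ranges, inlined calls). It also folds in the earlier point that the pc is a return address, so the call itself sits just before it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened line-table row: code from pc up to the next
   row's pc belongs to filename:lineno. */
struct line_entry {
  uintptr_t pc;
  const char *filename;
  int lineno;
};

/* Binary search over rows sorted by ascending pc. Returns the last row
   whose pc is <= the query, or NULL if the query precedes every row.
   Note: there is no end address here, so anything past the last row
   matches the last row. */
static const struct line_entry *find_line(const struct line_entry *rows,
                                          size_t count, uintptr_t pc) {
  size_t lo = 0, hi = count;

  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (rows[mid].pc <= pc)
      lo = mid + 1;
    else
      hi = mid;
  }
  return lo == 0 ? NULL : &rows[lo - 1];
}

/* A return address points just past the call instruction, so look up
   return_address - 1 to land inside the call itself. */
static const struct line_entry *find_call_site(const struct line_entry *rows,
                                               size_t count,
                                               uintptr_t return_address) {
  if (return_address == 0)
    return NULL;
  return find_line(rows, count, return_address - 1);
}
```

When a call is the last instruction of a source line, looking up the raw return address would report the following line, which is why the pc - 1 adjustment matters.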

tanishiking

And so, coming full circle: backtrace_pcinfo, which is called from backtrace_full, calls fileline_fn as we saw earlier, which is to say dwarf_fileline.

tanishiking

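Naturally, with PIE, ASLR makes the addresses differ on every run, so a lookup by raw address cannot work.

Come to think of it, what are low_pc and the like in the DWARF embedded in a PIE program even supposed to be?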

it looks like passing file offsets instead of addresses works. –
Aliaksei Kandratsenka

Hmm?

tanishiking

You... you're a genius.

tanishiking

On Linux, that would be /proc/pid/maps
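
A minimal sketch of going from a runtime pc to a file offset using one line of /proc/pid/maps. map_entry, parse_maps_line, and pc_to_file_offset are made-up names, and the sketch assumes the usual "start-end perms offset dev inode path" line layout. Whether the resulting file offset matches the addresses recorded in DWARF depends on how the segments are laid out in the file, so treat it as a starting point rather than a complete answer.

```c
#include <stdint.h>
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps. */
struct map_entry {
  uintptr_t start;  /* first mapped address */
  uintptr_t end;    /* one past the last mapped address */
  uintptr_t offset; /* offset of the mapping within the backing file */
  char perms[5];    /* e.g. "r-xp" */
  char path[256];   /* backing file, empty for anonymous mappings */
};

/* Parses "start-end perms offset dev inode path". Returns 0 on success,
   -1 if the line does not have at least the first four fields. */
static int parse_maps_line(const char *line, struct map_entry *out) {
  unsigned long start, end, offset;
  int n;

  out->path[0] = '\0';
  n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255[^\n]",
             &start, &end, out->perms, &offset, out->path);
  if (n < 4)
    return -1;
  out->start = (uintptr_t)start;
  out->end = (uintptr_t)end;
  out->offset = (uintptr_t)offset;
  return 0;
}

/* Translates a runtime pc inside mapping m to an offset within the backing
   file. Returns -1 if pc is outside the mapping. */
static int pc_to_file_offset(const struct map_entry *m, uintptr_t pc,
                             uintptr_t *file_offset) {
  if (pc < m->start || pc >= m->end)
    return -1;
  *file_offset = pc - m->start + m->offset;
  return 0;
}
```

In practice you would read /proc/self/maps line by line with fgets, find the executable mapping that contains the pc, and feed the translated value to the DWARF lookup.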