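Building a Game Boy emulator in Ruby
TODO
- Do something about & 0xff being hard-coded in many places
- Implement the CPU instructions in a separate class
- Make a logo
- Error handling
- Switch to puts
- Add run commands
- Split each run command into its own file
- start, bench, stackprof
- APU constants
- Compute the step count by referring to the audio environment variable (48000)
- Pull the value 512 from audio
- Move joypad input handling into a separate class
- Enable YJIT at runtime
- Add screenshots to the README
- Review the PPU implementation
- Look at the manual and other implementations
- The Puyo Puyo opening looks wrong
- Write tests
- Take screenshots without window shadows
- Show the title and FPS
- Add serial communication
- Try the other tests as well
- Use PyBoy as a reference
- Create test files and run them in CI
- Fix rendering bugs
- Add more MBC types
- Game Boy Color support
- Wasm support
- Refactor apu, ppu, and joypad
- Tidy up the benchmarking setup
- Make it possible to turn rendering and audio on or off
- Add this to the README too
- Make it usable as a Ruby benchmark program
- Remove unnecessary requires
- Speed up rendering
- Skip unnecessary drawing
Reference articles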
- https://voidproc.com/blog/archives/664
- https://mjhd.hatenablog.com/entry/2021/04/14/221813
- https://keichi.dev/post/write-yourself-a-game-boy-emulator/
- https://hackmd.io/@anqou/HJcvRrwy9
- https://qiita.com/linoscope/items/244d931aaae07df2c27e
- https://imrannazar.com/GameBoy-Emulation-in-JavaScript
Materials
Loading the ROM
Make it possible to load .gb files
Header (0x0100-0x014F)
Try reading the title
data = File.open('tobu.gb', 'rb') { _1.read.bytes }
p data[0x134..0x143].pack('C*').strip
=> "TOBU"
Pack the values into the fields of a Rom class instance.
GitHub Copilot suggests the code. Handy.
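As a rough illustration of packing header values into fields, the sketch below reads a few header bytes. The class name RomHeaderSketch and its field names are made up for this note and are not the project's actual Rom class. The offsets (title at 0x0134-0x0143, cartridge type at 0x0147, ROM size at 0x0148, RAM size at 0x0149) follow the usual Game Boy header layout.

```ruby
# Hypothetical header parser; not the actual Rom class of this project.
class RomHeaderSketch
  HEADER_END = 0x014f

  attr_reader :title, :cartridge_type, :rom_size_code, :ram_size_code

  def initialize(bytes)
    raise ArgumentError, 'ROM is too small to contain a header' if bytes.size <= HEADER_END

    # The title is padded with zero bytes, so stop at the first one.
    @title = bytes[0x134..0x143].take_while { |b| b != 0 }.pack('C*')
    @cartridge_type = bytes[0x147]
    @rom_size_code = bytes[0x148]
    @ram_size_code = bytes[0x149]
  end
end

# Usage with a real file (binary read so no bytes are translated):
#   header = RomHeaderSketch.new(File.binread('tobu.gb').bytes)
#   p header.title
```

Stopping at the first zero byte avoids relying on how strip treats null padding.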
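CPU implementation
The first goal is to get Hello World running
That seems to need background (bg) rendering and some CPU instructions
The registers already hold initial values by the time execution reaches 0x0100; apparently that is just how it works
Copilot suggests this part as well
After typing up to 0xf3, it suggested both the comment and the implementation
Note that n is sometimes treated as a signed value
The minimum needed for Hello World is implemented, so move on to the PPU
Hello World works, so now implement all the CPU instructions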
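The caveat about signed operands can be sketched as follows. The method name to_signed_byte matches a name that shows up in the profiles further down, but the body here is an assumption, and jump_relative is a hypothetical helper added only to show where the conversion matters.

```ruby
# Interpret an unsigned byte (0..255) as a two's-complement value (-128..127).
def to_signed_byte(byte)
  byte >= 0x80 ? byte - 0x100 : byte
end

# Hypothetical example: a relative jump adds the signed operand to the
# program counter and wraps the result to 16 bits.
def jump_relative(pc, operand)
  (pc + to_signed_byte(operand)) & 0xffff
end
```

Treating the operand as unsigned here would turn a short backward jump into a long forward one.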
Open questions
- The description of the half-carry flag for the DEC instruction says "H - Set if no borrow from bit 4." Should the flag actually be set when there is a borrow?
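On the question above: as far as I know, the flag is set when a borrow from bit 4 does occur, which for an 8-bit DEC means the low nibble was 0 before the decrement. A minimal sketch under that assumption (dec8 and the returned hash layout are hypothetical, not this project's API):

```ruby
# Hypothetical helper: returns the result of an 8-bit DEC and its flags.
# The carry flag is left out because DEC does not change it.
def dec8(value)
  result = (value - 1) & 0xff
  {
    result: result,
    z: result == 0,           # zero flag
    n: true,                  # subtract flag is always set by DEC
    h: (value & 0x0f) == 0    # borrow from bit 4 happened
  }
end
```

For example, decrementing 0x10 gives 0x0f with h set, while decrementing 0x11 gives 0x10 with h clear.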
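CPU test results
Debug while watching the registers in bgb.
Notes on why each test was failing, i.e. the pitfalls.
- 1
- 5: when setting f from pop af, the lower 4 bits were not being cleared to 0000
- 3
- 1: how the c flag is computed for 0xe8 and 0xf8
- cflag = (@sp & 0xff) + (byte & 0xff) > 0xff made it pass
- 1: how the c flag is computed for 0xe8 and 0xf8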
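The two pitfalls above can be sketched as below. The method names and the returned hash are hypothetical. The carry expression is the one from the note, and the half-carry expression is the analogous low-nibble check, added here as an assumption about how these opcodes behave.

```ruby
# Flags for 0xe8 (ADD SP, e) and 0xf8 (LD HL, SP+e).
# Both flags come from unsigned addition on the low byte of SP,
# even though the operand is applied as a signed offset.
def add_sp_offset(sp, byte)
  offset = byte >= 0x80 ? byte - 0x100 : byte
  {
    result: (sp + offset) & 0xffff,
    z: false,
    n: false,
    h: (sp & 0x0f) + (byte & 0x0f) > 0x0f,
    c: (sp & 0xff) + (byte & 0xff) > 0xff
  }
end

# pop af: the lower 4 bits of f do not exist, so mask them off.
def af_from_popped_word(word)
  word & 0xfff0
end
```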
The tests pass!
After changing the drawing method to make it faster, the picture became blurry
A caveat about the halt instruction
The halted state is released when an interrupt occurs, even if ime is false.
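A minimal sketch of that rule, assuming the usual interrupt enable (IE) and interrupt flag (IF) registers with five interrupt bits. The class and its return values are hypothetical and ignore details such as the HALT bug.

```ruby
# Hypothetical CPU fragment that only models waking up from HALT.
class HaltSketch
  attr_accessor :ime, :halted, :ie, :if_reg

  def initialize
    @ime = false
    @halted = false
    @ie = 0
    @if_reg = 0
  end

  # Interrupts that are both requested and enabled.
  def pending_interrupts
    @ie & @if_reg & 0x1f
  end

  def step
    # Waking up does not depend on ime.
    @halted = false if @halted && pending_interrupts != 0
    return :halted if @halted

    # Jumping to the handler does depend on ime.
    return :service_interrupt if @ime && pending_interrupts != 0

    :execute_next_instruction
  end
end
```

With ime false, the CPU simply resumes at the instruction after halt instead of jumping to a handler.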
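PPU
Things I did not understand
How screen rendering works
- When rendering happens
- → In an emulator, count cycles every time a CPU instruction runs, and update the screen once enough cycles have accumulated to reach VBlank
- When to write data to VRAM
- → Loop while watching the ly register until VBlank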
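The cycle-counting idea above can be sketched like this. The class name is hypothetical and the PPU modes within a line are ignored. The constants are the standard timings: 456 cycles per scanline, 154 lines per frame, VBlank starting at line 144.

```ruby
# Hypothetical timing-only PPU: advances ly from CPU cycle counts.
class PpuTimingSketch
  CYCLES_PER_LINE = 456
  VBLANK_START_LINE = 144
  LINES_PER_FRAME = 154

  attr_reader :ly

  def initialize
    @cycles = 0
    @ly = 0
  end

  # Call after every CPU instruction with the cycles it took.
  # Returns true when VBlank starts, i.e. a frame is ready to draw.
  def step(cycles)
    @cycles += cycles
    frame_ready = false
    while @cycles >= CYCLES_PER_LINE
      @cycles -= CYCLES_PER_LINE
      @ly = (@ly + 1) % LINES_PER_FRAME
      frame_ready = true if @ly == VBLANK_START_LINE
    end
    frame_ready
  end
end
```

The main loop would then run one CPU instruction, pass its cycle count to step, and redraw the window only when step returns true. A game waiting for VBlank sees ly reach 144 through the same counter.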
How to measure
Run the first 1500 frames of tobu.gb headless and time it three times
v1.0.0
yjit: false
1: 36.740829 sec
2: 36.468515 sec
3: 36.177083 sec
FPS: 41.1385591742566
yjit: true
1: 32.305559 sec
2: 32.094778 sec
3: 31.889601 sec
FPS: 46.73385499531633
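The FPS figures are consistent with dividing the total number of frames by the total time over the three runs. A small sketch under that assumption (average_fps is a hypothetical helper, not the project's bench code):

```ruby
FRAMES_PER_RUN = 1500

# Average FPS over several runs: total frames divided by total seconds.
def average_fps(times_sec, frames_per_run = FRAMES_PER_RUN)
  frames_per_run * times_sec.size / times_sec.sum
end

yjit_off = [36.740829, 36.468515, 36.177083]
yjit_on = [32.305559, 32.094778, 31.889601]

puts "yjit: false -> #{average_fps(yjit_off)}"
puts "yjit: true  -> #{average_fps(yjit_on)}"
```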
Rendering also takes time and will need speeding up too,
but for now the goal is to exceed 60 fps without rendering
stackprof, run 1
→ render_sprites is the bottleneck
==================================
Mode: cpu(1000)
Samples: 9081 (1.08% miss rate)
GC: 4 (0.04%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
3727 (41.0%) 1920 (21.1%) Rubyboy::Ppu#render_sprites
1800 (19.8%) 1800 (19.8%) Rubyboy::Operand#initialize
1448 (15.9%) 1448 (15.9%) Integer#zero?
3346 (36.8%) 1296 (14.3%) Enumerable#each_slice
919 (10.1%) 919 (10.1%) Integer#<<
424 (4.7%) 424 (4.7%) Integer#<=>
3552 (39.1%) 294 (3.2%) Array#each
162 (1.8%) 159 (1.8%) Rubyboy::Cpu#flags
147 (1.6%) 147 (1.6%) Array#size
104 (1.1%) 104 (1.1%) Integer#>>
2220 (24.4%) 71 (0.8%) Rubyboy::Ppu#render_bg
6259 (68.9%) 58 (0.6%) Rubyboy::Ppu#step
149 (1.6%) 55 (0.6%) Rubyboy::Ppu#get_color
44 (0.5%) 44 (0.5%) Rubyboy::Ppu#to_signed_byte
177 (1.9%) 38 (0.4%) Rubyboy::Timer#step
34 (0.4%) 34 (0.4%) Integer#-@
146 (1.6%) 29 (0.3%) Rubyboy::Cartridge::Mbc1#read_byte
29 (0.3%) 29 (0.3%) Rubyboy::Registers#write8
915 (10.1%) 24 (0.3%) Rubyboy::Ppu#get_pixel
17 (0.2%) 17 (0.2%) Rubyboy::Registers#read8
14 (0.2%) 14 (0.2%) Rubyboy::Cpu#increment_pc_by_byte
9054 (99.7%) 14 (0.2%) Rubyboy::Console#bench
434 (4.8%) 11 (0.1%) Range#===
398 (4.4%) 10 (0.1%) Rubyboy::Bus#read_byte
9 (0.1%) 9 (0.1%) Rubyboy::Ppu#handle_ly_eq_lyc
154 (1.7%) 8 (0.1%) Rubyboy::Ppu#render_window
1244 (13.7%) 7 (0.1%) Rubyboy::Ppu#get_tile_index
2597 (28.6%) 6 (0.1%) Rubyboy::Cpu#exec
110 (1.2%) 5 (0.1%) Rubyboy::Bus#write_byte
9 (0.1%) 5 (0.1%) Rubyboy::Ppu#write_byte
render_sprites
Rubyboy::Ppu#render_sprites (/Users/yamasaki/dev/gb-emulator/rubyboy/rubyboy/lib/rubyboy/ppu.rb:220)
samples: 1920 self (21.1%) / 3727 total (41.0%)
callers:
3727 ( 100.0%) Rubyboy::Ppu#step
1902 ( 51.0%) Enumerable#each_slice
46 ( 1.2%) Enumerator#with_index
35 ( 0.9%) Array#each
29 ( 0.8%) Integer#times
callees (1807 total):
3307 ( 183.0%) Enumerator#each
339 ( 18.8%) Enumerator#with_index
39 ( 2.2%) Enumerable#each_slice
36 ( 2.0%) Array#each
34 ( 1.9%) Integer#-@
29 ( 1.6%) Integer#times
20 ( 1.1%) Rubyboy::Ppu#get_pixel
9 ( 0.5%) Integer#zero?
5 ( 0.3%) Rubyboy::Ppu#get_color
1 ( 0.1%) Enumerable#sort_by
code:
| 220 | def render_sprites
3 (0.0%) | 221 | return if @lcdc[LCDC[:sprite_enable]].zero?
| 222 |
2 (0.0%) | 223 | sprite_height = @lcdc[LCDC[:sprite_size]].zero? ? 8 : 16
| 224 | sprites = []
| 225 | cnt = 0
3346 (36.8%) | 226 | @oam.each_slice(4).each do |sprite_attr|
| 227 | sprite = {
| 228 | y: (sprite_attr[0] - 16) % 256,
| 229 | x: (sprite_attr[1] - 8) % 256,
| 230 | tile_index: sprite_attr[2],
| 231 | flags: sprite_attr[3]
| 232 | }
| 233 | next if sprite[:y] > @ly || sprite[:y] + sprite_height <= @ly
| 234 |
| 235 | sprites << sprite
| 236 | cnt += 1
15 (0.2%) / 15 (0.2%) | 237 | break if cnt == 10
1887 (20.8%) / 1887 (20.8%) | 238 | end
386 (4.3%) / 12 (0.1%) | 239 | sprites = sprites.sort_by.with_index { |sprite, i| [-sprite[:x], -i] }
| 240 |
36 (0.4%) | 241 | sprites.each do |sprite|
| 242 | flags = sprite[:flags]
4 (0.0%) | 243 | pallet = flags[SPRITE_FLAGS[:dmg_palette]].zero? ? @obp0 : @obp1
| 244 | tile_index = sprite[:tile_index]
| 245 | tile_index &= 0xfe if sprite_height == 16
| 246 | y = (@ly - sprite[:y]) % 256
2 (0.0%) / 2 (0.0%) | 247 | y = sprite_height - y - 1 if flags[SPRITE_FLAGS[:y_flip]] == 1
| 248 | tile_index = (tile_index + 1) % 256 if y >= 8
| 249 | y %= 8
| 250 |
29 (0.3%) | 251 | 8.times do |x|
2 (0.0%) / 2 (0.0%) | 252 | x_flipped = flags[SPRITE_FLAGS[:x_flip]] == 1 ? 7 - x : x
| 253 |
20 (0.2%) | 254 | pixel = get_pixel(tile_index, x_flipped, y)
| 255 | i = (sprite[:x] + x) % 256
| 256 |
| 257 | next if pixel.zero? || i >= LCD_WIDTH
2 (0.0%) / 2 (0.0%) | 258 | next if flags[SPRITE_FLAGS[:priority]] == 1 && @bg_pixels[i] != 0
| 259 |
5 (0.1%) | 260 | @buffer[@ly * LCD_WIDTH + i] = get_color(pallet, pixel)
| 261 | end
Changed it so that a sprite is not built every time
FPS: 46.73385499531633 → 49.2233733053377
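One possible way to avoid building a sprite hash for every OAM entry is to walk the OAM array by offset and keep only the offsets of the sprites on the current line. This is a sketch of the idea, not necessarily the change that was made. The method name and the constant are hypothetical, and the visibility test mirrors the one in the profiled code above.

```ruby
MAX_SPRITES_PER_LINE = 10

# oam: flat array of bytes, 4 per sprite (y, x, tile index, flags).
# Returns the OAM offsets of the sprites that overlap scanline ly,
# without allocating a hash or a 4-element slice per sprite.
def visible_sprite_offsets(oam, ly, sprite_height)
  offsets = []
  0.step(oam.size - 4, 4) do |offset|
    y = (oam[offset] - 16) % 256
    next if y > ly || y + sprite_height <= ly

    offsets << offset
    break if offsets.size == MAX_SPRITES_PER_LINE
  end
  offsets
end
```

The drawing loop can then read oam[offset + 1], oam[offset + 2] and oam[offset + 3] directly for the x position, tile index and flags.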
Integer#zero?
Changed to == 0
FPS: 49.2233733053377 → 49.36641822413328
Use a hash instead of the Operand class
FPS: 49.36641822413328 → 50.94130878614299
stackprof, run 2
==================================
Mode: cpu(1000)
Samples: 5666 (1.73% miss rate)
GC: 7 (0.12%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
1334 (23.5%) 1334 (23.5%) Rubyboy::Ppu#to_signed_byte
1260 (22.2%) 1260 (22.2%) Integer#<<
662 (11.7%) 662 (11.7%) Integer#<=>
913 (16.1%) 471 (8.3%) Array#each
403 (7.1%) 403 (7.1%) Rubyboy::Registers#read8
410 (7.2%) 214 (3.8%) Enumerable#each_slice
3739 (66.0%) 190 (3.4%) Rubyboy::Ppu#step
180 (3.2%) 180 (3.2%) Rubyboy::Timer#step
961 (17.0%) 165 (2.9%) Rubyboy::Ppu#render_sprites
195 (3.4%) 163 (2.9%) Rubyboy::Cpu#flags
98 (1.7%) 98 (1.7%) Integer#>>
2415 (42.6%) 87 (1.5%) Rubyboy::Ppu#render_bg
57 (1.0%) 57 (1.0%) Array#size
138 (2.4%) 49 (0.9%) Rubyboy::Ppu#get_color
152 (2.7%) 32 (0.6%) Rubyboy::Cartridge::Mbc1#read_byte
31 (0.5%) 31 (0.5%) Rubyboy::Registers#write8
29 (0.5%) 29 (0.5%) Integer#-@
1020 (18.0%) 27 (0.5%) Rubyboy::Ppu#get_pixel
19 (0.3%) 19 (0.3%) Rubyboy::Interrupt#interrupts
869 (15.3%) 17 (0.3%) Rubyboy::Cpu#get_value
1349 (23.8%) 15 (0.3%) Rubyboy::Ppu#get_tile_index
5634 (99.4%) 13 (0.2%) Rubyboy::Console#bench
643 (11.3%) 11 (0.2%) Rubyboy::Bus#read_byte
569 (10.0%) 10 (0.2%) Rubyboy::Cpu#ld8
9 (0.2%) 9 (0.2%) Rubyboy::Cpu#increment_pc_by_byte
8 (0.1%) 8 (0.1%) Rubyboy::Ppu#handle_ly_eq_lyc
100 (1.8%) 7 (0.1%) Rubyboy::Bus#write_byte
666 (11.8%) 6 (0.1%) Range#===
164 (2.9%) 5 (0.1%) Rubyboy::Ppu#render_window
5 (0.1%) 5 (0.1%) FFI::FunctionType#initialize
PPU refactoring
Initialize tile_map_addr outside the loop
FPS: 50.94130878614299 → 56.6580741129914
Precompute outside the loop
FPS: 56.6580741129914 → 60.44140113483162
TODO: stop using constants
It would be faster, but readability suffers, so I would rather not
Ruby v3.2 -> v3.3
FPS: 61.021 → 115.236
I want to do something about how often GC runs
The Pokémon Red start screen is slow, so refactor.
It is especially slow with sound on
rubyboy % stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 16405 (4.57% miss rate)
GC: 5593 (34.09%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
3688 (22.5%) 3688 (22.5%) (sweeping)
2332 (14.2%) 2109 (12.9%) Enumerable#flat_map
2050 (12.5%) 2050 (12.5%) Integer#<=>
5593 (34.1%) 1679 (10.2%) (garbage collection)
1038 (6.3%) 1038 (6.3%) Rubyboy::Ppu#to_signed_byte
1004 (6.1%) 1004 (6.1%) Rubyboy::SDL.RenderClear
646 (3.9%) 646 (3.9%) Rubyboy::Ppu#get_pixel
437 (2.7%) 437 (2.7%) Integer#>>
701 (4.3%) 332 (2.0%) Rubyboy::Ppu#render_sprites
1354 (8.3%) 278 (1.7%) Rubyboy::Lcd#draw
3825 (23.3%) 257 (1.6%) Rubyboy::Ppu#step
1627 (9.9%) 255 (1.6%) Rubyboy::Ppu#render_bg
633 (3.9%) 247 (1.5%) Enumerable#each_slice
230 (1.4%) 230 (1.4%) Rubyboy::Registers#read8
226 (1.4%) 226 (1.4%) (marking)
2332 (14.2%) 223 (1.4%) Rubyboy::Console#buffer_to_pixel_data
2933 (17.9%) 194 (1.2%) Integer#times
1228 (7.5%) 185 (1.1%) Rubyboy::Ppu#render_window
178 (1.1%) 178 (1.1%) Rubyboy::Timer#step
524 (3.2%) 110 (0.7%) Rubyboy::Ppu#get_color
95 (0.6%) 95 (0.6%) Rubyboy::Registers#write8
116 (0.7%) 81 (0.5%) Rubyboy::Cpu#flags
80 (0.5%) 80 (0.5%) Rubyboy::Registers#read16
662 (4.0%) 80 (0.5%) Rubyboy::Cartridge::Mbc1#read_byte
1203 (7.3%) 69 (0.4%) Rubyboy::Cpu#ld8
62 (0.4%) 62 (0.4%) Array#size
57 (0.3%) 57 (0.3%) Rubyboy::Cpu#increment_pc_by_byte
56 (0.3%) 56 (0.3%) Rubyboy::SDL.UpdateTexture
44 (0.3%) 44 (0.3%) Rubyboy::Interrupt#interrupts
2090 (12.7%) 44 (0.3%) Range#===
Instead of tripling the screen array with flat_map at the end, the three r, g, b values are now written as each pixel is produced
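A sketch of the difference, with hypothetical method names. It assumes a greyscale colour value that is repeated for the r, g and b channels, as the flat_map description suggests.

```ruby
LCD_WIDTH = 160
LCD_HEIGHT = 144

# Before: one value per pixel, expanded in a final pass. This allocates
# a 3-element array per pixel plus a new result array every frame.
def expand_with_flat_map(gray_buffer)
  gray_buffer.flat_map { |c| [c, c, c] }
end

# After: the frame buffer already has three slots per pixel, so each
# pixel is written in place and nothing is allocated per frame.
def put_pixel(rgb_buffer, pixel_index, color)
  base = pixel_index * 3
  rgb_buffer[base] = color
  rgb_buffer[base + 1] = color
  rgb_buffer[base + 2] = color
end

# The buffer is allocated once and reused for every frame.
rgb_buffer = Array.new(LCD_WIDTH * LCD_HEIGHT * 3, 0xff)
put_pixel(rgb_buffer, 0, 0x55)
```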
rubyboy % stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 11758 (6.52% miss rate)
GC: 3103 (26.39%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
1857 (15.8%) 1857 (15.8%) Integer#<=>
1542 (13.1%) 1542 (13.1%) (sweeping)
3103 (26.4%) 1459 (12.4%) (garbage collection)
1087 (9.2%) 1087 (9.2%) Rubyboy::SDL.RenderClear
950 (8.1%) 950 (8.1%) Rubyboy::Ppu#to_signed_byte
1797 (15.3%) 646 (5.5%) Rubyboy::Ppu#render_bg
606 (5.2%) 606 (5.2%) Rubyboy::Ppu#get_pixel
1381 (11.7%) 467 (4.0%) Rubyboy::Ppu#render_window
712 (6.1%) 287 (2.4%) Rubyboy::Ppu#render_sprites
281 (2.4%) 281 (2.4%) Rubyboy::SDL.UpdateTexture
618 (5.3%) 261 (2.2%) Enumerable#each_slice
4152 (35.3%) 249 (2.1%) Rubyboy::Ppu#step
246 (2.1%) 246 (2.1%) Integer#>>
192 (1.6%) 192 (1.6%) Rubyboy::Registers#read8
3252 (27.7%) 178 (1.5%) Integer#times
162 (1.4%) 162 (1.4%) Rubyboy::Timer#step
102 (0.9%) 102 (0.9%) (marking)
321 (2.7%) 101 (0.9%) Rubyboy::Ppu#get_color
600 (5.1%) 100 (0.9%) Rubyboy::Cartridge::Mbc1#read_byte
88 (0.7%) 88 (0.7%) Rubyboy::Registers#write8
80 (0.7%) 80 (0.7%) Rubyboy::Registers#read16
78 (0.7%) 78 (0.7%) Array#size
1108 (9.4%) 73 (0.6%) Rubyboy::Cpu#ld8
104 (0.9%) 69 (0.6%) Rubyboy::Cpu#flags
64 (0.5%) 64 (0.5%) Rubyboy::SDL.GetKeyboardState
620 (5.3%) 56 (0.5%) Array#each
1904 (16.2%) 52 (0.4%) Range#===
1441 (12.3%) 52 (0.4%) Rubyboy::Lcd#draw
41 (0.3%) 41 (0.3%) Rubyboy::Cpu#increment_pc_by_byte
8648 (73.5%) 39 (0.3%) Rubyboy::Console#bench
heap-profiler showed that Hashes in cpu.rb use a lot of memory, so fix that
rubyboy % heap-profiler tmp/report
Total allocated: 563.01 MB (4198804 objects)
Total retained: 10.13 kB (252 objects)
allocated memory by gem
-----------------------------------
563.01 MB rubyboy/lib
320.00 B heap-profiler-0.7.0
allocated memory by file
-----------------------------------
454.17 MB rubyboy/lib/rubyboy/cpu.rb
93.18 MB rubyboy/lib/rubyboy/ppu.rb
10.06 MB rubyboy/lib/rubyboy/apu.rb
4.35 MB rubyboy/lib/rubyboy/audio.rb
1.25 MB rubyboy/lib/rubyboy.rb
720.00 B rubyboy/lib/rubyboy/lcd.rb
416.00 B rubyboy/lib/rubyboy/apu_channels/channel2.rb
416.00 B rubyboy/lib/rubyboy/apu_channels/channel1.rb
320.00 B rubyboy/lib/rubyboy/bus.rb
320.00 B heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
296.00 B rubyboy/lib/rubyboy/interrupt.rb
120.00 B rubyboy/lib/rubyboy/apu_channels/channel4.rb
80.00 B rubyboy/lib/rubyboy/registers.rb
40.00 B rubyboy/lib/rubyboy/cartridge/mbc1.rb
40.00 B rubyboy/lib/rubyboy/apu_channels/channel3.rb
allocated memory by location
-----------------------------------
77.83 MB rubyboy/lib/rubyboy/cpu.rb:600
65.28 MB rubyboy/lib/rubyboy/ppu.rb:248
38.96 MB rubyboy/lib/rubyboy/cpu.rb:283
35.15 MB rubyboy/lib/rubyboy/cpu.rb:174
19.13 MB rubyboy/lib/rubyboy/cpu.rb:85
18.67 MB rubyboy/lib/rubyboy/cpu.rb:272
18.47 MB rubyboy/lib/rubyboy/cpu.rb:87
16.77 MB rubyboy/lib/rubyboy/cpu.rb:234
15.01 MB rubyboy/lib/rubyboy/cpu.rb:292
14.61 MB rubyboy/lib/rubyboy/ppu.rb:239
13.74 MB rubyboy/lib/rubyboy/cpu.rb:231
13.67 MB rubyboy/lib/rubyboy/cpu.rb:79
11.77 MB rubyboy/lib/rubyboy/cpu.rb:172
11.38 MB rubyboy/lib/rubyboy/cpu.rb:166
11.35 MB rubyboy/lib/rubyboy/cpu.rb:167
10.51 MB rubyboy/lib/rubyboy/cpu.rb:96
9.17 MB rubyboy/lib/rubyboy/cpu.rb:114
8.86 MB rubyboy/lib/rubyboy/cpu.rb:86
8.55 MB rubyboy/lib/rubyboy/cpu.rb:219
8.08 MB rubyboy/lib/rubyboy/ppu.rb:244
7.46 MB rubyboy/lib/rubyboy/cpu.rb:280
6.67 MB rubyboy/lib/rubyboy/cpu.rb:227
6.22 MB rubyboy/lib/rubyboy/cpu.rb:173
6.18 MB rubyboy/lib/rubyboy/cpu.rb:63
5.74 MB rubyboy/lib/rubyboy/cpu.rb:256
5.55 MB rubyboy/lib/rubyboy/cpu.rb:113
5.51 MB rubyboy/lib/rubyboy/cpu.rb:178
5.32 MB rubyboy/lib/rubyboy/cpu.rb:61
5.31 MB rubyboy/lib/rubyboy/cpu.rb:123
5.21 MB rubyboy/lib/rubyboy/ppu.rb:236
5.15 MB rubyboy/lib/rubyboy/cpu.rb:294
4.76 MB rubyboy/lib/rubyboy/cpu.rb:229
4.35 MB rubyboy/lib/rubyboy/audio.rb:31
3.60 MB rubyboy/lib/rubyboy/cpu.rb:58
3.34 MB rubyboy/lib/rubyboy/cpu.rb:70
3.30 MB rubyboy/lib/rubyboy/cpu.rb:94
2.95 MB rubyboy/lib/rubyboy/cpu.rb:163
2.87 MB rubyboy/lib/rubyboy/cpu.rb:106
2.64 MB rubyboy/lib/rubyboy/cpu.rb:147
2.12 MB rubyboy/lib/rubyboy/cpu.rb:228
2.12 MB rubyboy/lib/rubyboy/cpu.rb:139
2.01 MB rubyboy/lib/rubyboy/cpu.rb:276
1.90 MB rubyboy/lib/rubyboy/cpu.rb:71
1.87 MB rubyboy/lib/rubyboy/apu.rb:51
1.87 MB rubyboy/lib/rubyboy/apu.rb:58
1.74 MB rubyboy/lib/rubyboy/cpu.rb:66
1.55 MB rubyboy/lib/rubyboy/cpu.rb:171
1.44 MB rubyboy/lib/rubyboy/apu.rb:53
1.44 MB rubyboy/lib/rubyboy/apu.rb:60
1.39 MB rubyboy/lib/rubyboy/apu.rb:52
allocated memory by class
-----------------------------------
462.20 MB Hash
49.79 MB Array
14.61 MB Enumerator
10.96 MB <memo> (IMEMO)
10.96 MB <ifunc> (IMEMO)
10.06 MB Float
4.35 MB FFI::MemoryPointer
55.88 kB FFI::Pointer
25.68 kB <throw_data> (IMEMO)
6.92 kB <callcache> (IMEMO)
2.96 kB <constcache> (IMEMO)
96.00 B <ment> (IMEMO)
allocated objects by gem
-----------------------------------
4198796 rubyboy/lib
8 heap-profiler-0.7.0
allocated objects by file
-----------------------------------
2839605 rubyboy/lib/rubyboy/cpu.rb
1105342 rubyboy/lib/rubyboy/ppu.rb
251462 rubyboy/lib/rubyboy/apu.rb
1294 rubyboy/lib/rubyboy.rb
1048 rubyboy/lib/rubyboy/audio.rb
18 rubyboy/lib/rubyboy/lcd.rb
8 rubyboy/lib/rubyboy/bus.rb
8 heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
5 rubyboy/lib/rubyboy/apu_channels/channel2.rb
5 rubyboy/lib/rubyboy/apu_channels/channel1.rb
3 rubyboy/lib/rubyboy/apu_channels/channel4.rb
2 rubyboy/lib/rubyboy/registers.rb
2 rubyboy/lib/rubyboy/interrupt.rb
1 rubyboy/lib/rubyboy/cartridge/mbc1.rb
1 rubyboy/lib/rubyboy/apu_channels/channel3.rb
allocated objects by location
-----------------------------------
689584 rubyboy/lib/rubyboy/ppu.rb:248
486434 rubyboy/lib/rubyboy/cpu.rb:600
273889 rubyboy/lib/rubyboy/ppu.rb:239
243478 rubyboy/lib/rubyboy/cpu.rb:283
219714 rubyboy/lib/rubyboy/cpu.rb:174
119570 rubyboy/lib/rubyboy/cpu.rb:85
116703 rubyboy/lib/rubyboy/cpu.rb:272
115434 rubyboy/lib/rubyboy/cpu.rb:87
104839 rubyboy/lib/rubyboy/cpu.rb:234
93804 rubyboy/lib/rubyboy/cpu.rb:292
91296 rubyboy/lib/rubyboy/ppu.rb:236
85878 rubyboy/lib/rubyboy/cpu.rb:231
85438 rubyboy/lib/rubyboy/cpu.rb:79
73590 rubyboy/lib/rubyboy/cpu.rb:172
71146 rubyboy/lib/rubyboy/cpu.rb:166
70944 rubyboy/lib/rubyboy/cpu.rb:167
65709 rubyboy/lib/rubyboy/cpu.rb:96
57340 rubyboy/lib/rubyboy/cpu.rb:114
55390 rubyboy/lib/rubyboy/cpu.rb:86
53465 rubyboy/lib/rubyboy/cpu.rb:219
50512 rubyboy/lib/rubyboy/ppu.rb:244
46849 rubyboy/lib/rubyboy/apu.rb:51
46847 rubyboy/lib/rubyboy/apu.rb:58
46596 rubyboy/lib/rubyboy/cpu.rb:280
41691 rubyboy/lib/rubyboy/cpu.rb:227
38898 rubyboy/lib/rubyboy/cpu.rb:173
38615 rubyboy/lib/rubyboy/cpu.rb:63
36024 rubyboy/lib/rubyboy/apu.rb:53
36023 rubyboy/lib/rubyboy/apu.rb:60
35883 rubyboy/lib/rubyboy/cpu.rb:256
34737 rubyboy/lib/rubyboy/apu.rb:52
34736 rubyboy/lib/rubyboy/apu.rb:59
34695 rubyboy/lib/rubyboy/cpu.rb:113
34468 rubyboy/lib/rubyboy/cpu.rb:178
33268 rubyboy/lib/rubyboy/cpu.rb:61
33186 rubyboy/lib/rubyboy/cpu.rb:123
32219 rubyboy/lib/rubyboy/cpu.rb:294
29773 rubyboy/lib/rubyboy/cpu.rb:229
22490 rubyboy/lib/rubyboy/cpu.rb:58
20864 rubyboy/lib/rubyboy/cpu.rb:70
20604 rubyboy/lib/rubyboy/cpu.rb:94
18442 rubyboy/lib/rubyboy/cpu.rb:163
17946 rubyboy/lib/rubyboy/cpu.rb:106
16512 rubyboy/lib/rubyboy/cpu.rb:147
13241 rubyboy/lib/rubyboy/cpu.rb:228
13230 rubyboy/lib/rubyboy/cpu.rb:139
12579 rubyboy/lib/rubyboy/cpu.rb:276
11893 rubyboy/lib/rubyboy/cpu.rb:71
10850 rubyboy/lib/rubyboy/cpu.rb:66
9706 rubyboy/lib/rubyboy/cpu.rb:171
allocated objects by class
-----------------------------------
2888757 Hash
416967 Array
273888 <memo> (IMEMO)
273888 <ifunc> (IMEMO)
251442 Float
91296 Enumerator
1040 FFI::MemoryPointer
642 <throw_data> (IMEMO)
635 FFI::Pointer
173 <callcache> (IMEMO)
74 <constcache> (IMEMO)
2 <ment> (IMEMO)
retained memory by gem
-----------------------------------
9.81 kB rubyboy/lib
320.00 B heap-profiler-0.7.0
retained memory by file
-----------------------------------
3.92 kB rubyboy/lib/rubyboy/cpu.rb
2.20 kB rubyboy/lib/rubyboy/ppu.rb
960.00 B rubyboy/lib/rubyboy.rb
720.00 B rubyboy/lib/rubyboy/lcd.rb
720.00 B rubyboy/lib/rubyboy/apu.rb
328.00 B rubyboy/lib/rubyboy/audio.rb
320.00 B rubyboy/lib/rubyboy/bus.rb
320.00 B heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
160.00 B rubyboy/lib/rubyboy/apu_channels/channel2.rb
160.00 B rubyboy/lib/rubyboy/apu_channels/channel1.rb
120.00 B rubyboy/lib/rubyboy/apu_channels/channel4.rb
80.00 B rubyboy/lib/rubyboy/registers.rb
40.00 B rubyboy/lib/rubyboy/interrupt.rb
40.00 B rubyboy/lib/rubyboy/cartridge/mbc1.rb
40.00 B rubyboy/lib/rubyboy/apu_channels/channel3.rb
retained memory by location
-----------------------------------
160.00 B rubyboy/lib/rubyboy.rb:79
160.00 B rubyboy/lib/rubyboy.rb:78
152.00 B rubyboy/lib/rubyboy/ppu.rb:248
120.00 B rubyboy/lib/rubyboy/lcd.rb:28
120.00 B rubyboy/lib/rubyboy/audio.rb:34
80.00 B rubyboy/lib/rubyboy/registers.rb:73
80.00 B rubyboy/lib/rubyboy/ppu.rb:209
80.00 B rubyboy/lib/rubyboy/ppu.rb:107
80.00 B rubyboy/lib/rubyboy/lcd.rb:45
80.00 B rubyboy/lib/rubyboy/lcd.rb:44
80.00 B rubyboy/lib/rubyboy/lcd.rb:35
80.00 B rubyboy/lib/rubyboy/lcd.rb:31
80.00 B rubyboy/lib/rubyboy/lcd.rb:30
80.00 B rubyboy/lib/rubyboy/lcd.rb:29
80.00 B rubyboy/lib/rubyboy/cpu.rb:23
80.00 B rubyboy/lib/rubyboy/cpu.rb:1248
80.00 B rubyboy/lib/rubyboy/audio.rb:27
80.00 B rubyboy/lib/rubyboy/apu_channels/channel2.rb:76
80.00 B rubyboy/lib/rubyboy/apu.rb:65
80.00 B rubyboy/lib/rubyboy/apu.rb:51
80.00 B rubyboy/lib/rubyboy/apu.rb:50
80.00 B rubyboy/lib/rubyboy.rb:75
80.00 B rubyboy/lib/rubyboy.rb:74
80.00 B rubyboy/lib/rubyboy.rb:43
80.00 B heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:58
80.00 B heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:53
80.00 B heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:52
48.00 B rubyboy/lib/rubyboy/ppu.rb:239
48.00 B rubyboy/lib/rubyboy/audio.rb:28
40.00 B rubyboy/lib/rubyboy/ppu.rb:306
40.00 B rubyboy/lib/rubyboy/ppu.rb:180
40.00 B rubyboy/lib/rubyboy/ppu.rb:179
40.00 B rubyboy/lib/rubyboy/lcd.rb:27
40.00 B rubyboy/lib/rubyboy/cpu.rb:92
40.00 B rubyboy/lib/rubyboy/cpu.rb:894
40.00 B rubyboy/lib/rubyboy/cpu.rb:76
40.00 B rubyboy/lib/rubyboy/cpu.rb:748
40.00 B rubyboy/lib/rubyboy/cpu.rb:69
40.00 B rubyboy/lib/rubyboy/cpu.rb:64
40.00 B rubyboy/lib/rubyboy/cpu.rb:275
40.00 B rubyboy/lib/rubyboy/cpu.rb:254
40.00 B rubyboy/lib/rubyboy/cpu.rb:219
40.00 B rubyboy/lib/rubyboy/cpu.rb:1027
40.00 B rubyboy/lib/rubyboy/bus.rb:87
40.00 B rubyboy/lib/rubyboy.rb:81
40.00 B rubyboy/lib/rubyboy.rb:80
40.00 B rubyboy/lib/rubyboy.rb:76
40.00 B rubyboy/lib/rubyboy.rb:45
40.00 B rubyboy/lib/rubyboy.rb:44
40.00 B rubyboy/lib/rubyboy.rb:39
retained memory by class
-----------------------------------
6.96 kB <callcache> (IMEMO)
3.00 kB <constcache> (IMEMO)
96.00 B <ment> (IMEMO)
72.00 B Thread::Mutex
retained objects by gem
-----------------------------------
244 rubyboy/lib
8 heap-profiler-0.7.0
retained objects by file
-----------------------------------
98 rubyboy/lib/rubyboy/cpu.rb
54 rubyboy/lib/rubyboy/ppu.rb
24 rubyboy/lib/rubyboy.rb
18 rubyboy/lib/rubyboy/lcd.rb
18 rubyboy/lib/rubyboy/apu.rb
8 rubyboy/lib/rubyboy/bus.rb
8 rubyboy/lib/rubyboy/audio.rb
8 heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
4 rubyboy/lib/rubyboy/apu_channels/channel2.rb
4 rubyboy/lib/rubyboy/apu_channels/channel1.rb
3 rubyboy/lib/rubyboy/apu_channels/channel4.rb
2 rubyboy/lib/rubyboy/registers.rb
1 rubyboy/lib/rubyboy/interrupt.rb
1 rubyboy/lib/rubyboy/cartridge/mbc1.rb
1 rubyboy/lib/rubyboy/apu_channels/channel3.rb
retained objects by location
-----------------------------------
4 rubyboy/lib/rubyboy.rb:79
4 rubyboy/lib/rubyboy.rb:78
3 rubyboy/lib/rubyboy/ppu.rb:248
3 rubyboy/lib/rubyboy/lcd.rb:28
3 rubyboy/lib/rubyboy/audio.rb:34
2 rubyboy/lib/rubyboy/registers.rb:73
2 rubyboy/lib/rubyboy/ppu.rb:209
2 rubyboy/lib/rubyboy/ppu.rb:107
2 rubyboy/lib/rubyboy/lcd.rb:45
2 rubyboy/lib/rubyboy/lcd.rb:44
2 rubyboy/lib/rubyboy/lcd.rb:35
2 rubyboy/lib/rubyboy/lcd.rb:31
2 rubyboy/lib/rubyboy/lcd.rb:30
2 rubyboy/lib/rubyboy/lcd.rb:29
2 rubyboy/lib/rubyboy/cpu.rb:23
2 rubyboy/lib/rubyboy/cpu.rb:1248
2 rubyboy/lib/rubyboy/audio.rb:27
2 rubyboy/lib/rubyboy/apu_channels/channel2.rb:76
2 rubyboy/lib/rubyboy/apu.rb:65
2 rubyboy/lib/rubyboy/apu.rb:51
2 rubyboy/lib/rubyboy/apu.rb:50
2 rubyboy/lib/rubyboy.rb:75
2 rubyboy/lib/rubyboy.rb:74
2 rubyboy/lib/rubyboy.rb:43
2 heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:58
2 heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:53
2 heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:52
1 rubyboy/lib/rubyboy/ppu.rb:306
1 rubyboy/lib/rubyboy/ppu.rb:193
1 rubyboy/lib/rubyboy/ppu.rb:180
1 rubyboy/lib/rubyboy/ppu.rb:179
1 rubyboy/lib/rubyboy/lcd.rb:37
1 rubyboy/lib/rubyboy/lcd.rb:36
1 rubyboy/lib/rubyboy/lcd.rb:27
1 rubyboy/lib/rubyboy/cpu.rb:92
1 rubyboy/lib/rubyboy/cpu.rb:894
1 rubyboy/lib/rubyboy/cpu.rb:76
1 rubyboy/lib/rubyboy/cpu.rb:748
1 rubyboy/lib/rubyboy/cpu.rb:69
1 rubyboy/lib/rubyboy/cpu.rb:275
1 rubyboy/lib/rubyboy/cpu.rb:254
1 rubyboy/lib/rubyboy/cpu.rb:219
1 rubyboy/lib/rubyboy/cpu.rb:1027
1 rubyboy/lib/rubyboy/bus.rb:87
1 rubyboy/lib/rubyboy.rb:81
1 rubyboy/lib/rubyboy.rb:80
1 rubyboy/lib/rubyboy.rb:76
1 rubyboy/lib/rubyboy.rb:45
1 rubyboy/lib/rubyboy.rb:44
1 rubyboy/lib/rubyboy.rb:39
retained objects by class
-----------------------------------
174 <callcache> (IMEMO)
75 <constcache> (IMEMO)
2 <ment> (IMEMO)
1 Thread::Mutex
Allocated String Report
-----------------------------------
Retained String Report
-----------------------------------
Before the CPU refactoring
rubyboy % stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 12679 (6.31% miss rate)
GC: 2873 (22.66%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
2092 (16.5%) 2092 (16.5%) Integer#<=>
1480 (11.7%) 1480 (11.7%) (sweeping)
2873 (22.7%) 1320 (10.4%) (garbage collection)
1180 (9.3%) 1180 (9.3%) Rubyboy::Ppu#to_signed_byte
1153 (9.1%) 1153 (9.1%) Rubyboy::SDL.RenderClear
749 (5.9%) 749 (5.9%) Rubyboy::Ppu#get_pixel
2181 (17.2%) 739 (5.8%) Rubyboy::Ppu#render_bg
1691 (13.3%) 608 (4.8%) Rubyboy::Ppu#render_window
868 (6.8%) 378 (3.0%) Rubyboy::Ppu#render_sprites
300 (2.4%) 300 (2.4%) Integer#>>
770 (6.1%) 292 (2.3%) Enumerable#each_slice
5044 (39.8%) 290 (2.3%) Rubyboy::Ppu#step
221 (1.7%) 221 (1.7%) Rubyboy::Registers#read8
220 (1.7%) 220 (1.7%) Rubyboy::SDL.UpdateTexture
3939 (31.1%) 189 (1.5%) Integer#times
184 (1.5%) 184 (1.5%) Rubyboy::Timer#step
388 (3.1%) 118 (0.9%) Rubyboy::Ppu#get_color
109 (0.9%) 109 (0.9%) Rubyboy::Registers#write8
105 (0.8%) 105 (0.8%) Array#size
664 (5.2%) 74 (0.6%) Rubyboy::Cartridge::Mbc1#read_byte
73 (0.6%) 73 (0.6%) (marking)
749 (5.9%) 68 (0.5%) Array#each
100 (0.8%) 68 (0.5%) Rubyboy::Cpu#flags
66 (0.5%) 66 (0.5%) Rubyboy::Cpu#increment_pc_by_byte
1206 (9.5%) 64 (0.5%) Rubyboy::Cpu#ld8
61 (0.5%) 61 (0.5%) Rubyboy::Registers#read16
9800 (77.3%) 57 (0.4%) Rubyboy::Console#bench
2137 (16.9%) 48 (0.4%) Range#===
1434 (11.3%) 38 (0.3%) Rubyboy::Lcd#draw
35 (0.3%) 35 (0.3%) Rubyboy::SDL.GetKeyboardState
It is fast right after startup
rubyboy % stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 8706 (8.09% miss rate)
GC: 890 (10.22%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
1135 (13.0%) 1135 (13.0%) Rubyboy::SDL.RenderClear
1034 (11.9%) 1034 (11.9%) Integer#<=>
2797 (32.1%) 1026 (11.8%) Rubyboy::Ppu#render_bg
939 (10.8%) 939 (10.8%) Rubyboy::Ppu#to_signed_byte
633 (7.3%) 633 (7.3%) Rubyboy::Ppu#get_pixel
455 (5.2%) 455 (5.2%) (sweeping)
1109 (12.7%) 454 (5.2%) Rubyboy::Ppu#render_sprites
890 (10.2%) 405 (4.7%) (garbage collection)
899 (10.3%) 387 (4.4%) Enumerable#each_slice
4575 (52.5%) 280 (3.2%) Rubyboy::Ppu#step
247 (2.8%) 247 (2.8%) Rubyboy::SDL.UpdateTexture
231 (2.7%) 231 (2.7%) Integer#>>
3311 (38.0%) 192 (2.2%) Integer#times
164 (1.9%) 164 (1.9%) Rubyboy::Timer#step
374 (4.3%) 139 (1.6%) Rubyboy::Ppu#render_window
116 (1.3%) 116 (1.3%) Rubyboy::Registers#read8
96 (1.1%) 96 (1.1%) Array#size
298 (3.4%) 87 (1.0%) Rubyboy::Ppu#get_color
455 (5.2%) 62 (0.7%) Rubyboy::Cartridge::Mbc1#read_byte
7807 (89.7%) 55 (0.6%) Rubyboy::Console#bench
974 (11.2%) 51 (0.6%) Array#each
48 (0.6%) 48 (0.6%) Rubyboy::Registers#write8
47 (0.5%) 47 (0.5%) Rubyboy::Registers#read16
52 (0.6%) 33 (0.4%) Rubyboy::Cpu#flags
30 (0.3%) 30 (0.3%) (marking)
1434 (16.5%) 30 (0.3%) Rubyboy::Lcd#draw
1036 (11.9%) 30 (0.3%) Range#===
25 (0.3%) 25 (0.3%) Rubyboy::Interrupt#interrupts
1547 (17.8%) 25 (0.3%) Rubyboy::Cpu#exec
21 (0.2%) 21 (0.2%) Rubyboy::SDL.GetKeyboardState
By replacing the hashes that every instruction in cpu took as arguments with symbols, GC dropped to 13.16%. Continued below.
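Why this helps: a hash literal passed as an argument creates a new Hash on every call, while symbols are not allocated per call. The sketch below counts allocations with GC.stat. The class and method names are hypothetical stand-ins for the instruction methods.

```ruby
# Hypothetical stand-in for an instruction that receives its operand.
class OperandSketch
  def load_with_hash(operand)
    operand[:value]
  end

  def load_with_symbols(_type, value)
    value
  end
end

# Number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

cpu = OperandSketch.new
n = 10_000
with_hash = allocations { n.times { cpu.load_with_hash({ type: :register, value: :a }) } }
with_symbols = allocations { n.times { cpu.load_with_symbols(:register, :a) } }

puts "hash operands:   #{with_hash} allocations"
puts "symbol operands: #{with_symbols} allocations"
```

The exact counts depend on the Ruby version, but the hash variant should allocate at least one object per call.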
Refactoring, round two
Goal
Run Pokémon Red at a stable 60 fps with sound
Current state (no sound)
rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 29.089206 sec
FPS: 51.56551883884352
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 11807 (5.57% miss rate)
GC: 1554 (13.16%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
2237 (18.9%) 2237 (18.9%) Integer#<=>
1133 (9.6%) 1133 (9.6%) Rubyboy::Ppu#to_signed_byte
1108 (9.4%) 1108 (9.4%) Rubyboy::SDL.RenderClear
1037 (8.8%) 1037 (8.8%) (sweeping)
2236 (18.9%) 804 (6.8%) Rubyboy::Ppu#render_bg
770 (6.5%) 770 (6.5%) Rubyboy::Ppu#get_pixel
1652 (14.0%) 585 (5.0%) Rubyboy::Ppu#render_window
1554 (13.2%) 488 (4.1%) (garbage collection)
386 (3.3%) 386 (3.3%) Integer#>>
888 (7.5%) 364 (3.1%) Rubyboy::Ppu#render_sprites
787 (6.7%) 315 (2.7%) Enumerable#each_slice
5070 (42.9%) 275 (2.3%) Rubyboy::Ppu#step
236 (2.0%) 236 (2.0%) Rubyboy::SDL.UpdateTexture
207 (1.8%) 207 (1.8%) Rubyboy::Timer#step
199 (1.7%) 199 (1.7%) Rubyboy::Registers#a=
1007 (8.5%) 189 (1.6%) Rubyboy::Cpu#get_value
3985 (33.8%) 189 (1.6%) Integer#times
130 (1.1%) 130 (1.1%) Rubyboy::Cpu#flags
116 (1.0%) 116 (1.0%) Array#size
380 (3.2%) 96 (0.8%) Rubyboy::Ppu#get_color
719 (6.1%) 90 (0.8%) Rubyboy::Cartridge::Mbc1#read_byte
2302 (19.5%) 71 (0.6%) Range#===
761 (6.4%) 67 (0.6%) Array#each
61 (0.5%) 61 (0.5%) Rubyboy::Registers#hl
10253 (86.8%) 48 (0.4%) Rubyboy::Console#bench
48 (0.4%) 48 (0.4%) Rubyboy::Registers#b=
46 (0.4%) 46 (0.4%) Rubyboy::Cpu#increment_pc_by_byte
1410 (11.9%) 45 (0.4%) Rubyboy::Lcd#draw
1171 (9.9%) 38 (0.3%) Rubyboy::Ppu#get_tile_index
36 (0.3%) 36 (0.3%) Rubyboy::Registers#f=
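CPU refactoring
What I did
- Used heap-profiler to find the memory-hungry spots and optimized them
- Stopped building a hash every time the flags are read
- Stopped using the send method for register reads and writes, and wrote each case out plainly, like
when :a then @registers.a = value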
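A sketch of the register change, comparing the dynamic and the explicit form. SketchRegisters and RegisterAccessSketch are hypothetical, and only three registers are modelled to keep it short.

```ruby
# Hypothetical register file with just three registers.
SketchRegisters = Struct.new(:a, :b, :c)

class RegisterAccessSketch
  attr_reader :registers

  def initialize
    @registers = SketchRegisters.new(0, 0, 0)
  end

  # Dynamic version: builds a method-name string on every call.
  def set_with_send(name, value)
    @registers.send("#{name}=", value)
  end

  # Explicit version: one branch per register, no string building.
  def set_with_case(name, value)
    case name
    when :a then @registers.a = value
    when :b then @registers.b = value
    when :c then @registers.c = value
    else raise ArgumentError, "unknown register: #{name}"
    end
  end
end
```

The explicit form is more verbose, but it avoids the per-call string allocation and the dynamic lookup.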
Result
rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 26.798767 sec
FPS: 55.97272441676141
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 10430 (5.57% miss rate)
GC: 283 (2.71%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
2275 (21.8%) 2275 (21.8%) Integer#<=>
1267 (12.1%) 1267 (12.1%) Rubyboy::SDL.RenderClear
1186 (11.4%) 1186 (11.4%) Rubyboy::Ppu#to_signed_byte
2366 (22.7%) 864 (8.3%) Rubyboy::Ppu#render_bg
784 (7.5%) 784 (7.5%) Rubyboy::Ppu#get_pixel
1773 (17.0%) 641 (6.1%) Rubyboy::Ppu#render_window
992 (9.5%) 415 (4.0%) Rubyboy::Ppu#render_sprites
334 (3.2%) 334 (3.2%) Integer#>>
852 (8.2%) 319 (3.1%) Enumerable#each_slice
5453 (52.3%) 311 (3.0%) Rubyboy::Ppu#step
4199 (40.3%) 213 (2.0%) Integer#times
188 (1.8%) 188 (1.8%) Rubyboy::Timer#step
187 (1.8%) 187 (1.8%) (sweeping)
142 (1.4%) 142 (1.4%) Rubyboy::SDL.UpdateTexture
129 (1.2%) 129 (1.2%) Array#size
426 (4.1%) 114 (1.1%) Rubyboy::Ppu#get_color
851 (8.2%) 109 (1.0%) Array#each
981 (9.4%) 105 (1.0%) Rubyboy::Cpu#get_value
283 (2.7%) 85 (0.8%) (garbage collection)
708 (6.8%) 75 (0.7%) Rubyboy::Cartridge::Mbc1#read_byte
67 (0.6%) 67 (0.6%) Rubyboy::Cpu#increment_pc_by_byte
10147 (97.3%) 66 (0.6%) Rubyboy::Console#bench
2327 (22.3%) 53 (0.5%) Range#===
43 (0.4%) 43 (0.4%) Rubyboy::Registers#hl
37 (0.4%) 37 (0.4%) Rubyboy::Interrupt#interrupts
34 (0.3%) 34 (0.3%) Rubyboy::Registers#a=
2958 (28.4%) 33 (0.3%) Rubyboy::Cpu#exec
1216 (11.7%) 30 (0.3%) Rubyboy::Ppu#get_tile_index
2030 (19.5%) 28 (0.3%) Rubyboy::Bus#read_byte
1455 (14.0%) 26 (0.2%) Rubyboy::Lcd#draw
FPS: 51.56551883884352 -> 55.97272441676141
GC: 13.16% -> 2.71%
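Reducing Integer#<=>
What I did
Numeric comparisons happen a lot in branches on addr like the one below.
→ Cache the mapping from addr to handler ahead of time so the right code runs quickly without any comparison (reference: https://www.slideshare.net/mametter/ruby-65182128#46)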
def read_byte(addr)
case addr
when 0x0000..0x7fff
@mbc.read_byte(addr)
when 0x8000..0x9fff
@ppu.read_byte(addr)
...
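One way to cache the mapping is a table with one entry per address, filled once at startup. This is a sketch of the idea, not the project's set_methods implementation. StubDevice and TableBusSketch are hypothetical, and unmapped addresses simply return 0xff here.

```ruby
# Stand-in device so the sketch runs on its own.
class StubDevice
  def initialize(value)
    @value = value
  end

  def read_byte(_addr)
    @value
  end
end

class TableBusSketch
  def initialize(mbc, ppu)
    mbc_reader = mbc.method(:read_byte)
    ppu_reader = ppu.method(:read_byte)
    unmapped = ->(_addr) { 0xff }

    # The range comparisons run only here, once per address.
    @readers = Array.new(0x10000) do |addr|
      case addr
      when 0x0000..0x7fff then mbc_reader
      when 0x8000..0x9fff then ppu_reader
      else unmapped
      end
    end
  end

  # Hot path: a single array lookup instead of a chain of comparisons.
  def read_byte(addr)
    @readers[addr].call(addr)
  end
end
```

The table costs 65536 slots of memory, in exchange for removing the comparisons from every memory read.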
Result
rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 21.75409 sec
FPS: 68.95255099156066
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
Mode: cpu(1000)
Samples: 9505 (6.87% miss rate)
GC: 325 (3.42%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
1238 (13.0%) 1238 (13.0%) Rubyboy::Ppu#to_signed_byte
1208 (12.7%) 1208 (12.7%) Rubyboy::SDL.RenderClear
2558 (26.9%) 907 (9.5%) Rubyboy::Ppu#render_bg
865 (9.1%) 865 (9.1%) Rubyboy::Ppu#get_pixel
849 (8.9%) 849 (8.9%) Rubyboy::Cartridge::Mbc1#set_methods
1803 (19.0%) 663 (7.0%) Rubyboy::Ppu#render_window
1053 (11.1%) 460 (4.8%) Rubyboy::Ppu#render_sprites
5782 (60.8%) 346 (3.6%) Rubyboy::Ppu#step
906 (9.5%) 343 (3.6%) Enumerable#each_slice
313 (3.3%) 313 (3.3%) Integer#>>
4412 (46.4%) 245 (2.6%) Integer#times
237 (2.5%) 237 (2.5%) (sweeping)
197 (2.1%) 197 (2.1%) Rubyboy::Timer#step
193 (2.0%) 193 (2.0%) Rubyboy::SDL.UpdateTexture
1141 (12.0%) 162 (1.7%) Rubyboy::Bus#set_methods
433 (4.6%) 134 (1.4%) Rubyboy::Ppu#get_color
114 (1.2%) 114 (1.2%) Array#size
478 (5.0%) 109 (1.1%) Rubyboy::Cpu#get_value
918 (9.7%) 99 (1.0%) Array#each
75 (0.8%) 75 (0.8%) Rubyboy::Cpu#increment_pc_by_byte
9180 (96.6%) 68 (0.7%) Rubyboy::Console#bench
325 (3.4%) 65 (0.7%) (garbage collection)
49 (0.5%) 49 (0.5%) Integer#<=>
45 (0.5%) 45 (0.5%) Rubyboy::Interrupt#interrupts
36 (0.4%) 36 (0.4%) Rubyboy::Registers#hl
36 (0.4%) 36 (0.4%) Rubyboy::Registers#a=
1651 (17.4%) 35 (0.4%) Rubyboy::Cpu#exec
1042 (11.0%) 30 (0.3%) Rubyboy::Bus#read_byte
1267 (13.3%) 29 (0.3%) Rubyboy::Ppu#get_tile_index
1448 (15.2%) 26 (0.3%) Rubyboy::Lcd#draw
FPS: 55.97272441676141 -> 68.95255099156066
Integer#<=>: 21.8% -> 0.5%
rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 27.340768 sec
FPS: 54.86312601021302
With sound it is around 54 fps. I want to get this to 60 fps.
Refactor the PPU and APU as well