Building a Game Boy emulator in Ruby

sacckey

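TODO

  • Do something about & 0xff being hard-coded in many places
  • Implement the CPU instructions in a separate class
  • Make a logo
  • Error handling
  • Switch to puts
  • Add run commands
    • Split the files per run command
    • start, bench, stackprof
  • APU constants
    • Calculate the step count by referring to the audio environment value (48000)
    • Pull the 512 from audio
  • Move joypad input handling into a separate class
  • Enable YJIT at runtime
  • Add screenshots to the README
  • Review the PPU implementation
    • Look at the manual and other implementations
    • The Puyo Puyo opening looks wrong
  • Write tests
  • Take screenshots without the window shadow
  • Display the title and fps
  • Add serial communication
  • Try other tests as well
    • Use PyBoy as a reference
  • Create test files and run them in CI
  • Fix rendering bugs
  • Add more MBC types
  • Game Boy Color support
  • Wasm support
    • Refactor apu, ppu, joypad
  • Tidy up the benchmarking setup
    • Make it possible to turn rendering and audio on or off
    • Add it to the README as well
  • Make it usable as a Ruby benchmark program
  • Remove unnecessary requires
  • Speed up rendering
    • Skip unnecessary drawing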

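CPU implementation

The first goal is to get Hello World running.
https://github.com/dusterherz/gb-hello-world

It looks like this needs the bg rendering and the CPU instructions to be implemented.

By the time execution reaches 0x0100 the registers already hold initial values; apparently that is simply how it works.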
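For reference, on the original Game Boy (DMG) the boot ROM runs before the cartridge code and leaves the registers in a known state, which is why they are already populated at 0x0100. An emulator that skips the boot ROM can simply start from those values. The sketch below is only an illustration: the BootState module and initial_registers method are hypothetical names, and the values are the commonly documented DMG post-boot values (the F value assumes a non-zero header checksum).

```ruby
# Hypothetical helper: register values commonly documented for a DMG
# right after the boot ROM hands control to the cartridge at 0x0100.
module BootState
  DMG = {
    a: 0x01, f: 0xb0,
    b: 0x00, c: 0x13,
    d: 0x00, e: 0xd8,
    h: 0x01, l: 0x4d,
    sp: 0xfffe, pc: 0x0100
  }.freeze

  # Returns a fresh, mutable copy so each emulator instance owns its state.
  def self.initial_registers
    DMG.dup
  end
end

regs = BootState.initial_registers
printf("AF=%02x%02x PC=%04x\n", regs[:a], regs[:f], regs[:pc])
```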

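Once I had typed the opcodes up to 0xf3, both the comment and the implementation started being suggested automatically.

The minimum processing needed for Hello World is implemented, so I am moving on to the PPU.

Hello World runs, so next I will implement all of the CPU instructions.

Open questions

  • The description of the half-carry flag for the DEC instruction says "H - Set if no borrow from bit 4.", but shouldn't it be set when there is a borrow?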
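On this CPU the H flag after DEC is set when the subtraction borrows from bit 4, that is, when the low nibble of the operand was 0 before the decrement, so the quoted wording is best read as "set if borrow". A minimal sketch of that rule; dec8 and the flag keys are hypothetical names, not the project's API.

```ruby
# Hypothetical sketch of an 8-bit DEC with flag computation.
# H is set when the low nibble borrows from bit 4, i.e. the low nibble
# of the original value was 0. DEC leaves the C flag untouched.
def dec8(value)
  result = (value - 1) & 0xff
  flags = {
    z: result.zero?,          # result is zero
    n: true,                  # DEC counts as a subtraction
    h: (value & 0x0f).zero?   # borrow from bit 4
  }
  [result, flags]
end

p dec8(0x10) # low nibble is 0, so a borrow happens and H is set
p dec8(0x11) # no borrow, H is clear
```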

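Notes on why each test was failing (the points where I got stuck).

  • 1
    • 5: When setting f in pop af, I was not clearing the lower 4 bits to 0000
  • 3
    • 1: How the c flag is calculated for 0xe8 and 0xf8
      • It passed with cflag = (@sp & 0xff) + (byte & 0xff) > 0xff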
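The two fixes above can be sketched as follows. The lower four bits of F always read as 0 on hardware, and for 0xe8 (ADD SP, e8) and 0xf8 (LD HL, SP+e8) both H and C are derived from the low byte of SP and the raw unsigned operand byte, while the result itself uses the byte as a signed offset. The method names are hypothetical; the C flag expression is the one from the note.

```ruby
# pop af: the low nibble of F is always 0 on hardware.
def pop_af_value(word)
  a = (word >> 8) & 0xff
  f = word & 0xf0 # drop the lower 4 bits
  [a, f]
end

# 0xe8 / 0xf8: flags come from the low byte of SP plus the raw byte,
# while the result treats the byte as a signed 8-bit offset.
def add_sp_e8(sp, byte)
  offset = byte >= 0x80 ? byte - 0x100 : byte
  hflag = (sp & 0x0f) + (byte & 0x0f) > 0x0f
  cflag = (sp & 0xff) + (byte & 0xff) > 0xff
  [(sp + offset) & 0xffff, hflag, cflag]
end

p pop_af_value(0x12ff)
p add_sp_e8(0xfff8, 0x08)
```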

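The tests pass!
After I changed the rendering method to improve speed, the picture became blurry.

Notes on the halt instruction

The halted state is cancelled when an interrupt occurs, even if ime is false.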
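Put differently, leaving HALT depends only on a pending interrupt (a bit set in both IE and IF), whereas actually dispatching to the handler additionally requires IME. A sketch of that distinction, using a hypothetical HaltState class with plain integers standing in for the IE and IF registers:

```ruby
# Hypothetical sketch: HALT wake-up is independent of IME.
class HaltState
  attr_reader :halted
  attr_writer :ime

  def initialize
    @halted = false
    @ime = false
  end

  def halt
    @halted = true
  end

  # ie / if_ are the raw IE (0xffff) and IF (0xff0f) register values.
  # Returns true when the CPU should service the interrupt.
  def step(ie, if_)
    pending = (ie & if_ & 0x1f) != 0
    @halted = false if pending # wake up even when IME is false
    pending && @ime            # dispatch only when IME is true
  end
end

cpu = HaltState.new
cpu.halt
cpu.step(0x01, 0x00) # nothing pending: stays halted
cpu.step(0x01, 0x01) # VBlank pending: wakes up, but no dispatch since IME is false
puts cpu.halted
```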

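The PPU tests pass too.
I switched the rendering to use raylib.

With the bgb test ROM I was able to confirm key input as well, but sprites are not being drawn in front of everything else.

Points I did not understand

How screen rendering works

  • When the rendering is performed
    • → In an emulator, count the cycles consumed each time a CPU instruction is executed, and update the screen once enough cycles have accumulated to reach VBlank
  • When data is written to VRAM
    • → The program loops, watching the ly register, until VBlank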
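The emulator-side timing described above can be sketched like this. One frame is 154 scanlines of 456 cycles each (70224 cycles), and VBlank begins when LY reaches 144. FakeCpu, TimingPpu and the fixed 4-cycle instruction cost are placeholders for illustration, not the real implementation.

```ruby
CYCLES_PER_LINE = 456
VBLANK_START_LINE = 144
LINES_PER_FRAME = 154

# Placeholder CPU: every instruction costs 4 cycles.
class FakeCpu
  def exec
    4
  end
end

# Minimal PPU timing: advances LY from cycle counts and returns true
# exactly once per frame, when LY enters VBlank (line 144).
class TimingPpu
  attr_reader :ly

  def initialize
    @cycles = 0
    @ly = 0
  end

  def step(cycles)
    @cycles += cycles
    return false if @cycles < CYCLES_PER_LINE

    @cycles -= CYCLES_PER_LINE
    @ly = (@ly + 1) % LINES_PER_FRAME
    @ly == VBLANK_START_LINE
  end
end

# Runs instructions until the PPU signals VBlank; returns cycles consumed.
def run_until_vblank(cpu, ppu)
  total = 0
  loop do
    cycles = cpu.exec
    total += cycles
    return total if ppu.step(cycles)
  end
end

cpu = FakeCpu.new
ppu = TimingPpu.new
first = run_until_vblank(cpu, ppu)  # from LY 0 to the first VBlank
second = run_until_vblank(cpu, ppu) # one full frame later
puts first, second
```

The point where run_until_vblank returns is where a front end would present the finished frame buffer.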

Measurement method

Run the first 1500 frames of tobu.gb headless and measure the elapsed time three times.

v1.0.0

yjit: false
1: 36.740829 sec
2: 36.468515 sec
3: 36.177083 sec
FPS: 41.1385591742566

yjit: true
1: 32.305559 sec
2: 32.094778 sec
3: 31.889601 sec
FPS: 46.73385499531633

Rendering also takes a significant amount of time and will need optimizing too,
but for now the goal is to reach 60 fps or more without rendering.

stackprof, first run
→ render_sprites is the bottleneck

==================================
  Mode: cpu(1000)
  Samples: 9081 (1.08% miss rate)
  GC: 4 (0.04%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      3727  (41.0%)        1920  (21.1%)     Rubyboy::Ppu#render_sprites
      1800  (19.8%)        1800  (19.8%)     Rubyboy::Operand#initialize
      1448  (15.9%)        1448  (15.9%)     Integer#zero?
      3346  (36.8%)        1296  (14.3%)     Enumerable#each_slice
       919  (10.1%)         919  (10.1%)     Integer#<<
       424   (4.7%)         424   (4.7%)     Integer#<=>
      3552  (39.1%)         294   (3.2%)     Array#each
       162   (1.8%)         159   (1.8%)     Rubyboy::Cpu#flags
       147   (1.6%)         147   (1.6%)     Array#size
       104   (1.1%)         104   (1.1%)     Integer#>>
      2220  (24.4%)          71   (0.8%)     Rubyboy::Ppu#render_bg
      6259  (68.9%)          58   (0.6%)     Rubyboy::Ppu#step
       149   (1.6%)          55   (0.6%)     Rubyboy::Ppu#get_color
        44   (0.5%)          44   (0.5%)     Rubyboy::Ppu#to_signed_byte
       177   (1.9%)          38   (0.4%)     Rubyboy::Timer#step
        34   (0.4%)          34   (0.4%)     Integer#-@
       146   (1.6%)          29   (0.3%)     Rubyboy::Cartridge::Mbc1#read_byte
        29   (0.3%)          29   (0.3%)     Rubyboy::Registers#write8
       915  (10.1%)          24   (0.3%)     Rubyboy::Ppu#get_pixel
        17   (0.2%)          17   (0.2%)     Rubyboy::Registers#read8
        14   (0.2%)          14   (0.2%)     Rubyboy::Cpu#increment_pc_by_byte
      9054  (99.7%)          14   (0.2%)     Rubyboy::Console#bench
       434   (4.8%)          11   (0.1%)     Range#===
       398   (4.4%)          10   (0.1%)     Rubyboy::Bus#read_byte
         9   (0.1%)           9   (0.1%)     Rubyboy::Ppu#handle_ly_eq_lyc
       154   (1.7%)           8   (0.1%)     Rubyboy::Ppu#render_window
      1244  (13.7%)           7   (0.1%)     Rubyboy::Ppu#get_tile_index
      2597  (28.6%)           6   (0.1%)     Rubyboy::Cpu#exec
       110   (1.2%)           5   (0.1%)     Rubyboy::Bus#write_byte
         9   (0.1%)           5   (0.1%)     Rubyboy::Ppu#write_byte

render_sprites

Rubyboy::Ppu#render_sprites (/Users/yamasaki/dev/gb-emulator/rubyboy/rubyboy/lib/rubyboy/ppu.rb:220)
  samples:  1920 self (21.1%)  /   3727 total (41.0%)
  callers:
    3727  (  100.0%)  Rubyboy::Ppu#step
    1902  (   51.0%)  Enumerable#each_slice
      46  (    1.2%)  Enumerator#with_index
      35  (    0.9%)  Array#each
      29  (    0.8%)  Integer#times
  callees (1807 total):
    3307  (  183.0%)  Enumerator#each
     339  (   18.8%)  Enumerator#with_index
      39  (    2.2%)  Enumerable#each_slice
      36  (    2.0%)  Array#each
      34  (    1.9%)  Integer#-@
      29  (    1.6%)  Integer#times
      20  (    1.1%)  Rubyboy::Ppu#get_pixel
       9  (    0.5%)  Integer#zero?
       5  (    0.3%)  Rubyboy::Ppu#get_color
       1  (    0.1%)  Enumerable#sort_by
  code:
                                  |   220  |     def render_sprites
    3    (0.0%)                   |   221  |       return if @lcdc[LCDC[:sprite_enable]].zero?
                                  |   222  | 
    2    (0.0%)                   |   223  |       sprite_height = @lcdc[LCDC[:sprite_size]].zero? ? 8 : 16
                                  |   224  |       sprites = []
                                  |   225  |       cnt = 0
 3346   (36.8%)                   |   226  |       @oam.each_slice(4).each do |sprite_attr|
                                  |   227  |         sprite = {
                                  |   228  |           y: (sprite_attr[0] - 16) % 256,
                                  |   229  |           x: (sprite_attr[1] - 8) % 256,
                                  |   230  |           tile_index: sprite_attr[2],
                                  |   231  |           flags: sprite_attr[3]
                                  |   232  |         }
                                  |   233  |         next if sprite[:y] > @ly || sprite[:y] + sprite_height <= @ly
                                  |   234  | 
                                  |   235  |         sprites << sprite
                                  |   236  |         cnt += 1
   15    (0.2%) /    15   (0.2%)  |   237  |         break if cnt == 10
 1887   (20.8%) /  1887  (20.8%)  |   238  |       end
  386    (4.3%) /    12   (0.1%)  |   239  |       sprites = sprites.sort_by.with_index { |sprite, i| [-sprite[:x], -i] }
                                  |   240  | 
   36    (0.4%)                   |   241  |       sprites.each do |sprite|
                                  |   242  |         flags = sprite[:flags]
    4    (0.0%)                   |   243  |         pallet = flags[SPRITE_FLAGS[:dmg_palette]].zero? ? @obp0 : @obp1
                                  |   244  |         tile_index = sprite[:tile_index]
                                  |   245  |         tile_index &= 0xfe if sprite_height == 16
                                  |   246  |         y = (@ly - sprite[:y]) % 256
    2    (0.0%) /     2   (0.0%)  |   247  |         y = sprite_height - y - 1 if flags[SPRITE_FLAGS[:y_flip]] == 1
                                  |   248  |         tile_index = (tile_index + 1) % 256 if y >= 8
                                  |   249  |         y %= 8
                                  |   250  | 
   29    (0.3%)                   |   251  |         8.times do |x|
    2    (0.0%) /     2   (0.0%)  |   252  |           x_flipped = flags[SPRITE_FLAGS[:x_flip]] == 1 ? 7 - x : x
                                  |   253  | 
   20    (0.2%)                   |   254  |           pixel = get_pixel(tile_index, x_flipped, y)
                                  |   255  |           i = (sprite[:x] + x) % 256
                                  |   256  | 
                                  |   257  |           next if pixel.zero? || i >= LCD_WIDTH
    2    (0.0%) /     2   (0.0%)  |   258  |           next if flags[SPRITE_FLAGS[:priority]] == 1 && @bg_pixels[i] != 0
                                  |   259  | 
    5    (0.1%)                   |   260  |           @buffer[@ly * LCD_WIDTH + i] = get_color(pallet, pixel)
                                  |   261  |         end

Changed the code so that a sprite is not built every time

FPS: 46.73385499531633 → 49.2233733053377
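
One way to avoid building a sprite hash per OAM entry is to walk the OAM array by index and keep only the starting offsets of the sprites that cover the current line. This is a sketch of the idea and not the actual change that was made; visible_sprite_offsets and its parameters are hypothetical, and oam is assumed to be a flat 160-byte array (40 entries of y, x, tile index, flags).

```ruby
# Returns the OAM offsets (at most 10) of sprites that cover scanline ly.
# No per-sprite Hash or 4-element slice is allocated while scanning.
def visible_sprite_offsets(oam, ly, sprite_height)
  offsets = []
  i = 0
  while i < oam.size
    y = (oam[i] - 16) & 0xff
    if y <= ly && ly < y + sprite_height
      offsets << i
      break if offsets.size == 10
    end
    i += 4
  end
  offsets
end

oam = Array.new(160, 0)
oam[0] = 16 + 5   # sprite 0: top edge at line 5
oam[4] = 16 + 40  # sprite 1: top edge at line 40
p visible_sprite_offsets(oam, 7, 8)
```

The render loop can then read oam[offset + 1], oam[offset + 2] and oam[offset + 3] directly for x, tile index and flags.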

Use a hash instead of the Operand class

FPS: 49.36641822413328 → 50.94130878614299

stackprof, second run

==================================
  Mode: cpu(1000)
  Samples: 5666 (1.73% miss rate)
  GC: 7 (0.12%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1334  (23.5%)        1334  (23.5%)     Rubyboy::Ppu#to_signed_byte
      1260  (22.2%)        1260  (22.2%)     Integer#<<
       662  (11.7%)         662  (11.7%)     Integer#<=>
       913  (16.1%)         471   (8.3%)     Array#each
       403   (7.1%)         403   (7.1%)     Rubyboy::Registers#read8
       410   (7.2%)         214   (3.8%)     Enumerable#each_slice
      3739  (66.0%)         190   (3.4%)     Rubyboy::Ppu#step
       180   (3.2%)         180   (3.2%)     Rubyboy::Timer#step
       961  (17.0%)         165   (2.9%)     Rubyboy::Ppu#render_sprites
       195   (3.4%)         163   (2.9%)     Rubyboy::Cpu#flags
        98   (1.7%)          98   (1.7%)     Integer#>>
      2415  (42.6%)          87   (1.5%)     Rubyboy::Ppu#render_bg
        57   (1.0%)          57   (1.0%)     Array#size
       138   (2.4%)          49   (0.9%)     Rubyboy::Ppu#get_color
       152   (2.7%)          32   (0.6%)     Rubyboy::Cartridge::Mbc1#read_byte
        31   (0.5%)          31   (0.5%)     Rubyboy::Registers#write8
        29   (0.5%)          29   (0.5%)     Integer#-@
      1020  (18.0%)          27   (0.5%)     Rubyboy::Ppu#get_pixel
        19   (0.3%)          19   (0.3%)     Rubyboy::Interrupt#interrupts
       869  (15.3%)          17   (0.3%)     Rubyboy::Cpu#get_value
      1349  (23.8%)          15   (0.3%)     Rubyboy::Ppu#get_tile_index
      5634  (99.4%)          13   (0.2%)     Rubyboy::Console#bench
       643  (11.3%)          11   (0.2%)     Rubyboy::Bus#read_byte
       569  (10.0%)          10   (0.2%)     Rubyboy::Cpu#ld8
         9   (0.2%)           9   (0.2%)     Rubyboy::Cpu#increment_pc_by_byte
         8   (0.1%)           8   (0.1%)     Rubyboy::Ppu#handle_ly_eq_lyc
       100   (1.8%)           7   (0.1%)     Rubyboy::Bus#write_byte
       666  (11.8%)           6   (0.1%)     Range#===
       164   (2.9%)           5   (0.1%)     Rubyboy::Ppu#render_window
         5   (0.1%)           5   (0.1%)     FFI::FunctionType#initialize

PPU refactoring

Initialize tile_map_addr outside the loop

FPS: 50.94130878614299 → 56.6580741129914

Precompute outside the loop

FPS: 56.6580741129914 → 60.44140113483162
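
Both changes move work that does not vary per pixel out of the inner loop. A sketch of the pattern, assuming the background tile map base is selected by LCDC bit 3 (0x9800 or 0x9c00) and one scanline is 160 pixels wide; bg_tile_map_addresses is a hypothetical helper, not the project's method.

```ruby
LCD_WIDTH = 160

# Returns the tile map address read for each pixel of one scanline.
# Values that are constant for the whole line (map base, tile row)
# are computed once, before the per-pixel loop.
def bg_tile_map_addresses(lcdc, scx, scy, ly)
  tile_map_addr = lcdc[3].zero? ? 0x9800 : 0x9c00 # hoisted out of the loop
  row_offset = (((ly + scy) & 0xff) >> 3) * 32    # hoisted out of the loop

  Array.new(LCD_WIDTH) do |x|
    col = ((x + scx) & 0xff) >> 3
    tile_map_addr + row_offset + col
  end
end

addrs = bg_tile_map_addresses(0x91, 0, 0, 0)
puts format('%04x %04x', addrs.first, addrs.last)
```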

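TODO: stop using constants

It is faster, but it hurts readability, so I would rather not do it.

The Pokémon Red start screen is slow, so I will refactor.
It is especially slow when sound is enabled.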

rubyboy % stackprof stackprof-cpu-myapp.dump          
==================================
  Mode: cpu(1000)
  Samples: 16405 (4.57% miss rate)
  GC: 5593 (34.09%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      3688  (22.5%)        3688  (22.5%)     (sweeping)
      2332  (14.2%)        2109  (12.9%)     Enumerable#flat_map
      2050  (12.5%)        2050  (12.5%)     Integer#<=>
      5593  (34.1%)        1679  (10.2%)     (garbage collection)
      1038   (6.3%)        1038   (6.3%)     Rubyboy::Ppu#to_signed_byte
      1004   (6.1%)        1004   (6.1%)     Rubyboy::SDL.RenderClear
       646   (3.9%)         646   (3.9%)     Rubyboy::Ppu#get_pixel
       437   (2.7%)         437   (2.7%)     Integer#>>
       701   (4.3%)         332   (2.0%)     Rubyboy::Ppu#render_sprites
      1354   (8.3%)         278   (1.7%)     Rubyboy::Lcd#draw
      3825  (23.3%)         257   (1.6%)     Rubyboy::Ppu#step
      1627   (9.9%)         255   (1.6%)     Rubyboy::Ppu#render_bg
       633   (3.9%)         247   (1.5%)     Enumerable#each_slice
       230   (1.4%)         230   (1.4%)     Rubyboy::Registers#read8
       226   (1.4%)         226   (1.4%)     (marking)
      2332  (14.2%)         223   (1.4%)     Rubyboy::Console#buffer_to_pixel_data
      2933  (17.9%)         194   (1.2%)     Integer#times
      1228   (7.5%)         185   (1.1%)     Rubyboy::Ppu#render_window
       178   (1.1%)         178   (1.1%)     Rubyboy::Timer#step
       524   (3.2%)         110   (0.7%)     Rubyboy::Ppu#get_color
        95   (0.6%)          95   (0.6%)     Rubyboy::Registers#write8
       116   (0.7%)          81   (0.5%)     Rubyboy::Cpu#flags
        80   (0.5%)          80   (0.5%)     Rubyboy::Registers#read16
       662   (4.0%)          80   (0.5%)     Rubyboy::Cartridge::Mbc1#read_byte
      1203   (7.3%)          69   (0.4%)     Rubyboy::Cpu#ld8
        62   (0.4%)          62   (0.4%)     Array#size
        57   (0.3%)          57   (0.3%)     Rubyboy::Cpu#increment_pc_by_byte
        56   (0.3%)          56   (0.3%)     Rubyboy::SDL.UpdateTexture
        44   (0.3%)          44   (0.3%)     Rubyboy::Interrupt#interrupts
      2090  (12.7%)          44   (0.3%)     Range#===

Instead of tripling the screen array with flat_map at the end, the three r, g, b values are now produced as each pixel is written.
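
The change can be sketched as follows: rather than keeping one value per pixel and expanding it with flat_map once per frame (which allocates a small array per pixel plus a new frame-sized array), the three bytes are written straight into a preallocated buffer. The names and the assumption of one greyscale value per pixel are illustrative only.

```ruby
LCD_WIDTH = 160
LCD_HEIGHT = 144

# Before: one shade per pixel, tripled at the end of every frame.
def expand_with_flat_map(shades)
  shades.flat_map { |shade| [shade, shade, shade] }
end

# After: the buffer already has 3 slots per pixel and is reused.
class RgbBuffer
  attr_reader :data

  def initialize
    @data = Array.new(LCD_WIDTH * LCD_HEIGHT * 3, 0xff)
  end

  def set_pixel(x, y, shade)
    base = (y * LCD_WIDTH + x) * 3
    @data[base] = shade
    @data[base + 1] = shade
    @data[base + 2] = shade
  end
end

buffer = RgbBuffer.new
buffer.set_pixel(1, 0, 0x55)
p buffer.data[0, 6]
```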

rubyboy % stackprof stackprof-cpu-myapp.dump          
==================================
  Mode: cpu(1000)
  Samples: 11758 (6.52% miss rate)
  GC: 3103 (26.39%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1857  (15.8%)        1857  (15.8%)     Integer#<=>
      1542  (13.1%)        1542  (13.1%)     (sweeping)
      3103  (26.4%)        1459  (12.4%)     (garbage collection)
      1087   (9.2%)        1087   (9.2%)     Rubyboy::SDL.RenderClear
       950   (8.1%)         950   (8.1%)     Rubyboy::Ppu#to_signed_byte
      1797  (15.3%)         646   (5.5%)     Rubyboy::Ppu#render_bg
       606   (5.2%)         606   (5.2%)     Rubyboy::Ppu#get_pixel
      1381  (11.7%)         467   (4.0%)     Rubyboy::Ppu#render_window
       712   (6.1%)         287   (2.4%)     Rubyboy::Ppu#render_sprites
       281   (2.4%)         281   (2.4%)     Rubyboy::SDL.UpdateTexture
       618   (5.3%)         261   (2.2%)     Enumerable#each_slice
      4152  (35.3%)         249   (2.1%)     Rubyboy::Ppu#step
       246   (2.1%)         246   (2.1%)     Integer#>>
       192   (1.6%)         192   (1.6%)     Rubyboy::Registers#read8
      3252  (27.7%)         178   (1.5%)     Integer#times
       162   (1.4%)         162   (1.4%)     Rubyboy::Timer#step
       102   (0.9%)         102   (0.9%)     (marking)
       321   (2.7%)         101   (0.9%)     Rubyboy::Ppu#get_color
       600   (5.1%)         100   (0.9%)     Rubyboy::Cartridge::Mbc1#read_byte
        88   (0.7%)          88   (0.7%)     Rubyboy::Registers#write8
        80   (0.7%)          80   (0.7%)     Rubyboy::Registers#read16
        78   (0.7%)          78   (0.7%)     Array#size
      1108   (9.4%)          73   (0.6%)     Rubyboy::Cpu#ld8
       104   (0.9%)          69   (0.6%)     Rubyboy::Cpu#flags
        64   (0.5%)          64   (0.5%)     Rubyboy::SDL.GetKeyboardState
       620   (5.3%)          56   (0.5%)     Array#each
      1904  (16.2%)          52   (0.4%)     Range#===
      1441  (12.3%)          52   (0.4%)     Rubyboy::Lcd#draw
        41   (0.3%)          41   (0.3%)     Rubyboy::Cpu#increment_pc_by_byte
      8648  (73.5%)          39   (0.3%)     Rubyboy::Console#bench

https://github.com/Shopify/heap-profiler

heap-profiler shows that the Hashes in cpu.rb use a large amount of memory, so I will fix that.

rubyboy % heap-profiler tmp/report                                           
Total allocated: 563.01 MB (4198804 objects)
Total retained: 10.13 kB (252 objects)

allocated memory by gem
-----------------------------------
 563.01 MB  rubyboy/lib
  320.00 B  heap-profiler-0.7.0

allocated memory by file
-----------------------------------
 454.17 MB  rubyboy/lib/rubyboy/cpu.rb
  93.18 MB  rubyboy/lib/rubyboy/ppu.rb
  10.06 MB  rubyboy/lib/rubyboy/apu.rb
   4.35 MB  rubyboy/lib/rubyboy/audio.rb
   1.25 MB  rubyboy/lib/rubyboy.rb
  720.00 B  rubyboy/lib/rubyboy/lcd.rb
  416.00 B  rubyboy/lib/rubyboy/apu_channels/channel2.rb
  416.00 B  rubyboy/lib/rubyboy/apu_channels/channel1.rb
  320.00 B  rubyboy/lib/rubyboy/bus.rb
  320.00 B  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
  296.00 B  rubyboy/lib/rubyboy/interrupt.rb
  120.00 B  rubyboy/lib/rubyboy/apu_channels/channel4.rb
   80.00 B  rubyboy/lib/rubyboy/registers.rb
   40.00 B  rubyboy/lib/rubyboy/cartridge/mbc1.rb
   40.00 B  rubyboy/lib/rubyboy/apu_channels/channel3.rb

allocated memory by location
-----------------------------------
  77.83 MB  rubyboy/lib/rubyboy/cpu.rb:600
  65.28 MB  rubyboy/lib/rubyboy/ppu.rb:248
  38.96 MB  rubyboy/lib/rubyboy/cpu.rb:283
  35.15 MB  rubyboy/lib/rubyboy/cpu.rb:174
  19.13 MB  rubyboy/lib/rubyboy/cpu.rb:85
  18.67 MB  rubyboy/lib/rubyboy/cpu.rb:272
  18.47 MB  rubyboy/lib/rubyboy/cpu.rb:87
  16.77 MB  rubyboy/lib/rubyboy/cpu.rb:234
  15.01 MB  rubyboy/lib/rubyboy/cpu.rb:292
  14.61 MB  rubyboy/lib/rubyboy/ppu.rb:239
  13.74 MB  rubyboy/lib/rubyboy/cpu.rb:231
  13.67 MB  rubyboy/lib/rubyboy/cpu.rb:79
  11.77 MB  rubyboy/lib/rubyboy/cpu.rb:172
  11.38 MB  rubyboy/lib/rubyboy/cpu.rb:166
  11.35 MB  rubyboy/lib/rubyboy/cpu.rb:167
  10.51 MB  rubyboy/lib/rubyboy/cpu.rb:96
   9.17 MB  rubyboy/lib/rubyboy/cpu.rb:114
   8.86 MB  rubyboy/lib/rubyboy/cpu.rb:86
   8.55 MB  rubyboy/lib/rubyboy/cpu.rb:219
   8.08 MB  rubyboy/lib/rubyboy/ppu.rb:244
   7.46 MB  rubyboy/lib/rubyboy/cpu.rb:280
   6.67 MB  rubyboy/lib/rubyboy/cpu.rb:227
   6.22 MB  rubyboy/lib/rubyboy/cpu.rb:173
   6.18 MB  rubyboy/lib/rubyboy/cpu.rb:63
   5.74 MB  rubyboy/lib/rubyboy/cpu.rb:256
   5.55 MB  rubyboy/lib/rubyboy/cpu.rb:113
   5.51 MB  rubyboy/lib/rubyboy/cpu.rb:178
   5.32 MB  rubyboy/lib/rubyboy/cpu.rb:61
   5.31 MB  rubyboy/lib/rubyboy/cpu.rb:123
   5.21 MB  rubyboy/lib/rubyboy/ppu.rb:236
   5.15 MB  rubyboy/lib/rubyboy/cpu.rb:294
   4.76 MB  rubyboy/lib/rubyboy/cpu.rb:229
   4.35 MB  rubyboy/lib/rubyboy/audio.rb:31
   3.60 MB  rubyboy/lib/rubyboy/cpu.rb:58
   3.34 MB  rubyboy/lib/rubyboy/cpu.rb:70
   3.30 MB  rubyboy/lib/rubyboy/cpu.rb:94
   2.95 MB  rubyboy/lib/rubyboy/cpu.rb:163
   2.87 MB  rubyboy/lib/rubyboy/cpu.rb:106
   2.64 MB  rubyboy/lib/rubyboy/cpu.rb:147
   2.12 MB  rubyboy/lib/rubyboy/cpu.rb:228
   2.12 MB  rubyboy/lib/rubyboy/cpu.rb:139
   2.01 MB  rubyboy/lib/rubyboy/cpu.rb:276
   1.90 MB  rubyboy/lib/rubyboy/cpu.rb:71
   1.87 MB  rubyboy/lib/rubyboy/apu.rb:51
   1.87 MB  rubyboy/lib/rubyboy/apu.rb:58
   1.74 MB  rubyboy/lib/rubyboy/cpu.rb:66
   1.55 MB  rubyboy/lib/rubyboy/cpu.rb:171
   1.44 MB  rubyboy/lib/rubyboy/apu.rb:53
   1.44 MB  rubyboy/lib/rubyboy/apu.rb:60
   1.39 MB  rubyboy/lib/rubyboy/apu.rb:52

allocated memory by class
-----------------------------------
 462.20 MB  Hash
  49.79 MB  Array
  14.61 MB  Enumerator
  10.96 MB  <memo> (IMEMO)
  10.96 MB  <ifunc> (IMEMO)
  10.06 MB  Float
   4.35 MB  FFI::MemoryPointer
  55.88 kB  FFI::Pointer
  25.68 kB  <throw_data> (IMEMO)
   6.92 kB  <callcache> (IMEMO)
   2.96 kB  <constcache> (IMEMO)
   96.00 B  <ment> (IMEMO)

allocated objects by gem
-----------------------------------
   4198796  rubyboy/lib
         8  heap-profiler-0.7.0

allocated objects by file
-----------------------------------
   2839605  rubyboy/lib/rubyboy/cpu.rb
   1105342  rubyboy/lib/rubyboy/ppu.rb
    251462  rubyboy/lib/rubyboy/apu.rb
      1294  rubyboy/lib/rubyboy.rb
      1048  rubyboy/lib/rubyboy/audio.rb
        18  rubyboy/lib/rubyboy/lcd.rb
         8  rubyboy/lib/rubyboy/bus.rb
         8  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
         5  rubyboy/lib/rubyboy/apu_channels/channel2.rb
         5  rubyboy/lib/rubyboy/apu_channels/channel1.rb
         3  rubyboy/lib/rubyboy/apu_channels/channel4.rb
         2  rubyboy/lib/rubyboy/registers.rb
         2  rubyboy/lib/rubyboy/interrupt.rb
         1  rubyboy/lib/rubyboy/cartridge/mbc1.rb
         1  rubyboy/lib/rubyboy/apu_channels/channel3.rb

allocated objects by location
-----------------------------------
    689584  rubyboy/lib/rubyboy/ppu.rb:248
    486434  rubyboy/lib/rubyboy/cpu.rb:600
    273889  rubyboy/lib/rubyboy/ppu.rb:239
    243478  rubyboy/lib/rubyboy/cpu.rb:283
    219714  rubyboy/lib/rubyboy/cpu.rb:174
    119570  rubyboy/lib/rubyboy/cpu.rb:85
    116703  rubyboy/lib/rubyboy/cpu.rb:272
    115434  rubyboy/lib/rubyboy/cpu.rb:87
    104839  rubyboy/lib/rubyboy/cpu.rb:234
     93804  rubyboy/lib/rubyboy/cpu.rb:292
     91296  rubyboy/lib/rubyboy/ppu.rb:236
     85878  rubyboy/lib/rubyboy/cpu.rb:231
     85438  rubyboy/lib/rubyboy/cpu.rb:79
     73590  rubyboy/lib/rubyboy/cpu.rb:172
     71146  rubyboy/lib/rubyboy/cpu.rb:166
     70944  rubyboy/lib/rubyboy/cpu.rb:167
     65709  rubyboy/lib/rubyboy/cpu.rb:96
     57340  rubyboy/lib/rubyboy/cpu.rb:114
     55390  rubyboy/lib/rubyboy/cpu.rb:86
     53465  rubyboy/lib/rubyboy/cpu.rb:219
     50512  rubyboy/lib/rubyboy/ppu.rb:244
     46849  rubyboy/lib/rubyboy/apu.rb:51
     46847  rubyboy/lib/rubyboy/apu.rb:58
     46596  rubyboy/lib/rubyboy/cpu.rb:280
     41691  rubyboy/lib/rubyboy/cpu.rb:227
     38898  rubyboy/lib/rubyboy/cpu.rb:173
     38615  rubyboy/lib/rubyboy/cpu.rb:63
     36024  rubyboy/lib/rubyboy/apu.rb:53
     36023  rubyboy/lib/rubyboy/apu.rb:60
     35883  rubyboy/lib/rubyboy/cpu.rb:256
     34737  rubyboy/lib/rubyboy/apu.rb:52
     34736  rubyboy/lib/rubyboy/apu.rb:59
     34695  rubyboy/lib/rubyboy/cpu.rb:113
     34468  rubyboy/lib/rubyboy/cpu.rb:178
     33268  rubyboy/lib/rubyboy/cpu.rb:61
     33186  rubyboy/lib/rubyboy/cpu.rb:123
     32219  rubyboy/lib/rubyboy/cpu.rb:294
     29773  rubyboy/lib/rubyboy/cpu.rb:229
     22490  rubyboy/lib/rubyboy/cpu.rb:58
     20864  rubyboy/lib/rubyboy/cpu.rb:70
     20604  rubyboy/lib/rubyboy/cpu.rb:94
     18442  rubyboy/lib/rubyboy/cpu.rb:163
     17946  rubyboy/lib/rubyboy/cpu.rb:106
     16512  rubyboy/lib/rubyboy/cpu.rb:147
     13241  rubyboy/lib/rubyboy/cpu.rb:228
     13230  rubyboy/lib/rubyboy/cpu.rb:139
     12579  rubyboy/lib/rubyboy/cpu.rb:276
     11893  rubyboy/lib/rubyboy/cpu.rb:71
     10850  rubyboy/lib/rubyboy/cpu.rb:66
      9706  rubyboy/lib/rubyboy/cpu.rb:171

allocated objects by class
-----------------------------------
   2888757  Hash
    416967  Array
    273888  <memo> (IMEMO)
    273888  <ifunc> (IMEMO)
    251442  Float
     91296  Enumerator
      1040  FFI::MemoryPointer
       642  <throw_data> (IMEMO)
       635  FFI::Pointer
       173  <callcache> (IMEMO)
        74  <constcache> (IMEMO)
         2  <ment> (IMEMO)

retained memory by gem
-----------------------------------
   9.81 kB  rubyboy/lib
  320.00 B  heap-profiler-0.7.0

retained memory by file
-----------------------------------
   3.92 kB  rubyboy/lib/rubyboy/cpu.rb
   2.20 kB  rubyboy/lib/rubyboy/ppu.rb
  960.00 B  rubyboy/lib/rubyboy.rb
  720.00 B  rubyboy/lib/rubyboy/lcd.rb
  720.00 B  rubyboy/lib/rubyboy/apu.rb
  328.00 B  rubyboy/lib/rubyboy/audio.rb
  320.00 B  rubyboy/lib/rubyboy/bus.rb
  320.00 B  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
  160.00 B  rubyboy/lib/rubyboy/apu_channels/channel2.rb
  160.00 B  rubyboy/lib/rubyboy/apu_channels/channel1.rb
  120.00 B  rubyboy/lib/rubyboy/apu_channels/channel4.rb
   80.00 B  rubyboy/lib/rubyboy/registers.rb
   40.00 B  rubyboy/lib/rubyboy/interrupt.rb
   40.00 B  rubyboy/lib/rubyboy/cartridge/mbc1.rb
   40.00 B  rubyboy/lib/rubyboy/apu_channels/channel3.rb

retained memory by location
-----------------------------------
  160.00 B  rubyboy/lib/rubyboy.rb:79
  160.00 B  rubyboy/lib/rubyboy.rb:78
  152.00 B  rubyboy/lib/rubyboy/ppu.rb:248
  120.00 B  rubyboy/lib/rubyboy/lcd.rb:28
  120.00 B  rubyboy/lib/rubyboy/audio.rb:34
   80.00 B  rubyboy/lib/rubyboy/registers.rb:73
   80.00 B  rubyboy/lib/rubyboy/ppu.rb:209
   80.00 B  rubyboy/lib/rubyboy/ppu.rb:107
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:45
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:44
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:35
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:31
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:30
   80.00 B  rubyboy/lib/rubyboy/lcd.rb:29
   80.00 B  rubyboy/lib/rubyboy/cpu.rb:23
   80.00 B  rubyboy/lib/rubyboy/cpu.rb:1248
   80.00 B  rubyboy/lib/rubyboy/audio.rb:27
   80.00 B  rubyboy/lib/rubyboy/apu_channels/channel2.rb:76
   80.00 B  rubyboy/lib/rubyboy/apu.rb:65
   80.00 B  rubyboy/lib/rubyboy/apu.rb:51
   80.00 B  rubyboy/lib/rubyboy/apu.rb:50
   80.00 B  rubyboy/lib/rubyboy.rb:75
   80.00 B  rubyboy/lib/rubyboy.rb:74
   80.00 B  rubyboy/lib/rubyboy.rb:43
   80.00 B  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:58
   80.00 B  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:53
   80.00 B  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:52
   48.00 B  rubyboy/lib/rubyboy/ppu.rb:239
   48.00 B  rubyboy/lib/rubyboy/audio.rb:28
   40.00 B  rubyboy/lib/rubyboy/ppu.rb:306
   40.00 B  rubyboy/lib/rubyboy/ppu.rb:180
   40.00 B  rubyboy/lib/rubyboy/ppu.rb:179
   40.00 B  rubyboy/lib/rubyboy/lcd.rb:27
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:92
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:894
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:76
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:748
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:69
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:64
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:275
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:254
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:219
   40.00 B  rubyboy/lib/rubyboy/cpu.rb:1027
   40.00 B  rubyboy/lib/rubyboy/bus.rb:87
   40.00 B  rubyboy/lib/rubyboy.rb:81
   40.00 B  rubyboy/lib/rubyboy.rb:80
   40.00 B  rubyboy/lib/rubyboy.rb:76
   40.00 B  rubyboy/lib/rubyboy.rb:45
   40.00 B  rubyboy/lib/rubyboy.rb:44
   40.00 B  rubyboy/lib/rubyboy.rb:39

retained memory by class
-----------------------------------
   6.96 kB  <callcache> (IMEMO)
   3.00 kB  <constcache> (IMEMO)
   96.00 B  <ment> (IMEMO)
   72.00 B  Thread::Mutex

retained objects by gem
-----------------------------------
       244  rubyboy/lib
         8  heap-profiler-0.7.0

retained objects by file
-----------------------------------
        98  rubyboy/lib/rubyboy/cpu.rb
        54  rubyboy/lib/rubyboy/ppu.rb
        24  rubyboy/lib/rubyboy.rb
        18  rubyboy/lib/rubyboy/lcd.rb
        18  rubyboy/lib/rubyboy/apu.rb
         8  rubyboy/lib/rubyboy/bus.rb
         8  rubyboy/lib/rubyboy/audio.rb
         8  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb
         4  rubyboy/lib/rubyboy/apu_channels/channel2.rb
         4  rubyboy/lib/rubyboy/apu_channels/channel1.rb
         3  rubyboy/lib/rubyboy/apu_channels/channel4.rb
         2  rubyboy/lib/rubyboy/registers.rb
         1  rubyboy/lib/rubyboy/interrupt.rb
         1  rubyboy/lib/rubyboy/cartridge/mbc1.rb
         1  rubyboy/lib/rubyboy/apu_channels/channel3.rb

retained objects by location
-----------------------------------
         4  rubyboy/lib/rubyboy.rb:79
         4  rubyboy/lib/rubyboy.rb:78
         3  rubyboy/lib/rubyboy/ppu.rb:248
         3  rubyboy/lib/rubyboy/lcd.rb:28
         3  rubyboy/lib/rubyboy/audio.rb:34
         2  rubyboy/lib/rubyboy/registers.rb:73
         2  rubyboy/lib/rubyboy/ppu.rb:209
         2  rubyboy/lib/rubyboy/ppu.rb:107
         2  rubyboy/lib/rubyboy/lcd.rb:45
         2  rubyboy/lib/rubyboy/lcd.rb:44
         2  rubyboy/lib/rubyboy/lcd.rb:35
         2  rubyboy/lib/rubyboy/lcd.rb:31
         2  rubyboy/lib/rubyboy/lcd.rb:30
         2  rubyboy/lib/rubyboy/lcd.rb:29
         2  rubyboy/lib/rubyboy/cpu.rb:23
         2  rubyboy/lib/rubyboy/cpu.rb:1248
         2  rubyboy/lib/rubyboy/audio.rb:27
         2  rubyboy/lib/rubyboy/apu_channels/channel2.rb:76
         2  rubyboy/lib/rubyboy/apu.rb:65
         2  rubyboy/lib/rubyboy/apu.rb:51
         2  rubyboy/lib/rubyboy/apu.rb:50
         2  rubyboy/lib/rubyboy.rb:75
         2  rubyboy/lib/rubyboy.rb:74
         2  rubyboy/lib/rubyboy.rb:43
         2  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:58
         2  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:53
         2  heap-profiler-0.7.0/lib/heap_profiler/reporter.rb:52
         1  rubyboy/lib/rubyboy/ppu.rb:306
         1  rubyboy/lib/rubyboy/ppu.rb:193
         1  rubyboy/lib/rubyboy/ppu.rb:180
         1  rubyboy/lib/rubyboy/ppu.rb:179
         1  rubyboy/lib/rubyboy/lcd.rb:37
         1  rubyboy/lib/rubyboy/lcd.rb:36
         1  rubyboy/lib/rubyboy/lcd.rb:27
         1  rubyboy/lib/rubyboy/cpu.rb:92
         1  rubyboy/lib/rubyboy/cpu.rb:894
         1  rubyboy/lib/rubyboy/cpu.rb:76
         1  rubyboy/lib/rubyboy/cpu.rb:748
         1  rubyboy/lib/rubyboy/cpu.rb:69
         1  rubyboy/lib/rubyboy/cpu.rb:275
         1  rubyboy/lib/rubyboy/cpu.rb:254
         1  rubyboy/lib/rubyboy/cpu.rb:219
         1  rubyboy/lib/rubyboy/cpu.rb:1027
         1  rubyboy/lib/rubyboy/bus.rb:87
         1  rubyboy/lib/rubyboy.rb:81
         1  rubyboy/lib/rubyboy.rb:80
         1  rubyboy/lib/rubyboy.rb:76
         1  rubyboy/lib/rubyboy.rb:45
         1  rubyboy/lib/rubyboy.rb:44
         1  rubyboy/lib/rubyboy.rb:39

retained objects by class
-----------------------------------
       174  <callcache> (IMEMO)
        75  <constcache> (IMEMO)
         2  <ment> (IMEMO)
         1  Thread::Mutex

Allocated String Report
-----------------------------------

Retained String Report
-----------------------------------
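
The report above attributes most allocations to Hash objects created in cpu.rb. In Ruby, a hash literal evaluated inside a method allocates a new Hash on every call, so one common remedy is to build such hashes once as frozen constants and reuse them. A sketch under that assumption; the operand shape with type and value keys is hypothetical and may differ from the real code.

```ruby
# Allocates a new Hash every time the method is called.
def operand_from_literal
  { type: :register8, value: :a }
end

# Built once at load time and shared by all callers.
OPERAND_A = { type: :register8, value: :a }.freeze

def operand_from_constant
  OPERAND_A
end

puts operand_from_literal.equal?(operand_from_literal)   # distinct objects
puts operand_from_constant.equal?(operand_from_constant) # same object
```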

Before the CPU refactoring

rubyboy % stackprof stackprof-cpu-myapp.dump          
==================================
  Mode: cpu(1000)
  Samples: 12679 (6.31% miss rate)
  GC: 2873 (22.66%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      2092  (16.5%)        2092  (16.5%)     Integer#<=>
      1480  (11.7%)        1480  (11.7%)     (sweeping)
      2873  (22.7%)        1320  (10.4%)     (garbage collection)
      1180   (9.3%)        1180   (9.3%)     Rubyboy::Ppu#to_signed_byte
      1153   (9.1%)        1153   (9.1%)     Rubyboy::SDL.RenderClear
       749   (5.9%)         749   (5.9%)     Rubyboy::Ppu#get_pixel
      2181  (17.2%)         739   (5.8%)     Rubyboy::Ppu#render_bg
      1691  (13.3%)         608   (4.8%)     Rubyboy::Ppu#render_window
       868   (6.8%)         378   (3.0%)     Rubyboy::Ppu#render_sprites
       300   (2.4%)         300   (2.4%)     Integer#>>
       770   (6.1%)         292   (2.3%)     Enumerable#each_slice
      5044  (39.8%)         290   (2.3%)     Rubyboy::Ppu#step
       221   (1.7%)         221   (1.7%)     Rubyboy::Registers#read8
       220   (1.7%)         220   (1.7%)     Rubyboy::SDL.UpdateTexture
      3939  (31.1%)         189   (1.5%)     Integer#times
       184   (1.5%)         184   (1.5%)     Rubyboy::Timer#step
       388   (3.1%)         118   (0.9%)     Rubyboy::Ppu#get_color
       109   (0.9%)         109   (0.9%)     Rubyboy::Registers#write8
       105   (0.8%)         105   (0.8%)     Array#size
       664   (5.2%)          74   (0.6%)     Rubyboy::Cartridge::Mbc1#read_byte
        73   (0.6%)          73   (0.6%)     (marking)
       749   (5.9%)          68   (0.5%)     Array#each
       100   (0.8%)          68   (0.5%)     Rubyboy::Cpu#flags
        66   (0.5%)          66   (0.5%)     Rubyboy::Cpu#increment_pc_by_byte
      1206   (9.5%)          64   (0.5%)     Rubyboy::Cpu#ld8
        61   (0.5%)          61   (0.5%)     Rubyboy::Registers#read16
      9800  (77.3%)          57   (0.4%)     Rubyboy::Console#bench
      2137  (16.9%)          48   (0.4%)     Range#===
      1434  (11.3%)          38   (0.3%)     Rubyboy::Lcd#draw
        35   (0.3%)          35   (0.3%)     Rubyboy::SDL.GetKeyboardState

The part right after startup is fast

rubyboy % stackprof stackprof-cpu-myapp.dump          
==================================
  Mode: cpu(1000)
  Samples: 8706 (8.09% miss rate)
  GC: 890 (10.22%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1135  (13.0%)        1135  (13.0%)     Rubyboy::SDL.RenderClear
      1034  (11.9%)        1034  (11.9%)     Integer#<=>
      2797  (32.1%)        1026  (11.8%)     Rubyboy::Ppu#render_bg
       939  (10.8%)         939  (10.8%)     Rubyboy::Ppu#to_signed_byte
       633   (7.3%)         633   (7.3%)     Rubyboy::Ppu#get_pixel
       455   (5.2%)         455   (5.2%)     (sweeping)
      1109  (12.7%)         454   (5.2%)     Rubyboy::Ppu#render_sprites
       890  (10.2%)         405   (4.7%)     (garbage collection)
       899  (10.3%)         387   (4.4%)     Enumerable#each_slice
      4575  (52.5%)         280   (3.2%)     Rubyboy::Ppu#step
       247   (2.8%)         247   (2.8%)     Rubyboy::SDL.UpdateTexture
       231   (2.7%)         231   (2.7%)     Integer#>>
      3311  (38.0%)         192   (2.2%)     Integer#times
       164   (1.9%)         164   (1.9%)     Rubyboy::Timer#step
       374   (4.3%)         139   (1.6%)     Rubyboy::Ppu#render_window
       116   (1.3%)         116   (1.3%)     Rubyboy::Registers#read8
        96   (1.1%)          96   (1.1%)     Array#size
       298   (3.4%)          87   (1.0%)     Rubyboy::Ppu#get_color
       455   (5.2%)          62   (0.7%)     Rubyboy::Cartridge::Mbc1#read_byte
      7807  (89.7%)          55   (0.6%)     Rubyboy::Console#bench
       974  (11.2%)          51   (0.6%)     Array#each
        48   (0.6%)          48   (0.6%)     Rubyboy::Registers#write8
        47   (0.5%)          47   (0.5%)     Rubyboy::Registers#read16
        52   (0.6%)          33   (0.4%)     Rubyboy::Cpu#flags
        30   (0.3%)          30   (0.3%)     (marking)
      1434  (16.5%)          30   (0.3%)     Rubyboy::Lcd#draw
      1036  (11.9%)          30   (0.3%)     Range#===
        25   (0.3%)          25   (0.3%)     Rubyboy::Interrupt#interrupts
      1547  (17.8%)          25   (0.3%)     Rubyboy::Cpu#exec
        21   (0.2%)          21   (0.2%)     Rubyboy::SDL.GetKeyboardState
sacckeysacckey

Refactoring, round 2

Goal

Run Pokémon Red at a stable 60 fps with audio on

Current state (audio off)

rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 29.089206 sec
FPS: 51.56551883884352
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
  Mode: cpu(1000)
  Samples: 11807 (5.57% miss rate)
  GC: 1554 (13.16%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      2237  (18.9%)        2237  (18.9%)     Integer#<=>
      1133   (9.6%)        1133   (9.6%)     Rubyboy::Ppu#to_signed_byte
      1108   (9.4%)        1108   (9.4%)     Rubyboy::SDL.RenderClear
      1037   (8.8%)        1037   (8.8%)     (sweeping)
      2236  (18.9%)         804   (6.8%)     Rubyboy::Ppu#render_bg
       770   (6.5%)         770   (6.5%)     Rubyboy::Ppu#get_pixel
      1652  (14.0%)         585   (5.0%)     Rubyboy::Ppu#render_window
      1554  (13.2%)         488   (4.1%)     (garbage collection)
       386   (3.3%)         386   (3.3%)     Integer#>>
       888   (7.5%)         364   (3.1%)     Rubyboy::Ppu#render_sprites
       787   (6.7%)         315   (2.7%)     Enumerable#each_slice
      5070  (42.9%)         275   (2.3%)     Rubyboy::Ppu#step
       236   (2.0%)         236   (2.0%)     Rubyboy::SDL.UpdateTexture
       207   (1.8%)         207   (1.8%)     Rubyboy::Timer#step
       199   (1.7%)         199   (1.7%)     Rubyboy::Registers#a=
      1007   (8.5%)         189   (1.6%)     Rubyboy::Cpu#get_value
      3985  (33.8%)         189   (1.6%)     Integer#times
       130   (1.1%)         130   (1.1%)     Rubyboy::Cpu#flags
       116   (1.0%)         116   (1.0%)     Array#size
       380   (3.2%)          96   (0.8%)     Rubyboy::Ppu#get_color
       719   (6.1%)          90   (0.8%)     Rubyboy::Cartridge::Mbc1#read_byte
      2302  (19.5%)          71   (0.6%)     Range#===
       761   (6.4%)          67   (0.6%)     Array#each
        61   (0.5%)          61   (0.5%)     Rubyboy::Registers#hl
     10253  (86.8%)          48   (0.4%)     Rubyboy::Console#bench
        48   (0.4%)          48   (0.4%)     Rubyboy::Registers#b=
        46   (0.4%)          46   (0.4%)     Rubyboy::Cpu#increment_pc_by_byte
      1410  (11.9%)          45   (0.4%)     Rubyboy::Lcd#draw
      1171   (9.9%)          38   (0.3%)     Rubyboy::Ppu#get_tile_index
        36   (0.3%)          36   (0.3%)     Rubyboy::Registers#f=
sacckeysacckey

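CPU refactoring

What I did

  • Used heap-profiler to find where memory gets allocated and optimized those spots
    • Reading a flag used to build a new Hash on every call; it no longer does
    • Register reads and writes no longer go through send; they use a plain branch such as when :a then @registers.a = value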
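
A minimal sketch of the two changes above, to show the kind of difference involved. The class layout, method names, and flag constants here are hypothetical stand-ins, not Rubyboy's actual code; the only part taken from the notes is the `when :a then @registers.a = value` style.

```ruby
# Hypothetical, simplified stand-ins for illustration; not Rubyboy's actual classes.
class Registers
  attr_accessor :a, :b, :f

  def initialize
    @a = 0
    @b = 0
    @f = 0
  end
end

class Cpu
  FLAG_Z = 0x80 # bit 7 of F
  FLAG_C = 0x10 # bit 4 of F

  attr_reader :registers

  def initialize
    @registers = Registers.new
  end

  # Allocation-heavy style: builds a new Hash on every call.
  def flags_hash
    f = @registers.f
    { z: (f & FLAG_Z) != 0, c: (f & FLAG_C) != 0 }
  end

  # Allocation-free style: test the bit directly.
  def flag_z?
    (@registers.f & FLAG_Z) != 0
  end

  def flag_c?
    (@registers.f & FLAG_C) != 0
  end

  # Dynamic style: builds a String and dispatches via send on every call.
  def write_register_with_send(name, value)
    @registers.send("#{name}=", value)
  end

  # Plain style: one explicit branch per register.
  def write_register(name, value)
    case name
    when :a then @registers.a = value
    when :b then @registers.b = value
    when :f then @registers.f = value
    else raise ArgumentError, "unknown register: #{name}"
    end
  end
end
```

Both styles give the same result; the plain versions just avoid creating a short-lived object per call, and in a CPU loop that runs millions of times that is likely what shows up as GC time.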

Results

rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench                           
Ruby: 3.3.0
YJIT: true
1: 26.798767 sec
FPS: 55.97272441676141
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
  Mode: cpu(1000)
  Samples: 10430 (5.57% miss rate)
  GC: 283 (2.71%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      2275  (21.8%)        2275  (21.8%)     Integer#<=>
      1267  (12.1%)        1267  (12.1%)     Rubyboy::SDL.RenderClear
      1186  (11.4%)        1186  (11.4%)     Rubyboy::Ppu#to_signed_byte
      2366  (22.7%)         864   (8.3%)     Rubyboy::Ppu#render_bg
       784   (7.5%)         784   (7.5%)     Rubyboy::Ppu#get_pixel
      1773  (17.0%)         641   (6.1%)     Rubyboy::Ppu#render_window
       992   (9.5%)         415   (4.0%)     Rubyboy::Ppu#render_sprites
       334   (3.2%)         334   (3.2%)     Integer#>>
       852   (8.2%)         319   (3.1%)     Enumerable#each_slice
      5453  (52.3%)         311   (3.0%)     Rubyboy::Ppu#step
      4199  (40.3%)         213   (2.0%)     Integer#times
       188   (1.8%)         188   (1.8%)     Rubyboy::Timer#step
       187   (1.8%)         187   (1.8%)     (sweeping)
       142   (1.4%)         142   (1.4%)     Rubyboy::SDL.UpdateTexture
       129   (1.2%)         129   (1.2%)     Array#size
       426   (4.1%)         114   (1.1%)     Rubyboy::Ppu#get_color
       851   (8.2%)         109   (1.0%)     Array#each
       981   (9.4%)         105   (1.0%)     Rubyboy::Cpu#get_value
       283   (2.7%)          85   (0.8%)     (garbage collection)
       708   (6.8%)          75   (0.7%)     Rubyboy::Cartridge::Mbc1#read_byte
        67   (0.6%)          67   (0.6%)     Rubyboy::Cpu#increment_pc_by_byte
     10147  (97.3%)          66   (0.6%)     Rubyboy::Console#bench
      2327  (22.3%)          53   (0.5%)     Range#===
        43   (0.4%)          43   (0.4%)     Rubyboy::Registers#hl
        37   (0.4%)          37   (0.4%)     Rubyboy::Interrupt#interrupts
        34   (0.3%)          34   (0.3%)     Rubyboy::Registers#a=
      2958  (28.4%)          33   (0.3%)     Rubyboy::Cpu#exec
      1216  (11.7%)          30   (0.3%)     Rubyboy::Ppu#get_tile_index
      2030  (19.5%)          28   (0.3%)     Rubyboy::Bus#read_byte
      1455  (14.0%)          26   (0.2%)     Rubyboy::Lcd#draw

FPS: 51.56551883884352 -> 55.97272441676141
GC: 13.16% -> 2.71%
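
For reference, the FPS figures line up with dividing a fixed frame count by the elapsed time. The sketch below assumes bench still runs the 1500 frames described in the measurement method earlier in this scrap; the constant and method names are made up for illustration.

```ruby
# Assumption: bench runs a fixed 1500 frames (from the measurement method noted earlier).
BENCH_FRAMES = 1500

# Frames per second for one bench run.
def fps(elapsed_sec)
  BENCH_FRAMES / elapsed_sec
end

# Relative speedup between two runs (1.0 means no change).
def speedup(before_sec, after_sec)
  fps(after_sec) / fps(before_sec)
end

before = 29.089206 # sec, before the CPU refactoring
after  = 26.798767 # sec, after the CPU refactoring

puts "FPS: #{fps(before)} -> #{fps(after)}"
puts "speedup: x#{speedup(before, after).round(3)}"
```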

sacckeysacckey

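Reducing Integer#<=> calls

What I did

Numeric comparisons pile up in address-based branching like the code below.
→ Cache the handler for each addr ahead of time, so the right code runs quickly without any comparison (reference: https://www.slideshare.net/mametter/ruby-65182128#46)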

def read_byte(addr)
  case addr
  when 0x0000..0x7fff
    @mbc.read_byte(addr)
  when 0x8000..0x9fff
    @ppu.read_byte(addr)
...
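
A rough sketch of the caching idea, assuming a 64 KiB address space and one handler per address. Everything here is illustrative: `FakeDevice`, `@read_methods`, and the 0xff fallback for unmapped addresses are assumptions, and only the two ranges from the excerpt above are mapped. The name `set_methods` is borrowed from the profile output, but this body is a guess at the approach, not the real implementation.

```ruby
# Hypothetical stand-in for the MBC / PPU; always returns a fixed value.
class FakeDevice
  def initialize(value)
    @value = value
  end

  def read_byte(_addr)
    @value
  end
end

class Bus
  def initialize(mbc, ppu)
    @mbc = mbc
    @ppu = ppu
    # One slot per address; unmapped addresses read as 0xff in this sketch.
    unmapped = ->(_addr) { 0xff }
    @read_methods = Array.new(0x10000, unmapped)
    set_methods
  end

  # Array lookup instead of walking through Range#=== comparisons.
  def read_byte(addr)
    @read_methods[addr].call(addr)
  end

  private

  # Runs once at startup and fills the table with the handler for each address.
  def set_methods
    mbc_read = ->(addr) { @mbc.read_byte(addr) }
    ppu_read = ->(addr) { @ppu.read_byte(addr) }
    0x0000.upto(0x7fff) { |addr| @read_methods[addr] = mbc_read }
    0x8000.upto(0x9fff) { |addr| @read_methods[addr] = ppu_read }
  end
end

bus = Bus.new(FakeDevice.new(0x11), FakeDevice.new(0x22))
bus.read_byte(0x0100) # served by the MBC handler
bus.read_byte(0x8000) # served by the PPU handler
```

The trade-off is a 65536-entry table per bus in exchange for dropping the per-read range checks. The entries share a handful of lambdas, so the memory cost is mostly the array itself.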

Results

rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench
Ruby: 3.3.0
YJIT: true
1: 21.75409 sec
FPS: 68.95255099156066
rubyboy % bundle exec stackprof stackprof-cpu-myapp.dump
==================================
  Mode: cpu(1000)
  Samples: 9505 (6.87% miss rate)
  GC: 325 (3.42%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1238  (13.0%)        1238  (13.0%)     Rubyboy::Ppu#to_signed_byte
      1208  (12.7%)        1208  (12.7%)     Rubyboy::SDL.RenderClear
      2558  (26.9%)         907   (9.5%)     Rubyboy::Ppu#render_bg
       865   (9.1%)         865   (9.1%)     Rubyboy::Ppu#get_pixel
       849   (8.9%)         849   (8.9%)     Rubyboy::Cartridge::Mbc1#set_methods
      1803  (19.0%)         663   (7.0%)     Rubyboy::Ppu#render_window
      1053  (11.1%)         460   (4.8%)     Rubyboy::Ppu#render_sprites
      5782  (60.8%)         346   (3.6%)     Rubyboy::Ppu#step
       906   (9.5%)         343   (3.6%)     Enumerable#each_slice
       313   (3.3%)         313   (3.3%)     Integer#>>
      4412  (46.4%)         245   (2.6%)     Integer#times
       237   (2.5%)         237   (2.5%)     (sweeping)
       197   (2.1%)         197   (2.1%)     Rubyboy::Timer#step
       193   (2.0%)         193   (2.0%)     Rubyboy::SDL.UpdateTexture
      1141  (12.0%)         162   (1.7%)     Rubyboy::Bus#set_methods
       433   (4.6%)         134   (1.4%)     Rubyboy::Ppu#get_color
       114   (1.2%)         114   (1.2%)     Array#size
       478   (5.0%)         109   (1.1%)     Rubyboy::Cpu#get_value
       918   (9.7%)          99   (1.0%)     Array#each
        75   (0.8%)          75   (0.8%)     Rubyboy::Cpu#increment_pc_by_byte
      9180  (96.6%)          68   (0.7%)     Rubyboy::Console#bench
       325   (3.4%)          65   (0.7%)     (garbage collection)
        49   (0.5%)          49   (0.5%)     Integer#<=>
        45   (0.5%)          45   (0.5%)     Rubyboy::Interrupt#interrupts
        36   (0.4%)          36   (0.4%)     Rubyboy::Registers#hl
        36   (0.4%)          36   (0.4%)     Rubyboy::Registers#a=
      1651  (17.4%)          35   (0.4%)     Rubyboy::Cpu#exec
      1042  (11.0%)          30   (0.3%)     Rubyboy::Bus#read_byte
      1267  (13.3%)          29   (0.3%)     Rubyboy::Ppu#get_tile_index
      1448  (15.2%)          26   (0.3%)     Rubyboy::Lcd#draw

FPS: 55.97272441676141 -> 68.95255099156066
Integer#<=>: 21.8% -> 0.5%

rubyboy % RUBYOPT=--yjit bundle exec rubyboy bench      
Ruby: 3.3.0
YJIT: true
1: 27.340768 sec
FPS: 54.86312601021302

With audio on it runs at about 54 fps. I want to get this up to 60 fps.
Next I'll refactor the PPU and APU as well.