Reading "The Mythical IO-Bound Rails App"

When the topic of Rails performance comes up, it is commonplace to hear that the database is the bottleneck, so Rails applications are IO-bound anyway, hence Ruby performance doesn’t matter that much, and all you need is a healthy dose of concurrency to make your service scale.
But how true is this in general?

The common claim is that Rails performance is bottlenecked on IO around the DB, so what you need is concurrency to scale the service rather than a faster Ruby. But how true is that in practice?

dak2

As such, the database being the bottleneck is true, but it doesn’t imply that the application is spending the majority of its time waiting on IO.
When properly indexed, and assuming the database isn’t overloaded, the vast majority of queries, especially the mundane lookups by primary key, take less than a couple of milliseconds, often just a fraction of a millisecond. If the application does any substantial amount of transformations on that data to render it in HTML or JSON, it will without a doubt spend as much or more time executing Ruby code than waiting for IOs.
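The point seems to be that "the database is the bottleneck" does not mean a Rails app spends most of its time waiting on IO: a well-indexed query comes back in a millisecond or two, so the Ruby work done around it can easily take as long or longer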

dak2

However, over the last couple of years, many people reported how YJIT reduced their application latency by 15 to 30%. Like Discourse seeing a 15.8-19.6% speedup with JIT 3.2, Lobsters seeing a 26% speedup, Basecamp and Hey seeing a 26% speedup or Shopify’s Storefront Renderer app seeing a 17% speedup.
If these applications were really spending the overwhelming majority of their time waiting on IO, it would be impossible for YJIT to perform this well overall.
Even on very JIT-friendly benchmarks with no IO at all, YJIT only speeds up Ruby by 2 or 3x. On more realistic benchmarks like lobsters, it’s more around 1.7x. Based on this, we can assume with fairly good confidence that all these applications are certainly not spending 80% of their time waiting on IO.
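YJIT speeds Ruby up by 2-3x on JIT-friendly benchmarks with no IO, and by about 1.7x on realistic benchmarks like lobsters, so these apps cannot be spending 80% of their time waiting on IO
=> The 80% figure comes a bit out of nowhere, but suppose it holds and the remaining 20% is Ruby execution: speeding up part of that 20% could not make the whole app 15-19% faster, which is a fair point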
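
The reasoning above is essentially Amdahl's law, and it can be checked with a few lines of Ruby. This is only a rough sketch: it assumes (my simplification, not the article's) that request time splits cleanly into IO wait and Ruby execution, and that YJIT speeds up all of the Ruby time by the same factor. The method names are made up for illustration.

```ruby
# Fraction by which total latency drops when only the Ruby part gets faster.
# ruby_fraction: share of request time spent executing Ruby (0.0..1.0)
# ruby_speedup:  how many times faster the Ruby part becomes (e.g. 1.7)
def latency_reduction(ruby_fraction:, ruby_speedup:)
  io_fraction = 1.0 - ruby_fraction
  1.0 - (io_fraction + ruby_fraction / ruby_speedup)
end

# Inverse question: what share of the time must be Ruby execution
# to explain an observed latency reduction at a given speedup?
def implied_ruby_fraction(reduction:, ruby_speedup:)
  reduction / (1.0 - 1.0 / ruby_speedup)
end

# 80% IO / 20% Ruby, with Ruby made 1.7x faster
reduction = latency_reduction(ruby_fraction: 0.2, ruby_speedup: 1.7)
puts "80% IO, 1.7x faster Ruby: #{(reduction * 100).round(1)}% lower latency" # 8.2%

# A 26% reduction at 1.7x needs roughly this much of the time to be Ruby
share = implied_ruby_fraction(reduction: 0.26, ruby_speedup: 1.7)
puts "26% reduction at 1.7x implies #{(share * 100).round(1)}% Ruby time" # 63.1%
```

Under this model an app that is 80% IO gains about 8% from a 1.7x faster Ruby, and even a 3x speedup only gets it to about 13%, so the reported 15-26% improvements do not fit an 80%-IO picture.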

dak2

CPU starvation looks like IO to most eyes

One thing that can cause people to overestimate how much IO their app is doing is that, in most cases, CPU starvation will look like Ruby is waiting on IO.
If you look at how the overwhelming majority of IO durations are measured, including in Rails logs and in all the most popular application performance managers, it’s generally done in a very simple, obvious way:
```ruby
start = Time.now
database_connection.execute("SELECT ...")
query_duration = (Time.now - start) * 1000.0
puts "Query took: #{query_duration.round(2)}ms"
```
Logically, if this code logs: Query took: 20.0ms, you might legitimately think it took 20 milliseconds to perform the SQL query, but that’s not necessarily true.
It actually means that performing the query and getting the thread scheduled again took 20 milliseconds, and you cannot possibly tell how much each part took individually (Edit: At John Duff’s request, I wrote a very quick guide on how you can tell if your application is experiencing some form of CPU starvation).
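It is easy to read this as the query itself taking 20 ms

The 20 ms covers running the query plus getting the thread scheduled again, and from this measurement alone you cannot tell how long each part took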
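
To see this for myself, here is a small experiment sketch. It assumes CRuby (with its GVL), uses `sleep` as a hypothetical stand-in for a roughly 1 ms database round trip, and the method names are made up. The same "query" is timed once in an idle process and once while two other threads burn CPU in the same process.

```ruby
# Wall-clock duration of the block in milliseconds, measured the same
# naive way as the logging snippet: a timestamp before and after the call.
def timed_ms
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
end

# Hypothetical stand-in for a fast query: about 1ms of pure waiting.
def fake_query
  sleep(0.001)
end

# Times fake_query while `busy_threads` other threads spin on the CPU
# in the same process. Returns the slowest of `samples` measurements (ms).
def slowest_query_ms(busy_threads:, samples: 3)
  stop = false
  spinners = Array.new(busy_threads) { Thread.new { nil until stop } }
  Array.new(samples) { timed_ms { fake_query } }.max
ensure
  # Always stop and join the spinning threads, even if a measurement raised.
  stop = true
  spinners&.each(&:join)
end

if __FILE__ == $PROGRAM_NAME
  puts "idle process:      #{slowest_query_ms(busy_threads: 0).round(2)}ms"
  puts "2 CPU-bound peers: #{slowest_query_ms(busy_threads: 2).round(2)}ms"
end
```

The "query" is identical in both runs, so any difference between the two numbers is time spent waiting to run again, not IO. On CRuby I would expect the second number to be much larger, but the exact values depend on the machine and Ruby version.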

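CPU starvation
I asked ChatGPT:
CPU starvation is a state in which a process or thread needs CPU resources but is not allocated enough of them and is kept waiting.

It occurs under high CPU load or when there are scheduling problems.
https://ja.wikipedia.org/wiki/リソーススタベーション
If one thread holds the CPU for a long time, other threads cannot acquire the GVL and keep waiting, which can lead to starvation

Forking into multiple processes looks like one way around the GVL
Calling Thread.pass now and then to release the GVL and hand over to another thread might be another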

https://docs.ruby-lang.org/ja/latest/method/Thread/s/pass.html

dak2

So for all you know, the query might have been performed in under a millisecond, and all the remaining time was spent waiting to acquire the GVL, running GC, or waiting for the operating system scheduler to resume your process.

Knowing which is which is crucial:

If all this time was spent performing the query, it suggests that your application is IO-heavy and that you may be able to use more concurrency (processes, threads, or fibers) to get some extra throughput.

If all this time was spent waiting on the scheduler, then you might want to do the absolute opposite and use less concurrency to reduce latency.

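If the time really is going into the query, more concurrency (processes, threads, or fibers) might raise throughput
- fork, add threads, use fibers, etc.
If it is scheduler wait, on the other hand, lowering concurrency may be better for latency
- Why was that again?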

dak2

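If it is scheduler wait, lowering concurrency may be better for latency

(1) More threads than CPU cores increases scheduling load
Suppose the CPU has only 4 cores and you create 100 threads.
The OS tries to run all 100 threads fairly.
At any moment only 4 threads can actually run (there are 4 cores).
The other 96 threads are waiting and have to wait for their next turn.
The OS context-switches frequently to give the next thread a turn.
As a result, context-switch overhead grows and overall processing slows down.
📌 In other words, with a limited number of cores, adding threads does not increase how many threads actually run at the same time.

(2) Context switches have overhead
When the CPU stops running one thread and moves to another, that is called a context switch. A context switch incurs overhead such as saving and restoring CPU registers.
A context switch usually takes a few microseconds (µs).

With too many threads, however, the number of switches goes up and a large share of CPU time is spent on context switching.

That leaves less CPU time for the actual computation.

📌 It is tempting to think that more threads means faster parallel processing, but past a certain number it gets slower.

(3) Ruby's GVL has an effect
MRI (CRuby) has a mechanism called the GVL (Global VM Lock),

so only one thread can execute Ruby code at a time.
Even with 100 threads, only one of them is running Ruby code at any moment (threads blocked on IO release the GVL).

The OS scheduler switches between threads to be fair, but a thread that gets scheduled without holding the GVL still cannot make progress.

The time spent waiting to acquire the GVL grows and threads sit waiting (CPU starvation).

📌 Because of the GVL, adding threads does not make Ruby code run in parallel; it only adds scheduling and switching cost
With fewer threads there is less switching between threads, less time spent waiting to be scheduled, and latency may improve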
generated by ChatGPT

dak2

This issue is not unique to Ruby, whenever you have a workload that mixes IO and CPU work, you have to arbitrate between allowing more concurrency and risk degrading latency, or reducing concurrency to ensure a low latency but decrease utilization. The more heterogeneous your workload is, as typical in a monolith, the harder it is to find the best compromise. That’s one of the benefits of micro-services, getting more homogenous workloads, making it easier to achieve higher server utilization without impacting latency too much.
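This is not specific to Ruby: any workload that mixes IO and CPU work faces the following tradeoff
- Increase concurrency and accept the risk of worse latency
  - Add threads or asynchronous processing
  - Other tasks can run while one is waiting on IO, but scheduler wait grows and latency suffers (as discussed above)
- Reduce concurrency to keep latency low, at the cost of lower utilization
  - CPU and IO are used less efficiently (resources sit idle)

It also differs between monoliths and microservices
- A monolith has a heterogeneous workload that mixes IO and CPU work, so the problem above is more pronounced
- Microservices can split workloads into more homogeneous ones (IO work and CPU work each on their own), so it tends to be easier to deal with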
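
A back-of-the-envelope way to think about this tradeoff for a single Ruby process: if each request spends a given fraction of its time waiting on IO, how many threads does it take before the one core behind the GVL is always busy? The sketch below is an idealized model of my own (threads release the GVL while in IO, switching costs nothing, the workload is uniform), and the method name is hypothetical.

```ruby
# Idealized number of threads needed to keep one core fully busy when every
# request spends `io_fraction` of its time waiting on IO.
# Each thread uses the CPU for (1 - io_fraction) of the time, so it takes
# 1 / (1 - io_fraction) threads to add up to one full core.
def threads_to_saturate_one_core(io_fraction)
  unless (0.0...1.0).cover?(io_fraction)
    raise ArgumentError, "io_fraction must be >= 0 and < 1"
  end

  (1.0 / (1.0 - io_fraction)).round(2)
end

[0.5, 0.8, 0.95].each do |io|
  puts "#{(io * 100).round}% IO -> about #{threads_to_saturate_one_core(io)} threads"
end
```

In this model, threads beyond that number cannot add throughput and only queue up for the GVL, which shows up as latency. A mixed workload has no single `io_fraction`, which is why a monolith is harder to tune.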

dak2

However, given that the default implementation of Ruby has a GVL, this problem is even more pronounced. Instead of all threads on the server having to share all CPU cores, you end up with multiple small buckets of threads that each have to share one CPU, hence you can end up with threads that can’t be resumed even though there are some free cores on the server.
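Because of Ruby's GVL, the threads in one process can only use one CPU core for Ruby code, so a thread may be unable to resume even though other cores are free

=> Basically, adding threads within one process still leaves you with one CPU core because of the GVL

=> To use CPUs in parallel, splitting the work across multiple processes looks like an option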
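
As a minimal sketch of that multi-process option: each forked child has its own GVL, so CPU-bound work can run on several cores at once. `fork_map` is a hypothetical helper I wrote for illustration, it relies on `fork` (so it will not work on Windows), and results are passed back through a pipe with `Marshal`.

```ruby
# Runs the given block once per input, each in its own forked process,
# and returns the results in input order.
def fork_map(inputs)
  children = inputs.map do |input|
    reader, writer = IO.pipe
    pid = fork do
      # Child process: compute the result and send it to the parent.
      reader.close
      writer.write(Marshal.dump(yield(input)))
      writer.close
    end
    # Parent process: keep only the read end of the pipe.
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read) # blocks until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

# CPU-bound toy work, one process per input
sums = fork_map([1_000, 2_000, 3_000]) { |n| (1..n).sum { |i| i * i } }
puts sums.inspect
```

This is roughly what a process-based server does at a larger scale: several worker processes, each with its own GVL, instead of many threads sharing one.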
The thing to keep in mind is that, as a general rule that doesn’t only apply to Ruby, CPU-bound and IO-bound work loads don’t mix well, and should ideally be handled by distinct systems. For small projects, you can likely tolerate the latency impact of collocating all your workloads in a one-size-fits-all system, but as you scale you will increasingly need to segregate IO-intensive workloads from CPU-intensive ones.
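As a general rule, not just for Ruby, CPU-bound and IO-bound work do not mix well and should ideally be handled separately

In a small project you can probably tolerate the latency impact of mixing workloads, but as the project grows you need to separate IO-intensive work from CPU-intensive work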

dak2

One important thing to note is that what I’m saying above is only aimed at the web server part of Rails applications. Most apps also use a background job runner, Sidekiq being the most popular, and background jobs often take care of lots of slow IO operations, such as sending e-mails, performing API calls, etc.

The above only applies to the web server part of a Rails application
Background jobs (Sidekiq and the like) often handle the slow IO work

dak2

At that point, you might wonder why it matters how much time Rails applications spend waiting on IO.

For the average Rails user, it is important to know this, because it is what defines which execution model is best suited to deploy their application:

If an application is truly IO-bound, as in spending more than 95% of its time waiting for IO, then using an asynchronous execution model is likely what will get you the best results.

If an application isn’t fully IO-bound, but still is quite IO-heavy, then using a threaded server, with a reasonable number of threads per process, is probably what will get you the best tradeoff between latency and throughput.

If an application doesn’t spend significantly more than half its time on IO, then it might be preferable to use a purely process-based solution.

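If the application is truly IO-bound (more than 95% of its time waiting on IO), use an asynchronous execution model
- Handling other requests asynchronously while waiting on IO improves throughput
If the application is not fully IO-bound but still IO-heavy, use a threaded server with a sensible number of threads per process
- This gives the best tradeoff between latency and throughput
  - Careful: with too many threads, more time goes into scheduling thread switches and latency grows
If the application does not spend more than half its time on IO, go process-based
- CPU-bound
  - Build the workload as multiple processes so that several CPU cores can be used in parallel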
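
The three cases can be written down as a small decision helper. This is only a sketch: the article gives rough guidance rather than hard thresholds, so the exact cut-offs (0.95 and 0.5) and the method name are my own assumptions.

```ruby
# Suggests an execution model from the share of time spent waiting on IO.
# The thresholds loosely follow the quoted passage; the exact cut-offs
# are simplifying assumptions, not rules.
def suggested_execution_model(io_fraction)
  unless (0.0..1.0).cover?(io_fraction)
    raise ArgumentError, "io_fraction must be between 0 and 1"
  end

  if io_fraction > 0.95
    :async          # truly IO-bound
  elsif io_fraction > 0.5
    :threaded       # IO-heavy, but with real CPU work too
  else
    :process_based  # half or more of the time is Ruby execution
  end
end

puts suggested_execution_model(0.97) # async
puts suggested_execution_model(0.7)  # threaded
puts suggested_execution_model(0.4)  # process_based
```

The hard part in practice is the input: as noted earlier, naively measured "IO time" can include scheduler and GVL wait, so `io_fraction` is easy to overestimate.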

dak2

But also for the Ruby community at large, I think it’s important not to disregard Ruby’s performance under the pretext that it doesn’t matter since it’s all about the database anyway. This isn’t to say that Ruby is slow, it is without a doubt more than fast enough for writing web applications that provide a good user experience.

It matters not to dismiss Ruby's performance on the grounds that it is all about the DB anyway
Ruby is fast enough to write applications that provide a good UX

dak2

Judging from the cases where YJIT sped up Rails applications, they may not be as IO-bound as assumed
- https://zenn.dev/link/comments/3e8603f30951c0
IO-bound work is not something specific to Rails applications
- Pick the execution model that fits the workload
  - https://zenn.dev/link/comments/55c18a32f24d9c

This scrap was closed on 2025/02/15