Reading "Why Does Everyone Hate fork(2)?"

If you’ve ever deployed a Ruby application to production, it is almost certain you’ve interacted with fork(2) whether you realize it or not. Have you configured Puma’s worker setting? Well, Puma uses fork(2) to spawn these workers, more accurately the Ruby Process.fork method, which is the Ruby API for the underlying fork(2) syscall.
Whether you realize it or not, you have probably been using fork(2).
Yet, many people would argue that fork(2) is evil and shouldn’t be used. Personally I kinda both agree and disagree with that point of view, and I’ll try to explain why.
Many people say fork(2) is evil and should not be used; the author partly agrees and partly disagrees.

dak2

Paper: A fork() in the road
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

Other: https://yupo5656.hatenadiary.org/entry/20040715/p1

dak2

Of course, modern operating systems don’t actually copy all that, and instead use Copy-on-Write, but that’s still very costly, and can easily take hundreds of milliseconds if the parent process is big.
That’s why this historical usage of fork(2) to spawn other programs is mostly considered deprecated today, and most newer software will use more modern APIs such as posix_spawn(3) or vfork(2)+exec(2).
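When fork duplicates the parent into a child, the memory space is shared at first and pages are copied only when written to (Copy-on-Write), but with a large parent process the fork can still take hundreds of milliseconds.
The historical use of fork to spawn other programs is mostly deprecated; modern code tends to use posix_spawn(3) or vfork(2)+exec(2).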
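
To make the "spawn another program" use case concrete, here is a minimal sketch using Ruby's Process.spawn, which starts a program without the caller writing fork + exec by hand. The helper name run_child is made up for this note, and the child program is simply the current Ruby interpreter.

```ruby
require "rbconfig"

# Hypothetical helper: run a program and capture its stdout and exit status.
def run_child(*argv)
  reader, writer = IO.pipe
  pid = Process.spawn(*argv, out: writer) # redirect the child's stdout into the pipe
  writer.close                            # the parent keeps only the read end
  output = reader.read                    # returns once the child closes its stdout
  reader.close
  _, status = Process.wait2(pid)
  [output, status.exitstatus]
end

output, exit_code = run_child(RbConfig.ruby, "-e", "puts 'hello from child'")
puts output
```

Which system call Process.spawn ends up using depends on the Ruby version and platform, so this only illustrates the API shape.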

dak2

posix_spawn: The posix_spawn() and posix_spawnp() functions provide the functionality of a combined fork(2) and exec(3), with some optional housekeeping steps in the child process before the exec(3).
https://man7.org/linux/man-pages/man3/posix_spawn.3.html

dak2

Understanding it by building a simple echo server
require 'socket'

server = TCPServer.new('localhost', 8000)

while socket = server.accept
  while line = socket.gets
    socket.write(line)
  end
  socket.close
end
Connect from a client with telnet localhost 8000 and the server echoes back whatever is sent.

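However, when two clients connect with telnet, the client that connected last gets nothing echoed back, because the server is still busy in the inner loop serving the first client.
To accept connections from multiple clients, fork so that each connection is handled by its own process.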
require 'socket'

server = TCPServer.new('localhost', 8000)
puts "Server started on port 8000..."
children = []

while socket = server.accept
  puts "Client connected: #{socket.peeraddr[2]}:#{socket.peeraddr[1]}"
  # prune exited children
  children.reject! { |pid| Process.wait(pid, Process::WNOHANG) }

  if (child_pid = Process.fork)
    children << child_pid
    puts "Forked child: #{child_pid}"
    socket.close
  else
    while line = socket.gets
      puts "Received: #{line.strip}"
      socket.write(line)
    end
    socket.close
    Process.exit(0)
  end
end
If you are an astute reader (or simply already knowledgeable about fork(2) semantics), you may have noticed that after the call to fork, both the parent and the new children have access to the socket. That is because, in UNIX, sockets are “files”, hence represented by a “file descriptor”, and part of the fork(2) semantic is that all file descriptors are also inherited.
Both the parent and the child can access the socket.

In UNIX a socket is a file, so it is represented by a file descriptor, and part of the semantics of fork is that all file descriptors are inherited.
https://man7.org/linux/man-pages/man2/fork.2.html
The child inherits copies of the parent's set of open file descriptors.  Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent.  This means that the two file descriptors share open file status flags, file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
https://man7.org/linux/man-pages/man2/socket.2.html
socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint.  The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
https://stackoverflow.com/a/5256705
In simple words, when you open a file, the operating system creates an entry to represent that file and store the information about that opened file. So if there are 100 files opened in your OS then there will be 100 entries in OS (somewhere in kernel). These entries are represented by integers like (...100, 101, 102....). This entry number is the file descriptor.
(An explanation of what a file descriptor is.)
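
The man page statement that parent and child share the same open file description (including the file offset) can be checked with a small sketch. It uses a temporary file instead of a socket, and sync = true so that every write goes straight to write(2) rather than sitting in a per-process Ruby buffer.

```ruby
require "tempfile"

file = Tempfile.new("fork-fd")
file.sync = true               # no userspace buffering, each write hits the fd directly
file.write("parent-before ")

pid = Process.fork do
  file.write("child ")         # the inherited descriptor shares the parent's file offset
  exit!(0)                     # skip at_exit handlers and finalizers in the child
end
Process.wait(pid)

file.write("parent-after")     # continues after the child's bytes, not over them
file.rewind
contents = file.read
file.close!                    # close and delete the temporary file
puts contents
```

The child's write moved the offset seen by the parent, which is what "refers to the same open file description" means in practice.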

dak2

Both the parent and the child can access the socket.

That means the socket has to be closed properly in both; if it is left open, does that cause a resource leak?

require 'socket'

server = TCPServer.new('localhost', 8000)
puts "Server started on port 8000..."
children = []

while socket = server.accept
  puts "Client connected: #{socket.peeraddr[2]}:#{socket.peeraddr[1]}"
  # prune exited children
  children.reject! { |pid| Process.wait(pid, Process::WNOHANG) }

  if (child_pid = Process.fork)
    children << child_pid
    puts "Forked child: #{child_pid}"
    # socket.close
  else
    while line = socket.gets
      puts "Received: #{line.strip}"
      socket.write(line)
    end
    socket.close
    Process.exit(0)
  end
end

As in the file above, try not closing the socket in the parent process.

Output of lsof -i:8000 after running telnet localhost 8000 from multiple clients:

COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ruby    10501 dak2    8u  IPv6 0x96c486cc19ae706e      0t0  TCP localhost:irdmi (LISTEN)
ruby    10501 dak2    9u  IPv6 0xe3018fc925a77f5e      0t0  TCP localhost:irdmi->localhost:49590 (ESTABLISHED)
ruby    10501 dak2   10u  IPv6  0x7d07d30b805a151      0t0  TCP localhost:irdmi->localhost:49591 (ESTABLISHED)
telnet  10525 dak2    5u  IPv6 0x65ec43b54e62d81e      0t0  TCP localhost:49590->localhost:irdmi (ESTABLISHED)
ruby    10526 dak2    8u  IPv6 0x96c486cc19ae706e      0t0  TCP localhost:irdmi (LISTEN)
ruby    10526 dak2    9u  IPv6 0xe3018fc925a77f5e      0t0  TCP localhost:irdmi->localhost:49590 (ESTABLISHED)
telnet  10560 dak2    5u  IPv6 0x342da95438db4d01      0t0  TCP localhost:49591->localhost:irdmi (ESTABLISHED)
ruby    10561 dak2    8u  IPv6 0x96c486cc19ae706e      0t0  TCP localhost:irdmi (LISTEN)
ruby    10561 dak2    9u  IPv6 0xe3018fc925a77f5e      0t0  TCP localhost:irdmi->localhost:49590 (ESTABLISHED)
ruby    10561 dak2   10u  IPv6  0x7d07d30b805a151      0t0  TCP localhost:irdmi->localhost:49591 (ESTABLISHED)

Output of lsof -i:8000 after closing the telnet connections:

COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ruby    10501 dak2    8u  IPv6 0x96c486cc19ae706e      0t0  TCP localhost:irdmi (LISTEN)
ruby    10501 dak2    9u  IPv6 0xe3018fc925a77f5e      0t0  TCP localhost:irdmi->localhost:49590 (CLOSE_WAIT)
ruby    10501 dak2   10u  IPv6  0x7d07d30b805a151      0t0  TCP localhost:irdmi->localhost:49591 (CLOSE_WAIT)

Sockets in the CLOSE_WAIT state are left behind in the parent process.
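
The same effect can be reproduced in a smaller form. The sketch below assumes a UNIXSocket.pair instead of a TCP server: the peer only sees end-of-file once every process holding a copy of the descriptor has closed it, which is why the parent has to close its copy too.

```ruby
require "socket"

client, server_side = UNIXSocket.pair

pid = Process.fork do
  client.close
  server_side.write("bye\n")
  server_side.close            # only the child's copy is closed here
  exit!(0)
end
Process.wait(pid)

line = client.gets
# The child is gone, but the parent still holds server_side, so no EOF arrives yet.
pending_before = IO.select([client], nil, nil, 0.2)

server_side.close              # close the last remaining copy
pending_after = IO.select([client], nil, nil, 0.2)
at_eof = client.read(1).nil?   # read returns nil at end-of-file
client.close
```

IO.select returns nil on timeout, so pending_before being nil shows the connection is still considered open while the parent keeps its copy.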

dak2

require "bundler/inline"
gemfile do
  gem "trilogy"
  gem "bigdecimal" # for trilogy
end

client = Trilogy.new
client.ping

if child_pid = Process.fork
  sleep 0.1 # Give some time to the child

  5.times do |i|
    p client.query("SELECT #{i}").first[0]
  end
  Process.kill(:KILL, child_pid)
  Process.wait(child_pid)
else
  loop do
    client.query('SELECT "oops"')
  end
end

If you run this script, you’ll get a somewhat random output, similar to this:

"oops"
1
"oops"
"oops"
3

What’s happening here is that both processes are writing inside the same socket. For the MySQL server, it’s not a big deal because our queries are small, so they’re somewhat “atomically” written into the socket. If we were to issue larger queries, two queries might end up interleaved, which would cause the server to close the connection with some form of protocol error.

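When the child process is forked, the socket holding the established MySQL connection (handled as a file descriptor) is duplicated as well.
=> The connection is established by client = Trilogy.new before the fork, so the child can use the already-established connection too.

With larger queries the writes may interleave, so the server may close the connection with a protocol error.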

But for the client, it’s really bad. Because the responses of both processes are sent back in the same socket, and each client is issuing read(2) and might not be getting the response to the query it just issued, but the response of another unrelated query issued by the other process.

When two processes try to read(2) on the same socket, they each get part of the data, but you don’t have proper control over which process gets what, and it’s unrealistic to try to synchronize the two processes so they each get the response they expect.

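When two processes read from the same socket, one may receive the response to a query issued by the other process.
There is no proper way to control which process receives what, and synchronizing the two processes so that each gets the response it expects is unrealistic.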

dak2

With this in mind, you can imagine how much of a hassle it can be to properly close all the sockets and other open files of an application before you call fork(2). Perhaps you can be diligent in your own code, but you likely are using some libraries that may not expect fork(2) to be called and don’t allow you to close their file descriptors.
Properly closing every socket before calling fork(2) is hard.
For the fork+exec use case, there’s a nice feature that makes this much easier, you can mark a file descriptor as needing to be closed when exec is called, and the operating system takes care of that for you, O_CLOEXEC (for close on exec), which in Ruby is conveniently exposed as a method on the IO class:
In Ruby, the IO class defines close_on_exec=, which marks a file descriptor to be closed when exec is called.
STDIN.close_on_exec = true
https://docs.ruby-lang.org/en/3.4/IO.html#method-i-close_on_exec-3D
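
A small sketch of the limitation: the close-on-exec flag only takes effect at exec, so a plain fork without exec still hands the descriptor to the child. A pipe is used here as a stand-in for any open file or socket.

```ruby
reader, writer = IO.pipe
cloexec = writer.close_on_exec?  # Ruby sets the flag by default on descriptors it opens

pid = Process.fork do
  reader.close
  # No exec happened, so the flag never triggered and the descriptor is still usable.
  writer.write("still open in the forked child")
  writer.close
  exit!(0)
end

writer.close
message = reader.read
reader.close
Process.wait(pid)
puts message
```

So the flag helps with the fork+exec case only; code that forks and keeps running needs another mechanism.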

dak2

Instead, what most code that wants to be fork-safe does is either try to detect that a fork happened by continuously checking the current process ID:

Fork-safe code seems to handle its sockets by comparing process IDs.

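def query
  if Process.pid != @old_pid
    # A fork happened (or this is the first call): drop the inherited connection.
    # &. avoids a NoMethodError when no connection exists yet.
    @connection&.close
    @connection = nil
    @old_pid = Process.pid
  end

  @connection ||= connect
  @connection.query
end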
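
A self-contained sketch of the same PID-check idea, so it can be run without a database. ForkSafeClient and Connection are made-up names for this note; Connection just records the PID that created it, and the stale connection is dropped rather than closed to keep the sketch short.

```ruby
# Hypothetical stand-in for a real DB connection: remembers which process created it.
Connection = Struct.new(:created_in)

class ForkSafeClient
  def initialize(&connector)
    @connector = connector
    @owner_pid = Process.pid
    @connection = nil
  end

  def connection
    if @owner_pid != Process.pid
      @connection = nil            # inherited from the parent, must not be reused
      @owner_pid = Process.pid
    end
    @connection ||= @connector.call
  end
end

client = ForkSafeClient.new { Connection.new(Process.pid) }
parent_ok = client.connection.created_in == Process.pid

reader, writer = IO.pipe
pid = Process.fork do
  reader.close
  # In the child the PID differs, so a fresh connection is created.
  writer.puts(client.connection.created_in == Process.pid)
  writer.close
  exit!(0)
end
writer.close
child_reconnected = reader.read.strip == "true"
reader.close
Process.wait(pid)
```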

Or alternatively rely on some at_fork callback. In C land it is usually pthread_atfork, and in Ruby since 3.1 you can decorate Process._fork (note the _):
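
A minimal sketch of decorating Process._fork, assuming Ruby 3.1 or later. ForkTracker is a made-up module name; a real library would reset its connections in the child branch instead of recording an event.

```ruby
module ForkTracker
  def self.events
    @events ||= []
  end

  def _fork
    pid = super                       # performs the actual fork
    if pid == 0
      ForkTracker.events << :child    # runs in the child right after the fork
    else
      ForkTracker.events << :parent   # runs in the parent right after the fork
    end
    pid
  end
end

Process.singleton_class.prepend(ForkTracker)

reader, writer = IO.pipe
pid = Process.fork do
  reader.close
  writer.puts(ForkTracker.events.inspect)
  writer.close
  exit!(0)
end
writer.close
child_events = reader.read.strip
reader.close
Process.wait(pid)
parent_events = ForkTracker.events
```

The hook must return the pid it got from super, otherwise callers of Process.fork cannot tell the parent from the child.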

pthread_atfork(3) registers handler functions that are hooked to run when fork(2) is executed.

       #include <pthread.h>

       int pthread_atfork(typeof(void (void)) *prepare,
                          typeof(void (void)) *parent,
                          typeof(void (void)) *child);

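Handlers are passed as prepare, parent, and child.
prepare runs in the parent before fork(2); parent and child run after fork(2), in the parent and in the child respectively.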

  When fork(2) is called in a multithreaded process, only the
  calling thread is duplicated in the child process.  The original
  intention of pthread_atfork() was to allow the child process to be
  returned to a consistent state.  For example, at the time of the
  call to fork(2), other threads may have locked mutexes that are
  visible in the user-space memory duplicated in the child.  Such
  mutexes would never be unlocked, since the threads that placed the
  locks are not duplicated in the child.  The intent of
  pthread_atfork() was to provide a mechanism whereby the
  application (or a library) could ensure that mutexes and other
  process and thread state would be restored to a consistent state.
  In practice, this task is generally too difficult to be
  practicable.

  After a fork(2) in a multithreaded process returns in the child,
  the child should call only async-signal-safe functions (see
  signal-safety(7)) until such time as it calls execve(2) to execute
  a new program.

  POSIX.1 specifies that pthread_atfork() shall not fail with the
  error EINTR.

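When fork(2) is called in a multithreaded process, only the thread that called fork(2) is duplicated in the child process.
For example, if some other thread in the parent holds a mutex lock, the locked mutex is duplicated in the child but the thread that took the lock is not.
- Only the thread that called fork(2) is copied
- The thread that took the mutex lock was going to release it
- That thread is not copied, so the lock can never be released in the child
- The child then tries to take the lock
  - The lock is never released, so the child deadlocks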

I had GPT write code for this scenario.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    pthread_mutex_lock(&mutex);  // lock the mutex
    printf("Worker thread acquired lock\n");
    sleep(2);  // keep holding the lock while main() calls fork()
    pthread_mutex_unlock(&mutex);  // release the lock
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, worker, NULL);
    sleep(1);  // give worker() time to lock the mutex

    if (fork() == 0) {
        // the child process starts here; only the calling thread exists in it
        printf("Child process running\n");

        pthread_mutex_lock(&mutex);  // deadlocks here: the copied mutex is locked and no thread can unlock it
        printf("Child acquired lock\n");

        pthread_mutex_unlock(&mutex);
        exit(0);
    } else {
        printf("Parent process running\n");
    }

    pthread_join(thread, NULL);
    return 0;
}

dak2

In the case of today’s Ruby programmers, however, the reason to use fork(2) over threads, is that it’s the only way to get true parallelism on MRI, the default and most commonly used implementation of Ruby. Because of the infamous GVL, Ruby threads only really allow to parallelize IO operations, and can’t parallelize Ruby code execution, hence pretty much all Ruby application servers integrate with fork(2) in some way so they can exploit more than a single CPU core.


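The reason to use fork(2) rather than threads is that it is the only way to get true parallelism on MRI.
Because of the GVL, threads can only parallelize IO operations; Ruby code itself runs on only one thread at a time.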
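
As a rough illustration of using fork(2) for CPU-bound work, here is a sketch that runs a block in one child process per item and collects the results over pipes. parallel_map is a made-up helper for this note, not how Puma or any other application server is implemented.

```ruby
# Hypothetical helper: run the block for each item in its own forked process.
def parallel_map(items)
  jobs = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = Process.fork do
      reader.close
      writer.write(Marshal.dump(yield(item))) # send the result back to the parent
      writer.close
      exit!(0)
    end
    writer.close                              # the parent keeps only the read end
    [pid, reader]
  end

  jobs.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

sums = parallel_map([10, 100, 1000]) { |n| (1..n).sum }
p sums
```

Each child has its own interpreter and its own GVL, so the blocks can run on different CPU cores at the same time.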

dak2

Luckily, some of the pitfalls of mixing threads with fork(2) are alleviated by Ruby. For instance, Ruby mutexes are automatically released when their owner dies, due to how they are implemented. In pseudo Ruby code they’d look like this:
Because of how Ruby mutexes are implemented, they are automatically released when their owner dies.

The pseudocode is below.
class Mutex
  def lock
    if @owner == Fiber.current
      raise ThreadError, "deadlock; recursive locking"
    end

    while @owner&.alive?
      sleep(1)
    end

    @owner = Fiber.current
  end
end
Of course in reality they’re not sleeping in a loop to wait, they use a much more efficient way to block, but it’s to give you the general idea. The important point is that Ruby mutexes keep a reference to the fiber (hence thread) that acquired the lock, and automatically ignore it if it’s dead. Hence upon fork, all mutexes held by the background thread are immediately released, which avoids most deadlock scenarios.
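The important point is that a Ruby mutex keeps a reference to the fiber that acquired the lock,

and automatically ignores that owner if it is dead.

=> In terms of the pseudocode, lock sleeps and waits only while the owning fiber is alive.
So after a fork, every mutex held by a background thread is immediately released, which avoids most deadlock scenarios.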
It’s not perfect of course, if a thread died while holding a mutex, it’s very possible that it left the resource that was protected by the mutex in an inconsistent state, in practice however I’ve never experienced something like that, granted it’s likely because the existence of the GVL somewhat reduces the need for mutexes.
It is not perfect, of course: if a thread dies while holding a mutex, the resource protected by that mutex may be left in an inconsistent state.
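
The behaviour described above can be observed with a short sketch: a background thread holds a Mutex when the process forks, and the child can still take the lock because the owning thread does not exist there. This reflects how MRI behaves as far as I understand it; other Ruby implementations may differ.

```ruby
mutex = Mutex.new
locked = Queue.new

holder = Thread.new do
  mutex.synchronize do
    locked << true   # tell the main thread the lock is held
    sleep            # keep holding the lock
  end
end
locked.pop           # wait until the background thread owns the mutex

reader, writer = IO.pipe
pid = Process.fork do
  reader.close
  # The owner thread was not copied into the child, so the mutex counts as free here.
  writer.puts(mutex.try_lock)
  writer.close
  exit!(0)
end
writer.close
child_got_lock = reader.read.strip == "true"
reader.close
Process.wait(pid)

parent_got_lock = mutex.try_lock  # the holder thread is still alive in the parent
holder.kill
holder.join
```

Compare this with the C program above, where the same situation deadlocks the child.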

dak2

While I never got hard proof of it, I suspect this was happening to some Ruby users because from my understanding, glibc’s getaddrinfo(3), which Ruby uses to resolve host names, does use a global mutex, and Ruby calls it with the GVL released, allowing for a fork to happen concurrently.
To prevent this, I added another lock inside MRI, to prevent Process.fork from happening while a getaddrinfo(3) call is ongoing. This is far from perfect, but given how much Ruby relies on Process.fork, that seemed like a sensible thing to do.
getaddrinfo(3), which resolves host names, uses a global mutex, and Ruby calls it with the GVL released, so a fork can happen concurrently.

The fix was to prevent Process.fork from happening while a getaddrinfo(3) call is in progress.
https://man7.org/linux/man-pages/man3/getaddrinfo.3.html

dak2

So to answer the question in the title, the reason fork(2) is hated is because it doesn’t compose well, particularly in native code. If you wish to use it, you have to be extremely careful about the code you are writing and linking to. Whenever you use a library you have to make sure it won’t spawn some threads, or hold onto file descriptors, and given the choice between fork(2) and threads, most systems programmers will choose threads. They have their own pitfalls, but they compose better, and it is likely that you are calling into APIs that are using threads under the hood, so the choice is somewhat already made for you.
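fork(2) is hated because it does not compose well, particularly in native code.

=> Handling things like releasing mutexes and sockets is hard: a mutex that cannot be released leads to a deadlock, and a socket that is not released can return responses meant for someone else.
Most systems programmers would choose threads over fork(2).

Threads have their own pitfalls, but they compose better.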

dak2

Summary

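Properly closing every socket before calling fork(2) is hard
- Which is why fork(2) tends to be disliked: handling this in native code is a hassle
- fork(2) shares file descriptors such as sockets with the child process
  - If the socket is a DB connection, one socket is used by several processes, so unintended results may come back
- fork(2) duplicates only the thread that called it
  - Other threads are not duplicated, so if fork(2) happens after another thread has taken a mutex lock, the thread that would release it does not exist in the child, and the child may deadlock when it tries to take the still-locked mutex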

This scrap was closed on 2025/02/16.