iTranslated by AI
Basics of Exclusive Control
Introduction
There are many resources in a system that must not be accessed simultaneously. A familiar example is the database for Ubuntu's package management system. This database is updated by the operation of the apt command, but if two or more apt processes were allowed to run at the same time, the database could be corrupted, leading to a critical system failure.
To avoid such problems, there is a mechanism called "exclusive control" (mutual exclusion) that ensures only one process can access a certain resource at a time. Exclusive control is one of the essential functions provided by an OS.
Cases Requiring Exclusive Control
Exclusive control can be counter-intuitive and quite difficult to understand, so I will explain it here using a relatively simple mechanism called "file locking." For the explanation, we will use a simple program called inc that reads the contents of a file, adds 1 to the number written inside, and then finishes.
#!/bin/bash
TMP=$(cat count)
echo $((TMP + 1)) >count
As an initial state, assume there is a file named count with "0" written in it.
$ cat count
0
In this state, let's check the contents of the count file after calling the inc program.
$ ./inc
$ cat count
1
$
It might seem obvious, but the content of the count file increased by one from 0 to 1. Now, let's reset the content of the count file back to 0 and run the inc program 10,000 times.
$ echo 0 >count
$ for ((i=0;i<10000;i++)) ; do ./inc ; done
$ cat count
10000
$
As expected, the content of the count file became 10,000.
Now for the main point. Let's try running the inc program in parallel using inc & and see what happens.
$ for ((i=0;i<10000;i++)) ; do ./inc & done
...
$ cat count
57
$
The expected value is 10,000, but the result is completely different: 57. This happens because multiple inc programs are running in parallel, and the following scenario can occur:
- Program A reads 0 from the
countfile. - Program B reads 0 from the
countfile. - Program A writes 1 to the
countfile. - Program B writes 1 to the
countfile.
While this is just an experimental program, it's enough to be surprising; however, if a similar problem occurred in a bank system's process handling your savings balance, it would be chilling.
To avoid such problems, the series of processes—reading the value of count, adding 1, and writing that value back to the count file—must be executed by only one inc program at a time. Exclusive control is what makes this possible.
Intuitive Implementation of Exclusive Control
In the context of exclusive control, the aforementioned series of operations is called a "critical section." Executing the operations within a critical section all at once, without letting other processes interrupt, is called an "atomic operation."
To achieve exclusive control in the inc program, what if we represented whether a process is already in the critical section based on the existence of a file named lock? The inc-wrong-lock program implements this idea.
#!/bin/bash
while : ; do
if [ ! -e lock ] ; then
break
fi
done
touch lock
TMP=$(cat count)
echo $((TMP + 1)) >count
rm -f lock
As you can see, before the original processing of the inc program, it checks for the existence of the lock file. If it doesn't exist, it creates the lock file to enter the critical section, and then deletes the lock file when finished. It looks like it might work.
Let's run it.
$ echo 0 >count
$ rm lock
$ for ((i=0;i<10000;i++)) ; do ./inc-wrong-lock & done
...
$ cat count
11
$
You probably guessed from the program's name, but it failed completely. Why?
The reason the inc-wrong-lock program didn't work correctly is that the following scenario can occur:
- Program A confirms there is no
lockfile and proceeds. - Program B confirms there is no
lockfile and proceeds. - Program A reads 0 from the
countfile. - Program B reads 0 from the
countfile. - And so on, just like the original
incprogram.
To avoid this problem, the series of operations—checking for the existence of the lock file and, if it doesn't exist, creating the file and proceeding—must be made into an atomic operation. It might feel like we are going in circles, but file locking is exactly what achieves this.
Correct Exclusive Control
File locking changes the status of a file to lock/unlock using system calls such as flock() or fcntl(). Specifically, it executes the following processes atomically:
- Check if the file is locked.
- If it is locked, make the system call fail.
- If it is not locked, lock it and make the system call succeed.
I won't explain how to use system calls here, but if you're interested, you can check man 2 flock or read the descriptions of F_SETLK and F_GETLK in man 2 fcntl.
The file locking mechanism can also be used from shell scripts via the flock command. It's easy to use; as shown in the inc-lock program below, it executes the program specified in the second argument while holding a lock on the file specified in the first argument.
#!/bin/bash
flock lock ./inc
Now, let's run 10,000 instances of the inc-lock program in parallel.
$ echo 0 >count
$ touch lock
$ for ((i=0;i<10000;i++)) ./inc-lock & done
...
$ cat count
10000
$
It finally worked.
As mentioned earlier, exclusive control is very complex, but if you read the content of this article repeatedly or try writing out the execution flow yourself, you should eventually be able to understand it.
How is File Locking Implemented?
In the previous section, I mentioned that file locking is one way to achieve exclusive control. But how is file locking implemented? In fact, this is usually not achieved at the level of a high-level language like C, but at the level of the machine language layer.
Suppose you wrote the following virtual assembly instruction sequence to implement a lock.
start:
load r0 mem # (1) Read memory at address 'mem' and store it in register 'r0'. A value of 1 in 'mem' means it's locked; 0 means it's not.
test r0 # (2) Check if r0 is 0 or something else.
jmpz enter # (3) If r0 is 0 (unlocked), jump to the label 'enter'.
jmp start # (4) If r0 is not 0 (locked), return to the label 'start'.
enter:
store mem 1 # (5) Write 1 to 'mem'. This performs the lock.
...
<critical section>
...
store mem 0 # (6) Write 0 to 'mem' to unlock.
Is this enough? Not quite. If two processes execute (1) simultaneously, both will judge that they can enter the critical section. This happens because the sequence of operations from (1) to (5) is not atomic.
To solve this problem, many CPU architectures provide instructions that execute operations equivalent to (1) through (5) atomically. If you are interested, try searching for keywords like "compare and exchange" or "compare and swap."
When I first wrote this, I stated that "the above CPU instructions are essential to achieve exclusive control" without the following note, but I have corrected it thanks to a suggestion from kumagi-san. Thank you very much.
There are methods to achieve exclusive control at the high-level language level, but they are less efficient in terms of time and space than the CPU instructions mentioned above. If you are interested, try searching for "Peterson's algorithm."
Discussion
アセンブリ言語のところですが、 (3) の説明は
outではなく、enterでしょうか?ご指摘ありがとうございました。修正しました