Understanding the Garbage Collection Mechanism in neo-c


Even if you are not interested in neo-c itself, some of you may be interested in automatic heap deallocation, so I am writing this up.

I have rewritten this part about four times: in neo-c, comelang, comelang2, and then neo-c again. Early on, I would often believe I had written a perfect algorithm, only to be devastated when valgrind reported memory leaks. I rewrote it over and over and came close to giving up, thinking it was impossible, but I finally got it working. In the current neo-c, I think leaks rarely occur if you write code in the usual way, and when they do, they are easy to debug because there is a memory leak detection system.

Now, as for the core algorithm, roughly speaking, it works as follows:

  1. Heap rvalues are added to a list.

The return value of new, or of any other function that returns heap memory, is treated as a heap rvalue and added to a list of temporaries that may need to be freed. At this point the reference count of the new heap object is usually 0. Even if it is 1 or greater, the object is still added to the rvalue list.

  2. When a heap is assigned to a variable or field, its reference count is incremented by 1 and it is removed from the rvalue list.

The heap value previously stored in the variable or field has its reference count decremented by 1, and is freed if the count reaches 0. If nothing had been assigned yet, the old value is NULL, because neo-c zero-initializes variables as Java does, so the free is a no-op (come_free checks for NULL and ignores it).
After that, the heap being assigned has its reference count incremented by 1 and is removed from the rvalue list.

  3. At the end of a statement, rvalues in the list with a reference count of 0 are freed.

This is where temporary objects (rvalues) are cleaned up. Anything that was assigned to a variable or field has a reference count of 1 or greater, so it is not freed.

  4. At the end of a block, at the end of a function, or at a break or return, variables reach the end of their lifetime, so the reference count of the heap bound to each variable is decremented by 1, and the heap is freed if the count reaches 0.

neo-c automatically generates a finalizer for struct memory, so when a struct is freed its fields are released recursively as well. Even if the same heap has been assigned to two or more variables, each variable decrements the count by 1 when it goes out of scope, so the count eventually reaches 0 and the heap is freed.
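The four steps can be sketched in plain C roughly as follows. This is only an illustration under assumptions, not neo-c's actual runtime: the names Obj, obj_new, obj_assign, obj_release, and end_of_statement are all hypothetical, and the sketch increments the new value before releasing the old one so that self-assignment stays safe, which may differ from the order neo-c uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not neo-c's real runtime. */
typedef struct Obj {
    int refcount;            /* number of variables/fields holding this heap */
    int in_rvalue_list;      /* 1 while still registered as a temporary */
    struct Obj *next_rvalue; /* link in the rvalue list */
} Obj;

static Obj *rvalue_list = NULL; /* temporaries that might be freed */
static int live_objects = 0;    /* leak counter for this sketch only */

/* Step 1: a freshly allocated heap is an rvalue with refcount 0. */
static Obj *obj_new(void) {
    Obj *o = calloc(1, sizeof *o);
    if (!o) abort();
    o->in_rvalue_list = 1;
    o->next_rvalue = rvalue_list;
    rvalue_list = o;
    live_objects++;
    return o;
}

static void rvalue_remove(Obj *o) {
    for (Obj **p = &rvalue_list; *p; p = &(*p)->next_rvalue) {
        if (*p == o) { *p = o->next_rvalue; break; }
    }
    o->in_rvalue_list = 0;
    o->next_rvalue = NULL;
}

/* Decrement and free at zero. NULL is ignored. An object that is still
   in the rvalue list is left for the end-of-statement sweep. */
static void obj_release(Obj *o) {
    if (!o) return;
    if (--o->refcount == 0 && !o->in_rvalue_list) {
        free(o);
        live_objects--;
    }
}

/* Step 2: assignment to a variable or field. */
static void obj_assign(Obj **slot, Obj *value) {
    Obj *old = *slot;
    if (value) {
        value->refcount++;
        if (value->in_rvalue_list) rvalue_remove(value);
    }
    *slot = value;
    obj_release(old); /* old is NULL for a zero-initialized slot */
}

/* Step 3: free temporaries nobody kept, then clear the list. */
static void end_of_statement(void) {
    while (rvalue_list) {
        Obj *o = rvalue_list;
        rvalue_list = o->next_rvalue;
        o->in_rvalue_list = 0;
        o->next_rvalue = NULL;
        if (o->refcount == 0) { free(o); live_objects--; }
    }
}

/* Walks through three statements and a scope exit (step 4). */
static int demo(void) {
    Obj *a = NULL;             /* zero-initialized variable */
    obj_assign(&a, obj_new()); /* statement 1: a = new object */
    end_of_statement();
    assert(a->refcount == 1);  /* kept alive by the variable */
    (void)obj_new();           /* statement 2: unused temporary */
    end_of_statement();        /* the temporary is freed here */
    obj_assign(&a, obj_new()); /* statement 3: reassign, old heap freed */
    end_of_statement();
    obj_release(a);            /* step 4: variable goes out of scope */
    return live_objects;       /* 0 means nothing leaked */
}
```

Calling demo() from a main function and checking that it returns 0 is a quick way to confirm that every allocation in the walkthrough was freed.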

In short, that is the mechanism. I chose a reference-counting GC rather than a purely automatic free because lifetimes become complicated when there are multiple references to the same heap. This mechanism is easier for users than introducing the concept of ownership. However, there is the problem of circular references. neo-c does issue a warning, and circular references rarely occur when writing code in the standard way. I have used neo-c for its own self-hosting compiler, vin (a vi clone), zed (a text processing interpreter), shsh (a shell), mf (a file manager), webweb (a web server), dbdb (a database server), and minux9 (a RISC-V UNIX-like OS), and I have never run into such problems. Perhaps that is because I unconsciously write code with ownership in mind.
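To make the circular reference problem concrete, here is a minimal reference-counting sketch in C. It is not neo-c code: Node, node_retain, node_release, and cycle_demo are hypothetical names, and the rvalue list is left out to keep the focus on the cycle. When two nodes point at each other, releasing both variables leaves each count at 1, so neither node is freed until the cycle is broken by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; a bare reference count without the rvalue list. */
typedef struct Node {
    int refcount;
    struct Node *other; /* counted reference to another node */
} Node;

static int live_nodes = 0;

static Node *node_new(void) {
    Node *n = calloc(1, sizeof *n);
    if (!n) abort();
    live_nodes++;
    return n;
}

static Node *node_retain(Node *n) {
    if (n) n->refcount++;
    return n;
}

static void node_release(Node *n) {
    if (!n) return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {
        node_release(n->other); /* finalizer: release fields recursively */
        free(n);
        live_nodes--;
    }
}

/* Returns how many nodes were still alive after both variables
   went out of scope. */
static int cycle_demo(int make_cycle) {
    Node *a = node_retain(node_new());
    Node *b = node_retain(node_new());
    a->other = node_retain(b);                 /* a -> b */
    if (make_cycle) b->other = node_retain(a); /* b -> a closes the cycle */

    node_release(a); /* end of scope for variable a */
    node_release(b); /* end of scope for variable b */
    int leaked = live_nodes;

    if (make_cycle) {       /* both nodes are still alive here */
        Node *t = a->other; /* break the cycle by hand */
        a->other = NULL;
        node_release(t);    /* b reaches 0, which then releases a */
    }
    return leaked;
}
```

With make_cycle set to 0 the function reports no surviving nodes; with make_cycle set to 1 it reports 2, which is the kind of leak a tool such as valgrind would flag if the cycle were never broken.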

Since people who have no intention of using the language might still be interested in the mechanism, I decided to write about it.
In fact, by combining this algorithm with the attributes that C headers already attach to functions returning heap memory (the declarations of allocation functions carry such a mark), it might be possible to implement automatic heap freeing for ordinary C without introducing many special concepts.
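As an illustration of what such a mark can look like, the sketch below assumes it is something like the GCC/Clang malloc function attribute, which is a compiler extension rather than part of ISO C. HEAP_RETURNING and dup_string are hypothetical names made up for this example. A tool that wanted to apply the algorithm to ordinary C could, in principle, treat calls to functions declared this way as heap rvalues.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HEAP_RETURNING is a hypothetical macro name for this sketch. */
#if defined(__GNUC__) || defined(__clang__)
#define HEAP_RETURNING __attribute__((malloc))
#else
#define HEAP_RETURNING /* no marker available on this compiler */
#endif

/* The declaration carries the mark: the result is fresh heap memory. */
HEAP_RETURNING char *dup_string(const char *s);

char *dup_string(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1; /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;              /* caller is responsible for free() */
}
```

The attribute itself does not free anything; it only tells the compiler, and any tool reading the declaration, that the function returns newly allocated memory.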

As a side note, a major feature of neo-c is that it outputs standard C. Because of this, its programs can even run on microcontrollers (map, list, regex, and so on are also written in standard C, so they work there too). I used to target LLVM, but I don't think emitting C instead makes much of a difference. LLVM's assembly language (LLVM IR) is fairly high-level and not very different from C, so the result would be much the same whichever target is used. One could also consider outputting C++, since many recent microcontrollers use C++. In fact, outputting only C makes some microcontrollers (such as Arduino) unusable.

I believe transpilers to C are a technology with huge potential. Just as C compiles to assembly, other high-level languages can offer better usability while still producing efficient code by outputting C.

However, now that tools like Codex and Claude Code write C code themselves, usability matters less and less. After all, an AI will surely free the heap properly. Perhaps what I gained from this project is a deeper understanding of the C language itself.

Discussion

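SusanCalvin

It's been a while. I'm relieved to see you seem to be doing well.

ab25cq

It's been a while. I'm getting by. Lately Codex and Claude Code have been evolving so quickly that I spend less time writing code in front of a PC. I do it on my smartphone instead. Now that I have more free time, I cook and go shopping. I have also been studying mathematics.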
