iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
👓

Learning CPython Memory Management Slowly: Mechanisms and How to Observe Memory Growth

に公開

Introduction

Python is a programming language that allows you to write code without worrying much about memory, unlike C or C++.
Also, with the increasing performance of modern machines and rising labor costs, we no longer live in an era where "one byte of memory is a drop of blood," and thus, people are less conscious of memory usage than before.
However, if you completely ignore memory, you might end up paying the price at the most critical moments.

In this article, we will examine an overview of how memory is managed in CPython and its observation methods.
The sample code and behavior described in this article include results verified using CPython 3.14 on an Intel Mac with GIL disabled, as well as information gathered through debug execution.
Therefore, please note that results may vary depending on the CPU, CPython version, or build configuration.

Intended Readers and Prerequisites
This article assumes the following readers:

  • People who want to grasp the overview of Python's memory management
  • People who want to know how to observe Python's memory
    • If you want to skip all the details and just perform the observation, please refer to How to observe memory.
  • People who are interested in the internal structure of CPython

Also, knowledge of the following level is assumed:

  • Able to read and write Python programs
  • Able to check whether the Python version you are using has GIL enabled or disabled
  • Understanding of general terms such as garbage collection and reference counters
  • Desirable to be able to read C code

CPython Memory Allocation

CPython's memory management heavily relies on dynamic memory allocation.
In CPython, there are three categories of memory allocation APIs (allocator domains), each with a different role.

  • Raw domain
  • Mem domain
  • Object domain

The Raw domain is used when allocation needs to be delegated to the system allocator or when memory allocation needs to be performed even when no thread state is associated.
A representative API is PyMem_RawMalloc.

The Mem domain is used when Python buffers that need to be allocated in a state associated with a thread state and general-purpose memory buffers are required.
A representative API is PyMem_Malloc.

The Object domain is used for the purpose of memory allocation for Python objects.
A representative API is PyObject_Malloc.

Small-Size Memory Allocation

When allocating small-sized memory (512 bytes or less[1]) in the Mem or Object domains, concepts of arenas, pools, and blocks exist to reduce the frequency of system allocator calls.

Memory acquisition from the OS is performed in units of arenas.
An arena is divided into units called pools.
Furthermore, a pool consists of blocks of the same size.

Image of Arena

In the case of the 64-bit version of Python 3.14, the sizes of arenas and pools are as follows:

  • Arena: 1 MiB units[2]
  • Pool: 16 KiB units (64 pools per arena)[3]

Regarding blocks, the size of the block and which pool it is placed in are determined according to the number of requested bytes.[4]
Since CPython uses a method to allocate a block close to the requested size, if 9 bytes are requested, a 16-byte block will be allocated.

Management of Object Lifetime

Python objects allocated in the Object domain manage their lifetime using reference counting.
However, in the following cases, circular references occur, so reference counting alone cannot properly destroy the objects.

cycle_objects = [1,2,3]
cycle_objects.append(cycle_objects)

To destroy objects where such circular references have occurred, circular garbage collection is also supported.

Reference Counter

The basic concepts are as follows:

  • Every time an object is referenced, its counter increases.
  • When a reference to an object is released, the counter is decremented (e.g., when a variable is deleted via del).
  • When the counter reaches 0, the memory is released.

How to check the reference counter

There are two simple ways to observe the reference counter of a specific object from a Python program. [5]

  • Observing the reference counter via sys.getrefcount
  • Observing the reference counter using the internal structure of PyObject
Observing reference counters with sys.getrefcount

The reference counter of a specific object can be observed using sys.getrefcount.
When observing the reference counter with sys.getrefcount, the act of executing sys.getrefcount itself may affect the reference counter.
This is because passing an object as an argument to sys.getrefcount creates a reference to that object.
This is mentioned in the documentation for sys.getrefcount.

The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().

If that were all, you could simply subtract one from the observed reference counter. However, in CPython 3.14, it does not always result in that. There are cases where it affects the reference counter and cases where it can be observed without affecting it.

The following code results in 2 in an environment where the GIL is enabled in CPython 3.14, as it affects the reference counter.

import sys
a = int("99999")
print("sys.getrefcount(a):", sys.getrefcount(a))

On the other hand, the following code results in 1 without affecting the reference counter when using CPython 3.14 in my environment. (In CPython 3.13, it results in 2, and the results may vary in different environments.)

import sys
def test1():
    a = int("99999")
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
test1()

The reason for this is that there are cases where a reference is added and cases where it is not when passing an object to sys.getrefcount in the bytecode.
In any case, when measuring the reference counter with sys.getrefcount, pay attention to the Python version and the way it is called.

Why are there cases in the Python 3.14 test1 example where the reference counter is not affected?

To resolve this question, we need to look at the bytecode.

If you are not familiar with Python bytecode, please check the following article:
Deciphering CPython 3.13 Bytecode Through Experiments

First, let's check the bytecode when executing sys.getrefcount using the following method.

import dis
src = """
import sys
a = int("99999")
print("sys.getrefcount(a):", sys.getrefcount(a))
"""

code = compile(src, "<module>", "exec")

print("=== co_consts ===")
print(code.co_consts)

print("\n=== dis ===")
dis.dis(code)

print("\n=== run ===")
ns = {}
exec(code, ns, ns)

This bytecode is as follows:

=== dis ===
  0           RESUME                   0

  2           LOAD_SMALL_INT           0
              LOAD_CONST               1 (None)
              IMPORT_NAME              0 (sys)
              STORE_NAME               0 (sys)

  3           LOAD_NAME                1 (int)
              PUSH_NULL
              LOAD_CONST               2 ('99999')
              CALL                     1
              STORE_NAME               2 (a)

  4           LOAD_NAME                3 (print)
              PUSH_NULL
              LOAD_CONST               3 ('sys.getrefcount(a):')
              LOAD_NAME                0 (sys)
              LOAD_ATTR                8 (getrefcount)
              PUSH_NULL
              LOAD_NAME                2 (a)
              CALL                     1
              CALL                     2
              POP_TOP
              LOAD_CONST               1 (None)
              RETURN_VALUE

In this case, you can see that LOAD_NAME is used when passing variable a to sys.getrefcount. As a result, the reference to object a increases at the point it is passed to getrefcount.

Next, let's check the bytecode for the test1 function in Python 3.14.

import dis
dis.dis(test1)
 21           RESUME                   0

 22           LOAD_GLOBAL              1 (int + NULL)
              LOAD_CONST               0 ('99999')
              CALL                     1
              STORE_FAST               0 (a)

 23           LOAD_GLOBAL              3 (print + NULL)
              LOAD_CONST               1 ('id(a):')
              LOAD_GLOBAL              5 (id + NULL)
              LOAD_FAST_BORROW         0 (a)
              CALL                     1
              CALL                     2
              POP_TOP

 24           LOAD_GLOBAL              3 (print + NULL)
              LOAD_CONST               2 ('sys.getrefcount(a):')
              LOAD_GLOBAL              6 (sys)
              LOAD_ATTR                8 (getrefcount)
              PUSH_NULL
              LOAD_FAST_BORROW         0 (a)
              CALL                     1
              CALL                     2
              POP_TOP
              LOAD_CONST               3 (None)
              RETURN_VALUE

You can see that LOAD_FAST_BORROW is used when passing variable a to sys.getrefcount.

LOAD_FAST_BORROW is an opcode added in 3.14 to reduce reference counting overhead. This instruction utilizes a borrowed reference. This refers to a reference where the code using the object "borrows" it without incrementing the reference count.

Note that because it becomes LOAD_FAST_BORROW as a result of optimization, there are cases where it might not become LOAD_FAST_BORROW even in the same version depending on the CPython build configuration or the application of optimizations.

Also, you can see that in Python 3.13, it is LOAD_FAST instead of LOAD_FAST_BORROW.

Observing Reference Counters Using the Internal Structure of PyObject

While sys.getrefcount is an easy way to check an object's reference counter, as mentioned above, the act of observation itself can sometimes affect the reference counter.
Next, we will introduce a method to observe the reference counter using the internal structure of PyObject, referencing the implementation of sys.getrefcount.

Implementation of getrefcount

The implementation of sys.getrefcount in CPython is as follows:

Python-3.14.2/Include/refcount.h

    static inline Py_ssize_t _Py_REFCNT(PyObject *ob) {
    #if !defined(Py_GIL_DISABLED)
        return ob->ob_refcnt;
    #else
        uint32_t local = _Py_atomic_load_uint32_relaxed(&ob->ob_ref_local);
        if (local == _Py_IMMORTAL_REFCNT_LOCAL) {
            return _Py_IMMORTAL_INITIAL_REFCNT;
        }
        Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&ob->ob_ref_shared);
        return _Py_STATIC_CAST(Py_ssize_t, local) +
               Py_ARITHMETIC_RIGHT_SHIFT(Py_ssize_t, shared, _Py_REF_SHARED_SHIFT);
    #endif
    }
  • If the GIL is enabled, it uses ob_refcnt; otherwise, ob_ref_local + ob_ref_shared >> 2 becomes the reference counter.
  • _Py_IMMORTAL_REFCNT_LOCAL will be explained in the "Immortal Objects" section below.

This implementation serves as a useful reference when observing the reference counter on your own.

Python supports objects of various types. All these types have a PyObject, and it is possible to observe the reference counter from its fields.

When the GIL is enabled, you can check the reference counter by looking at the ob_refcnt of the PyObject. Below is a sample to obtain the ob_refcnt of the PyObject from a specified object in Python.

import sysconfig, sys
import dis
import ctypes

class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]
def inspect_base(addr):
    # Cast the specified pointer to the PyObject type
    obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
    ob_refcnt = obj.ob_refcnt
    ob_type = obj.ob_type
    print(f"Address: {addr}, Refcount: {ob_refcnt}, Type Address: {ob_type}")

def test1():
    a = int("99999")
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
    inspect_base(id(a))
test1()

For more details on PyObject and id, please refer to How are Python objects represented in memory? Part 1.

How to observe reference counters in the free-threaded version

In Python 3.14, a free-threaded version of the binary exists. In the free-threaded version, the structure of PyObject is different.

struct _object {
    // ob_tid stores the thread id (or zero). It is also used by the GC and the
    // trashcan mechanism as a linked list pointer and by the GC to store the
    // computed "gc_refs" refcount.
    uintptr_t ob_tid;
    uint16_t _padding;
    PyMutex ob_mutex;           // per-object lock
    uint8_t ob_gc_bits;         // gc-related state
    uint32_t ob_ref_local;      // local reference count
    Py_ssize_t ob_ref_shared;   // shared (atomic) reference count
    PyTypeObject *ob_type;
};

As you can see, ob_refcnt does not exist, and the reference counter is expressed by combining ob_ref_local and ob_ref_shared.
For details, refer to PEP 703 – Making the Global Interpreter Lock Optional in CPython - Reference Counting.

The inspect_base compatible with free-threading is as follows:

class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_tid", ctypes.c_void_p),
        ("_padding", ctypes.c_uint16),
        ("ob_mutex", ctypes.c_uint8),
        ("ob_gc_bits", ctypes.c_uint8),
        ("ob_ref_local", ctypes.c_uint32),
        ("ob_ref_shared", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]
def inspect_base(addr):
    # Cast the specified pointer to the PyObject type
    obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
    ob_ref_local = obj.ob_ref_local
    ob_ref_shared = obj.ob_ref_shared
    ob_type = obj.ob_type
    UINT32_MAX = (1 << 32) - 1
    if ob_ref_local == UINT32_MAX:
        ref_cnt = 3 << 30 # _Py_IMMORTAL_INITIAL_REFCNT
    else:
        ref_cnt = ob_ref_local + (ob_ref_shared >> 2)
    
    print(f"Address: {addr}, ob_ref_shared: {ob_ref_shared},  ob_ref_local: {ob_ref_local},ref_cnt:{ref_cnt}, Type Address: {ob_type}")

This method allows for observation without affecting the reference counter, but please note that because it requires low-level operations, it may stop working due to differences in Python versions or environment configurations.

Reference Counting Observation Examples

Here, we will actually observe the reference counts in various cases.

Cases of General Literals

The reference counts of variables using literals result in complex outcomes because references that Python programmers are not aware of are generated automatically.

For example, let's look at the following case:

print('============================ test3')
def test3():
    a = 99999
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
    inspect_base(id(a))
    print("test3.__code__.co_consts:", test3.__code__.co_consts)

test3() # Refcount: 3

When running this sample in Python 3.14, based on the previous examples, one might guess that the reference count would be 1.
However, in reality, it becomes 2 or 3.

Even when running the same code on the same Python 3.14, the reference count changes depending on whether it was executed as a script file or interactively.
If executed from a script file, the reference count of the object created using the literal becomes 3.
In that case, it is referenced from the following two locations in addition to the variable itself:

  • The tuple of constants used in the bytecode (test3.__code__.co_consts)
  • The literal stored during the compilation of the .py file

In Python, the tuple of constants used in the bytecode can be checked via function_name.__code__.co_consts, but it is difficult to verify the literal stored during the compilation of the .py file.
It is necessary to use techniques like CPython's debug execution to observe where it is being referenced from.

Investigation into where it is created and where it is referenced from

Let's actually check where the literal stored during the compilation of the .py file is created and referenced from.

Python files are executed by the following function:

3.14/Python/pythonrun.c/pyrun_file

static PyObject *
pyrun_file(FILE *fp, PyObject *filename, int start, PyObject *globals,
           PyObject *locals, int closeit, PyCompilerFlags *flags)
{
    PyArena *arena = _PyArena_New();
    if (arena == NULL) {
        return NULL;
    }

    mod_ty mod;
    mod = _PyParser_ASTFromFile(fp, filename, NULL, start, NULL, NULL,
                                flags, NULL, arena);

    if (closeit) {
        fclose(fp);
    }

    PyObject *ret;
    if (mod != NULL) {
        // The script is executed here, and the reference count is observed within it.
        // In other words, it is observed before the arena is released by _PyArena_Free.
        ret = run_mod(mod, filename, globals, locals, flags, arena, NULL, 0);
    }
    else {
        ret = NULL;
    }
    _PyArena_Free(arena);

    return ret;
}

When _PyParser_ASTFromFile is executed, the syntax is parsed.
In _PyPegen_number_token, which is executed during that process, it can be confirmed that the parsed literal is stored in the a_objects list of PyArena.

3.14/Parser/pegen.c

expr_ty
_PyPegen_number_token(Parser *p)
{
    Token *t = _PyPegen_expect_token(p, NUMBER);
    if (t == NULL) {
        return NULL;
    }

    const char *num_raw = PyBytes_AsString(t->bytes);
    if (num_raw == NULL) {
        p->error_indicator = 1;
        return NULL;
    }

    if (p->feature_version < 6 && strchr(num_raw, '_') != NULL) {
        p->error_indicator = 1;
        return RAISE_SYNTAX_ERROR("Underscores in numeric literals are only supported "
                                  "in Python 3.6 and greater");
    }

    PyObject *c = parsenumber(num_raw);

    if (c == NULL) {
        p->error_indicator = 1;
        PyThreadState *tstate = _PyThreadState_GET();
        // The only way a ValueError should happen in _this_ code is via
        // PyLong_FromString hitting a length limit.
        if (tstate->current_exception != NULL &&
            Py_TYPE(tstate->current_exception) == (PyTypeObject *)PyExc_ValueError
        ) {
            PyObject *exc = PyErr_GetRaisedException();
            /* Intentionally omitting columns to avoid a wall of 1000s of '^'s
             * on the error message. Nobody is going to overlook their huge
             * numeric literal once given the line. */
            RAISE_ERROR_KNOWN_LOCATION(
                p, PyExc_SyntaxError,
                t->lineno, -1 /* col_offset */,
                t->end_lineno, -1 /* end_col_offset */,
                "%S - Consider hexadecimal for huge integer literals "
                "to avoid decimal conversion limits.",
                exc);
            Py_DECREF(exc);
        }
        return NULL;
    }
    // **Note**: 'c' is the literal object, and it is being added to p->arena.a_objects
    if (_PyArena_AddPyObject(p->arena, c) < 0) {
        Py_DECREF(c);
        p->error_indicator = 1;
        return NULL;
    }

    return _PyAST_Constant(c, NULL, t->lineno, t->col_offset, t->end_lineno,
                           t->end_col_offset, p->arena);
}

Note that PyArena here is a different concept from the arena used for allocating small chunks of memory; PyArena is the following structure:

3.14/Include/internal/pycore_pyarena.h

typedef struct _arena PyArena;

3.14/Python/pyarena.c

struct _arena {
    /* Pointer to the first block allocated for the arena. Never NULL.
       This pointer is used only to find the first block when the arena is freed.
     */
    block *a_head;

    /* Pointer to the block currently being used for allocation.
       The ab_next field of this block should be NULL.
       If ab_next is not NULL after a block_alloc() call,
       it means a new block was allocated, and a_cur must be
       reset to point to that block.
     */
    block *a_cur;

    /* A Python list object containing references to all PyObject pointers
       associated with this arena.
       These are DECREF'd when the arena is freed.
     */
    PyObject *a_objects;

#if defined(Py_DEBUG)
    /* Debug output */
    size_t total_allocs;
    size_t total_size;
    size_t total_blocks;
    size_t total_block_size;
    size_t total_big_blocks;
#endif
};

However, this content may change depending on the environment being verified.
If you suspect it is being referenced from somewhere different than the above, you might be able to grasp the creation source or reference source using the following methods.

Hints for investigation via debug execution:

  • Set a conditional breakpoint at PyLong_FromLong in 3.14.2/Objects/longobject.c to find out what the pointer of the expected literal will be. Also check the caller.
  • Set a conditional breakpoint at Py_INCREF in 3.14.2/Include/refcount.h to find out where the expected pointer is called from.

Investigation using --with-trace-refs:

By providing --with-trace-refs when building CPython, sys.getobjects becomes available.
This makes it possible to enumerate currently dynamically allocated Python objects.

import sys
# Investigate objects while some code is running
sys.getobjects(200000)

How to execute

PYTHONDUMPREFS=1 /opt/python-3.14.2-pydebug/bin/python3.14d my_script.py

While this doesn't directly tell you who created the object, it provides hints on where to look within CPython.

Note that the IDs of variables referencing the same literal will all be the same.
This can be verified with the following code:

def test5():
    a = 99997
    print(id(a))
    print(sys.getrefcount(a))
def test6():
    a = 99997
    b = 99997
    print(id(a))
    print(id(b))
    print(sys.getrefcount(a))
    print(sys.getrefcount(b))

test5()
test6()

From these results, it can be confirmed that variable a in test5, and variable a and variable b in test6 all point to the same ID.

4464648304
3
4464648304
4464648304
4
4
Literals for Small Numbers or Small Characters

When looking at the reference counter of variables storing small numbers or small character literals, there are cases where the value becomes extremely large.

def test7():
    a = 256
    b = 257
    c = "h"
    d = "hello"
    print(f"id(a): {id(a)} getrefcount(a): {sys.getrefcount(a)} _is_immortal:{sys._is_immortal(a)}")
    print(f"id(b): {id(b)} getrefcount(b): {sys.getrefcount(b)} _is_immortal:{sys._is_immortal(b)}")
    print(f"id(c): {id(c)} getrefcount(c): {sys.getrefcount(c)} _is_immortal:{sys._is_immortal(c)}")
    print(f"id(d): {id(d)} getrefcount(d): {sys.getrefcount(d)} _is_immortal:{sys._is_immortal(d)}")
test7()
id(a): 4365499168 getrefcount(a): 3221225472 _is_immortal:True
id(b): 4357021520 getrefcount(b): 3 _is_immortal:False
id(c): 4365558032 getrefcount(c): 3221225472 _is_immortal:True
id(d): 4357388880 getrefcount(d): 3 _is_immortal:False

Since Python 3.12, the concept of Immortal Objects has been added.
In CPython, small integers from -5 to 256 and small characters are registered as Immortal Objects and shared globally.
The value 3221225472 (3 << 30) appearing in the reference counter represents the reference count for Immortal Objects. This value is defined as _Py_IMMORTAL_INITIAL_REFCNT in CPython.

For more details regarding Immortal Objects, please refer to the following blog post:

Processing That Increases or Decreases Reference Counts

We will verify that reference counters increase and decrease with a simple example.
In the following example, we confirm that the reference counter increases upon object reference and decreases when the reference is released. Finally, we attempt to observe that the reference counter can no longer be accurately measured once the object is destroyed.

Since sys.getrefcount cannot be used for observation after object destruction, we will try to observe the reference count by passing the object's ID to the inspect_base function we created earlier.
After the object is destroyed, the value of the reference counter becomes undefined, so the number itself is meaningless. (We just want to confirm it has changed from the previous observation.)

def test_add_ref1():
    a = int('99900')
    addr = id(a)
    print("After variable creation", sys.getrefcount(a)) # 1
    inspect_base(addr)
    b = a
    print("After b = a", sys.getrefcount(a)) # 2
    inspect_base(addr)
    c = []
    c.append(a)
    print("After c.append(a)", sys.getrefcount(a)) # 3
    inspect_base(addr)
    del b
    print("After del b", sys.getrefcount(a)) # 2
    inspect_base(addr)
    c.clear()
    print("After c.clear()", sys.getrefcount(a)) # 1
    inspect_base(addr)
    return id(a)

addr = test_add_ref1()
print("After function completion")
inspect_base(addr) # The object at addr should be deleted, so the memory value becomes undefined

Output

After variable creation 1
Address: 4583176048, Refcount: 1, Type Address: 4548738440
After b = a 2
Address: 4583176048, Refcount: 2, Type Address: 4548738440
After c.append(a) 3
Address: 4583176048, Refcount: 3, Type Address: 4548738440
After del b 2
Address: 4583176048, Refcount: 2, Type Address: 4548738440
After c.clear() 1
Address: 4583176048, Refcount: 1, Type Address: 4548738440
After function completion
Address: 4583176048, Refcount: 0, Type Address: 4548738440

In this example, you can see the increase in the reference counter by assigning to variables or adding to containers, and the decrease by deleting variables or removing items from containers.
Finally, by exiting the function scope, variable a is no longer used, and as a result of the reference counter reaching 0, the object is destroyed. In this measurement, the Refcount became 0 after the function ended, but that does not mean "destruction was confirmed at 0"; it is simply an example of an undefined value.

Reference Counters When Using Weak References

Simply creating a weak reference object with weakref does not increase the reference counter.
In the following example, creating a weak reference object that points to an object does not increase the reference counter of the target object (corresponds to w = weakref.ref(a) in the code).
You can confirm that while retrieving the referent of a weak reference and holding that return value in a variable, a temporary strong reference is created, thus increasing the reference counter (corresponds to obj = w() in the code).

import weakref

print('============================ test_weakref_no_incref')
def test_weakref_no_incref():
    class Box:
        pass

    a = Box()
    addr = id(a)

    print("After variable creation", sys.getrefcount(a))
    inspect_base(addr)

    # Create a weak reference (this itself does not increase a's strong reference count)
    w = weakref.ref(a)
    print("w", w)
    print("After weakref.ref(a)", sys.getrefcount(a))  # Want to confirm this doesn't increase
    inspect_base(addr)

    # A reference increases for the first time here
    obj = w()
    print("After obj = w(), reference is possible", sys.getrefcount(a)) # +1
    inspect_base(addr)

    del obj
    print("del obj", sys.getrefcount(a)) # -1
    inspect_base(addr)

    del a
    inspect_base(addr)

    # Nothing can be retrieved
    print("w() ->", w())


test_weakref_no_incref()

Results

============================ test_weakref_no_incref
After variable creation 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
w <weakref at 0x10e65b380; to 'test_weakref_no_incref.<locals>.Box' at 0x10e5bbb60>
After weakref.ref(a) 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
After obj = w(), reference is possible 2
Address: 4535860064, Refcount: 2, Type Address: 140288173605632
del obj 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
Address: 4535860064, Refcount: 0, Type Address: 140288173605632
w() -> None
Reference Counters When Circular References Occur

Let's check the behavior of reference counters when a circular reference occurs.

import gc
# Omitted
def test_add_ref2():
    a = []
    addr = id(a)
    print("Immediately after creation", sys.getrefcount(a)) # 1
    inspect_base(addr)
    a.append(a)
    print("After circular reference", sys.getrefcount(a)) # 1
    inspect_base(addr)
    del a
    print('After del a')
    inspect_base(addr)
    return addr
addr = test_add_ref2()
print('After function completion')
inspect_base(addr)
gc.collect()
print('After gc.collect()')
inspect_base(addr)  # Since the object at addr should be deleted, the memory value becomes undefined

Output

Immediately after creation 1
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After circular reference 2
Address: 4539927744, Refcount: 2, Type Address: 4548734944
After del a
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After function completion
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After gc.collect()
Address: 4539927744, Refcount: 4542561216, Type Address: 4548734944

You can see that even after deleting the variable with del a and exiting the function, the reference counter remains at 1.
The timing when this variable a is destroyed will be after the garbage collection is executed.

Circular Garbage Collection

As mentioned earlier, garbage collection is used to resolve objects with circular references.
In Python's garbage collection, unreachable objects are identified and destroyed.
Not all Python objects are targets for garbage collection; instead, objects that have the potential to create circular references (such as lists, dictionaries, instances of custom classes, etc.) are targeted.

Garbage collection occurs at the following times:

  • Manual execution: When gc.collect() is executed.
  • Automatic execution: Python records the number of object allocations and deallocations. When the value obtained by subtracting the deallocations from the allocations exceeds a specific threshold.

Objects Subject to Garbage Collection

You can check whether a Python object is a target for garbage collection using gc.is_tracked.
Objects of str or int types are not targets for garbage collection.
Objects created from dict, list, functions, or classes are targets for garbage collection.
This can be verified with the following code:

import gc
v1 = 1
print(v1, type(v1), gc.is_tracked(v1))


class MyClass:
    pass


v2 = MyClass()
print(v2, type(v2), gc.is_tracked(v2))


def func_gen():
    yield 1
    yield 2


v3 = func_gen()
print(v3, type(v3), gc.is_tracked(v3))


async def func_co():
    return


v4 = func_co()
print(v4, type(v4), gc.is_tracked(v4))

c1 = [1, 2, 3]
print(c1, type(c1), gc.is_tracked(c1))

c2 = {"a": 1}
print(c2, type(c2), gc.is_tracked(c2))

c3 = (1, 2)
# (1, 2) <class 'tuple'> True
print(c3, type(c3), gc.is_tracked(c3))

c4 = (1, [])
# (1, 2) <class 'tuple'> True
print(c4, type(c4), gc.is_tracked(c4))

gc.collect()  # Run GC

print("after :")
print(v1, type(v1), gc.is_tracked(v1))
print(v2, type(v2), gc.is_tracked(v2))
print(c1, type(c1), gc.is_tracked(c1))
print(c2, type(c2), gc.is_tracked(c2))
print(c3, type(c3), gc.is_tracked(c3))
print(c4, type(c4), gc.is_tracked(c4))

Output

1 <class 'int'> False
<__main__.MyClass object at 0x10970d940> <class '__main__.MyClass'> True
<generator object func_gen at 0x1095cab00> <class 'generator'> True
<coroutine object func_co at 0x1095ca980> <class 'coroutine'> True
[1, 2, 3] <class 'list'> True
{'a': 1} <class 'dict'> True
(1, 2) <class 'tuple'> True
(1, []) <class 'tuple'> True
after :
1 <class 'int'> False
<__main__.MyClass object at 0x10970d940> <class '__main__.MyClass'> True
[1, 2, 3] <class 'list'> True
{'a': 1} <class 'dict'> True
(1, 2) <class 'tuple'> False
(1, []) <class 'tuple'> True
<sys>:0: RuntimeWarning: coroutine 'func_co' was never awaited

What is interesting is the behavior of the tuple type.
A tuple composed only of str or int types becomes untracked by garbage collection after a garbage collection has been performed once.
This is because optimization is executed during garbage collection, where tuples consisting only of types like integers and strings are untracked.

Optimization of the dict type before Python 3.13

Before Python 3.13, gc.is_tracked({"a": 1}) would result in False.
This was a result of the same optimization mentioned above for the tuple type being performed at the time of object creation.
However, in 3.14, this optimization process was removed, so gc.is_tracked({"a": 1}) now results in True.

https://github.com/python/cpython/issues/127010

Flow of Garbage Collection

Let's examine an example of executing garbage collection on objects that have circular references, as in the code below.


import gc
import ctypes

class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]
def inspect_base(addr):
    # Cast the specified pointer to the PyObject type
    obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
    ob_refcnt = obj.ob_refcnt
    ob_type = obj.ob_type
    print(f"Address: {addr}, Refcount: {ob_refcnt}, Type Address: {ob_type}")


class Link:
   def __init__(self, next_link=None):
       self.next_link = next_link
 
def test():
    link_3 = Link()
    link_2 = Link(link_3)
    link_1 = Link(link_2)
    link_3.next_link = link_1
    A = link_1

    link_4 = Link()
    link_5 = Link(link_4)
    link_4.next_link = link_5

    addr1 = id(link_1)
    addr2 = id(link_2)
    addr3 = id(link_3)
    addr4 = id(link_4)
    addr5 = id(link_5)

    del link_1, link_2, link_3
    del link_4, link_5

    inspect_base(addr1)
    inspect_base(addr2)
    inspect_base(addr3)
    inspect_base(addr4)
    inspect_base(addr5)

    gc.collect()
    print('after gc.collect()')
    inspect_base(addr1)
    inspect_base(addr2)
    inspect_base(addr3)
    inspect_base(addr4)
    inspect_base(addr5)

    # Represent this point in a diagram
    input("stop")
    del A
    print('after del A')
    inspect_base(addr1)
    inspect_base(addr2)
    inspect_base(addr3)
    inspect_base(addr4)
    inspect_base(addr5)

    gc.collect()
    print('after gc.collect()')
    inspect_base(addr1)
    inspect_base(addr2)
    inspect_base(addr3)
    inspect_base(addr4)
    inspect_base(addr5)
test()

The following diagram illustrates the state where only variable A remains.

Here is a brief introduction to the flow of executing garbage collection and deleting unreachable objects in this situation.
When garbage collection is executed, first, the reference count of each object is copied into gc_refs.

Next, for all objects targeted by garbage collection, the gc_refs of the objects referenced from their container items or custom class fields are decremented.

As a result, only objects reachable from variable A will have a gc_refs of 1 or more.
Those with gc_refs at 0 become candidates for unreachability.

Among the objects that are candidates for unreachability, those that can be transitively reached from variable A are removed from the candidates for unreachability.

Finally, the unreachable objects are deleted.

For the precise details, please refer to Garbage collector design and the actual implementation.

Implementation of Garbage Collection

The implementation of garbage collection differs completely depending on whether the GIL is enabled or disabled.

When the GIL is enabled
https://github.com/python/cpython/blob/main/Python/gc.c

For objects targeted by garbage collection, PyGC_Head is inserted before PyObject_HEAD.

From GC for the default build

The _gc_next and _gc_prev fields of PyGC_Head are used to hold information necessary for garbage collection.

When the GIL is disabled (Free-threaded)
https://github.com/python/cpython/blob/main/Python/gc_free_threading.c

In the free-threaded version, the structure of PyObject changes.

From GC for the free-threaded build

Information necessary for garbage collection is held in ob_gc_bits.

Memory Observation Methods

Finally, we will look at observation methods for implementations that perform operations that exhaust memory.
We will verify this using code used in a tutorial for a tool called memray.

Executing the function in this code consumes a large amount of memory.

The code that resolves the memory issue is as follows:

This reduces memory usage by changing the part where the list was being constructed with append to use yield.

Observation Using the PYTHONMALLOCSTATS Environment Variable

When you run Python with PYTHONMALLOCSTATS set, it outputs memory allocation statistics when an arena is created and when the process terminates.

Execution Method

PYTHONMALLOCSTATS=1 python exercise_1/fibonacci.py

First, let's use PYTHONMALLOCSTATS on the problematic code.

PYTHONMALLOCSTATS results for the problematic code
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------

# arenas allocated total           =                    0
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    0
# arenas allocated current         =                    0
0 arenas * 1048576 bytes/arena     =                    0

# bytes in allocated blocks        =                    0
# bytes in available blocks        =                    0
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                    0
# bytes lost to quantization       =                    0
# bytes lost to arena alignment    =                    0
Total                              =                    0

arena map counts
# arena map mid nodes              =                    0
# arena map bot nodes              =                    0

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =                    0
# bytes lost to arena map bot      =                    0
Total                              =              262,144
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              21          1000
    1     32           1             364           146
    2     48           4            1125           235
    3     64          12            3045            15
    4     80          15            2989            71
    5     96           2             266            74
    6    112           1             127            18
    7    128           2             216            38
    8    144           1              39            74
    9    160           1              61            41
   10    176           2             163            21
   11    192           1              17            68
   12    208           1              76             2
   13    224           1              35            37
   14    240           1              27            41
   15    256           1              36            27
   16    272           1              32            28
   17    288           1              20            36
   18    304           1              18            35
   19    320           1              19            32
   20    336           1              17            31
   21    352           1              12            34
   22    368           1               8            36
   23    384           1              12            30
   24    400           2              80             0
   25    416           1               6            33
   26    432           1               6            31
   27    448           1              10            26
   28    464           1               6            29
   29    480           1               8            26
   30    496           1               5            27
   31    512           1               7            24

# arenas allocated total           =                    1
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    1
# arenas allocated current         =                    1
1 arenas * 1048576 bytes/arena     =            1,048,576

# bytes in allocated blocks        =              751,824
# bytes in available blocks        =              288,400
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                3,072
# bytes lost to quantization       =                5,280
# bytes lost to arena alignment    =                    0
Total                              =            1,048,576

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              68           953
    1     32           2             777           243
    2     48          10            3168           232
    3     64          30            7469           181
    4     80          24            4849            47
    5     96           4             631            49
    6    112           3             350            85
    7    128           5             583            52
    8    144           1              97            16
    9    160           2             190            14
   10    176           8             709            27
   11    192           1              63            22
   12    208           4             267            45
   13    224           4             232            56
   14    240           2             104            32
   15    256           2             122             4
   16    272           3             122            58
   17    288           2              89            23
   18    304           2              65            41
   19    320           2              80            22
   20    336           2              75            21
   21    352           1              24            22
   22    368           1              44             0
   23    384           1              33             9
   24    400           4             160             0
   25    416           1              23            16
   26    432           1              23            14
   27    448           1              18            18
   28    464           1              16            19
   29    480           1              20            14
   30    496           1              18            14
   31    512           1              30             1

# arenas allocated total           =                    2
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    2
# arenas allocated current         =                    2
2 arenas * 1048576 bytes/arena     =            2,097,152

# bytes in allocated blocks        =            1,863,984
# bytes in available blocks        =              217,312
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                6,144
# bytes lost to quantization       =                9,712
# bytes lost to arena alignment    =                    0
Total                              =            2,097,152

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              77           944
    1     32           2             864           156
    2     48          11            3665            75
    3     64          34            8464           206
    4     80          27            5418            90
    5     96           6             862           158
    6    112           4             564            16
    7    128           7             781           108
    8    144           3             285            54
    9    160           4             390            18
   10    176          13            1181            15
   11    192           3             240            15
   12    208           7             472            74
   13    224           6             422            10
   14    240           5             282            58
   15    256           5             314             1
   16    272           6             343            17
   17    288           5             265            15
   18    304           5             254            11
   19    320           6             258            48
   20    336           6             263            25
   21    352           5             201            29
   22    368           6             223            41
   23    384           3             126             0
   24    400           5             185            15
   25    416           1              28            11
   26    432           1              27            10
   27    448           1              20            16
   28    464           1              19            16
   29    480           1              25             9
   30    496           1              22            10
   31    512           1              31             0

# arenas allocated total           =                    3
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    3
# arenas allocated current         =                    3
3 arenas * 1048576 bytes/arena     =            3,145,728

# bytes in allocated blocks        =            2,896,800
# bytes in available blocks        =              222,960
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                9,216
# bytes lost to quantization       =               16,752
# bytes lost to arena alignment    =                    0
Total                              =            3,145,728

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              77           944
    1     32           2             896           124
    2     48          12            3838           242
    3     64          34            8637            33
    4     80          28            5591           121
    5     96           7            1035           155
    6    112           6             737           133
    7    128           8             953            63
    8    144           5             458           107
    9    160           6             563            49
   10    176          15            1354            26
   11    192           5             413            12
   12    208           9             645            57
   13    224           9             595            53
   14    240           7             454            22
   15    256           5             315             0
   16    272           6             343            17
   17    288           5             265            15
   18    304           5             254            11
   19    320           6             258            48
   20    336           6             263            25
   21    352           5             201            29
   22    368           6             223            41
   23    384           5             207             3
   24    400           9             358             2
   25    416           6             201            33
   26    432           6             200            22
   27    448           6             192            24
   28    464           6             192            18
   29    480           6             198             6
   30    496           7             195            29
   31    512           7             204            13

# arenas allocated total           =                    4
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    4
# arenas allocated current         =                    4
4 arenas * 1048576 bytes/arena     =            4,194,304

# bytes in allocated blocks        =            3,883,328
# bytes in available blocks        =              269,008
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =               12,288
# bytes lost to quantization       =               29,680
# bytes lost to arena alignment    =                    0
Total                              =            4,194,304

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              77           944
    1     32           2             896           124
    2     48          12            3838           242
    3     64          34            8637            33
    4     80          28            5591           121
    5     96           7            1035           155
    6    112           6             737           133
    7    128           8             953            63
    8    144           5             458           107
    9    160           6             563            49
   10    176          15            1354            26
   11    192           5             413            12
   12    208           9             645            57
   13    224           9             595            53
   14    240           7             454            22
   15    256           8             487            17
   16    272           9             516            24
   17    288           8             438            10
   18    304           9             427            50
   19    320           9             431            28
   20    336          10             436            44
   21    352           9             373            41
   22    368           9             396             0
   23    384          10             380            40
   24    400          14             531            29
   25    416          10             374            16
   26    432          11             373            34
   27    448          11             364            32
   28    464          11             365            20
   29    480          11             371             3
   30    496          10             320             0
   31    512           7             204            13

# arenas allocated total           =                    5
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    5
# arenas allocated current         =                    5
5 arenas * 1048576 bytes/arena     =            5,242,880

# bytes in allocated blocks        =            4,899,232
# bytes in available blocks        =              286,608
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =               15,360
# bytes lost to quantization       =               41,680
# bytes lost to arena alignment    =                    0
Total                              =            5,242,880

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              77           944
    1     32           2             928            92
    2     48          12            4011            69
    3     64          35            8810           115
    4     80          29            5764           152
    5     96           8            1208           152
    6    112           7             910           105
    7    128           9            1125            18
    8    144           6             631            47
    9    160           8             736            80
   10    176          17            1527            37
   11    192           7             586             9
   12    208          11             818            40
   13    224          11             768            24
   14    240          10             626            54
   15    256          11             660            33
   16    272          12             689            31
   17    288          11             611             5
   18    304          12             600            36
   19    320          12             604             8
   20    336          13             609            15
   21    352          12             545             7
   22    368          13             569             3
   23    384          14             553            35
   24    400          18             704            16
   25    416          13             507             0
   26    432          11             373            34
   27    448          11             364            32
   28    464          11             365            20
   29    480          11             371             3
   30    496          12             368            16
   31    512          13             377            26

# arenas allocated total           =                    6
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    6
# arenas allocated current         =                    6
6 arenas * 1048576 bytes/arena     =            6,291,456

# bytes in allocated blocks        =            5,958,544
# bytes in available blocks        =              245,264
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =               18,384
# bytes lost to quantization       =               52,880
# bytes lost to arena alignment    =               16,384
Total                              =            6,291,456

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
4657
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------

# arenas allocated total           =                    7
# arenas reclaimed                 =                    6
# arenas highwater mark            =                    7
# arenas allocated current         =                    1
1 arenas * 1048576 bytes/arena     =            1,048,576

# bytes in allocated blocks        =                    0
# bytes in available blocks        =                    0
63 unused pools * 16384 bytes      =            1,032,192
# bytes lost to pool headers       =                    0
# bytes lost to quantization       =                    0
# bytes lost to arena alignment    =               16,384
Total                              =            1,048,576

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360

What can be inferred from these results is as follows:

  • arenas allocated current increases monotonically from 0 to 6 and is reclaimed in the final output.
    • An arena, once allocated, is held throughout the process and destroyed upon process termination.
    • Based on the arena size alone, at least 6 MiB is being used.
  • Data around the 64-byte size is increasing.

Next, let's use PYTHONMALLOCSTATS on the code where the issue has been resolved.

PYTHONMALLOCSTATS=1 python solutions/exercise_1/fibonacci.py
PYTHONMALLOCSTATS results for the resolved code
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------

# arenas allocated total           =                    0
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    0
# arenas allocated current         =                    0
0 arenas * 1048576 bytes/arena     =                    0

# bytes in allocated blocks        =                    0
# bytes in available blocks        =                    0
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                    0
# bytes lost to quantization       =                    0
# bytes lost to arena alignment    =                    0
Total                              =                    0

arena map counts
# arena map mid nodes              =                    0
# arena map bot nodes              =                    0

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =                    0
# bytes lost to arena map bot      =                    0
Total                              =              262,144
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              16          1005
    1     32           1             347           163
    2     48           4            1062           298
    3     64          11            2805             0
    4     80          15            2929           131
    5     96           2             262            78
    6    112           1             126            19
    7    128           2             208            46
    8    144           1              40            73
    9    160           1              60            42
   10    176           2             119            65
   11    192           1              17            68
   12    208           1              75             3
   13    224           1              33            39
   14    240           1              27            41
   15    256           1              34            29
   16    272           1              30            30
   17    288           1              19            37
   18    304           1              19            34
   19    320           1              19            32
   20    336           1              16            32
   21    352           1              12            34
   22    368           1               8            36
   23    384           1              12            30
   24    400           2              71             9
   25    416           1               6            33
   26    432           1               6            31
   27    448           1               9            27
   28    464           1               6            29
   29    480           1               8            26
   30    496           1               5            27
   31    512           1               7            24

# arenas allocated total           =                    1
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    1
# arenas allocated current         =                    1
1 arenas * 1048576 bytes/arena     =            1,048,576

# bytes in allocated blocks        =              712,656
# bytes in available blocks        =              311,248
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                3,024
# bytes lost to quantization       =                5,264
# bytes lost to arena alignment    =               16,384
Total                              =            1,048,576

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16           1              68           953
    1     32           2             777           243
    2     48          10            3092           308
    3     64          29            7314            81
    4     80          24            4799            97
    5     96           4             600            80
    6    112           3             329           106
    7    128           5             574            61
    8    144           1              96            17
    9    160           2             185            19
   10    176           8             703            33
   11    192           1              62            23
   12    208           4             265            47
   13    224           4             232            56
   14    240           2              97            39
   15    256           2             114            12
   16    272           2             117             3
   17    288           2              84            28
   18    304           2              59            47
   19    320           2              77            25
   20    336           2              72            24
   21    352           1              19            27
   22    368           1              44             0
   23    384           1              31            11
   24    400           4             158             2
   25    416           1              21            18
   26    432           1              21            16
   27    448           1              16            20
   28    464           1              15            20
   29    480           1              19            15
   30    496           1              17            15
   31    512           1              27             4

# arenas allocated total           =                    2
# arenas reclaimed                 =                    0
# arenas highwater mark            =                    2
# arenas allocated current         =                    2
2 arenas * 1048576 bytes/arena     =            2,097,152

# bytes in allocated blocks        =            1,816,992
# bytes in available blocks        =              231,664
0 unused pools * 16384 bytes       =                    0
# bytes lost to pool headers       =                6,048
# bytes lost to quantization       =                9,680
# bytes lost to arena alignment    =               32,768
Total                              =            2,097,152

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360
Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------

# arenas allocated total           =                    3
# arenas reclaimed                 =                    2
# arenas highwater mark            =                    3
# arenas allocated current         =                    1
1 arenas * 1048576 bytes/arena     =            1,048,576

# bytes in allocated blocks        =                    0
# bytes in available blocks        =                    0
63 unused pools * 16384 bytes      =            1,032,192
# bytes lost to pool headers       =                    0
# bytes lost to quantization       =                    0
# bytes lost to arena alignment    =               16,384
Total                              =            1,048,576

arena map counts
# arena map mid nodes              =                    1
# arena map bot nodes              =                    1

# bytes lost to arena map root     =              262,144
# bytes lost to arena map mid      =              262,144
# bytes lost to arena map bot      =              131,072
Total                              =              655,360

What can be inferred from these results is as follows:

  • The maximum value of arenas allocated current is 2.
    • The number of allocated arenas has decreased compared to before the fix, meaning memory usage has reduced.

In this way, by monitoring the generation of object arenas using the PYTHONMALLOCSTATS environment variable, it is possible to indirectly observe memory usage.
However, this method cannot be parsed programmatically, and it cannot observe memory sizes outside the scope of arenas. In the previous case, we inferred that at least 6 MiB was used since 6 arenas were created, but the actual size is likely higher than that.

Observation using tracemalloc

tracemalloc is a library that makes tracking memory allocations much easier.
It allows you to check the peak memory usage and the size of objects currently remaining in memory.

Let's try observing the problematic code using tracemalloc.

from exercise_1.fibonacci import generate_fibonacci_hash
import tracemalloc

if __name__ == "__main__":
    tracemalloc.start()
    tracemalloc.reset_peak()
    # Code to be measured
    LENGTH_OF_SEQUENCE_1 = 33333
    LENGTH_OF_SEQUENCE_2 = 30000
    LENGTH_OF_SEQUENCE_3 = 34567
    generate_fibonacci_hash(
        LENGTH_OF_SEQUENCE_1, LENGTH_OF_SEQUENCE_2, LENGTH_OF_SEQUENCE_3
    )
    # End measurement
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    print("[ Top 10 ]")
    for stat in top_stats[:10]:
        print(stat)
    current, peak = tracemalloc.get_traced_memory()
    print(f"current={current/1024/1024:.2f} MiB, peak={peak/1024/1024:.2f} MiB")

Observation results of the problematic code

[ Top 10 ]
/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:18: size=2528 B, count=79, average=32 B
/Users/user/work/techblog/python_perfomance/pipenv314/trace_fibonacci.py:8: size=400 B, count=1, average=400 B
/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:28: size=64 B, count=2, average=32 B
current=0.00 MiB, peak=145.06 MiB

What we can see from this is that the peak memory usage was 145.06 MiB, but it is not currently remaining.
Unlike PYTHONMALLOCSTATS, it allows you to observe the actual peak memory used by this process and where currently remaining objects were created.
The downside is that tracemalloc.take_snapshot only targets objects existing at that specific moment, making it difficult to determine which past objects were problematic.
While it is possible to identify causes by inserting tracemalloc into suspicious functions and observing them, it requires a significant amount of effort.

Observation using memray

memray is a memory profiler for Python and is a tool that provides significant help in solving memory-related issues.

https://github.com/bloomberg/memray
https://bloomberg.github.io/memray/overview.html

First, install memray.

python3 -m pip install memray

After that, run the problematic code.

memray python exercise_1/fibonacci.py 

When you run this code, a message like the following will be displayed.

Writing profile results into exercise_1/memray-fibonacci.py.69056.bin
4657
[memray] Successfully generated profile results.

You can now generate reports from the stored allocation records.
Some example commands to generate reports:

/Users/user/.local/share/virtualenvs/pipenv314-r3SaCLMX/bin/python -m memray flamegraph exercise_1/memray-fibonacci.py.69056.bin

Since memray-fibonacci.py.69056.bin has been created, convert it into a human-readable HTML using flamegraph.

flamegraph exercise_1/memray-fibonacci.py.69056.bin

This will create memray-flamegraph-fibonacci.py.69056.html.
By viewing this report, you can identify where memory allocations that cause issues are occurring.

In this example, you can confirm that 152 MiB is used and memory allocation is performed 80,000 times at output.append(output[i] + output[i + 1]).

The sample code this time was too simple, making it hard to see, but the graph is designed to make locations with high memory allocation visually easy to understand, as shown below.

Integration with pytest

By using the pytest-memray plugin, you can include memory usage tests in pytest.

https://github.com/bloomberg/pytest-memray

For this sample, please refer to the official memray test code.
https://github.com/bloomberg/memray/blob/main/docs/tutorials/tests/test_exercise_1.py

In this example, you can specify an upper limit for memory usage using pytest.mark.limit_memory.
If the test fails, a message like the following will be displayed.

__________________________________________________ test_fibonacci __________________________________________________
Test was limited to 100.0KiB but allocated 142.0MiB
------------------------------------------------ memray-max-memory -------------------------------------------------
List of allocations:
    - 270.8KiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 48.6MiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 270.8KiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 39.2MiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 53.4MiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 240.7KiB allocated here:
        fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
        ...
    - 6.3KiB allocated here:
        generate_fibonacci_hash:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:29

How memray works

A brief description of how memray works can be found here:
https://github.com/bloomberg/memray/discussions/225

Summary

In this article, we briefly explained how memory management is performed in CPython.

What I've described is just a simplified summary; to truly understand it with a practical sense, you probably need to read the CPython source code or perform debug executions yourself.

If you want to know how Python objects are laid out in memory or want to investigate bytecode, please refer to the following articles:

References

脚注
  1. The threshold for the size using the arena varies depending on the Python version. To check the exact value, refer to CPython's SMALL_REQUEST_THRESHOLD. ↩︎

  2. The arena size varies depending on the Python version and build method. To check the exact value, refer to CPython's ARENA_BITS. Also, for arena header information, refer to arena_object. ↩︎

  3. The pool size varies depending on the Python version and build method. To check the exact value, refer to CPython's POOL_BITS. Also, for pool header information, refer to pool_header. ↩︎

  4. To easily find a pool from the block size, usedpools is defined, allowing for high-speed searching for the corresponding pool from the block size. ↩︎

  5. While there are other means such as using C extensions or directly inspecting memory by attaching a debugger to a pointer obtained with id, they are out of scope for this article. ↩︎

Discussion