Learning CPython Memory Management Slowly: Mechanisms and How to Observe Memory Growth
Introduction
Python is a programming language that allows you to write code without worrying much about memory, unlike C or C++.
Also, with the increasing performance of modern machines and rising labor costs, we no longer live in an era where "one byte of memory is a drop of blood," and thus, people are less conscious of memory usage than before.
However, if you completely ignore memory, you might end up paying the price at the most critical moments.
In this article, we will examine an overview of how memory is managed in CPython and its observation methods.
The sample code and behavior described in this article include results verified using CPython 3.14 on an Intel Mac with GIL disabled, as well as information gathered through debug execution.
Therefore, please note that results may vary depending on the CPU, CPython version, or build configuration.
Intended Readers and Prerequisites
This article assumes the following readers:
- People who want to grasp an overview of Python's memory management
- People who want to know how to observe Python's memory
  - If you want to skip the details and jump straight to observation, see the "Memory Observation Methods" section.
- People who are interested in the internal structure of CPython
Also, knowledge of the following level is assumed:
- Able to read and write Python programs
- Able to check whether the Python version you are using has GIL enabled or disabled
- Understanding of general terms such as garbage collection and reference counters
- Desirable to be able to read C code
CPython Memory Allocation
CPython's memory management heavily relies on dynamic memory allocation.
In CPython, there are three categories of memory allocation APIs (allocator domains), each with a different role.
- Raw domain
- Mem domain
- Object domain
The Raw domain is used when allocation needs to be delegated to the system allocator or when memory allocation needs to be performed even when no thread state is associated.
A representative API is PyMem_RawMalloc.
The Mem domain is used for general-purpose memory buffers that must be allocated while a thread state is attached.
A representative API is PyMem_Malloc.
The Object domain is used for the purpose of memory allocation for Python objects.
A representative API is PyObject_Malloc.
Small-Size Memory Allocation
When allocating small-sized memory (512 bytes or less[1]) in the Mem or Object domains, concepts of arenas, pools, and blocks exist to reduce the frequency of system allocator calls.
Memory acquisition from the OS is performed in units of arenas.
An arena is divided into units called pools.
Furthermore, a pool consists of blocks of the same size.

In the case of the 64-bit build of Python 3.14, an arena is 1 MiB (1,048,576 bytes) and a pool is 16 KiB (16,384 bytes).
Regarding blocks, the size of the block and which pool it is placed in are determined according to the number of requested bytes.[4]
Since CPython uses a method to allocate a block close to the requested size, if 9 bytes are requested, a 16-byte block will be allocated.
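This rounding rule can be sketched in Python. The following is a simplified model of pymalloc's size classes, assuming the 16-byte alignment of the 64-bit build; it is not CPython's actual code:

```python
# Simplified sketch of pymalloc's size-class rounding (assumes 16-byte
# alignment, as in the 64-bit build; not CPython's real implementation).
ALIGNMENT = 16
SMALL_REQUEST_THRESHOLD = 512

def block_size(nbytes):
    """Round a request up to the nearest multiple of ALIGNMENT."""
    if nbytes == 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None  # handled by the system allocator instead
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

print(block_size(9))   # 16: a 9-byte request gets a 16-byte block
print(block_size(17))  # 32
```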
Management of Object Lifetime
Python objects allocated in the Object domain manage their lifetime using reference counting.
However, in the following cases, circular references occur, so reference counting alone cannot properly destroy the objects.
cycle_objects = [1,2,3]
cycle_objects.append(cycle_objects)
To destroy objects where such circular references have occurred, circular garbage collection is also supported.
Reference Counter
The basic concepts are as follows:
- Every time an object is referenced, its counter increases.
- When a reference to an object is released (e.g., when a variable is deleted via del), the counter is decremented.
- When the counter reaches 0, the memory is released.
How to check the reference counter
There are two simple ways to observe the reference counter of a specific object from a Python program. [5]
- Observing the reference counter via sys.getrefcount
- Observing the reference counter using the internal structure of PyObject
Observing reference counters with sys.getrefcount
The reference counter of a specific object can be observed using sys.getrefcount.
When observing the reference counter with sys.getrefcount, the act of executing sys.getrefcount itself may affect the reference counter.
This is because passing an object as an argument to sys.getrefcount creates a reference to that object.
This is mentioned in the documentation for sys.getrefcount.
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
If that were all, you could simply subtract one from the observed count. In CPython 3.14, however, this does not always hold: depending on how getrefcount is called, the call itself sometimes affects the reference counter and sometimes does not.
The following code results in 2 in an environment where the GIL is enabled in CPython 3.14, as it affects the reference counter.
import sys
a = int("99999")
print("sys.getrefcount(a):", sys.getrefcount(a))
On the other hand, the following code results in 1 without affecting the reference counter when using CPython 3.14 in my environment. (In CPython 3.13, it results in 2, and the results may vary in different environments.)
import sys
def test1():
    a = int("99999")
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
test1()
The reason is that, at the bytecode level, passing an object to sys.getrefcount sometimes adds a reference and sometimes does not.
In any case, when measuring the reference counter with sys.getrefcount, pay attention to the Python version and the way it is called.
Why are there cases in the Python 3.14 test1 example where the reference counter is not affected?
To resolve this question, we need to look at the bytecode.
If you are not familiar with Python bytecode, please check the following article:
Deciphering CPython 3.13 Bytecode Through Experiments
First, let's check the bytecode when executing sys.getrefcount using the following method.
import dis
src = """
import sys
a = int("99999")
print("sys.getrefcount(a):", sys.getrefcount(a))
"""
code = compile(src, "<module>", "exec")
print("=== co_consts ===")
print(code.co_consts)
print("\n=== dis ===")
dis.dis(code)
print("\n=== run ===")
ns = {}
exec(code, ns, ns)
This bytecode is as follows:
=== dis ===
0 RESUME 0
2 LOAD_SMALL_INT 0
LOAD_CONST 1 (None)
IMPORT_NAME 0 (sys)
STORE_NAME 0 (sys)
3 LOAD_NAME 1 (int)
PUSH_NULL
LOAD_CONST 2 ('99999')
CALL 1
STORE_NAME 2 (a)
4 LOAD_NAME 3 (print)
PUSH_NULL
LOAD_CONST 3 ('sys.getrefcount(a):')
LOAD_NAME 0 (sys)
LOAD_ATTR 8 (getrefcount)
PUSH_NULL
LOAD_NAME 2 (a)
CALL 1
CALL 2
POP_TOP
LOAD_CONST 1 (None)
RETURN_VALUE
In this case, you can see that LOAD_NAME is used when passing variable a to sys.getrefcount. As a result, the reference to object a increases at the point it is passed to getrefcount.
Next, let's check the bytecode for the test1 function in Python 3.14.
import dis
dis.dis(test1)
21 RESUME 0
22 LOAD_GLOBAL 1 (int + NULL)
LOAD_CONST 0 ('99999')
CALL 1
STORE_FAST 0 (a)
23 LOAD_GLOBAL 3 (print + NULL)
LOAD_CONST 1 ('id(a):')
LOAD_GLOBAL 5 (id + NULL)
LOAD_FAST_BORROW 0 (a)
CALL 1
CALL 2
POP_TOP
24 LOAD_GLOBAL 3 (print + NULL)
LOAD_CONST 2 ('sys.getrefcount(a):')
LOAD_GLOBAL 6 (sys)
LOAD_ATTR 8 (getrefcount)
PUSH_NULL
LOAD_FAST_BORROW 0 (a)
CALL 1
CALL 2
POP_TOP
LOAD_CONST 3 (None)
RETURN_VALUE
You can see that LOAD_FAST_BORROW is used when passing variable a to sys.getrefcount.
LOAD_FAST_BORROW is an opcode added in 3.14 to reduce reference-counting overhead. It loads a borrowed reference: the code using the object "borrows" it without incrementing the reference count.
Note that LOAD_FAST_BORROW is emitted as an optimization, so it may not appear even on the same version, depending on the CPython build configuration and which optimizations are applied.
Also, you can see that in Python 3.13, it is LOAD_FAST instead of LOAD_FAST_BORROW.
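You can check which load instruction your interpreter emits with dis. A small sketch (the function f here is illustrative; on 3.13 and earlier you should see LOAD_FAST rather than LOAD_FAST_BORROW):

```python
import dis

def f():
    a = int("99999")
    return len(str(a))  # 'a' is passed as an argument here

# Collect the opcode names used in the function's bytecode.
opnames = {ins.opname for ins in dis.get_instructions(f)}

# On CPython 3.14 the argument load may appear as LOAD_FAST_BORROW;
# on 3.13 and earlier it is LOAD_FAST.
print("LOAD_FAST_BORROW" in opnames, "LOAD_FAST" in opnames)
```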
Observing Reference Counters Using the Internal Structure of PyObject
While sys.getrefcount is an easy way to check an object's reference counter, as mentioned above, the act of observation itself can sometimes affect the reference counter.
Next, we will introduce a method to observe the reference counter using the internal structure of PyObject, referencing the implementation of sys.getrefcount.
Implementation of getrefcount
The implementation of sys.getrefcount in CPython is as follows:
Python-3.14.2/Include/refcount.h
static inline Py_ssize_t _Py_REFCNT(PyObject *ob) {
#if !defined(Py_GIL_DISABLED)
    return ob->ob_refcnt;
#else
    uint32_t local = _Py_atomic_load_uint32_relaxed(&ob->ob_ref_local);
    if (local == _Py_IMMORTAL_REFCNT_LOCAL) {
        return _Py_IMMORTAL_INITIAL_REFCNT;
    }
    Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&ob->ob_ref_shared);
    return _Py_STATIC_CAST(Py_ssize_t, local) +
           Py_ARITHMETIC_RIGHT_SHIFT(Py_ssize_t, shared, _Py_REF_SHARED_SHIFT);
#endif
}
- If the GIL is enabled, ob_refcnt is used; otherwise the reference counter is ob_ref_local + (ob_ref_shared >> 2).
- _Py_IMMORTAL_REFCNT_LOCAL relates to Immortal Objects, which are covered later in this article.
This implementation serves as a useful reference when observing the reference counter on your own.
Python supports objects of many types. Every object begins with a PyObject header, and the reference counter can be observed from its fields.
When the GIL is enabled, you can check the reference counter by looking at the ob_refcnt of the PyObject. Below is a sample to obtain the ob_refcnt of the PyObject from a specified object in Python.
import sys
import ctypes

class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

def inspect_base(addr):
    # Cast the specified pointer to the PyObject type
    obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
    ob_refcnt = obj.ob_refcnt
    ob_type = obj.ob_type
    print(f"Address: {addr}, Refcount: {ob_refcnt}, Type Address: {ob_type}")

def test1():
    a = int("99999")
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
    inspect_base(id(a))

test1()
For more details on PyObject and id, please refer to How are Python objects represented in memory? Part 1.
How to observe reference counters in the free-threaded version
In Python 3.14, a free-threaded version of the binary exists. In the free-threaded version, the structure of PyObject is different.
struct _object {
    // ob_tid stores the thread id (or zero). It is also used by the GC and the
    // trashcan mechanism as a linked list pointer and by the GC to store the
    // computed "gc_refs" refcount.
    uintptr_t ob_tid;
    uint16_t _padding;
    PyMutex ob_mutex;          // per-object lock
    uint8_t ob_gc_bits;        // gc-related state
    uint32_t ob_ref_local;     // local reference count
    Py_ssize_t ob_ref_shared;  // shared (atomic) reference count
    PyTypeObject *ob_type;
};
As you can see, ob_refcnt does not exist, and the reference counter is expressed by combining ob_ref_local and ob_ref_shared.
For details, refer to PEP 703 – Making the Global Interpreter Lock Optional in CPython - Reference Counting.
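As a rough model of that encoding (illustrative values, not actual CPython code): the shared count lives in the upper bits of ob_ref_shared, with the low two bits reserved for state flags, so decoding it is an arithmetic right shift:

```python
_Py_REF_SHARED_SHIFT = 2  # the low 2 bits of ob_ref_shared hold state flags

def encode_shared(count, flags=0):
    # Store the shared count in the upper bits, flags in the low 2 bits.
    return (count << _Py_REF_SHARED_SHIFT) | flags

def total_refcount(ob_ref_local, ob_ref_shared):
    # Mirrors _Py_REFCNT for the free-threaded build:
    # local count plus the decoded shared count.
    return ob_ref_local + (ob_ref_shared >> _Py_REF_SHARED_SHIFT)

shared = encode_shared(5, flags=0b01)
print(total_refcount(2, shared))  # 7: 2 local + 5 shared
```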
The inspect_base compatible with free-threading is as follows:
class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_tid", ctypes.c_void_p),
        ("_padding", ctypes.c_uint16),
        ("ob_mutex", ctypes.c_uint8),
        ("ob_gc_bits", ctypes.c_uint8),
        ("ob_ref_local", ctypes.c_uint32),
        ("ob_ref_shared", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

def inspect_base(addr):
    # Cast the specified pointer to the PyObject type
    obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
    ob_ref_local = obj.ob_ref_local
    ob_ref_shared = obj.ob_ref_shared
    ob_type = obj.ob_type
    UINT32_MAX = (1 << 32) - 1
    if ob_ref_local == UINT32_MAX:
        ref_cnt = 3 << 30  # _Py_IMMORTAL_INITIAL_REFCNT
    else:
        ref_cnt = ob_ref_local + (ob_ref_shared >> 2)
    print(f"Address: {addr}, ob_ref_shared: {ob_ref_shared}, ob_ref_local: {ob_ref_local}, ref_cnt: {ref_cnt}, Type Address: {ob_type}")
This method allows for observation without affecting the reference counter, but please note that because it requires low-level operations, it may stop working due to differences in Python versions or environment configurations.
Reference Counting Observation Examples
Here, we will actually observe the reference counts in various cases.
Cases of General Literals
The reference counts of variables using literals result in complex outcomes because references that Python programmers are not aware of are generated automatically.
For example, let's look at the following case:
print('============================ test3')
def test3():
    a = 99999
    print("id(a):", id(a))
    print("sys.getrefcount(a):", sys.getrefcount(a))
    inspect_base(id(a))
    print("test3.__code__.co_consts:", test3.__code__.co_consts)
test3()  # Refcount: 3
When running this sample in Python 3.14, based on the previous examples, one might guess that the reference count would be 1.
However, in reality, it becomes 2 or 3.
Even when running the same code on the same Python 3.14, the reference count changes depending on whether it was executed as a script file or interactively.
If executed from a script file, the reference count of the object created using the literal becomes 3.
In that case, it is referenced from the following two locations in addition to the variable itself:
- The tuple of constants used in the bytecode (test3.__code__.co_consts)
- The literal stored during the compilation of the .py file
In Python, the tuple of constants used in the bytecode can be checked via function_name.__code__.co_consts, but it is difficult to verify the literal stored during the compilation of the .py file.
It is necessary to use techniques like CPython's debug execution to observe where it is being referenced from.
Investigation into where it is created and where it is referenced from
Let's actually check where the literal stored during the compilation of the .py file is created and referenced from.
Python files are executed by the following function:
3.14/Python/pythonrun.c/pyrun_file
static PyObject *
pyrun_file(FILE *fp, PyObject *filename, int start, PyObject *globals,
           PyObject *locals, int closeit, PyCompilerFlags *flags)
{
    PyArena *arena = _PyArena_New();
    if (arena == NULL) {
        return NULL;
    }

    mod_ty mod;
    mod = _PyParser_ASTFromFile(fp, filename, NULL, start, NULL, NULL,
                                flags, NULL, arena);
    if (closeit) {
        fclose(fp);
    }

    PyObject *ret;
    if (mod != NULL) {
        // The script is executed here, and the reference count is observed within it.
        // In other words, it is observed before the arena is released by _PyArena_Free.
        ret = run_mod(mod, filename, globals, locals, flags, arena, NULL, 0);
    }
    else {
        ret = NULL;
    }
    _PyArena_Free(arena);
    return ret;
}
When _PyParser_ASTFromFile is executed, the syntax is parsed.
In _PyPegen_number_token, which is executed during that process, it can be confirmed that the parsed literal is stored in the a_objects list of PyArena.
expr_ty
_PyPegen_number_token(Parser *p)
{
    Token *t = _PyPegen_expect_token(p, NUMBER);
    if (t == NULL) {
        return NULL;
    }

    const char *num_raw = PyBytes_AsString(t->bytes);
    if (num_raw == NULL) {
        p->error_indicator = 1;
        return NULL;
    }

    if (p->feature_version < 6 && strchr(num_raw, '_') != NULL) {
        p->error_indicator = 1;
        return RAISE_SYNTAX_ERROR("Underscores in numeric literals are only supported "
                                  "in Python 3.6 and greater");
    }

    PyObject *c = parsenumber(num_raw);
    if (c == NULL) {
        p->error_indicator = 1;
        PyThreadState *tstate = _PyThreadState_GET();
        // The only way a ValueError should happen in _this_ code is via
        // PyLong_FromString hitting a length limit.
        if (tstate->current_exception != NULL &&
            Py_TYPE(tstate->current_exception) == (PyTypeObject *)PyExc_ValueError
        ) {
            PyObject *exc = PyErr_GetRaisedException();
            /* Intentionally omitting columns to avoid a wall of 1000s of '^'s
             * on the error message. Nobody is going to overlook their huge
             * numeric literal once given the line. */
            RAISE_ERROR_KNOWN_LOCATION(
                p, PyExc_SyntaxError,
                t->lineno, -1 /* col_offset */,
                t->end_lineno, -1 /* end_col_offset */,
                "%S - Consider hexadecimal for huge integer literals "
                "to avoid decimal conversion limits.",
                exc);
            Py_DECREF(exc);
        }
        return NULL;
    }

    // **Note**: 'c' is the literal object, and it is being added to p->arena.a_objects
    if (_PyArena_AddPyObject(p->arena, c) < 0) {
        Py_DECREF(c);
        p->error_indicator = 1;
        return NULL;
    }

    return _PyAST_Constant(c, NULL, t->lineno, t->col_offset, t->end_lineno,
                           t->end_col_offset, p->arena);
}
Note that PyArena here is a different concept from the arena used for allocating small chunks of memory; PyArena is the following structure:
3.14/Include/internal/pycore_pyarena.h
typedef struct _arena PyArena;

struct _arena {
    /* Pointer to the first block allocated for the arena. Never NULL.
       This pointer is used only to find the first block when the arena is freed.
     */
    block *a_head;

    /* Pointer to the block currently being used for allocation.
       The ab_next field of this block should be NULL.
       If ab_next is not NULL after a block_alloc() call,
       it means a new block was allocated, and a_cur must be
       reset to point to that block.
     */
    block *a_cur;

    /* A Python list object containing references to all PyObject pointers
       associated with this arena.
       These are DECREF'd when the arena is freed.
     */
    PyObject *a_objects;

#if defined(Py_DEBUG)
    /* Debug output */
    size_t total_allocs;
    size_t total_size;
    size_t total_blocks;
    size_t total_block_size;
    size_t total_big_blocks;
#endif
};
However, this content may change depending on the environment being verified.
If you suspect it is being referenced from somewhere different than the above, you might be able to grasp the creation source or reference source using the following methods.
Hints for investigation via debug execution:
- Set a conditional breakpoint at PyLong_FromLong in 3.14.2/Objects/longobject.c to find out what the pointer of the expected literal will be. Also check the caller.
- Set a conditional breakpoint at Py_INCREF in 3.14.2/Include/refcount.h to find out where the expected pointer is referenced from.
Investigation using --with-trace-refs:
By providing --with-trace-refs when building CPython, sys.getobjects becomes available.
This makes it possible to enumerate currently dynamically allocated Python objects.
import sys
# Investigate objects while some code is running
sys.getobjects(200000)
How to execute
PYTHONDUMPREFS=1 /opt/python-3.14.2-pydebug/bin/python3.14d my_script.py
While this doesn't directly tell you who created the object, it provides hints on where to look within CPython.
Note that the IDs of variables referencing the same literal will all be the same.
This can be verified with the following code:
def test5():
    a = 99997
    print(id(a))
    print(sys.getrefcount(a))

def test6():
    a = 99997
    b = 99997
    print(id(a))
    print(id(b))
    print(sys.getrefcount(a))
    print(sys.getrefcount(b))

test5()
test6()
From these results, it can be confirmed that variable a in test5, and variable a and variable b in test6 all point to the same ID.
4464648304
3
4464648304
4464648304
4
4
Literals for Small Numbers or Small Characters
When looking at the reference counter of variables storing small numbers or small character literals, there are cases where the value becomes extremely large.
def test7():
    a = 256
    b = 257
    c = "h"
    d = "hello"
    print(f"id(a): {id(a)} getrefcount(a): {sys.getrefcount(a)} _is_immortal: {sys._is_immortal(a)}")
    print(f"id(b): {id(b)} getrefcount(b): {sys.getrefcount(b)} _is_immortal: {sys._is_immortal(b)}")
    print(f"id(c): {id(c)} getrefcount(c): {sys.getrefcount(c)} _is_immortal: {sys._is_immortal(c)}")
    print(f"id(d): {id(d)} getrefcount(d): {sys.getrefcount(d)} _is_immortal: {sys._is_immortal(d)}")
test7()
id(a): 4365499168 getrefcount(a): 3221225472 _is_immortal:True
id(b): 4357021520 getrefcount(b): 3 _is_immortal:False
id(c): 4365558032 getrefcount(c): 3221225472 _is_immortal:True
id(d): 4357388880 getrefcount(d): 3 _is_immortal:False
Since Python 3.12, the concept of Immortal Objects has been added.
In CPython, small integers from -5 to 256 and certain small strings (such as single characters) are registered as Immortal Objects and shared globally.
The value 3221225472 (3 << 30) appearing in the reference counter represents the reference count for Immortal Objects. This value is defined as _Py_IMMORTAL_INITIAL_REFCNT in CPython.
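A couple of these facts can be checked directly. The following relies on CPython implementation details (the small-integer cache), not language guarantees:

```python
# _Py_IMMORTAL_INITIAL_REFCNT as defined in CPython: 3 << 30.
print(3 << 30)  # 3221225472

# Small integers (-5..256) are cached and shared, so two independently
# created 256s are the same object, while two 257s need not be.
a, b = int("256"), int("256")
print(a is b)  # True (CPython implementation detail)
```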
For more details regarding Immortal Objects, please refer to the following blog post:
Processing That Increases or Decreases Reference Counts
We will verify that reference counters increase and decrease with a simple example.
In the following example, we confirm that the reference counter increases upon object reference and decreases when the reference is released. Finally, we attempt to observe that the reference counter can no longer be accurately measured once the object is destroyed.
Since sys.getrefcount cannot be used for observation after object destruction, we will try to observe the reference count by passing the object's ID to the inspect_base function we created earlier.
After the object is destroyed, the value of the reference counter becomes undefined, so the number itself is meaningless. (We just want to confirm it has changed from the previous observation.)
def test_add_ref1():
    a = int('99900')
    addr = id(a)
    print("After variable creation", sys.getrefcount(a))  # 1
    inspect_base(addr)

    b = a
    print("After b = a", sys.getrefcount(a))  # 2
    inspect_base(addr)

    c = []
    c.append(a)
    print("After c.append(a)", sys.getrefcount(a))  # 3
    inspect_base(addr)

    del b
    print("After del b", sys.getrefcount(a))  # 2
    inspect_base(addr)

    c.clear()
    print("After c.clear()", sys.getrefcount(a))  # 1
    inspect_base(addr)

    return id(a)

addr = test_add_ref1()
print("After function completion")
inspect_base(addr)  # The object at addr should be deleted, so the memory value becomes undefined
Output
After variable creation 1
Address: 4583176048, Refcount: 1, Type Address: 4548738440
After b = a 2
Address: 4583176048, Refcount: 2, Type Address: 4548738440
After c.append(a) 3
Address: 4583176048, Refcount: 3, Type Address: 4548738440
After del b 2
Address: 4583176048, Refcount: 2, Type Address: 4548738440
After c.clear() 1
Address: 4583176048, Refcount: 1, Type Address: 4548738440
After function completion
Address: 4583176048, Refcount: 0, Type Address: 4548738440
In this example, you can see the increase in the reference counter by assigning to variables or adding to containers, and the decrease by deleting variables or removing items from containers.
Finally, by exiting the function scope, variable a is no longer used, and as a result of the reference counter reaching 0, the object is destroyed. In this measurement, the Refcount became 0 after the function ended, but that does not mean "destruction was confirmed at 0"; it is simply an example of an undefined value.
Reference Counters When Using Weak References
Simply creating a weak reference object with weakref does not increase the reference counter.
In the following example, creating a weak reference object that points to an object does not increase the reference counter of the target object (corresponds to w = weakref.ref(a) in the code).
You can confirm that while retrieving the referent of a weak reference and holding that return value in a variable, a temporary strong reference is created, thus increasing the reference counter (corresponds to obj = w() in the code).
import weakref
print('============================ test_weakref_no_incref')
def test_weakref_no_incref():
    class Box:
        pass

    a = Box()
    addr = id(a)
    print("After variable creation", sys.getrefcount(a))
    inspect_base(addr)

    # Create a weak reference (this itself does not increase a's strong reference count)
    w = weakref.ref(a)
    print("w", w)
    print("After weakref.ref(a)", sys.getrefcount(a))  # Want to confirm this doesn't increase
    inspect_base(addr)

    # A strong reference is added for the first time here
    obj = w()
    print("After obj = w(), reference is possible", sys.getrefcount(a))  # +1
    inspect_base(addr)

    del obj
    print("del obj", sys.getrefcount(a))  # -1
    inspect_base(addr)

    del a
    inspect_base(addr)
    # Nothing can be retrieved
    print("w() ->", w())

test_weakref_no_incref()
Results
============================ test_weakref_no_incref
After variable creation 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
w <weakref at 0x10e65b380; to 'test_weakref_no_incref.<locals>.Box' at 0x10e5bbb60>
After weakref.ref(a) 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
After obj = w(), reference is possible 2
Address: 4535860064, Refcount: 2, Type Address: 140288173605632
del obj 1
Address: 4535860064, Refcount: 1, Type Address: 140288173605632
Address: 4535860064, Refcount: 0, Type Address: 140288173605632
w() -> None
Reference Counters When Circular References Occur
Let's check the behavior of reference counters when a circular reference occurs.
import gc
# Omitted
def test_add_ref2():
    a = []
    addr = id(a)
    print("Immediately after creation", sys.getrefcount(a))  # 1
    inspect_base(addr)

    a.append(a)
    print("After circular reference", sys.getrefcount(a))  # 2
    inspect_base(addr)

    del a
    print('After del a')
    inspect_base(addr)
    return addr
addr = test_add_ref2()
print('After function completion')
inspect_base(addr)
gc.collect()
print('After gc.collect()')
inspect_base(addr) # Since the object at addr should be deleted, the memory value becomes undefined
Output
Immediately after creation 1
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After circular reference 2
Address: 4539927744, Refcount: 2, Type Address: 4548734944
After del a
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After function completion
Address: 4539927744, Refcount: 1, Type Address: 4548734944
After gc.collect()
Address: 4539927744, Refcount: 4542561216, Type Address: 4548734944
You can see that even after deleting the variable with del a and exiting the function, the reference counter remains at 1 because of the self-reference. The object is destroyed only once circular garbage collection runs.
Circular Garbage Collection
As mentioned earlier, garbage collection is used to resolve objects with circular references.
In Python's garbage collection, unreachable objects are identified and destroyed.
Not all Python objects are targets for garbage collection; instead, objects that have the potential to create circular references (such as lists, dictionaries, instances of custom classes, etc.) are targeted.
Garbage collection occurs at the following times:
- Manual execution: When gc.collect() is executed.
- Automatic execution: Python tracks the number of object allocations and deallocations; a collection is triggered when allocations minus deallocations exceeds a specific threshold.
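The automatic thresholds can be inspected, and tuned, through the gc module:

```python
import gc

# Three generation thresholds; a generation-0 collection is triggered when
# (allocations - deallocations) of tracked objects exceeds the first value.
print(gc.get_threshold())  # commonly (700, 10, 10), but version-dependent
print(gc.get_count())      # current allocation counters per generation

# Thresholds can be tuned with gc.set_threshold; here we simply
# set them back to their current values.
gc.set_threshold(*gc.get_threshold())
```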
Objects Subject to Garbage Collection
You can check whether a Python object is a target for garbage collection using gc.is_tracked.
Objects of str or int types are not targets for garbage collection.
Objects created from dict, list, functions, or classes are targets for garbage collection.
This can be verified with the following code:
import gc

v1 = 1
print(v1, type(v1), gc.is_tracked(v1))

class MyClass:
    pass

v2 = MyClass()
print(v2, type(v2), gc.is_tracked(v2))

def func_gen():
    yield 1
    yield 2

v3 = func_gen()
print(v3, type(v3), gc.is_tracked(v3))

async def func_co():
    return

v4 = func_co()
print(v4, type(v4), gc.is_tracked(v4))

c1 = [1, 2, 3]
print(c1, type(c1), gc.is_tracked(c1))

c2 = {"a": 1}
print(c2, type(c2), gc.is_tracked(c2))

c3 = (1, 2)
# (1, 2) <class 'tuple'> True
print(c3, type(c3), gc.is_tracked(c3))

c4 = (1, [])
# (1, []) <class 'tuple'> True
print(c4, type(c4), gc.is_tracked(c4))

gc.collect()  # Run GC

print("after :")
print(v1, type(v1), gc.is_tracked(v1))
print(v2, type(v2), gc.is_tracked(v2))
print(c1, type(c1), gc.is_tracked(c1))
print(c2, type(c2), gc.is_tracked(c2))
print(c3, type(c3), gc.is_tracked(c3))
print(c4, type(c4), gc.is_tracked(c4))
Output
1 <class 'int'> False
<__main__.MyClass object at 0x10970d940> <class '__main__.MyClass'> True
<generator object func_gen at 0x1095cab00> <class 'generator'> True
<coroutine object func_co at 0x1095ca980> <class 'coroutine'> True
[1, 2, 3] <class 'list'> True
{'a': 1} <class 'dict'> True
(1, 2) <class 'tuple'> True
(1, []) <class 'tuple'> True
after :
1 <class 'int'> False
<__main__.MyClass object at 0x10970d940> <class '__main__.MyClass'> True
[1, 2, 3] <class 'list'> True
{'a': 1} <class 'dict'> True
(1, 2) <class 'tuple'> False
(1, []) <class 'tuple'> True
<sys>:0: RuntimeWarning: coroutine 'func_co' was never awaited
What is interesting is the behavior of the tuple type.
A tuple composed only of str or int types becomes untracked by garbage collection after a garbage collection has been performed once.
This is because optimization is executed during garbage collection, where tuples consisting only of types like integers and strings are untracked.
Optimization of the dict type before Python 3.13
Before Python 3.13, gc.is_tracked({"a": 1}) would result in False.
This was a result of the same optimization mentioned above for the tuple type being performed at the time of object creation.
However, in 3.14, this optimization process was removed, so gc.is_tracked({"a": 1}) now results in True.
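This version-dependent behavior can be probed directly; the printed value for the first dict depends on the Python version you run it on:

```python
import gc

# A dict holding only atomic values: untracked at creation before 3.13,
# tracked in 3.14 (the creation-time optimization was removed).
d1 = {"a": 1}
print(gc.is_tracked(d1))  # version-dependent

# A dict holding a container is always tracked,
# since it can participate in a reference cycle.
d2 = {"a": []}
print(gc.is_tracked(d2))  # True
```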
Flow of Garbage Collection
Let's examine an example of executing garbage collection on objects that have circular references, as in the code below.
import gc
import ctypes
class PyObject(ctypes.Structure):
_fields_ = [
("ob_refcnt", ctypes.c_ssize_t),
("ob_type", ctypes.c_void_p),
]
def inspect_base(addr):
# Cast the specified pointer to the PyObject type
obj = ctypes.cast(addr, ctypes.POINTER(PyObject)).contents
ob_refcnt = obj.ob_refcnt
ob_type = obj.ob_type
print(f"Address: {addr}, Refcount: {ob_refcnt}, Type Address: {ob_type}")
class Link:
def __init__(self, next_link=None):
self.next_link = next_link
def test():
link_3 = Link()
link_2 = Link(link_3)
link_1 = Link(link_2)
link_3.next_link = link_1
A = link_1
link_4 = Link()
link_5 = Link(link_4)
link_4.next_link = link_5
addr1 = id(link_1)
addr2 = id(link_2)
addr3 = id(link_3)
addr4 = id(link_4)
addr5 = id(link_5)
del link_1, link_2, link_3
del link_4, link_5
inspect_base(addr1)
inspect_base(addr2)
inspect_base(addr3)
inspect_base(addr4)
inspect_base(addr5)
gc.collect()
print('after gc.collect()')
inspect_base(addr1)
inspect_base(addr2)
inspect_base(addr3)
inspect_base(addr4)
inspect_base(addr5)
# Represent this point in a diagram
input("stop")
del A
print('after del A')
inspect_base(addr1)
inspect_base(addr2)
inspect_base(addr3)
inspect_base(addr4)
inspect_base(addr5)
gc.collect()
print('after gc.collect()')
inspect_base(addr1)
inspect_base(addr2)
inspect_base(addr3)
inspect_base(addr4)
inspect_base(addr5)
test()
The following diagram illustrates the state where only variable A remains.

Here is a brief introduction to the flow of executing garbage collection and deleting unreachable objects in this situation.
When garbage collection is executed, first, the reference count of each object is copied into gc_refs.

Next, for all objects targeted by garbage collection, the gc_refs of the objects referenced from their container items or custom class fields are decremented.

As a result, only objects reachable from variable A will have a gc_refs of 1 or more.
Those with gc_refs at 0 become candidates for unreachability.
Among the objects that are candidates for unreachability, those that can be transitively reached from variable A are removed from the candidates for unreachability.

Finally, the unreachable objects are deleted.

For the precise details, please refer to Garbage collector design and the actual implementation.
Implementation of Garbage Collection
The implementation of garbage collection differs completely depending on whether the GIL is enabled or disabled.
When the GIL is enabled
For objects targeted by garbage collection, PyGC_Head is inserted before PyObject_HEAD.

From GC for the default build
The _gc_next and _gc_prev fields of PyGC_Head are used to hold information necessary for garbage collection.
When the GIL is disabled (Free-threaded)
In the free-threaded version, the structure of PyObject changes.

From GC for the free-threaded build
Information necessary for garbage collection is held in ob_gc_bits.
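Whichever build you are on, the header layout itself is not visible from Python, but you can at least check whether an object is tracked by the cycle collector:

```python
import gc

# Atomic immutable objects cannot form reference cycles,
# so CPython does not track them; containers are tracked.
print(gc.is_tracked(42))      # False
print(gc.is_tracked("abc"))   # False
print(gc.is_tracked([1, 2]))  # True
```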
Memory Observation Methods
Finally, let's look at how to observe code that performs memory-hungry operations.
We will verify this using code used in a tutorial for a tool called memray.
Executing the function in this code consumes a large amount of memory.
The code that resolves the memory issue is as follows:
It reduces memory usage by replacing the part that built up a list with append with a generator that uses yield.
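The shape of that change can be sketched as follows (hypothetical function names; the actual code lives in the memray tutorial):

```python
# Before: builds the entire sequence in memory at once.
def fibonacci_list(length):
    output = []
    a, b = 0, 1
    for _ in range(length):
        output.append(a)
        a, b = b, a + b
    return output

# After: a generator yields one value at a time, so the full
# sequence never has to exist in memory simultaneously.
def fibonacci_gen(length):
    a, b = 0, 1
    for _ in range(length):
        yield a
        a, b = b, a + b
```

A caller that iterates over the result works the same with either version, but the generator keeps only two integers alive at any moment.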
Observation Using the PYTHONMALLOCSTATS Environment Variable
When you run Python with PYTHONMALLOCSTATS set, it outputs memory allocation statistics when an arena is created and when the process terminates.
Execution Method
PYTHONMALLOCSTATS=1 python exercise_1/fibonacci.py
First, let's use PYTHONMALLOCSTATS on the problematic code.
PYTHONMALLOCSTATS results for the problematic code
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
# arenas allocated total = 0
# arenas reclaimed = 0
# arenas highwater mark = 0
# arenas allocated current = 0
0 arenas * 1048576 bytes/arena = 0
# bytes in allocated blocks = 0
# bytes in available blocks = 0
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 0
# bytes lost to quantization = 0
# bytes lost to arena alignment = 0
Total = 0
arena map counts
# arena map mid nodes = 0
# arena map bot nodes = 0
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 0
# bytes lost to arena map bot = 0
Total = 262,144
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 21 1000
1 32 1 364 146
2 48 4 1125 235
3 64 12 3045 15
4 80 15 2989 71
5 96 2 266 74
6 112 1 127 18
7 128 2 216 38
8 144 1 39 74
9 160 1 61 41
10 176 2 163 21
11 192 1 17 68
12 208 1 76 2
13 224 1 35 37
14 240 1 27 41
15 256 1 36 27
16 272 1 32 28
17 288 1 20 36
18 304 1 18 35
19 320 1 19 32
20 336 1 17 31
21 352 1 12 34
22 368 1 8 36
23 384 1 12 30
24 400 2 80 0
25 416 1 6 33
26 432 1 6 31
27 448 1 10 26
28 464 1 6 29
29 480 1 8 26
30 496 1 5 27
31 512 1 7 24
# arenas allocated total = 1
# arenas reclaimed = 0
# arenas highwater mark = 1
# arenas allocated current = 1
1 arenas * 1048576 bytes/arena = 1,048,576
# bytes in allocated blocks = 751,824
# bytes in available blocks = 288,400
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 3,072
# bytes lost to quantization = 5,280
# bytes lost to arena alignment = 0
Total = 1,048,576
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 68 953
1 32 2 777 243
2 48 10 3168 232
3 64 30 7469 181
4 80 24 4849 47
5 96 4 631 49
6 112 3 350 85
7 128 5 583 52
8 144 1 97 16
9 160 2 190 14
10 176 8 709 27
11 192 1 63 22
12 208 4 267 45
13 224 4 232 56
14 240 2 104 32
15 256 2 122 4
16 272 3 122 58
17 288 2 89 23
18 304 2 65 41
19 320 2 80 22
20 336 2 75 21
21 352 1 24 22
22 368 1 44 0
23 384 1 33 9
24 400 4 160 0
25 416 1 23 16
26 432 1 23 14
27 448 1 18 18
28 464 1 16 19
29 480 1 20 14
30 496 1 18 14
31 512 1 30 1
# arenas allocated total = 2
# arenas reclaimed = 0
# arenas highwater mark = 2
# arenas allocated current = 2
2 arenas * 1048576 bytes/arena = 2,097,152
# bytes in allocated blocks = 1,863,984
# bytes in available blocks = 217,312
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 6,144
# bytes lost to quantization = 9,712
# bytes lost to arena alignment = 0
Total = 2,097,152
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 77 944
1 32 2 864 156
2 48 11 3665 75
3 64 34 8464 206
4 80 27 5418 90
5 96 6 862 158
6 112 4 564 16
7 128 7 781 108
8 144 3 285 54
9 160 4 390 18
10 176 13 1181 15
11 192 3 240 15
12 208 7 472 74
13 224 6 422 10
14 240 5 282 58
15 256 5 314 1
16 272 6 343 17
17 288 5 265 15
18 304 5 254 11
19 320 6 258 48
20 336 6 263 25
21 352 5 201 29
22 368 6 223 41
23 384 3 126 0
24 400 5 185 15
25 416 1 28 11
26 432 1 27 10
27 448 1 20 16
28 464 1 19 16
29 480 1 25 9
30 496 1 22 10
31 512 1 31 0
# arenas allocated total = 3
# arenas reclaimed = 0
# arenas highwater mark = 3
# arenas allocated current = 3
3 arenas * 1048576 bytes/arena = 3,145,728
# bytes in allocated blocks = 2,896,800
# bytes in available blocks = 222,960
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 9,216
# bytes lost to quantization = 16,752
# bytes lost to arena alignment = 0
Total = 3,145,728
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 77 944
1 32 2 896 124
2 48 12 3838 242
3 64 34 8637 33
4 80 28 5591 121
5 96 7 1035 155
6 112 6 737 133
7 128 8 953 63
8 144 5 458 107
9 160 6 563 49
10 176 15 1354 26
11 192 5 413 12
12 208 9 645 57
13 224 9 595 53
14 240 7 454 22
15 256 5 315 0
16 272 6 343 17
17 288 5 265 15
18 304 5 254 11
19 320 6 258 48
20 336 6 263 25
21 352 5 201 29
22 368 6 223 41
23 384 5 207 3
24 400 9 358 2
25 416 6 201 33
26 432 6 200 22
27 448 6 192 24
28 464 6 192 18
29 480 6 198 6
30 496 7 195 29
31 512 7 204 13
# arenas allocated total = 4
# arenas reclaimed = 0
# arenas highwater mark = 4
# arenas allocated current = 4
4 arenas * 1048576 bytes/arena = 4,194,304
# bytes in allocated blocks = 3,883,328
# bytes in available blocks = 269,008
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 12,288
# bytes lost to quantization = 29,680
# bytes lost to arena alignment = 0
Total = 4,194,304
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 77 944
1 32 2 896 124
2 48 12 3838 242
3 64 34 8637 33
4 80 28 5591 121
5 96 7 1035 155
6 112 6 737 133
7 128 8 953 63
8 144 5 458 107
9 160 6 563 49
10 176 15 1354 26
11 192 5 413 12
12 208 9 645 57
13 224 9 595 53
14 240 7 454 22
15 256 8 487 17
16 272 9 516 24
17 288 8 438 10
18 304 9 427 50
19 320 9 431 28
20 336 10 436 44
21 352 9 373 41
22 368 9 396 0
23 384 10 380 40
24 400 14 531 29
25 416 10 374 16
26 432 11 373 34
27 448 11 364 32
28 464 11 365 20
29 480 11 371 3
30 496 10 320 0
31 512 7 204 13
# arenas allocated total = 5
# arenas reclaimed = 0
# arenas highwater mark = 5
# arenas allocated current = 5
5 arenas * 1048576 bytes/arena = 5,242,880
# bytes in allocated blocks = 4,899,232
# bytes in available blocks = 286,608
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 15,360
# bytes lost to quantization = 41,680
# bytes lost to arena alignment = 0
Total = 5,242,880
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 77 944
1 32 2 928 92
2 48 12 4011 69
3 64 35 8810 115
4 80 29 5764 152
5 96 8 1208 152
6 112 7 910 105
7 128 9 1125 18
8 144 6 631 47
9 160 8 736 80
10 176 17 1527 37
11 192 7 586 9
12 208 11 818 40
13 224 11 768 24
14 240 10 626 54
15 256 11 660 33
16 272 12 689 31
17 288 11 611 5
18 304 12 600 36
19 320 12 604 8
20 336 13 609 15
21 352 12 545 7
22 368 13 569 3
23 384 14 553 35
24 400 18 704 16
25 416 13 507 0
26 432 11 373 34
27 448 11 364 32
28 464 11 365 20
29 480 11 371 3
30 496 12 368 16
31 512 13 377 26
# arenas allocated total = 6
# arenas reclaimed = 0
# arenas highwater mark = 6
# arenas allocated current = 6
6 arenas * 1048576 bytes/arena = 6,291,456
# bytes in allocated blocks = 5,958,544
# bytes in available blocks = 245,264
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 18,384
# bytes lost to quantization = 52,880
# bytes lost to arena alignment = 16,384
Total = 6,291,456
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
4657
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
# arenas allocated total = 7
# arenas reclaimed = 6
# arenas highwater mark = 7
# arenas allocated current = 1
1 arenas * 1048576 bytes/arena = 1,048,576
# bytes in allocated blocks = 0
# bytes in available blocks = 0
63 unused pools * 16384 bytes = 1,032,192
# bytes lost to pool headers = 0
# bytes lost to quantization = 0
# bytes lost to arena alignment = 16,384
Total = 1,048,576
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
What can be inferred from these results is as follows:
- arenas allocated current increases monotonically from 0 to 6 and is reclaimed only in the final output.
- An arena, once allocated, is held for the lifetime of the process and destroyed at process termination.
- Based on the arena size alone, at least 6 MiB is being used.
- Allocations around the 64-byte size classes are increasing.
Next, let's use PYTHONMALLOCSTATS on the code where the issue has been resolved.
PYTHONMALLOCSTATS=1 python solutions/exercise_1/fibonacci.py
PYTHONMALLOCSTATS results for the resolved code
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
# arenas allocated total = 0
# arenas reclaimed = 0
# arenas highwater mark = 0
# arenas allocated current = 0
0 arenas * 1048576 bytes/arena = 0
# bytes in allocated blocks = 0
# bytes in available blocks = 0
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 0
# bytes lost to quantization = 0
# bytes lost to arena alignment = 0
Total = 0
arena map counts
# arena map mid nodes = 0
# arena map bot nodes = 0
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 0
# bytes lost to arena map bot = 0
Total = 262,144
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 16 1005
1 32 1 347 163
2 48 4 1062 298
3 64 11 2805 0
4 80 15 2929 131
5 96 2 262 78
6 112 1 126 19
7 128 2 208 46
8 144 1 40 73
9 160 1 60 42
10 176 2 119 65
11 192 1 17 68
12 208 1 75 3
13 224 1 33 39
14 240 1 27 41
15 256 1 34 29
16 272 1 30 30
17 288 1 19 37
18 304 1 19 34
19 320 1 19 32
20 336 1 16 32
21 352 1 12 34
22 368 1 8 36
23 384 1 12 30
24 400 2 71 9
25 416 1 6 33
26 432 1 6 31
27 448 1 9 27
28 464 1 6 29
29 480 1 8 26
30 496 1 5 27
31 512 1 7 24
# arenas allocated total = 1
# arenas reclaimed = 0
# arenas highwater mark = 1
# arenas allocated current = 1
1 arenas * 1048576 bytes/arena = 1,048,576
# bytes in allocated blocks = 712,656
# bytes in available blocks = 311,248
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 3,024
# bytes lost to quantization = 5,264
# bytes lost to arena alignment = 16,384
Total = 1,048,576
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 1 68 953
1 32 2 777 243
2 48 10 3092 308
3 64 29 7314 81
4 80 24 4799 97
5 96 4 600 80
6 112 3 329 106
7 128 5 574 61
8 144 1 96 17
9 160 2 185 19
10 176 8 703 33
11 192 1 62 23
12 208 4 265 47
13 224 4 232 56
14 240 2 97 39
15 256 2 114 12
16 272 2 117 3
17 288 2 84 28
18 304 2 59 47
19 320 2 77 25
20 336 2 72 24
21 352 1 19 27
22 368 1 44 0
23 384 1 31 11
24 400 4 158 2
25 416 1 21 18
26 432 1 21 16
27 448 1 16 20
28 464 1 15 20
29 480 1 19 15
30 496 1 17 15
31 512 1 27 4
# arenas allocated total = 2
# arenas reclaimed = 0
# arenas highwater mark = 2
# arenas allocated current = 2
2 arenas * 1048576 bytes/arena = 2,097,152
# bytes in allocated blocks = 1,816,992
# bytes in available blocks = 231,664
0 unused pools * 16384 bytes = 0
# bytes lost to pool headers = 6,048
# bytes lost to quantization = 9,680
# bytes lost to arena alignment = 32,768
Total = 2,097,152
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
# arenas allocated total = 3
# arenas reclaimed = 2
# arenas highwater mark = 3
# arenas allocated current = 1
1 arenas * 1048576 bytes/arena = 1,048,576
# bytes in allocated blocks = 0
# bytes in available blocks = 0
63 unused pools * 16384 bytes = 1,032,192
# bytes lost to pool headers = 0
# bytes lost to quantization = 0
# bytes lost to arena alignment = 16,384
Total = 1,048,576
arena map counts
# arena map mid nodes = 1
# arena map bot nodes = 1
# bytes lost to arena map root = 262,144
# bytes lost to arena map mid = 262,144
# bytes lost to arena map bot = 131,072
Total = 655,360
What can be inferred from these results is as follows:
- The maximum value of arenas allocated current is 2.
- The number of allocated arenas has decreased compared to before the fix, meaning memory usage has been reduced.
In this way, by monitoring arena creation via the PYTHONMALLOCSTATS environment variable, you can indirectly observe memory usage.
However, the output is not designed to be parsed programmatically, and it cannot observe memory allocated outside the arenas. In the earlier case we inferred that at least 6 MiB was used because 6 arenas were created, but the actual usage is likely higher than that.
Observation using tracemalloc
tracemalloc is a library that makes tracking memory allocations much easier.
It allows you to check the peak memory usage and the size of objects currently remaining in memory.
Let's try observing the problematic code using tracemalloc.
from exercise_1.fibonacci import generate_fibonacci_hash
import tracemalloc

if __name__ == "__main__":
    tracemalloc.start()
    tracemalloc.reset_peak()

    # Code to be measured
    LENGTH_OF_SEQUENCE_1 = 33333
    LENGTH_OF_SEQUENCE_2 = 30000
    LENGTH_OF_SEQUENCE_3 = 34567
    generate_fibonacci_hash(
        LENGTH_OF_SEQUENCE_1, LENGTH_OF_SEQUENCE_2, LENGTH_OF_SEQUENCE_3
    )
    # End measurement

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    print("[ Top 10 ]")
    for stat in top_stats[:10]:
        print(stat)

    current, peak = tracemalloc.get_traced_memory()
    print(f"current={current/1024/1024:.2f} MiB, peak={peak/1024/1024:.2f} MiB")
Observation results of the problematic code
[ Top 10 ]
/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:18: size=2528 B, count=79, average=32 B
/Users/user/work/techblog/python_perfomance/pipenv314/trace_fibonacci.py:8: size=400 B, count=1, average=400 B
/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:28: size=64 B, count=2, average=32 B
current=0.00 MiB, peak=145.06 MiB
What we can see from this is that the peak memory usage was 145.06 MiB, but almost none of it remains at the time of the snapshot.
Unlike PYTHONMALLOCSTATS, it allows you to observe the actual peak memory used by this process and where currently remaining objects were created.
The downside is that tracemalloc.take_snapshot only targets objects existing at that specific moment, making it difficult to determine which past objects were problematic.
While it is possible to identify causes by inserting tracemalloc into suspicious functions and observing them, it requires a significant amount of effort.
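One way to reduce that effort is to take two snapshots around a suspicious section and diff them with Snapshot.compare_to. The allocation below is just a stand-in for the code under suspicion:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

data = [bytes(1000) for _ in range(1000)]  # stand-in for suspicious code

after = tracemalloc.take_snapshot()
# Show the lines whose allocations grew the most between the two points.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```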
Observation using memray
memray is a memory profiler for Python and is a tool that provides significant help in solving memory-related issues.
First, install memray.
python3 -m pip install memray
After that, run the problematic code.
memray run exercise_1/fibonacci.py
When you run this code, a message like the following will be displayed.
Writing profile results into exercise_1/memray-fibonacci.py.69056.bin
4657
[memray] Successfully generated profile results.
You can now generate reports from the stored allocation records.
Some example commands to generate reports:
/Users/user/.local/share/virtualenvs/pipenv314-r3SaCLMX/bin/python -m memray flamegraph exercise_1/memray-fibonacci.py.69056.bin
Since memray-fibonacci.py.69056.bin has been created, convert it into a human-readable HTML using flamegraph.
memray flamegraph exercise_1/memray-fibonacci.py.69056.bin
This will create memray-flamegraph-fibonacci.py.69056.html.
By viewing this report, you can identify where memory allocations that cause issues are occurring.

In this example, you can confirm that 152 MiB is used and memory allocation is performed 80,000 times at output.append(output[i] + output[i + 1]).
The sample code this time was too simple, making it hard to see, but the graph is designed to make locations with high memory allocation visually easy to understand, as shown below.

Integration with pytest
By using the pytest-memray plugin, you can include memory usage tests in pytest.
For this sample, please refer to the official memray test code.
In this example, you can specify an upper limit for memory usage using pytest.mark.limit_memory.
If the test fails, a message like the following will be displayed.
__________________________________________________ test_fibonacci __________________________________________________
Test was limited to 100.0KiB but allocated 142.0MiB
------------------------------------------------ memray-max-memory -------------------------------------------------
List of allocations:
- 270.8KiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 48.6MiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 270.8KiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 39.2MiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 53.4MiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 240.7KiB allocated here:
fibonacci:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:19
...
- 6.3KiB allocated here:
generate_fibonacci_hash:/Users/user/work/techblog/python_perfomance/pipenv314/exercise_1/fibonacci.py:29
How memray works
Briefly, memray works as follows:
- It tracks C-level memory allocations by intercepting malloc/free/realloc function calls in libc.so.
- For Python memory allocations, refer to Python allocators.
- It can be confirmed that memory allocator processing is hooked using PyMem_SetAllocator.
- Stack investigation
- Python stacks are investigated using sys.setprofile.
- For native stacks, refer to Symbolic information in native mode.
Summary
In this article, we briefly explained how memory management is performed in CPython.
What I've described is just a simplified summary; to truly understand it with a practical sense, you probably need to read the CPython source code or perform debug executions yourself.
If you want to know how Python objects are laid out in memory or want to investigate bytecode, please refer to the following articles:
- How are Python objects represented in memory? Part 1
- How are Python objects represented in memory? Part 2
- Deciphering CPython 3.13 Bytecode Through Experiments
References
- Japanese Book: Garbage Collection Algorithms and Implementation
- Published in 2010, so it's old, but the fundamentals remain the same.
- English Book: CPython Internals: Your Guide to the Python 3 Interpreter
- Based on the Python 3.9 era, so it's slightly dated, but still a helpful reference.
- Python Docs: Memory Management
- Python Docs: Supporting Cyclic Garbage Collection
- Garbage collector design
- Internal documentation regarding garbage collection
- Python Docs: Object Life Cycle
- Python Docs: Reference Counting
- PEP 683 – Immortal Objects, Using a Fixed Refcount
- PEP 445 – Add new APIs to customize Python memory allocators
- PEP 703 – Making the Global Interpreter Lock Optional in CPython: Reference Counting
- Reference counting for the free-threaded version
- blog: How CPython Implements Reference Counting: Dissecting CPython Internals
- blog: Understanding Immortal Objects in Python 3.12: A Deep Dive into Python Internals
- memray Documentation
- The size threshold under which allocations are served from arenas varies depending on the Python version. To check the exact value, refer to CPython's SMALL_REQUEST_THRESHOLD. ↩︎
- The arena size varies depending on the Python version and build method. To check the exact value, refer to CPython's ARENA_BITS. Also, for arena header information, refer to arena_object. ↩︎
- The pool size varies depending on the Python version and build method. To check the exact value, refer to CPython's POOL_BITS. Also, for pool header information, refer to pool_header. ↩︎
- To make it easy to find a pool for a given block size, usedpools is defined, allowing fast lookup of the corresponding pool. ↩︎
- While there are other means, such as using C extensions or attaching a debugger to inspect the memory at a pointer obtained with id, they are out of scope for this article. ↩︎