Reading Safetensors Headers


What is Safetensors

Safetensors is a library and file format developed by Hugging Face, primarily designed for safely and quickly reading and writing tensors.

https://github.com/huggingface/safetensors

The provided Python library is compatible with PyTorch, TensorFlow, and others. Furthermore, since it lacks the functionality to execute arbitrary code (unlike the pickle format) and is relatively safe, recent deep learning models are increasingly being distributed in this format.

Structure


Explanation of the Safetensors file structure [1]

Safetensors has a simple structure. It is broadly divided into the header size area (8 bytes), the header area (N bytes), and the buffer area (the remaining part). (Since the official names for these areas are unknown, I have given them these names in this article for convenience.) Because the header and buffer areas are separate, it is possible to use the header information to load only specific parts without reading the entire file.

In this article, I will explain how to read the header area of Safetensors.

Header Size Area

The first 8 bytes (uint64) represent the size of the header.
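This prefix can be decoded with the standard struct module. A minimal sketch (the value 96 is a made-up header size for illustration):

```python
import struct

# Suppose the first 8 bytes of a file look like this (96 is a made-up header size)
prefix = (96).to_bytes(8, byteorder="little")

# "<Q" = little-endian unsigned 64-bit integer, matching the header size area
(header_size,) = struct.unpack("<Q", prefix)
print(header_size)  # 96
```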

Header Area

The header area is UTF-8 JSON, so it can be easily read in many programming languages.

Header example
{
    "__metadata__": {
        "format": "pt"
    }, 
    "model.embed_tokens.weight": {
        "dtype": "F32", 
        "shape": [49152, 576], 
        "data_offsets": [0, 113246208]
    },
    "model.layers.0.input_layernorm.weight": {
        "dtype": "F32", 
        "shape": [576], 
        "data_offsets": [113246208, 113248512]
    }, 
    "model.layers.0.mlp.down_proj.weight": {
        "dtype": "F32", 
        "shape": [576, 1536], 
        "data_offsets": [113248512, 116787456]
    },
    ...
}

As shown in the explanatory image, basically:

"layer_name": {
    "dtype": "data_type",
    "shape": [dim1, dim2, ...],
    "data_offsets": [data_start, data_end]
}

is repeated.
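Putting the three areas together, a complete (if tiny) file can be built with nothing but the standard library. This is a sketch of the layout described above, using a single made-up tensor named "weight"; it is not guaranteed to satisfy every validation rule of the official loader:

```python
import json
import struct

# Buffer area: one F32 tensor of shape [2, 2] -> 16 bytes, little-endian
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# Header area: UTF-8 JSON describing the tensor
header = {
    "__metadata__": {"format": "pt"},
    "weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]},
}
header_bytes = json.dumps(header).encode("utf-8")

# Header size area (8-byte little-endian uint64) + header area + buffer area
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Reading it back follows the same layout in reverse
assert int.from_bytes(blob[:8], byteorder="little") == len(header_bytes)
```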

Additionally, there is an optional special key, __metadata__, where you can store metadata. There are no strict rules about its contents, but it is constrained to string-to-string key-value pairs. Although the header itself is JSON, only string values can be used inside __metadata__, so a little caution is needed.

To prevent DoS attacks, the header size is capped at 100MB [2]. If a header exceeds this limit, loading fails with a HeaderTooLarge error.
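A loader can apply the same guard before allocating the header buffer. A minimal sketch; the 100_000_000 constant mirrors the reference implementation linked in [2], and the function name is made up:

```python
MAX_HEADER_SIZE = 100_000_000  # 100 MB cap from the reference implementation [2]

def check_header_size(header_size: int) -> None:
    # Reject oversized headers before allocating anything, as the Rust crate does
    if header_size > MAX_HEADER_SIZE:
        raise ValueError("HeaderTooLarge")

check_header_size(30368)  # a realistic size passes silently
```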

dtype: Data Type

A string representing the data type. As of October 2, 2024, the following are available: [3]

  • BOOL: Boolean type
  • U8: Unsigned 8-bit integer
  • I8: Signed 8-bit integer
  • F8_E5M2: 8-bit floating point (5-bit exponent, 2-bit mantissa)
  • F8_E4M3: 8-bit floating point (4-bit exponent, 3-bit mantissa)
  • I16: Signed 16-bit integer
  • U16: Unsigned 16-bit integer
  • F16: 16-bit floating point
  • BF16: 16-bit floating point (Brain floating point)
  • I32: Signed 32-bit integer
  • U32: Unsigned 32-bit integer
  • F32: 32-bit floating point
  • F64: 64-bit floating point
  • I64: Signed 64-bit integer
  • U64: Unsigned 64-bit integer

However, depending on the library used to handle tensors, some data types may not be supported. [4]
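Since each dtype has a fixed byte width, an entry's shape and data_offsets can be cross-checked against each other. A sketch, with the width table transcribed from the list above:

```python
import math

# Byte width per element for each dtype listed above
DTYPE_SIZE = {
    "BOOL": 1, "U8": 1, "I8": 1, "F8_E5M2": 1, "F8_E4M3": 1,
    "U16": 2, "I16": 2, "F16": 2, "BF16": 2,
    "U32": 4, "I32": 4, "F32": 4,
    "U64": 8, "I64": 8, "F64": 8,
}

def expected_bytes(dtype: str, shape: list[int]) -> int:
    # math.prod([]) == 1, so a scalar shape [] occupies exactly one element
    return math.prod(shape) * DTYPE_SIZE[dtype]

# Cross-check against the model.embed_tokens.weight entry from the header example
start, end = 0, 113246208
assert expected_bytes("F32", [49152, 576]) == end - start
```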

shape: Tensor Shape

An array of integers representing the shape of the tensor.
For scalars (0-dimensional), specify it with an empty array [].

data_offsets: Data Start and End Positions

An array of integers [start, end] representing the start and end positions of the tensor data.

These are specified as relative positions from the beginning of the buffer area, not absolute positions. Therefore, in many cases, the start position of the first layer's data will be 0.
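To seek directly to one tensor in the file, add the 8-byte prefix and the header size to the relative offsets. A small sketch (30368 is the header size measured later in this article; the helper name is made up):

```python
def absolute_range(header_size: int, data_offsets: list[int]) -> tuple[int, int]:
    # File layout: 8-byte size prefix | header (header_size bytes) | buffer
    base = 8 + header_size
    start, end = data_offsets
    return base + start, base + end

# Where model.embed_tokens.weight actually lives in the file
print(absolute_range(30368, [0, 113246208]))  # (30376, 113276584)
```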

Specifications Regarding Metadata

While the content written in the __metadata__ field is flexible, a convention for using it to record model information has been proposed by Stability AI.

https://github.com/Stability-AI/ModelSpec

It allows for specifying model architecture, model names, and Base64-encoded thumbnail images. While there are items for text generation models, it is a standard primarily targeted at image generation models. I won't go into much depth here.

Reading Local Headers with Python

Let's try to retrieve the header of a Safetensors file using Python. As an example Safetensors file, I'm using the model file from HuggingFaceTB/SmolLM-135M.

main.py
import json

path = "./model.safetensors"  # Path where the safetensors file is located

with open(path, "rb") as f:
    # Read the first 8 bytes (the header size area)
    buffer = f.read(8)
    # Convert the byte sequence to an integer in little-endian
    header_size = int.from_bytes(buffer, byteorder="little")
    print(f"header_size: {header_size}")

    # The header area immediately follows, so read header_size bytes
    buffer = f.read(header_size)

# Decode the header portion as JSON
header = json.loads(buffer.decode("utf-8"))
print(header)
Execution results
 python ./main.py 
header_size: 30368
{'__metadata__': {'format': 'pt'}, 'model.embed_tokens.weight': {'dtype': 'F32', 'shape': [49152, 576], 'data_offsets': [0, 113246208]}, ...

The header size was 30368. Although the header output is partially omitted, you can see that it contains metadata and information for each layer of the model.

Reading Local Headers with Rust

The process is the same as in Python. Since parsing JSON in Rust is a bit of a hassle, I've left the header as a string here.

main.rs
use std::{fs::File, io::Read};

fn main() {
    let path = "./model.safetensors"; // Path where the safetensors file is located
    let mut file = File::open(path).unwrap();
    let mut buffer = vec![0u8; 8]; // Prepare an 8-byte buffer

    file.read_exact(&mut buffer).unwrap(); // Read 8 bytes from the file

    // Convert the buffer to u64 in little-endian
    let header_size = u64::from_le_bytes(buffer.try_into().unwrap());
    println!("header_size: {}", header_size);

    // Prepare a buffer of header_size bytes for the header area
    let mut header_buffer = vec![0u8; header_size as usize];

    file.read_exact(&mut header_buffer).unwrap();
    let header = String::from_utf8(header_buffer).unwrap(); // Convert to text
    println!("{}", header);
}
Execution results
 cargo run -q
header_size: 30368
{"__metadata__":{"format":"pt"},"model.embed_tokens.weight":{"dtype":"F32","shape":[49152,576],"data_offsets":[0,113246208]}, ...

Reading Remote Headers with TypeScript

Thanks to the very simple structure of Safetensors, it is possible to retrieve layer information by fetching only the header portion without reading the entire file. By leveraging this characteristic and combining it with the HTTP Range request header, you can obtain information from a Safetensors file on the internet without downloading the complete file. Please refer to the MDN documentation for the Range header.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range

Below is an example of using TypeScript to retrieve the header without downloading the entire file.

main.ts
// Hugging Face model download URL
const fileUrl = "https://huggingface.co/HuggingFaceTB/SmolLM-135M/resolve/main/model.safetensors"

const headerSizeRes = await fetch(fileUrl,
    {
        method: "GET",
        headers: {
            // https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
            "Range": "bytes=0-7" // Fetch 8 bytes
        }
    }
)
const headerSize = await headerSizeRes.arrayBuffer().then((buffer) => {
    // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DataView/getBigUint64 
    const view = new DataView(buffer) 
    // Read 8 bytes from the beginning (offset 0) in little-endian and convert to bigint
    // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DataView/getBigUint64
    return view.getBigUint64(0, true)
})
console.log(`headerSize: ${headerSize}`)

const headerRes = await fetch(
    fileUrl,
    {
        method: "GET",
        headers: {
            // Bytes 8 through 7 + headerSize (inclusive) cover exactly headerSize bytes; 7n is a bigint literal
            "Range": `bytes=8-${7n + headerSize}`
        }
    }
)
const json = await headerRes.json()
console.log(json)
Execution results
 bun run ./main.ts | head -n 10
headerSize: 30368
{
  __metadata__: {
    format: "pt",
  },
  "model.embed_tokens.weight": {
    dtype: "F32",
    shape: [ 49152, 576 ],
    data_offsets: [ 0, 113246208 ],
  },
  ...

I used Bun this time, but since it only uses standard features, it should work on other runtimes as well.

The method using the Range header is also introduced in the official documentation and is actually used by Hugging Face's model pages to display the total number of parameters and layer information. (You can see requests being made with the Range header if you monitor them from the Network tab.)

https://huggingface.co/docs/safetensors/metadata_parsing

Bonus

It's mostly for my own use, but I've created a CLI tool that can read or delete Safetensors metadata, so please give it a try if you're interested.

https://github.com/p1atdev/safemetadata

Footnotes
  1. Modified and translated from the official explanatory image (CC-BY-NC-SA-4.0) ↩︎

  2. https://github.com/huggingface/safetensors/blob/5db3b92c76ba293a0715b916c16b113c0b3551e9/safetensors/src/tensor.rs#L10 ↩︎

  3. From https://github.com/huggingface/safetensors/blob/5db3b92c76ba293a0715b916c16b113c0b3551e9/safetensors/src/tensor.rs#L654-L689 ↩︎

  4. For example, in integration with PyTorch, U64 and U16 are not supported. ↩︎
