
Introducing TorchFont: A Machine Learning Library for Vector Fonts


1. Introduction

Fonts play a crucial role in conveying textual information in printed matter and on digital devices. Their creation requires significant time and effort, leading to expectations for automated generation through machine learning.

Existing machine learning methods for fonts have evolved primarily as applications of image generation. Representative examples include VAE- and GAN-based methods, which have successfully generated bitmap fonts. However, because fonts are used at many different scales, resolution-independent vector formats are the mainstream, so bitmap generation has limited practical utility.

One reason why research into vector font generation has not progressed much is the lack of machine learning libraries that support data structures specific to vector fonts. Therefore, I have developed TorchFont, a domain-specific library for vector fonts based on PyTorch. In this article, I will introduce the design philosophy and features of TorchFont.

https://github.com/torchfont/torchfont

2. Existing Methods

2.1. DeepSVG Dataset

DeepSVG is an early study that addressed vector image generation. DeepSVG worked on generating not only vector images but also vector fonts, and the dataset has been released along with the source code (strictly speaking, it is the SVG-VAE dataset from its preceding research).

https://alexandre01.github.io/deepsvg/

https://github.com/alexandre01/deepsvg

Comparing the DeepSVG font dataset with the Google Fonts dataset results in the following:

              Number of Font Faces   Number of Characters   Total Number of Samples
DeepSVG       45,821                 62                     100,000
Google Fonts  Approx. 9,000          Approx. 110,000        Approx. 12,000,000

While Google Fonts has fewer fonts, it has about 120 times more samples. Furthermore, while the DeepSVG dataset is limited to alphanumeric characters, Google Fonts has no such restriction. Especially when considering the generation of multilingual fonts including Japanese, the DeepSVG dataset is arguably insufficient.

2.2. TTFQuery and fontTools

TTFQuery is a library for extracting vector font outlines. It wraps Python's fontTools, making it easy to obtain outline information.

https://github.com/mcfletch/ttfquery

In fact, fontTools is a versatile and powerful library used even in font development environments, providing various functions such as font analysis and editing. It is possible to obtain font outline information directly by using fontTools without necessarily using TTFQuery. For details, you might refer to fontTools' Pen protocol.

https://github.com/fonttools/fonttools
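To give a rough feel for the Pen protocol, here is a minimal pure-Python stand-in (illustrative only, not fontTools' actual implementation): a glyph's draw() method calls back into a pen object with drawing commands, and a recording pen simply stores them.

```python
# Minimal sketch of fontTools' Pen protocol (illustrative stand-in, not the
# real API surface): a pen receives moveTo/lineTo/curveTo/closePath callbacks,
# and a recording pen stores each command together with its points.
class MiniRecordingPen:
    def __init__(self):
        self.value = []  # list of (command, points) tuples

    def moveTo(self, pt):
        self.value.append(("moveTo", (pt,)))

    def lineTo(self, pt):
        self.value.append(("lineTo", (pt,)))

    def curveTo(self, pt1, pt2, pt3):
        self.value.append(("curveTo", (pt1, pt2, pt3)))

    def closePath(self):
        self.value.append(("closePath", ()))


# A glyph object would normally issue these calls from its draw(pen) method;
# here we drive the pen by hand with a simple triangle contour.
pen = MiniRecordingPen()
pen.moveTo((0, 0))
pen.lineTo((100, 0))
pen.lineTo((50, 80))
pen.closePath()
print(pen.value[0])  # ('moveTo', ((0, 0),))
```

fontTools ships a RecordingPen with this general shape, which is a convenient way to inspect what commands a glyph emits.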

However, fontTools-based approaches have two main problems:

  • High memory usage
  • Slow parser processing speed

Specifically, reading all fonts included in Google Fonts requires about 30 GB of memory. If the parsing speed were fast, it might be acceptable to parse every time, but since the processing speed is also slow, it must be kept in memory.

Furthermore, because font file parsing results cannot be shared across multiple processes, increasing num_workers in a PyTorch DataLoader requires memory proportional to the number of workers. This is particularly serious: on machines with limited memory, the DataLoader becomes the bottleneck. I believe this is one reason why research like DeepSVG builds datasets in advance rather than processing them on the fly.

These problems arise because fontTools is written in pure Python and is highly general-purpose. While it's unfair to blame fontTools, it is undeniably somewhat unsuitable for use in the field of machine learning.

3. Proposed Method

3.1. Skrifa

There have been two major trends in font-related libraries:

  • C/C++ based libraries such as FreeType and HarfBuzz
  • Python-based libraries such as fontTools

While C/C++ based libraries mainly handle user-oriented processing like font rendering, Python-based libraries have handled developer-oriented processing like font editing.

In response to this situation, projects have been underway in recent years to replace these with Rust, which is faster than Python and more memory-safe than C/C++. One such initiative is the Oxidize project by the Google Fonts team.

https://github.com/googlefonts/oxidize

For example, Chrome previously used the C/C++ based FreeType for font rendering, but it was replaced by the Rust-based Skrifa in 2025.

https://developer.chrome.com/blog/memory-safety-fonts?hl=ja

TorchFont adopts Skrifa for obtaining outlines. Skrifa parses far faster than fontTools, eliminating the need to keep parsed results resident in memory.

As a result, memory usage has been significantly reduced. Specifically, while fontTools required about 30 GB to load the entire Google Fonts into memory, it was reduced to about 2 GB when using Skrifa.

3.2. Memory Mapped I/O

The adoption of Skrifa is also significant because it enabled memory sharing across multiple processes. With Skrifa, there is no longer a need to consider sharing parsing results across multiple processes; one only needs to consider sharing the font files themselves.

By reading font files using Memory Mapped I/O, sharing across multiple processes can be easily achieved. This method has made it possible to increase the num_workers in the PyTorch DataLoader without incurring an increase in memory usage.

https://crates.io/crates/memmap2
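The idea can be sketched in Python with the standard mmap module (a toy illustration, not TorchFont's Rust implementation): a read-only mapping is backed by the OS page cache, so forked worker processes mapping the same file share physical pages instead of each holding a private copy.

```python
import mmap
import os

# Stand-in for a font binary (hypothetical data, not a real font file):
# the 4-byte TrueType sfnt version tag followed by padding.
path = "demo_font.bin"
with open(path, "wb") as f:
    f.write(b"\x00\x01\x00\x00" + bytes(1024))

# Map the file read-only; the mapping is backed by the OS page cache, so
# multiple processes mapping the same file share the same physical memory.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    sfnt_tag = bytes(mm[:4])
    mm.close()

os.remove(path)
print(sfnt_tag)  # b'\x00\x01\x00\x00'
```

In TorchFont this role is played by the Rust memmap2 crate linked above; each DataLoader worker re-parses glyphs on the fly from the shared mapping instead of holding its own parsed copy.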

3.3. Maturin

Since TorchFont is a machine learning library, it is desirable for it to be usable from Python. Therefore, TorchFont was implemented as a hybrid Python and Rust project using Maturin.

Maturin is a build tool for creating Python packages from Rust code. By using Maturin, the font outline information obtained with Skrifa in Rust can be handled from Python.

https://www.maturin.rs/

4. Usage Examples

The library I created is published in the following repository:

https://github.com/torchfont/torchfont

The following are examples of how to use TorchFont to load Google Fonts and retrieve outline information.

4.1. Installation

Since TorchFont is published on PyPI, it can be installed using the pip command.

pip install torchfont

Alternatively, for uv, which has been gaining popularity recently, you can install it with the following command:

uv add torchfont

4.2. Dataset Construction

TorchFont provides a dataset class to easily build a dataset for Google Fonts, which is likely to be the most frequently used. By running the following code, you can construct a Google Fonts dataset that inherits from the PyTorch Dataset class.

from torchfont.datasets import GoogleFonts

dataset = GoogleFonts(
    root="data/google/fonts",
    ref="main",
    download=True,
)

The arguments for the GoogleFonts class allow you to specify the following:

  • root: The directory where the Google Fonts repository will be cloned.
  • ref: The branch or tag to clone.
  • patterns: Patterns for the paths of the font files to be used in the dataset.
  • codepoint_filter: A list of Unicode codepoints for the characters to be used in the dataset.
  • transform: A transformation function to be applied to the retrieved outline information.
  • download: Whether to clone the Google Fonts repository.

4.3. Retrieving Outline Information

Once you have built the dataset, you can retrieve outline information simply by accessing it with an index, just like a regular PyTorch Dataset. The return value of the dataset is a tuple consisting of four elements. An example is shown below:

types, coords, style_label, content_label = dataset[42]

print(f"{types=}")
print(f"{coords=}")
print(f"{style_label=}")
print(f"{content_label=}")

The execution result of the code above is as follows:

types=tensor([1, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 2, 4, 5])
coords=tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.2759,  0.2207],
        [ 0.2759,  0.1755,  0.2701,  0.1295,  0.2585,  0.0828],
        [ 0.2470,  0.0361,  0.2309, -0.0092,  0.2102, -0.0530],
        [ 0.1895, -0.0968,  0.1647, -0.1377,  0.1357, -0.1758],
        [ 0.1068, -0.2139,  0.0749, -0.2471,  0.0400, -0.2754],
        [ 0.0000,  0.0000,  0.0000,  0.0000, -0.0352, -0.2300],
        [-0.0156, -0.2033,  0.0015, -0.1730,  0.0161, -0.1392],
        [ 0.0308, -0.1053,  0.0430, -0.0701,  0.0527, -0.0337],
        [ 0.0625,  0.0028,  0.0698,  0.0394,  0.0747,  0.0762],
        [ 0.0796,  0.1130,  0.0820,  0.1479,  0.0820,  0.1812],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0820,  0.7031],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.2759,  0.7031],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]])
style_label=0
content_label=74

types is a tensor of class labels representing the types of drawing commands. It takes integer values from 0 to 5, each representing the following drawing commands:

Class Label   Drawing Command
0             PAD
1             MoveTo
2             LineTo
3             CurveTo
4             ClosePath
5             EOS

coords is a tensor representing the coordinates of control points and endpoints that serve as arguments for the drawing commands. For MoveTo and LineTo, it is 2D with only the endpoint. For CurveTo, it is 6D because it has two control points and an endpoint. Other commands do not have arguments. The tensor is structured to accommodate the maximum of 6 dimensions, with irrelevant parts padded with zeros. As for the range of values, they are normalized by the UPEM value of each font so that they generally fall within the range of 0 to 1.
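The layout above can be mimicked with a small pure-Python sketch (illustrative only, not TorchFont's internal code): each command becomes one class label and one 6-dimensional row, coordinates are divided by UPEM, and commands with fewer arguments leave their unused leading slots at zero so the endpoint always sits in the last two columns, matching the sample output above.

```python
# Pure-Python sketch (not TorchFont's actual code) of the (types, coords)
# encoding described above: one class label and one 6-dim coordinate row per
# drawing command, values divided by UPEM, unused leading slots left at zero.
MOVE_TO, LINE_TO, CURVE_TO, CLOSE_PATH = 1, 2, 3, 4

def encode(commands, upem=1000):
    types, coords = [], []
    for cmd, points in commands:
        row = [0.0] * 6
        flat = [v / upem for pt in points for v in pt]
        if flat:
            row[6 - len(flat):] = flat  # right-align: endpoint in last 2 slots
        types.append(cmd)
        coords.append(row)
    return types, coords

types, coords = encode([
    (MOVE_TO, [(0, 0)]),
    (LINE_TO, [(276, 221)]),
    (CURVE_TO, [(276, 176), (270, 130), (259, 83)]),
    (CLOSE_PATH, []),
])
print(types)      # [1, 2, 3, 4]
print(coords[1])  # [0.0, 0.0, 0.0, 0.0, 0.276, 0.221]
```

A CurveTo row fills all six slots (two control points plus the endpoint), while ClosePath contributes an all-zero row, as in the tensor printed earlier.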

style_label is an integer representing the font face class, and content_label is an integer representing the character type class. The presence of these two labels—style and content—is a characteristic feature of the font domain. Note that both are represented as sequential numbers starting from 0. For example, content_label does not correspond to Unicode codepoints.
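As a concrete illustration (using a hypothetical three-entry class list; the real dataset.content_classes is far larger), recovering a codepoint from a content label goes through the class list rather than the label itself:

```python
# Hypothetical excerpt of a dataset's character class list; in TorchFont this
# would come from dataset.content_classes. Labels are dense indices into it,
# not Unicode codepoints.
content_classes = ["*", "+", ","]

content_label = 1
char = content_classes[content_label]
codepoint = ord(char)
print(char, codepoint)  # + 43
```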

4.4. Retrieving Dataset Information

To achieve a user experience similar to existing datasets in PyTorch and TorchVision, the dataset class provides properties for obtaining the dataset length and class labels. An example is shown below:

print(f"{len(dataset)=}")
print(f"{len(dataset.content_classes)=}")
print(f"{len(dataset.style_classes)=}")

print(f"{dataset.style_classes[0:2]=}")
print(f"{dataset.content_classes[42:58]=}")

The execution results are as follows:

len(dataset)=12076302
len(dataset.content_classes)=113545
len(dataset.style_classes)=8772
dataset.style_classes[0:2]=['Aclonica Regular', 'Arimo Italic']
dataset.content_classes[42:58]=['*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In this way, you can easily obtain information such as font face names and character types from the class indices.

4.5. Batch Processing with DataLoader

Since TorchFont's dataset inherits from the PyTorch Dataset class, it can be passed to a DataLoader for batch processing, just like a normal Dataset.

Since types and coords are variable-length sequences corresponding to drawing commands, padding must be performed with collate_fn, as is done in text generation and similar tasks.

from collections.abc import Sequence

import torch
from torch import Tensor
from torch.nn.utils.rnn import pad_sequence

def collate_fn(
    batch: Sequence[tuple[Tensor, Tensor, int, int]],
) -> tuple[Tensor, Tensor, Tensor, Tensor]:
    types_list = [types for types, _, _, _ in batch]
    coords_list = [coords for _, coords, _, _ in batch]
    style_label_list = [style for _, _, style, _ in batch]
    content_label_list = [content for _, _, _, content in batch]

    types_tensor = pad_sequence(types_list, batch_first=True, padding_value=0)
    coords_tensor = pad_sequence(coords_list, batch_first=True, padding_value=0.0)

    style_label_tensor = torch.as_tensor(style_label_list, dtype=torch.long)
    content_label_tensor = torch.as_tensor(content_label_list, dtype=torch.long)

    return types_tensor, coords_tensor, style_label_tensor, content_label_tensor

A DataLoader is constructed using such a collate_fn. For example, the following implementation allows data to be read with a batch size of 64 and 8 worker processes.

from torch.utils.data import DataLoader
from tqdm import tqdm

dataloader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,
    prefetch_factor=2,
    collate_fn=collate_fn,
    multiprocessing_context="fork",
)

for batch in tqdm(dataloader, desc="Iterating over datasets"):
    sample = batch

By running this code, you can confirm that Google Fonts are loaded at high speed. I believe there is no other library that can retrieve outline information for vector fonts on-the-fly with such high speed and small memory footprint.

4.6. Custom Dataset Construction

TorchFont also provides a generic FontRepo class so that datasets can be built from font repositories other than Google Fonts.

For example, Material Design Icons and Font Awesome are icon providers, but their GitHub repositories actually contain these icons in font format. Therefore, it is possible to build icon datasets using the FontRepo class.

For instance, a dataset for Google's Material Design Icons can be constructed as follows:

from torchfont.datasets import FontRepo

dataset = FontRepo(
    root="data/google/material_design_icons",
    url="https://github.com/google/material-design-icons",
    ref="master",
    patterns=("variablefont/*.ttf",),
    download=True,
)

Additionally, a dataset for Font Awesome icons can be constructed as follows:

from torchfont.datasets import FontRepo

dataset = FontRepo(
    root="data/fortawesome/font-awesome",
    url="https://github.com/FortAwesome/Font-Awesome",
    ref="7.x",
    patterns=("otfs/*.otf",),
    download=True,
)

This is where sticking to the font file format, rather than converting everything up front into something like NumPy binaries, really pays off.

5. Discussion

5.1. Precautions for Google Fonts

There are several pitfalls when trying to use all of Google Fonts naively. For example, the repository includes Adobe Blank, a font for testing fallback behavior that assigns empty glyphs to all codepoints. TorchFont excludes it by default.

https://github.com/google/fonts/tree/main/ofl/adobeblank

Another troublesome font is Rubik Pixels, which has extremely complex outlines (see its Google Fonts specimen page). Specifically, while the average outline sequence length in Google Fonts is around 85, Rubik Pixels reaches a maximum of about 33,000.

While I haven't gone as far as excluding it, in practice, it's better to take measures such as setting an upper limit on sequence length. Otherwise, memory exhaustion will occur during training, stopping the process. In some cases, power spikes can even cause the machine to shut down (speaking from experience).

https://fonts.google.com/specimen/Rubik+Pixels
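One simple measure can be sketched as follows (a hypothetical helper, not part of TorchFont's API): cap the command-sequence length when assembling batches, dropping pathological outliers before they reach the model.

```python
# Hypothetical helper (not TorchFont API): drop samples whose drawing-command
# sequence exceeds a length cap, guarding against outliers like Rubik Pixels.
MAX_LEN = 512

def filter_by_length(samples, max_len=MAX_LEN):
    # Each sample is (types, coords, style_label, content_label); the length
    # of `types` is the number of drawing commands.
    return [s for s in samples if len(s[0]) <= max_len]

samples = [
    (list(range(14)), [[0.0] * 6] * 14, 0, 74),        # typical glyph
    (list(range(33000)), [[0.0] * 6] * 33000, 3, 9),   # pathological outlier
]
kept = filter_by_length(samples)
print(len(kept))  # 1
```

In practice the same check could also live inside a collate_fn, skipping or truncating over-long samples batch by batch.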

For statistical information about Google Fonts, please refer to the following repository where I have visualized various aspects:

https://github.com/fjktkm/google-fonts-heatmap

5.2. Supported Formats

A subtle but important point is support for variable fonts and font collections (.ttc, .otc). With these supported, essentially any font repository can be handled without issues.

Regarding variable fonts, each NamedInstance is treated as a separate font face. A variable font's axes could be used to further augment the dataset, but I've handled it this way for consistency with static fonts; using the axes as a data augmentation method is a future task.

5.3. Label Design

Information such as font weight or whether it's italic has not been added to the labels at this time. This is because I haven't yet found a way to handle such information uniformly. For example, the way weight information is handled differs between static and variable fonts.

Furthermore, font files contain various metadata, and how to handle them as labels needs consideration. Making all metadata accessible would require creating a massive number of Python bindings, which could be large enough to be a separate project. I intend to consider this as a future task.

5.4. Adding Models

TorchFont aims to be a library like TorchVision or TorchAudio, and I am considering adding implementations of existing models. However, this field doesn't have established "standard" preprocessing yet, and there are significant differences in preprocessing among researchers. For a while, I plan to proceed while exploring better implementations.

6. Conclusion

In this article, I introduced TorchFont, a domain-specific library for machine learning with vector fonts. By adopting the Rust-based Skrifa, I was able to significantly improve memory usage and parsing speed compared to conventional fontTools-based methods. Additionally, it was implemented based on the PyTorch Dataset to integrate seamlessly with the existing PyTorch ecosystem. While it is still a library in development, I believe it will be a useful tool for researchers and developers interested in vector font generation. Please give it a try.

Source Code Availability

The source code for TorchFont is available in the following repository. Bug reports and feature suggestions are welcome, so please let me know via Issues or Pull Requests.

https://github.com/torchfont/torchfont

Regarding documentation, although it is still a work in progress, it is published at the following URL:

https://torchfont.readthedocs.io/

