Translated by AI
Quick Experiments with ONNX: Inferring Python-Trained Models in Other Languages
Introduction
Python is a commonly used programming language in machine learning.
There are abundant Python libraries for machine learning and a wealth of accumulated expertise, making Python a strong candidate for building machine learning systems.
However, when building a system in a programming language other than Python, carving out just the inference step of a trained model as a separate Python component is not necessarily the best approach.
By using ONNX (Open Neural Network Exchange), you can share models across languages: for example, training in Python and running inference in Node.js or Go.
In this article, we will conduct a simple experiment: converting a model trained in Python to the ONNX format and performing inference in Go or Node.js. The goal is to confirm the basic mechanism of ONNX and the actual behavior when performing inference across different languages.
The target audience for this article is those considering running inference for machine learning models in a different programming language than the one used for training. If your entire workflow stays within Python, this article will be of limited use.
Additionally, the following knowledge is assumed:
- Experience in machine learning with Python
- Experience in programming other than Python
What is ONNX (Open Neural Network Exchange)?
Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their projects evolve. ONNX provides an open-source format for AI models, encompassing both deep learning and traditional machine learning.
Furthermore, Microsoft provides ONNX Runtime as an execution engine (runtime/inference library) for loading, optimizing, and executing inference for models in the ONNX format.
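To give a feel for the API, here is a minimal sketch of using ONNX Runtime from Python: create a session, look up the model's input name, and run a forward pass. The model path and input array are placeholders, not files from this article's repository.

```python
import numpy as np

def run_onnx(model_path: str, x: np.ndarray) -> np.ndarray:
    """Load an ONNX model and run a single forward pass on x."""
    import onnxruntime as ort  # imported lazily so the helper is cheap to define

    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    (output,) = sess.run(None, {input_name: x})
    return output
```

The same three steps (create a session, look up the input name, call run) reappear almost unchanged in the onnxruntime-node and onnxruntime_go APIs, which is what makes cross-language inference straightforward.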

References
- ONNX Official Site
- ONNX Official GitHub (onnx/onnx)
- ONNX Runtime Official Site
- ONNX Runtime Official Docs
- ONNX Runtime Official GitHub
Experiment Content
We will output a model trained in PyTorch in the ONNX format, perform inference from several programming languages, and check if the results match those obtained by inferring with Python.
Experiment 1: MNIST Inference
Handwritten digit classification using the MNIST dataset.
After training with a simple CNN, we export it to ONNX format and perform inference using ONNX Runtime.
Experiment 2: Sentiment Analysis with BERT
We will create a model that classifies an English sentence as a positive or negative reaction, using movie-review data for training.
Using bert-base-uncased as a pre-trained model, we will fine-tune it on a portion of GLUE (SST-2).
Afterwards, we export the trained model to ONNX format and perform inference using ONNX Runtime.
The target programming languages for inference in this experiment are as follows:
- Node.js
- Go
The programs created for the experiment are available here:
You can experiment on a Docker container using VSCode's Dev Containers.
The experiment has been verified in the following environments:
- macOS (Intel CPU) + Docker Desktop
- Windows 11
Training in Python and Exporting to ONNX Format
To perform this experiment, we use the following libraries:
Experiment 1: MNIST
Training
Training is performed by entering the development container and executing the following commands:
cd app/py/mnist
python train_pytorch_mnist.py
Executing this script creates the following files:
- /workspace/app/data/mnist_test_normalized.npz
  - The test data, saved in npz format
- /workspace/app/data/mnist_cnn.pth
  - The trained model
After training, the accuracy on the test data is printed. We will later compare this value with the accuracy obtained when running inference on the same test data from other programming languages, to verify that the computations agree.
Exporting to ONNX Format
Execute the export with the following command:
python export_to_onnx.py
This exports /workspace/app/data/mnist_cnn.onnx from /workspace/app/data/mnist_cnn.pth.
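The repository's export script is not reproduced here, but an export along these lines, using torch.onnx.export, would produce the file above. The function name and the dummy-input shape (one 1x28x28 grayscale image) are illustrative assumptions, not the repo's exact code.

```python
def export_mnist_to_onnx(model, pth_path: str, onnx_path: str) -> None:
    """Load a PyTorch checkpoint and export it to ONNX (illustrative sketch)."""
    import torch

    model.load_state_dict(torch.load(pth_path, map_location="cpu"))
    model.eval()
    dummy = torch.randn(1, 1, 28, 28)  # one grayscale 28x28 image
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=["input"], output_names=["output"],
        # Allow a variable batch size at inference time
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
```

Naming the inputs and outputs explicitly pays off later: the Node.js and Go code can refer to "input" and "output" instead of framework-generated names.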
Inference with ONNX Runtime
Verify inference with ONNX Runtime.
python eval_onnx_mnist.py
Evaluation data is extracted from /workspace/app/data/mnist_test_normalized.npz, and inference is performed using /workspace/app/data/mnist_cnn.onnx.
Please compare this result with the test data accuracy output by PyTorch during training.
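A hedged sketch of what such an evaluation might look like: load the npz archive, run the ONNX model over it, and compute accuracy. The key names inside the npz file ("images", "labels") are assumptions, not confirmed from the repository.

```python
import numpy as np

def accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose argmax class matches the label."""
    return float((logits.argmax(axis=1) == labels).mean())

def eval_onnx(npz_path: str, onnx_path: str) -> float:
    """Run the exported model over the saved test set and return accuracy."""
    import onnxruntime as ort

    data = np.load(npz_path)
    images, labels = data["images"], data["labels"]  # key names are assumptions
    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    (logits,) = sess.run(None, {input_name: images.astype(np.float32)})
    return accuracy(logits, labels)
```

The accuracy computation itself is ordinary NumPy; only the forward pass goes through ONNX Runtime, which is exactly the part the other languages will replicate.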
Experiment 2: BERT
Training
Training is performed by entering the development container and executing the following commands:
cd app/py/bert
python train_bert.py
In this experiment, the training data is subsampled to keep memory usage and run time manageable. If you have spare resources (or are running short of them), adjust the following in the script:
- Subsample size: increasing it gives more training data, but training takes longer.
- Number of epochs: increase it for better accuracy (at the cost of time).
- batch_size: reduce it if training crashes from lack of memory.
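Conceptually, the knobs look like this. The variable names are illustrative assumptions; the actual identifiers in train_bert.py may differ.

```python
# Illustrative training knobs (names are assumptions, not the script's exact ones)
SUBSAMPLE_SIZE = 2000  # more data -> better model, but longer training
NUM_EPOCHS = 1         # more epochs -> better accuracy, but longer training
BATCH_SIZE = 16        # reduce if training crashes with out-of-memory errors
```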
After training is complete, the following folder will be updated:
- /workspace/app/data/bert-sst2
Exporting to ONNX Format
Execute the export with the following command:
python export_to_onnx.py
Upon successful completion, /workspace/app/data/bert-sst2.onnx will be created.
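Exporting a Hugging Face classifier differs from the MNIST case in that the model takes two inputs (token IDs and an attention mask) and both the batch and sequence dimensions should stay dynamic. A sketch along these lines could produce the file above; the function and tracing input are illustrative, not the repo's exact script.

```python
def export_bert_to_onnx(model_dir: str, onnx_path: str) -> None:
    """Export a fine-tuned Hugging Face classifier to ONNX (illustrative sketch)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    enc = tokenizer("a dummy sentence", return_tensors="pt")  # tracing input
    torch.onnx.export(
        model,
        (enc["input_ids"], enc["attention_mask"]),
        onnx_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        # Batch size and sequence length vary per request, so mark both dynamic
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "logits": {0: "batch"},
        },
    )
```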
Inference with ONNX Runtime
Verify inference with ONNX Runtime.
python check_onnx.py
Perform inference on the following texts:
- "This movie is great!", # Positive
- "This movie is terrible.", # Negative
- "I really loved this film.", # Positive-leaning
- "I really hated this film.", # Negative-leaning
- "The plot was boring and slow.", # Negative
The output will be as follows:
text: This movie is great!
logits: [[-0.11655042 0.8542732 ]]
pred_id: 1 -> positive
- These values and results may vary depending on the environment.
- In the same environment, inference from other programming languages should yield similar values.
- However, small discrepancies due to floating-point arithmetic are possible.
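Turning the logits into a label is just an argmax over the two class scores. The post-processing step reduces to the following, where the SST-2 label order (index 0 negative, index 1 positive) is the usual convention, stated here as an assumption:

```python
import numpy as np

LABELS = ["negative", "positive"]  # assumed SST-2 label order

def predict_label(logits: np.ndarray) -> str:
    """Pick the label whose logit is largest; logits has shape (1, num_labels)."""
    return LABELS[int(np.argmax(logits, axis=1)[0])]

# With the article's example logits:
print(predict_label(np.array([[-0.11655042, 0.8542732]])))  # -> positive
```

Because this step is pure array arithmetic, it is trivial to reimplement identically in Node.js or Go; the only part that can drift across languages is the tokenization and the model's forward pass.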
Inference with Node.js
The following libraries are used for this experiment:
- @huggingface/transformers
  - Depends on onnxruntime-node
- jszip
- npyjs
Experiment 1: MNIST
Inference is performed after installing the modules with the following steps:
cd app/node
npm install
node eval_mnist.js
Evaluation data is extracted from /workspace/app/data/mnist_test_normalized.npz, and inference is performed using /workspace/app/data/mnist_cnn.onnx.
Since npyjs cannot read the npz format directly, the test data is first extracted into individual npy files with jszip.
By comparing the output Test accuracy with the value during training, you can confirm that the inference accuracy is roughly the same.
Experiment 2: BERT
Inference is performed with the following steps:
cd app/node
node eval_bert.js
Inference for positive or negative sentiment is performed on the same text as the Python sample; please compare the results.
Inference with Go
The following libraries are used for this experiment:
- codeberg.org/sbinet/npyio/npz
- github.com/yalue/onnxruntime_go
- github.com/sugarme/tokenizer
- github.com/sugarme/tokenizer/pretrained
Experiment 1: MNIST
Inference is performed with the following steps:
cd app/go
go run eval_mnist.go
Evaluation data is extracted from /workspace/app/data/mnist_test_normalized.npz, and inference is performed using /workspace/app/data/mnist_cnn.onnx.
codeberg.org/sbinet/npyio/npz can directly reference npz files.
By comparing the output Test accuracy with the value during training, you can confirm that the inference accuracy is roughly the same.
Experiment 2: BERT
Inference is performed with the following steps:
cd app/go
go run eval_bert.go
Inference for positive or negative sentiment is performed on the same text as the Python sample; please compare the results.
Since the tokenizer's default settings can differ between implementations, we explicitly specify tokenizer.WithPadding(nil) in this experiment so that the Go tokenizer matches the settings on the Python side.
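For reference, the Python-side behavior the Go code must match is roughly the following: tokenize with padding disabled, which corresponds to tokenizer.WithPadding(nil) on the Go side. This is a sketch of the expected behavior, not the repository's exact script.

```python
def encode(text: str):
    """Tokenize without padding, mirroring tokenizer.WithPadding(nil) in Go."""
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    return tokenizer(text, padding=False, return_tensors="np")
```

If the two sides disagree on padding (or any other preprocessing step), the models receive different input IDs and the logits will diverge even though the ONNX graph is identical.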
Summary
In this article, we conducted a simple experiment of converting a model trained in Python to the ONNX format and performing inference in Go and Node.js.
Since we only experimented with simple models this time, no issues occurred. However, as experiments progress in the future, the following problems are expected:
- Can the behavior of Japanese tokenizers be synchronized across languages?
- Will issues arise regarding data types?
- To what extent and how should inference results from other languages be verified against Python?
- Can complex models be handled in the same way?
While various challenges remain, having a way to run inference from a different programming language is a significant advantage.