【High-Speed Inference Method】 OpenVINO Tutorial
1. What is OpenVINO
OpenVINO is a toolkit for machine learning developed by Intel.
It is designed to optimize and accelerate deep learning inference on various Intel architectures, such as CPUs, GPUs, and FPGAs, with a particular focus on computer vision tasks.
The toolkit supports models from popular frameworks like TensorFlow and PyTorch, converting them into an Intermediate Representation (IR) that is efficient for deployment on Intel hardware.
2. Effect
Here is a comment from the 2nd-place winner of a Kaggle image competition.
Important: converting the PyTorch models to OpenVINO models significantly reduced inference time (about 40%). (The eca_nfnet_l0 backbone ONNX cannot be converted to OpenVINO because the StdConv layer in timm uses the train mode of F.batch_norm in its forward method.) That is the magic behind ensembling 7 models.
Quote: 2nd place solution: SED + CNN with 7 models ensemble
That competition had a strict inference constraint: CPU only, with a run time under 120 minutes.
OpenVINO is a very useful method for reducing inference time under such constraints.
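For reference, the StdConv issue mentioned in the quote comes from calling F.batch_norm with training=True inside forward, so normalization statistics are computed from the batch at run time. Below is a minimal sketch of that pattern; this toy module is purely illustrative, not the actual timm code.
import torch
import torch.nn.functional as F

class TrainModeNorm(torch.nn.Module):
    # Toy module: running_mean/running_var are None, so F.batch_norm
    # computes statistics from the current batch even at inference time,
    # which is the pattern that trips up conversion.
    def forward(self, x):
        weight = torch.ones(x.shape[1])
        bias = torch.zeros(x.shape[1])
        return F.batch_norm(x, None, None, weight, bias, training=True)

x = torch.randn(2, 3, 8, 8)
print(TrainModeNorm()(x).shape)  # torch.Size([2, 3, 8, 8])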
3. How to use
To use OpenVINO, we first have to convert our model to an ONNX model, then go through a few preparation steps.
3.1 Prepare
・Install
!pip install openvino-dev[onnx]
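As an optional sanity check, we can print the runtime version after installing:
from openvino.runtime import get_version

print(get_version())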
・Create model and save
import torch
import timm
from torch.onnx import export
# Load a pre-trained model from timm
model = timm.create_model('resnet50', pretrained=True)
model.eval()
# Set up dummy input for the model; this should match the model's input size
input_tensor = torch.randn(1, 3, 224, 224)  # You can change the batch size, e.g. 1→32 (the conversion command below must then be changed to match)
# Export the model
output_onnx = 'model.onnx'
export(model, input_tensor, output_onnx, opset_version=11, input_names=['input'], output_names=['output'])
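Optionally, we can sanity-check the exported file before converting it. A minimal sketch using the onnx package (installed as part of openvino-dev[onnx]):
import onnx

# Load the exported graph and run ONNX's structural validity check;
# check_model raises an exception if the graph is malformed
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
print('ONNX export looks valid')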
・Convert the model to IR
!mo --input_model /kaggle/working/model.onnx --output_dir /kaggle/working --input_shape [1,3,224,224]
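The Model Optimizer writes two files: model.xml (the network topology) and model.bin (the weights). A quick check that both exist:
import os

# mo produces a .xml topology file and a .bin weights file side by side
print(sorted(f for f in os.listdir('/kaggle/working') if f.startswith('model')))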
Preparation is done.
3.2 Define OpenVINO Object
Now we run inference.
import torch
from openvino.runtime import Core

# Initialize the OpenVINO Runtime Core
ie = Core()
# Read the network and corresponding weights from the IR files
model_path = '/kaggle/working/model.xml' # Path to the .xml file
model = ie.read_model(model=model_path)
# Compile the model for a specific device
compiled_model = ie.compile_model(model=model, device_name='CPU')
infer_request = compiled_model.create_infer_request()
# Get input and output layers
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)
# Prepare your input data (e.g., an image processed into a tensor)
input_sample = torch.randn(1, 3, 224, 224)
# Retrieve the name of the first input layer and build the inputs dict
inputs = {input_layer.any_name: input_sample.numpy()}  # convert to numpy for OpenVINO
# Perform inference
result = infer_request.infer(inputs=inputs)
# Access the results
output = result[output_layer]
# 1000 classes classification
print(len(output[0]))
print(output)
### output
# 1000
# [[ -8.310932 -7.3757944 -6.03778 ...
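To turn these logits into a prediction, take the argmax over the 1000 classes (a minimal sketch):
import numpy as np

# output[0] is a length-1000 array of logits; argmax gives the class index
class_id = int(np.argmax(output[0]))
print('predicted class index:', class_id)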
- model = ie.read_model(model=model_path)
Reads the network. Here we pass the path to the .xml IR file; the matching .bin weights file must sit in the same directory.
- compiled_model = ie.compile_model(model=model, device_name='CPU')
Compiles the model for the specified device.
- infer_request = compiled_model.create_infer_request()
Creates an inference request object.
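In a real pipeline, input_sample would come from an actual image rather than torch.randn. A minimal preprocessing sketch (the file name cat.jpg and the ImageNet normalization constants are assumptions for illustration):
import numpy as np
from PIL import Image

# Resize to the model's input size and normalize with ImageNet statistics
image = Image.open('cat.jpg').convert('RGB').resize((224, 224))
array = np.asarray(image, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
array = (array - mean) / std
# HWC -> NCHW with a batch dimension, matching [1, 3, 224, 224]
input_sample = array.transpose(2, 0, 1)[np.newaxis, ...]
Since this input_sample is already a NumPy array, it can be passed in the inputs dict directly, without the .numpy() conversion used above.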
3.3 Time Measurement (optional)
We can measure the execution time like this:
import time
import numpy as np

loop = 100  # number of timed runs
times = []
for i in range(loop):
    t1 = time.time()
    infer_request.infer(inputs)
    t2 = time.time()
    times.append(t2 - t1)
print(np.mean(times))  # mean inference time in seconds
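To see the speedup claimed in section 2, time the original PyTorch model the same way. A minimal sketch (re-creating the timm model, since the name model was reused for the OpenVINO object above):
import time
import numpy as np
import timm
import torch

# Re-create the PyTorch ResNet-50 from section 3.1
torch_model = timm.create_model('resnet50', pretrained=True)
torch_model.eval()
torch_input = torch.randn(1, 3, 224, 224)

torch_times = []
with torch.no_grad():
    for i in range(100):
        t1 = time.time()
        torch_model(torch_input)
        t2 = time.time()
        torch_times.append(t2 - t1)
print('PyTorch mean:', np.mean(torch_times))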
4. Summary
OpenVINO takes a few steps to set up, but it is very useful when running inference, so please give it a try.
That is all for this time; thank you for reading.
Reference
(1) 2nd place solution: SED + CNN with 7 models ensemble
(2) OpenVINO is all you need