Odds and ends of deep learning model quantization and cross-framework conversion
Looks like you can't post files here.
When I try to convert a model containing Flex OPs to an EdgeTPU model, I get an error saying the Interpreter doesn't support it. I spent a whole day thinking about how to get around this, but I have no idea. Is there even a workaround?
nanodet_416x416_full_integer_quant.tflite
The result of hand-calculating the quantization. No matter how hard I try, the Min/Max don't match up...
(x - zero_point) / (255 / (max - min))
(x - zero_point) * (1 / (255 / (max - min)))
min = -3.265998125076294
max = 2.2779781818389893
zero_point = 151
coefficient (from the model) = 0.02182667888700962
coefficient (self-calculated) = 0.021741084
(0 - 151) / (255 / (2.277978 - (-3.265998))) = −3.282903435
(151 - 151) / (255 / (2.277978 - (-3.265998))) = 0
(255 - 151) / (255 / (2.277978 - (-3.265998))) = 2.261072565
(0 - 151) * 0.02182667888700962 = −3.295828512
(151 - 151) * 0.02182667888700962 = 0
(255 - 151) * 0.02182667888700962 = 2.269974604
(0 - 151) * 0.021741084 = −3.282903684
(151 - 151) * 0.021741084 = 0
(255 - 151) * 0.021741084 = 2.261072736
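A quick numeric check of the above (only min/max/zero_point and the model's coefficient are taken from the model; everything else is plain arithmetic):

import min/max/zero_point values quoted above and compare both scales:

```python
min_val = -3.265998125076294
max_val = 2.2779781818389893
zero_point = 151

scale_self = (max_val - min_val) / 255   # 0.021741084... (self-calculated)
scale_model = 0.02182667888700962        # coefficient reported by the model

# Dequantize the three key uint8 values with each scale.
for q in (0, zero_point, 255):
    print(q, (q - zero_point) * scale_self, (q - zero_point) * scale_model)
```

Neither scale maps q=0 and q=255 exactly onto the recorded min/max, and the model's coefficient differs from (max - min) / 255 in the fourth decimal place, which is exactly the mismatch described above.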
git clone https://github.com/openvinotoolkit/training_extensions.git
cd training_extensions/pytorch_toolkit/instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904
wget https://download.01.org/opencv/openvino_training_extensions/models/instance_segmentation/v2/instance-segmentation-0904-0912.pth
pip install torch
python ../../../../ote/tools/export.py \
--load-weights instance-segmentation-0904-0912.pth \
--save-model-to export \
--onnx \
--openvino
cd training_extensions/pytorch_toolkit
export PYTHONPATH=$PYTHONPATH:${PWD}:${PWD}/ote
wget https://download.01.org/opencv/openvino_training_extensions/models/instance_segmentation/v2/instance-segmentation-0904-0912.pth
cp instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904/model.py .
cp instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904/modules.yaml .
cp instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904/template.yaml .
python3 ote/tools/export.py \
--load-weights instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904/instance-segmentation-0904-0912.pth \
--save-model-to instance_segmentation/model_templates/coco-instance-segmentation/instance-segmentation-0904/export \
--onnx
$ dpkg -l | grep TensorRT
ii graphsurgeon-tf 7.2.2-1+cuda11.0 amd64 GraphSurgeon for TensorRT package
ii libnvinfer-bin 7.2.2-1+cuda11.0 amd64 TensorRT binaries
ii libnvinfer-dev 7.2.2-1+cuda11.0 amd64 TensorRT development libraries and headers
ii libnvinfer-doc 7.2.2-1+cuda11.0 all TensorRT documentation
ii libnvinfer-plugin-dev 7.2.2-1+cuda11.0 amd64 TensorRT plugin libraries
ii libnvinfer-plugin7 7.2.2-1+cuda11.0 amd64 TensorRT plugin libraries
ii libnvinfer-samples 7.2.2-1+cuda11.0 all TensorRT samples
ii libnvinfer7 7.2.2-1+cuda11.0 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 7.2.2-1+cuda11.0 amd64 TensorRT ONNX libraries
ii libnvonnxparsers7 7.2.2-1+cuda11.0 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 7.2.2-1+cuda11.0 amd64 TensorRT parsers libraries
ii libnvparsers7 7.2.2-1+cuda11.0 amd64 TensorRT parsers libraries
ii python3-libnvinfer 7.2.2-1+cuda11.0 amd64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 7.2.2-1+cuda11.0 amd64 Python 3 development package for TensorRT
ii tensorrt 7.2.2.3-1+cuda11.0 amd64 Meta package of TensorRT
ii uff-converter-tf 7.2.2-1+cuda11.0 amd64 UFF converter for TensorRT package
$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$ cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
$ docker build -t pinto0309/tflite2tensorflow:latest .
$ docker run --gpus all -it --rm -v `pwd`:/workspace/resources -v ${HOME}/TFDS:/workspace/resources/TFDS pinto0309/tflite2tensorflow:latest
TensorRT DEB download path
docker build -t pinto0309/tflite2tensorflow:latest .
docker run --gpus all -it --rm \
-v `pwd`:/workspace/resources \
-e LOCAL_UID=$(id -u $USER) \
-e LOCAL_GID=$(id -g $USER) \
pinto0309/tflite2tensorflow:latest bash
docker run --gpus all -it --rm \
-v `pwd`:/workspace/resources \
-v ${HOME}/TFDS:/workspace/resources/TFDS \
-e LOCAL_UID=$(id -u $USER) \
-e LOCAL_GID=$(id -g $USER) \
pinto0309/tflite2tensorflow:latest bash
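The LOCAL_UID / LOCAL_GID variables are presumably consumed by the image's entrypoint to create a container user matching the host UID/GID, so files written into the mounted /workspace/resources volume aren't owned by root (an assumption about this image's entrypoint, not verified here).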
A trick for exporting a PyTorch model that uses DataParallel to ONNX
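No snippet was attached here; a minimal sketch of the usual trick, assuming the checkpoint was saved from an nn.DataParallel-wrapped model so every key carries a `module.` prefix (the paths and the `build_model` constructor are hypothetical):

```python
import torch

state_dict = torch.load("checkpoint.pth", map_location="cpu")

# nn.DataParallel prefixes every parameter name with "module.";
# strip it so the weights load into the bare, unwrapped model.
state_dict = {k[len("module."):] if k.startswith("module.") else k: v
              for k, v in state_dict.items()}

model = build_model()  # hypothetical: construct the bare (non-DataParallel) model
model.load_state_dict(state_dict)
model.eval()

torch.onnx.export(model, torch.randn(1, 3, 224, 224), "model.onnx",
                  opset_version=11)
```

If the live model object is still wrapped, exporting `model.module` directly achieves the same thing.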
Bidirectional LSTM - implementation of UnidirectionalSequenceLSTM
Bidirectional LSTM - a Keras implementation sample for UnidirectionalSequenceLSTM
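The sample itself isn't reproduced in this thread; a minimal sketch of the idea (shapes are made up). As of TF 2.x the TFLite converter fuses Keras LSTM layers into UnidirectionalSequenceLSTM ops, so a Bidirectional wrapper lowers to two of them, the second fed a time-reversed sequence:

```python
import tensorflow as tf

# Toy bidirectional LSTM over a fixed-length sequence.
inputs = tf.keras.Input(shape=(16, 64), batch_size=1)
outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=True))(inputs)
model = tf.keras.Model(inputs, outputs)

# Each direction should appear as a fused UNIDIRECTIONAL_SEQUENCE_LSTM
# op in the resulting flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("bilstm.tflite", "wb") as f:
    f.write(converter.convert())
```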
adaptive_max_pool2d -> max_pool2d
import torch
import torch.nn.functional as F
import torch.nn as nn
x = torch.randn(1, 3, 512, 512)
F.max_pool2d(x, kernel_size=x.size()[2:])
tensor([[[[4.5801]],
[[4.4516]],
[[4.4280]]]])
F.adaptive_max_pool2d(x, (1, 1))
tensor([[[[4.5801]],
[[4.4516]],
[[4.4280]]]])
nn.AdaptiveMaxPool2d(1)(x)
tensor([[[[4.9739]],
[[4.5847]],
[[4.5794]]]])
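(The nn.AdaptiveMaxPool2d(1) result above doesn't match the two calls before it; x was apparently re-generated partway through the original session. Note that its 4.9739 reappears as a window max in the 3x3 outputs below; given the same input, all three APIs agree.)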
F.max_pool2d(x, kernel_size=(170, 170))
tensor([[[[3.8038, 4.9739, 4.0777],
[4.5678, 4.1971, 4.0989],
[3.9696, 3.9876, 3.8003]],
[[3.9727, 4.0108, 4.5847],
[4.0080, 4.2751, 3.9599],
[4.0119, 3.9289, 4.2282]],
[[4.1449, 4.1064, 3.7096],
[4.0888, 3.6905, 4.5794],
[4.0994, 4.1094, 4.4437]]]])
F.adaptive_max_pool2d(x, (3, 3))
tensor([[[[3.8038, 4.9739, 4.0777],
[4.5678, 4.1971, 4.0989],
[3.9696, 3.9876, 3.8003]],
[[3.9727, 4.0108, 4.5847],
[4.0080, 4.2751, 3.9599],
[4.0119, 3.9289, 4.2282]],
[[4.1449, 4.1064, 3.7096],
[4.0888, 3.6905, 4.5794],
[4.0994, 4.1094, 4.4437]]]])
nn.AdaptiveMaxPool2d(3)(x)
tensor([[[[3.8038, 4.9739, 4.0777],
[4.5678, 4.1971, 4.0989],
[3.9696, 3.9876, 3.8003]],
[[3.9727, 4.0108, 4.5847],
[4.0080, 4.2751, 3.9599],
[4.0119, 3.9289, 4.2282]],
[[4.1449, 4.1064, 3.7096],
[4.0888, 3.6905, 4.5794],
[4.0994, 4.1094, 4.4437]]]])
nn.MaxPool2d(kernel_size=(170, 170))(x)
tensor([[[[3.8038, 4.9739, 4.0777],
[4.5678, 4.1971, 4.0989],
[3.9696, 3.9876, 3.8003]],
[[3.9727, 4.0108, 4.5847],
[4.0080, 4.2751, 3.9599],
[4.0119, 3.9289, 4.2282]],
[[4.1449, 4.1064, 3.7096],
[4.0888, 3.6905, 4.5794],
[4.0994, 4.1094, 4.4437]]]])
>>> F.max_pool2d(x, kernel_size=(170, 170)).shape
torch.Size([1, 3, 3, 3])
>>> F.adaptive_max_pool2d(x, (3, 3)).shape
torch.Size([1, 3, 3, 3])
>>> nn.AdaptiveMaxPool2d(3)(x).shape
torch.Size([1, 3, 3, 3])
>>> nn.MaxPool2d(kernel_size=(170, 170))(x).shape
torch.Size([1, 3, 3, 3])
>>> x = torch.randn(1, 3, 1024, 1024)
>>> nn.MaxPool2d(kernel_size=(341, 341))(x).shape
torch.Size([1, 3, 3, 3])
>>> nn.MaxPool2d(kernel_size=(341, 341))(x)
tensor([[[[4.3699, 4.5813, 4.2206],
[4.0879, 4.7416, 4.1141],
[4.4719, 3.8822, 4.4646]],
[[4.1552, 4.3862, 4.0084],
[4.4847, 4.2279, 4.1345],
[4.1907, 4.5296, 4.6171]],
[[4.6567, 4.3410, 4.2800],
[4.7007, 4.0725, 4.6199],
[4.1732, 4.5214, 4.2547]]]])
>>> nn.AdaptiveMaxPool2d(3)(x)
tensor([[[[4.3699, 4.5813, 4.2206],
[4.2494, 4.7416, 4.1141],
[4.4719, 3.8822, 4.4646]],
[[4.1552, 4.3862, 4.0084],
[4.4847, 4.2279, 4.1345],
[4.1907, 4.5296, 4.6171]],
[[4.6567, 4.3410, 4.2800],
[4.7007, 4.0725, 4.6199],
[4.1732, 4.5214, 4.2547]]]])
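Note the single mismatched element above (4.0879 vs 4.2494 in the first channel): this is expected, because 1024 is not divisible by 3, so a fixed kernel_size=(341, 341) cannot reproduce AdaptiveMaxPool2d(3) exactly. Adaptive pooling computes its i-th window as roughly start = floor(i*H/out), end = ceil((i+1)*H/out), so windows overlap whenever H % out != 0. A small sketch of that window arithmetic:

```python
import math

def adaptive_windows(in_size: int, out_size: int):
    """Window boundaries (start inclusive, end exclusive) as used by
    PyTorch's adaptive pooling."""
    return [(i * in_size // out_size, math.ceil((i + 1) * in_size / out_size))
            for i in range(out_size)]

print(adaptive_windows(512, 3))   # [(0, 171), (170, 342), (341, 512)]
print(adaptive_windows(1024, 3))  # [(0, 342), (341, 683), (682, 1024)]
# A fixed MaxPool2d(kernel=341, stride=341) instead uses
# (0, 341), (341, 682), (682, 1023): the middle window differs.
```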
if self.adaptive_diated:
    self.adaptive_softmax = nn.Softmax(dim=3)
    self.adaptive_layers = nn.Sequential(
        nn.AdaptiveMaxPool2d(3),
        nn.Conv2d(512, 512 * 3, 3, padding=0),
    )
    self.adaptive_bn = nn.BatchNorm2d(512)
    self.adaptive_relu = nn.ReLU(inplace=True)
    self.adaptive_layers1 = nn.Sequential(
        nn.AdaptiveMaxPool2d(3),
        nn.Conv2d(1024, 1024 * 3, 3, padding=0),
    )
    self.adaptive_bn1 = nn.BatchNorm2d(1024)
    self.adaptive_relu1 = nn.ReLU(inplace=True)
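When the feature-map size at this layer is fixed and known before export, the nn.AdaptiveMaxPool2d(3) in a block like the one above can be swapped for a plain nn.MaxPool2d (a sketch with a made-up feature size; per the window arithmetic above, the outputs match exactly only when the size is divisible by 3, and were merely identical by coincidence in the 512 case):

```python
import torch.nn as nn

feat = 512  # hypothetical fixed feature-map size at this layer
pool = nn.MaxPool2d(kernel_size=feat // 3, stride=feat // 3)
# drop-in replacement for nn.AdaptiveMaxPool2d(3) inside self.adaptive_layers
```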
ONNX to JSON, JSON to ONNX
import onnx
import json
from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import Parse
import os
# Convert onnx model to JSON
model_path = "ssdlite_mobiledet_edgetpu_320x320_coco.onnx"
filename, ext = os.path.splitext(model_path)
onnx_model = onnx.load(model_path)
s = MessageToJson(onnx_model)
onnx_json = json.loads(s)
with open(f'{filename}_onnx_to_json.json', 'w') as f:
    json.dump(onnx_json, f, indent=2)
# Convert JSON to String
onnx_str = json.dumps(onnx_json)
# Convert String to onnx model
convert_model = Parse(onnx_str, onnx.ModelProto())
# Check
print(convert_model == onnx_model)
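Round-tripping through MessageToJson / Parse like this makes it easy to inspect or hand-patch a graph in JSON (rename nodes, edit attributes) and then write it back out with onnx.save(convert_model, path).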