Trial and error implementing TFLite_Detection_PostProcess with standard ops

python3 ${INTEL_OPENVINO_DIR}/deployment_tools/model_optimizer/mo.py \
 --input_model onnx/ssdlite_mobiledet_edgetpu_320x320_coco.onnx \
 --output TFLite_Detection_PostProcess_NonMaxSuppression__663 \
 --output_dir openvino/ssdlite_mobiledet_edgetpu_320x320_coco/FP32 \
 --data_type FP32

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/b920405/work/openvino2tensorflow/onnx/ssdlite_mobiledet_edgetpu_320x320_coco.onnx
	- Path for generated IR: 	/home/b920405/work/openvino2tensorflow/openvino/ssdlite_mobiledet_edgetpu_320x320_coco/FP32
	- IR output name: 	ssdlite_mobiledet_edgetpu_320x320_coco
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	TFLite_Detection_PostProcess_NonMaxSuppression__663
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	Not specified
	- Scale values: 	Not specified
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	None
	- Reverse input channels: 	False
ONNX specific parameters:
Model Optimizer version: 	2021.2.0-1877-176bdf51370-releases/2021/2
2021-03-21 15:20:32.987261: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/b920405/work/openvino2tensorflow/openvino/ssdlite_mobiledet_edgetpu_320x320_coco/FP32/ssdlite_mobiledet_edgetpu_320x320_coco.xml
[ SUCCESS ] BIN file: /home/b920405/work/openvino2tensorflow/openvino/ssdlite_mobiledet_edgetpu_320x320_coco/FP32/ssdlite_mobiledet_edgetpu_320x320_coco.bin
[ SUCCESS ] Total execution time: 10.22 seconds. 
[ SUCCESS ] Memory consumed: 402 MB.

PINTO

NonMaxSuppression

input_1
  = boxes
  = float32[1,2034,4]
  = [num_batches, spatial_dimension, 4]
input_2
  = scores
  = float32[1,90,2034]
  = [num_batches, num_classes, spatial_dimension]

center_point_box
Flag that selects the box coordinate format
Values are either absolute coordinates or normalized to the 0-1 range
  0: [y1, x1, y2, x2] format (TensorFlow)
    -> OpenVINO box_encoding = 'corner'
  1: [x_center, y_center, width, height] format (PyTorch)
    -> OpenVINO box_encoding = 'center'
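
The two layouts selected by center_point_box can be converted into each other. A minimal NumPy sketch, assuming the boxes sit in the last axis of the array; the function names center_to_corner and corner_to_center are hypothetical and come from neither ONNX nor OpenVINO.

```python
import numpy as np

def center_to_corner(boxes):
    """[x_center, y_center, width, height] -> [y1, x1, y2, x2]."""
    boxes = np.asarray(boxes, dtype=np.float32)
    xc, yc, w, h = np.moveaxis(boxes, -1, 0)
    return np.stack([yc - h / 2, xc - w / 2, yc + h / 2, xc + w / 2], axis=-1)

def corner_to_center(boxes):
    """[y1, x1, y2, x2] -> [x_center, y_center, width, height]."""
    boxes = np.asarray(boxes, dtype=np.float32)
    y1, x1, y2, x2 = np.moveaxis(boxes, -1, 0)
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], axis=-1)

# One box centered at (0.5, 0.5), 0.2 wide and 0.4 tall.
center_box = np.array([[0.5, 0.5, 0.2, 0.4]], dtype=np.float32)
corner_box = center_to_corner(center_box)
print(corner_box)
print(corner_to_center(corner_box))
```

The same formulas apply to absolute and to normalized coordinates, since only differences and midpoints are taken.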

output = int64[9000,3]
9000 = 90 (num_classes) x 100 (max_detections)
num_selected_indices = 9000

onnx

selected indices from the boxes tensor.
[num_selected_indices, 3]
the selected index format is [batch_index, class_index, box_index]
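
Each row of selected_indices can be used to look up the matching box and score. A minimal NumPy sketch using the tensor shapes from these notes (boxes float32[1,2034,4], scores float32[1,90,2034]); gather_selected is a hypothetical helper name and the random tensors are placeholders, not model output.

```python
import numpy as np

def gather_selected(boxes, scores, selected_indices):
    """Look up box and score for each [batch_index, class_index, box_index] row."""
    idx = np.asarray(selected_indices, dtype=np.int64)
    b, c, i = idx[:, 0], idx[:, 1], idx[:, 2]
    picked_boxes = boxes[b, i]        # [num_selected_indices, 4]
    picked_scores = scores[b, c, i]   # [num_selected_indices]
    return picked_boxes, picked_scores, c

# Placeholder inputs with the shapes from the notes: batch 1, 2034 anchors, 90 classes.
rng = np.random.default_rng(0)
boxes = rng.random((1, 2034, 4), dtype=np.float32)
scores = rng.random((1, 90, 2034), dtype=np.float32)
selected = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=np.int64)

picked_boxes, picked_scores, class_ids = gather_selected(boxes, scores, selected)
print(picked_boxes.shape, picked_scores.shape, class_ids)
```
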

[0, 0] = batch_index = 0
[0, 1] = class_index = 0
[0, 2] = box_index = 0

[1, 0] = batch_index = 0
[1, 1] = class_index = 0
[1, 2] = box_index = 1

[2, 0] = batch_index = 0
[2, 1] = class_index = 0
[2, 2] = box_index = 2
　：
[99, 0] = batch_index = 0
[99, 1] = class_index = 0
[99, 2] = box_index = 99
=================================
[100, 0] = batch_index = 0
[100, 1] = class_index = 1
[100, 2] = box_index = 0

[101, 0] = batch_index = 0
[101, 1] = class_index = 1
[101, 2] = box_index = 1
　：

PINTO

TFLite_Detection_PostProcess   -> bounding_boxes float32 [1,100,4]
TFLite_Detection_PostProcess:1 -> class_labels float32 [1,100]
TFLite_Detection_PostProcess:2 -> class_confidences float32 [1,100]
TFLite_Detection_PostProcess:3 -> num_of_boxes float32 [1]
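
The four outputs above can be assembled from per-detection boxes, scores and class ids that have already been gathered from the NonMaxSuppression result. A sketch under assumptions: assemble_outputs is a hypothetical name, detections are ranked by score across all classes, and unused slots are zero-padded. The padding behavior is an assumption here and should be checked against the TFLite kernel.

```python
import numpy as np

def assemble_outputs(picked_boxes, picked_scores, class_ids, max_detections=100):
    """Build the four float32 outputs listed above from per-detection arrays."""
    picked_boxes = np.asarray(picked_boxes, dtype=np.float32)
    picked_scores = np.asarray(picked_scores, dtype=np.float32)
    class_ids = np.asarray(class_ids)
    # Highest score first, then keep at most max_detections entries.
    order = np.argsort(-picked_scores, kind="stable")[:max_detections]
    n = order.size
    bounding_boxes = np.zeros((1, max_detections, 4), dtype=np.float32)
    class_labels = np.zeros((1, max_detections), dtype=np.float32)
    class_confidences = np.zeros((1, max_detections), dtype=np.float32)
    bounding_boxes[0, :n] = picked_boxes[order]
    class_labels[0, :n] = class_ids[order]
    class_confidences[0, :n] = picked_scores[order]
    num_of_boxes = np.array([n], dtype=np.float32)
    return bounding_boxes, class_labels, class_confidences, num_of_boxes

# Three made-up detections, only to show the shapes and the ordering.
demo_boxes = np.array([[0.1, 0.1, 0.2, 0.2],
                       [0.3, 0.3, 0.6, 0.6],
                       [0.5, 0.5, 0.9, 0.9]], dtype=np.float32)
demo_scores = np.array([0.2, 0.9, 0.5], dtype=np.float32)
demo_classes = np.array([3, 7, 1], dtype=np.int64)
outs = assemble_outputs(demo_boxes, demo_scores, demo_classes)
print([o.shape for o in outs], outs[1][0, :3], outs[3])
```
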

PINTO

Relationship between box_centersize, scale_values, anchor, box.ymin, box.xmin, box.ymax, and box.xmax

float ycenter = static_cast<float>(static_cast<double>(box_centersize.y) /
                                        static_cast<double>(scale_values.y) *
                                        static_cast<double>(anchor.h) +
                                    static_cast<double>(anchor.y));

float xcenter = static_cast<float>(static_cast<double>(box_centersize.x) /
                                        static_cast<double>(scale_values.x) *
                                        static_cast<double>(anchor.w) +
                                    static_cast<double>(anchor.x));

float half_h =
    static_cast<float>(0.5 *
                        (std::exp(static_cast<double>(box_centersize.h) /
                                    static_cast<double>(scale_values.h))) *
                        static_cast<double>(anchor.h));
float half_w =
    static_cast<float>(0.5 *
                        (std::exp(static_cast<double>(box_centersize.w) /
                                    static_cast<double>(scale_values.w))) *
                        static_cast<double>(anchor.w));

TfLiteTensor* decoded_boxes =
    &context->tensors[op_data->decoded_boxes_index];
TF_LITE_ENSURE_EQ(context, decoded_boxes->type, kTfLiteFloat32);
auto& box = ReInterpretTensor<BoxCornerEncoding*>(decoded_boxes)[idx];
box.ymin = ycenter - half_h;
box.xmin = xcenter - half_w;
box.ymax = ycenter + half_h;
box.xmax = xcenter + half_w;
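
The same decoding can be written with element-wise array ops, which is the form needed when replacing the custom op with standard ops. A minimal NumPy sketch, assuming both box_centersize and the anchors are laid out as [y, x, h, w] in the last axis; decode_boxes is a hypothetical name, and the default scales are the values recorded in these notes (10, 10, 5, 5). The arithmetic is done in float64 and cast to float32 once at the end, so the last bit may differ from the C++ above, which casts the intermediates.

```python
import numpy as np

def decode_boxes(box_centersize, anchors,
                 y_scale=10.0, x_scale=10.0, h_scale=5.0, w_scale=5.0):
    """Decode [y, x, h, w] offsets against [y, x, h, w] anchors into [ymin, xmin, ymax, xmax]."""
    enc = np.asarray(box_centersize, dtype=np.float64)
    anc = np.asarray(anchors, dtype=np.float64)
    ycenter = enc[..., 0] / y_scale * anc[..., 2] + anc[..., 0]
    xcenter = enc[..., 1] / x_scale * anc[..., 3] + anc[..., 1]
    half_h = 0.5 * np.exp(enc[..., 2] / h_scale) * anc[..., 2]
    half_w = 0.5 * np.exp(enc[..., 3] / w_scale) * anc[..., 3]
    corners = np.stack([ycenter - half_h, xcenter - half_w,
                        ycenter + half_h, xcenter + half_w], axis=-1)
    return corners.astype(np.float32)

# A zero offset returns the anchor itself, expressed as corners.
anchor = np.array([[0.5, 0.5, 0.2, 0.4]])   # y, x, h, w
decoded = decode_boxes(np.zeros((1, 4)), anchor)
print(decoded)
```
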

PINTO

y_scale = 10
x_scale = 10
h_scale = 5
w_scale = 5

PINTO

Netron's logic for reading custom operation attributes

PINTO

Netron's logic for reading custom operation attributes (Flatbuffer Reader)

PINTO

Floating-point internal representation simulator
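
The same kind of check can be done locally with the standard library. A small sketch that splits a value into its IEEE 754 binary32 fields; float32_bits is a hypothetical helper, and using it on the scale values 10 and 5 is an assumption about what the simulator was used for here.

```python
import struct

def float32_bits(value):
    """Return (sign, exponent, mantissa, hex string) of value as IEEE 754 binary32."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF   # biased exponent (bias 127)
    mantissa = raw & 0x7FFFFF       # 23 fraction bits, implicit leading 1 omitted
    return sign, exponent, mantissa, f"0x{raw:08X}"

for v in (10.0, 5.0):
    print(v, float32_bits(v))
```
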

PINTO

Entry point of the tflite parsing process: tfonnx.py#L463-L495

PINTO

Entry point of the tflite decoding process: tflite_utils.py#L119-L141

PINTO

Entry point of the tensorflow-onnx unit tests

PINTO

Implementation complete