⏱️

PMML evaluaterのベンチマーク

2022/11/16に公開

Java

機械学習

pmml

tech

はじめに

こちらの記事を参考に、Apache Beam Java/Google Cloud Dataflowで機械学習の推論を行う際には、PMMLで定義された学習モデルを使用している（簡単な例: https://github.com/kn1kn1/pmml-r-beam-example Rで線形回帰で学習したモデルをBeamで実行している）が、本稿ではApache Beam Javaで利用可能なJavaのPMML evaluatorのライブラリ3つについてベンチマークを実行してパフォーマンスを計測した。

PMML evaluatorライブラリ

以下の3つを使用した。

ライブラリ名	URL	ライセンス	主な開発元
JPMML	https://github.com/jpmml/jpmml-evaluator	AGPL3	Openscoring
KIE	https://mvnrepository.com/artifact/org.kie/kie-pmml-dependencies	Apache License 2.0	RedHat
PMML4S	https://github.com/autodeployai/pmml4s	Apache License 2.0	AutoDeploy.AI

ベンチマーク

KIEのベンチマーク https://github.com/kiegroup/kie-benchmarks として公開されているものがあったので、これをフォークし、JPMMLとPMML4Sのベンチマークを追加した。KIEのベンチマークで、Java Microbenchmark Harness (JMH) https://github.com/openjdk/jmh を使用していたので同様にJMHを利用する形にした。全てのコードは、https://github.com/kn1kn1/pmml-benchmarks に格納している。

compilation

PMMLファイルから各ライブラリで実行可能な形式にコンパイルされるのに掛かる時間(単位: msec)を計測した。ベンチマークのiterationは、warmupを300回、measurementを50回、forkを5回としているが、KIEのRandomForestのみ時間が掛かるためwarmupを2回、measurementを3回に変更している。

JPMML, PMML4Sはあまり変わりないが、KIEのみ突出して時間が掛かる結果になった。

Benchmark	jpmml	kie	pmml4s
LinearRegressionSampleWithTransformations	1.15	69.18	0.53
RandomForest	95.36	84,559.45	197.58
SampleMineTreeModelWithTransformations	1.13	65.77	0.49
SimpleScorecardWithTransformations	0.96	62.44	0.45
SingleIrisKMeansClustering	1.05	58.02	0.45

inference

続いて、1件のレコードの推論に掛かる時間(単位: msec)を計測した。ベンチマークのiterationは、warmupを300回、measurementを50回、forkを5回である。

JPMMLで時間が短く、PMML4Sがそれに続く形になった。

Benchmark	jpmml	kie	pmml4s
LinearRegressionSampleWithTransformations	0.05	0.14	0.10
RandomForest	0.03	0.13	0.04
SampleMineTreeModelWithTransformations	0.03	0.12	0.08
SimpleScorecardWithTransformations	0.05	0.12	0.11
SingleIrisKMeansClustering	0.03	0.09	0.08

考察

全体としてはJPMMLの性能が良く、PMML4Sがそれに続く形になった。

PMML4SはJavaではなくScalaによる実装であるが、その点がJPMMLより不利だった理由かもしれない。

KIEについては、元々Red Hat Decision Managerが実装していたPMMLの機能を、Droolsとして別途実装して公開したもののようであるが、PMMLをdrools-modelに一度変換してから実行しているよう（cf. https://blog.kie.org/2020/02/pmml-revisited.html ）で、そのあたりの事情によりコンパイルも推論もパフォーマンスが出ていない可能性がある。

reference

Productizing ML Models with Dataflow | by Ben Weber
- https://towardsdatascience.com/productizing-ml-models-with-dataflow-99a224ce9f19
PMML
- https://dmg.org/pmml/v4-4-1/GeneralStructure.html
JPMML
- https://github.com/jpmml/jpmml-evaluator
PMML4S
- https://github.com/autodeployai/pmml4s
KIE
- https://www.kie.org/about/
Designing a decision service using PMML models
- https://access.redhat.com/documentation/en-us/red_hat_decision_manager/7.1/html/designing_a_decision_service_using_pmml_models/index
PMML revisited
- https://blog.kie.org/2020/02/pmml-revisited.html
JMH
- https://github.com/openjdk/jmh

GitHubで編集を提案