DebertaV2: ONNX export that also outputs hidden states

With optimum-cli export onnx, there seems to be no way to get the
hidden_states (the hidden-layer states) that you would get
by specifying output_hidden_states=True.

The way around this is probably to subclass OnnxConfig and declare the extra outputs.

Setup

pip install optimum[exporters]

After a lot of trial and error, the following code finally did the conversion.

from typing import Dict

from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import DebertaV2OnnxConfig
from transformers import AutoConfig


class CustomBertOnnxConfig(DebertaV2OnnxConfig):
    @property
    def outputs(self) -> Dict[str, Dict[int, str]]:
        common_outputs = super().outputs
        # + 1 means initial embedding outputs
        for i in range(self._config.num_hidden_layers + 1):
            common_outputs[f"hidden_states.{i}"] = {
                0: "batch_size",
                1: "sequence_length",
                2: "hidden_size",
            }
        return common_outputs


model_id = "ku-nlp/deberta-v2-large-japanese-char-wwm"
config = AutoConfig.from_pretrained(model_id)

custom_bert_onnx_config = CustomBertOnnxConfig(
    config=config,
    task="fill-mask",
)

custom_onnx_configs = {
    "model": custom_bert_onnx_config,
}

main_export(
    model_id,
    output="tests",
    no_post_process=True,
    model_kwargs={"output_hidden_states": True},
    custom_onnx_configs=custom_onnx_configs,
)

The + 1 accounts for the output of the initial embedding layer.
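To make the naming concrete, here is a minimal sketch that reproduces only the loop from the `outputs` property above, without optimum installed. The helper name `hidden_state_axes` is hypothetical, and the value 24 is an assumption taken from the tuple size of 25 (num_hidden_layers + 1) reported for this model in the inference post.

```python
from typing import Dict


def hidden_state_axes(num_hidden_layers: int) -> Dict[str, Dict[int, str]]:
    """Build the extra ONNX output entries, mirroring the loop in the config class."""
    axes: Dict[str, Dict[int, str]] = {}
    # + 1 because the embedding output comes before the transformer layers
    for i in range(num_hidden_layers + 1):
        axes[f"hidden_states.{i}"] = {
            0: "batch_size",
            1: "sequence_length",
            2: "hidden_size",
        }
    return axes


extra = hidden_state_axes(24)  # 24 layers is an assumption, see the note above
print(len(extra))        # 25
print(list(extra)[0])    # hidden_states.0
print(list(extra)[-1])   # hidden_states.24
```

These names are what show up as the output names of the exported graph, after the `logits` entry inherited from the parent config.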

Reference

こーのいけ

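Hello, this was very helpful.
This script did give me an ONNX model that appears to output the hidden states, but I can't get that output when I run inference with ORTModelForMaskedLM.
kale_core, how are you running inference?
I'd appreciate it if you could share.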

kale_core

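Thanks for the comment.
I use onnxruntime directly to pull out the hidden-state outputs.

To give the conclusion first, it seems the hidden-state outputs
cannot be retrieved with ORTModelForCausalLM.

I've added a few posts about inference below, so please take a look.
Note that I renamed the tests folder produced by the conversion
(the one given as output to main_export) to
deberta-v2-large-japanese-char-wwm_onnx.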

こーのいけ

I see, so that's how it is after all.
Thanks for the inference posts as well!

kale_core

Inference

First, check with the PyTorch model.

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_path = "ku-nlp/deberta-v2-large-japanese-char-wwm"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)
encoding = tokenizer("こんにちは。はじめまして。", return_tensors="pt")
results = model(**encoding, output_hidden_states=True)
print("logits")
print(results["logits"])
print("hidden_states[0]")
print(results["hidden_states"][0])

This gives the following.

logits
tensor([[[-6.1631,  2.4620, -0.7342,  ..., -6.0182, -5.2640, -5.0430],
         [-5.5083, -0.6581, -0.2350,  ..., -6.1456, -4.5377, -4.0076],
         [-5.9405, -0.3195, -0.6136,  ..., -4.9960, -4.1468, -4.7519],
         ...,
         [-4.8862, -0.8497, -0.4739,  ..., -4.6061, -3.4065, -4.8753],
         [-6.6316,  2.0073,  1.1674,  ..., -7.5686, -5.1659, -4.9622],
         [-6.9476, -1.4373,  0.5368,  ..., -6.4873, -7.9941, -5.5731]]],
       grad_fn=<ViewBackward0>)
hidden_states[0]
tensor([[[-0.4352,  0.2227,  0.9623,  ...,  0.0283,  0.1983, -0.1628],
         [-0.6350,  0.6172, -0.7587,  ..., -0.0492, -0.0352, -0.1290],
         [ 0.0429, -0.3316, -1.3151,  ...,  0.0131,  1.8009,  0.6606],
         ...,
         [ 0.9631, -0.2214, -0.6423,  ...,  0.1970, -0.2580,  0.0278],
         [ 0.1922, -0.1708, -0.4008,  ..., -2.6732,  1.0996,  0.1890],
         [ 0.1593, -0.0912,  0.6603,  ...,  0.3742,  0.0360, -0.7715]]],
       grad_fn=<MulBackward0>)

results["hidden_states"] is a tuple of hidden-layer outputs, and its length is num_hidden_layers + 1 = 25.

kale_core

import onnxruntime
from transformers import AutoTokenizer

# Forward slashes work on both Windows and POSIX
onnx_model_path = "deberta-v2-large-japanese-char-wwm_onnx/model.onnx"
model_path = "deberta-v2-large-japanese-char-wwm_onnx"
session = onnxruntime.InferenceSession(onnx_model_path)

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("こんにちは。はじめまして。")
# Keep only the inputs that the ONNX graph declares
need_inputs = [node.name for node in session.get_inputs()]
# Wrap each value in a list to add the batch dimension
inputs = {k: [v] for k, v in inputs.items() if k in need_inputs}
outputs = session.run(None, inputs)
print("outputs[0] -> logits")
print(outputs[0])
print("outputs[1] -> hidden_states[0]")
print(outputs[1])

outputs[0] -> logits
[[[-6.163116    2.4620116  -0.7342076  ... -6.018163   -5.2639766
   -5.043047  ]
  [-5.5083246  -0.65807194 -0.23497856 ... -6.1455894  -4.537685
   -4.0075994 ]
  [-5.940491   -0.3195123  -0.6136059  ... -4.995975   -4.1468434
   -4.7518635 ]
  ...
  [-4.886198   -0.84970987 -0.47391993 ... -4.6060786  -3.406508
   -4.8753047 ]
  [-6.6315813   2.0073345   1.1673608  ... -7.5686183  -5.1658564
   -4.962229  ]
  [-6.9476285  -1.4372597   0.53682977 ... -6.4873004  -7.9941
   -5.573085  ]]]
outputs[1] -> hidden_states[0]
[[[-0.43523338  0.22273856  0.962349   ...  0.02833249  0.19834822
   -0.1628447 ]
  [-0.63504225  0.6172022  -0.75865257 ... -0.04918524 -0.03523634
   -0.12896116]
  [ 0.04291684 -0.3316062  -1.3150733  ...  0.01311856  1.8008931
    0.66057867]
  ...
  [ 0.96311545 -0.22138044 -0.6423132  ...  0.19698547 -0.2579516
    0.02778699]
  [ 0.19215304 -0.17084736 -0.40084833 ... -2.6731858   1.099563
    0.18895522]
  [ 0.15932083 -0.0912299   0.66033065 ...  0.37421894  0.03601994
   -0.7715362 ]]]

outputs is a list of length 1 (logits) + 25 (hidden states),
so the first hidden-state output is at index 1.
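Hard-coding index 1 works here, but looking outputs up by name is less fragile. A minimal sketch, assuming the exported graph lists `logits` first and then `hidden_states.0`, `hidden_states.1`, ... in order, which matches the output shown above. `output_positions` is a hypothetical helper; with a real session the names would come from `[o.name for o in session.get_outputs()]`.

```python
from typing import Dict, Sequence


def output_positions(names: Sequence[str]) -> Dict[str, int]:
    """Map each ONNX output name to its position in the list returned by session.run."""
    return {name: i for i, name in enumerate(names)}


# Names as the export config would produce them for a 24-layer model (assumed order).
names = ["logits"] + [f"hidden_states.{i}" for i in range(25)]
pos = output_positions(names)
print(len(names))               # 26
print(pos["hidden_states.0"])   # 1
print(pos["hidden_states.24"])  # 25
```

The last hidden state is then `outputs[pos["hidden_states.24"]]` rather than a hand-counted index.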

kale_core

ORTModelForCausalLM

Can the following retrieve the hidden states?

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_path = "deberta-v2-large-japanese-char-wwm_onnx"
model = ORTModelForCausalLM.from_pretrained(
    model_path,
    use_io_binding=False,
    use_cache=False,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
encoding = tokenizer("こんにちは。はじめまして。", return_tensors="pt")
results = model(**encoding, output_hidden_states=True)
print(results)

As shown below, the result contains only logits.

CausalLMOutputWithPast(loss=None, logits=tensor([[[-6.1631,  2.4620, -0.7342,  ..., -6.0182, -5.2640, -5.0430],
         [-5.5083, -0.6581, -0.2350,  ..., -6.1456, -4.5377, -4.0076],
         [-5.9405, -0.3195, -0.6136,  ..., -4.9960, -4.1468, -4.7519],
         ...,
         [-4.8862, -0.8497, -0.4739,  ..., -4.6061, -3.4065, -4.8753],
         [-6.6316,  2.0073,  1.1674,  ..., -7.5686, -5.1659, -4.9622],
         [-6.9476, -1.4373,  0.5368,  ..., -6.4873, -7.9941, -5.5731]]]), past_key_values=None, hidden_states=None, attentions=None)

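This runs the forward method of ORTModelForCausalLM,
which acts as a wrapper around onnxruntime.

Even if you pass output_hidden_states as an optional argument,
there is no code that extracts the hidden states from the inference result,
so you have to extract them yourself, as in the previous post.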
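One way to do that extraction in a reusable form is a small wrapper around the session. This is only a sketch under assumptions: the function name `run_with_hidden_states` is hypothetical, the output names are assumed to be `logits` and `hidden_states.{i}` as declared in the export config, and `session` is assumed to expose `get_inputs()`, `get_outputs()` and `run()` the way `onnxruntime.InferenceSession` does.

```python
from typing import Any, Dict, Mapping


def run_with_hidden_states(session: Any, encoding: Mapping[str, Any]) -> Dict[str, Any]:
    """Run an ONNX session and return logits plus a tuple of hidden states."""
    # Drop tokenizer fields that the graph does not declare as inputs
    wanted = {node.name for node in session.get_inputs()}
    feed = {k: v for k, v in encoding.items() if k in wanted}

    # Passing None asks for every output, in the order of get_outputs()
    names = [node.name for node in session.get_outputs()]
    values = session.run(None, feed)
    by_name = dict(zip(names, values))

    # Collect hidden_states.0 .. hidden_states.N in layer order
    n_hidden = sum(name.startswith("hidden_states.") for name in names)
    hidden = tuple(by_name[f"hidden_states.{i}"] for i in range(n_hidden))
    return {"logits": by_name["logits"], "hidden_states": hidden}
```

With the session and tokenizer from the onnxruntime post, calling `run_with_hidden_states(session, {k: [v] for k, v in tokenizer(text).items()})` should give a dict whose `hidden_states` entry has the same layout as the PyTorch tuple, though I have not checked this against every optimum version.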