
Trying out Modular MAX

lilacs2039

What is MAX?

https://github.com/modularml/max
The Modular Accelerated Xecution (MAX) platform is an integrated suite of AI libraries, tools, and technologies that unifies the commonly fragmented AI deployment workflow.

What is MAX
https://docs.modular.com/max
The MAX Engine is the "engine" that powers the MAX platform, running existing AI models at remarkable speed on a wide range of hardware. Within that engine, Mojo is the core technology that delivers its performance, programmability, and portability.
(Mojo and MAX)
You don't have to use Mojo: you can take existing models from PyTorch, TensorFlow, or ONNX and run them in the MAX Engine through its Python and C API libraries. Combining Mojo with the MAX Engine, however, gives you superpowers: only Mojo can extend and optimize models for execution in the MAX Engine.
(Commercial edition)
MAX is currently available as the MAX Developer Edition, which lets you evaluate the MAX Engine and start developing AI applications with MAX. A MAX Enterprise Edition is planned for release soon, adding a commercially licensed version of MAX that can be deployed to production.
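
For context, here is roughly what running an existing model through the MAX Engine Python API looks like. This is a minimal sketch based on the simple-inference.py examples in the modularml/max repo; exact signatures vary by MAX version, and the model path below is a placeholder:

from transformers import AutoTokenizer
from max import engine

# Tokenize an input the same way the exported model expects (numpy tensors).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("There are many exciting developments in the field of AI Infrastructure!",
                   return_tensors="np")

# Load the exported model (TorchScript / TensorFlow SavedModel / ONNX) into a
# MAX Engine inference session and execute it. Placeholder path; some model
# formats may need extra load options depending on the MAX version.
session = engine.InferenceSession()
model = session.load("../../models/bert.torchscript")
outputs = model.execute(**inputs)
print(outputs)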


Result (as of March 2, 2024)

The following command gets you a MAX runtime environment, but the sample programs didn't run.

docker run -it --rm --net=host \
  -v $MODEL_REPOSITORY:/models \
  --gpus all \
  public.ecr.aws/modular/max-serving-de \
  bash
lilacs2039

Tutorial

Get started with MAX Engine
https://docs.modular.com/engine/get-started#1-install-the-max-sdk

My environment

  • Windows 11 Pro / WSL2 / Ubuntu 22.04

Update the installed modular CLI

The modular CLI is under active development and gets frequent updates, so make sure yours is up to date first.
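
For reference, one likely way to update, assuming modular was installed through Modular's apt repository as in the official setup instructions (adjust if you installed it differently):

$ sudo apt-get update && sudo apt-get install modular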

Modular CLI changelog
https://docs.modular.com/cli/changelog
v0.4.1 (2024-01-25)
...

Authentication

$ modular auth
# Please visit this URL in your browser: https://developer.modular.com/device?userCode=GPNX-JQRF
# Waiting for confirmation...

Install MAX

Looks like it succeeded:

$ modular install max
# Found release for https://packages.modular.com/max @ 24.1.0
# Downloading archive: packages/24.1.0/max-x86_64-unknown-linux-gnu-24.1.0.tar.gz
Downloading  [ ███████████████████████████████████████████████ ] 100%   1.37GiB/1.37GiB @  9.74MiB/s
# Extracting downloaded archives.
# Extraction complete, setting configs...
# Configs complete, running post-install hooks...
Collecting find_libpython==0.3.0
  Using cached find_libpython-0.3.0-py3-none-any.whl (8.5 kB)
Collecting jupyter_client>=8.3.0
  Downloading jupyter_client-8.6.0-py3-none-any.whl (105 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.9/105.9 KB 4.5 MB/s eta 0:00:00

...

Installing collected packages: wcwidth, pure-eval, ptyprocess, find_libpython, traitlets, tornado, six, pyzmq, pygments, psutil, prompt-toolkit, platformdirs, pexpect, parso, packaging, nest-asyncio, executing, exceptiongroup, decorator, debugpy, python-dateutil, matplotlib-inline, jupyter-core, jedi, comm, asttokens, stack-data, jupyter_client, ipython, ipykernel
Successfully installed asttokens-2.4.1 comm-0.2.1 debugpy-1.8.1 decorator-5.1.1 exceptiongroup-1.2.0 executing-2.0.1 find_libpython-0.3.0 ipykernel-6.29.3 ipython-8.22.1 jedi-0.19.1 jupyter-core-5.7.1 jupyter_client-8.6.0 matplotlib-inline-0.1.6 nest-asyncio-1.6.0 packaging-23.2 parso-0.8.3 pexpect-4.9.0 platformdirs-4.2.0 prompt-toolkit-3.0.43 psutil-5.9.8 ptyprocess-0.7.0 pure-eval-0.2.2 pygments-2.17.2 python-dateutil-2.9.0.post0 pyzmq-25.1.2 six-1.16.0 stack-data-0.6.3 tornado-6.4 traitlets-5.14.1 wcwidth-0.2.13
==== CONFLICTING PACKAGES DETECTED ====

The MAX package is a superset of the Mojo package.

Having both standalone Mojo package and the MAX package installed
at the same time can cause version confusion.

We recommend uninstalling the standalone Mojo using `modular uninstall mojo`.  You can still use Mojo from the MAX package..

(ASCII-art MAX banner)


MAX is now installed! Almost...

## FINISH THE INSTALL

Install the MAX Python package and set environment variables as per:
https://docs.modular.com/engine/get-started/

## NEXT STEPS

Once done, you can now access the 'max' and 'mojo' CLI tools.
Enter 'max --help' or 'mojo --help'.

For MAX Docs, see https://docs.modular.com.
For MAX Code Examples, see https://github.com/modularml/max.

Install the tutorial examples

$ git clone https://github.com/modularml/max.git
Cloning into 'max'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (358/358), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 934 (delta 264), reused 192 (delta 172), pack-reused 576
Receiving objects: 100% (934/934), 965.29 KiB | 14.41 MiB/s, done.
Resolving deltas: 100% (574/574), done.
$ cd max/examples/inference/roberta-python-tensorflow
$ python3 -m pip install -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Collecting setuptools>=69.1.0 (from -r requirements.txt (line 1))
  Downloading setuptools-69.1.1-py3-none-any.whl (819 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 819.3/819.3 kB 16.6 MB/s eta 0:00:00
...
Installing collected packages: libclang, flatbuffers, brotli, wrapt, urllib3, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, setuptools, safetensors, rsa, regex, python-rapidjson, opt-einsum, ml-dtypes, MarkupSafe, markdown, keras, h5py, grpcio, greenlet, google-pasta, gast, fsspec, filelock, cachetools, absl-py, zope.event, werkzeug, tritonclient, requests, google-auth, requests-oauthlib, huggingface-hub, gevent, tokenizers, google-auth-oauthlib, geventhttpclient, transformers, tensorboard, tensorflow
  Attempting uninstall: flatbuffers
    Found existing installation: flatbuffers 23.1.21
    Uninstalling flatbuffers-23.1.21:
      Successfully uninstalled flatbuffers-23.1.21
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2023.3.0
    Uninstalling fsspec-2023.3.0:
      Successfully uninstalled fsspec-2023.3.0
  Attempting uninstall: requests
    Found existing installation: requests 2.28.2
    Uninstalling requests-2.28.2:
      Successfully uninstalled requests-2.28.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
selenium 4.8.0 requires urllib3[socks]~=1.26, but you have urllib3 2.2.1 which is incompatible.
Successfully installed MarkupSafe-2.1.5 absl-py-2.1.0 brotli-1.1.0 cachetools-5.3.3 filelock-3.13.1 flatbuffers-23.5.26 fsspec-2024.2.0 gast-0.5.4 gevent-24.2.1 geventhttpclient-2.0.2 google-auth-2.28.1 google-auth-oauthlib-1.2.0 google-pasta-0.2.0 greenlet-3.0.3 grpcio-1.62.0 h5py-3.10.0 huggingface-hub-0.21.3 keras-2.15.0 libclang-16.0.6 markdown-3.5.2 ml-dtypes-0.2.0 opt-einsum-3.3.0 python-rapidjson-1.16 regex-2023.12.25 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 safetensors-0.4.2 setuptools-69.1.1 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.0.post1 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.36.0 termcolor-2.4.0 tokenizers-0.15.2 transformers-4.38.2 tritonclient-2.43.0 urllib3-2.2.1 werkzeug-3.0.1 wrapt-1.14.1 zope.event-5.0

There was an "ERROR: pip's dependency resolver ..." message, but on the whole the install succeeded.

lilacs2039

Run run.sh

Error

$ bash run.sh
Downloading model ...
Converting Transformers Model to Tensorflow SavedModel...
Model saved to ../../models/roberta-tensorflow/.

Traceback (most recent call last):
  File "/mnt/c/Users/.../max/examples/inference/roberta-python-tensorflow/simple-inference.py", line 16, in <module>
    from max import engine
ModuleNotFoundError: No module named 'max'

Fix

https://github.com/modularml/max
Q: When I run an example I get "ModuleNotFoundError: No module named 'max'". What's going on?
A: Make sure you have run:
python3 -m pip install --find-links "$(modular config max.path)/wheels" max-engine

Tried it → failed

$ python3 -m pip install --find-links "$(modular config max.path)/wheels" max-engine
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /wheels
WARNING: Location '/wheels' is ignored: it is either a non-existing path or lacks a specific scheme.
ERROR: Could not find a version that satisfies the requirement max-engine (from versions: none)
ERROR: No matching distribution found for max-engine

The root cause is that the max path isn't configured in the first place:

$ modular config max.path
(empty string)
$ max
Command 'max' not found, did you mean:
...
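
For reference, the get-started page (linked from the install message above) has you export environment variables so the shell can pick up the MAX install; roughly something like the following, assuming the default install under ~/.modular (the exact lines in the docs may differ):

$ MAX_PATH=$(modular config max.path) \
  && echo 'export MODULAR_HOME="'$HOME'/.modular"' >> ~/.bashrc \
  && echo 'export PATH="'$MAX_PATH'/bin:$PATH"' >> ~/.bashrc \
  && source ~/.bashrc

In this case, though, modular config max.path itself comes back empty, so the problem is earlier than the PATH: the post-install configuration never registered.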
lilacs2039

Use the Docker image

Apparently a Docker image is provided.

https://github.com/modularml/max/issues/74
You can pull it from the public ECR: public.ecr.aws/modular/max-serving-de. Reopening this so it also gets added to the README.

It isn't in the README yet, but the image does exist:

https://gallery.ecr.aws/modular/max-serving-de

Pull the image

$ docker pull public.ecr.aws/modular/max-serving-de:latest
latest: Pulling from modular/max-serving-de
9d19ee268e0d: Pull complete
0ec682bf9971: Pull complete
9ac855545fa9: Pull complete
0a77dcbd0e64: Pull complete
...

Start the container
Move into the directory of the repository you want to try, then run:

$ MODEL_REPOSITORY=$(pwd)
$ docker run -it --rm --net=host \
  -v $MODEL_REPOSITORY:/models \
  public.ecr.aws/modular/max-serving-de \
  tritonserver --model-repository=/models


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

...

I0302 07:29:46.936614 1 server.cc:305] Waiting for in-flight requests to complete.
I0302 07:29:46.936619 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0302 07:29:46.936631 1 server.cc:336] All models are stopped, unloading models
I0302 07:29:46.936634 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Failed: the NVIDIA container runtime isn't set up (note the "NVIDIA Driver was not detected" warning above).

lilacs2039

Run with NVIDIA Docker

Add --gpus all and run again:

$ docker run -it --rm --net=host \
  -v $MODEL_REPOSITORY:/models \
  --gpus all \
  public.ecr.aws/modular/max-serving-de \
  tritonserver --model-repository=/models


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0302 07:34:15.203528 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x205000000' with size 268435456
I0302 07:34:15.203619 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0302 07:34:15.203633 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
W0302 07:34:15.375620 1 server.cc:249] failed to enable peer access for some device pairs
E0302 07:34:15.708757 1 model_repository_manager.cc:1307] Poll failed for model directory 'examples': Invalid model name: Could not determine backend for model 'examples' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

...

I0302 07:34:15.796287 1 server.cc:305] Waiting for in-flight requests to complete.
I0302 07:34:15.796292 1 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I0302 07:34:15.796298 1 server.cc:336] All models are stopped, unloading models
I0302 07:34:15.796301 1 server.cc:343] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

This time the GPU initializes, but we get "'examples': Invalid model name": the directories under the mounted path can't be loaded as models. The tritonserver command launched at container startup scans the --model-repository directory and tries to treat every top-level directory, including examples/, as a model, so mounting the cloned max repo directly doesn't work. Let's try starting the container with bash instead…
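
(For reference, Triton only loads directories that follow its model-repository layout, roughly like this; the model and file names below are illustrative:)

models/
  bert/                    <- one directory per model
    config.pbtxt           <- backend, input/output specs, etc.
    1/                     <- numeric version directory
      model.pt             <- or model.savedmodel, model.onnx, ...

The cloned max repo instead exposes top-level directories like examples/ that don't follow this layout, hence the error.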

lilacs2039

Start bash in the max-serving-de container

$  docker run -it --rm --net=host \
  -v $MODEL_REPOSITORY:/models \
  --gpus all \
  public.ecr.aws/modular/max-serving-de \
>   bash

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

We're inside the container. Check whether max and mojo are installed:

# max -v
max 24.1.0 (c176f84d)
Modular version 24.1.0-c176f84d-release

# mojo -v
mojo 24.1.0 (c176f84d)

# modular -v
bash: modular: command not found

The modular CLI doesn't seem to be included.

lilacs2039

Try the sample program "roberta-python-tensorflow" inside the container

# cd /models/examples/inference/roberta-python-tensorflow
# python3 -m pip install -r requirements.txt

Collecting setuptools>=69.1.0 (from -r requirements.txt (line 1))
  Obtaining dependency information for setuptools>=69.1.0 from https://files.pythonhosted.org/packages/c0/7a/3da654f49c95d0cc6e9549a855b5818e66a917e852ec608e77550c8dc08b/setuptools-69.1.1-py3-none-any.whl.metadata
...

Installing collected packages: libclang, flatbuffers, brotli, wrapt, urllib3, typing-extensions, tqdm, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, setuptools, safetensors, regex, pyyaml, python-rapidjson, pyasn1, protobuf, packaging, opt-einsum, multidict, ml-dtypes, MarkupSafe, markdown, keras, idna, h5py, grpcio, greenlet, google-pasta, gast, fsspec, frozenlist, filelock, charset-normalizer, certifi, cachetools, attrs, async-timeout, astunparse, absl-py, zope.interface, zope.event, yarl, werkzeug, tritonclient, rsa, requests, pyasn1-modules, aiosignal, requests-oauthlib, huggingface-hub, google-auth, gevent, aiohttp, tokenizers, google-auth-oauthlib, geventhttpclient, transformers, tensorboard, tensorflow
  Attempting uninstall: setuptools
    Found existing installation: setuptools 68.1.0
    Uninstalling setuptools-68.1.0:
      Successfully uninstalled setuptools-68.1.0
Successfully installed MarkupSafe-2.1.5 absl-py-2.1.0 aiohttp-3.9.3 aiosignal-1.3.1 astunparse-1.6.3 async-timeout-4.0.3 attrs-23.2.0 brotli-1.1.0 cachetools-5.3.3 certifi-2024.2.2 charset-normalizer-3.3.2 filelock-3.13.1 flatbuffers-23.5.26 frozenlist-1.4.1 fsspec-2024.2.0 gast-0.5.4 gevent-24.2.1 geventhttpclient-2.0.2 google-auth-2.28.1 google-auth-oauthlib-1.2.0 google-pasta-0.2.0 greenlet-3.0.3 grpcio-1.62.0 h5py-3.10.0 huggingface-hub-0.21.3 idna-3.6 keras-2.15.0 libclang-16.0.6 markdown-3.5.2 ml-dtypes-0.2.0 multidict-6.0.5 opt-einsum-3.3.0 packaging-23.2 protobuf-4.25.3 pyasn1-0.5.1 pyasn1-modules-0.3.0 python-rapidjson-1.16 pyyaml-6.0.1 regex-2023.12.25 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 safetensors-0.4.2 setuptools-69.1.1 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.0.post1 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.36.0 termcolor-2.4.0 tokenizers-0.15.2 tqdm-4.66.2 transformers-4.38.2 tritonclient-2.43.0 typing-extensions-4.10.0 urllib3-2.2.1 werkzeug-3.0.1 wrapt-1.14.1 yarl-1.9.4 zope.event-5.0 zope.interface-6.2
# bash run.sh

Traceback (most recent call last):
  File "/models/examples/inference/roberta-python-tensorflow/../common/roberta-tensorflow/download-model.py", line 23, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/__init__.py", line 48, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx # line: 34
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/utils/context_managers.py", line 19, in <module>
    from tensorflow.python.framework import ops
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py", line 40, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 34, in <module>
    self_check.preload_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/self_check.py", line 63, in preload_check
    from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: /usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/../_pywrap_tensorflow_internal.so: undefined symbol: _ZN10tensorflow4core34CppShapeInferenceResult_HandleDataC2EPN6google8protobuf5ArenaEb

The failure happens at import tensorflow as tf: the pip-installed TensorFlow can't be loaded. The undefined protobuf-related symbol suggests an ABI clash with libraries already bundled in the Triton image, so TensorFlow may simply not be usable in this image yet.
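
One way to inspect which package versions ended up in the container without triggering the import error (a diagnostic sketch, not part of the original walkthrough):

# python3 -m pip list | grep -i -E "tensorflow|protobuf"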

lilacs2039

Try the sample program "bert-python-torchscript" inside the container

Since TensorFlow won't run, try the TorchScript sample instead.

Setup → OK

$ cd /models/examples/inference/bert-python-torchscript
$ python3 -m pip install -r requirements.txt

Collecting torch>=2.1.2 (from -r requirements.txt (line 1))
...
Successfully installed jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 sympy-1.12 torch-2.2.1 triton-2.2.0

Run → failed

# bash run.sh
+ INPUT_EXAMPLE='There are many exciting developments in the field of AI Infrastructure!'
+ MODEL_PATH=../../models/bert.torchscript
++ dirname run.sh
+ cd .
+ python3 ../common/bert-torchscript/download-model.py -o ../../models/bert.torchscript
Downloading model...
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 4.14MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 440M/440M [00:06<00:00, 71.6MB/s]
Saving model in TorchScript format...
Converting the model to TorchScript format...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
Model saved.
+ python3 simple-inference.py --text 'There are many exciting developments in the field of AI Infrastructure!' --model-path ../../models/bert.torchscript
Traceback (most recent call last):
  File "/models/examples/inference/bert-python-torchscript/simple-inference.py", line 14, in <module>
    from max import engine
ModuleNotFoundError: No module named 'max'

The max Python package isn't installed here either. I wish the image came with it preinstalled…

# python3 -m pip install --find-links "$(modular config max.path)/wheels" max-engine
bash: modular: command not found
Looking in links: /wheels

Since the modular CLI isn't installed in the container, $(modular config max.path) expands to an empty string and this doesn't work either.
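
One possible workaround, assuming a host where modular install max succeeded and modular config max.path resolves: mount the host's wheels directory into the container and install max-engine from there. An untested sketch (the /max-wheels mount point is arbitrary):

$ docker run -it --rm --net=host \
  -v $MODEL_REPOSITORY:/models \
  -v "$(modular config max.path)/wheels:/max-wheels" \
  --gpus all \
  public.ecr.aws/modular/max-serving-de \
  bash
# python3 -m pip install --find-links /max-wheels max-engine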