
Running Llama3 with MLX

Published on 2024/05/15

Llama3

Llama3 is the latest version of the open-source LLM provided by Meta (announced on April 18, 2024).
https://llama.meta.com/llama3/

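Llama3 comes in two models, "8B" and "70B".
B stands for billion: 8B has 8 billion parameters and 70B has 70 billion, so the two models differ in parameter count.

Hugging Face hosts a 4-bit quantized 8B model provided by mlx-community, so that is the one I run here.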
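To get a feel for what the parameter counts and 4-bit quantization mean in practice, here is a rough back-of-the-envelope sketch. It only counts the raw weights and ignores quantization scales, activations, and the KV cache, and the 16-bit baseline is my own assumption for comparison, so treat the numbers as a lower bound rather than a measurement.

```python
# Rough estimate of weight storage only (an illustrative helper, not part of MLX).
GIB = 1024 ** 3


def approx_weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Size of the raw weights in bytes, ignoring quantization metadata and runtime overhead."""
    return num_params * bits_per_weight / 8


for name, params in [("8B", 8_000_000_000), ("70B", 70_000_000_000)]:
    for bits in (16, 4):  # 16-bit is an assumed unquantized baseline
        size_gib = approx_weight_bytes(params, bits) / GIB
        print(f"{name} at {bits}-bit: ~{size_gib:.1f} GiB")
```

By this estimate the 4-bit 8B model needs only a few GiB for its weights, which is why it fits comfortably on a laptop.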

MLX

MLX is an open-source machine learning framework optimized for Apple silicon.

The repository is here:
https://github.com/ml-explore/mlx

Environment

Machine: MacBook Pro (Apple M2 Max chip)
Memory: 64GB
OS: macOS Sonoma
Python 3.10
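Since MLX targets Apple silicon, it can be worth confirming that Python itself is running natively on an arm64 Mac before going further. The helper below is a small sketch of my own (the function name is hypothetical); it relies only on the standard library.

```python
import platform


def is_apple_silicon_mac(system: str, machine: str) -> bool:
    """True when the interpreter reports macOS ("Darwin") on an arm64 CPU."""
    return system == "Darwin" and machine == "arm64"


# A Python running under Rosetta would report "x86_64" here instead of "arm64".
print(is_apple_silicon_mac(platform.system(), platform.machine()))
```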

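Steps

1 Preparing a virtual environment

I use Anaconda to set up a Python virtual environment.
https://www.anaconda.com/

When you try out many things in Python, you end up installing all sorts of libraries, and something that used to work can suddenly stop working. For that reason I create a separate environment each time.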

conda create -n mlx-llama3 python=3.10

conda activate mlx-llama3
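After activating, a quick check that the expected environment and Python version are actually in use can save confusion later. This is a minimal sketch; it assumes conda exposes the active environment name through the `CONDA_DEFAULT_ENV` environment variable, and the helper name is my own.

```python
import os
import sys


def describe_env(environ, version_info) -> str:
    """Summarize the active conda environment name and the Python major.minor version."""
    name = environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)")
    return f"{name} / Python {version_info[0]}.{version_info[1]}"


# With the environment from this article active, this should mention mlx-llama3 and 3.10.
print(describe_env(os.environ, sys.version_info))
```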

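2 Installing MLX

I follow the documentation linked below for the installation. The mlx_lm.generate command used in the next step is provided by the mlx-lm package, which also installs mlx itself as a dependency.

pip install mlx-lm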

https://huggingface.co/docs/hub/mlx
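To confirm the installation worked, the installed versions can be looked up from Python. The sketch below uses only the standard library; the distribution names `mlx` and `mlx-lm` are the PyPI package names (the `mlx_lm.generate` command used in the next step comes from `mlx-lm`), and the helper function is my own.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string for a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for dist in ("mlx", "mlx-lm"):
    print(dist, installed_version(dist) or "not installed")
```

If `mlx-lm` shows up as not installed, the `mlx_lm.generate` command will not be available.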

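3 Downloading and running the model

Run the model with MLX. The model files are fetched from Hugging Face if they are not already cached. The prompt means "Who are you?" in Japanese.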

mlx_lm.generate --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --prompt "あなたは誰ですか？"
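The same thing can also be done from Python instead of the command line. The following is a sketch based on the `load` and `generate` functions that mlx-lm exposes; exact keyword arguments can differ between mlx-lm versions, and `build_messages`, `run`, and the `max_tokens` value are my own illustrative choices.

```python
MODEL_ID = "mlx-community/Meta-Llama-3-8B-Instruct-4bit"


def build_messages(user_text: str) -> list[dict]:
    """Wrap a single user turn in the chat-message structure expected by the chat template."""
    return [{"role": "user", "content": user_text}]


def run(user_text: str, max_tokens: int = 256) -> str:
    """Load the model (fetching it if needed) and generate a reply to one user message."""
    # Imported lazily so the rest of this file works without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(MODEL_ID)
    # Let the tokenizer's chat template add the Llama 3 special tokens.
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text), tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens, verbose=True)
```

Calling `run("あなたは誰ですか？")` on an Apple silicon Mac with mlx-lm installed should behave much like the command above.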

4 Results

mlx_lm.generate --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --prompt "あなたは誰ですか？"

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 6 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 95325.09it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

あなたは誰ですか？<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Nice to meet you! I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but a computer program designed to simulate conversation, answer questions, and even create content. I don't have personal experiences or emotions, but I'm here to help you with any information or tasks you may have!
==========
Prompt: 104.910 tokens-per-sec
Generation: 68.816 tokens-per-sec
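The "Prompt:" part of the log shows that the plain question was wrapped in Llama 3's chat format before being sent to the model. As a sketch, the string seen in the log can be reconstructed for a single user turn like this (the function name is my own, and real code would normally let the tokenizer's chat template do this):

```python
def format_llama3_prompt(user_text: str) -> str:
    """Rebuild the single-turn prompt string as it appears in the log output."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_text}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(format_llama3_prompt("あなたは誰ですか？"))
```

The trailing assistant header is what cues the model to start writing its reply.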

Nice to meet you! I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but a computer program designed to simulate conversation, answer questions, and even create content. I don't have personal experiences or emotions, but I'm here to help you with any information or tasks you may have!

[Japanese translation (DeepL)]
はじめまして！私はLLaMA、Meta AIが開発したAIアシスタントで、人間の入力を理解し、会話形式で応答することができます。私は人間ではなく、会話をシミュレートし、質問に答え、さらにはコンテンツを作成するように設計されたコンピュータープログラムです。私は個人的な経験や感情を持っていませんが、あなたが持っている情報やタスクをお手伝いします！
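As a closing note on the throughput figures reported in the log, they can be turned into rough wall-clock estimates. This is simple arithmetic on the numbers from this one run, assuming the rate stays constant, which it may not for longer outputs.

```python
# Figures copied from the log of this run.
PROMPT_TPS = 104.910
GENERATION_TPS = 68.816


def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Estimated time to process a number of tokens at a constant rate."""
    return tokens / tokens_per_sec


for n in (100, 500, 1000):
    print(f"{n} generated tokens: ~{seconds_for(n, GENERATION_TPS):.1f} s")
```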