Closed20日前にクローズ9

「Gemma3」を試す

公式
https://x.com/googleaidevs/status/1899725682545967555
中の人
https://x.com/_philschmid/status/1899726907022963089
Gemma 3 が登場しました。これは LMSYS で最高のオープン非推論モデルです! 🚀

@GoogleDeepMind Gemma 3 は、128k トークンのコンテキストを持つオープンでマルチモーダル (テキスト + ビジョン)、多言語 LLM であり、4 つのサイズがあります。
要約:

4️⃣ 1B、4B、12B、27Bの4つのサイズ（事前トレーニング済みおよび命令調整済みバージョン）

🥇 LMArena のオープン非推論モデルで 1 位を獲得。スコアは 1338 で、o1-mini を上回っています。

🖼️ 4B、12B、27Bサイズのテキストと画像の入力とテキスト出力

🌍 140以上の言語をサポートする多言語対応

📈コンテキスト ウィンドウが 128K トークンに増加 (1B モデルでは 32K)

🧠トレーニング後、RLHF、RLMF、RLEFとモデルのマージを組み合わせる

😮ジェマ 3 12B はジェマ 2 27B より優れており、4B と 9B も同様である。

🔍高解像度画像のための適応ウィンドウアルゴリズムを備えたSigLIPベースのビジョンエンコーダ

🧪 JAXを使用してGoogle TPUで最大14Tトークン（27B）をトレーニング

🤗 @huggingface、@Kaggle、AI Studioで試用可能
LMSYSの結果はこちら
https://x.com/lmarena_ai/status/1899729292617277501

HuggingFace
https://huggingface.co/blog/gemma3
https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d
事前学習モデル（-pt。だと思う）とInstruction Tuning（-it）モデルで、1B/4B/12B/27Bの合計8種類。
なお、1Bだけ画像入力・マルチリンガルに非対応で、入力コンテキストも32Kな様子。
Ollamaもすでにある。
https://x.com/ollama/status/1899742981676007791

ColaboratoryでInstruction Tuningモデルを試す。手頃な4Bで。

4B

Colaboratory T4で。

パッケージインストール。Gemma-3向けはまだマージされていないってことなのかな？

!pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
!pip install accelerate

transformersのpipelineを使う場合

こういう画像を読み取る

referred from https://huggingface.co/google/gemma-3-4b-it

from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16
)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "あなたは親切な日本語のアシスタントです。"},]
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "キャンディに描いてある動物は何？"},
            ]
        },
    ],
]


output = pipe(text=messages, max_new_tokens=1024)
print(output[0][0]["generated_text"][-1]["content"])

結果

写真に写っているキャンディには、キツネが描かれていますね！

キャンディの表面には、キツネのシルエットがはっきりと見えます。

nvidia-smiだと12GB弱ぐらい

出力

Wed Mar 12 09:38:40 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   61C    P0             28W /   70W |   11870MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

自分の画像でも試してみる。

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "あなたは親切な日本語のアシスタントです。"},]
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://storage.googleapis.com/zenn-user-upload/caa14aef4b7e-20250226.png"},
                {"type": "text", "text": "本の表紙にはなんと書かれていますか？"},
            ]
        },
    ],
]


output = pipe(text=messages, max_new_tokens=1024)
print(output[0][0]["generated_text"][-1]["content"])

本の表紙には、以下のように書かれています。

ドキュメンテーション・コミュニケーション全体観

「真」と「制作スピード」を上げるメカニズム

上巻原則と手順中川邦夫

コミュニケーションはすべて 解・動・早 で進めよ

「めっくついただけ・動いていただけてできるだけ早く」

おー、画像サイズ結構小さいと思うんだけど、結構読めてる。

Gemma3ForConditionalGenerationクラスを使う例

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/gemma-3-4b-it"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "あなたは親切な日本語のアシスタントです。"},]
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://storage.googleapis.com/zenn-user-upload/caa14aef4b7e-20250226.png"},
                {"type": "text", "text": "本の表紙にはなんと書かれていますか？"},
            ]
        },
    ],
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False
    )
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

本の表紙には、以下のことが書かれています。

タイトル: ドキュメント・コミュニケーション全体観

サブタイトル: 「真”と”別れスピード”を上げるメカニズム」

著者: 上巻原則と手順中川邦夫

キャッチコピー: コミュニケーションはすぐで解・動・早で進めよ

小見出し: 筋力、戦略、戦術、プロセスが図で示されています。

pipelineよりもちょっとVRAM使用量が増えたな。

出力

Wed Mar 12 09:58:03 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   55C    P0             27W /   70W |   12938MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

llma.cpp GGUF

MLX

UnslothのFine Tuning記事、速い・・・
https://unsloth.ai/blog/gemma3
FT版も
https://huggingface.co/collections/unsloth/gemma-3-67d12b7e8816ec6efa7e4e5b

 まとめ性能高そうだし普段遣いに使えそう。ローカルモデルもすごいなぁ・・・
個人的には
システムプロンプトに対応
Function Calling+Structured Outputに対応
も嬉しいところ。ちょっとエージェントで動かしてみたい。
https://blog.google/technology/developers/gemma-3/?linkId=13397566

厳密にはFunction Callingに対応している、というわけではないのかな？
https://x.com/_philschmid/status/1900535848279626072
https://www.philschmid.de/gemma-function-calling
この記事含めて、冒頭の公式の記事もきちんと読んだほうが良さそう。

OllamaのテンプレートでFunction Callingを実装した例があった

このスクラップは20日前にクローズされました

ログインするとコメントできます