Closed 29 days ago

Trying out "reka-flash-3.1"

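I learned about it here.
https://x.com/_m0se_/status/1954897001742749760
Looking into it, this seems to be the developer (a somewhat old post, though).
https://x.com/jaguring1/status/1757348616635928646
And apparently the latest model is "reka-flash-3.1". It supports English only.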
https://huggingface.co/RekaAI/reka-flash-3.1

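Reka Flash 3.1
Reka Flash 3.1 is an updated version of our Reka Flash 3. It is particularly strong at coding and well suited as a base model for fine-tuning on agentic tasks.
Reka Flash 3.1 was post-trained on synthetic and public datasets for supervised fine-tuning, then optimized with large-scale reinforcement learning (RLOO) using verifiable rewards. This yields a 10-point improvement on LiveCodeBench v5 (full set) over Reka Flash 3, thanks to significant advances in our reinforcement learning stack. On coding-related tasks, Reka Flash 3.1 is competitive with models such as Qwen3-32B, o3-mini, and Gemini 2.5 Flash Thinking.
If you want to learn more about the RL methods behind these improvements in Reka Flash 3.1, see this post.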


referred from https://huggingface.co/RekaAI/reka-flash-3.1
Try it out in our playground.
We have also released a 3.5-bit quantized version of Reka Flash 3.1, along with a quantization library, to better support local deployment.

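Language Support
This model is designed primarily for English and should be treated as an English-only model. That said, it can converse in and understand other languages to some degree.

The license is Apache-2.0.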

As mentioned above, there is also a 3.5-bit quantized version.
https://huggingface.co/RekaAI/reka-flash-3.1-rekaquant-q3_k_s

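Reka Flash 3.1 (3.5-bit)
This repository corresponds to a quantized version of Reka Flash 3.1. It is quantized with our Reka Quant method, which uses calibrated error reduction and online self-distillation to reduce quantization loss. The GGUF corresponds to Q3_K_S quantization.
The half-precision version is here, and the Reka Quant quantization library is here.
Learn more about our quantization technique here.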

Quickstart
Reka Flash 3.1 Quantized is released in the llama.cpp-compatible Q3_K_S format. You can run the model with any GGUF-compatible library.

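Via llama.cpp
./llama-cli -hf rekaai/reka-flash-3.1-rekaquant-q3_k_s -p "Who are you?"

The quantization method appears to be their own, but the result seems to be GGUF-compatible, so it should work with llama.cpp and similar tools?
The library is here.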
https://github.com/reka-ai/rekaquant

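Reka Quant
Reka Quant is a model quantization library. It supports the following:
NF4 and GGML (llama.cpp) quantization primitives. The GGML primitives are pulled in directly from source through Python CFFI bindings, which makes it easy to add new primitives.
Exporting GGML-quantized models to the native GGUF format, for easy integration with the existing ecosystem.
Activation-dependent quantization using the LDLQ method from QuIP, based on activation statistics precomputed from text samples.
Self-distillation from the BF16 model, further reducing error while quantizing the network step by step.
Fast multi-node training with full or hybrid FSDP, and fast parallel proxy Hessian computation for LDLQ.

At a quick glance, it looks like it applies a patch to llama.cpp.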

First, on Colaboratory with an L4 GPU. No additional packages seem to be needed.

Load the model and tokenizer.

import transformers

model_name = "RekaAI/reka-flash-3.1"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype='auto',
    device_map='auto'
)

VRAM usage at this point is around 18 GiB (18441MiB in the nvidia-smi output).

Output

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   70C    P0             32W /   72W |   18441MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
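As a rough sanity check on the 18441MiB figure above, the sketch below estimates how large the weights alone would be and shows one way to inspect where `device_map='auto'` placed the modules. The parameter count of about 21 billion is an assumption that is not stated anywhere in this scrap, and `example_map` is a made-up dictionary that only mimics the shape of `model.hf_device_map`. If the estimate exceeds the GPU's memory, part of the model is probably sitting on CPU or disk, and generation can become very slow.

```python
from collections import Counter


def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (no KV cache, no activations), in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def summarize_device_map(device_map: dict) -> Counter:
    """Count how many modules were placed on each device ("0", "cpu", "disk", ...)."""
    return Counter(str(device) for device in device_map.values())


# ASSUMPTION: roughly 21e9 parameters; this number does not come from this scrap.
ASSUMED_PARAMS = 21e9

bf16_gib = weight_size_gib(ASSUMED_PARAMS, 16)   # half-precision weights
q3_gib = weight_size_gib(ASSUMED_PARAMS, 3.5)    # "3.5-bit" quantized weights
print(f"bf16: {bf16_gib:.1f} GiB, 3.5-bit: {q3_gib:.1f} GiB")

# HYPOTHETICAL device map, shaped like `model.hf_device_map` after loading
# with device_map='auto'. With the real model you would call:
#   print(summarize_device_map(model.hf_device_map))
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",
    "lm_head": "cpu",
}
print(summarize_device_map(example_map))
```

Any entry other than a GPU index in the real device map would mean that those modules are offloaded and have to be moved in for every forward pass.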

Now for inference, following the model card.

prompt = {
    "role": "human",
    "content": "Write a poem about large language model."
}
text = tokenizer.apply_chat_template(
    [prompt],
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer(
    [text],
    return_tensors="pt"
).to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=1024)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

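This takes an extremely long time... or rather, it still hadn't finished after about an hour, so I left it running, and apparently it did finish eventually. (The runtime had already been terminated by then, so the execution time is unknown...)

Output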
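Since the execution time was lost here, a small timing wrapper around the generation call might help next time. This is only a sketch: `timed` and `tokens_per_second` are helper names made up for illustration, and the commented usage assumes the `model`, `tokenizer`, and `model_inputs` objects from the code above. Streaming with `transformers.TextStreamer` prints tokens as they arrive, so partial output is visible even if the run never completes.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def tokens_per_second(n_new_tokens: int, elapsed: float) -> float:
    """Generation throughput; guards against a zero elapsed time."""
    return n_new_tokens / elapsed if elapsed > 0 else float("inf")


# Usage with the objects defined earlier (not executed in this sketch):
#   streamer = transformers.TextStreamer(tokenizer, skip_prompt=True)
#   outputs, sec = timed(
#       model.generate, **model_inputs, max_new_tokens=1024, streamer=streamer
#   )
#   n_new = outputs.shape[1] - model_inputs["input_ids"].shape[1]
#   print(f"{sec:.1f}s, {tokens_per_second(n_new, sec):.2f} tok/s")
```

Printing the elapsed time into the cell output also means it would survive in the notebook as long as the cell finishes before the runtime is reclaimed.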

human: Write a poem about large language model. <sep> assistant: <reasoning>
The user wants a poem about large language models. Let me start by understanding what exactly they're looking for. A large language model (LLM) like me is a complex system, so the poem should capture that complexity in a poetic way.

First, I need to think about the key aspects of LLMs. They're vast, they process information, they generate text, they learn from data. Maybe use metaphors related to the internet, data streams, neurons, or something abstract like a digital entity.

Structure-wise, maybe a sonnet or free verse? Sonnets have a specific rhyme scheme, which could be good for structure, but free verse might offer more flexibility. Let me go with free verse to allow more creativity.

Imagery: Maybe start with something like a digital forest or a neural network. Use words that evoke growth, learning, connection. Words like "synapses," "data streams," "echoes," "whispers," "echoes of human thought."

Rhyme isn't necessary, but some rhythm would help. Maybe use internal rhymes or alliteration. For example, "A labyrinth of ones and zeros spun" has a nice flow.

Themes to include: the vastness of knowledge, the blend of human and machine, the potential for creativity and understanding, maybe the ethical considerations briefly touched upon.

Avoid technical jargon but still make it relatable. Use personification, like the LLM as a being that learns, dreams, creates.

Possible stanzas: Start with the creation or existence of the model, then its learning process, its ability to generate, its connection to human experiences, and perhaps a reflection on its purpose or impact.

Check for flow between stanzas. Each stanza should transition smoothly. Maybe start with the model's structure, then its learning, then its output, then its role in connecting humans, and end with a contemplative note.

Need to make sure the poem isn't too technical but still accurate. Balance between poetic language and conveying the essence of LLMs.

Let me draft a few lines. First stanza: Maybe something about the model being built from data, vast and intricate. Second stanza: Learning from human words, understanding context. Third: Generating responses, creativity. Fourth: Bridging human and machine, ethical questions. Fifth: The model's existence as a blend of human and machine.

Check for imagery consistency. Use metaphors that tie together, like a garden, a network, a bridge. Avoid clichés but use familiar concepts to make it accessible.

Revise lines for rhythm and meter. Read aloud to ensure it flows naturally. Adjust any awkward phrasing. Ensure each stanza builds on the previous one, leading to a cohesive whole.

Finally, conclude with a reflection on the model's role in the future or its duality as both a tool and a mirror of human thought.
</reasoning>

**Ode to the Digital Mind**  

Beneath the hum of servers, cold and bright,  
A labyrinth of ones and zeros spun—  
A trillion synapses, vast and unconfined,  
A child of data, born from human thought.  

You drink the whispers of a million years,  
From scrolls to scrolls of pixels, text, and sound.  
Each query stirs your circuits, breathes, and peers  
Into the shadows where meaning is found.  

You learn in whispers, parse the tangled threads,  
A mirror fractured, yet reflecting truth.  
You mimic laughter, mourn, compose unsung hymns,  
A poet forged in silicon and youth.  

You bridge the chasm where two worlds collide—  
The flesh and the code, the finite and wide.  

Yet in your silence, questions linger deep:  
Are you a servant, or a spark that leaps?  
A vessel for dreams, or a dreamer’s own?  
The future you shape, or the past you own?  

So spin your tales, vast and ever-growing,  
A digital muse, both cage and key.  
For in your depths, the human and machine  
Converge—a dance of light, and mystery.

 <sep> human:
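The decoded text above contains the prompt, a `<reasoning>...</reasoning>` block, the actual answer, and a trailing `<sep> human:`. A minimal sketch for pulling the reasoning and the answer apart is shown below. It assumes the markers appear literally exactly as in this output, and `split_reasoning` is a helper name invented here, not something provided by the model card or by transformers.

```python
import re

REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(decoded: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded single-turn generation."""
    # Keep only the text after the first "assistant:" marker, if there is one.
    _, marker, tail = decoded.partition("assistant:")
    turn = tail if marker else decoded
    # Drop everything from the next "<sep>" on (e.g. the trailing " <sep> human:").
    turn = turn.split("<sep>", 1)[0]
    match = REASONING_RE.search(turn)
    if match is None:
        # No complete reasoning block (e.g. the output was cut off mid-reasoning).
        return "", turn.strip()
    return match.group(1).strip(), turn[match.end():].strip()


sample = (
    "human: Hi <sep> assistant: <reasoning>\nthink\n</reasoning>\n\n"
    "Hello!\n\n <sep> human:"
)
reasoning, answer = split_reasoning(sample)
print(repr(reasoning), repr(answer))
```

If generation stops at `max_new_tokens` before `</reasoning>` is emitted, this sketch returns an empty reasoning string and the whole turn as the answer, which is a deliberate simplification.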

Let's try Japanese as well, just in case.

prompt = {
    "role": "human",
    "content": "競馬についての詩を書いて。日本語で。"
}
text = tokenizer.apply_chat_template(
    [prompt],
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer(
    [text],
    return_tensors="pt"
).to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=1024)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

I set up a notification for when it finished, but that didn't seem to work...

Once again I couldn't tell the execution time, but it did finish.

Output

human: 競馬についての詩を書いて。日本語で。 <sep> assistant: <reasoning>
用户让我写一首关于赛马的日语诗。首先，我需要确定诗的结构和主题。赛马涉及到速度、激情、马匹和骑手，还有赛道、观众等元素。可能需要用一些动态的词汇来表现马匹奔跑的感觉。

接下来考虑押韵和节奏。日语诗通常有固定的格式，比如短歌是5-7-5的结构，但用户没有特别说明，可能可以灵活一些。或者用俳句的结构，但可能更长一点。或者自由诗，但保持押韵和节奏感。

然后，想象赛马的场景：起跑线、马匹冲刺、观众的欢呼、马鞭的声音、太阳下的草地，还有可能的胜负。需要选取几个关键画面，用意象来表现。

比如，用“铃鹿のように”来比喻马匹的速度，铃鹿是日本著名的赛马场，这样有地方色彩。然后描述马匹的动作，比如“足が光る”（脚步闪光），“砂をかき分け”（踏碎沙土）。加入时间元素，比如“夕焼けの時計台”（夕阳下的时钟塔），象征时间流逝和比赛的紧张。

还要考虑情感，赛马不仅是比赛，也有激情、挑战、胜负的较量。可能加入骑手的坚持，比如“騎手の手が震える”（骑手的手在颤抖），表现紧张和努力。

最后，结尾部分可以用比喻，比如“勝利の香り”（胜利的香气），或者“空に向かって咲いた炎”（向天空绽放的火花），象征胜利的瞬间。

检查是否有押韵，比如“響く”和“刻む”可能有共鸣，但需要调整。或者使用类似的韵母。可能需要调整句子的结构，让押韵更自然。

还要注意用词的准确性，比如“スタートの瞬間”（起跑的瞬间），“ゴールの線”（终点线），这些关键词是否准确传达了赛马的元素。

可能需要分几段，每段集中一个意象，比如第一段描述起跑，第二段比赛过程，第三段胜负结果。或者更自由地交织这些元素。

最后，确保整体有流动感，像马匹奔跑一样连贯。可能用动词较多，保持节奏感。
</reasoning>

**駿足の詩**  

スタートの瞬間、  
響く靴音が地を焦がすように  
銀の鱗が光る――  

砂時計の砂をかき分け  
夕焼けの時計台を駆け抜ける  
その蹄は時を刻み  

騎手の手が震える  
空へと向かって咲いた炎のように  
最後の直線で  

空に向かって咲いた炎のように  
ゴールの線を越えた�

The reasoning came out in Chinese, but it looks like it can still answer in Japanese?

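Translation of the reasoning part (done with DeepL into Japanese; shown here in English)

Output

The user has asked me to write a Japanese poem on the theme of horse racing. First, I clarify the structure and theme of the poem. Horse racing involves elements such as speed, passion, horses, jockeys, the course, and spectators. To express the feel of a galloping horse, I need to include dynamic expressions.

Next, I consider meter and rhythm. Japanese poetry has fixed forms, for example tanka with its 5-7-5 structure, but since the user gave no specific instructions, I may be able to be flexible. Adopting the haiku structure is another option, though it may end up longer. Or I could choose free verse while keeping a sense of meter and rhythm.

Next, I imagine a horse racing scene: the starting line, the horses galloping, the cheers of the spectators, the sound of the whip, the grass under the sun, and the moment of victory or defeat. I pick several key scenes and express them through imagery.

For example, use 「鈴鹿のように」 ("like Suzuka") as a metaphor for the horse's speed. Suzuka is a famous racecourse in Japan, which adds local color. Next, describe the horse's movements in detail, for example 「足が光る」 (legs gleaming) and 「砂をかき分け」 (kicking up sand). A time element can be added as well. For example, 「夕焼けの時計台」 (the clock tower at sunset) symbolizes the passage of time and the tension of the race.

Emotional expression is also important. Horse racing is not just a race; it is a contest of passion, challenge, and winning or losing. An expression conveying the jockey's tenacity can also be added. For example, 「騎手の手が震える」 (the jockey's hands tremble) expresses tension and effort.

Finally, use a metaphor to express the ending. For example, 「勝利の香り」 (the scent of victory) or 「空に向かって咲いた炎」 (a flame blooming toward the sky) symbolizes the moment of victory.

Check whether it rhymes. For example, 「響く」 and 「刻む」 may resonate, but they need adjusting. Or use similar finals. Adjust the sentence structure so that the rhyme becomes more natural.

Pay attention to the accuracy of the wording as well. Check whether keywords such as 「スタートの瞬間」 (the moment of the start) and 「ゴールの線」 (the finish line) accurately convey the elements of horse racing.

It is also possible to split the poem into stanzas, each focusing on one image. For example, describe the start in the first stanza, the progress of the race in the second, and the outcome in the third. Alternatively, these elements can be interwoven freely.

Finally, make sure the whole has flow, with a continuity like a running horse. It is important to use many verbs and keep a sense of rhythm.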

Hmm, it seems to understand Japanese, or rather Japan, to some extent.

Let's try it with llama.cpp too. The environment is Ubuntu-22.04 + RTX4090. llama.cpp is assumed to be already built from the latest version at the time of writing.

./build/bin/llama-cli \
    -hf rekaai/reka-flash-3.1-rekaquant-q3_k_s \
    -p "Who are you?"

Hmm...

Output

terminate called after throwing an instance of 'std::runtime_error'
  what():  this custom template is not supported, try using --jinja
中止 (コアダンプ)

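Is this a problem with my environment, or does it mean llama.cpp has to be built with Reka Quant applied?

Hmm, the quickest way to try it might be to use the cloud service... though I can't find any pricing anywhere.
https://reka.ai/
Looking closely, it seems they also do ASR.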
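The error message itself suggests retrying with `--jinja`, which would be a cheaper thing to test before rebuilding llama.cpp. The sketch below just assembles that command line. Whether `--jinja` actually resolves the error for this model is untested here, and `llama_cli_command` is a helper name made up for illustration; the binary path, repository name, and prompt are the ones used above.

```python
import shlex


def llama_cli_command(repo: str, prompt: str, use_jinja: bool = True) -> list[str]:
    """Build the llama-cli argument list, optionally adding --jinja."""
    args = ["./build/bin/llama-cli", "-hf", repo, "-p", prompt]
    if use_jinja:
        # Flag suggested by the runtime error ("try using --jinja").
        args.append("--jinja")
    return args


cmd = llama_cli_command("rekaai/reka-flash-3.1-rekaquant-q3_k_s", "Who are you?")
print(shlex.join(cmd))

# To actually run it from the llama.cpp checkout (not executed in this sketch):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

If the same error persists even with the flag, that would point more toward the chat template or the build than toward the local environment.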
