
Trying Llama-deus-7b on Google Colab

Published 2023/06/15

What is Llama-deus-7b?

Llama-deus-7b is LLaMA-7B fine-tuned (as a LoRA adapter) on the following datasets:

  • GPTeacher - General Instruct - Code Instruct - Roleplay Instruct
  • My unreleased Roleplay V2 Instruct
  • GPT4-LLM Uncensored + Unnatural Instructions
  • WizardLM Uncensored
  • CamelAI's 20k Biology, 20k Physics, 20k Chemistry, and 50k Math GPT4 Datasets
  • CodeAlpaca

In short, it's a lightweight model that has been fed a huge amount of data.
https://huggingface.co/teknium/llama-deus-7b-v3-lora

Links

Colab
github

Setup

Open Google Colab and, from the menu, choose "Runtime → Change runtime type" and set the hardware accelerator to "GPU".
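To confirm a GPU is actually attached before loading anything, a quick check (a minimal sketch; both commands work in a Colab cell) is:

!nvidia-smi

import torch
print(torch.cuda.is_available())      # True if the GPU runtime is active
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4" or "A100-SXM4-40GB"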

Environment setup

Installation steps:

!pip install git+https://github.com/huggingface/transformers.git
!pip install peft gradio accelerate bitsandbytes
!pip install sentencepiece
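If you want to be sure everything installed cleanly before loading the model, an optional smoke test is:

import transformers, peft, sentencepiece
print(transformers.__version__, peft.__version__)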

Inference

(1) Loading the model

import torch
from peft import PeftModel
import transformers
import gradio as gr
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# May not run without an A100, which requires a Colab Pro (or higher) plan

BASE_MODEL = "decapoda-research/llama-7b-hf"
# BASE_MODEL = "decapoda-research/llama-13b-hf"
# BASE_MODEL = "decapoda-research/llama-30b-hf"
# BASE_MODEL = "decapoda-research/llama-65b-hf"

tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)  # device_map applies to models, not tokenizers

LORA_WEIGHTS = "teknium/llama-deus-7b-v3-lora"
# LORA_WEIGHTS ="kunishou/Japanese-Alpaca-LoRA-13b-v0"
# LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-30b-v0"
# LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-65b-v0"

# Label for the parameter count of the selected base model (not actually used below)
if BASE_MODEL == "decapoda-research/llama-7b-hf":
  model_param = "7B"
elif BASE_MODEL == "decapoda-research/llama-13b-hf":
  model_param = "13B"
elif BASE_MODEL == "decapoda-research/llama-30b-hf":
  model_param = "30B"
else:
  model_param = "65B"

if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

try:
    # The mps backend attribute may not exist on older PyTorch builds
    if torch.backends.mps.is_available():
        device = "mps"
except AttributeError:
    pass

if device == "cuda":
    model = LlamaForCausalLM.from_pretrained(
        BASE_MODEL,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        # device_map="auto",
        device_map={'': 0},
    )
    model = PeftModel.from_pretrained(model, LORA_WEIGHTS, torch_dtype=torch.float16, device_map={'': 0},)
elif device == "mps":
    model = LlamaForCausalLM.from_pretrained(
        BASE_MODEL,
        # device_map={"": device},
        device_map={'': 0},
        torch_dtype=torch.float16,
    )
    model = PeftModel.from_pretrained(
        model,
        LORA_WEIGHTS,
        # device_map={"": device},
        device_map={'': 0},
        torch_dtype=torch.float16,
    )
else:
    model = LlamaForCausalLM.from_pretrained(
        BASE_MODEL,
        # device_map={"": device},
        device_map={'': 0},
        low_cpu_mem_usage=True
    )
    model = PeftModel.from_pretrained(
        model,
        LORA_WEIGHTS,
        # device_map={"": device},
        device_map={'': 0},
    )


def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""


model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)  # PyTorch 2.x: compile the model for faster repeated inference


def evaluate(
    instruction,
    input=None,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    max_new_tokens=256,
    **kwargs,
):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to(device)
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        no_repeat_ngram_size=3,
        **kwargs,
    )

    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Response:")[1].strip()
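The Alpaca-style template in generate_prompt has two branches, but this article only exercises the instruction-only one. For reference, a minimal sketch of calling evaluate with the optional input argument (the texts here are made-up examples):

response = evaluate(
    "Summarize the following text in one sentence.",
    input="LoRA adapters let you adapt a large base model like LLaMA-7B by training only a small number of extra weights.",
)
print(response)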

(2) Running inference

instruction = """
I want you act as a professional scenario writer. Please write 20 minutes presentations about the latest generative AI.

"""

response = evaluate(instruction, max_new_tokens=1024)
print(response)

(3) Inference results
input

### Instruction
I want you act as a professional scenario writer. Please write 20 minutes presentations about the latest generative AI.

output

Title: Generative Artificial Intelligence: The Future of AI

Introduction:
In today's world, AI has become an integral part of our daily lives. From virtual assistants to self-driving cars, the AI revolution has transformed the way we live, work, and interact with each other. However, with the rapid advancement in AI technology, a new AI paradigm has emerged - generative artificial intelligence (GAI). GAI is an AI subfield that focuses on creating intelligent systems that can mimic human intelligence and creativity. In this presentation, we will explore the latest developments in GAI and its potential impact on various industries. 

Key Points:
1. What is GAI?
- Generative artificial Intelligence is a subset of machine learning that uses deep learning techniques to generate human-like text, images, and videos.
- GAI systems are designed to learn from vast amounts of data and use that knowledge to create original works of art, music, literature, and more.
2. The Advantages of GAI:
- Unlike traditional AI, which is limited to specific tasks, GAI can create a wide range of works that are virtually indistinguishable from those created by human beings
- With GAI, we can expect to see a new wave of creativity and innovation, as AI systems start to generate their own content, rather than simply processing pre-existing data.
3. Potential Applications:
a. GAI has the potential to revolutionize the creative industries, such as music, art, and literature, by providing a new platform for generating original content.
b. In the field of marketing and advertising, GIA can be used to create personalized ad campaigns that resonate with the target audience.
c. In healthcare, GA can be applied to generate personalized treatment plans, taking into account the patient's medical history, genetics, and lifestyle factors.
4. Challenges and Limitations:
While GAI holds immense potential, there are still several challenges and limitations that need to be addressed before we can fully harness its power. One of the biggest challenges is the need for large amounts of high-quality data to train GAI models. Additionally, there is a need for better algorithms and techniques to ensure that the generated content is not only accurate, but also culturally and contextually appropriate.
Conclusion:
Generative artificial intelligencia is a game-changer in the world of artificial intelligence. With its potential to generate original content in virtually every industry, the GAI revolution is set to reshape the world as we know it. As researchers and developers continue to push the boundaries of GA technology, it is safe to say that we are on the verge of a new creative and innovative era.

It generates some pretty splendid text, doesn't it? I like it.

My only gripe is that inference is painfully slow; is there anything to be done about that?
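One likely culprit is num_beams=4: beam search does roughly four decoding passes' worth of work per token compared to greedy decoding. A hedged, untested sketch of faster settings, using only parameters that evaluate already exposes:

# Greedy decoding: far fewer forward passes than beam search with num_beams=4
response = evaluate(instruction, num_beams=1, max_new_tokens=256)

# Or sampled decoding; do_sample is forwarded to GenerationConfig via **kwargs
response = evaluate(instruction, num_beams=1, do_sample=True, max_new_tokens=256)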

In closing

This time I tried out Llama-Deus-7b. The output is quite good; among 7B models, it feels like it may be top-tier in quality. The one downside is that inference feels extremely slow.
Maybe my code is just poorly structured, though.

Going forward, I plan to keep posting hands-on articles about LLMs, diffusion models, image analysis, and 3D, so stay tuned.
