
How to set chat_template when using Transformers in LangChain [Troubleshooting Guide]


Introduction

I touched on this topic in a previous article, but I decided to spin it off into a standalone post, as it may be useful to someone as a guide for handling this error.

How to Use Transformers

Basic Usage

First, let's review how Transformers is used on its own.

Below is sample code for running a Hugging Face model with Transformers.

transformers.py
# pip install transformers torch accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name
model_name = "elyza/Llama-3-ELYZA-JP-8B"

# Load the model
llama = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

messages = [
    {"role": "system", "content": "あなたは日本語を話す優秀なアシスタントです。"},
    {"role": "user", "content": "日本で一番高い山は?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(llama.device)

tokens = llama.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=False
)

# Decode the full output (prompt + response)
generated_text = tokenizer.batch_decode(tokens)[0]
print(f"Generated Text:\n{generated_text}")

# Remove the prompt part
# Get the prompt part (decode `input_ids`)
prompt_text = tokenizer.decode(model_inputs["input_ids"][0])
print(f"Prompt Text:\n{prompt_text}")

# Extract only the response part
response = generated_text[len(prompt_text):].strip()
print(f"Response:\n{response}")

The flow is simple:

  • Define the model and tokenizer.
  • Define the system prompt and user prompt in a dictionary format.
  • Use tokenizer.apply_chat_template to embed the prompt into the appropriate chat_template.
  • Tokenize the natural language prompt with the tokenizer.
  • Input it into the model using the llama.generate method to generate text.
  • Decode with the tokenizer to output natural language.
    • Since the input prompt is also output, process it to display only the model's output.
  • Display the output text.

What is a Chat Template (chat_template)?

The important part for this article is this chat_template section.
In terms of code, the following part is relevant:

transformers.py

messages = [
    {"role": "system", "content": "あなたは日本語を話す優秀なアシスタントです。"},
    {"role": "user", "content": "日本で一番高い山は?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

LLM Prerequisite

As a fundamental prerequisite, what an LLM does is "generate the next word given the input."
By repeating this auto-regressively many times, it can generate long texts.

For example, it is easy to output the continuation of a diary.
If you input:

Today I woke up early and,

into the model, it will automatically generate the continuation, such as:

went to the zoo. At the zoo, I saw giraffes and elephants for the first time in a long while. How many years has it been? Their majestic....

This is the basic operation of an LLM (its base model).
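This auto-regressive loop can be sketched in plain Python. This is a toy illustration only, not real model code: `next_token` is a hypothetical stand-in for a single forward pass of the model, backed here by a lookup table instead of a neural network.

```python
def next_token(prompt: str) -> str:
    # Hypothetical stand-in for one forward pass of an LLM:
    # given the text so far, return the most likely next token.
    continuations = {
        "Today I woke up early and,": " went",
        "Today I woke up early and, went": " to",
        "Today I woke up early and, went to": " the",
        "Today I woke up early and, went to the": " zoo.",
    }
    return continuations.get(prompt, "<eos>")

def generate(prompt: str, max_new_tokens: int = 10) -> str:
    # Auto-regressive loop: append the predicted token and feed the
    # extended text back in, until an end token or the length limit.
    text = prompt
    for _ in range(max_new_tokens):
        token = next_token(text)
        if token == "<eos>":
            break
        text += token
    return text

print(generate("Today I woke up early and,"))
# Today I woke up early and, went to the zoo.
```

Real models operate on token IDs and probability distributions rather than strings, but the control flow is the same: predict one step, append, repeat.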

In Chat Models (Instruct Models)

It's basically the same for chat models (Instruct models). However, chat models have both system prompts and user prompts, and the model's reply must be generated as a separate speaker's turn, not as a mere continuation of the user's text.

Therefore, LLMs are trained with a format that explicitly indicates the boundaries of the system prompt, user prompt, and AI's output result. Consequently, during inference, you must input data into the model in the same format used during training.

This rule for grammar and tags used to make these boundaries explicit is called a "chat template."

For example, the chat template for Llama 3 series models looks like this:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

あなたは日本語を話す優秀なアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

日本で一番高い山は?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
日本で一番高い山は、北海道にある「旭岳」で、標高は2,291メートルです。<|eot_id|>

In this example, the system prompt is set to "あなたは日本語を話す優秀なアシスタントです。" (You are an excellent assistant who speaks Japanese.), and the user prompt is "日本で一番高い山は?" (What is the highest mountain in Japan?). Then, the AI's response is "日本で一番高い山は、北海道にある「旭岳」で、標高は2,291メートルです。" (The highest mountain in Japan is "Asahidake" in Hokkaido, with an altitude of 2,291 meters.) (Unfortunately, the answer is incorrect).

Llama 3 is trained on text in this tagged format. During inference, you first feed in the part below; the model emits the next word, and generation continues auto-regressively until the <|eot_id|> token is produced.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

あなたは日本語を話す優秀なアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

日本で一番高い山は?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The method that formats dictionary-style prompts into this chat template is the tokenizer.apply_chat_template method.

transformers.py

messages = [
    {"role": "system", "content": "あなたは日本語を話す優秀なアシスタントです。"},
    {"role": "user", "content": "日本で一番高い山は?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print(text)
# Output of print(text)
#<|begin_of_text|><|start_header_id|>system<|end_header_id|>
#
#あなたは日本語を話す優秀なアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>
#
#日本で一番高い山は?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Where Can You See the Chat Template?

Now, this chat template varies depending on the model.

This is because improvements to the chat template also affect model performance, so it is not uncommon for the template to change between model generations. Early chat templates were simple, Markdown-like formats; the fact that they have since evolved into somewhat more complex tag-based schemes is evidence of this.

However, if that's the case, how does the tokenizer.apply_chat_template method choose a chat template suited to the model and format the prompt appropriately?

This is possible because the chat_template attribute is stored in the tokenizer_config.json file of the Hugging Face model.

For example, let's look at the tokenizer_config.json file of the elyza/Llama-3-ELYZA-JP-8B model used in the sample code above.

It is here:
https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B/blob/main/tokenizer_config.json
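Since tokenizer_config.json is plain JSON, the template is simply a string value under the chat_template key. As a sketch, with a heavily abbreviated stand-in config inlined as a string rather than downloaded from the Hub (real files contain many more keys and the full template), it can be read like any other JSON field:

```python
import json

# Abbreviated stand-in for a model's tokenizer_config.json
config_text = """
{
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|end_of_text|>",
  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}..."
}
"""

config = json.loads(config_text)
print("chat_template" in config)  # True
print(config["bos_token"])        # <|begin_of_text|>
```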

If you look near the bottom of that file, you can see that the chat_template attribute is stored there.

By the way, here is the full text (formatted):

tokenizer_config.json
{% set loop_messages = messages %}
{% for message in loop_messages %}
    {% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}
    {% if loop.index0 == 0 %}
        {% set content = bos_token + content %}
    {% endif %}
    {{ content }}
{% endfor %}
{% if add_generation_prompt %}
    {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{% endif %}

It is written in the syntax of the Jinja2 template engine. Let's go through it from the top.

Chat Template Explanation

Setting Variables

tokenizer_config.json
{% set loop_messages = messages %}

This creates the loop_messages variable from the messages list passed to the tokenizer.

Setting the Loop

tokenizer_config.json
{% for message in loop_messages %}
    {% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}
・・・
{% endfor %}

A loop is set up here: one chat-template segment is created per message, with the message's role in its header. In message['content'] | trim, the pipe applies a filter that strips leading and trailing whitespace from the message content. When this loop completes, the following part of the chat template is finished:

<|start_header_id|>system<|end_header_id|>

あなたは日本語を話す優秀なアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

日本で一番高い山は?<|eot_id|>

What remains is to add the <|begin_of_text|> token marking the start of the template at the front, and the header announcing the AI's output at the end.

Token Indicating the Start of the Template

tokenizer_config.json
    {% if loop.index0 == 0 %}
        {% set content = bos_token + content %}
    {% endif %}

Within the loop, bos_token is prepended to content on the first iteration only (loop.index0 == 0). The bos_token value itself is also defined in tokenizer_config.json.

As a result, the <|begin_of_text|> token is added at the very beginning.

Adding the Token Indicating AI Output

tokenizer_config.json
{% if add_generation_prompt %}
    {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{% endif %}

This part appends a header at the very end, but only when add_generation_prompt is true.

As shown below, this condition is satisfied because add_generation_prompt is set to True in our call:

transformers.py
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

Note the {{ content }} expression just before this block. Variables or strings inside double curly braces {{ }} are emitted in place, so the string accumulated in content is output first, and the '<|start_header_id|>assistant<|end_header_id|>\n\n' header is appended after it.
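Transformers renders these templates with Jinja2, so you can experiment with the template directly using the jinja2 package. Below is a minimal sketch: the template string is the Llama 3 template from above collapsed to one line (which is how it is actually stored in tokenizer_config.json), and bos_token is supplied by hand here rather than read from the config.

```python
from jinja2 import Template

# The Llama 3 chat template from above, collapsed to a single line
chat_template = (
    r"{% set loop_messages = messages %}"
    r"{% for message in loop_messages %}"
    r"{% set content = '<|start_header_id|>' + message['role']"
    r" + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}"
    r"{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    r"{{ content }}"
    r"{% endfor %}"
    r"{% if add_generation_prompt %}"
    r"{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    r"{% endif %}"
)

messages = [
    {"role": "system", "content": "あなたは日本語を話す優秀なアシスタントです。"},
    {"role": "user", "content": "日本で一番高い山は?"},
]

text = Template(chat_template).render(
    messages=messages,
    bos_token="<|begin_of_text|>",  # value taken from tokenizer_config.json
    add_generation_prompt=True,
)
print(text)
```

The printed result matches the output of tokenizer.apply_chat_template shown earlier: bos token, system block, user block, then the assistant header awaiting the model's reply.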

Chat Templates for Other Models

The above is the chat template for the Llama 3 series, but the templates for Llama 3.1 and Llama 3.3 differ slightly from it. (The Llama 3.1 and Llama 3.3 templates are identical to each other.)

The following shows the chat template for the meta-llama/Llama-3.3-70B-Instruct model.

meta-llama/Llama-3.3-70B-Instruct

The Llama 3.3 series has a richer chat template, supporting features added since Llama 3 such as a knowledge-cutoff date, tool calling, JSON output, and code_interpreter.

However, the basic tags that delimit the prompt are the same as in the Llama 3 series. So, as long as you don't use those richer features, the model will in fact respond properly even with the plain Llama 3 chat template.

tokenizer_config.json
{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- set date_string = "26 Jul 2024" %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{# Extract the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{# System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if builtin_tools is defined or tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{%- if builtin_tools is defined %}
    {{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "\n\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{# Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {# Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' }}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>" }}
{%- endif %}

{# Process each message #}
{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- "<|python_tag|>" + tool_call.name + ".call(" }}
            {%- for arg_name, arg_val in tool_call.arguments | items %}
                {{- arg_name + '="' + arg_val + '"' }}
                {%- if not loop.last %}, {% endif %}
            {%- endfor %}
            {{- ")" }}
        {%- else %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- '{"name": "' + tool_call.name + '", ' }}
            {{- '"parameters": ' }}
            {{- tool_call.arguments | tojson }}
            {{- "}" }}
        {%- endif %}
        {%- if builtin_tools is defined %}
            {{- "<|eom_id|>" }}
        {%- else %}
            {{- "<|eot_id|>" }}
        {%- endif %}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}

Models Without a Chat Template Set

So far we have looked at chat templates, but some models do not have chat_template set in tokenizer_config.json.

For example, regarding the nitky/Llama-3.1-SuperSwallow-70B-Instruct-v0.1 model, which is currently ranked top for performance on the Open Japanese LLM Leaderboard, chat_template is not set in tokenizer_config.json.

(This is also mentioned in the README, stating that the chat template should be specified separately.)

For such models, there is a way to manually specify the chat template.

How to Manually Set chat_template in Transformers

For example, by modifying the sample code as follows, you can set the chat_template later and execute tokenizer.apply_chat_template.

Full code for transformers.py
transformers.py
# pip install transformers torch accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name
model_name = "nitky/Llama-3.1-SuperSwallow-70B-Instruct-v0.1"

# Load the model
llama = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

tokenizer.chat_template = """
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- set date_string = "26 Jul 2024" %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if builtin_tools is defined or tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{%- if builtin_tools is defined %}
    {{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "\n\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>" }}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- "<|python_tag|>" + tool_call.name + ".call(" }}
            {%- for arg_name, arg_val in tool_call.arguments | items %}
                {{- arg_name + '="' + arg_val + '"' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- endif %}
            {%- endfor %}
            {{- ")" }}
        {%- else %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- '{"name": "' + tool_call.name + '", ' }}
            {{- '"parameters": ' }}
            {{- tool_call.arguments | tojson }}
            {{- "}" }}
        {%- endif %}
        {%- if builtin_tools is defined %}
            {#- This means we're in ipython mode #}
            {{- "<|eom_id|>" }}
        {%- else %}
            {{- "<|eot_id|>" }}
        {%- endif %}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}"""


messages = [
    {"role": "system", "content": "あなたは日本語を話す優秀なアシスタントです。"},
    {"role": "user", "content": "日本で一番高い山は?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(llama.device)

tokens = llama.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=False
)

# Extract only the response part
generated_text = tokenizer.batch_decode(tokens)[0]
print(f"Generated Text:\n{generated_text}")

# Remove the prompt part
# Get the prompt part (decode `input_ids`)
prompt_text = tokenizer.decode(model_inputs["input_ids"][0])
print(f"Prompt Text:\n{prompt_text}")

# Extract only the response part
response = generated_text[len(prompt_text):].strip()
print(f"Response:\n{response}")

The important part is the following:

transformers.py
tokenizer.chat_template = """
{%- if custom_tools is defined %}
・・・
"""

Since the model used this time, nitky/Llama-3.1-SuperSwallow-70B-Instruct-v0.1, is a Llama 3.1 series model, I used the same chat template as Llama 3.1.

By setting it this way, even if a chat template is not defined in the model, you can define it later.
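In practice, you may want to apply such a manual template only when the model actually lacks one, so that you never overwrite a template the model ships with. Here is a minimal sketch of that guard; the SimpleNamespace object stands in for a real tokenizer (on a Transformers tokenizer the attribute is likewise called chat_template, and is None when unset), and LLAMA31_TEMPLATE is a hypothetical abbreviated template for illustration.

```python
from types import SimpleNamespace

# Hypothetical, heavily abbreviated Llama-3.1-style template
LLAMA31_TEMPLATE = "{{ bos_token }}{% for message in messages %}...{% endfor %}"

def ensure_chat_template(tokenizer, fallback_template: str) -> None:
    # Only fill in the template when the tokenizer doesn't ship one;
    # a template bundled with the model should take precedence.
    if getattr(tokenizer, "chat_template", None) is None:
        tokenizer.chat_template = fallback_template

# Stand-in for a tokenizer whose tokenizer_config.json has no chat_template
tokenizer = SimpleNamespace(chat_template=None)
ensure_chat_template(tokenizer, LLAMA31_TEMPLATE)
print(tokenizer.chat_template == LLAMA31_TEMPLATE)  # True
```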

How to Use Transformers with LangChain

We are almost at the main topic, but let me cover one more piece of groundwork.
Up to this point, I have described Transformers and chat templates.

From here, I will describe how to use Transformers models with LangChain in the first place.

When using Hugging Face models with LangChain, there are two ways: using HuggingFaceEndpoint and using HuggingFacePipeline.
https://note.com/npaka/n/nbe332ad7c9f8

This time, I will use HuggingFacePipeline. With this approach, you can define models in almost the same way as when using the Transformers module directly.

Therefore, I will modify the sample code for LangChain as follows.

Sample code for using transformers with LangChain
transformers_Langchain.py
# pip install transformers torch accelerate langchain langchain_core langchain-community huggingface_hub langchain-huggingface

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Model name
model_name = "elyza/Llama-3-ELYZA-JP-8B"

# Load the model
llama = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

pipe = pipeline(
    "text-generation", model=llama, tokenizer=tokenizer, temperature=1.0, do_sample=True, max_new_tokens=1024
)

pipe = HuggingFacePipeline(pipeline=pipe)
llm = ChatHuggingFace(llm=pipe)


messages = [
    ("system","あなたは日本語を話す優秀なアシスタントです。"),
    ("human", "{user_input}")
]

query = ChatPromptTemplate.from_messages(messages)
output_parser = StrOutputParser()

# To remove the input prompt from the output, use the following line instead
#chain = query | llm.bind(skip_prompt=True) | output_parser
chain = query | llm | output_parser
response = chain.invoke({"user_input":"日本で一番高い山は?"})

print(response)

Details are explained below.
For details on how to use LangChain itself, please refer to the referenced books or my past article.
Below, I will explain the definition parts required for use with LangChain.

Model Definition

transformers_Langchain.py
# Load the model
llama = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

Tokenizer Definition

transformers_Langchain.py
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

Defining the Transformers Pipeline

transformers_Langchain.py
pipe = pipeline(
    "text-generation", model=llama, tokenizer=tokenizer, temperature=1.0, do_sample=True, max_new_tokens=1024
)

Defining the LangChain HuggingFacePipeline
Wrap the pipeline defined in Transformers as a LangChain pipeline.

transformers_Langchain.py
pipe = HuggingFacePipeline(pipeline=pipe)

Defining the LangChain ChatHuggingFace

transformers_Langchain.py
llm = ChatHuggingFace(llm=pipe)

Once configured as described above, you can use it just as you would use regular LangChain.
The great thing about LangChain is that common code can be used regardless of the model.

You can use it as follows:

transformers_Langchain.py
messages = [
    ("system","あなたは日本語を話す優秀なアシスタントです。"),
    ("human", "{user_input}")
]

query = ChatPromptTemplate.from_messages(messages)
output_parser = StrOutputParser()

# To remove the input prompt from the output, use the following line instead
#chain = query | llm.bind(skip_prompt=True) | output_parser
chain = query | llm | output_parser
response = chain.invoke({"user_input":"日本で一番高い山は?"})

print(response)

The Pitfall of Manually Setting chat_template in a Transformers Tokenizer

Now, here is the main topic.
The problem arises when, just as in the plain Transformers case, the chat_template attribute is not defined for the model itself.

Let's consider setting it up as follows, similar to what we did for Transformers.

transformers_Langchain.py

... (Previously same as transformers_Langchain.py) ...

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

# Set the chat template here
tokenizer.chat_template = """
{%- if custom_tools is defined %}
... (omitted) ...
"""

pipe = pipeline(
    "text-generation", model=llama, tokenizer=tokenizer, temperature=1.0, do_sample=True, max_new_tokens=1024
)

pipe = HuggingFacePipeline(pipeline=pipe)
llm = ChatHuggingFace(llm=pipe)

... (Hereafter same as transformers_Langchain.py)

Set up this way, the template should, technically, be configured correctly.
In fact, debugging with the following code shows that it is properly set:

print(tokenizer.chat_template)
print(llm.llm.pipeline.tokenizer.chat_template)

However, if you execute in this state, the following error is displayed.

File "/home/sagemaker-user/LLM_Evaluation_Elyza/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1785, in get_chat_template
    raise ValueError(
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating

This is an error message indicating that the chat template is not defined.
This part was a total rabbit hole.
Even while the error was being raised, running the following would correctly print the chat_template, so I couldn't understand why it wasn't being recognized.

print(tokenizer.chat_template)
print(llm.llm.pipeline.tokenizer.chat_template)

Solution

After being stuck in the rabbit hole for quite a while, I found the following issue.
https://github.com/langchain-ai/langchain/issues/26656

Apparently, when manually setting a template different from the default using tokenizer.chat_template, you need to write it like this:

llm = ChatHuggingFace(llm=pipe, tokenizer=pipe.pipeline.tokenizer)

Even though I had set pipe as the llm attribute in ChatHuggingFace, I never imagined that I also had to set pipe.pipeline.tokenizer in the tokenizer attribute.

By setting it this way and running it, I was able to use the LLM without any issues.

In the RAG chapter of the following book, Hugging Face models are also used with LangChain, and a description similar to the solution above was included there. (It was so subtle that I hadn't realized it...)
Introduction to Large Language Models II: Implementation and Evaluation of Generative LLMs

Summary

I wrote this article about something I got quite stuck on.
I intended it to be an error-handling article, but it feels like it turned into one of my usual explanatory articles.
I hope it helps various people.

Thank you for reading this far.

References

Practical Introduction to RAG and AI Agents with LangChain and LangGraph
Practical Introduction to Building Chat Systems with ChatGPT/LangChain
By using LangChain, you can execute all sorts of models with a unified codebase.
Regarding LangChain, I recommend these books as they will help you do most things.

Introduction to Large Language Models II: Implementation and Evaluation of Generative LLMs
Although it's in the RAG chapter, sample code for using Hugging Face models with LangChain is also included.
I regret overlooking it because it was a very detailed point...

https://note.com/npaka/n/nbe332ad7c9f8

https://note.com/tatsuyashirakawa/n/n0aa9169c99d5

https://huggingface.co/docs/transformers/en/chat_templating

(Book links are Amazon affiliate links)
