Specifying a chat_template when launching a large language model with vLLM

Published 2024-10-15 01:07. Tags: language models, artificial intelligence, natural language processing, large language models, vLLM

Problem description

Start vLLM on Linux:

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0   --model  /model/Baichuan2-7B-Chat --trust-remote-code    --gpu-memory-utilization 0.80

Testing with the following command fails:

curl -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/model/Baichuan2-7B-Chat",
    "messages": [
        {
            "role": "system",
            "content": "你是我的小助理"
        },
        {
            "role": "user",
            "content": "告诉我你是谁"
        }
    ],
    "max_tokens": 512
  }'

The server returns:

{
    "object": "error",
    "message": "Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating",
    "type": "BadRequestError",
    "param": null,
    "code": 400
}

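Problem analysis

The response above shows that the error occurs because no chat template is set: the model's tokenizer does not define one, and none was passed when the server was started.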
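
Before starting the server, it can help to check whether the model directory already ships a template. The following is a minimal sketch, assuming the template would be stored in the `chat_template` field of `tokenizer_config.json` (the field used in Option 1 of this article); the helper name `has_chat_template` is hypothetical, and other storage locations are not checked.

```python
import json
from pathlib import Path


def has_chat_template(model_dir: str) -> bool:
    """Return True if tokenizer_config.json in model_dir has a non-empty chat_template field."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        # No tokenizer config at all, so no template can be found here.
        return False
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A missing key, null, or empty string all count as "not set".
    return bool(config.get("chat_template"))
```

For the model in this article the call would be `has_chat_template("/model/Baichuan2-7B-Chat")`; a `False` result is consistent with the error shown above.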

Where can the chat template content come from? I took it from https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja and confirmed in testing that it works.
Its content is as follows:

{%- if messages[0]['role'] == 'system' -%}
    {%- set system_message = messages[0]['content'] -%}
    {%- set messages = messages[1:] -%}
{%- else -%}
    {% set system_message = '' -%}
{%- endif -%}

{{ bos_token + system_message }}
{%- for message in messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif -%}

    {%- if message['role'] == 'user' -%}
        {{ 'USER: ' + message['content'] + '\n' }}
    {%- elif message['role'] == 'assistant' -%}
        {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }}
    {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {{ 'ASSISTANT:' }}
{% endif %}
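
To see what prompt string this template builds, its logic can be re-expressed in plain Python. This is a hand-written sketch for illustration only, not the code vLLM runs: the function name `render_prompt` and the default `bos_token`/`eos_token` values are placeholder assumptions, and whitespace after the final `ASSISTANT:` marker may differ from the real Jinja rendering.

```python
from typing import Dict, List


def render_prompt(
    messages: List[Dict[str, str]],
    bos_token: str = "<s>",   # placeholder; the real value comes from the tokenizer
    eos_token: str = "</s>",  # placeholder; the real value comes from the tokenizer
    add_generation_prompt: bool = True,
) -> str:
    """Mirror the template: optional system message, then alternating USER/ASSISTANT turns."""
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        messages = messages[1:]
    else:
        system_message = ""

    parts = [bos_token + system_message]
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            parts.append("USER: " + message["content"] + "\n")
        elif message["role"] == "assistant":
            parts.append("ASSISTANT: " + message["content"] + eos_token + "\n")

    if add_generation_prompt:
        # Cue the model to produce the next assistant turn.
        parts.append("ASSISTANT:")
    return "".join(parts)
```

Note that the template places the system message directly after `bos_token` with no separator before the first `USER:` marker.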

There are three ways to fix this, described one by one below.

Solution

Option 1: add a chat_template field to the model's tokenizer_config.json

{
.....
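# Annotation only (JSON has no comment syntax, so do not copy this line): leave the existing content unchanged and add a new chat_template field to the file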
"chat_template":"{%- if messages[0]['role'] == 'system' -%}    {%- set system_message = messages[0]['content'] -%}    {%- set messages = messages[1:] -%}{%- else -%}    {% set system_message = '' -%}{%- endif -%}{{ bos_token + system_message }}{%- for message in messages -%}    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}    {%- endif -%}    {%- if message['role'] == 'user' -%}        {{ 'USER: ' + message['content'] + '\n' }}    {%- elif message['role'] == 'assistant' -%}        {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }}    {%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}    {{ 'ASSISTANT:' }} {% endif %}"
}
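
Editing the JSON by hand means escaping the template's quotes and newlines correctly. As an alternative, the field can be written with a short script; this is a sketch assuming the template has been saved to a local file, and the helper name `set_chat_template` is hypothetical. It rewrites the whole config file with 2-space indentation, so keeping a backup first is advisable.

```python
import json
from pathlib import Path


def set_chat_template(config_path: str, template_path: str) -> None:
    """Add or replace the chat_template field in a tokenizer_config.json file."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Store the template text as-is; json.dumps handles all escaping.
    config["chat_template"] = Path(template_path).read_text(encoding="utf-8")
    config_file.write_text(
        json.dumps(config, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
```

A call might look like `set_chat_template("/model/Baichuan2-7B-Chat/tokenizer_config.json", "./template_llava.jinja")`.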

Option 2: pass the full chat template content when starting vLLM (--chat_template)

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0   --model  /model/Baichuan2-7B-Chat --trust-remote-code    --gpu-memory-utilization 0.9  --chat_template "{%- if messages[0]['role'] == 'system' -%}    {%- set system_message = messages[0]['content'] -%}    {%- set messages = messages[1:] -%}{%- else -%}    {% set system_message = '' -%}{%- endif -%}{{ bos_token + system_message }}{%- for message in messages -%}    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}    {%- endif -%}    {%- if message['role'] == 'user' -%}        {{ 'USER: ' + message['content'] + '\n' }}    {%- elif message['role'] == 'assistant' -%}        {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }}    {%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}    {{ 'ASSISTANT:' }} {% endif %}"
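
Quoting a long template by hand on the command line is error-prone. The sketch below builds a comparable command string with `shlex`, which takes care of shell quoting; the helper names `single_line` and `build_command` are hypothetical, and joining lines this way only approximates the hand-made single-line form above. The article's `--chat_template` spelling is kept here; vLLM's documentation writes the flag as `--chat-template`.

```python
import shlex


def single_line(template: str) -> str:
    """Join the template's lines into one line, keeping indentation spaces."""
    return "".join(template.splitlines())


def build_command(model: str, template: str) -> str:
    """Return a shell-safe command line that passes the template inline."""
    args = [
        "python3", "-m", "vllm.entrypoints.openai.api_server",
        "--host", "0.0.0.0",
        "--model", model,
        "--trust-remote-code",
        "--chat_template", single_line(template),
    ]
    # shlex.join quotes each argument so the shell passes it through unchanged.
    return shlex.join(args)
```

Printing `build_command("/model/Baichuan2-7B-Chat", open("template_llava.jinja").read())` gives a line that can be pasted into a shell.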

Option 3: pass the path of the chat template file when starting vLLM (--chat_template)

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0   --model  /model/Baichuan2-7B-Chat --trust-remote-code    --gpu-memory-utilization 0.9  --chat_template ./template_llava.jinja

Testing

Test command

curl -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/model/Baichuan2-7B-Chat",
    "messages": [
        {
            "role": "system",
            "content": "你是我的小助理"
        },
        {
            "role": "user",
            "content": "告诉我你是谁"
        }
    ],
    "max_tokens": 512
  }'

The server now returns:

{"id":"chat-15c280f5f54e4128abaeec95daf32e39","object":"chat.completion","created":1728906010,"model":"/model/Baichuan2-7B-Chat","choices":[{"index":0,"message":{"role":"assistant","content":"我是一个聊天机器人，USER，可以帮助你解决问题、提供建议、回答问题等。请随时向我提问，我会尽力帮助你。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":41,"completion_tokens":26}}
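
The same test can be scripted in Python with only the standard library. This is a minimal sketch mirroring the curl command above; the function names `build_chat_request` and `chat` are hypothetical, and `chat` assumes the server is reachable and answers in the response format shown above.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str,
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: int = 512,
) -> urllib.request.Request:
    """Build the POST request for the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, messages: List[Dict[str, str]]) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, messages)
    # Error responses such as the 400 shown earlier raise urllib.error.HTTPError.
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from this article running, `chat("http://127.0.0.1:8000", "/model/Baichuan2-7B-Chat", messages)` would send the same system and user messages as the curl test.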

References

vLLM quickstart (quickstart.html)

https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja

Original article: https://blog.csdn.net/yuanlulu/article/details/142929234