NPU streaming output: torch_npu and transformers, multi-threaded Streamer, Ascend 910B, EE1001

Published 2024-04-18 15:25. Tags: Artificial Intelligence, NPU, Ascend

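Background

torch_npu does not automatically call set_device in newly created threads, so each thread that touches the NPU has to set the device itself.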
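
One way to picture this is to treat the device context as thread-local state. The sketch below is only an analogy and runs without any NPU: `set_device` and `current_device` are hypothetical stand-ins written for this example, not torch_npu functions, and nothing here claims to reflect how torch_npu is implemented internally.

```python
import threading

# Stand-in for a per-thread device context (an assumption for illustration)
_context = threading.local()


def set_device(name):
    """Hypothetical stand-in: bind the calling thread to a device."""
    _context.device = name


def current_device():
    """Return the device bound to the calling thread, or None if unset."""
    return getattr(_context, "device", None)


# The main thread sets its device, as the script in this post does
set_device("npu:3")
seen = {}


def worker_without_set_device():
    # This thread never called set_device, so it sees no context
    seen["without"] = current_device()


def worker_with_set_device():
    # This thread sets the device first, so the context is available
    set_device("npu:3")
    seen["with"] = current_device()


for fn in (worker_without_set_device, worker_with_set_device):
    t = threading.Thread(target=fn)
    t.start()
    t.join()

seen["main"] = current_device()
print(seen)
```

The worker that skips `set_device` sees no device even though the main thread set one, which matches the "context is empty" message in the errors shown in this post.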

Error details

Using the transformers TextIteratorStreamer directly for streaming inference raises the following error:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/root/anaconda3/envs/AI/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/AI/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/AI/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/AI/lib/python3.9/site-packages/transformers/generation/utils.py", line 1403, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
RuntimeError: getDevice:torch_npu/csrc/aten/common/CopyKernel.cpp:41 NPU error, error code is 107002
[Error]: The context is empty.
        Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4686]
        The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

After setting generation_config, the error changes to:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/root/anaconda3/envs/sakura/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/sakura/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1411, in generate
    streamer.put(input_ids.cpu())
RuntimeError: getDevice:torch_npu/csrc/aten/common/CopyKernel.cpp:41 NPU error, error code is 107002
[Error]: The context is empty.
        Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4686]
        The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

I asked the transformers maintainers about this (issue 23042), but they were unable to resolve it.

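After more debugging, I found that in the Thread class in threading.py, the arguments stored in self._kwargs appeared not to have been passed through successfully once run had executed.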
I then asked the torch_npu maintainers.
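
To separate plain threading behaviour from device-context behaviour, a quick check that needs neither torch_npu nor transformers can help. The sketch below uses a hypothetical `target` function and placeholder values to confirm that a standard `threading.Thread` forwards `kwargs` to its target. In CPython, `Thread.run` deletes `self._target`, `self._args` and `self._kwargs` once the target returns, so finding them gone in a debugger after `run` does not by itself mean the arguments were lost.

```python
import threading

received = {}


def target(**kwargs):
    # Record whatever keyword arguments the thread hands over
    received.update(kwargs)


# Placeholder values; a real call would pass tensors and a streamer object
t = threading.Thread(
    target=target,
    kwargs={"max_new_tokens": 512, "streamer": "stub"},
)
t.start()
t.join()

print(received)
# Private attribute, inspected only for illustration; may vary by interpreter
print(hasattr(t, "_kwargs"))
```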

Solution

Call set_device inside the function that is passed as the Thread target.
The complete code is below, using chatglm3-6b as an example.

import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu  # redirects CUDA-style torch calls to the NPU

torch_device = "npu:3"
# Bind the main thread to the target NPU
torch.npu.set_device(torch.device(torch_device))
torch.npu.set_compile_mode(jit_compile=False)
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)

from threading import Thread
from transformers import AutoModel, AutoTokenizer, TextIteratorStreamer
 
model_path = "/root/.cache/modelscope/hub/ZhipuAI/chatglm3-6b" 
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map=torch_device)
model = model.eval()
 

def generate_with_npu_device(**generation_kwargs):
    # A new thread has no NPU context yet, so set the device before generating
    torch.npu.set_device(torch.device(torch_device))
    model.generate(**generation_kwargs)


if __name__ == "__main__":
    # Streaming with TextIteratorStreamer
    streamer = TextIteratorStreamer(tokenizer)
    while True:
        query = input("\nUser: ")
        if query.strip() == "stop":
            break
        inputs = tokenizer([query], return_tensors="pt")
        input_ids = inputs["input_ids"].to(torch_device)
        attention_mask = inputs["attention_mask"].to(torch_device)
        generation_kwargs = dict(
            input_ids=input_ids,
            attention_mask=attention_mask,
            streamer=streamer,
            max_new_tokens=512,
        )
        # Run generate in a worker thread that sets the NPU device itself
        thread = Thread(target=generate_with_npu_device, kwargs=generation_kwargs)
        thread.start()
        generated_text = ""
        # Print each chunk as soon as the streamer yields it
        for new_text in streamer:
            generated_text += new_text
            print(new_text, end='', flush=True)
        # Wait for the worker thread to finish before the next turn
        thread.join()
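
One thing to watch with this pattern: if the worker thread raises (as in the tracebacks earlier), the consuming loop may wait indefinitely, because nothing signals the end of the stream. The sketch below shows one generic way to combine per-thread initialisation with error propagation, using only the standard library. `stream_from_thread`, `init_thread` and `produce` are hypothetical names invented for this example; it is not a drop-in replacement for TextIteratorStreamer, and in real use `init_thread` would be something like the set_device call above.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def stream_from_thread(init_thread, produce):
    """Run produce(emit) in a worker thread after init_thread(); yield emitted items."""
    q = queue.Queue()

    def worker():
        try:
            init_thread()   # per-thread setup, e.g. setting the device
            produce(q.put)  # produce calls emit(item) for each chunk
        except Exception as exc:
            q.put(exc)      # hand the error to the consumer
        finally:
            q.put(_SENTINEL)

    t = threading.Thread(target=worker)
    t.start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        if isinstance(item, Exception):
            t.join()
            raise item      # re-raise in the consuming thread
        yield item
    t.join()


if __name__ == "__main__":
    def fake_generate(emit):
        # Stand-in for model.generate pushing text chunks
        for chunk in ("Hello", ", ", "world"):
            emit(chunk)

    for text in stream_from_thread(lambda: None, fake_generate):
        print(text, end="", flush=True)
    print()
```

With TextIteratorStreamer itself, passing its `timeout` argument is another way to keep the consuming loop from blocking forever.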

Original article: https://blog.csdn.net/weixin_46398647/article/details/137772338
