How do I increase max_new_tokens

🕗 发布于 2024-07-24 18:46 ai huggingface transformer langchain tokenizer

题意：怎样增加 max_new_tokens 的值

问题背景：

I'm facing this error while running my code: 运行代码时遇到如下错误：

ValueError: Input length of input_ids is 1495, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

ValueError：input_ids 的输入长度为 1495，但 max_length 被设置为 20。这可能导致意外的行为。你应该考虑增加 max_length 的值，或者更好的是，设置 max_new_tokens。

I wanted the code to generate the query instead it says about the max length issue as basically I am using 8 bit quantized llama using vector embedding to develop a RAG chat bot

我希望代码能够生成查询，但它却提示了最大长度问题。实际上，我正在使用8位量化的llama模型，并利用向量嵌入来开发一个基于检索的增强型生成（RAG）聊天机器人

import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from transformers import BitsAndBytesConfig, AutoTokenizer
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from time import time
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Load PDF and split into chunks
def split_doc(file, chunk_size, chunk_overlap):
    text_splitter = CharacterTextSplitter(
        separator="\n\n",
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len
    )
    return text_splitter.split_documents(file)

loader = PyPDFLoader("/kaggle/input/report/report.pdf")
pages = loader.load()
docs = split_doc(pages, 700, 450)

# Configure and load the model
model_name = "NousResearch/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task="text-generation",
    model_kwargs={"trust_remote_code": True, "quantization_config": bnb_config}
)
chat_model = ChatHuggingFace(llm=llm)

# Set up embeddings and vector store
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embedding_model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name, model_kwargs=embedding_model_kwargs)
vectordb = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever()

# Set up the QA system
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

# Define the testing function
def test_rag(qa, query):
    print(f"Query: {query}\n")
    result = qa.run(query)
    print("\nResult: ", result)

query = "What were the main topics in this "
test_rag(qa, query)# get stuck here

问题解决：

thanks it helped i added the following details: using the pipeline_kwargs in huggingface.py file i was able to find the variable i could use although using this method will render the quantization method a bit useless as you will consume more memory upon increasing the tokens

“谢谢，这很有帮助。我添加了以下细节：通过在huggingface.py文件中使用pipeline_kwargs，我能够找到我可以使用的变量。然而，使用这种方法会使量化方法变得有点无用，因为当你增加tokens时，会消耗更多的内存。

llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task="text-generation",
    model_kwargs={
        "trust_remote_code": True,
        "quantization_config": bnb_config,
        "use_auth_token": auth_token
    },
    pipeline_kwargs={"max_new_tokens": 8096}# this part is how i reconfigured the tokens

)

原文地址：https://blog.csdn.net/suiusoar/article/details/140667943

免责声明：本站文章内容转载自网络资源，如本站内容侵犯了原著者的合法权益，可联系本站删除。更多内容请关注自学内容网（zxcms.com）！

Web性能优化：从基础到高级
然而，要充分发挥性能优化的潜力，还需要持续监测和逐步优化，确保每一步都符合用户体验的要求。企业级应用通常包含复杂的业务逻辑和大量的数据交互，通过优化 CSS 和 JavaScript，避免阻塞渲染，可
阅读更多2024-11-15
HTML5+CSS前端开发【保姆级教学】＋前端介绍和软件安装
前端开发主要涉及网站和 App，用户能够从 App 屏幕或浏览器上看到东西。能够从 App 屏幕和浏览器上看到的东西都属于前端。文章适合计算机小白，大佬请绕行！
阅读更多2024-11-15
群控系统服务端开发模式-应用开发-前端角色功能开发
群控系统服务端开发模式-应用开发-前端角色功能开发
阅读更多2024-11-15
自定义反序列化过程
需求：student对象中name属性，序列化时将该属性映射为stuname，反序列化时将 Json中的NAME键值对映射到name属性中。
阅读更多2024-11-15
界面控件DevExpress WPF中文教程：TreeList视图及创建分配视图
本文主要介绍DevExpress WPF数据网格组件的TreeList视图及如何创建和分配视图教程，欢迎下载最新版组件体验！
阅读更多2024-11-15
微波无源器件 OMT1 一种用于倍频程接收机前端的十字转门四脊正交模耦合器(24-51GHz)
我们报道了一种用于天文学射电望远镜的毫米波波长接收机的一种十字转门四脊OMT的设计，制造和实测结果。此四脊OMT被直接兼容到一个四脊馈电喇叭来实现可以拓展矩形波导单模带宽的双极化低噪声接收机。使用了2
阅读更多2024-11-15
实战：深入探讨 MySQL 和 SQL Server 全文索引的使用及其弊端
MySQL 中的全文索引自 5.6 版本开始支持InnoDB引擎（在此之前，仅支持MyISAM引擎）。全文索引主要适用于CHARVARCHAR和TEXT类型字段，并提供了的查询方式，可以选择不同的查询
阅读更多2024-11-15
前端 - 使用uniapp+vue搭建前端项目（app端）
前端 - 使用uniapp+vue搭建前端项目（app端）
阅读更多2024-11-15
NFS存储基础操作
NFS 挂载主机在网络断开后卡住通常是由于默认的 NFS 挂载选项导致的。为了避免这种情况，可以使用特定的挂载选项来确保在 NFS 服务器不可用时主机不会卡住。在windows 启用和关闭Window
阅读更多2024-11-15
SpringCloud OpenFeign负载均衡远程调用跨服务调用连接池优化
Spring Cloud OpenFeign 是 Spring Cloud 的一部分，提供了一种声明式的 HTTP 客户端方式来简化服务间的通信。通过 OpenFeign，开发者可以像调用本地方法一样
阅读更多2024-11-15

How do I increase max_new_tokens

问题背景：

问题解决：

相关文章