How do I increase max_new_tokens?
Background:
I'm getting this error when running my code:
ValueError: Input length of input_ids is 1495, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
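The numbers in the message follow from how generation is bounded: `max_length` counts prompt tokens plus generated tokens and defaults to 20, while `max_new_tokens` counts generated tokens only. The sketch below imitates that check in plain Python. The function name `resolve_max_length` is hypothetical, and this is a simplified model of the behavior, not the library's own code.

```python
from typing import Optional


def resolve_max_length(prompt_len: int,
                       max_length: int = 20,
                       max_new_tokens: Optional[int] = None) -> int:
    """Return the total token limit (prompt + generated) for one generation call.

    Simplified model: when max_new_tokens is given it takes precedence,
    and the limit becomes prompt_len + max_new_tokens.
    """
    if max_new_tokens is not None:
        return prompt_len + max_new_tokens
    if prompt_len >= max_length:
        # The prompt alone already exceeds the limit, so nothing can be generated.
        raise ValueError(
            f"Input length of input_ids is {prompt_len}, "
            f"but max_length is set to {max_length}."
        )
    return max_length


# A 1495-token prompt with 256 new tokens allowed gives a total limit of 1751.
print(resolve_max_length(1495, max_new_tokens=256))
```

Calling `resolve_max_length(1495)` with the defaults raises the `ValueError`, which mirrors the situation in the question: the prompt is far longer than the default limit of 20.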
I expected the chain to answer the query, but it fails with the max-length error instead. I'm using an 8-bit quantized Llama model with vector embeddings to build a RAG chatbot:
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from transformers import BitsAndBytesConfig, AutoTokenizer
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from time import time
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
# Load PDF and split into chunks
def split_doc(file, chunk_size, chunk_overlap):
    text_splitter = CharacterTextSplitter(
        separator="\n\n",
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len
    )
    return text_splitter.split_documents(file)
loader = PyPDFLoader("/kaggle/input/report/report.pdf")
pages = loader.load()
docs = split_doc(pages, 700, 450)
# Configure and load the model
model_name = "NousResearch/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task="text-generation",
    model_kwargs={"trust_remote_code": True, "quantization_config": bnb_config}
)
chat_model = ChatHuggingFace(llm=llm)
# Set up embeddings and vector store
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embedding_model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name, model_kwargs=embedding_model_kwargs)
vectordb = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever()
# Set up the QA system
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
# Define the testing function
def test_rag(qa, query):
    print(f"Query: {query}\n")
    result = qa.run(query)
    print("\nResult: ", result)

query = "What were the main topics in this "
test_rag(qa, query)  # gets stuck here: the ValueError above is raised
Solution:
Thanks, that helped. Looking at how pipeline_kwargs is handled in the huggingface.py file, I found the parameter I could set: max_new_tokens. Be aware that this partly undoes the benefit of quantization, because the more tokens you allow the model to generate, the more memory it consumes. The updated call is shown below.
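To put a rough number on that memory cost: 8-bit quantization shrinks the model weights, but the key/value cache that grows with every token is typically still stored in 16-bit. The estimate below assumes the Llama-2-7B shape (32 layers, hidden size 4096), 2 bytes per cached value, and batch size 1. The function name `kv_cache_bytes` is hypothetical, and real usage also includes activations and framework overhead, so treat this as a lower bound.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,
                   hidden_size: int = 4096,
                   bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for one sequence.

    Each layer stores one key vector and one value vector (hence the
    factor 2) of hidden_size values for every token in the sequence.
    """
    return 2 * num_layers * hidden_size * bytes_per_value * num_tokens


per_token = kv_cache_bytes(1)         # 524288 bytes = 0.5 MiB per token
full_context = kv_cache_bytes(4096)   # 2147483648 bytes = 2 GiB at 4096 tokens
print(per_token / 2**20, "MiB per token")
print(full_context / 2**30, "GiB for a 4096-token sequence")
```

Under these assumptions a sequence that fills a 4096-token context adds about 2 GiB on top of the quantized weights.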
llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task="text-generation",
    model_kwargs={
        "trust_remote_code": True,
        "quantization_config": bnb_config,
        "use_auth_token": auth_token  # auth_token: your Hugging Face access token, defined elsewhere
    },
    pipeline_kwargs={"max_new_tokens": 8096}  # this is the part that raises the token limit
)
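A value as large as 8096 is more than the model can actually use: Llama 2 has a 4096-token context window, and the prompt already occupies part of it. Below is a minimal sketch for choosing a value, assuming the prompt length is known (1495 tokens in the error above). `pick_max_new_tokens` is a hypothetical helper, not part of LangChain or transformers, and the default of 512 desired tokens is an arbitrary illustration.

```python
def pick_max_new_tokens(prompt_len: int,
                        desired: int = 512,
                        context_window: int = 4096) -> int:
    """Cap the requested number of new tokens at what still fits in the context."""
    remaining = context_window - prompt_len
    if remaining <= 0:
        raise ValueError("The prompt alone fills the context window.")
    return min(desired, remaining)


print(pick_max_new_tokens(1495))                # 512
print(pick_max_new_tokens(1495, desired=8096))  # 2601
```

The result can then be passed as `pipeline_kwargs={"max_new_tokens": ...}` in the call above. A smaller value also keeps the memory growth described in the answer in check.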
Original article: https://blog.csdn.net/suiusoar/article/details/140667943