Creating Recipes from Images with YOLOv8 Object Detection and chef-transformer (T5)

🕗 Published 2024-09-26 20:10 | YOLO, object detection, transformer, YOLOv8, artificial intelligence

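Introduction

In this article, I show how to train my YOLOv8 model on open-source produce (fruit and vegetable) data obtained from Roboflow, and then integrate it with the chef-transformer (T5) model from Hugging Face. The main goal of the application is to pass the detected objects to the language model as parameters, linking NLP and CV.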

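YOLOv8 Object Detection

YOLOv8 is a major update that Ultralytics open-sourced on January 10, 2023. As the successor to YOLOv5, it supports image classification, object detection, and instance segmentation, and it was eagerly anticipated by users before its release. As a SOTA (state-of-the-art) model, YOLOv8 builds on the earlier YOLO versions and introduces several changes intended to improve performance and flexibility: a new backbone network, an anchor-free detection head, and a new loss function. It can run on hardware ranging from CPUs to GPUs.

Ultralytics did not name the open-source library YOLOv8 but chose the name "ultralytics" instead. The reason is that the company positions the library as an algorithm framework rather than a single algorithm. Its main feature is extensibility: beyond the YOLO series, it also aims to support non-YOLO models and a range of tasks such as classification, segmentation, and pose estimation.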

Environment Setup

conda create -n yolov8 python=3.8
conda activate yolov8
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install ultralytics

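Getting YOLOv8 Training Data from Roboflow

Roboflow is a resource-rich platform with more than 200 million images and over 200,000 datasets, which gives a wide choice of training data. It met my need for fruit and vegetable images to train my model. In addition, Roboflow provides the Health Check module, which allows an in-depth analysis of a dataset before training. This helps to understand the data better and to pick the most suitable dataset.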
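The article does not show the training step itself. The following is a minimal sketch of what it might look like with the ultralytics Python API, assuming the Roboflow dataset was exported in YOLOv8 format and unpacked locally. The function name train_produce_detector, the yolov8n.pt starting checkpoint, and the epoch and image-size values are illustrative assumptions, not the author's settings.

```python
def train_produce_detector(data_yaml, epochs=50, imgsz=640, base_weights="yolov8n.pt"):
    """Fine-tune a pretrained YOLOv8 checkpoint on a YOLO-format dataset.

    All default values are placeholders (assumptions), not the author's settings.
    """
    # Imported lazily so this function can be defined without ultralytics installed.
    from ultralytics import YOLO

    # Start from pretrained weights instead of training from scratch.
    model = YOLO(base_weights)
    # data_yaml is the dataset description file included in a YOLOv8-format export.
    model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
    return model
```

Calling train_produce_detector with the path to the dataset's data.yaml starts a training run. ultralytics typically writes the best checkpoint as best.pt under the run's weights directory, and a file of that kind is what the inference code later loads.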

Importing Libraries and Defining Requirements

import os
import streamlit as st
from ultralytics import YOLO
from IPython.display import display, Image
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

Model Inference

Run the model on a test image and store the detected object classes in a variable.

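# Load the custom-trained weights
model = YOLO('C:\\Users\\batuh\\Desktop\\last_pr_best.pt')
# Run inference; detections below 0.50 confidence are dropped, and save=True writes the annotated image to disk
results = model.predict(source='test.jpeg', conf=0.50, save=True)
# Collect each detected class name once
unique_names = set()
names = model.names  # maps class index -> class name
for r in results:
    for c in r.boxes.cls:
        unique_names.add(names[int(c)])

print(unique_names)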

This returns the unique class names, which are passed to the language model in the following form:

{'Potato', 'Garlic', 'Onion', 'Tomato', 'Green Chili'}
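Because a Python set has no defined order, joining it directly can produce the ingredients in a different order from one run to the next. A minimal sketch of a helper that sorts the names first, so the same detections always yield the same input string. The name build_ingredient_prompt is hypothetical and not part of the original code.

```python
def build_ingredient_prompt(class_names):
    """Join detected class names into one comma-separated string.

    Sorting makes the result independent of set iteration order.
    """
    return ", ".join(sorted(class_names))


detected = {'Potato', 'Garlic', 'Onion', 'Tomato', 'Green Chili'}
print(build_ingredient_prompt(detected))
# Garlic, Green Chili, Onion, Potato, Tomato
```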

Language Model

pip install transformers langchain huggingface_hub

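Using the Detected Objects in the Language Model

I use the LangChain library to call a language model trained on more than 2 million recipes. The objects detected by the YOLO model are passed to it as text input in order to generate a recipe.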

# HuggingFaceHub reads the API token from the HUGGINGFACEHUB_API_TOKEN environment variable
template = """Question: {question}: """
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt,
                     llm=HuggingFaceHub(repo_id="flax-community/t5-recipe-generation",
                                        model_kwargs={"temperature": 0.3, "max_length": 512}))

# Join the detected class names into one comma-separated input string
question = ', '.join(unique_names)
recipe = llm_chain.run(question)

The result is a decent samosa recipe:

'title: samosas ingredients: 1 large potato 1 tbsp minced garlic 1 tbsp minced onion 1 tbsp minced tomato 1 tbsp minced green chili answer the call 1 tbsp do cook. directions: cut the potato into small cubes. add the minced garlic, minced onion, minced tomato, and minced green chili. mix well. heat a griddle or frying pan over medium heat. place the samosas on the griddle and cook until golden brown on both sides.'
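The generated text is a single string with the labels title:, ingredients:, and directions: marking its sections. A minimal sketch of one way to split it into those parts, assuming the output always uses these three lowercase labels as in the sample above. The helper name parse_recipe is hypothetical, and this is not the parsing used in the original application.

```python
import re


def parse_recipe(text):
    """Split 'title: ... ingredients: ... directions: ...' into its three parts.

    Sections that are missing come back as empty strings instead of raising.
    """
    sections = {"title": "", "ingredients": "", "directions": ""}
    # Split on the section labels; the capture group keeps each label in the result.
    parts = re.split(r"\b(title|ingredients|directions):", text)
    # parts has the form [prefix, label, body, label, body, ...]
    for label, body in zip(parts[1::2], parts[2::2]):
        sections[label] = body.strip()
    return sections


sample = ("title: samosas ingredients: 1 large potato 1 tbsp minced garlic "
          "directions: cut the potato into small cubes. mix well.")
parsed = parse_recipe(sample)
print(parsed["title"])
# samosas
# Break the directions into individual steps at sentence boundaries.
steps = [s.strip() for s in parsed["directions"].split(".") if s.strip()]
print(steps)
# ['cut the potato into small cubes', 'mix well']
```

Returning empty strings for missing sections means the caller does not hit an IndexError when the model omits a label.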

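Creating the Interface with Streamlit

I built an interactive interface with Streamlit, an open-source Python library that makes it easy to create and share web applications for machine learning and data science projects.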

import os
import streamlit as st
from ultralytics import YOLO
from IPython.display import display, Image
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

model_path = 'C:\\Users\\batuh\\Desktop\\last_pr_best.pt'


st.set_page_config(
    page_title="Recipe Generator",  
    page_icon="🍳",     
    layout="wide",      
    initial_sidebar_state="expanded"    
)

st.title("Recipe Generator :female-cook:")
st.subheader("Upload an image to generate a recipe for the detected objects.")
st.markdown(
        f"""
        <style>
            body {{
                background-color: darkgreen;
            }}
        </style>
        """,
        unsafe_allow_html=True,)

def predict_objects_and_generate_recipe(image_path, model_path, define_conf):
    os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'Your-Key'
    
    # Detect objects with the YOLO model
    model = YOLO(model_path)
    results = model.predict(source=image_path, conf=define_conf, save=True)
    names = model.names
    
    unique_names = set()
    for r in results:
        for c in r.boxes.cls:
            unique_names.add(names[int(c)])
    
    return unique_names

def generate_recipe_using_language_model(unique_names):
    template = """Question: {question}: """
    prompt = PromptTemplate(template=template, input_variables=["question"])
    
    llm_chain = LLMChain(prompt=prompt, 
                         llm=HuggingFaceHub(repo_id="flax-community/t5-recipe-generation", 
                                            model_kwargs={"temperature": 0.3, "max_length": 512}))
    
    question = ', '.join(unique_names)
    recipe = llm_chain.run(question)
    
    return recipe

image_file = st.file_uploader("Upload an image", type=["jpg", "png", "jpeg"])
conf_threshold = st.slider("Confidence Threshold", 0.1, 1.0, 0.25)

if image_file is not None:
    
    temp_dir = 'temp'
    os.makedirs(temp_dir, exist_ok=True)  
    
    image_path = os.path.join(temp_dir, image_file.name)  
    with open(image_path, 'wb') as f:
        f.write(image_file.read())
    
    unique_names = predict_objects_and_generate_recipe(image_path, model_path, conf_threshold)
    recipe = generate_recipe_using_language_model(unique_names)
    
    title = recipe.split('ingredients:')[0]
    ingredients = recipe.split('directions:')[0].split('ingredients:')[1].split('answer the call 1 do cook.')[0].split(' ')
    directions = (recipe.split('directions:')[1].split('add salt and pepper to taste.')[0])

    full_recipe = '{}\n\n\nIngredients :\n{}\nDirections :\n{}'.format(
        title.capitalize().title(),
        '\n'.join(['    {}'.format(ingredient).title() for ingredient in ingredients]),
        '    {}'.format(directions).capitalize()
    )
    
    st.image(image_path)
    
    st.markdown("""
                <style>
                .big-font {
                    font-size:60px !important;
                }
                </style>
                """, unsafe_allow_html=True)
    st.write("Recipe:")
    st.write('<p class="big-font">'+full_recipe+'</p>', unsafe_allow_html=True)

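With Streamlit, the confidence threshold can be adjusted interactively, and a clean interface ties the objects detected in the uploaded image to the language model.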
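The application embeds the model output in an HTML paragraph, where browsers do not display plain line breaks as line breaks, so the recipe layout may not render as intended. A minimal sketch of an alternative that builds explicit HTML lists and escapes the generated text before embedding it. The helper name recipe_to_html and the choice of tags are assumptions, not part of the original application.

```python
import html


def recipe_to_html(title, ingredients, steps):
    """Render a recipe as HTML, escaping the model output before embedding it."""
    # One list item per ingredient and per step, with special characters escaped.
    items = "".join(f"<li>{html.escape(i)}</li>" for i in ingredients)
    step_items = "".join(f"<li>{html.escape(s)}</li>" for s in steps)
    return (
        f"<h3>{html.escape(title)}</h3>"
        f"<h4>Ingredients</h4><ul>{items}</ul>"
        f"<h4>Directions</h4><ol>{step_items}</ol>"
    )
```

The returned string can be passed to st.markdown with unsafe_allow_html=True in place of the concatenated paragraph.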

Notes

Original article (Medium): https://medium.com/@batuhansenerr/yolov8-and-chef-transformer-t5-to-create-recipe-from-image-ae10c0e83656

Translated source (CSDN): https://blog.csdn.net/matt45m/article/details/142556357
