Multi-Agent AutoGen with Functions: Step by Step with Code Examples

🕗 Published 2024-04-13 16:25 microsoft windows

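1. Introduction

AutoGen is a multi-agent conversation framework that changes how foundation models are used. It provides versatile, conversable agents that combine large language models (LLMs), tools, and human input through automated agent chat. This approach simplifies complex LLM workflows and helps get better performance out of them, which makes it a useful foundation for next-generation LLM applications.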


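2. The Essence of AutoGen: Conversable and Customizable Agents

At the core of AutoGen are its agents, which are designed around conversation. Agents exchange messages with one another, which lets them solve tasks collaboratively through inter-agent dialogue. Each agent can be customized to use an LLM, human input, or both. The framework ships with built-in agents such as AssistantAgent and UserProxyAgent, each with a distinct role. AssistantAgent uses an LLM to generate Python code and suggestions autonomously, while UserProxyAgent acts as a proxy for the human: it can execute code and, when needed, trigger LLM-based replies.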

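2.1 Streamlined Task Automation and Human Interaction

AutoGen automates multi-agent communication while keeping the option of human intervention. Tasks can therefore be handled either through autonomous agent interaction or under human guidance. The framework's tool-calling capability adds to this: agents can call external tools and APIs as part of a conversation.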
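
To make the tool-calling idea concrete, here is a minimal sketch of how a function call suggested by an LLM (a function name plus JSON-encoded arguments) might be dispatched to a Python function. The names `dispatch_function_call`, `function_map`, and `word_count` are hypothetical and exist only for illustration; this is a simplified model of the pattern, not AutoGen's internal implementation.

```python
import json


def word_count(text):
    """Toy tool: count whitespace-separated words."""
    return len(text.split())


# Hypothetical registry mapping tool names to Python callables.
function_map = {"word_count": word_count}


def dispatch_function_call(call, registry):
    """Look up the requested function and call it with decoded JSON arguments."""
    name = call["name"]
    if name not in registry:
        return f"Error: unknown function {name!r}"
    try:
        kwargs = json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc})"
    return registry[name](**kwargs)


# A call in the shape an LLM typically suggests: arguments arrive as a JSON string.
suggested = {"name": "word_count", "arguments": '{"text": "hello multi agent world"}'}
result = dispatch_function_call(suggested, function_map)
print(result)
```

Returning an error string instead of raising keeps the conversation going, so the assistant can read the error and try a corrected call.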

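2.2 Enabling Dynamic Conversations

AutoGen supports a range of conversation patterns, from fully autonomous dialogue to human-in-the-loop problem solving. This flexibility matters in applications that need dynamic responses and multi-step problem-solving strategies.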
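
The range from fully autonomous to human-in-the-loop can be sketched as a small decision function. The mode names mirror the values accepted by the `human_input_mode` parameter of UserProxyAgent (this article uses "NEVER" in Step 4). The function `should_ask_human` is hypothetical and only a simplified model of the behaviour, not AutoGen's own code.

```python
def should_ask_human(mode, is_termination, auto_replies, max_auto_replies):
    """Return True if the human should be prompted before the next reply.

    Simplified model:
      ALWAYS    -> prompt on every turn
      TERMINATE -> prompt only on a termination message or when the
                   automatic reply budget is used up
      NEVER     -> never prompt
    """
    if mode == "ALWAYS":
        return True
    if mode == "TERMINATE":
        return is_termination or auto_replies >= max_auto_replies
    if mode == "NEVER":
        return False
    raise ValueError(f"unknown mode: {mode!r}")


# Compare the three modes on the same mid-conversation turn.
for mode in ("ALWAYS", "TERMINATE", "NEVER"):
    print(mode, should_ask_human(mode, is_termination=False, auto_replies=3, max_auto_replies=10))
```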

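3. Step-by-Step Code Example

In this use case we integrate Whisper and GPT-4 with AutoGen's AssistantAgent and UserProxyAgent. The demo processes a video file: Whisper recognizes and transcribes the speech, and GPT-4 then translates the transcript into another language. The end goal is timestamped subtitles, mirroring how a subtitle file is created, which shows a practical use of AutoGen for media translation and accessibility.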


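3.1 Step 1: Install the Required Libraries

First, make sure the environment has all the required Python libraries:

openai: OpenAI's official library for interacting with its API services.
openai-whisper: a library for audio transcription with OpenAI's Whisper model.
moviepy: a general-purpose video-processing library that supports video editing tasks.
pyautogen: the core AutoGen library for multi-agent conversational AI.
To install these libraries, run the following commands in your notebook: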

%%capture --no-stderr
!pip install moviepy~=1.0.3
# !pip install openai-whisper~=20230918
!pip install openai-whisper
!pip install openai~=1.3.5
!pip install "pyautogen>=0.2.3"
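
After installation it can help to confirm which versions actually ended up in the environment. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are the ones from the install commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["moviepy", "openai-whisper", "openai", "pyautogen"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```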

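3.2 Step 2: Set Up the API Key

When working with API keys, both security and convenience matter. To set your OpenAI API key, use the following code in your notebook, replacing 'your-openai-api-key' with your actual OpenAI API key: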

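import os

# Placeholder only: replace with your real key and keep it out of shared notebooks.
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
# %env with no arguments lists the current environment variables.
%env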

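Setting the API key this way lets your code authenticate with OpenAI's services for subsequent API calls, such as translating text. Transcription in this example runs locally through the openai-whisper package and does not use the key.
Remember that the API key in the snippet is a placeholder. Replace it with your actual OpenAI API key before running the notebook.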
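
Because the placeholder is easy to forget, a guard that fails early can save a confusing authentication error later. This is a minimal sketch; `require_api_key`, `mask`, and `PLACEHOLDER` are hypothetical names introduced here, and the sample key is made up.

```python
import os

PLACEHOLDER = "your-openai-api-key"  # the placeholder string used in this article


def require_api_key(env=os.environ, var="OPENAI_API_KEY"):
    """Return the key from env, or raise if it is missing or still the placeholder."""
    key = env.get(var, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{var} is missing or still set to the placeholder value.")
    return key


def mask(key, visible=4):
    """Return a masked form of the key that is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demonstration with a made-up key in a plain dict instead of the real environment.
demo_env = {"OPENAI_API_KEY": "sk-example-1234"}
print(mask(require_api_key(demo_env)))
```

In an interactive notebook, `getpass.getpass()` from the standard library is one way to enter the key without writing it into a cell.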

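3.3 Step 3: Import Libraries and Set Up the Configuration

With the environment ready, the next step is to import the required libraries and configure them for our task:

whisper for audio transcription,
moviepy.editor for handling video files,
openai for access to OpenAI's models,
autogen as our conversational AI framework.
Here is the code: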

import os

import whisper
from moviepy.editor import VideoFileClip
from openai import OpenAI

import autogen
config_list = [
    {
        "model": "gpt-4",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
]

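By importing these libraries, we enable our notebook to transcribe and process video content and to communicate with OpenAI's language models. The config_list is essential because it specifies the model to use (gpt-4 in this case) and where to find the API key (the environment variable we set in the previous step).

With this setup in place, we can start processing video and audio content.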
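
The listing in Step 4 passes the video path straight to Whisper, which decodes its input through ffmpeg and can usually read a video file directly. If you prefer to extract the audio track first, the following sketch shows one way to do it with the `VideoFileClip` class imported above. It assumes moviepy 1.x (the version pinned in Step 1, which provides `moviepy.editor`); the helper names `audio_path_for` and `extract_audio` are hypothetical.

```python
import os


def audio_path_for(video_path, suffix=".wav"):
    """Derive an audio file path next to the video by swapping the extension."""
    root, _ = os.path.splitext(video_path)
    return root + suffix


def extract_audio(video_path, audio_path=None):
    """Write the video's audio track to a file and return that file's path."""
    # Imported lazily so audio_path_for can be used without moviepy installed.
    from moviepy.editor import VideoFileClip

    audio_path = audio_path or audio_path_for(video_path)
    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(audio_path)
    finally:
        clip.close()  # release the underlying ffmpeg reader
    return audio_path


print(audio_path_for("/content/LiquidSyllabus.mp4"))
```

Calling `extract_audio("/content/LiquidSyllabus.mp4")` would then produce a file whose path can be handed to the transcription function.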

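3.4 Step 4: Create the Transcription and Translation Functions and Configuration

Now for the core of the application. We define two main functions:

1. recognize_transcript_from_video: takes the path of the video (or of an audio file extracted from it) and uses the Whisper model to transcribe the speech to text. It walks through the audio segments to build a timestamped transcript and saves it to transcription.txt.
2. translate_text: sends the transcribed text to OpenAI's API, using the gpt-4 model, and requests a direct translation from the source language to the target language, suitable for video subtitles.
A third helper, translate_transcript, reads the saved transcript line by line and calls translate_text on each line while preserving the timestamps.
The configuration for our AutoGen agents, chatbot and user_proxy, is established through the llm_config dictionary. Here we declare the functions the agents can use, along with the model list and timeout; the system message and code-execution configuration are set when the agents are created.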
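
Since each entry under "functions" is a hand-written JSON schema, it can drift out of sync with the Python function it describes. The sketch below compares a schema with a function signature using the standard `inspect` module. `check_schema_matches` is a hypothetical helper, and the `translate_transcript` defined here is only a stand-in with the same signature as the real one.

```python
import inspect


def check_schema_matches(schema, func):
    """Return a list of mismatches between a function schema and a Python signature."""
    problems = []
    params = inspect.signature(func).parameters
    declared = set(schema["parameters"]["properties"])
    required = set(schema["parameters"].get("required", []))

    # Every declared property must be a real parameter of the function.
    for name in sorted(declared - set(params)):
        problems.append(f"schema declares {name!r}, which {func.__name__} does not accept")

    # Every parameter without a default must be marked as required.
    for name, param in params.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs are out of scope for this sketch
        if param.default is inspect.Parameter.empty and name not in required:
            problems.append(f"{name!r} has no default but is not marked required")
    return problems


def translate_transcript(source_language, target_language):
    """Stand-in with the same signature as the article's function."""
    return f"{source_language}->{target_language}"


schema = {
    "name": "translate_transcript",
    "parameters": {
        "type": "object",
        "properties": {
            "source_language": {"type": "string"},
            "target_language": {"type": "string"},
        },
        "required": ["source_language", "target_language"],
    },
}
print(check_schema_matches(schema, translate_transcript))
```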

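Finally, we start a chat between user_proxy and chatbot, instructing them to recognize the speech in the given video file and translate the text into the desired language.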
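The code below is a skeleton for these operations that you can extend and customize for your own requirements. It defines the functions, initializes the agents, and starts the chat: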

def recognize_transcript_from_video(audio_filepath):
    try:
        # Load model
        model = whisper.load_model("small")

        # Transcribe audio with detailed timestamps
        result = model.transcribe(audio_filepath, verbose=True)

        # Initialize variables for transcript
        transcript = []
        sentence = ""
        start_time = 0

        # Iterate through the segments in the result
        for segment in result["segments"]:
            # If new sentence starts, save the previous one and reset variables
            if segment["start"] != start_time and sentence:
                transcript.append(
                    {
                        "sentence": sentence.strip() + ".",
                        "timestamp_start": start_time,
                        "timestamp_end": segment["start"],
                    }
                )
                sentence = ""
                start_time = segment["start"]

            # Add the word to the current sentence
            sentence += segment["text"] + " "

        # Add the final sentence
        if sentence:
            transcript.append(
                {
                    "sentence": sentence.strip() + ".",
                    "timestamp_start": start_time,
                    "timestamp_end": result["segments"][-1]["end"],
                }
            )

        # Save the transcript to a file
        with open("transcription.txt", "w") as file:
            for item in transcript:
                sentence = item["sentence"]
                start_time, end_time = item["timestamp_start"], item["timestamp_end"]
                file.write(f"{start_time}s to {end_time}s: {sentence}\n")

        return transcript

    except FileNotFoundError:
        return "The specified audio file could not be found."
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"

def translate_text(input_text, source_language, target_language):
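    # Read the key from the environment instead of relying on a global defined later
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    response = client.chat.completions.create(
        # model="gpt-3.5-turbo",
        model="gpt-4",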
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": f"Directly translate the following {source_language} text to a pure {target_language} "
                f"video subtitle text without additional explanation.: '{input_text}'",
            },
        ],
        max_tokens=1500,
    )

    # Correctly accessing the response content
    translated_text = response.choices[0].message.content if response.choices else None
    return translated_text


def translate_transcript(source_language, target_language):
    with open("transcription.txt", "r") as f:
        lines = f.readlines()

    translated_transcript = []

    for line in lines:
        # Split each line into timestamp and text parts (split once, so a ": " inside the sentence is kept)
        parts = line.strip().split(": ", 1)
        if len(parts) == 2:
            timestamp, text = parts[0], parts[1]
            # Translate only the text part
            translated_text = translate_text(text, source_language, target_language)
            # Reconstruct the line with the translated text and the preserved timestamp
            translated_line = f"{timestamp}: {translated_text}"
            translated_transcript.append(translated_line)
        else:
            # If the line doesn't contain a timestamp, add it as is
            translated_transcript.append(line.strip())

    return "\n".join(translated_transcript)


llm_config = {
    "functions": [
        {
            "name": "recognize_transcript_from_video",
            "description": "recognize the speech from video and transfer into a txt file",
            "parameters": {
                "type": "object",
                "properties": {
                    "audio_filepath": {
                        "type": "string",
                        "description": "path of the video file",
                    }
                },
                "required": ["audio_filepath"],
            },
        },
        {
            "name": "translate_transcript",
            "description": "using translate_text function to translate the script",
            "parameters": {
                "type": "object",
                "properties": {
                    "source_language": {
                        "type": "string",
                        "description": "source language",
                    },
                    "target_language": {
                        "type": "string",
                        "description": "target language",
                    },
                },
                "required": ["source_language", "target_language"],
            },
        },
    ],
    "config_list": config_list,
    "timeout": 120,
}
source_language = "English"
target_language = "Spanish"
key = os.getenv("OPENAI_API_KEY")
target_video = "/content/LiquidSyllabus.mp4"

chatbot = autogen.AssistantAgent(
    name="chatbot",
    system_message="For coding tasks, only use the functions you have been provided with. Reply TERMINATE when the task is done.",
    llm_config=llm_config,
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "coding_2",
        "use_docker": False,
    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
)

user_proxy.register_function(
    function_map={
        "recognize_transcript_from_video": recognize_transcript_from_video,
        "translate_transcript": translate_transcript,
    }
)
user_proxy.initiate_chat(
    chatbot,
    message=f"For the video located in {target_video}, recognize the speech and transfer it into a script file, "
    f"then translate from {source_language} text to a {target_language} video subtitle text. ",
)
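
The listing stores the transcript as plain "start to end: sentence" lines. To get an actual subtitle file, those entries can be written in SubRip (SRT) form, where each cue is an index, a line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the text, and a blank line. The sketch below assumes transcript entries shaped like the dictionaries returned by recognize_transcript_from_video; `srt_timestamp` and `to_srt` are hypothetical helper names, and the sample data is made up.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(transcript):
    """Render transcript entries as SRT cues separated by blank lines."""
    blocks = []
    for index, item in enumerate(transcript, start=1):
        start = srt_timestamp(item["timestamp_start"])
        end = srt_timestamp(item["timestamp_end"])
        blocks.append(f"{index}\n{start} --> {end}\n{item['sentence']}\n")
    return "\n".join(blocks)


# Made-up sample in the same shape as the transcript entries.
sample = [
    {"sentence": "Hello there.", "timestamp_start": 0.0, "timestamp_end": 2.5},
    {"sentence": "Welcome to the course.", "timestamp_start": 2.5, "timestamp_end": 6.04},
]
print(to_srt(sample))
```

The same function works for translated text if each entry's "sentence" is replaced with its translation before rendering.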


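4. Conclusion

In this walkthrough we used AutoGen's multi-agent conversation framework to turn raw video content into translated, accessible subtitles. With Whisper handling transcription and GPT-4 handling translation, the example bridges languages and media in a practical way. It shows how agents such as the user proxy and the assistant work together to automate a multi-step task, and it points toward further uses in content localization and accessibility. As these tools mature, similar pipelines could make communication across languages more inclusive.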

Original article: https://blog.csdn.net/gongdiwudu/article/details/137713964
