自学内容网 自学内容网

文生图模型

CogView3/README_zh.md at main · THUDM/CogView3 · GitHub

快速开始

提示词优化

虽然 CogView3 系列模型都是通过长篇合成图像描述进行训练的,但我们强烈建议在文本生成图像之前,基于大语言模型(LLMs)进行提示词的重写操作,这将大大提高生成质量。

我们提供了一个 示例脚本。我们建议您运行这个脚本,以实现对提示词对润色

python prompt_optimize.py --api_key "智谱AI API Key" --prompt {你的提示词} --base_url "https://open.bigmodel.cn/api/paas/v4" --model "glm-4-plus"

推理模型(Diffusers)

首先,确保从源代码安装diffusers库。

pip install git+https://github.com/huggingface/diffusers.git

接着,运行以下代码:

from diffusers import CogView3PlusPipeline
import torch

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")

# Open it for reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")


原文地址:https://blog.csdn.net/asd8705/article/details/143069759

免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!