Talking-Face Generation: Installing hallo on Windows

🕗 Published 2024-09-24 23:00 · Audio/Video, Computer Vision

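Preface

I previously installed hallo, a talking-face system, on Linux. hallo is one of the better virtual talking-face video generators currently available; compared with SadTalker, its expressions are more realistic and its characters more lifelike. This post records the installation of the Windows version of hallo. Thanks to liuning for taking part.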

I. Installation

Set up CUDA, Python, and the other prerequisites, then clone the repository.


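Installing Python

The Python version must be between 3.8 and 3.11; we installed 3.10.11. Go to the official Python website and find the matching version.

After the download finishes, run the installer and choose the custom installation. Make sure to tick the option that adds Python to PATH.

Click Next.

Choose your own installation path (preferably not on the C: drive), then click Install.

After the installation, check that Python was added to the environment variables. First, search for "Edit the system environment variables".
Then click Environment Variables.

Select Path.

Check whether the Python installation paths are already listed there. If they are not, add them.

Finally, verify the installation: press Win+R, type cmd, and press Enter to open a command prompt.

Type python in the terminal that opens. If the interpreter starts and prints its version, the installation succeeded. Then type exit() to quit.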
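The version requirement above can also be checked from a script rather than by eye. The following is a minimal sketch; the helper name `version_supported` is made up for this illustration, and the 3.8 to 3.11 bounds are simply the range quoted in this post.

```python
import sys

# Range quoted in this post: Python 3.8 through 3.11 (inclusive).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def version_supported(version_info):
    """Return True if (major, minor) lies within the range quoted in the post."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if version_supported(current) else "outside the 3.8-3.11 range"
    print(f"Python {current.major}.{current.minor}.{current.micro}: {status}")
```

Running the file with the freshly installed interpreter shows whether that interpreter falls inside the range.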

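Installing ffmpeg

Download the Windows build from the official ffmpeg website.
After extracting it, rename the folder to ffmpeg and place it in the root of the C: drive.

Then run cmd as administrator and set the environment variable: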

setx /m PATH "C:\ffmpeg\bin;%PATH%"

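Check that the Path environment variable now contains C:\ffmpeg\bin.

Restart the computer and verify the installation by running the following command: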

ffmpeg -version
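The same verification can be scripted, which is handy when checking several machines. This is a rough sketch using only the Python standard library; `parse_ffmpeg_version` and `find_ffmpeg` are names invented here, and the parser assumes the first line of the output has the shape `ffmpeg version <token> ...`.

```python
import shutil
import subprocess


def parse_ffmpeg_version(first_line):
    """Extract the version token from the first line of `ffmpeg -version`."""
    parts = first_line.split()
    # Assumed shape: "ffmpeg version <token> Copyright ..."
    if len(parts) >= 3 and parts[0] == "ffmpeg" and parts[1] == "version":
        return parts[2]
    return None


def find_ffmpeg():
    """Return (path, version), or (None, None) if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None, None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    version = parse_ffmpeg_version(lines[0]) if lines else None
    return path, version


if __name__ == "__main__":
    ffmpeg_path, ffmpeg_version = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg was not found on PATH")
    else:
        print(f"ffmpeg {ffmpeg_version} at {ffmpeg_path}")
```

If the script reports that ffmpeg is missing, the PATH change has most likely not taken effect yet in the current session.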


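Installing CUDA 12.1

Go to the CUDA website and select version 12.1.
Choose the download options that match your system.

Once the download finishes, run the installer. The express (automatic) installation is the simplest choice; if you choose a custom installation, just remember the installation path.
Next, configure the environment variables in the same way as above. First make sure you can open the directory C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1. If it does not exist, locate your NVIDIA GPU Computing Toolkit installation directory and go to ./CUDA/v12.1 inside it.
Finally, check that the installation succeeded: open cmd and enter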

nvcc --version

If nvcc prints its version information, the installation succeeded.
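To confirm that the toolkit found on PATH really is 12.1, the release number can be pulled out of the nvcc output. The sketch below assumes the output contains a phrase of the form `release 12.1`; the function names are illustrative only.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the (major, minor) CUDA release found in `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc if it is on PATH; return (major, minor) or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release == (12, 1):
        print("CUDA 12.1 toolkit found")
    else:
        print(f"Expected CUDA 12.1, found: {release}")
```

A result of `None` means nvcc was not found on PATH or its output did not match the assumed format.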

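Install with PowerShell: run install.ps1, or install-cn.ps1 (for Chinese users)

Running install.ps1 in PowerShell essentially installs the dependencies from the requirements file and downloads the models they rely on. The downloaded models should end up in the following layout: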

./pretrained_models/
|-- audio_separator/
|   |-- download_checks.json
|   |-- mdx_model_data.json
|   |-- vr_model_data.json
|   `-- Kim_Vocal_2.onnx
|-- face_analysis/
|   `-- models/
|       |-- face_landmarker_v2_with_blendshapes.task  # face landmarker model from mediapipe
|       |-- 1k3d68.onnx
|       |-- 2d106det.onnx
|       |-- genderage.onnx
|       |-- glintr100.onnx
|       `-- scrfd_10g_bnkps.onnx
|-- motion_module/
|   `-- mm_sd_v15_v2.ckpt
|-- sd-vae-ft-mse/
|   |-- config.json
|   `-- diffusion_pytorch_model.safetensors
|-- stable-diffusion-v1-5/
|   `-- unet/
|       |-- config.json
|       `-- diffusion_pytorch_model.safetensors
`-- wav2vec/
    `-- wav2vec2-base-960h/
        |-- config.json
        |-- feature_extractor_config.json
        |-- model.safetensors
        |-- preprocessor_config.json
        |-- special_tokens_map.json
        |-- tokenizer_config.json
        `-- vocab.json

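Model download failures are the main reason the installation does not succeed. To work around this, I copied the models that had already been downloaded on Linux directly into the corresponding folders, which solved the download problem. The result of a successful installation is shown below.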

(Figure to be added.)
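Since incomplete model downloads are the usual failure, it can help to check the folder against the layout listed above before running anything. This is only a sketch: `REQUIRED_FILES` is a hand-picked subset of that tree, `missing_model_files` is a hypothetical helper, and treating a zero-byte file as missing is an assumption about what a failed download leaves behind.

```python
from pathlib import Path

# A subset of the files from the tree above; extend the list as needed.
REQUIRED_FILES = [
    "audio_separator/Kim_Vocal_2.onnx",
    "face_analysis/models/face_landmarker_v2_with_blendshapes.task",
    "face_analysis/models/scrfd_10g_bnkps.onnx",
    "motion_module/mm_sd_v15_v2.ckpt",
    "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.safetensors",
    "wav2vec/wav2vec2-base-960h/model.safetensors",
]


def missing_model_files(root, required=REQUIRED_FILES):
    """Return the relative paths under `root` that are absent or empty."""
    root = Path(root)
    missing = []
    for rel in required:
        path = root / rel
        # An empty file is treated as a failed download (assumption).
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_model_files("./pretrained_models")
    if missing:
        print("Missing or empty model files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed model files are present")
```

Any path it prints is a candidate for copying over from a machine where the download worked, as described above.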

II. Inference

1. Run run_inference.ps1 with PowerShell

The contents of run_inference.ps1 are as follows:

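# Input image and driving audio; change these two paths for your own files.
$source_image="assets/zgr.jpg"
$driving_audio="assets/feng3cut.wav"
# Output video file.
$output="test.mp4"
# Optional; left empty so the argument is not passed.
$face_expand_ratio=""

# Run from the script's folder and activate the venv virtual environment.
Set-Location $PSScriptRoot
.\venv\Scripts\activate

$Env:HF_HOME = "huggingface"
$Env:XFORMERS_FORCE_DISABLE_TRITON = "1"
# Collect the optional arguments that are only passed when set.
$ext_args = [System.Collections.ArrayList]::new()

if ($output) {
  [void]$ext_args.Add("--output=$output")
}

if ($face_expand_ratio) {
  [void]$ext_args.Add("--face_expand_ratio=$face_expand_ratio")
}

python.exe "./scripts/inference.py" `
--source_image=$source_image `
--driving_audio=$driving_audio `
 $ext_args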

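Note that the values to change are the file paths in source_image and driving_audio.
Keep the audio file given in driving_audio short; otherwise generation takes a very long time.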
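Because long audio is what makes a run slow, the clip length is worth checking before starting. The sketch below reads the duration of an uncompressed PCM WAV file with the standard `wave` module; the 30-second default in `is_short_enough` is an arbitrary illustrative threshold, not a limit imposed by hallo.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path, max_seconds=30.0):
    """Compare the clip length against an illustrative threshold."""
    # max_seconds is an arbitrary example value, not a hallo requirement.
    return wav_duration_seconds(path) <= max_seconds


if __name__ == "__main__":
    audio = "assets/feng3cut.wav"  # the path used in the script above
    print(f"{audio}: {wav_duration_seconds(audio):.1f} s")
```

If the clip turns out to be too long, it can be trimmed before being passed as driving_audio.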

The run then proceeds, printing output like the following:

  
# Step 1: process the background image first, then the face
Processed and saved: ./.cache\trump1_sep_background.png
Processed and saved: ./.cache\trump1_sep_face.png

# Step 2: convert the audio file into vectors
Some weights of Wav2VecModel were not initialized from the model checkpoint at ./pretrained_models/wav2vec/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:audio_separator.separator.separator:Separator version 0.17.2 instantiating with output_dir: ./.cache/audio_preprocess, output_format: WAV
INFO:audio_separator.separator.separator:Operating System: Linux #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
INFO:audio_separator.separator.separator:System: Linux Node: ubuntu22-E500-G9-WS760T Release: 6.5.0-44-generic Machine: x86_64 Proc: x86_64
INFO:audio_separator.separator.separator:Python Version: 3.10.14
INFO:audio_separator.separator.separator:PyTorch Version: 2.2.2+cu121
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
INFO:audio_separator.separator.separator:ONNX Runtime GPU package installed with version: 1.18.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
INFO:audio_separator.separator.separator:ONNXruntime has CUDAExecutionProvider available, enabling acceleration
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx...
INFO:audio_separator.separator.separator:Load model duration: 00:00:00
INFO:audio_separator.separator.separator:Starting separation process for  audio_file_path: assets/zhiguibing3cut.wav

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.24s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 17.42it/s]
INFO:audio_separator.separator.separator:Saving Vocals stem to 1_(Vocals)_Kim_Vocal_2.wav...
INFO:audio_separator.separator.separator:Clearing input audio file paths, sources and stems...
INFO:audio_separator.separator.separator:Separation duration: 00:00:10
## Takes roughly 2 minutes to run

# Step 3: feed the multimodal features into the diffusion model's UNet
The config attributes {'center_input_sample': False, 'out_channels': 4} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']

# Run the Stable Diffusion UNet
INFO:hallo.models.unet_3d:loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
The config attributes {'center_input_sample': False} were passed to UNet3DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.

# Run the motion module to generate the animation
Load motion module params from pretrained_models/motion_module/mm_sd_v15_v2.ckpt
INFO:hallo.models.unet_3d:Loaded 453.20928M-parameter motion module

# Run the hallo framework
loaded weight from  ./pretrained_models/hallo/net.pth

# Generate over 31 iterations
[1/31]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:27<00:00,  1.45it/s]
100%|
....
....
....

Moviepy - Building video .cache/output.mp4.
MoviePy - Writing audio in outputTEMP_MPY_wvf_snd.mp4
MoviePy - Done.                                                                                                                                                       
Moviepy - Writing video .cache/output.mp4

# Output the mp4 file
Moviepy - Done !                                                                                                                                                      
Moviepy - video ready .cache/output.mp4
Original link: https://blog.csdn.net/wqthaha/article/details/140696292

The run started at 9:38.
A 20-second audio clip takes about 20 minutes to turn into the corresponding video.
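As a rough cross-check of that figure, the numbers in the log excerpt above (31 clips, 40 denoising steps each, about 1.45 it/s) can be turned into a time estimate. This is back-of-the-envelope arithmetic only: the function names are made up here, and the log may not come from the same run as the 20-minute measurement.

```python
def denoising_minutes(num_clips, steps_per_clip, iterations_per_second):
    """Estimate the time spent in the denoising loop, in minutes."""
    total_steps = num_clips * steps_per_clip
    return total_steps / iterations_per_second / 60.0


def slowdown_factor(audio_seconds, wall_clock_minutes):
    """Seconds of compute needed per second of audio."""
    return wall_clock_minutes * 60.0 / audio_seconds


if __name__ == "__main__":
    # Values read from the log excerpt: [1/31], 40/40 steps, 1.45 it/s.
    print(f"Denoising alone: {denoising_minutes(31, 40, 1.45):.1f} min")
    # Values from the measurement quoted in the text: 20 s of audio, 20 min.
    print(f"Slowdown: {slowdown_factor(20, 20):.0f}x real time")
```

Under these assumptions the denoising loop alone accounts for roughly 14 minutes, which fits the reported 20 minutes once preprocessing and video encoding are added.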

The generated video looks like this:

zhangguorong

Recommended tool:
Online audio cutter: https://mp3cut.net/#google_vignette

Summary

This post gave a detailed walkthrough for trying out hallo on Windows.

Original article: https://blog.csdn.net/wqthaha/article/details/142374632
