Learning ChatTTS on Ubuntu, Part 5: Example: self introduction

Published: 2024-12-07 05:46 | Tags: ubuntu, learning, linux

Code:

```python
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
###################################
inputs_en = """
chat T T S is a text to speech model designed for dialogue applications. 
[uv_break]it supports mixed language input [uv_break]and offers multi speaker 
capabilities with precise control over prosodic elements like 
[uv_break]laughter[uv_break][laugh], [uv_break]pauses, [uv_break]and intonation. 
[uv_break]it delivers natural and expressive speech,[uv_break]so please
[uv_break] use the project responsibly at your own risk.[uv_break]
""".replace('\n', '') # English is still experimental.

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_4]',
)

audio_array_en = chat.infer(inputs_en, params_refine_text=params_refine_text)
torchaudio.save("self_introduction_output.wav", torch.from_numpy(audio_array_en[0]), 24000)
```

The script failed at the final step with the following output:

```text
UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3683.)
src = src.T
Traceback (most recent call last):
File "/home/duyicheng/gitee/ChatTTS/intr.py", line 22, in <module>
    torchaudio.save("self_introduction_output.wav", torch.from_numpy(audio_array_en[0]), 24000)
File "/home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torchaudio/_backend/utils.py", line 313, in save
    return backend.save(
           ^^^^^^^^^^^^^
File "/home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torchaudio/_backend/ffmpeg.py", line 316, in save
    save_audio(
File "/home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torchaudio/_backend/ffmpeg.py", line 248, in save_audio
    s.add_audio_stream(
File "/home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/io/_streaming_media_encoder.py", line 278, in add_audio_stream
    self._s.add_audio_stream(
RuntimeError: Failed to open codec: (Invalid argument)
Exception raised from open_codec at /__w/audio/audio/pytorch/audio/src/libtorio/ffmpeg/stream_writer/encode_process.cpp:194 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x75c2fb4b2446 in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x75c2fb45c6e4 in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x46c3f (0x75c2f28cac3f in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so)
frame #3: torio::io::get_audio_encode_process(AVFormatContext*, int, int, std::string const&, std::optional<std::string> const&, std::optional<std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > const&, std::optional<std::string> const&, std::optional<int> const&, std::optional<int> const&, std::optional<torio::io::CodecConfig> const&, std::optional<std::string> const&, bool) + 0x250 (0x75c2f28d14d0 in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so)
frame #4: torio::io::StreamingMediaEncoder::add_audio_stream(int, int, std::string const&, std::optional<std::string> const&, std::optional<std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > const&, std::optional<std::string> const&, std::optional<int> const&, std::optional<int> const&, std::optional<torio::io::CodecConfig> const&, std::optional<std::string> const&) + 0x90 (0x75c2f28d8dd0 in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so)
frame #5: <unknown function> + 0x3acbb (0x75c22799ecbb in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/lib/_torio_ffmpeg6.so)
frame #6: <unknown function> + 0x31dfc (0x75c227995dfc in /home/duyicheng/anaconda3/envs/chattts/lib/python3.12/site-packages/torio/lib/_torio_ffmpeg6.so)
<omitting python frames>
frame #18: <unknown function> + 0x2a1ca (0x75c30022a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: __libc_start_main + 0x8b (0x75c30022a28b in /lib/x86_64-linux-gnu/libc.so.6)
```

Solution (not yet verified; I'll test it tomorrow):

Based on the error output, the problem comes down to two things:

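1. **Deprecation warning for transposing with `x.T`**:
   - Using `x.T` on a tensor whose dimension is not 2 is deprecated and will raise an error in a future release. Here the warning is triggered inside torchaudio (the `src = src.T` line), which indicates that the tensor passed to `torchaudio.save` is not 2-D.
   - The warning suggests `x.mT` to transpose batches of matrices, or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse all dimensions of a tensor.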

2. **Runtime error when saving the audio file**:
   - While saving the file, FFmpeg failed to open the codec and reported an invalid argument.
   - The error is raised in the `torio::io::get_audio_encode_process` function.
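
Both messages plausibly share one cause: a tensor with no channel axis. `torchaudio.save` expects `src` to be 2-D with shape `(channels, frames)` under the default `channels_first=True`. The sketch below assumes, as the traceback suggests, that `audio_array_en[0]` is a 1-D float32 NumPy array; an all-zero array of 24000 samples stands in for the real ChatTTS output, and `wav`/`src` are illustrative names.

```python
import numpy as np
import torch

# Stand-in for audio_array_en[0]: assumed to be a 1-D float32 array of samples
wav = np.zeros(24000, dtype=np.float32)

src = torch.from_numpy(wav)
print(src.shape)  # torch.Size([24000]): samples only, no channel axis

# Add a leading channel axis so the shape becomes (channels=1, frames=24000)
src = src.unsqueeze(0)
print(src.shape)  # torch.Size([1, 24000])

# In the original script, the last line would then become:
# torchaudio.save("self_introduction_output.wav", src, 24000)
```

If `audio_array_en[0]` is already 2-D in your ChatTTS version, skip the `unsqueeze(0)`; adding it to a 2-D array would produce a 3-D tensor instead.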

### Solutions

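#### 1. Handling the `x.T` deprecation warning
If your own code uses `x.T` on tensors that are not 2-D, replace it with one of the recommended forms. In this traceback, however, the `x.T` call is inside torchaudio, so there is nothing to change in the script itself; the warning should disappear once a 2-D tensor is passed to `torchaudio.save`. For reference: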
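```python
import torch

x = torch.randn(2, 3, 4)  # example: a batch of two 3x4 matrices

# Instead of x.T: transpose each matrix in the batch (swap the last two dims)
y = x.mT  # shape (2, 4, 3)

# Or reverse all dimensions (same as the warning's torch.arange form)
z = x.permute(*reversed(range(x.ndim)))  # shape (4, 3, 2)
```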

#### 2. Fixing the audio save error
The save error is most likely caused by the shape of the audio data or by the save parameters. Some things to try:

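- **Check the audio data format**:
  - `torchaudio.save` requires a 2-D tensor of shape `(channels, samples)`. Check `audio_array_en[0].shape`; if it is 1-D (samples only), add a channel axis with `torch.from_numpy(audio_array_en[0]).unsqueeze(0)`.
  - After `torch.from_numpy(audio_array_en[0])`, make sure the result is a floating-point tensor with values in a sensible range (normally -1 to 1).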

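- **Check the sample rate**:
  - ChatTTS's own examples save at `24000` Hz, so this value is unlikely to be the cause. Only change it if you know the model outputs a different rate.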

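- **Check the FFmpeg installation**:
  - Make sure FFmpeg is installed and configured correctly; running `ffmpeg -version` on the command line confirms it is available. In torchaudio 2.1 and later, `torchaudio.list_audio_backends()` shows which I/O backends torchaudio can actually use.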

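- **Debug the audio data**:
  - The output is already `.wav`, so instead of changing the format, try writing the same data through a different backend or library (for example `torchaudio.save(..., backend="soundfile")` in torchaudio 2.1+, which requires the `soundfile` package). If that succeeds, the data is fine and the problem lies with the FFmpeg backend.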

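- **Check the documentation and community**:
  - Consult the official `torchaudio` and `torio` documentation for more on saving audio.
  - If the problem persists, ask in the relevant GitHub discussions or on Stack Overflow.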
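
To rule FFmpeg out entirely, here is a minimal sketch that writes the array with Python's standard-library `wave` module as 16-bit PCM. It assumes the input is a float array in [-1, 1] that is either 1-D (mono) or 2-D `(channels, samples)`, which is what the checks above expect from ChatTTS; the helper name `save_wav_pcm16` is hypothetical, not part of ChatTTS or torchaudio.

```python
import wave

import numpy as np


def save_wav_pcm16(path, samples, sample_rate=24000):
    """Write float audio in [-1, 1] to a 16-bit PCM WAV file without FFmpeg.

    `samples` may be 1-D (mono) or 2-D with shape (channels, frames).
    """
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]  # add the missing channel axis -> (1, frames)
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")

    # Clip to the valid range, then scale to little-endian int16
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")

    with wave.open(path, "wb") as wf:
        wf.setnchannels(pcm.shape[0])
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        # WAV stores samples interleaved, i.e. (frames, channels) in row-major order
        wf.writeframes(pcm.T.tobytes())
```

Usage with the original script: `save_wav_pcm16("self_introduction_output.wav", audio_array_en[0], 24000)`. If this file plays correctly while `torchaudio.save` still fails, the data is fine and the issue is in how the tensor reaches the FFmpeg backend.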

Hope these suggestions help!

Original article: https://blog.csdn.net/weixin_42771529/article/details/144296874
