A Survey of Real-Time Voice Cloning

2023-07-29 / 2023-07-29 / Virtual human voice

Real-Time Voice Cloning

The author has open-sourced all of the code and shared pretrained models. The corresponding paper is Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis.

Pretrained models · CorentinJ/Real-Time-Voice-Cloning Wiki (github.com)

Source code: CorentinJ/Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time (github.com)

Demo

Real-Time Voice Cloning Toolbox - YouTube

Mocking Bird

This project extends Real-Time Voice Cloning to Chinese. All of its code is open source at:

https://github.com/babysor/MockingBird

There is also an official Zhihu column for discussion, which includes some usage tips for the project:

MockingBird - Zhihu (zhihu.com)

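Several pretrained models for this project have already been shared:

Author	Download link	Samples	Training info
	https://pan.baidu.com/s/1iONvRxmkI-t1nHqxKytY3g Baidu Pan link, extraction code: 4j5d (still valid)		75k steps, trained on a mix of 3 open-source datasets
	https://pan.baidu.com/s/1fMh9IlgKJlL2PIiRTYDUvw Baidu Pan link, extraction code: om7f (still valid)		25k steps, trained on a mix of 3 open-source datasets; switch to tag v0.0.1 to use
FawenYo	https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing Baidu Pan link, extraction code: 1024 (still valid)	input output	200k steps, Taiwanese accent; switch to tag v0.0.1 to use
miven	https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ extraction code: 2021 (still valid)		150k steps. Note: apply the fix described in the issue and switch to tag v0.0.1 to use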

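Only freshly made manual recordings (16 kHz) are supported. Recordings larger than 4 MB are not supported, and the best length is 5 to 15 seconds.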
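A quick pre-check of a reference recording against these constraints might look like the sketch below. It uses only Python's standard `wave` module, so it assumes the recording is an uncompressed WAV file. `check_recording` and the constants are hypothetical names for illustration and are not part of MockingBird; treating 4 MB as 4 × 1024 × 1024 bytes is also an assumption.

```python
import os
import wave

# Limits taken from the note above; the exact byte count for "4 MB" is an assumption.
MAX_BYTES = 4 * 1024 * 1024
EXPECTED_RATE = 16000            # 16 kHz
MIN_SECONDS, MAX_SECONDS = 5.0, 15.0


def check_recording(path):
    """Return a list of problems found; an empty list means the file looks usable."""
    problems = []

    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 4 MB")

    # Read the sample rate and frame count from the WAV header.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration is {duration:.1f} s, best range is 5 to 15 s")

    return problems
```

Calling `check_recording("my_voice.wav")` (a hypothetical file name) returns an empty list when the recording satisfies all three limits, and a list of human-readable messages otherwise.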

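Demo

DEMO of SV2TTS (bilibili)

Clone your own voice in just 5 seconds! Senior Meiyu no longer inspects the dorms and is eating peaches instead | Open source - Zhihu (zhihu.com)

As with the original project, the higher the quality and the broader the coverage of the pretraining dataset, the better the quality of the cloned voice.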

Tortoise-TTS

The author has open-sourced the code:

https://github.com/neonbjb/tortoise-tts

The author is still actively improving the project, which can generate speech with a specified emotion.

Demo

https://nonint.com/static/tortoise_v2_examples.html

Baidu PaddlePaddle: One-Sentence Synthesis

Everything lives in a single repository that contains multiple tools:

https://github.com/PaddlePaddle/PaddleSpeech

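The tools include voice chat, speaker (voiceprint) recognition, speech recognition, speech synthesis, and voice commands.

One-sentence synthesis means synthesizing speech that imitates the timbre of an input audio clip. Two approaches are provided:

1. GE2E voice cloning [FastSpeech2 + AISHELL-3 Voice Cloning]

2. ECAPA-TDNN voice cloning [FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)]

In effect, pretrained models are provided, so you can try it as soon as installation is done: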

https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_web

suno-ai bark

Source code on GitHub:

https://github.com/suno-ai/bark

Demo:

https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2

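Bark can currently generate speech with some emotional expression, such as laughter and crying, and it can also generate music. In other words, Bark tries to match the tone, pitch, emotion, and prosody of a given preset, but it does not currently support custom voice cloning.

It can also generate long-form speech.

Try it online: https://huggingface.co/spaces/suno/bark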

ElevenLabs Voice Cloning

Voice cloning is a paid feature, and the product is not open source.

https://beta.elevenlabs.io/

My Vocal

Not open source. It is free to use, but it does not support Chinese.

https://www.myvocal.ai/

An open-source Bark project with a GUI

https://github.com/C0untFloyd/bark-gui

This project adds many features on top of Bark, including voice cloning.

Below is an example of a long text prompt:

Hello, I am called BARK and am a new text to audio model made by SUNO! Let me read an excerpt from War of the Worlds to you. [clears throat] We know NOW that in the early years of the twentieth century, this world was being watched closely by intelligences greater than man's and yet as mortal as his own. We know NOW that as human beings busied themselves about their various concerns they were scrutinized and studied, perhaps almost as NARROWLY as a man with a Microscope might scrutinize the transient creatures that swarm and multiply in a drop of water. YET across an immense ethereal gulf, minds that to our minds as ours are to the beasts in the jungle, intellects vast, cool and unsympathetic, regarded this earth with envious eyes and slowly and surely drew their plans against us. [sighs] In the thirty-ninth year of the twentieth century came the great disillusionment.
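One common way to handle a long passage like the one above is to split it into sentence-sized chunks, synthesize each chunk separately, and concatenate the resulting audio. The sketch below shows only the text-splitting step in plain Python. `split_into_chunks` and the `max_chars` value are hypothetical choices for illustration, and whether bark-gui splits text in exactly this way is not confirmed here.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the text-to-speech model in turn. Non-speech cues such as `[sighs]` stay attached to the sentence that follows them, because the split only happens after `.`, `!`, or `?`.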