PSYCH OpenIR
Zero-shot voice conversion based on feature disentanglement
Guo, Na (1); Wei, Jianguo (1); Li, Yongwei (2); Lu, Wenhuan (1); Tao, Jianhua (3)
Corresponding author: Li, Yongwei (liyw@psych.ac.cn)
Abstract: Voice conversion (VC) aims to convert the voice of a source speaker to that of a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention within VC because it can perform conversion for speakers who did not appear during the training stage. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. Experiments demonstrate that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.
Keywords: Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution
2024-11-01
Language: English
DOI: 10.1016/j.specom.2024.103143
Journal: SPEECH COMMUNICATION
ISSN: 0167-6393
Volume: 165
Pages: 10
Indexed by: SCI
Funding projects: National Key R&D Program of China [2023YFB2603902]; Tianjin Science and Technology Program [21JCZXJC00190]; National Natural Science Foundation of China [62201571]
Publisher: ELSEVIER
WOS keywords: SPARSE REPRESENTATION; ADAPTATION; SPEAKER
WOS research areas: Acoustics; Computer Science
WOS categories: Acoustics; Computer Science, Interdisciplinary Applications
WOS accession number: WOS:001340314300001
Funding organizations: National Key R&D Program of China; Tianjin Science and Technology Program; National Natural Science Foundation of China
Document type: Journal article
Item identifier: https://ir.psych.ac.cn/handle/311026/48867
Author affiliations:
1. Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
2. Chinese Acad Sci, Inst Psychol, CAS Key Lab Behav Sci, Beijing, Peoples R China
3. Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Recommended citation
GB/T 7714: Guo, Na; Wei, Jianguo; Li, Yongwei; et al. Zero-shot voice conversion based on feature disentanglement[J]. SPEECH COMMUNICATION, 2024, 165: 10.
APA: Guo, Na; Wei, Jianguo; Li, Yongwei; Lu, Wenhuan; & Tao, Jianhua. (2024). Zero-shot voice conversion based on feature disentanglement. SPEECH COMMUNICATION, 165, 10.
MLA: Guo, Na, et al. "Zero-shot voice conversion based on feature disentanglement." SPEECH COMMUNICATION 165 (2024): 10.
 
