Titlebook: Man-Machine Speech Communication; 17th National Confer… Ling Zhenhua, Gao Jianqing, Jia Jia. Conference proceedings 2023. The Editor(s) (if appl…

Posted on 2025-3-26 22:07:24

Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification: …fer from poor performance when applied to unseen data with a domain shift caused by differences between training and testing data, such as scene noise and speaking style. To address these issues, the proposed model includes a backbone and an extra domain attention module, which are optimized…

Posted on 2025-3-27 01:34:55

Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement: … the disentanglement of timbres and styles, TTS systems could synthesize expressive speech for a given speaker in any style that has been seen in the training corpus. However, current research on timbre and style disentanglement still has some shortcomings. The current method eithe…

Posted on 2025-3-27 05:29:34

Multiple Confidence Gates for Joint Training of SE and ASR: …es on improving the auditory quality of speech, but the enhanced feature distribution is changed, which is uncertain and detrimental to the ASR. To tackle this challenge, an approach with multiple confidence gates for the joint training of SE and ASR is proposed. A speech confidence gates prediction m…

Posted on 2025-3-27 21:56:41

Interplay Between Prosody and Syntax-Semantics: Evidence from the Prosodic Features of Mandarin Tag…: …of Mandarin sentence prosody, on the other hand, is still limited. To bridge this gap, this study probed the prosodic features of Mandarin tag questions in comparison with those of their declarative counterparts. The aim was to verify the hypothesis that the statement parts in the tag questions woul…

Posted on 2025-3-28 05:41:22

Improving Fine-Grained Emotion Control and Transfer with Gated Emotion Representations in Speech Sy…: …he lack of fine-grained emotion strength labelling data, an emotion or style strength extractor is usually learned at the whole-utterance scale through a ranking function. However, such an utterance-based extractor is then used to provide fine-grained emotion strength labels, conditioning on which a fine-…

Posted on 2025-3-28 09:32:44

Violence Detection Through Fusing Visual Information to Auditory Scene: …olve the present lack of violent audio datasets, we first created our own violent audio dataset, named VioAudio. Then, we proposed a CNN-ConvLSTM network model for audio violence detection, which obtained an accuracy of 91.5% on VioAudio and a MAP value of 16.47% on the MediaEval 2015 da…

Posted on 2025-3-28 11:04:49

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis: …t various temporal resolutions and finally reconstructs the raw waveform. The experimental results show that our proposed SF-GAN vocoder outperforms the state-of-the-art HiFi-GAN and Fre-GAN in both analysis-synthesis (AS) and text-to-speech (TTS) tasks, and the synthesized speech quality of SF-GAN is comparable to the ground-truth audio.
