CHAFE posted on 2025-4-1 05:46:49

PhonHuBERT: A Phoneme Transcription Tool for Song Datasets
...Voice Synthesis (SVS) systems, which led to an increase in the demand for accurately labeled data. In response to this need, this work introduces an Aligned Phoneme Sequence Transcription (APST) model for automatic song dataset annotation, called PhonHuBERT. This model uses HuBERT, a pre-trained s...

短程旅游 posted on 2025-4-1 06:16:18

Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data
...es of large language models to comprehend audio data. Our task entails introducing an encoding method that effectively transforms audio data into embedded representations, enabling LLMs to comprehend and process the information contained within the audio. By undergoing a series of fine-tuning stages...

Leisureliness posted on 2025-4-1 11:20:32

http://reply.papertrans.cn/17/1673/167293/167293_63.png

Somber posted on 2025-4-1 16:55:39

http://reply.papertrans.cn/17/1673/167293/167293_64.png

Throttle posted on 2025-4-1 20:54:55

http://reply.papertrans.cn/17/1673/167293/167293_65.png
View full version: Titlebook: Advances in Neural Networks – ISNN 2024; 18th International S Xinyi Le, Zhijun Zhang Conference proceedings 2024 The Editor(s) (if applicabl