CHAFE posted on 2025-4-1 05:46:49
PhonHuBERT: A Phoneme Transcription Tool for Song Datasets
…Voice Synthesis (SVS) systems, which led to an increase in the demand for accurately labeled data. In response to this need, this work introduces an Aligned Phoneme Sequence Transcription (APST) model for automatic song dataset annotation, called PhonHuBERT. This model uses HuBERT, a pre-trained s…

短程旅游 posted on 2025-4-1 06:16:18
Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data
…es of large language models to comprehend audio data. Our task entails introducing an encoding method that effectively transforms audio data into embedded representations, enabling LLMs to comprehend and process the information contained within the audio. By undergoing a series of fine-tuning stages…

Leisureliness posted on 2025-4-1 11:20:32
http://reply.papertrans.cn/17/1673/167293/167293_63.png

Somber posted on 2025-4-1 16:55:39
http://reply.papertrans.cn/17/1673/167293/167293_64.png

Throttle posted on 2025-4-1 20:54:55
http://reply.papertrans.cn/17/1673/167293/167293_65.png