CESS posted on 2025-4-1 05:18:52
An Automated Pipeline for Robust Image Processing and Optical Character Recognition of Historical D..., ...rnates mostly between Russian and Ukrainian, but other languages also occur. The paper focuses mainly on segmentation, document type classification, and image preprocessing of the scanned documents; the output of those methods is then passed to off-the-shelf OCR software, and a baseline performance is evaluated on a simplified OCR task.
为敌 posted on 2025-4-1 09:35:13
Conference proceedings 2020, ...multimedia processing, human-machine interaction, deep learning for audio processing, computational paralinguistics, affective computing, speech and language resources, speech translation systems, text mining and sentiment analysis, voice assistants, etc. Due to the Corona pandemic, SPECOM 2020 was held as a virtual event.
加花粗鄙人 posted on 2025-4-1 13:55:30
http://reply.papertrans.cn/88/8741/874048/874048_63.png
有机体 posted on 2025-4-1 14:42:56
Hate Speech Detection Using Transformer Ensembles on the HASOC Dataset, ...rpose of dehumanizing, defaming or threatening individuals and marginalized groups not only threatens the mental health of its targets, as well as their democratic access to the Internet, but also the fabric of our society. Because of this, much effort has been devoted to manual moderation. The amou...
Allergic posted on 2025-4-1 19:21:47
MP3 Compression to Diminish Adversarial Noise in End-to-End Speech Recognition, ...on. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with a new variant of the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. The MP3’s effe...
Antigen posted on 2025-4-2 00:16:00
Exploration of End-to-End ASR for OpenSTT – Russian Open Speech-to-Text Dataset, ...enSTT. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with the strong hybrid ASR system based on LF-MMI TDNN-F acoustic model. For the three available validation sets (phone calls, YouTube, and books), our b...