统治人类 posted on 2025-4-1 03:08:38
Representation and Granularity Joint Alignment Framework for Multimodal Sarcasm Detection on Social ...
... ent modalities into a unified granularity. Moreover, bidirectional cross-modal attention is adopted to aggregate global and local features for capturing comprehensive inconsistencies. Extensive experiments on a benchmark dataset demonstrate the superiority of the proposed framework. Our source code ...

Affectation posted on 2025-4-1 06:10:31
http://reply.papertrans.cn/29/2845/284469/284469_62.png