沉积物 posted on 2025-4-1 03:25:35
http://reply.papertrans.cn/24/2342/234130/234130_61.png

ADJ posted on 2025-4-1 08:33:28
https://doi.org/10.1007/978-1-349-25899-4

…where each glimpse denotes an attention map. SOMA adopts multi-glimpse attention so that different glimpses can focus on different content in the image. By projecting the multi-glimpse outputs and the question feature into a shared embedding space, SOMA constructs an explicit second-order feature to model the interaction on both th…
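The post describes SOMA only in prose, so here is a minimal plain-Python sketch of the two ideas it names, under stated assumptions. Each glimpse is modeled as a softmax attention map over N image-region features, and its output is the attention-weighted sum of those regions. The second-order feature is taken here as the flattened outer product of each projected glimpse output with the projected question vector. The scoring rule (`w_g · (r_i * q)`, elementwise product), the projection matrices `proj_v`/`proj_q`, and all function names are hypothetical illustrations, not the paper's actual formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def multi_glimpse_attention(regions: List[Vector], question: Vector,
                            glimpse_weights: List[Vector]
                            ) -> Tuple[List[Vector], List[Vector]]:
    """Return (attention_maps, glimpse_outputs), one entry per glimpse.

    Hypothetical scoring: score_i = sum_k w_g[k] * r_i[k] * q[k].
    Assumes the question vector has the same dimension D as each region.
    """
    dim = len(regions[0])
    maps, outputs = [], []
    for w in glimpse_weights:
        scores = [sum(wk * rk * qk for wk, rk, qk in zip(w, r, question))
                  for r in regions]
        alpha = softmax(scores)  # attention map over the N regions
        pooled = [sum(a * r[k] for a, r in zip(alpha, regions))
                  for k in range(dim)]  # attention-weighted sum of regions
        maps.append(alpha)
        outputs.append(pooled)
    return maps, outputs


def second_order_feature(glimpse_outputs: List[Vector], question: Vector,
                         proj_v: Matrix, proj_q: Matrix) -> Vector:
    """Project glimpses and question into a shared K-dim space, then
    concatenate the flattened K x K outer product for every glimpse."""
    q_k = matvec(proj_q, question)
    feature: Vector = []
    for g in glimpse_outputs:
        v_k = matvec(proj_v, g)
        # every pairwise product v_k[i] * q_k[j]: an explicit second-order term
        feature.extend(vi * qj for vi in v_k for qj in q_k)
    return feature


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # N=3 regions, D=2
    question = [1.0, 2.0]
    glimpse_weights = [[1.0, 0.0], [0.0, 1.0]]  # G=2 glimpses
    identity = [[1.0, 0.0], [0.0, 1.0]]  # K=2 shared space
    maps, outs = multi_glimpse_attention(regions, question, glimpse_weights)
    feat = second_order_feature(outs, question, identity, identity)
    print(maps)
    print(len(feat))
```

With G glimpses and a shared dimension K, this feature has G * K * K entries. The outer product is the simplest explicit second-order interaction; real models often replace it with a low-rank bilinear pooling to keep the size manageable.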