共同确定为确 posted on 2025-3-30 11:59:37
http://reply.papertrans.cn/25/2424/242336/242336_51.png

后退 posted on 2025-3-30 15:53:12
http://reply.papertrans.cn/25/2424/242336/242336_52.png

争吵 posted on 2025-3-30 19:04:01
See and Think: Embodied Agent in Virtual Environment
…hotspot. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE comprises three key components: vision perception, language instruction, and code action. Vision perception involves interpreting visual information in the environment, which…

ALLAY posted on 2025-3-30 21:11:57
http://reply.papertrans.cn/25/2424/242336/242336_54.png

欢乐东方 posted on 2025-3-31 03:15:34
http://reply.papertrans.cn/25/2424/242336/242336_55.png

obscurity posted on 2025-3-31 05:33:31
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
…cascade of vision and language models. The text component can either be extracted explicitly with external OCR models (OCR-based approaches), or the vision model itself can be endowed with reading capabilities (OCR-free approaches). Typically, the queries to the model are input exc…

里程碑 posted on 2025-3-31 10:17:31
Masked Angle-Aware Autoencoder for Remote Sensing Images
…made promising progress. However, they have overlooked the diverse angles of objects in remote sensing (RS) images. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a … operation to create a rotated crop with random orientation on each original…

原谅 posted on 2025-3-31 14:17:06
http://reply.papertrans.cn/25/2424/242336/242336_58.png

OVER posted on 2025-3-31 19:48:25
http://reply.papertrans.cn/25/2424/242336/242336_59.png

Urgency posted on 2025-4-1 01:23:43
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
…various applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, leaving a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of visual scanpath prediction and explanation. This in…
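The GazeXplain abstract describes a scanpath as the where and when of gaze shifts, with explanations as the missing piece. Below is a minimal Python sketch of one way such data could be represented. It assumes each fixation carries a pixel position (where), a duration in milliseconds (when), and an optional explanation string. The `Fixation` class, its fields, and the helper functions are hypothetical illustrations, not GazeXplain's actual data format or API.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    """One fixation in a scanpath (hypothetical representation)."""
    x: float                           # where: horizontal position in pixels
    y: float                           # where: vertical position in pixels
    duration_ms: float                 # when: how long the gaze stays here
    explanation: Optional[str] = None  # natural-language rationale, if any


def saccade_amplitudes(scanpath: List[Fixation]) -> List[float]:
    """Euclidean length of each gaze shift between consecutive fixations."""
    return [math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(scanpath, scanpath[1:])]


def onset_times(scanpath: List[Fixation]) -> List[float]:
    """Start time of each fixation, assuming saccades take zero time."""
    times, t = [], 0.0
    for f in scanpath:
        times.append(t)
        t += f.duration_ms
    return times


def unexplained(scanpath: List[Fixation]) -> List[int]:
    """Indices of fixations that have no explanation attached."""
    return [i for i, f in enumerate(scanpath) if not f.explanation]


if __name__ == "__main__":
    # Illustrative scanpath; positions, durations, and texts are made up.
    path = [
        Fixation(0.0, 0.0, 200.0, "looks at the person"),
        Fixation(3.0, 4.0, 150.0),
        Fixation(3.0, 10.0, 300.0, "reads the sign"),
    ]
    print(saccade_amplitudes(path))  # [5.0, 6.0]
    print(onset_times(path))         # [0.0, 200.0, 350.0]
    print(unexplained(path))         # [1]
```

Deriving onset times by summing durations ignores the time spent in saccades; an eye-tracking record would typically store timestamps directly instead.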