共同确定为确 posted on 2025-3-30 11:59:37

http://reply.papertrans.cn/25/2424/242336/242336_51.png

后退 posted on 2025-3-30 15:53:12

http://reply.papertrans.cn/25/2424/242336/242336_52.png

争吵 posted on 2025-3-30 19:04:01

See and Think: Embodied Agent in Virtual Environment, hotspot. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE comprises three key components: vision perception, language instruction, and code action. Vision perception involves interpreting visual information in the environment, which

ALLAY posted on 2025-3-30 21:11:57

http://reply.papertrans.cn/25/2424/242336/242336_54.png

欢乐东方 posted on 2025-3-31 03:15:34

http://reply.papertrans.cn/25/2424/242336/242336_55.png

obscurity posted on 2025-3-31 05:33:31

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding, cade of vision and language models. The text component can either be extracted explicitly with the use of external OCR models in OCR-based approaches, or alternatively, the vision model can be endowed with reading capabilities in OCR-free approaches. Typically, the queries to the model are input exc

里程碑 posted on 2025-3-31 10:17:31

Masked Angle-Aware Autoencoder for Remote Sensing Images, ade promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a . operation to create the rotated crop with random orientation on each original

原谅 posted on 2025-3-31 14:17:06

http://reply.papertrans.cn/25/2424/242336/242336_58.png

OVER posted on 2025-3-31 19:48:25

http://reply.papertrans.cn/25/2424/242336/242336_59.png

Urgency posted on 2025-4-1 01:23:43

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths, arious applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, creating a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of visual scanpath prediction and explanation. This in
View full version: Titlebook: Computer Vision – ECCV 2024; 18th European Confer Aleš Leonardis, Elisa Ricci, Gül Varol Conference proceedings 2025 The Editor(s) (if applic