STAT posted on 2025-3-23 13:33:56
Innere Welten und äußere Realitäten, e solution for explainable human visual scanpath prediction. Extensive experiments on diverse eye-tracking datasets demonstrate the effectiveness of GazeXplain in both scanpath prediction and explanation, offering valuable insights into human visual attention and cognitive processes.

节省 posted on 2025-3-23 17:00:38
Vom Älterwerden des Psychoanalytikers, to counterfactual scenarios. This enables LVLMs to explicitly reason step-by-step rather than relying on biased knowledge, leading to more generalizable solutions. Our extensive evaluation demonstrates that CoCT outperforms existing approaches on tasks requiring reasoning under knowledge bias. Our

STING posted on 2025-3-23 21:56:06
Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs, first self-supervised tracker to achieve competitive performance on MOT17, DanceTrack, and BDD100K. Remarkably, our proposal outperforms the previous self-supervised trackers even when drastically reducing the annotation requirements by up to 400×.

付出 posted on 2025-3-24 00:18:55
Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition, e comprises individual-to-global and individual-to-social paths, mutually reinforcing each other's task with global-local context through multiple layers. Through extensive experiments, we validate the effectiveness of the spatio-temporal proximity among individuals and the dual-path architecture in

acquisition posted on 2025-3-24 02:21:34
http://reply.papertrans.cn/25/2424/242336/242336_15.png

adroit posted on 2025-3-24 10:06:46
FSD-BEV: Foreground Self-distillation for Multi-view 3D Object Detection, ome distillation strategies. Additionally, we design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds by frame combination and pseudo point assignment. Finally, we develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scal

handle posted on 2025-3-24 12:08:22
MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?, addition, we propose a Chain-of-Thought (CoT) evaluation strategy for a fine-grained assessment of the output answers. Rather than naively judging true or false, we employ GPT-4(V) to adaptively assess each step with error analysis to derive a total score, which can reveal the inner CoT reasoning q

MAPLE posted on 2025-3-24 16:44:09
See and Think: Embodied Agent in Virtual Environment, wledge question-answering pairs, and 200+ skill-code pairs. We conduct continuous block search, knowledge question and answering, and tech tree mastery to evaluate the performance. Extensive experiments show that STEVE achieves at most 1.5× faster unlocking key tech trees and 2.5× quicker in block s

预防注射 posted on 2025-3-24 19:57:38
http://reply.papertrans.cn/25/2424/242336/242336_19.png

油毡 posted on 2025-3-24 23:25:32
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding, ure enhancements with a novel pre-training task, using language masking on a snippet of the document text fed to the visual encoder in place of the prompt, to empower the model with focusing capabilities. Consequently, VisFocus learns to allocate its attention to text patches pertinent to the provid