GROG posted on 2025-3-23 13:05:56

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Lang…, …nowledge while preserving the zero-shot capabilities of pre-trained VLMs. Extensive experiments on benchmark datasets demonstrate that our framework performs favorably against state-of-the-art continual learning approaches in preventing catastrophic forgetting and zero-shot degradation. Project page: ..

抛射物 posted on 2025-3-23 16:02:40

SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging, …introduced which enjoys privileges from previous optical flow, selection masks and initial prediction. Moreover, to facilitate learning on samples with large motion, a new window partition cropping method is presented during training. Experiments on public and newly developed challenging datasets sh

AMBI posted on 2025-3-23 21:48:23

Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving, …ric to assess chain-based reasoning performance in autonomous systems, addressing the reasoning ambiguities of existing metrics such as BLEU and CIDEr. Based on the proposed benchmark, we conduct experiments to assess various existing VLMs, revealing insights into their reasoning capabilities. Addit

shrill posted on 2025-3-23 23:41:50

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models, …ining efficiency, we design a novel fine-tuning framework named Omniview-Tuning (OVT). Specifically, OVT introduces a Cross-Viewpoint Alignment objective through a minimax-like optimization strategy, which effectively aligns representations of identical objects from diverse viewpoints without causin

地名表 posted on 2025-3-24 04:32:50

http://reply.papertrans.cn/25/2424/242306/242306_15.png

轻率看法 posted on 2025-3-24 07:10:18

Soziales – Vom Sinn des Zusammen Seins, …Network (BGAN) that learns to predict the constructed correction biases, which can be utilized to correct the original predictions from coarse-grained relationships to fine-grained ones. The extensive experimental results on VG, GQA, and VG-1800 datasets demonstrate that our SBG outperforms the sta

PALSY posted on 2025-3-24 14:32:55

https://doi.org/10.1007/978-3-662-63158-4 …e FID score to 4.37. It is noteworthy that our sampling strategy sufficiently closes the gap between GANs and one-step diffusion models (.., with FID 4.02) under comparable model size. Code is available at ..

intertwine posted on 2025-3-24 16:08:39

Theoretischer Hintergrund der Untersuchung, …eraging large-scale language, vision-language, and vision-motion data to assist motion-related generation tasks, MotionChain thus comprehends each instruction in multi-turn conversation and generates human motions followed by these prompts. Extensive experiments validate the efficacy of MotionChain,

JIBE posted on 2025-3-24 20:00:20

http://reply.papertrans.cn/25/2424/242306/242306_19.png

尽管 posted on 2025-3-25 00:50:06

A. Koocheki, B. Lalegani, S. A. Hosseini, …a diffusion decoder conditioned on the representations extracted by a semantic encoder. Random masking is applied to encoder inputs to introduce an information bottleneck and remove redundancy of skeletons. Furthermore, we theoretically demonstrate that our generative objective involves the contrast
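The fragment above mentions random masking of encoder inputs for skeleton data. As a rough illustration only, here is a minimal sketch of joint-level random masking. The function name `random_mask_joints`, the `mask_ratio` and `mask_value` parameters, and the choice to mask whole joints are all assumptions made for this example; the excerpt does not say how the authors actually implement masking.

```python
import random


def random_mask_joints(skeleton, mask_ratio=0.5, mask_value=0.0, rng=None):
    """Return (masked_copy, masked_indices) for one skeleton frame.

    `skeleton` is a list of joints, each joint a list of coordinates.
    Hypothetical helper: whole joints are replaced by `mask_value`,
    which is only one of several plausible masking schemes.
    """
    rng = rng or random.Random()
    num_joints = len(skeleton)
    # Floor the count so the number of masked joints never exceeds the ratio.
    num_masked = int(num_joints * mask_ratio)
    masked = set(rng.sample(range(num_joints), num_masked))
    masked_copy = [
        [mask_value] * len(joint) if i in masked else list(joint)
        for i, joint in enumerate(skeleton)
    ]
    return masked_copy, sorted(masked)


if __name__ == "__main__":
    frame = [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]
    masked_frame, idx = random_mask_joints(frame, mask_ratio=0.5, rng=random.Random(0))
    print("masked joints:", idx)
    print(masked_frame)
```

The masked frame would then be fed to the encoder in place of the full input, so the encoder has to work from partial joint information.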
View full version: Titlebook: Computer Vision – ECCV 2024; 18th European Confer Aleš Leonardis,Elisa Ricci,Gül Varol Conference proceedings 2025 The Editor(s) (if applic