纯朴 posted on 2025-3-30 10:02:07

ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images, conditioned by textual descriptions. Existing diffusion models exhibit limitations of spatial perception in complex real-world scenes, relying on additional modalities to enforce constraints, and TOG imposes heightened challenges on scene comprehension under the weak supervision of linguistic infor

Anhydrous posted on 2025-3-30 14:03:37

http://reply.papertrans.cn/25/2424/242348/242348_52.png

entitle posted on 2025-3-30 16:39:41

Region-Adaptive Transform with Segmentation Prior for Image Compression, s transform methods for compression. However, there is no prior research on neural transform that focuses on specific regions. In response, we introduce the class-agnostic segmentation masks (i.e., semantic masks without category labels) for extracting region-adaptive contextual information. Our propose

spondylosis posted on 2025-3-30 21:14:03

http://reply.papertrans.cn/25/2424/242348/242348_54.png

aspect posted on 2025-3-31 03:36:08

: Spuriousness Mitigation with Minimal Human Annotations, orld scenarios where such correlations do not hold. Despite the increasing research effort, existing solutions often face two main challenges: they either demand substantial annotations of spurious attributes, or they yield less competitive outcomes with expensive training when additional annotation

决定性 posted on 2025-3-31 06:53:50

http://reply.papertrans.cn/25/2424/242348/242348_56.png

衣服 posted on 2025-3-31 12:16:28

http://reply.papertrans.cn/25/2424/242348/242348_57.png

IVORY posted on 2025-3-31 16:09:01

http://reply.papertrans.cn/25/2424/242348/242348_58.png

唤起 posted on 2025-3-31 20:11:08

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection, ded natural language within untrimmed videos. Despite that they focus on different events, we observe they have a significant connection. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we aim to investigate the potential synergy between TAD and MR. Firstly, w

格子架 posted on 2025-3-31 23:14:31

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection, d modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at
View full version: Titlebook: Computer Vision – ECCV 2024; 18th European Confer Aleš Leonardis, Elisa Ricci, Gül Varol Conference proceedings 2025 The Editor(s) (if applic