纯朴 posted on 2025-3-30 10:02:07
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
... conditioned by textual descriptions. Existing diffusion models exhibit limitations of spatial perception in complex real-world scenes, relying on additional modalities to enforce constraints, and TOG imposes heightened challenges on scene comprehension under the weak supervision of linguistic infor...
Anhydrous posted on 2025-3-30 14:03:37
http://reply.papertrans.cn/25/2424/242348/242348_52.png
entitle posted on 2025-3-30 16:39:41
Region-Adaptive Transform with Segmentation Prior for Image Compression
...s transform methods for compression. However, there is no prior research on neural transforms that focus on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) for extracting region-adaptive contextual information. Our propose...
spondylosis posted on 2025-3-30 21:14:03
http://reply.papertrans.cn/25/2424/242348/242348_54.png
aspect posted on 2025-3-31 03:36:08
...: Spuriousness Mitigation with Minimal Human Annotations
...orld scenarios where such correlations do not hold. Despite the increasing research effort, existing solutions often face two main challenges: they either demand substantial annotations of spurious attributes, or they yield less competitive outcomes with expensive training when additional annotation...
决定性 posted on 2025-3-31 06:53:50
http://reply.papertrans.cn/25/2424/242348/242348_56.png
衣服 posted on 2025-3-31 12:16:28
http://reply.papertrans.cn/25/2424/242348/242348_57.png
IVORY posted on 2025-3-31 16:09:01
http://reply.papertrans.cn/25/2424/242348/242348_58.png
唤起 posted on 2025-3-31 20:11:08
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
...ded natural language within untrimmed videos. Although they focus on different events, we observe that they are significantly connected. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we aim to investigate the potential synergy between TAD and MR. Firstly, w...
格子架 posted on 2025-3-31 23:14:31
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
...d modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at...