纯朴
Posted on 2025-3-30 10:02:07
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images. … conditioned by textual descriptions. Existing diffusion models exhibit limited spatial perception in complex real-world scenes, relying on additional modalities to enforce constraints, and TOG imposes heightened challenges on scene comprehension under the weak supervision of linguistic infor…
Anhydrous
Posted on 2025-3-30 14:03:37
http://reply.papertrans.cn/25/2424/242348/242348_52.png
entitle
Posted on 2025-3-30 16:39:41
Region-Adaptive Transform with Segmentation Prior for Image Compression. …s transform methods for compression. However, no prior research on neural transforms focuses on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) for extracting region-adaptive contextual information. Our propose…
spondylosis
Posted on 2025-3-30 21:14:03
http://reply.papertrans.cn/25/2424/242348/242348_54.png
aspect
Posted on 2025-3-31 03:36:08
…: Spuriousness Mitigation with Minimal Human Annotations. …orld scenarios where such correlations do not hold. Despite increasing research effort, existing solutions often face two main challenges: they either demand substantial annotations of spurious attributes, or they yield less competitive outcomes with expensive training when additional annotation…
决定性
Posted on 2025-3-31 06:53:50
http://reply.papertrans.cn/25/2424/242348/242348_56.png
衣服
Posted on 2025-3-31 12:16:28
http://reply.papertrans.cn/25/2424/242348/242348_57.png
IVORY
Posted on 2025-3-31 16:09:01
http://reply.papertrans.cn/25/2424/242348/242348_58.png
唤起
Posted on 2025-3-31 20:11:08
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection. …ded natural language within untrimmed videos. Although they focus on different events, we observe that they have a significant connection. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we investigate the potential synergy between TAD and MR. Firstly, w…
格子架
Posted on 2025-3-31 23:14:31
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection. …d modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the success of dynamic neural networks, in this paper we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at…