THROB posted on 2025-3-26 23:49:28
Uncertainty Calibration with Energy Based Instance-Wise Scaling in the Wild Dataset
...struggle to accurately estimate uncertainty when processing inputs drawn from the wild dataset. To address this issue, we introduce a novel instance-wise calibration method based on an energy model. Our method incorporates energy scores instead of softmax confidence scores, allowing for adaptive cons...
有恶意 posted on 2025-3-27 02:44:15
http://reply.papertrans.cn/25/2424/242348/242348_32.png
Silent-Ischemia posted on 2025-3-27 06:02:30
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
...ce the mutual benefits between TAD and MR. Extensive experiments demonstrate that the proposed task fusion learning scheme enables the two tasks to help each other and outperform the separately trained counterparts. Impressively, UniMD achieves state-of-the-art results on three paired datasets Ego4D, Ch...
extrovert posted on 2025-3-27 11:15:52
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
With the proposed encoder layer and DyHead, a new dynamic TAD model, DyFADet, achieves promising performance on a series of challenging TAD benchmarks, including HACS-Segment, THUMOS14, ActivityNet-1.3, Epic-Kitchen 100, Ego4D-Moment QueriesV1.0, and FineAction. Code is released to ..
perpetual posted on 2025-3-27 17:03:00
http://reply.papertrans.cn/25/2424/242348/242348_35.png
Ankylo- posted on 2025-3-27 20:28:32
https://doi.org/10.1007/978-3-540-37652-1
mastopexy posted on 2025-3-27 23:12:34
Colin L Masters, Konrad Beyreuther
最有利 posted on 2025-3-28 05:07:50
M²Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
...ures to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on nuScenes and DDAD benchmarks show M²Depth achieves state-of-the-art performance. More results can be found in ..
奇思怪想 posted on 2025-3-28 10:10:55
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
...empowers existing frameworks to support hour-long videos and pushes their upper limit with an extra context token. It is demonstrated to surpass previous methods on most of video- or image-based benchmarks. Code and models are available at ..
防止 posted on 2025-3-28 13:04:31
http://reply.papertrans.cn/25/2424/242348/242348_40.png