懒惰人民 posted on 2025-3-28 17:03:54
Factorizing Text-to-Video Generation by Explicit Image Conditioning
... Nvidia’s PYOCO, and [...] vs. Meta’s Make-A-Video. Our model outperforms commercial solutions such as RunwayML’s Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user’s text prompt, where our generations are preferred [...] over prior work.
提名的名单 posted on 2025-3-28 22:49:32
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
... the base model. Empirical studies, conducted both quantitatively and qualitatively, demonstrate the effectiveness of our proposed technologies. With them, MobileDiffusion achieves instant text-to-image generation on mobile devices, establishing a new state of the art.
NEX posted on 2025-3-29 01:00:39
Generating Physically Realistic and Directable Human Motions from Multi-modal Inputs
...on. For example, the input may come from a VR controller providing arm motion and body velocity, partial key-point animation, computer vision applied to videos, or even higher-level motion goals. This requires a versatile low-level humanoid controller that can handle such sparse, under-specified gui...
PLIC posted on 2025-3-29 03:22:14
CoTracker: It Is Better to Track Together
...oaches that track points independently, CoTracker tracks them jointly, accounting for their dependencies. We show that joint tracking significantly improves tracking accuracy and robustness, and allows CoTracker to track occluded points and points outside of the camera view. We also introduce severa...
名字 posted on 2025-3-29 10:44:15
http://reply.papertrans.cn/25/2424/242330/242330_45.png
他日关税重重 posted on 2025-3-29 13:05:53
http://reply.papertrans.cn/25/2424/242330/242330_46.png
Ingest posted on 2025-3-29 15:41:06
Improving Adversarial Transferability via Model Alignment
... alignment technique aimed at improving a given source model’s ability to generate transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measures the divergence in the predictions between the ...
来这真柔软 posted on 2025-3-29 23:29:34
http://reply.papertrans.cn/25/2424/242330/242330_48.png
Exuberance posted on 2025-3-30 00:51:18
ADen: Adaptive Density Representations for Sparse-View Camera Pose Estimation
...structions. Classic methods often depend on feature correspondence, such as keypoints, which requires the input images to have large overlap and small viewpoint changes. Such requirements present considerable challenges in scenarios with sparse views. Recent data-driven approaches aim to directly out...
欲望 posted on 2025-3-30 04:49:02
Embodied Understanding of Driving Scenarios
...standing is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain, devoid of spatial awareness and long-horizon extrapolation proficiencies. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. Hereby, we intr...