Dri727 posted on 2025-3-30 09:00:49
http://reply.papertrans.cn/29/2849/284811/284811_51.png
小教堂 posted on 2025-3-30 12:32:55
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
… ‘content module’ designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP’s text and image features using a novel ‘coupled-contrastive’ loss. Our module improves CLIP’s ZSL top-1 accuracy by 6.7% and GZSL harmonic m…
pulse-pressure posted on 2025-3-30 18:02:50
http://reply.papertrans.cn/29/2849/284811/284811_53.png
Heart-Attack posted on 2025-3-30 22:38:57
Are Layout Analysis and OCR Still Useful for Document Information Extraction Using Foundation Models?
… food label, and a small crop focusing on the relevant nutrition information. Comparative experiments are also conducted on the CORD database of receipts. Our results demonstrate that although OCR-free models achieve remarkable performance, they still require some guidance regarding the layout, and…
去才蔑视 posted on 2025-3-31 01:23:53
…: Knowledge Distillation for Visually-Rich Document Applications
…ess of distilled DLA models on zero-shot layout-aware document visual question answering (DocVQA). DLA-KD experiments result in a large mAP knowledge gap, which unpredictably translates to downstream robustness, accentuating the need to further explore how to efficiently obtain more semantic documen…
媒介 posted on 2025-3-31 09:05:02
http://reply.papertrans.cn/29/2849/284811/284811_56.png
ellagic-acid posted on 2025-3-31 11:13:33
http://reply.papertrans.cn/29/2849/284811/284811_57.png
canonical posted on 2025-3-31 13:23:35
Global-SEG: Text Semantic Segmentation Based on Global Semantic Pair Relations
… from large language models and consider the positional information of text within the document to assess their efficacy in augmenting semantics. We test our model with both contemporary and historical corpora, and the results demonstrate that our approach outperforms benchmarks on each dataset.