摆动 posted on 2025-3-30 09:34:00
http://reply.papertrans.cn/29/2849/284809/284809_51.png
种类 posted on 2025-3-30 12:35:46
achine learning, and information retrieval. While many existing TSR methods employ transformer-based models with generally impressive performance, a gap remains in transformer models specifically designed to handle the distinct attributes of table rows and columns. Moreover, there is a lack of robus
被诅咒的人 posted on 2025-3-30 17:40:47
cument regions under limited data conditions. The LD-DOC model effectively utilizes information from various scale visual features, enhancing its adaptability to feature distributions in scenarios with limited data and thereby improving the accuracy of document region partitioning. Specifically, our
推测 posted on 2025-3-30 20:49:05
http://reply.papertrans.cn/29/2849/284809/284809_54.png
障碍物 posted on 2025-3-31 04:34:12
he classification of book genres using text design on book covers. Text images have both semantic information about the word itself and other information (non-semantic information or visual design), such as font style, character color, etc. When we read a word printed on some materials, we receive i
Minutes posted on 2025-3-31 07:49:49
ssification is impractical for large collections due to its labor-intensive and error-prone nature. To address this, we propose a representational learning strategy that integrates semantic segmentation and deep learning models such as ResNet, CLIP, Document Image Transformer (DiT), and masked auto-