名字 posted on 2025-3-23 12:57:34

Learning to Separate Text Content and Style for Classification
...student” or “faculty”, or according to the source universities, such as “Cornell” or “Texas”. We call one kind of label the content and the other kind the style. Given a set of documents, each with both content and style labels, we seek to effectively learn to classify a set of documents in a new styl

botany posted on 2025-3-23 17:48:32

Using Relative Entropy for Authorship Attribution
...but none of these approaches is particularly satisfactory; some of them are ad hoc and most have defects in terms of scalability, effectiveness, and efficiency. In this paper, we propose a principled approach motivated by information theory to identify authors based on elements of writing style. W

humectant posted on 2025-3-23 18:32:51

Efficient Query Evaluation Through Access-Reordering
...eness. In this paper we extend access-ordering and introduce a variant index organisation technique that we label access-reordering. We show that by access-reordering an inverted index, query evaluation time can be reduced by as much as 62% over the standard approach, while yielding highly similar e

Limousine posted on 2025-3-24 00:50:55

Natural Document Clustering by Clique Percolation in Random Graphs
...racteristics of clusters, and/or the probability distribution of clustered data. As a result, the clustering effects tend to be unnatural and stray, more or less, from the intrinsic grouping nature among the documents in a corpus. We propose a novel graph-theoretic technique called . (CPC). It mo

countenance posted on 2025-3-24 06:22:01

Text Clustering with Limited User Feedback Under Local Metric Learning
...sters into text clustering. For the modeling of each cluster, we make use of a local weight metric to reflect the importance of the features for a particular cluster. The local weight metric is learned using both the unlabeled data and the constraints generated automatically from user feedback and

ARCH posted on 2025-3-24 09:14:56

http://reply.papertrans.cn/47/4653/465207/465207_16.png

不能妥协 posted on 2025-3-24 12:10:23

http://reply.papertrans.cn/47/4653/465207/465207_17.png

细微的差异 posted on 2025-3-24 15:57:26

Statistical Behavior Analysis of Smoothing Methods for Language Models of Mandarin Data Sets
...ethod will be used on three language models in Mandarin data sets. Because of the problem of data sparseness, smoothing methods are employed to estimate the probability for each event (including all the seen and unseen events) in a language model. A set of properties used to analyze the statistical

Canary posted on 2025-3-24 22:20:28

No Tag, a Little Nesting, and Great XML Keyword Search
...formation. As XML data becomes more and more widespread, the trend of adapting keyword search to XML data also becomes more and more active. In this paper, we first try a nesting mechanism for XML keyword search, which uses only a little nesting. This attempt has several benefits. For example, i

圆柱 posted on 2025-3-25 00:59:26

http://reply.papertrans.cn/47/4653/465207/465207_20.png
View full version: Titlebook: Information Retrieval Technology; Third Asia Informati Hwee Tou Ng,Mun-Kew Leong,Donghong Ji Conference proceedings 2006 Springer-Verlag Be