Medicare posted on 2025-3-26 23:58:33

Automatic Discovery of Similar Words
…documents, the World Wide Web, and monolingual dictionaries. The underlying goal of these methods is, in general, the automatic discovery of synonyms. This goal, however, is usually too difficult to achieve, since it is often hard to distinguish automatically among synonyms, antonyms, a…

领带 posted on 2025-3-27 01:53:30

Principal Direction Divisive Partitioning with Kernels and k-Means Steering
…thms, specifically k-means and principal direction divisive partitioning (PDDP). Using available theory regarding the solution of the clustering indicator vector problem, we use 2-means to induce partitionings around fixed or varying cut-points. 2-means is applied either on the data or over its proj…
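As a rough illustration of the idea in this abstract, the sketch below performs one PDDP-style split: it projects centered data onto the leading principal direction and cuts either at a fixed point (0, the projected mean) or at a cut-point chosen by 2-means on the projections. This is a minimal sketch under assumptions, not the chapter's method: the function names, the power-iteration solver, and the toy data are all hypothetical, and the kernel variants mentioned in the title are not covered.

```python
import math

def principal_direction(points, iters=200):
    """Center the data and estimate the leading principal direction.

    Uses power iteration on the (unnormalized) covariance. Assumption: the
    uniform start vector is not orthogonal to the leading direction.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = A^T (A v), where A is the centered data matrix
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return centered, v

def two_means_1d(values, iters=50):
    """Lloyd-style 2-means on scalars; returns the midpoint of the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        cut = (lo + hi) / 2.0
        left = [x for x in values if x <= cut]
        right = [x for x in values if x > cut]
        if not left or not right:
            break
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

def pddp_split(points, steer_with_2means=False):
    """Split point indices into two groups along the principal direction."""
    centered, v = principal_direction(points)
    proj = [sum(row[j] * v[j] for j in range(len(v))) for row in centered]
    # Fixed cut-point: 0 (the mean). Varying cut-point: chosen by 2-means.
    cut = two_means_1d(proj) if steer_with_2means else 0.0
    left = [i for i, x in enumerate(proj) if x <= cut]
    right = [i for i, x in enumerate(proj) if x > cut]
    return left, right

# Hypothetical toy data: two well-separated groups in the plane.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
fixed_split = pddp_split(toy_points)
steered_split = pddp_split(toy_points, steer_with_2means=True)
```

Because the sign of a principal direction is arbitrary, which group ends up "left" or "right" carries no meaning; only the partition itself does.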

paradigm posted on 2025-3-27 07:47:43

Hybrid Clustering with Divergences
…memory, one has to compress the dataset to make the application of clustering algorithms possible. The balanced iterative reducing and clustering algorithm (BIRCH) is designed to operate under the assumption that “the amount of memory available is limited, whereas the dataset can be arbitrarily larg…
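To make the compression idea concrete, here is a minimal sketch of a BIRCH-style clustering feature: a cluster is summarized by its point count, linear sum, and sum of squared norms, so two summaries can be merged without revisiting the raw points. The class and method names are hypothetical, and the sketch assumes squared Euclidean distance; the divergence-based variants the chapter title refers to are not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class ClusteringFeature:
    """Summary of a set of points: count, linear sum, sum of squared norms."""
    n: int
    ls: list
    ss: float

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        # Summaries are additive, so merging needs no access to raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root mean squared distance to the centroid: sqrt(SS/N - ||LS/N||^2)
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))

# Hypothetical usage: summarize two points, then read off centroid and radius.
cf = ClusteringFeature.from_point((0.0, 0.0)).merge(
    ClusteringFeature.from_point((2.0, 0.0))
)
```

The fixed-size summary is what lets memory use stay bounded while the dataset grows.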

Infirm posted on 2025-3-27 12:03:47

http://reply.papertrans.cn/89/8827/882668/882668_34.png

dearth posted on 2025-3-27 15:34:40

http://reply.papertrans.cn/89/8827/882668/882668_35.png

Obsequious posted on 2025-3-27 20:16:27

Applications of Semidefinite Programming in XML Document Classification
…a set of textual data according to a predefined logical structure. It has been shown that storing documents with similar structures together can reduce the fragmentation problem and improve query efficiency. Unlike the flat text document, the XML document has no vectorial representation, which is…

轻打 posted on 2025-3-28 01:35:24

Discussion Tracking in Enron Email Using PARAFAC
…a period of one year. For the publicly released Enron electronic mail collection, we encode a sparse term-author-month array for subsequent three-way factorization using the PARAllel FACtors (or PARAFAC) three-way decomposition first proposed by Harshman. Using nonnegative tensors, we preserve natur…
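A sparse term-author-month array can be pictured as a mapping from (term, author, month) coordinates to counts, storing only the nonzero entries. The sketch below is an assumption-laden illustration: the function name, the whitespace tokenizer, raw counts as weights, and the sample messages are all hypothetical and not taken from the chapter; the PARAFAC factorization itself is not shown.

```python
from collections import Counter

def build_term_author_month(messages):
    """Count term occurrences per (term, author, month) coordinate.

    `messages` is an iterable of (author, month, text) tuples. Only nonzero
    entries are stored, and all values are nonnegative counts.
    """
    counts = Counter()
    for author, month, text in messages:
        for term in text.lower().split():
            counts[(term, author, month)] += 1
    return counts

# Hypothetical sample messages, for illustration only.
sample_messages = [
    ("alice", "2001-01", "energy trading energy"),
    ("bob", "2001-01", "trading schedule"),
    ("alice", "2001-02", "schedule"),
]
sparse_array = build_term_author_month(sample_messages)
```

Each key is one coordinate of the three-way array, so the structure can be handed to a tensor library after mapping terms, authors, and months to integer indices.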

商品 posted on 2025-3-28 05:41:14

Spam Filtering Based on Latent Semantic Indexing
…ommercial email (UBE, UCE, commonly called “spam”) is studied. Comparisons to the simple vector space model (VSM) and to the extremely widespread, de facto standard for spam filtering, the SpamAssassin system, are summarized. It is shown that VSM and LSI achieve significantly better classification r…

蜈蚣 posted on 2025-3-28 07:48:13

A Probabilistic Model for Fast and Confident Categorization of Textual Documents
…see Appendix). This entry relies on a straightforward implementation of a probabilistic categorizer described earlier. This categorizer is adapted to handle multiple labeling, and a piecewise-linear confidence estimation layer is added to provide an estimate of the labeling confidence. This…

报复 posted on 2025-3-28 13:56:05

Document Representation and Quality of Text: An Analysis
…apter, we will focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classification. We will also show that the text quality affects the choice of document representation. In our experiments we have used the centroid-ba…
View full version: Titlebook: Survey of Text Mining II; Clustering, Classifi Michael W. Berry, Malu Castellanos Book 2008 Springer-Verlag London 2008 Anomaly Detection.Au