新娘 posted on 2025-3-30 12:05:15
Methods for Collection and Evaluation of Comparable Documents
... SMT) but for other applications as well, e.g. the extraction of paraphrases. The potential value of such corpora requires efficient and effective methods for gathering and evaluating them. Most of these methods have been tested in retrieving document pairs for well-resourced languages; however, there ...
Observe posted on 2025-3-30 14:25:35
Measuring the Distance Between Comparable Corpora Between Languages
... s still art rather than proper science. Here I will discuss attempts at approximating the content of corpora collected from the Web using various methods, also in comparison to traditional corpora, such as the BNC. The procedure for estimating the corpus composition is based on selecting keywords, f ...
gastritis posted on 2025-3-30 19:15:43
Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality
... We first develop a measure which can capture different comparability levels. This measure correlates very well with gold-standard comparability levels and is relatively robust to dictionary coverage. We then propose a well-founded algorithm to improve the quality, in terms of comparability scores, ...
破译 posted on 2025-3-30 21:47:40
http://reply.papertrans.cn/20/1919/191857/191857_54.png
Paradox posted on 2025-3-31 03:18:22
Comparable Multilingual Patents as Large-Scale Parallel Corpora
... al. In this chapter, we explore a new but important area involving patents by investigating the potential of cultivating large-scale parallel corpora from comparable multilingual patents. Two major issues are investigated on multilingual patents: (1) How to build large-scale corpora of comparable pa ...
吞下 posted on 2025-3-31 08:29:45
Extracting Parallel Phrases from Comparable Data
... LP applications. Even if two comparable documents have few or no parallel sentence pairs, there is still potential for parallelism at the sub-sentential level. The ability to detect these phrases creates a valuable resource, especially for low-resource languages. In this chapter we explore three phr ...
动机 posted on 2025-3-31 11:28:56
http://reply.papertrans.cn/20/1919/191857/191857_57.png
Comedienne posted on 2025-3-31 15:26:46
Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora
... en two comparable corpora in the same language and the same domain, but displaying two different discourse types (lay and specialized), specific paraphrases can be spotted which provide a dimension along which these discourse types can be contrasted. Detecting such paraphrases in comparable corpora ...
WAG posted on 2025-3-31 18:32:59
http://reply.papertrans.cn/20/1919/191857/191857_59.png