Titlebook: Data Cleaning; Venkatesh Ganti,Anish Das Sarma Book 2013 Springer Nature Switzerland AG 2013

Original poster: irritants
Posted on 2025-3-25 06:36:43
Operator: Clustering,at the records in a group be closer to each other, especially to each other than to records in other groups. A custom deduplication task may require that other constraints beyond similarity be satisfied as well. However, closeness to each other by textual similarity is a critical predicate, which ne
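As a rough illustration of grouping records by textual closeness, here is a minimal sketch. It is not the book's clustering operator: it assumes word-level Jaccard similarity and simply merges every pair of records whose similarity reaches a threshold (connected components via union-find). The names `cluster_records` and `jaccard`, the threshold, and the sample records are all made up for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_records(records, threshold):
    """Group record indices whose token sets are linked by similarity >= threshold.

    Assumption for illustration: linking is transitive, so this computes
    connected components of the "similar enough" graph.
    """
    token_sets = [set(r.lower().split()) for r in records]
    parent = list(range(len(records)))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root of the merged group.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sample data.
records = [
    "John Smith Seattle",
    "John Smith Seattle WA",
    "Jon Smith Seattle WA",
    "Mary Jones Boston",
]
clusters = cluster_records(records, threshold=0.5)
print(clusters)
```

Note that records 0 and 2 end up in the same group even though their direct similarity is below the threshold, because both are close to record 1. A stricter notion of "closer to each other than to records in other groups" would need an extra check on top of this.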
Posted on 2025-3-25 09:08:23
Operator: Parsing, records into a target data warehouse often requires the reconciliation of schema of the input records and that of the target records. The process of reconciliation would often involve “segmenting” a column of an input record into multiple target columns. The segmented input records may then be comp
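To make the "segmenting a column into multiple target columns" idea concrete, here is a small sketch. It assumes a single free-text address column laid out as `<street>, <city>, <STATE> <zip>`. That layout, the target column names, and the function `segment_address` are assumptions for illustration only; they are not taken from the book.

```python
import re

# Hypothetical input layout: "<street>, <city>, <STATE> <zip>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})\s*$"
)


def segment_address(raw):
    """Split one raw address string into target columns.

    Returns a dict with street, city, state, zip, or None when the
    input does not follow the assumed layout.
    """
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    fields = match.groupdict()
    # Normalize the state code so later comparisons are case-insensitive.
    fields["state"] = fields["state"].upper()
    return fields


parsed = segment_address("123 Main St, Seattle, wa 98101")
print(parsed)
```

Returning `None` for rows that do not fit keeps unparseable records visible, so they can be routed to a different rule or to manual review instead of being silently mangled.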
Posted on 2025-3-25 11:49:33
Task: Record Matching,esent the same real-world entity, often referred to as “matching.” This important task needs to be solved while importing new customer sales records into the customer relation in a data warehouse. The customer records in the incoming sales need to be matched with existing customers to avoid subseque
Posted on 2025-3-25 16:01:17
Posted on 2025-3-25 23:27:11
Conclusion,mponents of the technology. The goals of data cleaning technology in typical enterprise scenarios, as illustrated by the examples in customer and product databases, are to maintain the quality and consistency of data as the data warehouse is either being populated with data for the first time or bein
Posted on 2025-3-26 00:46:37
Posted on 2025-3-26 04:45:48
Operator: Similarity Join,ied by a textual similarity function which compares the content of the two records. There are a variety of common similarity functions as discussed in the previous chapter. As in record matching, the deduplication task typically involves many predicates. However, a critical one is often based on textual similarity between records.
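A textual similarity join can be sketched in a few lines. The version below assumes word-level Jaccard similarity as the similarity function and uses a plain nested loop over all pairs. The function names, threshold, and sample records are hypothetical, and the book's operator may well use a different similarity function and a far more efficient evaluation strategy.

```python
def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(left, right, threshold):
    """Return (i, j, score) for every pair with Jaccard score >= threshold.

    Naive nested loop for clarity; it compares every left record
    with every right record.
    """
    left_tokens = [tokens(r) for r in left]
    right_tokens = [tokens(r) for r in right]
    matches = []
    for i, lt in enumerate(left_tokens):
        for j, rt in enumerate(right_tokens):
            score = jaccard(lt, rt)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Hypothetical sample data.
incoming = ["Microsoft Corp Redmond WA", "Acme Inc Springfield"]
existing = ["Microsoft Corp Redmond", "Acme Incorporated Springfield IL"]
pairs = similarity_join(incoming, existing, threshold=0.5)
print(pairs)
```

Here only the first incoming record joins with an existing one. The two "Acme" records share just two of five distinct tokens, so they fall below the 0.5 threshold, which shows why the choice of similarity function and threshold matters for matching and deduplication.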
Posted on 2025-3-26 08:51:15
Data Cleaning Scripts,erators as well as other predicates, which are required for the specific data and domain being considered. Thus, the development of custom data cleaning scripts is expected to be flexible, easy, and efficient all at the same time.
Posted on 2025-3-26 13:48:30
Posted on 2025-3-26 20:10:59