Frost: Benchmarking and Exploring Data Matching Results - Zhuanzhi Papers


July 22, 2021


Martin Graf, Lukas Laskowski, Florian Papsdorf, Florian Sold, Roland Gremmelspacher, Felix Naumann, Fabian Panse

"Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue due to it. Duplicates, i.e., multiple but different representations of the same real-world entity, are among the main reasons for poor data quality. Therefore, finding and configuring the right deduplication solution is essential. Various data matching benchmarks address this issue. However, many of them focus on the quality of matching results and neglect other important factors, such as business requirements. Additionally, they often do not specify how to explore benchmark results, although such exploration helps to understand the behavior of a matching solution. To close this gap between merely counting record pairs and comprehensively evaluating data matching approaches, we present the benchmark platform Frost. Frost combines existing benchmarks, established quality metrics, a benchmark dimension for soft KPIs, and techniques to systematically explore and understand matching results. Thus, it can be used not only to compare multiple matching solutions regarding quality, usability, and economic aspects, but also to compare multiple runs of the same matching solution to understand its behavior. Frost is implemented and published in the open-source application Snowman, which also supports the visual exploration of matching results.
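The abstract contrasts Frost with benchmarks that merely count record pairs. For readers unfamiliar with that baseline, here is a minimal sketch of pair-based quality metrics (precision, recall, F1) and of a simple diff between two runs of a matcher. It is not taken from Frost or Snowman. The function names (`normalize`, `pair_metrics`, `compare_runs`) are hypothetical. The sketch also assumes that both the matcher output and the ground truth are given as explicit lists of record-ID pairs. A real gold standard may instead be given as clusters that first need expanding into pairs.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: an unordered pair of record IDs.
Pair = FrozenSet[str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Turn (id_a, id_b) tuples into unordered pairs so that (a, b) == (b, a).

    Self-pairs (a, a) are dropped because a record trivially matches itself.
    """
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def pair_metrics(predicted: Iterable[Tuple[str, str]],
                 gold: Iterable[Tuple[str, str]]) -> dict:
    """Count-based confusion values and precision/recall/F1 over record pairs."""
    pred, truth = normalize(predicted), normalize(gold)
    tp = len(pred & truth)   # pairs found by the matcher that are true duplicates
    fp = len(pred - truth)   # pairs found by the matcher that are not duplicates
    fn = len(truth - pred)   # true duplicate pairs the matcher missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def compare_runs(run_a: Iterable[Tuple[str, str]],
                 run_b: Iterable[Tuple[str, str]]) -> dict:
    """Split the pairs of two runs into shared and run-specific pairs."""
    a, b = normalize(run_a), normalize(run_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    gold = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
    run1 = [("r2", "r1"), ("r3", "r4"), ("r1", "r5")]
    run2 = [("r1", "r2"), ("r5", "r6")]
    print(pair_metrics(run1, gold))
    print(compare_runs(run1, run2))
```

With this toy data, `run1` finds two of the three true pairs and one false pair. That gives 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3. Such counts are exactly the "mere counting of record pairs" the authors argue is insufficient on its own. The `compare_runs` diff hints at the kind of run-to-run exploration the abstract describes, without claiming to reproduce Frost's techniques.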

