A Data-Driven Method for Automated Data Superposition with Applications in Soft Matter Science

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently by one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, non-parametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise, and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability -- specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
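The two-step idea described in the abstract (learn a Gaussian process model of one data set, then choose the coordinate transformation that makes another data set most probable under that model) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single horizontal shift as the only transformation, a squared-exponential kernel with fixed hyperparameters, a flat prior on the shift (so the maximum a posteriori estimate coincides with maximum likelihood), a grid search in place of a proper optimizer, and synthetic `tanh` data. All function and variable names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_query, noise_sd=0.03):
    """Predictive mean and variance of a zero-mean GP at x_query.

    Hyperparameters are fixed by assumption; a real workflow would fit them.
    """
    K = rbf_kernel(x_train, x_train) + noise_sd**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s.T)
    mean = K_s @ alpha
    # Prior variance is 1.0 (the kernel default); add observation noise.
    var = 1.0 - np.sum(v**2, axis=0) + noise_sd**2
    return mean, var

def neg_log_posterior(shift, x_ref, y_ref, x_new, y_new):
    """Negative log posterior of a horizontal shift under a flat prior."""
    mean, var = gp_predict(x_ref, y_ref, x_new + shift)
    return 0.5 * np.sum((y_new - mean) ** 2 / var + np.log(2.0 * np.pi * var))

def map_shift(x_ref, y_ref, x_new, y_new, candidates):
    """Grid-search MAP shift plus a curvature-based (Laplace) uncertainty."""
    nlp = np.array(
        [neg_log_posterior(s, x_ref, y_ref, x_new, y_new) for s in candidates]
    )
    i = int(np.argmin(nlp))
    sd = float("nan")
    if 0 < i < len(candidates) - 1:
        h = candidates[1] - candidates[0]
        curvature = (nlp[i - 1] - 2.0 * nlp[i] + nlp[i + 1]) / h**2
        if curvature > 0:
            sd = float(1.0 / np.sqrt(curvature))
    return float(candidates[i]), sd

# Synthetic example: both data sets sample the same underlying curve,
# but the second is recorded with its x-coordinate offset by true_shift.
rng = np.random.default_rng(0)
true_shift = 0.7
x_ref = np.linspace(-2.0, 2.0, 40)
y_ref = np.tanh(x_ref) + 0.03 * rng.standard_normal(x_ref.size)
x_new = np.linspace(-1.5, 0.5, 25)
y_new = np.tanh(x_new + true_shift) + 0.03 * rng.standard_normal(x_new.size)

candidates = np.linspace(-1.5, 1.5, 301)
estimated_shift, shift_sd = map_shift(x_ref, y_ref, x_new, y_new, candidates)
print(f"estimated shift: {estimated_shift:.2f} +/- {shift_sd:.3f}")
```

The uncertainty reported here comes from the local curvature of the negative log posterior at its minimum, which is one common way to attach an error bar to a point estimate. The abstract does not specify how the method computes its uncertainty estimates, so this choice is an assumption of the sketch.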

