In many real-world applications, we are interested in approximating costly black-box functions as accurately as possible using the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of a complex computer code. We consider the problem of sequentially extending an initial experiment (a set of model runs) to improve the emulator. We propose a sequential sampling approach based on leave-one-out (LOO) cross-validation that can be easily extended to a batch mode. This is a desirable property, since it saves the user time when parallel computing is available. After fitting a GP to the training data points, the expected squared LOO (ES-LOO) error is calculated at each design point. ES-LOO is used as a measure to identify important data points. More precisely, when this quantity is large at a point, the quality of prediction depends a great deal on that point, and adding more samples nearby could improve the accuracy of the GP. As a result, it is reasonable to select the next sample where ES-LOO is maximised. However, ES-LOO is only known at the design points and needs to be estimated at unobserved points. To do this, a second GP is fitted to the ES-LOO errors, and the next sample is chosen as the point that maximises a modified expected improvement (EI) criterion. EI is a popular acquisition function in Bayesian optimisation and is used to trade off between local and global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed; it is more explorative than EI and so helps to discover unexplored regions. Our results show that the proposed sampling method is promising.
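As a rough illustration of the first step described above, the expected squared LOO error at each design point can be computed in closed form for a GP with fixed hyperparameters. The sketch below is a minimal example under several assumptions that are not taken from the abstract: a zero-mean GP, a squared-exponential kernel with hand-picked `lengthscale` and `noise` values, a toy one-dimensional test function, and the unnormalised form ES-LOO = LOO variance + squared LOO residual (the paper's exact definition may differ, for example by a normalisation). The helper names `rbf_kernel` and `es_loo` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def es_loo(X, y, noise=1e-4):
    """Expected squared LOO error at each design point.

    Uses the closed-form LOO predictive mean and variance of a zero-mean GP
    with fixed hyperparameters, so no model is refitted n times.
    """
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    diag = np.diag(K_inv)
    loo_mean = y - (K_inv @ y) / diag   # prediction at x_i without (x_i, y_i)
    loo_var = 1.0 / diag                # predictive variance at x_i without it
    # E[(Y - y_i)^2] for Y ~ N(loo_mean, loo_var): variance plus squared bias
    scores = loo_var + (loo_mean - y) ** 2
    return scores, loo_mean, loo_var


# Toy experiment: 8 runs of a stand-in "simulator" on [0, 1]
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(8, 1))
y = np.sin(6.0 * X[:, 0])

scores, loo_mean, loo_var = es_loo(X, y)
most_influential = int(np.argmax(scores))  # design point with the largest ES-LOO
print(most_influential, scores.round(4))
```

In the approach described in the abstract, these scores would then serve as training targets for a second GP, whose pseudo expected improvement is maximised to pick the next run; that step is not shown here.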


