One impediment to advancing actuarial research and developing open-source assets for insurance analytics is the lack of realistic, publicly available datasets. In this work, we develop a workflow for synthesizing insurance datasets using state-of-the-art neural network techniques. We evaluate the predictive modeling efficacy of datasets synthesized from publicly available data in two domains: general insurance pricing and life insurance shock lapse modeling. The trained synthesizers capture representative characteristics of the real datasets. The workflow is implemented via an R interface to promote adoption by researchers and data owners.