Traditionally, machine learning algorithms have focused on modeling the dynamics of a given dataset for which all features are available at no cost. However, many concerns, such as the monetary cost of data collection, patient discomfort during medical procedures, and the privacy impact of data collection, require careful consideration in any health analytics system. An efficient solution would acquire only a subset of features, based on the value they provide while taking acquisition costs into account. Datasets that provide feature costs are also very limited, especially in healthcare. In this paper, we provide a health dataset as well as a method for assigning feature costs based on the total level of inconvenience that asking for each feature entails. Furthermore, based on the suggested dataset, we provide a comparison of recent and state-of-the-art approaches to cost-sensitive feature acquisition and learning. Specifically, we analyze the performance of major sensitivity-based and reinforcement-learning-based methods in the literature on three different problems in the health domain: diabetes, heart disease, and hypertension classification.