MSC: A Dataset for Macro-Management in StarCraft II

Macro-management is an important and long-studied problem in StarCraft. Various datasets and accompanying methods have been proposed in the last few years, but these datasets have shortcomings that hinder academic and industrial research: 1) Some datasets lack standard preprocessing, parsing, and feature extraction procedures, as well as predefined training, validation, and test sets. 2) Some datasets are designed only for specific macro-management tasks. 3) Some datasets are either too small or do not have enough labeled data for modern machine learning algorithms such as deep neural networks. As a result, most previous methods are trained with different features and evaluated on different test sets from the same or different datasets, which makes them difficult to compare directly. To advance research on macro-management in StarCraft, we release a new dataset, MSC, based on the SC2LE platform. MSC consists of well-designed feature vectors, predefined high-level actions, and the final result of each match. We also split MSC into training, validation, and test sets for convenient evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, two of the key tasks in macro-management. We also describe various downstream tasks and analyses of the dataset to support research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.


