MANTRA: A Machine Learning reference lightcurve dataset for astronomical transient event recognition

We introduce MANTRA, an annotated dataset of 4869 transient and 71207 non-transient object lightcurves built from the Catalina Real-time Transient Survey. We provide public access to this dataset as a plain text file to facilitate standardized quantitative comparison of astronomical transient event recognition algorithms. Some of the classes included in the dataset are supernovae, cataclysmic variables, active galactic nuclei, high-proper-motion stars, blazars and flares. As an example of the tasks that can be performed on the dataset, we experiment with multiple data pre-processing methods, feature selection techniques and popular machine learning algorithms (Support Vector Machines, Random Forests and Neural Networks). We assess quantitative performance in two classification tasks: binary (transient/non-transient) and eight-class classification. The best-performing algorithm in both tasks is the Random Forest Classifier. It achieves an F1-score of 96.25% in the binary classification and 52.79% in the eight-class classification. In the eight-class classification, the non-transient class has the highest F1-score (96.83%), while the lowest corresponds to high-proper-motion stars (16.79%); for supernovae the classifier achieves 54.57%, close to the average across classes. The next release of MANTRA will include images and benchmarks with deep learning models.
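To make the relation between the per-class and overall eight-class scores concrete, here is a minimal sketch. It assumes that the reported eight-class F1-score is the unweighted (macro) mean of the per-class F1-scores, which the remark that supernovae are "close to the average across classes" suggests but the abstract does not state outright. The function names (`per_class_f1`, `macro_f1`) and the example labels are hypothetical and chosen only for illustration; they are not taken from the MANTRA release.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1-score.

    Uses the identity F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (n_predicted + n_actual),
    so only three counts per class are needed.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = {}
    for label in labels:
        denom = n_pred[label] + n_true[label]
        # Guard against an undefined score (cannot happen for labels in the union,
        # but keeps the function safe if labels are supplied another way).
        scores[label] = 2 * true_pos[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (macro averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not MANTRA data.
y_true = ["SN", "SN", "CV", "CV", "AGN", "Non-Transient"]
y_pred = ["SN", "CV", "CV", "CV", "SN", "Non-Transient"]

print(per_class_f1(y_true, y_pred))
# {'AGN': 0.0, 'CV': 0.8, 'Non-Transient': 1.0, 'SN': 0.5}
print(macro_f1(y_true, y_pred))  # approximately 0.575
```

Because every class counts equally under macro averaging, a large, easy class such as non-transients cannot mask poor performance on rare classes such as high-proper-motion stars, which is consistent with the gap between the binary and eight-class scores. For the binary task, the "transient" entry of `per_class_f1` gives the F1-score of the transient class; which averaging the paper uses for its binary figure is not specified in the abstract.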
