We have constructed the NAIST Academic Travelogue Dataset (ATD) and released it free of charge for academic research. This Japanese text dataset contains over 31 million words in total and comprises 4,672 Japanese domestic travelogues and 9,607 overseas travelogues. Before the release of our dataset, widely available travelogue data for research purposes was scarce, so each researcher had to prepare their own data. This hindered the replication of existing studies and fair comparison of experimental results. Our dataset enables any researcher to work on the same data, ensuring transparency and reproducibility in research. In this paper, we describe the academic significance, characteristics, and prospects of our dataset.