The stock market is a complex and dynamic system in which uncovering underlying patterns and forecasting stock movements is non-trivial for researchers and practitioners alike. Existing studies of stock market analysis extract useful factors from various types of information, so their results depend heavily on the quality of the data used. However, currently available resources are mainly built on English-language data from the U.S. stock market and do not readily transfer to other countries. To address these issues, we propose CSMD, a multimodal dataset curated specifically for analyzing the Chinese stock market, with meticulous processing to validate its quality. In addition, we develop LightQuant, a lightweight and user-friendly framework for researchers and practitioners with expertise in financial domains. Experimental results with various backbone models on top of our datasets and framework demonstrate their effectiveness compared with using existing datasets. The datasets and code are publicly available at https://github.com/ECNU-CILAB/LightQuant.