Stock market prediction is a long-standing challenge in finance, since accurate forecasts support informed investment decisions. Traditional models rely mainly on historical prices, but recent work shows that financial news can provide useful external signals. This paper investigates a multimodal approach that integrates news articles about companies with their historical stock data to improve prediction performance. We compare a Graph Neural Network (GNN) model with a baseline LSTM model. Each company's historical stock data is encoded with an LSTM, while news titles are embedded with a language model. These embeddings serve as node features in a heterogeneous graph, and GraphSAGE is used to capture interactions among articles, companies, and industries. We evaluate two targets: a binary direction-of-change label and a significance-based label. Experiments on the US equities and Bloomberg datasets show that the GNN outperforms the LSTM baseline, achieving 53% accuracy on the direction-of-change target and a 4% precision gain on the significance-based target. Results also indicate that prediction accuracy is higher for companies with more associated news. Moreover, headlines carry stronger predictive signals than full articles, suggesting that concise news summaries play an important role in short-term market reactions.
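The abstract describes company nodes carrying LSTM encodings of price history, article nodes carrying language-model embeddings of headlines, and GraphSAGE message passing across articles, companies, and industries. As an illustration only, the sketch below shows one GraphSAGE-style mean-aggregation layer over such a heterogeneous graph, followed by a linear head producing an "up" probability per company. It is not the paper's implementation: the edge types (`mentions`, `contains`), the embedding size `DIM`, the per-relation summation of aggregated messages, the sigmoid head, and the random placeholder features standing in for LSTM and language-model outputs are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical embedding size shared by all node types

# Placeholder node features (assumption): in the described setup, company features
# would come from an LSTM over price history and article features from a language
# model embedding of the headline. Here they are random vectors.
features = {
    "company": rng.normal(size=(3, DIM)),
    "article": rng.normal(size=(4, DIM)),
    "industry": rng.normal(size=(2, DIM)),
}

# Hypothetical typed edges: (src_type, relation, dst_type) -> list of (src, dst) indices.
edges = {
    ("article", "mentions", "company"): [(0, 0), (1, 0), (2, 1), (3, 2)],
    ("industry", "contains", "company"): [(0, 0), (0, 1), (1, 2)],
}


def mean_neighbors(src_feats, pairs, num_dst):
    """Average source features over each destination node's incoming edges (zeros if none)."""
    agg = np.zeros((num_dst, src_feats.shape[1]))
    counts = np.zeros(num_dst)
    for s, d in pairs:
        agg[d] += src_feats[s]
        counts[d] += 1
    nonzero = counts > 0
    agg[nonzero] /= counts[nonzero][:, None]
    return agg


def hetero_sage_layer(features, edges, weights, dst_type):
    """One GraphSAGE-style update for dst_type nodes: self term plus mean-aggregated
    neighbors from every relation that points at dst_type, then ReLU."""
    h_dst = features[dst_type]
    out = h_dst @ weights["self"]
    for (src_type, rel, d_type), pairs in edges.items():
        if d_type != dst_type:
            continue
        neigh = mean_neighbors(features[src_type], pairs, h_dst.shape[0])
        out = out + neigh @ weights[rel]  # separate weight matrix per relation type
    return np.maximum(out, 0.0)


# Hypothetical, untrained weights: one matrix for the self term and one per relation.
weights = {name: rng.normal(scale=0.1, size=(DIM, DIM)) for name in ("self", "mentions", "contains")}
h_company = hetero_sage_layer(features, edges, weights, "company")

# Hypothetical linear head for the binary direction-of-change target.
w_out = rng.normal(scale=0.1, size=DIM)
p_up = 1.0 / (1.0 + np.exp(-(h_company @ w_out)))
print(h_company.shape, p_up.round(3))
```

In this toy graph, company 0 receives messages from two articles while companies 1 and 2 receive one each, which is the structural difference the abstract links to higher accuracy for companies with more associated news. A trained model would learn `weights` and `w_out` from labeled data and would typically stack more than one layer.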