News can convey bearish or bullish views on financial assets, and institutional investors need to evaluate the sentiment implied by news text automatically. Given the large number of news articles published each day, most of which are neutral, we present a systematic news screening method that identifies the ``true'' impactful ones, with the aim of developing news sentiment learning methods more effectively. Based on several liquidity-driven variables, including volatility, turnover, bid-ask spread, and book size, we assign each 5-minute time bin to one of two liquidity modes. One represents the ``calm'' state in which the market stays most of the time; the other, characterized by relatively higher levels of volatility and trading volume, describes the regime driven by exogenous events. We then focus on the moments at which the liquidity mode switches from the former to the latter and treat the news articles published around those moments as impactful. As an illustrative example, we apply naive Bayes to these filtered samples for news sentiment classification. We show that the screened dataset leads to more effective feature capture and thus to better short-term asset return prediction than the original dataset.
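The screening step can be illustrated with a minimal sketch, assuming that each 5-minute bin has already been assigned a liquidity mode. The labels `CALM` and `ACTIVE`, the helper names `find_switches` and `screen_news`, the 5-minute matching window, and the toy headlines are all hypothetical choices made for illustration; the abstract does not specify how "nearby" is defined or how the modes are estimated.

```python
from datetime import datetime, timedelta

CALM, ACTIVE = 0, 1  # hypothetical labels for the two liquidity modes


def find_switches(modes):
    """Return indices i where the mode goes from CALM (bin i-1) to ACTIVE (bin i)."""
    return [i for i in range(1, len(modes))
            if modes[i - 1] == CALM and modes[i] == ACTIVE]


def screen_news(news, bin_starts, modes, window=timedelta(minutes=5)):
    """Keep (timestamp, headline) pairs published within `window` of a switch.

    The window width is an assumption, not a value taken from the paper.
    """
    switch_times = [bin_starts[i] for i in find_switches(modes)]
    return [(t, text) for t, text in news
            if any(abs(t - s) <= window for s in switch_times)]


# Toy data: six 5-minute bins starting at 09:30, with one calm-to-active switch.
start = datetime(2024, 1, 2, 9, 30)
bin_starts = [start + timedelta(minutes=5 * k) for k in range(6)]
modes = [CALM, CALM, ACTIVE, ACTIVE, CALM, CALM]

# Made-up headlines, used only to exercise the filter.
news = [
    (datetime(2024, 1, 2, 9, 32), "routine filing"),
    (datetime(2024, 1, 2, 9, 38), "profit warning"),
    (datetime(2024, 1, 2, 9, 57), "analyst note"),
]

impactful = screen_news(news, bin_starts, modes)
print(impactful)
```

In this toy example the only calm-to-active switch occurs at the bin starting at 09:40, so only the article published at 09:38 falls inside the window and is retained. The retained articles would then form the training set for the sentiment classifier.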