Aspect-based sentiment analysis involves the recognition of so-called opinion target expressions (OTEs). To extract OTEs automatically, supervised learning algorithms are usually employed that are trained on manually annotated corpora. Creating these corpora is labor-intensive, so sufficiently large datasets are usually available only for a narrow selection of languages and domains. In this work, we address the lack of annotated data for specific languages by proposing a zero-shot cross-lingual approach to the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across languages and incorporate them into a convolutional neural network architecture for OTE extraction. Our experiments with five languages give promising results: we can successfully train a model on annotated data in a source language and make accurate predictions in a target language without ever using any annotated samples from that target language. Depending on the source and target language pair, we reach up to 77% of the performance of a model trained on target-language data in the zero-shot regime. Furthermore, we can increase this to up to 87% of the baseline model trained on target-language data by performing cross-lingual learning from multiple source languages.
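The mechanism behind the zero-shot transfer described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the toy vocabulary, the tiny embedding dimension, and the randomly initialized convolution weights are all assumptions standing in for a real multilingual embedding model and for parameters learned on source-language annotations. It only shows why a token-level CNN tagger trained over a shared vector space can be applied unchanged to a second language: words with the same meaning occupy the same region of the space, so they receive the same tag scores.

```python
import numpy as np

np.random.seed(0)

EMB_DIM, WINDOW, N_TAGS = 8, 3, 3  # BIO tag indices: O=0, B-OTE=1, I-OTE=2

# Hypothetical multilingual embeddings: translation pairs from different
# languages map to the same vector in the shared space.
shared_space = {
    "battery":  np.full(EMB_DIM, 0.5),   # English
    "batterie": np.full(EMB_DIM, 0.5),   # French, same point in the space
    "the":      np.zeros(EMB_DIM),
    "la":       np.zeros(EMB_DIM),
}

def embed(tokens):
    # Look up each token; unknown words fall back to a zero vector.
    return np.stack([shared_space.get(t, np.zeros(EMB_DIM)) for t in tokens])

# Random convolution filters stand in for parameters that would be learned
# from annotated source-language data only.
conv_w = np.random.randn(N_TAGS, WINDOW * EMB_DIM) * 0.1

def tag(tokens):
    x = embed(tokens)
    pad = np.zeros((WINDOW // 2, EMB_DIM))
    x = np.vstack([pad, x, pad])
    # Slide a window over the sentence and score every token position.
    logits = np.stack([
        conv_w @ x[i:i + WINDOW].ravel() for i in range(len(tokens))
    ])
    return logits.argmax(axis=1)  # one BIO tag index per token

# A source-language sentence and its target-language counterpart receive
# identical predictions, because their embeddings coincide: this is the
# property the zero-shot setting relies on.
en_tags = tag(["the", "battery"])
fr_tags = tag(["la", "batterie"])
assert (en_tags == fr_tags).all()
```

In practice the embedding table would come from a pretrained multilingual model aligned into one space, and the convolution and output weights would be trained on the source-language corpus; at prediction time nothing in the network needs to change for the target language.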