Human communication is inherently multimodal and asynchronous. Analyzing human emotions and sentiment is an emerging field of artificial intelligence, and we are witnessing a growing amount of multimodal content in local languages on social media about products and other topics. However, few multimodal resources are available for under-resourced Dravidian languages. Our study aims to create a multimodal sentiment analysis dataset for the under-resourced Tamil and Malayalam languages. First, we downloaded product and movie review videos from YouTube for Tamil and Malayalam. Next, we created captions for the videos with the help of annotators. Then we labelled the videos for sentiment and verified the inter-annotator agreement using Fleiss' kappa. This is the first multimodal sentiment analysis dataset for Tamil and Malayalam created by volunteer annotators.
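The agreement check described above can be sketched as follows. This is a minimal, self-contained implementation of Fleiss' kappa in Python; the label matrix below is purely illustrative toy data, not the actual dataset's annotations.

```python
# Minimal sketch of Fleiss' kappa for measuring inter-annotator agreement
# on categorical labels (e.g. sentiment classes). Toy data only.

def fleiss_kappa(counts):
    """counts[i][j] = number of annotators assigning item i to category j.
    Assumes every item is rated by the same number of annotators."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Mean per-item observed agreement P_bar
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement P_e from marginal category proportions
    total = n_items * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(counts[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 3 videos, 3 annotators, categories (pos, neu, neg)
labels = [
    [3, 0, 0],  # all three annotators chose "positive"
    [0, 3, 0],  # all three chose "neutral"
    [1, 1, 1],  # complete disagreement
]
print(fleiss_kappa(labels))  # prints a value of about 0.4375 (moderate agreement)
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement no better than chance; reporting this statistic is what allows the sentiment labels to be trusted.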