A huge number of multi-participant dialogues take place online every day, which makes it difficult for both humans and machines to understand the dynamics of these dialogues. Dialogue disentanglement aims to separate an entangled dialogue into detached sessions, thereby improving the readability of long, disordered dialogues. Previous studies mainly rely on two-step methods that first classify message pairs and then cluster them, which cannot guarantee clustering quality over the dialogue as a whole. To address this challenge, we propose a simple yet effective model named CluCDD, which aggregates utterances by contrastive learning. More specifically, our model pulls utterances in the same session together and pushes utterances in different sessions apart. A clustering method is then applied to generate the predicted cluster labels. Comprehensive experiments on the Movie Dialogue dataset and the IRC dataset demonstrate that our model achieves new state-of-the-art results.
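The pull-together, push-apart objective followed by a clustering step can be illustrated with a small sketch. This is not the CluCDD implementation: the abstract does not specify the loss or the clustering method, so the code below assumes a generic supervised contrastive loss over cosine similarities and a simple greedy threshold clustering as a stand-in. The function names, the `temperature` and `threshold` parameters, and the toy 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, session_ids, temperature=0.1):
    """Generic supervised contrastive loss (an assumption, not the paper's exact loss).

    For each anchor utterance, utterances from the same session are positives
    and all other utterances are negatives. The loss is low when positives are
    much more similar to the anchor than negatives are.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and session_ids[j] == session_ids[i]]
        if not positives:
            continue  # an utterance alone in its session contributes nothing
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def greedy_cluster(embeddings, threshold=0.5):
    """Stand-in clustering step (hypothetical): join the first cluster whose
    representative is similar enough, otherwise open a new cluster."""
    representatives, labels = [], []
    for emb in embeddings:
        for label, rep in enumerate(representatives):
            if cosine(emb, rep) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(emb)
            labels.append(len(representatives) - 1)
    return labels


# Toy utterance embeddings: two sessions, two utterances each.
sessions = [0, 0, 1, 1]
well_separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
entangled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print(supervised_contrastive_loss(well_separated, sessions))
print(supervised_contrastive_loss(entangled, sessions))
print(greedy_cluster(well_separated))  # [0, 0, 1, 1]
```

In this toy setting the loss is lower for `well_separated`, where same-session utterances already sit close together, than for `entangled`, where they do not. Training an encoder to minimize such a loss is what would make a downstream clustering step easier; any standard clustering algorithm could replace the greedy stand-in.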