We present a framework that lets users incorporate the semantics of their domain knowledge into topic model refinement while remaining model-agnostic. Our approach enables users to (1) understand the semantic space of the model, (2) identify regions of potential conflicts and problems, and (3) readjust the semantic relations between concepts based on their understanding, directly influencing the topic modeling. These tasks are supported by an interactive visual analytics workspace that uses word-embedding projections to define concept regions, which can then be refined. The user-refined concepts are independent of any particular document collection and can be transferred to related corpora. All user interactions within the concept space directly affect the semantic relations of the underlying vector space model, and these changes, in turn, alter the topic modeling. In addition to direct manipulation, our system guides users' decision-making through recommended interactions that point out potential improvements. This targeted refinement aims to minimize the feedback required for an efficient human-in-the-loop process. Two user studies confirm that our visual knowledge externalization and learning process improves topic model quality.
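To make the idea of readjusting semantic relations in a vector space concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a concept is represented by the centroid of its member word vectors and that a user interaction moves a word toward that centroid by linear interpolation. The function names, the `strength` parameter, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pull_toward_concept(embeddings, concept_words, word, strength=0.5):
    """Return a copy of `embeddings` in which `word` is moved toward the
    centroid of `concept_words`.

    Assumption: linear interpolation stands in for whatever update rule
    a real system would use; `strength` in [0, 1] controls the step size.
    """
    target = centroid([embeddings[w] for w in concept_words])
    old = embeddings[word]
    moved = [(1 - strength) * o + strength * t for o, t in zip(old, target)]
    updated = dict(embeddings)  # leave the original mapping untouched
    updated[word] = moved
    return updated


# Toy 2-D embeddings (hypothetical values): "bank" starts near "loan".
embeddings = {
    "bank": [1.0, 0.0],
    "loan": [0.9, 0.1],
    "river": [0.0, 1.0],
    "shore": [0.1, 0.9],
}

# A user decides that, in their domain, "bank" belongs to the river concept.
refined = pull_toward_concept(embeddings, ["river", "shore"], "bank", strength=0.5)

before = cosine(embeddings["bank"], embeddings["river"])
after = cosine(refined["bank"], refined["river"])
print(f"similarity(bank, river): before={before:.3f}, after={after:.3f}")
```

Because the adjustment is applied to the word vectors rather than to a specific corpus, any topic model that consumes these vectors would see the changed similarities, which is one way to read the claim that refined concepts can be transferred to related corpora.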