Qualitative research is an approach to understanding social phenomena based on human interpretation of data, particularly text. Probabilistic topic modelling is a machine learning approach that is also based on the analysis of text and is often used to understand social phenomena. Both approaches aim to extract important themes or topics from a textual corpus, and we may therefore see them as analogous to each other. However, there are also considerable differences in how the two approaches function: one is a highly interpretive human process, the other is automated and statistical. In this paper we use this analogy as the basis for our Theme and Topic system, a tool for qualitative researchers to conduct textual research that integrates topic modelling into an accessible interface. This is an example of a more general approach to the design of interactive machine learning systems in which existing human professional processes can be used as the model for processes involving machine learning. This has the particular benefit of providing a familiar approach to existing professionals, which may make machine learning seem less alien and easier to learn. Our design approach has two elements. We first investigate the steps professionals go through when performing tasks and design a workflow for Theme and Topic that integrates machine learning. We then design interfaces for topic modelling in which familiar concepts from qualitative research are mapped onto machine learning concepts. This makes the machine learning concepts more familiar and easier to learn for qualitative researchers.