One task that is included in managing documents is how to find substantial information inside. Topic modeling is a technique that has been developed to produce document representation in form of keywords. The keywords will be used in the indexing process and document retrieval as needed by users. In this research, we will discuss specifically about Probabilistic Latent Semantic Analysis (PLSA). It will cover PLSA mechanism which involves Expectation Maximization (EM) as the training algorithm, how to conduct testing, and obtain the accuracy result.
翻译:管理文件的任务之一是如何找到内部的实质性信息。主题建模是一种技术,已经开发出来,以关键字的形式生成文件代表。关键字将用于索引编制过程和用户所需的文件检索。在这项研究中,我们将具体讨论概率性边端语义分析(PLSA),它将包括PLSA机制,它包括期望最大化(EM)作为培训算法,如何进行测试和获得准确性结果。