This paper discusses the clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). Its purpose is twofold. The first is to explain how and why the singular vectors can be used in clustering. The second is to show that these two seemingly unrelated aspects of the SVD originate from the same source: related vertices tend to be more tightly clustered in the graph representation of the lower-rank approximate matrix produced by the SVD than in the original semantic graph. Accordingly, the SVD can improve the retrieval performance of an information retrieval system, since queries made to the approximate matrix can retrieve more relevant documents and filter out more irrelevant documents than the same queries made to the original matrix. Using this fact, we devise an LSI algorithm that mimics the SVD's capability in clustering related vertices. Convergence analysis shows that the algorithm converges and produces a unique solution for each input. Experimental results on some standard datasets in LSI research show that the retrieval performance of the algorithm is comparable to that of the SVD. In addition, the algorithm is more practical and easier to use because there is no need to determine the decomposition rank, which is crucial in driving the retrieval performance of the SVD.
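
The claim that queries against the approximate matrix can retrieve more relevant documents can be illustrated with a minimal sketch of standard truncated-SVD LSI. This is not the algorithm proposed in the paper; the toy term-document matrix, the vocabulary, the rank `k = 2`, and the helper name `low_rank_approximation` are all assumptions chosen for illustration.

```python
import numpy as np


def low_rank_approximation(A, k):
    """Return the rank-k approximation of A built from its top-k singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical toy term-document matrix (rows: terms, columns: documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car        appears in d0
    [0, 1, 0, 0],  # automobile appears in d1
    [1, 1, 0, 0],  # engine     appears in d0 and d1
    [0, 0, 1, 1],  # flower     appears in d2 and d3
    [0, 0, 1, 1],  # petal      appears in d2 and d3
], dtype=float)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

# Inner-product scores against the original and the rank-2 matrix.
A_k = low_rank_approximation(A, k=2)
scores_original = q @ A
scores_lsi = q @ A_k

print("original:", np.round(scores_original, 3))
print("rank-2  :", np.round(scores_lsi, 3))
```

In this toy case, document d1 never contains "car", so it scores zero against the original matrix, yet it receives a positive score against the rank-2 matrix because it shares "engine" with d0. The unrelated documents d2 and d3 stay at roughly zero. The result depends on the chosen rank, which is the parameter the abstract notes must be determined when using the SVD.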