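Research on Text Retrieval Models and Algorithms Based on Probabilistic Graphs

Project title: Research on Text Retrieval Models and Algorithms Based on Probabilistic Graphs

Project number: No.61462043

Project type: Regional Science Fund Project

Year of approval: 2015

Discipline: Computer Science

Project author: Zuo Jiali (左家莉)

Affiliation: Jiangxi Normal University (江西师范大学)

Funding: CNY 460,000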

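Abstract (translated from Chinese): Information retrieval is the most effective means of coping with massive amounts of information, yet retrieval results still struggle to satisfy users' need for fast, accurate information. To make retrieval modeling tractable, a large amount of relevant information is discarded during document preprocessing. At retrieval time, the user's information need is expressed as a query, typically 3-5 index terms, which rarely captures the user's true intent; this is the main cause of reduced retrieval precision. Because much of the relevant information is hard to model, it is hard to build a good document representation model, and most current information retrieval models embed independence assumptions. Query reformulation models alleviate the short-query problem to some extent, but when the information added to the query is irrelevant or excessive, they cause query topic drift and lower retrieval precision. This project attempts to build a unified framework for document representation models and text retrieval models: it uses probabilistic graph theory to construct both, studies node importance models (for index-term nodes and document nodes) within the graph model, and on that basis studies query reformulation models. The resulting models can realize text retrieval and query reformulation at the concept level and can effectively improve retrieval performance.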

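Keywords (translated from Chinese): probabilistic graph; document representation model; index term importance; query reformulation; text retrieval model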

English abstract: Although information retrieval is the most effective means of dealing with massive amounts of information, search results still struggle to meet users' need for fast and accurate information. To simplify information retrieval modeling, preprocessing discards much of the relevant information in a document. In the retrieval phase, the user's information need is expressed as a query containing only 3-5 terms, which makes it difficult to express the user's real information need effectively. This is the main cause of poor retrieval accuracy. Because much of the relevant information is too difficult to model, building a good document representation model is also difficult, which leads most current information retrieval models to adopt an independence assumption. Query reformulation models can solve the problem of short queries to some extent. However, they may cause query topic drift and degrade retrieval performance when irrelevant or excessive information is added to the query. The project attempts to construct a unified framework for the document representation model and the text retrieval model. By means of probabilistic graph theory, the project constructs a document representation model and a text retrieval model, explores node importance models (for term nodes and document nodes) in the graph model, and then studies query reformulation models. The models constructed by the project can realize text retrieval and query reformulation at the concept level and thereby improve retrieval performance.

English keywords: Probability Graph; Document Representation Model; Term Importance; Query Reformulation; Text Retrieval Model
