A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a prediction by sequentially querying relevant information from the user, as opposed to a single-turn conversation. This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. However, obtaining accurate estimates of mutual information or conditional entropy for LLMs is very difficult in practice due to over- or under-confident LLM probabilities, which leads to suboptimal query selection and predictive performance. To better estimate the uncertainty at each iteration, we propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets. More specifically, C-IP leverages a relationship between prediction sets and conditional entropy at each iteration to estimate uncertainty based on the average size of conformal prediction sets. In contrast to conditional entropy, we find that conformal prediction sets are a distribution-free and robust method of measuring uncertainty. Experiments with 20 Questions show that C-IP obtains better predictive performance and shorter query-answer chains compared to previous approaches to IP and uncertainty-based chain-of-thought methods. Furthermore, extending to an interactive medical setting between a doctor and a patient on the MediQ dataset, C-IP achieves competitive performance with direct single-turn prediction while offering greater interpretability.
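The core quantity described above, the average size of conformal prediction sets as a distribution-free uncertainty measure, can be sketched with standard split-conformal prediction for classification. The function names, and the choice of nonconformity score (one minus the predicted probability of the true label), are illustrative assumptions, not necessarily the paper's exact construction:

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for classification.

    cal_probs:  (n_cal, n_classes) predicted probabilities on a calibration set
    cal_labels: (n_cal,) true labels for the calibration set
    test_probs: (n_test, n_classes) predicted probabilities on test inputs
    alpha:      miscoverage level (sets cover the true label w.p. >= 1 - alpha)

    Returns a boolean matrix (n_test, n_classes); row i is the set for test i.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - p(true label) on the calibration set
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample (n + 1) correction
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    # Include every label whose score does not exceed the quantile
    return (1.0 - test_probs) <= q

def average_set_size(pred_sets):
    """Average prediction-set size: larger sets signal higher uncertainty."""
    return pred_sets.sum(axis=1).mean()
```

In an IP-style loop, one would compute this average set size for each candidate query and greedily pick the query whose answers shrink the sets the most; that selection loop is omitted here.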