Function calling is a fundamental capability of today's large language models, but sequential function calling suffers from poor efficiency. Recent studies have proposed parallelism support for function calls to alleviate this issue. However, they either delegate the concurrent function calls to users, who in practice execute them sequentially, or overlook the relations among different function calls, yielding limited efficiency gains. This paper introduces LLMOrch, an advanced framework for automated, parallel function calling in large language models. The key principle behind LLMOrch is to assign each function call to an available processor while preventing any single processor from becoming overburdened. To this end, LLMOrch models the data relations (i.e., def-use) among different function calls and coordinates their execution according to their control relations (i.e., mutual-exclusion) as well as the working status of the underlying processors. Compared with state-of-the-art techniques, LLMOrch achieves comparable efficiency improvements when orchestrating I/O-intensive functions, while significantly outperforming them (by 2$\times$) on compute-intensive functions. Moreover, LLMOrch's performance scales linearly with the number of allocated processors. We believe these results highlight the potential of LLMOrch as an efficient solution for parallel function orchestration in the context of large language models.
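As a concrete illustration of the scheduling idea sketched above (not LLMOrch's actual implementation), the minimal Python example below dispatches function calls onto a bounded worker pool as soon as their def-use producers have finished, so independent calls run concurrently while dependents wait. The names `Call`, `orchestrate`, and `max_workers` are illustrative assumptions, and the mutual-exclusion control relations are omitted for brevity.

```python
# Minimal sketch of dependency-aware parallel function calling (illustrative
# only; names and structure are assumptions, not LLMOrch's implementation).
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Call:
    name: str                 # identifier for this function call
    fn: Callable              # the function to execute
    deps: set = field(default_factory=set)  # names of calls whose results this one uses


def orchestrate(calls, max_workers=4):
    """Run an acyclic set of calls in parallel, respecting def-use order.
    The bounded pool keeps any single worker from being overburdened."""
    pending = {c.name: c for c in calls}
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Dispatch every call whose producers have all finished.
            for c in [c for c in pending.values() if c.deps <= results.keys()]:
                args = {d: results[d] for d in c.deps}  # pass def-use values in
                running[pool.submit(c.fn, **args)] = c.name
                del pending[c.name]
            # Block until at least one running call completes, then record it.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


# Example: the two independent fetches run concurrently; "combine" waits for both.
plan = [
    Call("fetch_a", lambda: 1),
    Call("fetch_b", lambda: 2),
    Call("combine", lambda fetch_a, fetch_b: fetch_a + fetch_b,
         {"fetch_a", "fetch_b"}),
]
print(orchestrate(plan)["combine"])  # -> 3
```

With a thread pool the sketch speaks to the I/O-intensive case; for compute-intensive functions, swapping in a process pool would be the natural analogue of spreading work across processors.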