A Systematic Study of Time Limit Exceeded Errors in Online Programming Assignments

Online programming platforms such as Codeforces and LeetCode attract millions of users who want to learn to program or sharpen their skills for industry interviews. A major obstacle for these users is the Time Limit Exceeded (TLE) error, which is triggered when a program exceeds the execution time limit. Although the time limit is intended as a performance safeguard, TLE errors are difficult to resolve: the error messages provide no diagnostic insight, platform support is minimal, and existing debugging tools offer little help. As a result, many users abandon their submissions after repeated TLE failures. This paper presents the first large-scale empirical study of TLE errors in online programming. We manually analyzed 1000 Codeforces submissions with TLE errors, classified their root causes, and traced how users attempted to fix them. Our analysis shows that TLE errors often arise not only from inefficient algorithms but also from infinite loops, improper data structure use, and inefficient I/O, challenging the conventional view that TLEs are purely performance issues. Guided by these findings, we introduce Nettle, the first automated repair tool designed specifically for TLE errors, and Nettle-Eval, the first framework for evaluating TLE repairs. By integrating LLMs with targeted automated feedback from the compiler and test cases, Nettle produces small, correct code edits that eliminate TLEs while preserving functionality. Evaluated on the same 1000 real-world cases, Nettle achieves a 98.5% fix rate, far exceeding the strongest LLM baseline, and all of its repairs pass both Nettle-Eval and the platform's official checker, confirming the reliability of our framework.
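The following is a hypothetical illustration of two of the non-algorithmic root causes named above: improper data structure use and inefficient I/O. It is not taken from the 1000 studied submissions and is not Nettle's output. The function names (`count_hits_slow`, `count_hits_fast`, `parse_ints`) were invented for this sketch. It contrasts a list membership test inside a loop with a set-based one, and shows parsing all of stdin in a single read rather than calling `input()` line by line.

```python
def count_hits_slow(queries, allowed):
    # Membership on a list is a linear scan, so the total cost is
    # O(len(queries) * len(allowed)); large inputs can easily exceed the time limit.
    return sum(1 for q in queries if q in allowed)


def count_hits_fast(queries, allowed):
    # Building a set once makes each membership test O(1) on average,
    # so the total cost drops to O(len(queries) + len(allowed)).
    allowed_set = set(allowed)
    return sum(1 for q in queries if q in allowed_set)


def parse_ints(data):
    # Parse every whitespace-separated integer from a bytes buffer, e.g. the
    # result of sys.stdin.buffer.read(), instead of calling input() per line.
    return list(map(int, data.split()))
```

With 10^5 queries against 10^5 allowed values, the list version can perform on the order of 10^10 comparisons when most queries miss. The set version needs about 2·10^5 average-case constant-time operations. Both functions return the same result, so the fix is a small, behavior-preserving edit.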