ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time-consuming and labor-intensive. Large language models such as ChatGPT show impressive capabilities in generating programs by interacting with users through natural language prompts, but limitations remain. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving a data preparation program, which requires a certain level of expertise in programming, the dataset used, and the ML task. Moreover, once a program has been generated, it is non-trivial to revisit a previous version or make changes to the program without starting the process over again. In this paper, we present ChatPipe, a novel system designed to facilitate seamless interaction between users and ChatGPT. ChatPipe provides users with effective recommendations for the next data preparation operations and guides ChatGPT to generate programs for those operations. ChatPipe also enables users to easily roll back to previous versions of the program, which facilitates more efficient experimentation and testing. We have developed a web application for ChatPipe and prepared several real-world ML tasks from Kaggle. These tasks showcase the capabilities of ChatPipe and enable VLDB attendees to easily experiment with our novel features to rapidly orchestrate a high-quality data preparation program.
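The abstract does not say how rollback is implemented. As a rough illustration only, the idea of returning to an earlier program version can be sketched as a linear version history. The class `ProgramHistory` and its methods are hypothetical names introduced here and are not part of ChatPipe; the pandas-style program strings are placeholder data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProgramHistory:
    """Hypothetical linear history of generated program versions.

    This is an illustrative assumption, not ChatPipe's actual design.
    """

    versions: List[str] = field(default_factory=list)
    current: int = -1  # index of the active version; -1 means no version yet

    def commit(self, program: str) -> int:
        """Store a new version and make it the active one."""
        # Committing after a rollback discards the versions that followed it.
        self.versions = self.versions[: self.current + 1]
        self.versions.append(program)
        self.current = len(self.versions) - 1
        return self.current

    def rollback(self, version: int) -> str:
        """Make an earlier version active again and return its source."""
        if not 0 <= version <= self.current:
            raise IndexError(f"no such version: {version}")
        self.current = version
        return self.versions[version]

    def active(self) -> str:
        """Return the source of the active version."""
        if self.current < 0:
            raise LookupError("history is empty")
        return self.versions[self.current]


history = ProgramHistory()
v0 = history.commit("df = df.dropna()")
v1 = history.commit("df = df.dropna()\ndf = df.drop_duplicates()")

# Return to the first version, then continue from there with a different step.
history.rollback(v0)
v2 = history.commit("df = df.dropna()\ndf = df.fillna(0)")
```

In this sketch a rollback followed by a new commit overwrites the abandoned versions. A tree-shaped history would preserve them instead, at the cost of a more involved interface.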


