What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jaewook Kang, Inho Kang, Jung-Woo Ha, Woomyoung Park, Nako Sung

From arXiv. Accepted to EMNLP 2021 as a long paper.

GPT-3 demonstrates the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds-of-billions-scale data. Here we address some remaining issues that the GPT-3 paper reports on less, such as a non-English LM, the performance of models of different sizes, and the effect of recently introduced prompt optimization on in-context learning. To this end, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration achieves state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of materializing the No Code AI paradigm by giving ML non-experts AI prototyping capabilities through HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
