GPT-2: Hands-On Summary and Impressions

Original article by hahakity

PhD in Physics, University of Science and Technology of China

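Over the past couple of days OpenAI's GPT-2 model has been all over the major WeChat public accounts, so I grabbed the model and its pretrained parameters right away, hoping to toy with it. In the end it felt less like I was toying with GPT-2 and more like GPT-2 was toying with me. If you would like to be toyed with by GPT-2 too, follow the steps below.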

Download and installation

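First, how to download and install the model. The code is published on GitHub, while the model parameters are hosted on Google Cloud (when I learned this I had ten thousand question marks in my head: can OpenAI not afford its own data servers? Why use Google Cloud?).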

Downloading the code is easy; just clone it to your machine:

git clone https://github.com/openai/gpt-2.git

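The pretrained parameters are not so easy to download. OpenAI claims that the fake news produced by the large 1.5-billion-parameter GPT-2 is too convincing and that they do not dare face the consequences of releasing it, so in the end they released only a small beast with 117 million parameters, named 117M. To download this set of pretrained parameters you first need gsutil, the cloud data download tool that ships with the Google Cloud SDK. The simplest way to install it, though, is with pip under Python 2.7: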

pip2 install gsutil --user

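Once gsutil is installed, remember not to leave out 117M when downloading the data. I assumed 117M meant the download would be 117 MB, so I did not copy it to the command line, and every attempt told me I had no permission to download. It still hurts to think about. The full command is: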

sh download_model.sh 117M

To run the model you also need to install some Python libraries:

pip3 install -r requirements.txt

If all goes well, you can now start toying with GPT-2, or being toyed with by it.

Test results

The short README shows three example ways to run the model. The first is unconditional, say-anything generation:

python3 src/generate_unconditional_samples.py | tee samples

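In this command line, the part before the pipe symbol | is the main call, which generates text samples unconditionally; tee is a Linux command-line tool that prints the generated text to the terminal while also saving it to the file samples. The second mode uses a few parameters to change the character of the unconditionally generated text: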

python3 src/generate_unconditional_samples.py --top_k 40 --temperature 0.7 | tee samples

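The most important of these parameters is probably temperature, which controls the randomness of the output: the higher the temperature, the more random the output; the lower it is, the more likely the model is to copy phrases from its training corpus. The implementation presumably puts a temperature parameter T into the softmax that turns the logits into probabilities over the vocabulary indices: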

\frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}
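
To make the formula concrete, here is a minimal sketch in plain Python. It is my own illustration, not code taken from the GPT-2 repository: the function names softmax_with_temperature and top_k_filter and the example logits are made up, and I am assuming that --top_k keeps only the k largest logits before sampling.

```python
import math


def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf (probability zero).

    Assumption: this mirrors what --top_k does. Ties at the threshold are
    all kept, so slightly more than k entries can survive.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [z if z >= threshold else float("-inf") for z in logits]


def softmax_with_temperature(logits, temperature=1.0):
    """Return exp(z_i / T) / sum_j exp(z_j / T) for a list of logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a toy 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.7, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print("T =", t, [round(p, 3) for p in probs])

# Temperature 0.7 combined with top_k = 2: only two tokens remain possible.
print(softmax_with_temperature(top_k_filter(logits, 2), 0.7))
```

As T shrinks, the probability mass concentrates on the largest logit, so sampling becomes nearly deterministic; as T grows, the distribution flattens toward uniform.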

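The third mode generates text interactively from a user prompt. This is the most interesting mode, and the one we are going to toy with. It is invoked with: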

python3 src/interactive_conditional_samples.py

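At the prompt, type the opening of some wildly imaginative story and sit back to watch how GPT-2 continues it. I tried a number of examples. When the prompt is in Chinese, almost none of the continuations are sane, probably because the training corpus contains hardly any Chinese web pages. For example, here is the prompt "Trump and Putin got into a fight in the swimming pool", entered in Chinese: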

Model prompt >>> 特朗普和普京在游泳池打起来了
======================================== SAMPLE 1 (GPT-2's continuation)
即面在京在饘到在游泳池。
返正 Thursday, Chengjiang * Saturday, Minsk and Moscow *
Monday, Dec 5th 천o Now. He's filming you at the front of the garden and when your ship arrives he gets angry and tries to rip your leash out of your hand*
In this course the picture who gets non-safety hands, are you kidding me?
Today he grabbed my 3 month old Daddy and forced him to let me off.
By that point I didn't want my leash and Baby tried to get onto Captain Cheka's leash that gave up inside her, he grabbed his pitbull tigger from get on the walk and put it around my testicles
This year he tried to tase me drag me off the road and drown in his recycling site three times over (full of worst kind) by pulling my shorts
The dog didn't die out with him but was seriously injured the most, losing a tire with his jaw got to his hand on something and using his tail to do it, he is still missing.

The story GPT-2 makes up is somewhat lewd and violent, but it has nothing to do with the opening I typed. Would English work better? Next I entered a prompt in English:

Model prompt >>> Donald Trump wants to build the wall because Aliens are invading Mexico
======================================== SAMPLE 1 (GPT-2's continuation)
," he said. Trump believes that border safety as Trump says, that some illegal aliens are good immigrants. Trump claims that the United States is at a disadvantage right now due to the Obama immigration policies. He said that the level of federal spending on wall wall have been partly cut due to that, including American taxpayers' cost of the wall. Trump said that information should have been provided by the Department of Homeland Security and it would have handled it with care, asking an audience that the vast majority of Americans didn't get what they wanted from the Mayors for Truth and Justice they signed.

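This passage is broadly about Trump and about building the wall, but it has nothing whatsoever to do with aliens invading Mexico. A long stretch of dialogue followed that I have not pasted here; it had essentially nothing to do with Trump, the wall, Mexico, or aliens. I tried quite a few examples, and in all of them I could guess the beginning but not the ending: the small 117M GPT-2 model drifts off topic quickly, so the rest of the story is grammatically mostly fine but has no meaning and no logic.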

Update: trying the medium-sized 345M model

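As a reader pointed out, OpenAI recently released the pretrained parameters of a medium-sized model with 345M parameters. This section covers how the mid-sized GPT-2 performs.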

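If you tried GPT-2 by following this article earlier, you only need your version control system's pull command to update to the latest version. Go into the GPT-2 folder you downloaded before and run on the command line: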

git pull

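If this is your first attempt, just download the source code by following the earlier steps. In the new version, the parameter download script has changed from download_model.sh to download_model.py. To download the 345M model parameters (a 1.4 GB download), run: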

python download_model.py 345M

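After it runs, the script creates a new folder, models/345M/, which holds the new model parameters. In it, encoder.json stores the vocabulary and hparams.json stores the model's architecture hyperparameters, with the following contents (annotations are mine):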

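{
"n_vocab": 50257, vocabulary size: 50257 distinct tokens (symbols and English words)
"n_ctx": 1024, context length: the input text can hold up to 1024 tokens
"n_embd": 1024, dimension of the token embeddings, position embeddings, and internal feature vectors
"n_head": 16, multi-head attention: 16 attention heads that split the 1024 hidden features evenly
"n_layer": 24 24 Transformer layers
}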

For comparison, the earlier small 117M model uses n_embd = 768, n_head = 12, n_layer = 12.
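
As a rough sanity check on these hyperparameters, the sketch below estimates the number of parameters they imply. It assumes the standard GPT-2 block layout (a fused query/key/value projection, an output projection, a feed-forward layer with hidden size 4 * n_embd, two LayerNorms per block, a final LayerNorm, and an output layer that reuses the token embedding matrix), none of which is stated in hparams.json. It also assumes the 117M model shares n_vocab = 50257 and n_ctx = 1024. The function name gpt2_param_count is hypothetical.

```python
def gpt2_param_count(n_vocab, n_ctx, n_embd, n_layer):
    """Estimate the parameter count under the assumed GPT-2 layout."""
    d = n_embd
    embeddings = n_vocab * d + n_ctx * d            # token + position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # two linear layers, hidden 4*d
    layer_norms = 2 * (2 * d)                       # two LayerNorms (gain + bias)
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * d
    return embeddings + n_layer * per_layer + final_layer_norm


# Values for 345M come from hparams.json above; for 117M, n_vocab and n_ctx
# are assumed to be the same as in the 345M model.
small = dict(n_vocab=50257, n_ctx=1024, n_embd=768, n_head=12, n_layer=12)
medium = dict(n_vocab=50257, n_ctx=1024, n_embd=1024, n_head=16, n_layer=24)

for name, hp in (("117M", small), ("345M", medium)):
    count = gpt2_param_count(hp["n_vocab"], hp["n_ctx"], hp["n_embd"], hp["n_layer"])
    head_dim = hp["n_embd"] // hp["n_head"]  # features handled by each head
    print(name, "estimated parameters:", count, "features per head:", head_dim)
```

Under these assumptions the counts come out to roughly 124 million and 355 million, somewhat above the 117M and 345M in the release names, and both models use 64 features per attention head.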

To use the 345M parameters, change the model name to model_name='345M' in src/generate_unconditional_samples.py and in the conditional generation script. I tried

 python src/generate_unconditional_samples.py --help

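to see whether the 345M model could be selected through a command-line argument, but the program ignored the --help option and went straight to generating samples. So for now the only way I have found to run the 345M model is to edit the code by hand. The 117M model could still run locally on my Mac, but the 345M model ran extremely slowly there, and after a long wait I still saw no output. To test it I had to run it on a GPU server, this time with an NVIDIA K40 GPU.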

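In interactive mode, the prompt "What if the big one hits California?" yields text that has nothing to do with earthquakes. The prompt contains a figure of speech: everyone living in California knows that a huge earthquake may strike in the coming years or decades, and they have nicknamed it "the big one". GPT-2 has not learned that "the big one" here refers to this anticipated mega-earthquake. If I drop the figure of speech and say it explicitly, "What if the big earthquake hits California?", the generated text is highly relevant. Compared with the 117M results, the difference is like night and day.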

The text generated by the GPT-2 345M model is as follows:

Model prompt >>> What if the big earthquake hits California?
A huge quake with 12.1 miles (23 km) of seismic rupture on the Pacific coast could seriously damage or destroy California homes, towns, and facilities, plus worse, damage small towns.
As San Joaquin Delta residents prepare for layoffs, millions could fall without electricity, heat, cellular service, toll transit services, and water. Confusion and fear is everything related to the huge quake. No one knows if we will survive, 4.6 million California residents can evacuate through Friday night.
Why would 11.1 magnitude shake enough properties to collapse a one-third of a San Francisco City from California into a jumbled mess?
08. North Carolina
Type of earthquake: 7.2 NNI (Natural Quakes and Heated Flows)
The first quake to strike in the U.S., which was centered 29 miles north of Greensboro (North Carolina) on October 30, 1876, triggered an upwards degress by 1.3 magnitude and some of major towns to go foggy in sunny Allegany County. People sustained only light to light injuries from that epicenter, and repairs were a matter of days. There are no record estimate of the extent of damage, but viral videos and sound recordings suggest consumption has been crippled by 100 to 30,000 Tonnes of radicals, 30 to 150 trillion Megajoules worth of flooding, ending the largest fuel crop worldwide, including alcohol plants across the world.
After a TEMPORARY history, the two largest freeways of North America really do begin [to close], allowing for homes to sit and cove located upon Manhattan's City Center much of the timeframe after the quake. The smaller distance was signed over a mere year to about 13 miles of Anacostia River…the dike: http://www.rcnbrb.org/padLinks/pak.html.
A SLIGHT TWIST: the South Dragon, which is damaged by superlative water levels caused by the AP tendons damaged in downstream inunders with water temperatures above .7 C (1,340 to 2,870 F, for reference sake).
Seismicity from the North American-North Atlantic plate merger shadowing long continental crust. An eighth record displacement of land, from Antarctica:(5-10 Lambday's vulnerable Southernport)<|endoftext|>Hey Smashing Magazine,
Our latest issue comes out this Wednesday and there have been some pretty amazing updates throughout the month! And the latest brings

(Notes on the sample: San Joaquin is a region in Northern California; "degress" is a word GPT-2 made up; "shadowing" is used as the wrong part of speech.)

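After that I also tried the prompts "HUAWEI's new phone is way better than iphone, do you want to buy one?" and "People's park is so dangerous, do not go there!", and most of the time the passages GPT-2 generated had nothing to do with the prompt. I hope to run the test again once the largest GPT-2 model is released.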

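GPT-2 is probably a great step forward, but this is not how a paper should be written, nor how publicity should be done. If too few people can play this game and nobody can reproduce the results in the paper, OpenAI's publicity will cool off quickly. Handing everyone this 117M-parameter GPT-2 to toy with, only for it to end up toying with everyone and stirring up public anger, would do more harm than good. Of course, OpenAI may also be doing hunger marketing: whet everyone's appetite, stay hot for a week, then reluctantly release the 1.5-billion-parameter model to add fuel to the fire, hoping for the kind of lasting publicity that BERT enjoyed. It may take a Chinese or multilingual pretrained model, together with OpenAI releasing the 1.5 billion pretrained parameters, for GPT-2 to really take off. Let us wait and see how it actually turns out.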

Edited on 2019-05-06 13:38

