This paper has been withdrawn because we discovered a bug in our TensorFlow implementation that accidentally mixed vectors across batches. This led to different inference results for different batch sizes, which should not happen. The performance scores remain the same, but we concluded that self-attention was not what contributed to the performance. We are withdrawing the paper because this renders its main claim false. Thanks to Guan Xinyu from NUS for discovering this issue in our previously open-sourced code.