We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing. The kernel of fastHan is a joint many-task model based on a pruned BERT that uses only the first 8 layers. We also provide a 4-layer base version compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora spanning the four tasks, yielding near state-of-the-art (SOTA) performance on dependency parsing and SOTA performance on the other three tasks. In addition to its small size and strong performance, fastHan is user-friendly: implemented as a Python package, it is easy to download and install, and users can get the results they need with a single line of code, even with little knowledge of deep learning. The project is released on GitHub.