This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) successfully applied recurrent neural networks such as LSTM and gated recurrent unit (GRU) networks. However, in contrast to Chinese, Japanese uses several character types, such as hiragana, katakana, and kanji, which produce orthographic variations and make word segmentation more difficult. Additionally, JWS needs to take global context into account, yet traditional JWS approaches rely on local features. To address this problem, this study proposes an LSTM-based approach to JWS. The experimental results indicate that the proposed model achieves state-of-the-art accuracy on various Japanese corpora.