Deep Bidirectional Long Short-Term Memory (DBLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that a DBLSTM trained with a CTC objective function learns both local character-image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition by comparing decoding performance with and without an explicit language model. We observe that even when one million lines of training sentences are used to train the DBLSTM, an explicit language model is still helpful. To handle such large-scale training, a GPU-based training tool has been developed for CTC training of DBLSTM using a mini-batch based epochwise Back Propagation Through Time (BPTT) algorithm.
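To make the CTC objective mentioned above concrete, the sketch below computes the CTC negative log-likelihood of a label sequence given per-frame network posteriors, via the standard forward (alpha) recursion over the blank-interleaved label sequence. This is an illustration only, not the paper's GPU-based BPTT implementation; the function name and the toy inputs are our own.

```python
import numpy as np

def ctc_loss(log_probs, labels, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    log_probs: (T, C) array of per-frame log posteriors (e.g. from a BLSTM).
    labels:    target label sequence, with no blanks.
    blank:     index of the CTC blank symbol.
    """
    T, C = log_probs.shape
    # Interleave blanks around the labels: ^ l1 ^ l2 ^ ... ^
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)

    # alpha[t, s] = log-probability of all alignments of ext[:s+1]
    # to the first t+1 frames.
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on same symbol
            if s > 0:
                a = np.logaddexp(a, alpha[t - 1, s - 1])   # advance by one
            # Skip a blank only between two distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]

    # Valid endings: last label or trailing blank.
    ll = alpha[T - 1, S - 1]
    if S > 1:
        ll = np.logaddexp(ll, alpha[T - 1, S - 2])
    return -ll
```

For example, with T = 2 frames, 2 classes, and uniform posteriors (probability 0.5 everywhere), the paths collapsing to label "1" are (^1), (1^), and (11), giving total probability 0.75 and loss -log 0.75. The GPU tool described in the paper batches many such sequences and backpropagates the gradient of this loss through time.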