Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition

Chinese character recognition has attracted much research interest because of its wide range of applications. Although it has been studied for many years, some issues in this field remain unresolved, e.g., the zero-shot problem. Previous character-based and radical-based methods do not fundamentally address the zero-shot problem, because under data-hungry conditions some characters or radicals in the test set may never appear in the training set. Inspired by the fact that humans who have learned the stroke orders of some characters can generalize to write characters they have never seen before, we propose a stroke-based method that decomposes each character into a sequence of strokes, the most basic units of Chinese characters. However, we observe a one-to-many relationship between stroke sequences and Chinese characters: a single stroke sequence can correspond to several different characters. To tackle this challenge, we employ a matching-based strategy that transforms the predicted stroke sequence into a specific character. We evaluate the proposed method on handwritten characters, printed artistic characters, and scene characters. The experimental results demonstrate that the proposed method outperforms existing methods on both character zero-shot and radical zero-shot tasks. Moreover, the proposed method can be readily generalized to other languages whose characters can be decomposed into strokes.
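To make the one-to-many problem and the idea of a matching step concrete, here is a minimal sketch. It assumes the common five-category stroke coding (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling/dot, 5 = turning), a tiny hypothetical lexicon, Levenshtein distance as the sequence-matching criterion, and a hypothetical `visual_score` mapping used only to break ties among characters that share a stroke sequence. None of these names or choices come from the method described above, which may match sequences to characters differently.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon in the five-category stroke coding (an assumption).
# One-to-many cases: 土/士/工 all share "121", and 人/入/八 all share "34".
STROKE_LEXICON: Dict[str, str] = {
    "十": "12",
    "土": "121",
    "士": "121",
    "工": "121",
    "干": "112",
    "人": "34",
    "入": "34",
    "八": "34",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two stroke-code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a stroke from `a`
                curr[j - 1] + 1,           # insert a stroke into `a`
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def candidates(pred: str, lexicon: Dict[str, str]) -> List[str]:
    """Characters whose stroke sequence is closest to the prediction,
    in lexicon insertion order."""
    dists = {ch: edit_distance(pred, seq) for ch, seq in lexicon.items()}
    best = min(dists.values())
    return [ch for ch, d in dists.items() if d == best]


def match(pred: str, lexicon: Dict[str, str],
          visual_score: Optional[Dict[str, float]] = None) -> str:
    """Map a predicted stroke sequence to a single character.

    `visual_score` is a hypothetical per-character similarity (for example,
    between input-image features and a rendered template). It is consulted
    only when several characters tie for the closest stroke sequence.
    """
    cands = candidates(pred, lexicon)
    if len(cands) == 1 or not visual_score:
        return cands[0]
    return max(cands, key=lambda ch: visual_score.get(ch, float("-inf")))


if __name__ == "__main__":
    print(match("13", STROKE_LEXICON))        # noisy prediction, unique nearest entry
    print(candidates("121", STROKE_LEXICON))  # one stroke sequence, three characters
    print(match("121", STROKE_LEXICON, {"土": 0.2, "士": 0.1, "工": 0.9}))
```

Because stroke sequences are built from only a handful of categories, a character absent from the training images can still be matched as long as its stroke sequence is in the lexicon. The `candidates("121", ...)` call shows why stroke prediction alone is not enough: an extra disambiguation signal is needed to pick one character.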

翻译：中国人因其广泛应用而吸引了许多研究兴趣。尽管多年来一直对中国人的个性认同进行了研究,但该领域的一些问题尚未完全解决,例如零发问题。以前基于字符和激进的方法没有从根本上解决零发问题,因为测试组中的某些字符或激进在数据饥饿的条件下可能没有出现在培训组中。受以下事实的启发,即人类在学习到某些字符的中风顺序之前,可以概括地了解如何写出看不见的字符。我们建议采用中风方法,将每个字符分解成一个中风序列,这是中国字符最基本的单位。然而,我们观察到中风序列和中华字符之间存在一对一的关系。为了应对这一挑战,我们采用了基于匹配的战略,将预测的中风序列转换成一个具体特性。我们评估了手写字符、印刷艺术字符和场景字符的拟议方法。实验结果证实,拟议的方法在字符零发和激进零发任务上都超越了现有方法。此外,拟议的方法可以很容易地将其他字符简单化为普通化。