Despite the high accuracy of existing optical mark reading (OMR) systems and devices, several restrictions remain. In this work, we aim to reduce the restrictions placed on multiple-choice question (MCQ) tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems, which rely on simple image processing steps to recognize the extracted answer boxes, we approach the problem from another perspective by training a machine learning classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank). This allows us to handle a variety of shading and mark patterns and to distinguish between chosen (i.e., confirmed) and canceled (i.e., crossed-out) answers. Because training such a classifier requires a large number of examples, we present a dataset comprising six real MCQ assessments with different answer sheet templates. We evaluate two classification strategies: a straightforward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. Finally, we present an easy-to-use graphical user interface for the proposed system. Compared with existing OMR systems, the proposed system imposes the fewest constraints while achieving high accuracy. We believe that the presented work will further direct the development of OMR systems towards reducing the restrictions of MCQ tests.
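As an illustration of the two-stage strategy mentioned above, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the two stages are divided; the sketch assumes that the first stage separates blank from marked boxes and that the second stage separates confirmed from crossed-out marks. The `Box` representation, the function names, and the ink-fraction thresholds are all hypothetical placeholders standing in for trained models (handcrafted features or a convolutional neural network).

```python
from typing import Callable, List

# Hypothetical representation: a binarized answer box, where 1 means ink.
Box = List[List[int]]


def classify_two_stage(
    box: Box,
    is_marked: Callable[[Box], bool],
    is_crossed_out: Callable[[Box], bool],
) -> str:
    """Assumed split: stage 1 filters out blank boxes, stage 2 runs only on marked ones."""
    if not is_marked(box):
        return "blank"
    return "crossed_out" if is_crossed_out(box) else "confirmed"


def ink_fraction(box: Box) -> float:
    """Fraction of pixels in the box that contain ink."""
    total = sum(len(row) for row in box)
    return sum(sum(row) for row in box) / total if total else 0.0


# Placeholder stage models with arbitrary thresholds; a real system would
# plug in trained classifiers here instead.
def toy_is_marked(box: Box) -> bool:
    return ink_fraction(box) > 0.1


def toy_is_crossed_out(box: Box) -> bool:
    return ink_fraction(box) > 0.6


if __name__ == "__main__":
    sample = [[1, 0], [0, 0]]
    print(classify_two_stage(sample, toy_is_marked, toy_is_crossed_out))
```

The straightforward strategy would instead call a single three-way classifier on every box. Under the assumed split, the second-stage model only ever sees marked boxes, so it can focus on the difference between confirmed and crossed-out marks.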