Automatic Speech Recognition for Speech Assessment of Persian Preschool Children

Preschool evaluation is crucial because it gives teachers and parents valuable insight into children's growth and development. The COVID-19 pandemic has highlighted the need for online assessment of preschool children. One of the areas that should be tested is their ability to speak. An off-the-shelf Automatic Speech Recognition (ASR) system does not help here, because such systems are pre-trained on voices that differ from children's in frequency and amplitude. Because most of them are pre-trained on data within a specific amplitude range, their training objectives do not prepare them for voices at other amplitudes. To overcome this issue, we added a new objective, Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune the model for the Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Combining masking with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. The new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. The method also yields positive results in zero- and few-shot scenarios.
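Since the results above are reported as Word Error Rate, a minimal sketch of the standard WER computation may help when reading them. It uses the usual definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not the authors' evaluation code, and the abstract does not say whether its figures are fractions or percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenization is plain whitespace splitting, which
    is an assumption and may differ from the authors' evaluation setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution in a four-word reference.
    print(word_error_rate("the cat sat down", "the cat sit down"))  # 0.25
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.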
