This article evaluates a first attempt at generating artificial children's voices with a Costa Rican accent, using statistical parametric speech synthesis based on Hidden Markov Models. It describes the recording of the voice samples used to train the models, the fundamentals of the technique, and the subjective evaluation of the results by a group of listeners. The results show that the intelligibility of the synthetic voices, evaluated on isolated words, is lower than that of the natural voices recorded from the group of participating children. Similarly, listeners' identification of the speaker's age and gender is significantly worse for the artificial voices than for recordings of natural voices. These results show the need to obtain larger amounts of data, and they provide a numerical reference for future developments based on new data or on improvements to the same technique.