With the rise of mobile smart devices in recent years, people encounter and use voice information more and more frequently, and voice forgery and forgery detection have become increasingly important technologies in speech processing. This paper first outlines the general pipeline of a speech synthesis system and systematically summarizes the two main voice forgery technologies, text-to-speech (TTS) and voice conversion (VC). It then introduces and classifies the common algorithms used in voice forgery detection. Finally, in view of the open problems in voice forgery and its detection, it proposes possible future directions from the perspectives of data, models, training methods, and application scenarios.