Wiesler S., Richard A., Schlüter R., Ney H. Mean-normalized stochastic gradient for large-scale deep learning. IEEE Intern. Conf. on Acoustics, Speech, and Signal Processing, 2014, pp. 180–184. doi:10.1109/ICASSP.2014.6853582
He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90
Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. Rethinking the inception architecture for computer vision. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826. doi:10.1109/CVPR.2016.308
Chiu C. C., et al. State-of-the-art speech recognition with sequence-to-sequence models. IEEE Intern. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4774–4778. doi:10.1109/ICASSP.2018.8462105
Kipyatkova I., Karpov A. DNN-based acoustic modeling for Russian speech recognition using Kaldi. Intern. Conf. on Speech and Computer (SPECOM), 2016, pp. 246–253. doi:10.1007/978-3-319-43958-7_29
Verkhodanova V., Ronzhin A., Kipyatkova I. Havrus corpus: high-speed recordings of audio-visual Russian speech. Intern. Conf. on Speech and Computer (SPECOM), 2016, pp. 338–345. doi:10.1007/978-3-319-43958-7_40
Verhelst W., Roelands M. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. IEEE Intern. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 1993, pp. 554–557. doi:10.1109/ICASSP.1993.319366
SoX sound processing tool. http://sox.sourceforge.net/sox.html (accessed: 27.02.2019).
Povey D., et al. The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011. https://infoscience.epfl.ch/record/192584/ (accessed: 27.02.2019).
Markovnikov N., Kipyatkova I., Karpov A., Filchenkov A. Deep neural networks in Russian speech recognition. Conf. on Artificial Intelligence and Natural Language (AINL), 2017, pp. 54–67. doi:10.1007/978-3-319-71746-3_5
Chen S. F., Goodman J. An empirical study of smoothing techniques for language modeling. Computer