A study of methods for building encoder-decoder models for Russian speech recognition


Keywords — speech recognition, neural networks, end-to-end models, machine learning, attention mechanism, encoder-decoder models.




For citation: Markovnikov N. M., Kipyatkova I. S. Encoder-decoder models for recognition of Russian speech. Informatsionno-upravliaiushchie sistemy [Information and Control Systems], 2019, no. 4, pp. 45–53 (In Russian). doi:10.31799/1684-8853-2019-4-45-53
References


1. Bahdanau D., Chorowski J., Serdyuk D., Brakel P., Bengio Y. End-to-end attention-based large vocabulary speech recognition. Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4945–4949. doi:10.1109/ICASSP.2016.7472618

2. Allauzen C., Riley M., Schalkwyk J., Skut W., Mohri M. OpenFst: A general and efficient weighted finite-state transducer library. Implementation and Application of Automata, 2007, pp. 11–23. doi:10.1007/978-3-540-76336-9_3

3. Chan W., Jaitly N., Le Q., Vinyals O. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4960–4964. doi:10.1109/ICASSP.2016.7472621

4. Graves A., Jaitly N., Mohamed A.-r. Hybrid speech recognition with deep bidirectional LSTM. Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on, 2013, pp. 273–278. doi:10.1109/ASRU.2013.6707742

5. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Computation, 1997, no. 9, pp. 1735–1780. doi:10.1162/neco.1997.9.8.1735

6. Vaswani A., et al. Attention is all you need. arXiv, 2017. Available at: http://arxiv.org/abs/1706.03762 (accessed 27 February 2019).

7. Besacier L., Barnard E., Karpov A., Schultz T. Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 2014, pp. 85–100. doi:10.1016/j.specom.2013.07.008

8. Markovnikov N., Kipyatkova I. A survey of end-to-end speech recognition systems. Trudy SPIIRAN [SPIIRAS Proceedings], 2018, vol. 58, pp. 77–110 (In Russian). doi:10.15622/sp.58.4

9. Sutskever I., Vinyals O., Le Q. V. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.

10. Robinson T., Hochberg M., Renals S. The use of recurrent neural networks in continuous speech recognition. Automatic Speech and Speaker Recognition, Springer, 1996, pp. 233–258.

11. Chorowski J. K., Bahdanau D., Serdyuk D., Cho K., Bengio Y. Attention-based models for speech recognition. Advances in Neural Information Processing Systems, 2015, pp. 577–585.

12. Bahdanau D., Cho K., Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv, 2014. Available at: http://arxiv.org/abs/1409.0473 (accessed 27 February 2019).

13. Ganchev T., Fakotakis N., Kokkinakis G. Comparative evaluation of various MFCC implementations on the speaker verification task. Proc. of the SPECOM, 2005, pp. 191–194.

14. Kingma D. P., Ba J. Adam: A method for stochastic optimization. arXiv, 2014. Available at: http://arxiv.org/abs/1412.6980 (accessed 27 February 2019).

15. Zeyer A., Doetsch P., Voigtlaender P., Schlüter R., Ney H. A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 2462–2466. doi:10.1109/ICASSP.2017.7952599

16. Sennrich R., Haddow B., Birch A. Neural machine translation of rare words with subword units. ACL, 2016, pp. 1715–1725. doi:10.18653/v1/P16-1162

17. Wiesler S., Richard A., Schlüter R., Ney H. Mean-normalized stochastic gradient for large-scale deep learning. IEEE Intern. Conf. on Acoustics, Speech, and Signal Processing, 2014, pp. 180–184. doi:10.1109/ICASSP.2014.6853582

18. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90

19. Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. Rethinking the inception architecture for computer vision. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826. doi:10.1109/CVPR.2016.308

20. Chiu C. C., et al. State-of-the-art speech recognition with sequence-to-sequence models. IEEE Intern. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4774–4778. doi:10.1109/ICASSP.2018.8462105

21. Kipyatkova I., Karpov A. DNN-based acoustic modeling for Russian speech recognition using Kaldi. Intern. Conf. on Speech and Computer (SPECOM), 2016, pp. 246–253. doi:10.1007/978-3-319-43958-7_29

22. Verkhodanova V., Ronzhin A., Kipyatkova I. Havrus corpus: High-speed recordings of audio-visual Russian speech.


