Volume 2, Issue 1, 2022

Comprehensive Assessment of Automatic Speech Recognition System for building Artificial Intelligent Schemes

Alex Joseph
Department of Computer Science [PG], Kristu Jayanti College (Autonomous), Bengaluru
Dimal Thomas
Department of Computer Science [PG], Kristu Jayanti College (Autonomous), Bengaluru
Merlin Reji
Department of Computer Science [PG], Kristu Jayanti College (Autonomous), Bengaluru

Published 2022-06-08

Keywords

  • Speech Recognition, Speech Understanding, Automatic Speech Recognition (ASR) Systems, Hybrid Systems, Low-Resource Languages.

How to Cite

Joseph, A., Thomas, D., & Reji, M. (2022). Comprehensive Assessment of Automatic Speech Recognition System for building Artificial Intelligent Schemes. Kristu Jayanti Journal of Computational Sciences (KJCS), 2(1), 57–70. https://doi.org/10.59176/kjcs.v2i1.2218

Abstract

Speech recognition is a multidisciplinary branch of natural language processing (NLP) that enables machines to recognize spoken language and transcribe it into text. It plays a crucial role in digital transformation and is widely used in fields such as education, industry, and healthcare. More recently, it has been adopted in a number of Internet of Things (IoT) applications. Speech is a simple and effective means of human communication, and today we are connected not only to one another but also to the many devices in our lives. Speech can therefore serve as a channel of communication between people and computers as well. This interaction takes place through interfaces and is studied under the heading of Human-Computer Interaction (HCI). This paper outlines the key areas of Automatic Speech Recognition (ASR), an important topic in artificial intelligence. It also reviews recent major research in speech processing and presents the basic idea of our proposal, which we consider a significant contribution to this field of research. Finally, the paper identifies specific enhancements that may be of value to future researchers.
