Volume 3, Issue 2, 2015
Prashant Tryambak Mhetre, Kedar Jyotiba Devkar, Amol Dattatraya Bhagat, Pramod Shivaji Patil, Kuldeep B. Vayadande
With the large quantity of educational video available on the web, the usability of video data is growing rapidly. Video transcription is the conversion of a video lecture into textual information; it is a way of producing a document or notes from a video. This paper presents an ASR technique based on the Hidden Markov Model. First, the audio is extracted from the video and the speech waveform is transformed into the multiple frames used by the recognizer; Automatic Speech Recognition is then applied to the audio track to extract raw data from the audio. The data is then analysed against a phonetic dictionary, in which the pronunciation of every word is represented phonetically, and a text document is produced as the output for the video file.
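At the core of the HMM-based recognition step described above is decoding: finding the most likely sequence of hidden states (e.g. phones) given a sequence of acoustic observations. The following is a minimal illustrative sketch of Viterbi decoding over a toy two-phone model, not the authors' implementation; the state names, probabilities, and quantized observations (`low`/`high`) are invented for illustration, whereas a real recognizer would use MFCC feature frames and a trained acoustic model.

```python
# Toy Viterbi decoding for an HMM, illustrating how an ASR system
# maps a sequence of acoustic frames to the most likely phone sequence.
# All model parameters below are hypothetical.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, best state path) for an observation sequence."""
    # Initialise with the first observation.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    # Recursively extend the best path to each state.
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]

# Hypothetical two-phone model with quantized acoustic observations.
states = ("AH", "T")
start_p = {"AH": 0.6, "T": 0.4}
trans_p = {"AH": {"AH": 0.7, "T": 0.3}, "T": {"AH": 0.4, "T": 0.6}}
emit_p = {"AH": {"low": 0.7, "high": 0.3}, "T": {"low": 0.2, "high": 0.8}}

prob, best_path = viterbi(("low", "low", "high"), states, start_p, trans_p, emit_p)
print(best_path)  # most likely phone sequence for the three frames
```

In a full system such as CMU Sphinx, this decoding is performed over word lattices built from the phonetic dictionary, so the best state path maps directly to recognized words.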
Department of Information Technology, Bharati Vidyapeeth's College of Engineering, Kolhapur, India