Volume 3, Issue 2, 2015
Prashant Tryambak Mhetre, Kedar Jyotiba Devkar, Amol Dattatraya Bhagat, Pramod Shivaji Patil, Kuldeep B. Vayadande
With the large quantity of educational video available on the web, the usability of video data is growing rapidly. Video transcription is the conversion of a video lecture into textual information; it is a way of producing a document or notes from a video. This paper presents an ASR technique based on the Hidden Markov Model. First, the audio is extracted from the video and the speech waveform is transformed into the multiple frames used by the recognizer; Automatic Speech Recognition is then applied to the audio track to extract raw data from the audio. The data is then analysed against a phonetic dictionary, in which the pronunciation of every word is represented phonetically, and a text document is produced as the output for the video file.
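At the core of the HMM-based recognition step described above is decoding: finding the most likely sequence of hidden states (e.g. phones) given a sequence of acoustic observations. The following is a minimal illustrative sketch of Viterbi decoding over a toy two-phone model, not the authors' implementation; the state names, probabilities, and quantized observations (`low`/`high`) are invented for illustration, whereas a real recognizer would use MFCC feature frames and a trained acoustic model.

```python
# Toy Viterbi decoding for an HMM, illustrating how an ASR system
# maps a sequence of acoustic frames to the most likely phone sequence.
# All model parameters below are hypothetical.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, best state path) for an observation sequence."""
    # Initialise with the first observation.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    # Recursively extend the best path to each state.
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]

# Hypothetical two-phone model with quantized acoustic observations.
states = ("AH", "T")
start_p = {"AH": 0.6, "T": 0.4}
trans_p = {"AH": {"AH": 0.7, "T": 0.3}, "T": {"AH": 0.4, "T": 0.6}}
emit_p = {"AH": {"low": 0.7, "high": 0.3}, "T": {"low": 0.2, "high": 0.8}}

prob, best_path = viterbi(("low", "low", "high"), states, start_p, trans_p, emit_p)
print(best_path)  # most likely phone sequence for the three frames
```

In a full system such as CMU Sphinx, this decoding is performed over word lattices built from the phonetic dictionary, so the best state path maps directly to recognized words.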
Department of Information Technology, Bharati Vidyapeeth's College of Engineering, Kolhapur, India