Wednesday, May 6, 2009

History of Automatic Speech Recognition Systems

Projects in speech recognition have been supported by the Advanced Research Projects Agency (ARPA), and since 1971 much about acoustic-phonetics, syntax, semantics, and context has become more clearly understood. At present, systems can handle a relatively small vocabulary of a few hundred words, trained once. With an unrestricted vocabulary and many speakers, we are still far from the desired results. The task of picking a person at random and recognizing his or her speech appears to call for intensive research across many different domains.
The primary sources of information in this field are the IEEE Transactions on Acoustics, Speech, and Signal Processing (pertinent special issues: vol. 21, June) and the Journal of the Acoustical Society of America (in particular the semiannual conference abstracts, which appear with the January and July issues each year).
Authors who have written on the various branches of speech communication include:
1. Flanagan
2. Fant
3. Lehiste
Other useful sources are the institutions (listed with their researchers) presently working in the field of speech recognition, which include:
1. Bell Telephone Laboratories (Denes, Flanagan)
2. Carnegie-Mellon University (Erman, Newell, Reddy)
3. Research Laboratory of Electronics, M.I.T. (Klatt)
4. System Development Corporation (Barnett, Ritea)
5. University of California, Berkeley (O’Malley)
6. Haskins Laboratories (Cooper, Mermelstein)
7. Bolt Beranek and Newman, Inc. (Makhoul, Wolf, Woods)
8. Xerox Palo Alto Research Center (White)
9. Threshold Technologies (Martin)
10. Stanford Research Institute (Walker)
11. Speech Communication Research Laboratories (Broad, Markel, Shoup)
12. IBM Research Laboratories (Bahl, Dixon, Jelinek)
13. Department of Speech Communication, KTH, Stockholm (Fant)


Speech recognition, in my view, is better understood once we know the fundamental structure of each individual unit of speech, i.e., the phoneme sounds of each language. Although work has now begun on recognizing continuously spoken words, even phoneme-level recognition remains an important and difficult task when the environment is noisy or when the phonemes are spoken by people from different geographical locations with different native languages.
Making a system speaker-independent with a very large vocabulary requires addressing research issues from several fields. These include knowing the domain, i.e., the area the speaker comes from, and the different environments in which speech is recorded, which greatly affect audio quality and hamper recognition.

Reference:
1. R. Reddy, “Speech Recognition by Machine: A Review”, Proceedings of the IEEE 64(4), April 1976, pp. 502-531

Hi everyone, I am Raj Rishi Purohit from Gandhinagar (Gujarat). I have just completed my B.Tech in ICT. My research areas include speech recognition and filterbank design using wavelet methods for speech recognition. Presently I am working with Sphinx for my speech recognition projects.
For further details on the topic covered, you can contact me at http://www.webmultimediale.org by posting your views on this short article.
You can also mail me your queries and valuable suggestions at rajrishipurohit[at]gmail.com so that we can improve the work and make it more robust.