Speech Recognition

In the near future, speech will be the method for controlling appliances, toys, tools, computers, and robotics. There is a huge commercial market waiting for this technology to mature.

Our speech recognition circuit is a standalone, trainable circuit that may be interfaced to control just about anything electrical. The interface circuit we will build in the second part of this chapter will allow this speech recognition circuit to control a variety of electrical devices such as appliances, test instruments, VCRs, TVs, and of course robots. The circuit is trained (programmed) to recognize the words you choose. The unit can be trained in any language and even on nonlanguage sounds such as grunts, birdcalls, and whistles.

Being able to control and operate an appliance (computer, VCR, TV, security system, etc.) or robot by speaking to it makes the device easier to work with and increases efficiency and effectiveness. At the most basic level, speech commands let the user perform parallel tasks (i.e., with hands and eyes busy elsewhere) while continuing to work with the computer, appliance, instrument, or robot.

The heart of the circuit is the HM2007 speech recognition integrated circuit. The chip provides the options of recognizing either 40 words, each with a length of 0.96 s, or 20 words, each with a length of 1.92 s. This speech recognition circuit has a jumper setting (jumper WD on the main board) that allows the user to choose either the 0.96 s word length (40-word vocabulary) or the 1.92 s word length (20-word vocabulary).
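The trade-off between the two settings can be checked with a little arithmetic. The following is a minimal sketch, not part of the circuit; the names WORD_MODES and total_speech_ms are made up for illustration, and the durations are held as whole milliseconds to keep the arithmetic exact.

```python
# The two options described in the text, selected by jumper WD.
# Durations are stored in milliseconds (0.96 s = 960 ms, 1.92 s = 1920 ms).
WORD_MODES = {
    "short": {"word_length_ms": 960, "vocabulary": 40},
    "long": {"word_length_ms": 1920, "vocabulary": 20},
}


def total_speech_ms(mode):
    """Total trained speech time the chosen mode can hold, in milliseconds."""
    cfg = WORD_MODES[mode]
    return cfg["word_length_ms"] * cfg["vocabulary"]


for name in WORD_MODES:
    print(name, total_speech_ms(name) / 1000, "seconds")
```

Both settings come to the same total of 38.4 s of trained speech, so doubling the word length halves the vocabulary.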

For memory the circuit uses an 8K × 8 static RAM. There is a backup memory battery for the SRAM on the main board. This battery keeps the trained words safely stored in the SRAM when the main power is turned off. The button battery lasts approximately 2 years. Without the battery backup you would have to retrain the circuit every time the circuit was switched off.
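To get a feel for how little storage is involved, the sketch below computes an upper bound on the memory available per trained word. It assumes, purely for illustration, that the entire SRAM is divided evenly among the trained words; how the HM2007 actually lays out its memory is not described here. The names SRAM_BYTES and max_bytes_per_word are hypothetical.

```python
SRAM_BYTES = 8 * 1024  # 8K x 8 static RAM: 8192 locations of 8 bits each


def max_bytes_per_word(vocabulary_size):
    """Upper bound on bytes per word if the SRAM were split evenly (an assumption)."""
    return SRAM_BYTES // vocabulary_size


print(max_bytes_per_word(40))  # 40-word vocabulary, 0.96 s words
print(max_bytes_per_word(20))  # 20-word vocabulary, 1.92 s words
```

Under that assumption each word gets at most a few hundred bytes, which suggests that the chip stores a compact representation of each word rather than the recorded sound itself.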

Speech recognition circuit assembled.

HM2007 integrated circuit.

The chip has two operational modes: manual mode and CPU mode. The CPU mode is used when the chip must work as a speech recognition coprocessor under a host computer. This is an attractive approach to speech recognition for computers because the job of listening to sound and recognizing command words doesn’t occupy any of the main computer’s CPU time. In one type of programming scenario, when the HM2007 recognizes a command, it can signal an interrupt to the host CPU and then relay the command it recognized. The HM2007 chip can be cascaded to provide a larger word recognition library.
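The interrupt scenario can be pictured with a small simulation. This is only a sketch of the general pattern (a coprocessor notifies the host, and the host looks up what to do); it does not reproduce the HM2007's actual bus interface or command set. The class SimulatedRecognizer, the function make_host_handler, and the command table are all hypothetical.

```python
class SimulatedRecognizer:
    """Stand-in for a recognition coprocessor; not the HM2007's real interface."""

    def __init__(self):
        self._handler = None

    def attach_interrupt(self, handler):
        # The host registers the routine to run when a word is recognized.
        self._handler = handler

    def simulate_recognition(self, word_number):
        # Pretend the chip recognized a trained word and raised an interrupt.
        if self._handler is not None:
            self._handler(word_number)


def make_host_handler(commands, log):
    """Build an interrupt handler that maps word numbers to actions."""
    def handler(word_number):
        # Unknown word numbers are ignored rather than treated as errors.
        log.append(commands.get(word_number, "ignore"))
    return handler


commands = {1: "lamp on", 2: "lamp off"}  # hypothetical command table
log = []
chip = SimulatedRecognizer()
chip.attach_interrupt(make_host_handler(commands, log))

for word in (1, 2, 7):
    chip.simulate_recognition(word)
print(log)
```

The host does nothing until the handler is called, which is the point of the coprocessor arrangement: the host spends no time listening.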

The SR06 circuit we are building operates in the standalone manual mode. As a standalone circuit, the speech recognition circuit doesn’t require a host computer and may be integrated into other devices to add speech control.

Applications

Applications of command and control of appliances and equipment include these:

· Telephone assistance systems

· Data entry

· Speech controlled toys

· Speech and voice recognition security systems

· Robotics

Software Approach

Most speech recognition systems available today are software programs that run on personal computers. The software requires that a compatible sound card be installed in the computer. Once activated, this software runs continuously in the background of the computer’s operating system (Windows, OS/2, etc.) and any other application program.

While this speech software is impressive, it is not economically viable for manufacturers to add personal computer systems to control a washing machine or VCR. The speech recognition software steals processing power from the operating system and adds to the computer’s processing tasks. Typically there is a noticeable slowdown in the operation and function of the computer when voice recognition is enabled.

Learning to Listen

We take our ability to listen for granted. For instance, we are capable of listening to one person speak among several at a party. We subconsciously filter out the extraneous conversations and sound. This filtering ability is beyond the capabilities of today’s speech recognition systems.

Speech recognition is not speech understanding. Understanding the meaning of words is a higher intellectual function. The fact that a computer can respond to a vocal command does not mean it understands the command spoken. Voice recognition systems will one day have the ability to distinguish linguistic nuances and the meaning of words, to “Do what I mean, not what I say!”

Speaker Dependent and Speaker Independent Recognition

Speech recognition is classified into two categories: speaker dependent and speaker independent. Speaker-dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95 percent accuracy for word recognition. The drawback to this approach is that the system only responds accurately to the individual who trained the system. This is the most common approach employed in software for personal computers.

Speaker-independent systems are trained to respond to a word regardless of who speaks. Therefore, the system must respond to a large variety of speech patterns, inflections, and enunciation of the target word. The command word count is usually lower than that of the speaker-dependent system; however, high accuracy can still be maintained within processing limits. Industrial applications more often require speaker-independent voice systems, such as the AT&T system used in the telephone systems.

Recognition Style

Speech recognition systems have another constraint concerning the style of speech they can recognize. There are three styles of speech: isolated, connected, and continuous.

Isolated speech recognition systems can handle only words that are spoken separately. This is the most common speech recognition system available today. The user must pause between each word or command spoken. The speech recognition circuit is set up to identify isolated words of 0.96 s length.
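The pause between words is what makes isolated recognition tractable: silence marks where one word ends and the next begins. The sketch below illustrates this with simple energy-threshold endpoint detection, a common textbook technique that is not claimed to be what the HM2007 does internally. The function name, the threshold, and the sample energy values are all made up for illustration.

```python
def split_isolated_words(energies, threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame index pairs, end exclusive, one per word.

    A word ends once the energy stays below the threshold for
    min_pause_frames consecutive frames; shorter dips stay inside the word.
    """
    words = []
    start = None  # first frame of the word currently being tracked
    quiet = 0     # consecutive quiet frames seen inside that word
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                words.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Input ended mid-word: close it, dropping any trailing quiet frames.
        words.append((start, len(energies) - quiet))
    return words


frames = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
print(split_isolated_words(frames))
```

In this example the single quiet frame inside the first word is too short to count as a pause, so the first word is kept in one piece, while the three quiet frames that follow separate it from the second word.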

A connected speech recognition system is a halfway point between isolated word and continuous speech recognition. It allows users to speak multiple words. The HM2007 can be set up to identify words or phrases 1.92 s in length. This reduces the recognition vocabulary to 20 words.

Continuous speech is the natural conversational speech we are used to in everyday life. It is extremely difficult for a recognizer to sift through such speech because the words tend to merge together. For instance, “Hi, how are you doing?” sounds like “Hi, howyadoin.” Continuous speech recognition systems are on the market and are under continual development.