
What the next generation of speech recognizers will be able to do

Georgia Institute Of Technology : 18 February, 2004  (New Product)
When the motion picture '2001: A Space Odyssey' opened in 1968, the film's conversation between a stranded astronaut and a malevolent computer named HAL seemed plausible for the year 2001, then more than three decades in the future.

But as any user of today's automatic speech recognition technology can attest, that future hasn't quite arrived yet.

As a scientist at AT&T Bell Labs, B.H. 'Fred' Juang helped create the current generation of speech recognition technology that routinely handles 'operator-assisted' calls and a host of other simple tasks, including accessing credit card information. Proud of that pioneering work, Juang today is working to help create the next generation of speech technology, one that would facilitate natural communication between humans and machines.

Now a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Juang presented his vision of next-generation speech systems Saturday, February 14 at the annual meeting of the American Association for the Advancement of Science.

'If we want to communicate with a machine as we would with a human, the basic assumptions underlying today's automated speech recognition systems are wrong,' he said. 'To have real human-machine communication, the machine must be able to detect the intention of the speaker by compiling all the linguistic cues in the acoustic wave. That's much more difficult than what the existing technology was designed to do: convert speech to text.'

To make the speech recognition problem solvable in the 1970s, researchers made certain assumptions. For instance, they assumed that all the sounds coming to the recognizer would be human speech, from just one speaker. They also assumed the output would be text, and that recognizer algorithms could acceptably match speech signals to the 'closest' word in a stored database.
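The "closest word" assumption can be illustrated with a toy sketch. Real recognizers compare acoustic feature vectors, not spellings; here string similarity (via Python's standard `difflib`) merely stands in for that matching step, and the vocabulary is made up for illustration.

```python
import difflib

# Hypothetical stored vocabulary for an operator-assisted call system.
vocabulary = ["operator", "collect", "calling", "card", "person"]

def closest_word(heard: str) -> str:
    # Return the vocabulary entry nearest to the (possibly garbled) input.
    # With cutoff=0.0 the matcher ALWAYS returns something, mirroring the
    # assumption that every sound reaching the recognizer is a known word.
    matches = difflib.get_close_matches(heard, vocabulary, n=1, cutoff=0.0)
    return matches[0]

print(closest_word("operater"))  # a near miss still maps to "operator"
print(closest_word("xyzzy"))     # non-speech input is forced onto a word anyway
```

The second call shows the failure mode Juang describes: the system never says "I don't know", it simply emits whichever stored word scores highest.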

But in the real world, human speech mixes with noise, which may include the speech of another person. Speaking pace varies, and people group words in unpredictable ways while peppering their conversations with 'ums' and 'ahs.'

Speech researchers chose mathematical algorithms known as Hidden Markov Models to match sounds to words and place them into grammatical outlines. That system has performed well on simple tasks, but it often produces errors that make the resulting text difficult for humans to understand, and that degrade natural human-machine communication even further.
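The core of Hidden Markov Model decoding is the Viterbi algorithm: given a sequence of observations, find the most probable hidden state sequence. The sketch below uses a made-up two-state model (silence vs. speech, with quantised "low"/"high" energy observations); real recognizers use per-phoneme HMMs over continuous acoustic features, so all the numbers here are illustrative assumptions.

```python
# Toy two-state HMM: states, start, transition, and emission probabilities.
states = ("silence", "speech")
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech":  {"silence": 0.2, "speech": 0.8},
}
# Hypothetical quantised observations: "low" or "high" acoustic energy.
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech":  {"low": 0.2, "high": 0.8},
}

def viterbi(observations):
    # V[t][s] = probability of the best state path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Pick the predecessor state that maximises the path probability.
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

print(viterbi(["low", "low", "high", "high"]))
# -> ['silence', 'silence', 'speech', 'speech']
```

Note that, like the word matcher above, Viterbi always returns *some* best path: it has no notion of "this input fits none of my models", which is exactly the limitation Juang points to.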

'It doesn't matter what you give the system, it just picks the closest sounding word and gives that to you as text,' explained Juang, who holds the Motorola Foundation Chair at Georgia Tech and is a Georgia Research Alliance Eminent Scholar in Advanced Communications. 'But that's quite wrong if you are interested in general communications. When you talk, a lot of information is lost if you use the current methods.'

In addition, current machines cannot understand 'reference,' a linguistic shorthand people use to communicate. When discussing a technical issue such as electrical resistance, for instance, a group of engineers may use the word 'it' in referring to Ohm's Law. Humans easily understand that, but machines don't.

'If every time we began to discuss one term, we had to define it, conversation would be very awkward,' Juang noted. 'Being able to understand reference is very important for natural communication. If we can create a system to do that, the machine would behave much more like a human and communicate more like a human.'