Speech Application Programming Interface Speech-Related Technologies

Posted by arlene

Speech is a major initiative for Microsoft. It is part of a larger concept referred to as the Natural User Interface, or Natural UI, which involves creating natural and expressive interactions with the user. This is primarily accomplished using speech-processing capabilities but can also involve natural language processing and machine learning. The Natural UI is intended to ease interaction with smart devices: not merely devices like PDAs and Tablet PCs, but also devices installed in your car, Internet television, and screen phones.

Kai-Fu Lee is the corporate vice president of the Natural Interactive Services Division of Microsoft. He recently gave a presentation in which he stated:

Natural UI will arrive as an evolution. . . . But, in 10 years, Natural UI will be viewed as the largest revolution since Graphical UI.

The Speech group hopes to improve human-to-computer interaction by giving computers the ability to recognize spoken words and even to understand their meaning. Of course, this is the tricky part. The group’s researchers are hard at work trying to improve speech recognition, grammar understanding, and text-to-speech using several different methods.

One way speech recognition can be improved is through the ability to detect emotion in speech. This is a technique that could be very useful for speech applications that interface with customers. The software will be able to respond appropriately if it can recognize the speaker’s emotion.

The work from this group was the basis for the Speech Application Programming Interface (SAPI) and also for the newly released Microsoft Speech Server. In fact, some of the researchers from this group are now working in the Speech Platforms Group, which is responsible for the Microsoft Speech Server product.

Recently, I had the opportunity to speak with James Mastan, director of marketing for the Speech Platforms Group. The profile box titled “The Future of Speech at Microsoft” contains excerpts from that conversation.

One of the group’s first prototypes was the MiPad, which stands for multimodal interactive notepad. This device, which was first demonstrated in 2000, combines speech-recognition technology with pen input. The user can choose either method when accessing e-mail, schedules, or contact information. Work in this area was the basis for multimodal application development with Speech Server.
