Customized Voice Recognition Software Development: November 2007

The Voice User Interface Development Reality

A core part of any IVR solution is the Voice User Interface (VUI). Developing and implementing VUI applications is usually a lengthy process. Beyond the usual development stages required for any software system, VUI applications usually need exhaustive tuning, especially when natural language is involved, to deliver an experience close to natural conversation.

The long time required to launch VUI applications often results in compromising on the requirements. This usually means accepting simpler applications that are easier to develop but which will deliver less satisfactory caller experiences.

In other cases, in trying to meet the target launch date, testing and tuning activities are cut short, leading to:

lower call resolution rate
lower customer satisfaction
lower operational savings

In other words, this means lower overall quality and, inevitably, higher cost to correct the deficiencies later.

VUI Development Improvement

The time needed to launch VUI applications successfully can be shortened by using advanced tools such as those used and offered by Crimsonet. These tools automate the major coding activity in the VUI application life cycle: the application is modeled, and the code is then generated automatically. Rapid cycles of testing and adaptation can then take place to meet evolving business needs while improving the application’s quality. The result is higher-quality VUI applications and increased caller satisfaction.

For a business that is implementing a VUI application to automate its contact center services, the result is increased operational savings. For the VUI solution provider, the result is an advantage over competitors that use traditional tools for their VUI implementations.

The VUI Development and Implementation Process

A VUI application must go through the following phases on its way to a successful launch:

Definition phase, including requirements gathering and high-level design, creating a definition of the application and setting the stage for the detailed design to follow. The definition phase requires exploring and understanding the business goals, the users, and the functionality required for the application. The outcome of the definition phase is a requirements definition document.
Design phase, creating a complete and detailed description of the application, including the call flow, the prompts to be played by the system, the spoken expressions to be recognized by the system, and the external interfaces to implement the requirements as per the high-level design guidelines. The outcome of the design phase is a detailed design document, and possibly a prototype. Usability tests may be conducted early in the design process to validate the design from a user perspective, using the Wizard of Oz (WOZ) approach.
Realization phase, including the application development, testing, and tuning. The development includes the creation of the application software, the development of the grammars to be used for the user speech recognition, and the production of the audio to be played by the application, including prompts as well as nonverbal audio. Testing activities include application testing, recognition testing, and usability testing.

Once the system has been tested, it’s ready for deployment. A phased deployment approach is used. It starts with a pilot exposed to a limited number of users. The number of users is gradually increased until the system is fully deployed. During the phased deployment process, tuning activities take place, including dialog and recognition tuning, optimizing the system performance to meet the requirements.

The VUI application development and implementation process timeline can be illustrated as follows:

Conceptual Activities vs. Mechanical Activities

The Definition and Design phases described above involve a lot of communication (such as interviewing users during requirements gathering), analysis (as when deriving the system requirements from the business, user, and application needs), and familiarity with human behavior. These can be considered conceptual activities, which require human intellectual capabilities to perform.

The Realization phase, on the other hand, is by nature more routine and mechanical. Once the design has been laid down in detail, translating it into the system’s physical components is mostly a logical, mechanical job. Not all components are alike, though. Audio production obviously requires voice actors to perform and be recorded, whereas programming the call flow is very mechanical. Tuning falls in between: it mixes very mechanical work with human expertise.

Automating The Mechanical Activities

Technology can be used to automate the mechanical activities. A good example is implementing the call flow.

Before coding the call flow software, it is good practice to document its requirements and high-level design in detail, then review and approve them. A detailed design document is then prepared; once reviewed, approved, and baselined, it becomes the reference for coding the call flow. In reality, as in any software development process, the design document keeps evolving after it has been baselined, which creates the need for a configuration management process to keep the design and the code consistent. GUI tools can help capture the detailed design of the call flow, making the job faster and easier, both for the initial design and for the modifications that testing or tuning may require. Once the design has been captured, the tool can automatically produce the executable, which can then be loaded onto the target system and run.

“What’s the big deal?” you might ask. “We’ve been doing it for years with our GUI and database applications.” This is exactly the point: if this approach is good for GUI and database applications, why not for VUI ones?

The tool most commonly used these days for VUI application development is VoiceXML, a speech-centric language that provides speech dialog components. Although it is relatively easy to develop VUI applications in VoiceXML, it is still a language that demands programming skills and sound software development practices to be applied successfully.

Ultimately, we will see an advanced tool for VUI applications development which allows the capture of the call flow design using a dedicated GUI environment. This will then automatically produce the VUI application components, which are the call flow code and the grammars.
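As a minimal sketch of this model-to-code step, the Python below turns a small call flow model into a VoiceXML document plus one SRGS grammar file per menu. The model layout (`CALL_FLOW`), the file names, and the function names are assumptions made for illustration and do not describe any particular vendor’s tool. The sketch also assumes that the recognizer returns the matched phrase as the field value and that phrases contain no quote characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical call flow model, as a design tool might export it.
# Each state has a prompt and a mapping of caller phrase -> next state.
CALL_FLOW = {
    "main_menu": {
        "prompt": "Would you like billing or support?",
        "choices": {"billing": "billing_info", "support": "support_info"},
    },
    "billing_info": {"prompt": "Connecting you to billing.", "choices": {}},
    "support_info": {"prompt": "Connecting you to support.", "choices": {}},
}


def build_grammar(rule_id, phrases):
    """Build an SRGS XML grammar that accepts any one of the given phrases."""
    grammar = ET.Element("grammar", {
        "xmlns": "http://www.w3.org/2001/06/grammar",
        "version": "1.0",
        "xml:lang": "en-US",
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return grammar


def build_vxml(flow, start):
    """Build a VoiceXML document with one form per call flow state."""
    vxml = ET.Element("vxml", {
        "version": "2.0",
        "xmlns": "http://www.w3.org/2001/vxml",
    })
    # The first form in document order is entered first, so emit the start state first.
    ordered = [start] + [name for name in flow if name != start]
    for name in ordered:
        state = flow[name]
        form = ET.SubElement(vxml, "form", {"id": name})
        if not state["choices"]:
            # Terminal state: just play the prompt.
            block = ET.SubElement(form, "block")
            ET.SubElement(block, "prompt").text = state["prompt"]
            continue
        field = ET.SubElement(form, "field", {"name": "choice"})
        ET.SubElement(field, "prompt").text = state["prompt"]
        ET.SubElement(field, "grammar", {
            "src": f"{name}.grxml",
            "type": "application/srgs+xml",
        })
        filled = ET.SubElement(field, "filled")
        for phrase, target in state["choices"].items():
            # Assumption: the field value equals the phrase the caller spoke.
            branch = ET.SubElement(filled, "if", {"cond": f"choice == '{phrase}'"})
            ET.SubElement(branch, "goto", {"next": f"#{target}"})
    return vxml


def generate(flow, start):
    """Return a mapping of file name to generated XML text."""
    files = {"app.vxml": ET.tostring(build_vxml(flow, start), encoding="unicode")}
    for name, state in flow.items():
        if state["choices"]:
            grammar = build_grammar(name, list(state["choices"]))
            files[f"{name}.grxml"] = ET.tostring(grammar, encoding="unicode")
    return files
```

Calling `generate(CALL_FLOW, "main_menu")` returns the generated files as text, keyed by file name. When the design changes, only the model is edited and the files are regenerated, so the design and the code cannot drift apart.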

Tuning Aid Tools

While the call flow creation is very similar in its nature to any other software application development (except for the executables that are more specific to these applications, like the grammars), the system tuning is unique to VUI applications development.

The reason is that human speaking behavior, especially natural speech, is harder to predict than the way people operate other software applications. The only way to address this is to start with an application built on the best available prediction of caller behavior, drawn from agents’ experience or from analysis of recorded service calls, and then test it with real users. Real user behavior can then be observed, and difficult-to-predict actions can be identified and addressed. The more users the system is exposed to, the less likely it is that a new behavior will show up. This is the nature of the tuning process.
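The analysis step of this loop can be sketched as follows, assuming caller utterances have already been transcribed into text. The Python below ranks the phrases that the current grammar does not cover and measures how much of the observed input it does cover. The function names and the idea of matching whole phrases exactly are simplifying assumptions; a real recognizer grammar is usually richer than a flat phrase list.

```python
from collections import Counter


def normalize(utterance):
    """Lowercase and collapse whitespace so equivalent phrasings are counted together."""
    return " ".join(utterance.lower().split())


def unexpected_phrases(transcripts, grammar_phrases, min_count=2):
    """Rank caller phrases the grammar does not cover, most frequent first.

    Phrases heard fewer than min_count times are ignored as noise.
    """
    known = {normalize(p) for p in grammar_phrases}
    counts = Counter(normalize(t) for t in transcripts)
    return [
        (phrase, n)
        for phrase, n in counts.most_common()
        if phrase not in known and n >= min_count
    ]


def coverage(transcripts, grammar_phrases):
    """Fraction of observed utterances that the grammar already accepts."""
    if not transcripts:
        return 0.0
    known = {normalize(p) for p in grammar_phrases}
    hits = sum(1 for t in transcripts if normalize(t) in known)
    return hits / len(transcripts)
```

Each tuning round would add the top-ranked phrases to the grammar (or adjust the prompt that provoked them) and then recompute `coverage` on newly collected calls. As coverage approaches its plateau, fewer new behaviors remain to be found.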

Usually the process of collecting the user inputs, analyzing them, and applying the changes needed requires very technical skills, and therefore is provided as a professional service by the speech engine vendor. When the system is live, ongoing tuning for performance improvement is done only from time to time, usually every couple of months. When complete usability tests and tuning activities must be performed prior to launch in order to reach the acceptable recognition level, this may well take a long time and delay the system launch.

Advanced system monitoring and reporting tools that allow ongoing analysis of the system performance on a daily basis by people that are not necessarily technology experts, as well as tools to rapidly address the monitoring findings, make the job of system tuning much easier and quicker. This applies prior to the system launch, as well as continuing after launch during its ongoing operation. This allows earlier launch of the system while its tuning continues after it has gone live.
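A daily report of this kind might be computed along the following lines. This is only a sketch: the log record layout `(day, dialog state, outcome)`, the outcome labels, and the 80% threshold are all assumptions chosen for illustration, not the format of any specific monitoring product.

```python
from collections import defaultdict


def daily_report(events):
    """Summarize recognition results per day and dialog state.

    events: iterable of (day, state, outcome) tuples, where outcome is
    assumed to be one of "recognized", "no_match", or "no_input".
    Returns a sorted list of (day, state, attempts, recognition_rate).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, state, outcome in events:
        totals[(day, state)][outcome] += 1
    report = []
    for (day, state), outcomes in sorted(totals.items()):
        attempts = sum(outcomes.values())
        rate = outcomes.get("recognized", 0) / attempts
        report.append((day, state, attempts, round(rate, 2)))
    return report


def states_needing_attention(report, threshold=0.8):
    """List the (day, state) pairs whose recognition rate is below the threshold."""
    return [(day, state) for day, state, _, rate in report if rate < threshold]
```

A report like this points a non-specialist directly at the dialog states where callers struggle, which is where the next tuning change is most likely to pay off.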

The improved VUI application development and implementation process can be illustrated as follows:

Shorter Time To Market Combined With Higher Quality for IVR Projects Is Possible

Experience from IVR projects that use automated VUI tools as described above shows that project effort and duration can be reduced drastically, cutting overall time to launch by 50% to 70%. Once the time to launch has shrunk and the application development cost has been reduced, a couple of options are available:

More effort can be put into improving the quality of the application thus increasing customer satisfaction and reducing operational costs (as a result of higher call resolution rate).

More complex applications, such as natural language ones, become feasible, resulting in automation of more services and therefore further operational cost reduction.

IVR applications can go live faster using these advanced VUI tools. They’re a very powerful example of the improved human-computer interaction that is now possible.

Wednesday, November 28, 2007

VUI (Voice User Interface) Applications Can Go Live Faster