Voice Assistants

Developer Edition

If you started building your assistant with the Rasa Pro Developer Edition before Rasa Pro 3.11 and want to try voice features, please request a new license. Licenses issued before this version don't contain the necessary feature scopes to run voice assistants.

Building Voice Assistants

Voice assistants provide a natural and intuitive way to interact with digital devices and services. They are particularly useful for hands-free operation, accessibility, and multitasking. They also offer a familiar and frictionless experience to the customers of contact centers. At the same time, voice solutions present distinct technical challenges and require elaborate user experience design.

Rasa provides voice channel connectors built to handle the complexities specific to voice conversations. The two types of connector, Voice Ready and Voice Stream, are described below.

Voice Ready

Voice Ready Channel Connectors in Rasa process input and output as text while enabling communication through audio. Rasa relies on external services for speech recognition (speech-to-text, STT) and text-to-speech (TTS) to facilitate this.

For example, the Twilio Voice built-in channel in Rasa is a Voice Ready Channel Connector.
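As a rough illustration, a Voice Ready connector is enabled in the assistant's credentials file like any other channel. The sketch below assumes the `twilio_voice` channel key and the parameter names used by the Twilio Voice connector; all values are placeholders, so check the connector's own page for the options your version supports.

```yaml
# credentials.yml (illustrative values, not a recommended configuration)
twilio_voice:
  # Text sent to the assistant when a call starts
  initial_prompt: "hello"
  # Spoken when no usable speech input was collected
  reprompt_fallback_phrase: "Sorry, I didn't catch that. Could you repeat it?"
  # Twilio voice used to speak the assistant's responses
  assistant_voice: "woman"
  # Seconds of silence after which speech collection stops
  speech_timeout: "5"
```

Because the connector exchanges only text with Rasa, speech recognition and synthesis are configured here on the channel side rather than inside the assistant.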

Voice Stream

Voice Stream Channel Connectors in Rasa handle both input and output as audio. They transcribe incoming audio into text, process it within Rasa, and then convert the response back into audio, so the assistant communicates with the user entirely through audio.

For example, the Twilio Media Streams channel connector in Rasa is a Voice Stream Channel Connector.
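A Voice Stream connector additionally needs to know which speech services to use for transcription and synthesis. The following is a minimal sketch under assumptions: the `twilio_media_streams` key, the `server_url`, `asr`, and `tts` fields, and the service names are assumed here for illustration and should be verified against the connector and Speech Integrations pages.

```yaml
# credentials.yml (assumed layout; host and service names are examples only)
twilio_media_streams:
  # Publicly reachable host of the Rasa server (placeholder)
  server_url: "<your-public-server-host>"
  # Speech recognition service that transcribes incoming audio
  asr:
    name: deepgram
  # Text-to-speech service that turns responses back into audio
  tts:
    name: cartesia
```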

How to Start Building a Voice Assistant

To build an optimized voice assistant, it is recommended to develop it separately from text-based assistants. Although a text assistant can serve as a foundation, maintaining and evolving the assistant is easier when voice and text assistants are developed separately.

Following Conversation-Driven Development (CDD) best practices, start your voice project with rigorous user research and include iterative user tests in the development process. Make sure to design your voice flows with the unique requirements of the modality in mind.

Apart from connecting and configuring your channel connector, you will need to configure the speech services. See the following pages for details:

Speech Integrations for connecting to Speech Recognition and Text-to-Speech services
Voice connectors:
- Audiocodes VoiceAI Connect Channel connector (Voice Ready)
- Audiocodes Voice Stream Channel connector (Voice Stream)
- Jambonz Channel connector (Voice Ready)
- Twilio Voice Channel connector (Voice Ready)
- Twilio Media Streams Channel connector (Voice Stream)
- Genesys Cloud Channel connector (Voice Stream)

You can also test your voice assistant directly in your browser, which supports an iterative building process.

Voice-Specific Primitives and Conversation Repair

Voice assistants require additional patterns and primitives beyond those used in text-based assistants. For instance, a voice interaction requires initiation, termination, and specific metadata management. Additionally, some patterns are specific to voice and have no direct equivalent in text, such as handling user silence and repeating the assistant's last message. Learn more about Voice Conversation Patterns.

Conversation Design Recommendations for Voice Assistants

In addition to using voice-specific conversation repair patterns, consider these Conversation Design recommendations to enhance the user experience of your voice assistant:

Channel-Specific Responses

Use channel-specific response variations to tailor the voice assistant's responses for phone calls. Learn more about Channel Specific Response Variations.
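As a sketch of how this can look, the domain snippet below defines one response with a variation that applies only on a voice channel and a default variation for every other channel. The response name `utter_ask_account_number` and its wording are hypothetical, and the `channel` value assumes the Twilio Voice connector is registered under the name `twilio_voice`.

```yaml
# domain.yml (hypothetical response, for illustration only)
responses:
  utter_ask_account_number:
    # Used only when the message is sent over the Twilio Voice channel
    - text: "Please say your account number, one digit at a time."
      channel: "twilio_voice"
    # Default variation for all other channels
    - text: "Please enter your account number."
```

Keeping both variations under one response name means the flows stay identical across channels, and only the wording changes.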

Use "Filler" Responses

When an operation may take time (for example, a slow custom action), include "filler" responses to keep users informed about the ongoing process. These responses confirm that the system is processing the request, reducing user uncertainty and abandonment. This technique is especially important for voice-based channels like phone calls, where users have no visual indicators of progress. The following flow is an example that uses a filler response:

flows.yml
flows:
  check_balance:
    name: check your balance
    description: check the user's account balance
    steps:
      - action: utter_please_wait            # a response that tells the user to wait a moment
      - action: check_balance                # assume this is a slow custom action
      - action: utter_current_balance
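
The two responses used in the flow above could be defined in the domain along the following lines. This is a minimal sketch: the wording is illustrative, and the `{balance}` placeholder assumes that the `check_balance` custom action sets a slot named `balance`, which the original example does not show.

```yaml
# domain.yml (illustrative wording; the balance slot is an assumption)
actions:
  - check_balance
responses:
  # Filler response spoken before the slow custom action runs
  utter_please_wait:
    - text: "One moment please, I'm looking up your balance."
  # Spoken once the custom action has finished
  utter_current_balance:
    - text: "Your current balance is {balance}."
```

A short, concrete filler that names the task being performed tends to work better on a phone call than a generic "please wait".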
