LLM Configuration for Rasa Pro ≤ 3.10
For Rasa Pro versions 3.11 and above, refer to the LLM Configuration for >=3.11 page.
Overview
This page applies to the following components which use LLMs:
- SingleStepLLMCommandGenerator
- MultiStepLLMCommandGenerator
- EnterpriseSearchPolicy
- IntentlessPolicy
- ContextualResponseRephraser
- LLMBasedRouter
All the above components can be configured to change:
- the LLM provider
- the model to be used
Starting with Rasa Pro 3.10, CALM uses LiteLLM under the hood to integrate
with different LLM providers. Hence, all of LiteLLM's integrated providers
are supported by CALM as well. We explicitly cover the settings required for the most frequently used ones in the
sections below.
If you want to try a provider other than OpenAI / Azure OpenAI, it is recommended to install Rasa Pro versions >= 3.10.
Recommended Models
The table below documents the versions of each model we recommend for use with various Rasa components. As new models are published, Rasa will test these and where appropriate add them as a recommended model.
| Component | Providing platform | Recommended models | 
|---|---|---|
| SingleStepLLMCommandGenerator, EnterpriseSearchPolicy, IntentlessPolicy | OpenAI, Azure | gpt-4-0613 |
| ContextualResponseRephraser | OpenAI, Azure | gpt-4-0613, gpt-3.5-turbo-0125 |
| MultiStepLLMCommandGenerator | OpenAI, Azure | gpt-4-turbo-2024-04-09, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-4o-2024-08-06 |
Chat completion models
CALM is LLM agnostic and can be configured with different LLMs, but OpenAI is the default model provider. The majority of our experiments have been with models available on OpenAI or the Azure OpenAI Service. The performance of your assistant may vary when using other LLMs, but improvements can be made by tuning flow and collect step descriptions.
To configure components that use a chat completion model as the LLM, declare the configuration under the llm
key of that component's configuration. For example:
   recipe: default.v1
   language: en
   pipeline:
   - name: SingleStepLLMCommandGenerator
     llm:
        ...
Required Parameters
There are certain required parameters under the llm key:
- model - Specifies the name of the model identifier available from the LLM provider's documentation, e.g. gpt-4-0613.
- provider - Unique identifier of the provider to be used for invoking the specified model.
   recipe: default.v1
   language: en
   pipeline:
   - name: SingleStepLLMCommandGenerator
     llm:
        model: gpt-4-0613
        provider: openai
Optional Parameters
The llm key also accepts optional inference-time parameters, such as
temperature, which can be useful in extracting the best performance from the model being used.
Please refer to the
official LiteLLM documentation for the list of supported parameters.
When configuring a particular provider, there are a few provider specific settings which are explained under each provider's individual sub-section below.
If you switch to a different LLM provider, all default parameters of the old provider will be overridden by the default parameters of the new provider.
For example, if a provider sets temperature=0.7 as the default value and you switch to a different LLM
provider, this default will be ignored and it is up to you to set the
temperature for the new provider.
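For illustration, optional parameters sit alongside the required ones under the llm key. The values below are arbitrary examples, not recommendations:
    recipe: default.v1
    language: en
    pipeline:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: openai
        model: gpt-4-0613
        temperature: 0.0   # lower values make command generation more deterministic
        top_p: 1.0         # nucleus sampling parameter, passed through to the provider by LiteLLM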
OpenAI
API Token
The API token authenticates your requests to the OpenAI API.
To configure the API token, follow these steps:
- If you haven't already, sign up for an account on the OpenAI platform.
- Navigate to the OpenAI Key Management page and click on the "Create New Secret Key" button to obtain <your-api-key>.
- Set the API key as an environment variable using the following command in a terminal or command prompt:
Linux/MacOS:

export OPENAI_API_KEY=<your-api-key>

Windows:

setx OPENAI_API_KEY <your-api-key>
The setx command only applies to future command prompt windows, so you will need to open a new one to use that variable.
Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
Configuration
There are no additional OpenAI-specific parameters to configure. However, there may be model-specific parameters
like temperature that you want to modify. Names for such parameters can be found in
OpenAI's API documentation and defined under the llm key of the
component's configuration.
Please refer to
LiteLLM's documentation for the
list of models supported on the OpenAI platform.
Model deprecations
OpenAI regularly publishes a deprecation schedule for its models. This schedule can be accessed in the documentation published by OpenAI.
Azure OpenAI Service
API Token
The API token authenticates your requests to the Azure OpenAI Service.
Set the API token as an environment variable. You can use the following command in a terminal or command prompt:
Linux/MacOS:

export AZURE_API_KEY=<your-api-key>

Windows:

setx AZURE_API_KEY <your-api-key>
The setx command only applies to future command prompt windows, so you will need to open a new one to use that variable.
Replace <your-api-key> with the actual API key you obtained from the Azure OpenAI Service platform.
Configuration
To access models provided by Azure OpenAI Service, there are a few additional parameters that need to be configured:
- provider - Set to azure.
- api_type - The type of API to use. This should be set to azure to indicate the use of Azure OpenAI Service.
- api_base - The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/.
- api_version - The API version to use for this operation. This follows the YYYY-MM-DD format and the value should be enclosed in single or double quotes.
- engine / deployment_name - Alias for the deployment parameter. Name of the deployment on Azure.
Model-specific parameters like temperature can be defined as well. Refer to
the Azure OpenAI Service API documentation
for information on available parameter names.
A complete example configuration of the SingleStepLLMCommandGenerator using Azure OpenAI Service would look like this:
Rasa Pro <=3.7.x:
    - name: LLMCommandGenerator
      llm:
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
3.8.x <= Rasa Pro <= 3.9.x:

    - name: SingleStepLLMCommandGenerator
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
Rasa Pro >=3.10.x:

    - name: SingleStepLLMCommandGenerator
      llm:
        provider: azure
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
A more comprehensive example using the Azure OpenAI service in more CALM components is available here.
Model deprecations
Azure regularly publishes a deprecation schedule for its models that come under the OpenAI Azure Service. This schedule can be accessed in the documentation published by Azure.
Debugging
If you encounter timeout errors, set the request_timeout parameter (timeout from Rasa Pro 3.10 onwards) to a larger value. The exact value depends on
how your Azure instance is configured.
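As a sketch using the Rasa Pro >=3.10 parameter names from the example above, a raised timeout might look like this (30 seconds is an arbitrary illustration, not a recommendation):

    - name: SingleStepLLMCommandGenerator
      llm:
        provider: azure
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 30   # raise this if your Azure deployment responds slowly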
Amazon Bedrock
Requirements:
- Make sure you have rasa-pro>=3.10.x installed.
- Install boto3>=1.28.57.
- Set the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME (see the example below).
- (Optional) Set AWS_SESSION_TOKEN if your organisation mandates the usage of temporary credentials for security.
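For example, on Linux/MacOS the variables could be exported like this (placeholder values; us-east-1 is just an example region):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_REGION_NAME=us-east-1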
Once the above steps are complete, edit config.yaml to use an appropriate model and set provider to bedrock:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: bedrock
        model: anthropic.claude-instant-v1
Model specific parameters like temperature can be defined as well. Refer to
LiteLLM's documentation
for information on available parameter names
and supported models.
Gemini - Google AI Studio
Requirements:
- Make sure you have rasa-pro>=3.10.x installed.
- Install the python package google-generativeai.
- Get an API key at https://aistudio.google.com/.
- Set the API key to the environment variable GEMINI_API_KEY (see the example below).
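For example, on Linux/MacOS:

export GEMINI_API_KEY=<your-api-key>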
Once the above steps are complete, edit config.yaml to use an appropriate model and set provider to gemini:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: gemini
        model: gemini-pro
Refer to LiteLLM's documentation to know which additional parameters and models are supported.
HuggingFace Inference Endpoints
Requirements:
- Make sure you have rasa-pro>=3.10.x installed.
- Set an API key to the environment variable HUGGINGFACE_API_KEY.
- Edit config.yaml to use an appropriate model, set provider to huggingface and api_base to the base URL of the deployed endpoint:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: huggingface
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "https://my-endpoint.huggingface.cloud"
Self Hosted Model Server
CALM's components can also be configured to work with an open source LLM that is hosted on an open source model server like vLLM (recommended), Ollama or the Llama.cpp web server. The only requirement is that the model server adheres to the OpenAI API format.
Once you have your model server running, configure the CALM assistant's config.yaml:
vLLM
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "https://my-endpoint/v1"
Important to note:
- The recommended version of vllm to use is 0.6.0.
- CALM exclusively utilizes the chat completions endpoint of the model server, so it's essential that the model's tokenizer includes a chat template. Models lacking a chat template will not be compatible with CALM.
- model should contain the name of the model supplied to the vllm startup command. For example, if your model server is started with vllm serve meta-llama/CodeLlama-7b-Instruct-hf, then model should be set to meta-llama/CodeLlama-7b-Instruct-hf (see the sketch below).
- api_base should contain the full exposed URL of the model server with v1 attached as a suffix to the URL.
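As a sketch, assuming vLLM runs on the same machine on its default port 8000, the startup command and the matching configuration would be:

vllm serve meta-llama/CodeLlama-7b-Instruct-hf

    - name: SingleStepLLMCommandGenerator
      llm:
        provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "http://localhost:8000/v1"   # full server URL with v1 suffix; port 8000 is vLLM's default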
Ollama
Once the Ollama model server is running, edit the config.yaml file:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: ollama
        model: llama3.1
        api_base: "https://my-endpoint"
Other Providers
If you want to try one of these providers, it is recommended to install Rasa Pro version >= 3.10.
Other than the above-mentioned providers, we have also tested support for the following:
| Platform | provider | API-KEY variable | 
|---|---|---|
| Anthropic | anthropic | ANTHROPIC_API_KEY | 
| Cohere | cohere | COHERE_API_KEY | 
| Mistral | mistral | MISTRAL_API_KEY | 
| Together AI | together_ai | TOGETHERAI_API_KEY | 
| Groq | groq | GROQ_API_KEY | 
For each of these providers, ensure you have set an environment variable, named by the value in the API-KEY variable column,
to the API key of that platform, and set the provider parameter under the llm key of the component's config to the value
in the provider column.
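For example, a minimal sketch for Anthropic (the model name is an illustrative choice, not a recommendation):

export ANTHROPIC_API_KEY=<your-api-key>

    - name: SingleStepLLMCommandGenerator
      llm:
        provider: anthropic
        model: claude-3-5-sonnet-20240620   # example model id; pick one suited to your use case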
Embedding models
To configure components that use an embedding model, declare the configuration under the embeddings key
of that component's configuration. For example:
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        ...
The embeddings property needs two mandatory parameters:
- model - Specifies the name of the model identifier available from the LLM provider's documentation, e.g. text-embedding-3-large.
- provider - Unique identifier of the provider to be used for invoking the specified model, e.g. openai.
Rasa Pro <=3.9.x:
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        type: openai
        model: text-embedding-3-large
Rasa Pro >=3.10.x:

pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        provider: openai
        model: text-embedding-3-large
When configuring a particular provider, there are a few provider specific settings which are explained under each provider's individual sub-section below.
OpenAI
OpenAI is used as the default embedding model provider. To start using it, ensure you have configured an API token as you would for a chat completion model from the OpenAI platform.
Configuration
Rasa Pro <=3.9.x:
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        type: openai
        model: text-embedding-3-large
Rasa Pro >=3.10.x:

pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        provider: openai
        model: text-embedding-3-large
Azure OpenAI Service
Ensure you have configured an API token as you would for a chat completion model for the Azure OpenAI Service.
Configuration
Configuring an embedding model from the Azure OpenAI Service needs values for the same set of parameters that are required for configuring a chat completion model from the Azure OpenAI Service.
Rasa Pro <=3.9.x:
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        type: azure
        engine: engine-embed
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
Rasa Pro >=3.10.x:

pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        provider: azure
        deployment: engine-embed
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
Amazon Bedrock
Configuring an embedding model from Amazon Bedrock needs the same prerequisites as a chat completion model. Please ensure you have addressed these before proceeding further.
Configuration
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        provider: bedrock
        model: amazon.titan-embed-text-v1
Please refer to LiteLLM's documentation for the list of supported embedding models from Amazon Bedrock.
In-Memory
CALM also provides an option to load lightweight embedding models in-memory without needing them to be exposed over an API. It uses the sentence transformers library under the hood to load and run inference on them.
Configuration
Rasa Pro <=3.9.x:
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4-0613
    flow_retrieval:
      embeddings:
        type: "huggingface"
        model: "BAAI/bge-small-en-v1.5"
        model_kwargs: # used during instantiation
          device: 'cpu'
        encode_kwargs: # used during inference
          normalize_embeddings: True
Rasa Pro >=3.10.x:

pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: "openai"
      model: "gpt-4-0613"
    flow_retrieval:
      embeddings:
        provider: "huggingface_local"
        model: "BAAI/bge-small-en-v1.5"
        model_kwargs: # used during instantiation
          device: "cpu"
        encode_kwargs: # used during inference
          normalize_embeddings: true
- model - Can take as its value either any embedding model repository available on the HuggingFace hub or a path to a local model.
- model_kwargs - Used to provide load-time arguments to the sentence transformers library.
- encode_kwargs - Used to provide inference-time arguments to the sentence transformers library.
Other Providers
Other than the above-mentioned providers, we have also tested support for the following:
| Platform | provider | API-KEY variable | 
|---|---|---|
| Cohere | cohere | COHERE_API_KEY | 
| Mistral | mistral | MISTRAL_API_KEY | 
| Voyage AI | voyage | VOYAGE_API_KEY | 
For each of these providers, ensure you have set an environment variable, named by the value in the API-KEY variable column,
to the API key of that platform, and set the provider parameter under the embeddings key of the component's config to the value
in the provider column.
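For example, a minimal sketch for Cohere embeddings (the model name is an illustrative choice):

export COHERE_API_KEY=<your-api-key>

    flow_retrieval:
      embeddings:
        provider: cohere
        model: embed-english-v3.0   # example Cohere embedding model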
Configuring self-signed SSL certificates
In environments where a proxy performs TLS interception, Rasa may need to be configured to trust the certificates used
by your proxy. By default, certificates are loaded from the OS certificate store. However, if your setup involves
custom self-signed certificates, you can specify these by setting the RASA_CA_BUNDLE environment variable.
This variable points to the path of the certificate file that Rasa should use to validate SSL connections:
export RASA_CA_BUNDLE="path/to/your/certificate.pem"
The REQUESTS_CA_BUNDLE environment variable is deprecated and will no longer be supported in future versions. Please
use RASA_CA_BUNDLE instead to ensure compatibility.
Configuring Proxy URLs
In environments where LLM requests need to be routed through a proxy, Rasa relies on LiteLLM to handle proxy
configurations. LiteLLM supports configuring proxy URLs through the HTTP_PROXY and HTTPS_PROXY environment
variables.
To ensure that all LLM requests are routed through the proxy, you can set the environment variables as follows:
export HTTP_PROXY="http://your-proxy-url:port"
export HTTPS_PROXY="https://your-proxy-url:port"
FAQ
Does OpenAI use my data to train their models?
No. OpenAI does not use your data to train their models. From their website:
We do not train our models on your business data by default.
Example Configurations
Azure
A comprehensive example which includes:
- llm and embeddings configuration for components in config.yml:
  - IntentlessPolicy
  - EnterpriseSearchPolicy
  - SingleStepLLMCommandGenerator
  - flow_retrieval in 3.8.x
- llm configuration for the rephraser in endpoints.yml (ContextualResponseRephraser)
Rasa Pro <=3.7.x

In endpoints.yml:
    nlg:
      type: rasa_plus.ml.ContextualResponseRephraser
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_version: "2024-02-15-preview"
        api_base: https://my-azure.openai.azure.com
        request_timeout: 7
In config.yml:

    recipe: default.v1
    language: en
    pipeline:
    - name: LLMCommandGenerator
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
    policies:
    - name: FlowPolicy
    - name: rasa_plus.ml.IntentlessPolicy
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
      embeddings:
        model: text-embedding-3-small
        engine: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
    - name: rasa_plus.ml.EnterpriseSearchPolicy
      vector_store:
        type: "faiss"
        threshold: 0.0
      llm:
        model: gpt-4-0613
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
      embeddings:
        model: text-embedding-3-small
        engine: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
3.8.x <= Rasa Pro <= 3.9.x

In endpoints.yml:

nlg:
  type: rephrase
  llm:
    engine: rasa-gpt-4
    api_type: azure
    api_version: "2024-02-15-preview"
    api_base: https://my-azure.openai.azure.com
    request_timeout: 7
In config.yml:

    recipe: default.v1
    language: en
    pipeline:
    - name: SingleStepLLMCommandGenerator
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
      flow_retrieval:
        embeddings:
          model: text-embedding-3-small
          engine: rasa-embedding-small
          api_type: azure
          api_base: https://my-azure.openai.azure.com/
          api_version: "2024-02-15-preview"
          request_timeout: 7
    policies:
    - name: FlowPolicy
    - name: IntentlessPolicy
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
      embeddings:
        model: text-embedding-3-small
        engine: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
    - name: EnterpriseSearchPolicy
      vector_store:
        type: "faiss"
        threshold: 0.0
      llm:
        engine: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
      embeddings:
        model: text-embedding-3-small
        engine: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        request_timeout: 7
Rasa Pro >=3.10.x

In endpoints.yml:

nlg:
  type: rephrase
  llm:
    provider: azure
    deployment: rasa-gpt-4
    api_type: azure
    api_version: "2024-02-15-preview"
    api_base: https://my-azure.openai.azure.com
    timeout: 7
In config.yml:

    recipe: default.v1
    language: en
    pipeline:
    - name: SingleStepLLMCommandGenerator
      llm:
        provider: azure
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
      flow_retrieval:
        embeddings:
          provider: azure
          deployment: rasa-embedding-small
          api_type: azure
          api_base: https://my-azure.openai.azure.com/
          api_version: "2024-02-15-preview"
          timeout: 7
    policies:
    - name: FlowPolicy
    - name: IntentlessPolicy
      llm:
        provider: azure
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
      embeddings:
        provider: azure
        deployment: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
    - name: EnterpriseSearchPolicy
      vector_store:
        type: "faiss"
        threshold: 0.0
      llm:
        provider: azure
        deployment: rasa-gpt-4
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
      embeddings:
        provider: azure
        deployment: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7