Realtime Voice
ChatBotKit integrates realtime models natively, so a realtime model behaves like any other model on the platform: you assign it to a bot, deploy it the same way, and use it across your stack. The same model works in both chat and voice, which means you can hold a text conversation or a live, low-latency voice conversation with one bot, one configuration, and the same behavior behind it. There is no separate bot type to maintain.
Realtime models have full access to everything your agents already use. The complete skillset engine is available during a voice conversation, so the model can call tools, search your datasets, read and write memory, use credentials, and reach into connected integrations mid-conversation. Voice agents can act, not only talk.
What You Can Do
- Use one bot for chat and voice: Assign a realtime model and switch between text and live voice without parallel setup.
- Speak and listen in the widget: VoiceIn and VoiceOut let users dictate requests and hear spoken responses in the AI Widget.
- Run tools in voice: Call skillset abilities, search datasets, and trigger workflows while the conversation is happening.
- Keep full context: Realtime conversations use the same memory and configuration as the rest of your bot.
How It Works
A realtime model is assigned to a bot just like any other model. In voice mode, audio streams to and from the model with low latency, while the full ChatBotKit agent stack (skillsets, datasets, memory, and credentials) runs behind it. Because realtime is a first-class model rather than a constrained mode, a voice conversation can complete the same workflows your text agents handle, grounded in your data and connected systems.
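The idea that text and voice share one agent stack can be illustrated with a small, self-contained model. This is a hypothetical sketch, not ChatBotKit's API: `BotConfig`, `Tool`, `Turn`, `runTurn`, the `lookupOrder` tool and the model id are all invented for illustration. It shows a single turn handler serving both channels, with the same tools and memory behind it, so the channel only changes how the conversation is carried, not what the agent can do.

```typescript
// Hypothetical sketch: every type, name and id here is illustrative,
// not part of ChatBotKit's API.
type Channel = "text" | "voice";

interface Tool {
  name: string;
  run: (input: string) => string; // stands in for a skillset ability
}

interface BotConfig {
  model: string;               // placeholder id for a realtime-capable model
  tools: Tool[];               // shared by text and voice turns
  memory: Map<string, string>; // stands in for bot memory
}

interface Turn {
  channel: Channel;
  toolName?: string;  // tool the model decided to call, if any
  toolInput?: string;
}

// One handler for both channels: the same tools and memory are used
// whether the turn arrived as text or as live voice.
function runTurn(bot: BotConfig, turn: Turn): string {
  if (turn.toolName === undefined) {
    return `[${turn.channel}] no tool call`;
  }
  const tool = bot.tools.find((t) => t.name === turn.toolName);
  if (!tool) {
    throw new Error(`unknown tool: ${turn.toolName}`);
  }
  const result = tool.run(turn.toolInput ?? "");
  bot.memory.set("lastResult", result); // context persists across channels
  return `[${turn.channel}] ${tool.name} -> ${result}`;
}

const bot: BotConfig = {
  model: "example-realtime-model",
  tools: [{ name: "lookupOrder", run: (id) => `order ${id} shipped` }],
  memory: new Map(),
};

console.log(runTurn(bot, { channel: "text", toolName: "lookupOrder", toolInput: "42" }));
console.log(runTurn(bot, { channel: "voice", toolName: "lookupOrder", toolInput: "42" }));
```

Both calls go through the same `runTurn` and the same `bot` object; only the `channel` label differs, which mirrors the point that there is no separate voice-only bot to maintain.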
Setup
Select a realtime-capable model and assign it to a bot. Use the bot in chat for text or in voice for a live spoken conversation. In the AI Widget, enable VoiceIn and VoiceOut to let users speak to the bot and hear its responses.
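The setup steps can be summarized as a configuration check. The following is a minimal sketch under stated assumptions: the `BotSetup` shape, the `realtimeCapable` flag, the `voiceIn`/`voiceOut` field names and the `supportedModes` helper are hypothetical and do not reflect ChatBotKit's actual configuration schema. It only encodes the rules described above: any bot supports chat, a realtime-capable model adds live voice, and the widget's VoiceIn and VoiceOut options are enabled separately.

```typescript
// Hypothetical setup shape; field names are assumptions,
// not ChatBotKit's configuration schema.
interface WidgetVoiceOptions {
  voiceIn: boolean;  // users can speak to the bot in the AI Widget
  voiceOut: boolean; // users hear the bot's responses spoken aloud
}

interface BotSetup {
  model: string;            // placeholder model id
  realtimeCapable: boolean; // assumed flag describing the selected model
  widget: WidgetVoiceOptions;
}

// Lists the interaction modes this setup would support.
function supportedModes(setup: BotSetup): string[] {
  const modes: string[] = ["chat"]; // text chat works with any assigned model
  if (setup.realtimeCapable) modes.push("voice");
  if (setup.widget.voiceIn) modes.push("widget-voice-in");
  if (setup.widget.voiceOut) modes.push("widget-voice-out");
  return modes;
}

const setup: BotSetup = {
  model: "example-realtime-model",
  realtimeCapable: true,
  widget: { voiceIn: true, voiceOut: true },
};

console.log(supportedModes(setup).join(", "));
```

A check like this is one way to catch a misconfigured bot early, for example a widget with voice options enabled on a bot whose model is not realtime-capable.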
Practical Uses
Realtime voice suits live customer support, hands-free assistants, phone-style experiences, and interactive in-product help. Because the voice agent has the full skillset engine behind it, it can answer questions grounded in your data, complete tasks across connected services, and carry context from one conversation to the next.