What is an agent?
An agent is an AI-powered assistant that can have conversations with users via voice, chat, or video. Each agent has its own personality (system prompt), voice, and capabilities.
How agents work
When a user talks to an agent, the conversation flows through a real-time engine with sub-second latency:
- User speaks → speech is converted to text
- The LLM generates a response
- The response is converted back to natural speech
- Tools run as needed during the conversation
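
The turn flow above can be sketched roughly as follows. This is a toy illustration, not the platform's engine: `speech_to_text`, `text_to_speech`, `Agent`, and `handle_turn` are hypothetical names, the "audio" is plain UTF-8 bytes, and the "LLM" is a stub that either calls a tool or echoes the user.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins for the STT and TTS providers (not a real API).
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already a transcript


def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are synthesized speech


@dataclass
class Agent:
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def generate(self, user_text: str) -> str:
        """Stub 'LLM': run a tool if the text starts with its name, else echo."""
        for name, tool in self.tools.items():
            if user_text.lower().startswith(name):
                return tool(user_text[len(name):].strip())
        return f"You said: {user_text}"

    def handle_turn(self, audio_in: bytes) -> bytes:
        user_text = speech_to_text(audio_in)      # 1. speech -> text
        self.history.append(f"user: {user_text}")
        reply = self.generate(user_text)          # 2. response (tools run here)
        self.history.append(f"agent: {reply}")
        return text_to_speech(reply)              # 3. text -> speech


agent = Agent(
    system_prompt="You are a helpful assistant.",
    tools={"lookup": lambda query: f"Found a record for {query}"},
)
audio_out = agent.handle_turn(b"lookup Ada")
```

A real engine would stream audio in both directions instead of handling whole turns, which is how sub-second latency becomes possible; the sketch only shows the order of the steps.
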
Core components
| Component | What it does | Providers |
|---|---|---|
| LLM | Processes conversation, generates responses | OpenAI, Anthropic Claude, Azure OpenAI |
| TTS | Converts text to natural speech | ElevenLabs, Cartesia, Azure Speech |
| STT | Converts speech to text | Deepgram, Azure Speech, ElevenLabs, OpenAI |
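
As an illustration of how the three components fit together, an agent's provider selection could be modeled like this. The provider names come from the table above; the `ProviderConfig` class, its field names, and the validation behavior are assumptions made for this sketch, not the platform's actual configuration schema.

```python
from dataclasses import dataclass

# Provider names copied from the table; the dictionary keys are hypothetical.
SUPPORTED = {
    "llm": {"OpenAI", "Anthropic Claude", "Azure OpenAI"},
    "tts": {"ElevenLabs", "Cartesia", "Azure Speech"},
    "stt": {"Deepgram", "Azure Speech", "ElevenLabs", "OpenAI"},
}


@dataclass(frozen=True)
class ProviderConfig:
    llm: str
    tts: str
    stt: str

    def __post_init__(self) -> None:
        # Reject any provider that is not listed for its component.
        for component, allowed in SUPPORTED.items():
            chosen = getattr(self, component)
            if chosen not in allowed:
                raise ValueError(
                    f"{chosen!r} is not a listed {component.upper()} provider"
                )


config = ProviderConfig(llm="OpenAI", tts="Cartesia", stt="Deepgram")
```

Note that some providers appear under more than one component (for example, ElevenLabs for both TTS and STT), so each component is validated against its own list.
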
Agent types
Agents support four communication modes:
Voice
Phone calls via Twilio (inbound and outbound)
WebRTC
Browser-based voice calls with no phone needed
Video
Video avatar calls for face-to-face interactions
Chat
Text-based conversations with streaming responses
Key features
- First message — A greeting spoken immediately when the call connects (bypasses LLM for lower latency)
- System prompt — Defines the agent’s personality, instructions, and behavior
- Variable injection — Use `{{variableName}}` in prompts to inject contact data dynamically
- Recording — Optionally record calls
- Background ambience — Add ambient sounds for a more natural feel
- Tool integration — Add tools from 13+ integrations (HubSpot, Salesforce, Slack, etc.) so your agent can take actions during calls
- Video avatars — Enable video mode and choose from a library of realistic avatars for face-to-face interactions
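
To make variable injection concrete, here is a minimal sketch of how `{{variableName}}` placeholders might be filled from contact data. The function name `inject_variables`, the field names in the example, the tolerance for spaces inside the braces, and the choice to leave unknown variables untouched are all assumptions; the platform's actual substitution rules may differ.

```python
import re
from typing import Mapping

# Matches {{name}}; allowing surrounding spaces is an assumption of this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def inject_variables(prompt: str, contact: Mapping[str, str]) -> str:
    """Replace each {{name}} with contact[name]; keep unknown names as written."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return str(contact.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, prompt)


prompt = "You are calling {{firstName}} from {{company}} about {{topic}}."
rendered = inject_variables(prompt, {"firstName": "Ada", "company": "Acme"})
print(rendered)  # You are calling Ada from Acme about {{topic}}.
```

Leaving an unresolved placeholder visible makes missing contact fields easy to spot during testing; a production setup might instead substitute a default value.
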
Next steps
Create an Agent
Step-by-step guide to creating your first agent
Configuration
Configure LLM, TTS, and STT providers

