What is an agent?

An agent is an AI-powered assistant that can have conversations with users via voice, chat, or video. Each agent has its own personality (system prompt), voice, and capabilities.

How agents work

When a user talks to an agent, the conversation flows through a real-time engine with sub-second latency:
  1. User speaks → speech is converted to text
  2. The LLM generates a response
  3. The response is converted back to natural speech
  4. Tools run as needed during the conversation
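The loop above can be sketched as a single conversational turn. This is a minimal illustration only: the function names (`speech_to_text`, `generate_reply`, `text_to_speech`) are stand-in stubs, not the platform's real API, and real implementations would call the configured STT, LLM, and TTS providers.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: a real implementation would stream audio to an STT provider
    # such as Deepgram and return the transcript.
    return audio.decode("utf-8")

def generate_reply(transcript: str) -> str:
    # Stub: a real implementation would send the transcript (plus the
    # system prompt and history) to the configured LLM.
    return f"You said: {transcript}"

def text_to_speech(text: str) -> bytes:
    # Stub: a real implementation would synthesize audio with a TTS
    # provider such as ElevenLabs and return the audio bytes.
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    transcript = speech_to_text(audio_in)
    reply = generate_reply(transcript)
    return text_to_speech(reply)
```

In production, these stages run concurrently and stream partial results to keep latency sub-second, rather than completing each step before the next begins.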

Core components

Component  What it does                                 Providers
---------  -------------------------------------------  ------------------------------------------
LLM        Processes conversation, generates responses  OpenAI, Anthropic Claude, Azure OpenAI
TTS        Converts text to natural speech              ElevenLabs, Cartesia, Azure Speech
STT        Converts speech to text                      Deepgram, Azure Speech, ElevenLabs, OpenAI
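Conceptually, an agent binds one provider choice to each component. The shape below is a hypothetical configuration sketch to make that mapping concrete; the field names and values are illustrative, not the platform's actual schema.

```python
# Hypothetical agent configuration: one provider per core component.
agent_config = {
    "llm": {"provider": "openai", "model": "gpt-4o"},
    "tts": {"provider": "elevenlabs", "voice": "rachel"},
    "stt": {"provider": "deepgram"},
}

def providers(config: dict) -> dict:
    """Return the provider chosen for each component."""
    return {component: settings["provider"] for component, settings in config.items()}
```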

Agent types

Agents support four communication modes:

  • Voice — Phone calls via Twilio (inbound and outbound)
  • WebRTC — Browser-based voice calls with no phone needed
  • Video — Video avatar calls for face-to-face interactions
  • Chat — Text-based conversations with streaming responses

Key features

  • First message — A greeting spoken immediately when the call connects (bypasses LLM for lower latency)
  • System prompt — Defines the agent’s personality, instructions, and behavior
  • Variable injection — Use {{variableName}} in prompts to inject contact data dynamically
  • Recording — Optionally record calls
  • Background ambience — Add ambient sounds for a more natural feel
  • Tool integration — Add tools from 13+ integrations (HubSpot, Salesforce, Slack, etc.) so your agent can take actions during calls
  • Video avatars — Enable video mode and choose from a library of realistic avatars for face-to-face interactions
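Variable injection can be illustrated with a small template renderer. This is a sketch of the `{{variableName}}` substitution described above, not the platform's actual rendering code; unknown placeholders are left intact here, which is one possible policy.

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching contact field.

    Placeholders with no matching variable are left unchanged.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

print(render_prompt(
    "Hi {{firstName}}, calling about {{topic}}.",
    {"firstName": "Ada", "topic": "your order"},
))
# → Hi Ada, calling about your order.
```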

Next steps

  • Create an Agent — Step-by-step guide to creating your first agent
  • Configuration — Configure LLM, TTS, and STT providers