What is an agent?

An agent is an AI-powered assistant that can have conversations with users via voice, chat, or video. Each agent has its own personality (system prompt), voice, and capabilities.

How agents work

When a user talks to an agent, the conversation flows through a real-time engine with sub-second latency:
  1. User speaks → speech is converted to text
  2. The LLM generates a response
  3. The response is converted back to natural speech
  4. Tools run as needed during the conversation
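The loop above can be sketched as a single conversational turn. This is a minimal illustration only: the function names (`speech_to_text`, `generate_reply`, `text_to_speech`) are stand-in stubs, not the platform's real API, and real implementations would call the configured STT, LLM, and TTS providers.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: a real implementation would stream audio to an STT provider
    # such as Deepgram and return the transcript.
    return audio.decode("utf-8")

def generate_reply(transcript: str) -> str:
    # Stub: a real implementation would send the transcript (plus the
    # system prompt and history) to the configured LLM.
    return f"You said: {transcript}"

def text_to_speech(text: str) -> bytes:
    # Stub: a real implementation would synthesize audio with a TTS
    # provider such as ElevenLabs and return the audio bytes.
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    transcript = speech_to_text(audio_in)
    reply = generate_reply(transcript)
    return text_to_speech(reply)
```

In production, these stages run concurrently and stream partial results to keep latency sub-second, rather than completing each step before the next begins.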

Core components

Component  What it does                                 Providers
---------  -------------------------------------------  ------------------------------------------
LLM        Processes conversation, generates responses  OpenAI, Anthropic Claude, Azure OpenAI
TTS        Converts text to natural speech              ElevenLabs, Cartesia, Azure Speech
STT        Converts speech to text                      Deepgram, Azure Speech, ElevenLabs, OpenAI
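Conceptually, an agent binds one provider choice to each component. The shape below is a hypothetical configuration sketch to make that mapping concrete; the field names and values are illustrative, not the platform's actual schema.

```python
# Hypothetical agent configuration: one provider per core component.
agent_config = {
    "llm": {"provider": "openai", "model": "gpt-4o"},
    "tts": {"provider": "elevenlabs", "voice": "rachel"},
    "stt": {"provider": "deepgram"},
}

def providers(config: dict) -> dict:
    """Return the provider chosen for each component."""
    return {component: settings["provider"] for component, settings in config.items()}
```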

Agent types

Agents support four communication modes:

  • Voice — Phone calls via Twilio (inbound and outbound)
  • WebRTC — Browser-based voice calls with no phone needed
  • Video — Video avatar calls for face-to-face interactions
  • Chat — Text-based conversations with streaming responses

Key features

  • First message — A greeting spoken immediately when the call connects (bypasses LLM for lower latency)
  • System prompt — Defines the agent’s personality, instructions, and behavior
  • Variable injection — Use {{variableName}} in prompts to inject contact data dynamically
  • Recording — Optionally record calls
  • Background ambience — Add ambient sounds for a more natural feel
  • Tool integration — Add tools from 13+ integrations (HubSpot, Salesforce, Slack, etc.) so your agent can take actions during calls
  • Video avatars — Enable video mode and choose from a library of realistic avatars for face-to-face interactions
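Variable injection can be illustrated with a small template renderer. This is a sketch of the `{{variableName}}` substitution described above, not the platform's actual rendering code; unknown placeholders are left intact here, which is one possible policy.

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching contact field.

    Placeholders with no matching variable are left unchanged.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

print(render_prompt(
    "Hi {{firstName}}, calling about {{topic}}.",
    {"firstName": "Ada", "topic": "your order"},
))
# → Hi Ada, calling about your order.
```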

Next steps

  • Create an Agent — Step-by-step guide to creating your first agent
  • Configuration — Configure LLM, TTS, and STT providers