Agent Runtime and Capabilities
How the LiveKit worker uses configs, detects languages, and handles transfers.
Everything in this section comes directly from packages/agent/src/agent.py and helper modules under packages/agent/src/utilities.
Call flow overview
- SIP joins the room – the worker (invoked via cli.run_app) waits for a SIP participant and extracts the DID from LiveKit metadata (utilities/call_utility.py).
- Fetch configuration – api_client.fetch_agent_config calls GET /api/agents/by-phone/:phoneNumber up to CONFIG_FETCH_MAX_RETRIES times, backing off according to CONFIG_FETCH_RETRY_DELAY.
- Create call record – api_client.create_call writes a row via /api/calls. Metadata such as transfer destinations is appended when the call ends.
- Spin up the voice pipeline – AgentSession is configured with:
  - openai.realtime.RealtimeModel.with_azure for duplex LLM+TTS,
  - openai.TTS.with_azure as a fallback,
  - noise_cancellation.BVCTelephony() for background noise suppression,
  - MultilingualModel() as the turn detector, plus Silero VAD.
- Run the Assistant agent – instructions combine shared receptionist behaviors (prompts.get_default_instructions), opening-hours context, FAQs, and transfer hints derived from YAML.
- Persist artifacts – save_and_anonymize_transcript and retrieve_recording_url upload to Azure Blob Storage and call /api/calls/:id/transcription or /recording.
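The configuration fetch step above can be pictured as a bounded retry loop. The following is a minimal sketch rather than the code in api_client: the fetch callable, the constant values, and the fixed delay between attempts are assumptions made for illustration. Only the names CONFIG_FETCH_MAX_RETRIES and CONFIG_FETCH_RETRY_DELAY come from this page.

```python
import time
from typing import Callable, Optional

# Assumed values for illustration; the real ones live in the worker's settings.
CONFIG_FETCH_MAX_RETRIES = 3
CONFIG_FETCH_RETRY_DELAY = 0.5  # seconds


def fetch_with_retries(
    fetch: Callable[[str], dict],
    phone_number: str,
    max_retries: int = CONFIG_FETCH_MAX_RETRIES,
    delay: float = CONFIG_FETCH_RETRY_DELAY,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(phone_number) up to max_retries times; return None if all fail."""
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(phone_number)
        except Exception as exc:  # a real client would catch narrower errors
            print(f"config fetch attempt {attempt}/{max_retries} failed: {exc}")
            if attempt < max_retries:
                sleep(delay)  # wait before the next attempt, but not after the last
    return None
```

Returning None instead of raising lets the caller decide how to end the call when no configuration is available, which fits the clean-hangup behavior described under error handling.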
Conversational behaviors
- Greeting – Assistant.on_enter enforces a fixed greeting in the default language (defaultLanguage from YAML). If the agent starts in French, it sends “Bonjour...” exactly as defined in code; English uses “Hi, [name] here...”.
- Language switching – on_user_speech_committed uses utilities/language_utility.detect_language to compare French vs. English tokens. When a caller switches languages and that language is in supportedLanguages, the assistant injects explicit instructions (“MANDATORY INSTRUCTION: You MUST respond in ENGLISH...”) before generating the next reply.
- Transfers – tools/transfer.py exposes the transfer_call function tool. The LLM can call it with a department name and, if a match exists in config["transfers"], the worker performs a cold SIP transfer using LiveKit’s transfer_sip_participant. Metadata (transferDepartment, transferTimestamp) is saved to the call row.
- Manual hangups – the end_call function tool ensures the agent waits for audio playout (ctx.wait_for_playout()) before running hangup_call().
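The language-switching behavior above amounts to counting marker words and injecting an override only when the caller moved to a supported language. The sketch below illustrates that idea under several assumptions: the word lists are hypothetical, language codes are assumed to be "fr" and "en", and the helper language_switch_instruction is invented for this example. The real detect_language and the full wording of the injected instruction may differ.

```python
import re
from typing import Optional, Sequence

# Hypothetical marker words; the real lists in language_utility may differ.
FRENCH_TOKENS = {"bonjour", "merci", "oui", "je", "vous", "est", "le", "la", "et", "pour"}
ENGLISH_TOKENS = {"hello", "thanks", "yes", "i", "you", "is", "the", "and", "for", "please"}


def detect_language(text: str) -> Optional[str]:
    """Return "fr" or "en" by counting marker words, or None on a tie."""
    words = re.findall(r"[a-zà-ÿ']+", text.lower())
    fr = sum(w in FRENCH_TOKENS for w in words)
    en = sum(w in ENGLISH_TOKENS for w in words)
    if fr == en:
        return None
    return "fr" if fr > en else "en"


def language_switch_instruction(
    text: str, current: str, supported: Sequence[str]
) -> Optional[str]:
    """Build an override only when the caller switched to a supported language."""
    detected = detect_language(text)
    if detected is None or detected == current or detected not in supported:
        return None
    name = {"fr": "FRENCH", "en": "ENGLISH"}[detected]
    # Illustrative wording; only the "MANDATORY INSTRUCTION" prefix is from this page.
    return f"MANDATORY INSTRUCTION: You MUST respond in {name}."
```

Treating a tie as "no decision" keeps the agent in its current language when the utterance is too short or ambiguous to classify.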
Call recording and transcripts
- Recording – Azure Blob credentials (AZURE_STORAGE_ACCOUNT_NAME/KEY/CONTAINER) trigger LiveKit egress so every call is stored under calls/{agent_id}/recordings/....
- Transcripts – The worker serializes session.history.to_dict() and anonymizes PII via Presidio + spaCy (see packages/agent/src/pii_anonymizer.py). URLs are referenced in the database, as documented in Call Recording and Transcription.
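To make the transcript step concrete, the sketch below walks a serialized history and redacts every text part. It is only an illustration: the history shape (an "items" list whose entries carry a "content" string or list of strings) is an assumption about what session.history.to_dict() returns, and simple_redact is a regex stand-in, not Presidio. The actual anonymizer lives in pii_anonymizer.py.

```python
import copy
import re
from typing import Callable

# Stand-in redactor for illustration only: masks emails and phone-like digit runs.
# The worker uses Presidio + spaCy, which cover far more entity types.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def simple_redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = _EMAIL.sub("<EMAIL>", text)
    return _PHONE.sub("<PHONE>", text)


def anonymize_history(
    history: dict, redact: Callable[[str], str] = simple_redact
) -> dict:
    """Return a copy of the history dict with every string content part redacted."""
    cleaned = copy.deepcopy(history)  # never mutate the live session history
    for item in cleaned.get("items", []):
        content = item.get("content")
        if isinstance(content, str):
            item["content"] = redact(content)
        elif isinstance(content, list):
            # Non-string parts (e.g. structured content) are passed through unchanged.
            item["content"] = [redact(p) if isinstance(p, str) else p for p in content]
    return cleaned
```

Passing the redactor as a parameter means a Presidio-backed function could be swapped in without touching the traversal logic.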
Error handling & retries
- Missing configs or API failures produce clear log lines and hang up the call cleanly by removing SIP participants.
- Provisioning details (twilioPhoneNumberSid, livekitInboundTrunkId, etc.) are always loaded into userdata.agent_config so transfer tools can validate departments.
- Metrics from livekit.agents.metrics are collected via UsageCollector, which allows you to stream latency and token-count metrics into your observability stack later.
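Validating a department against the loaded configuration, as the transfer tools do, can be sketched as a simple lookup. The config shape below (a "transfers" list of entries with "department" and "number" keys), the sample values, and the helper name resolve_transfer are all hypothetical; the real YAML schema and matching rules may differ.

```python
from typing import Optional

# Hypothetical config shape and values, for illustration only.
agent_config = {
    "transfers": [
        {"department": "Sales", "number": "+15550100"},
        {"department": "Support", "number": "+15550101"},
    ]
}


def resolve_transfer(config: dict, department: str) -> Optional[str]:
    """Return the destination for a department (case-insensitive), or None if unknown."""
    wanted = department.strip().lower()
    for entry in config.get("transfers", []):
        if entry.get("department", "").strip().lower() == wanted:
            return entry.get("number")
    return None
```

A tool built along these lines would decline the transfer when the lookup returns None, rather than attempting a SIP transfer to an unknown destination.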
Use this page when you need to reason about what the agent can do without diving into Python files—each bullet maps back to the implementation referenced above.