Agent Runtime and Capabilities

How the LiveKit worker uses configs, detects languages, and handles transfers.

Everything in this section comes directly from packages/agent/src/agent.py and helper modules under packages/agent/src/utilities.

Call flow overview

  1. SIP joins the room – the worker (invoked via cli.run_app) waits for a SIP participant and extracts the DID from LiveKit metadata (utilities/call_utility.py).
  2. Fetch configuration – api_client.fetch_agent_config calls GET /api/agents/by-phone/:phoneNumber up to CONFIG_FETCH_MAX_RETRIES times, backing off according to CONFIG_FETCH_RETRY_DELAY.
  3. Create call record – api_client.create_call writes a row via /api/calls. Metadata such as transfer destinations is appended when the call ends.
  4. Spin up the voice pipeline – AgentSession is configured with:
    • openai.realtime.RealtimeModel.with_azure for duplex LLM+TTS,
    • openai.TTS.with_azure fallback,
    • noise_cancellation.BVCTelephony() for background noise suppression,
    • MultilingualModel() turn detector and Silero VAD.
  5. Run the Assistant agent – instructions combine shared receptionist behaviors (prompts.get_default_instructions), opening-hours context, FAQs, and transfer hints derived from YAML.
  6. Persist artifacts – save_and_anonymize_transcript and retrieve_recording_url upload to Azure Blob Storage and call /api/calls/:id/transcription or /recording.
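
The retry behavior in step 2 can be pictured with a small sketch. This is not the code from api_client; it only assumes that the fetch is attempted up to CONFIG_FETCH_MAX_RETRIES times with a pause derived from CONFIG_FETCH_RETRY_DELAY. The constant values, the linear backoff, the fetch_with_retries name, and the injected fetch/sleep callables are all hypothetical.

```python
import time
from typing import Callable, Optional

# Hypothetical values; the real ones come from the worker's configuration.
CONFIG_FETCH_MAX_RETRIES = 3
CONFIG_FETCH_RETRY_DELAY = 0.5  # seconds


def fetch_with_retries(
    fetch: Callable[[str], dict],
    phone_number: str,
    max_retries: int = CONFIG_FETCH_MAX_RETRIES,
    delay: float = CONFIG_FETCH_RETRY_DELAY,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(phone_number) up to max_retries times.

    Returns the config on success, or None once every attempt has failed.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(phone_number)
        except Exception as exc:  # broad catch, for illustration only
            print(f"config fetch attempt {attempt}/{max_retries} failed: {exc}")
            if attempt < max_retries:
                # Linear backoff is an assumption; the real policy may differ.
                sleep(delay * attempt)
    return None
```

A None result corresponds to the "missing config" case, where the worker would log the failure and end the call rather than start a session.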

Conversational behaviors

  • Greeting – Assistant.on_enter enforces a fixed greeting in the default language (defaultLanguage from YAML). If the agent starts in French, it sends “Bonjour...” exactly as defined in code; English uses “Hi, [name] here...”.
  • Language switching – on_user_speech_committed uses utilities/language_utility.detect_language to compare French vs. English tokens. When a caller switches languages and that language is in supportedLanguages, the assistant injects explicit instructions (“MANDATORY INSTRUCTION: You MUST respond in ENGLISH...”) before generating the next reply.
  • Transfers – tools/transfer.py exposes the transfer_call function tool. The LLM can call it with a department name and, if a match exists in config["transfers"], the worker performs a cold SIP transfer using LiveKit’s transfer_sip_participant. Metadata (transferDepartment, transferTimestamp) is saved to the call row.
  • Manual hangups – the end_call function tool ensures the agent waits for audio playout (ctx.wait_for_playout()) before running hangup_call().
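
To make the language-switching bullet concrete, here is a minimal sketch of token-based detection followed by the instruction injection. It is not the code in utilities/language_utility: the word lists, the "en"/"fr" codes, the tie handling, and the language_switch_instruction helper are assumptions, and the instruction string only reproduces the prefix quoted above (the full text in code is longer).

```python
import re
from typing import Optional, Sequence

# Tiny illustrative word lists; the real module's vocabulary is not shown here.
FRENCH_TOKENS = {"bonjour", "merci", "oui", "je", "le", "la", "est", "vous", "pour", "avec"}
ENGLISH_TOKENS = {"hello", "thanks", "yes", "i", "the", "is", "you", "for", "with", "please"}


def detect_language(text: str) -> Optional[str]:
    """Return "fr" or "en" by counting marker tokens, or None on a tie."""
    words = re.findall(r"[a-zà-ÿ']+", text.lower())
    fr = sum(w in FRENCH_TOKENS for w in words)
    en = sum(w in ENGLISH_TOKENS for w in words)
    if fr == en:
        return None
    return "fr" if fr > en else "en"


def language_switch_instruction(
    text: str, current: str, supported: Sequence[str]
) -> Optional[str]:
    """Return an instruction to inject, or None if no switch is warranted."""
    detected = detect_language(text)
    if detected is None or detected == current or detected not in supported:
        return None
    name = {"en": "ENGLISH", "fr": "FRENCH"}[detected]
    # Only the quoted prefix is reproduced; the real instruction is longer.
    return f"MANDATORY INSTRUCTION: You MUST respond in {name}."
```

Returning None for ties and unsupported languages keeps the agent in its current language instead of flipping on ambiguous input such as a bare name or number.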

Call recording and transcripts

  • Recording – Azure Blob credentials (AZURE_STORAGE_ACCOUNT_NAME/KEY/CONTAINER) trigger LiveKit egress so every call is stored under calls/{agent_id}/recordings/....
  • Transcripts – The worker serializes session.history.to_dict() and anonymizes PII via Presidio + spaCy (see packages/agent/src/pii_anonymizer.py). URLs are referenced in the database, as documented in Call Recording and Transcription.
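
The transcript step can be illustrated with a deliberately simplified stand-in. The worker itself uses Presidio + spaCy; the sketch below swaps in two regular expressions purely to show the shape of the transformation. The history layout (an "items" list whose entries carry a "content" list of strings), the placeholder tokens, and the function names are assumptions, not the actual output of session.history.to_dict().

```python
import re

# Regex stand-ins for the Presidio recognizers; far less accurate than the real thing.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def anonymize_history(history: dict) -> dict:
    """Return a copy of the history with string content parts redacted."""
    items = []
    for item in history.get("items", []):
        content = [
            redact(part) if isinstance(part, str) else part
            for part in item.get("content", [])
        ]
        items.append({**item, "content": content})
    return {**history, "items": items}
```

Building new dictionaries instead of editing in place leaves the live session history untouched while the redacted copy is uploaded.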

Error handling & retries

  • Missing configs or API failures produce clear log lines and hang up the call cleanly by removing SIP participants.
  • Provisioning details (twilioPhoneNumberSid, livekitInboundTrunkId, etc.) are always loaded into userdata.agent_config so transfer tools can validate departments.
  • Metrics from livekit.agents.metrics are collected via UsageCollector, so latency and token-count metrics can later be streamed into your observability stack.
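
The department validation that the transfer tool performs against the loaded config might look roughly like the following. The shape of config["transfers"] is not documented on this page, so the list-of-dicts layout, the "department" and "number" keys, the case-insensitive match, and the find_transfer_target name are all hypothetical.

```python
from typing import Optional


def find_transfer_target(config: dict, department: str) -> Optional[dict]:
    """Return the transfer entry matching the department name, or None."""
    wanted = department.strip().casefold()
    for entry in config.get("transfers", []):
        # Case-insensitive comparison is an assumption about the matching rule.
        if str(entry.get("department", "")).strip().casefold() == wanted:
            return entry
    return None


# Hypothetical config fragment, as it might sit in userdata.agent_config.
example_config = {
    "transfers": [
        {"department": "Sales", "number": "+15550101"},
        {"department": "Support", "number": "+15550102"},
    ]
}
```

With this layout, a None result would mean the tool declines the transfer and lets the assistant keep handling the call.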

Use this page when you need to reason about what the agent can do without diving into Python files—each bullet maps back to the implementation referenced above.