Operations & residency

Data residency

| Pipeline stage | Where it runs | Region-locked? |
|---|---|---|
| WebSocket / orchestrator | App Service in UAE North | ✅ yes |
| STT (Azure Speech) | AI Services in UAE North | ✅ yes; audio and transcripts stay regional |
| TTS (Azure Speech) | AI Services in UAE North | ✅ yes; text and audio stay regional |
| LLM (Azure OpenAI) | UAE North deployment | ⚠️ GlobalStandard only; see below |
| LLM (Foundry / OpenAI) | Wherever you point it | ❌ no; document it in your privacy notice |
| Session state (Redis) | UAE North | ✅ yes |
| RAG index (Azure AI Search) | UAE North | ✅ yes |
| Logs (App Insights / Log Analytics) | UAE North | ✅ yes |

The UAE OpenAI caveat

In UAE North, gpt-4.1 can only be deployed with the GlobalStandard deployment type. Microsoft routes GlobalStandard inference across regions for capacity, so prompts and completions may be processed outside UAE North. Speech (STT/TTS) is unaffected and stays in UAE North.

Your options:

Accept GlobalStandard: the fastest path, with a single region on your bill, but document the global routing in your privacy notice and DPA.
Switch to a smaller model on a regional Standard deployment if and when one becomes available in UAE North.
Move the LLM to Sweden Central for EU residency. This is a split deployment; see Deployment / region choice.
Bring your own model: set LLM_BACKEND=openai and point it at an on-prem vLLM server or Foundry Local.
Sovereign deployment via Core42: a separate engagement, not covered by this repo.
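Whichever option you pick, a startup check can flag a configuration whose LLM traffic may leave UAE North before it reaches production. The sketch below is hypothetical. Only `LLM_BACKEND` and its `openai` value come from this page. The `azure` value, the `AZURE_OPENAI_DEPLOYMENT_TYPE` variable and its `GlobalStandard` default are assumptions for illustration.

```python
"""Hypothetical residency check run at orchestrator startup.

Assumed (not from the repo): LLM_BACKEND value "azure" and the
AZURE_OPENAI_DEPLOYMENT_TYPE variable with a "GlobalStandard" default.
"""
from typing import Mapping


def residency_warnings(env: Mapping[str, str]) -> list[str]:
    """Return warnings to review against the privacy notice and DPA."""
    warnings: list[str] = []
    backend = env.get("LLM_BACKEND", "azure").lower()
    if backend == "openai":
        # OpenAI-compatible endpoint: residency depends entirely on where it runs.
        warnings.append("LLM_BACKEND=openai: document the endpoint's location in the privacy notice")
    elif backend == "azure":
        deployment_type = env.get("AZURE_OPENAI_DEPLOYMENT_TYPE", "GlobalStandard")
        # Global* deployment types may process inference outside the resource's region.
        if deployment_type.startswith("Global"):
            warnings.append(f"{deployment_type}: inference may be processed outside UAE North")
    else:
        warnings.append(f"unknown LLM_BACKEND {backend!r}: residency not assessed")
    return warnings
```

Logging these warnings, rather than refusing to start, keeps the GlobalStandard option usable while still leaving a record of it.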

Auth

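Client → orchestrator: the client sends its API key in the X-API-Key header and receives a JWT (HS256, 5 min TTL), which it presents in the auth header when opening the WebSocket.
Orchestrator → Azure: DefaultAzureCredential (Managed Identity in App Service, az login locally). The Speech SDK takes auth_token=aad#<resource_id>#<jwt>, where <jwt> is the Entra ID access token (not the client JWT); Azure OpenAI uses azure_ad_token_provider.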
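The following is a minimal sketch of the Speech half of the Azure hop, assuming a full ARM resource ID for the AI Services resource. `speech_auth_token` is a hypothetical helper that only builds the `aad#<resource_id>#<token>` string. The token provider is injected so the helper can be tested without Azure. The commented wiring uses `get_bearer_token_provider` from `azure-identity` and `SpeechConfig(auth_token=..., region=...)` from the Speech SDK. The `RESOURCE_ID` name and the `uaenorth` region string are assumptions.

```python
from typing import Callable

# Entra ID scope for Azure AI services tokens (Speech and Azure OpenAI).
COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default"


def speech_auth_token(resource_id: str, get_token: Callable[[], str]) -> str:
    """Build the Speech SDK authorization token: aad#<resource_id>#<entra_token>.

    resource_id: full ARM ID of the AI Services / Speech resource.
    get_token: returns a fresh Entra ID access token.
    """
    if not resource_id.startswith("/subscriptions/"):
        raise ValueError("resource_id must be a full ARM resource ID")
    return f"aad#{resource_id}#{get_token()}"


# Production wiring (requires azure-identity and azure-cognitiveservices-speech):
#   from azure.identity import DefaultAzureCredential, get_bearer_token_provider
#   import azure.cognitiveservices.speech as speechsdk
#   provider = get_bearer_token_provider(DefaultAzureCredential(), COGNITIVE_SERVICES_SCOPE)
#   config = speechsdk.SpeechConfig(auth_token=speech_auth_token(RESOURCE_ID, provider),
#                                   region="uaenorth")
#   # Entra tokens expire: for long sessions, assign a fresh value to
#   # recognizer.authorization_token before the old one runs out.
#   # The same provider can be passed to Azure OpenAI as azure_ad_token_provider.
```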

For production, replace the env-var API-key map with a Cosmos DB lookup (planned).
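The client-side exchange can be sketched with the standard library alone. The sketch looks up the tenant for an `X-API-Key` value, then mints and verifies an HS256 JWT with the documented 5-minute default TTL. It is an illustration, not the orchestrator's code. The function names, the `sub`/`iat`/`exp` claim layout and a key map loaded from a hypothetical `AUDVOICE_API_KEYS` JSON env var are all assumptions. In production you would normally use a maintained JWT library, and a Cosmos DB lookup in place of the in-memory map.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

DEFAULT_TTL_SECONDS = 300  # documented 5-minute default (AUDVOICE_JWT_TTL_SECONDS)


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def tenant_for_api_key(api_key: str, key_map: dict) -> Optional[str]:
    """Return the tenant for an X-API-Key value, checking every key with compare_digest.

    key_map could come from json.loads(os.environ["AUDVOICE_API_KEYS"]) (hypothetical).
    """
    tenant = None
    for known_key, known_tenant in key_map.items():
        if hmac.compare_digest(api_key.encode(), known_key.encode()):
            tenant = known_tenant
    return tenant


def mint_jwt(tenant_id: str, secret: bytes, ttl: int = DEFAULT_TTL_SECONDS,
             now: Optional[float] = None) -> str:
    """Issue an HS256 JWT whose exp is ttl seconds after iat."""
    iat = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": tenant_id, "iat": iat, "exp": iat + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid, unexpired token; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    # Pin the algorithm so alg=none or other algorithms are rejected.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

With a 5-minute TTL, a token minted at `now=1000` verifies up to `now=1299` and is rejected from `now=1300` onward.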

Quotas to request before launch

Default quotas in UAE North are too low for production traffic. File quota requests for:

Azure OpenAI TPM (tokens per minute) on your gpt-4.1 deployment: at least 100k, even for a low single-digit number of concurrent users.
Azure Speech concurrent continuous-recognition sessions: at least 50 for a small contact center.
App Service outbound TCP connections per instance: the Premium v3 default is 8000.
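The following is a back-of-envelope way to size the TPM request. Every input is an assumption to replace with measured values: turns per minute per session, prompt size (system prompt, history and RAG chunks), `max_tokens` and the headroom factor. The sketch counts `max_tokens` rather than actual output. This reflects our understanding that Azure OpenAI estimates a request's TPM usage from its prompt plus `max_tokens`.

```python
import math


def required_tpm(concurrent_sessions: int, turns_per_minute: float,
                 prompt_tokens: int, max_output_tokens: int,
                 headroom: float = 1.5) -> int:
    """Rough TPM quota to request for one Azure OpenAI deployment.

    Each turn is charged prompt_tokens + max_output_tokens, and headroom
    covers bursts such as several callers finishing a turn together.
    """
    tokens_per_minute = concurrent_sessions * turns_per_minute * (prompt_tokens + max_output_tokens)
    return math.ceil(tokens_per_minute * headroom)
```

With 5 concurrent sessions, 4 turns per minute, 3,000 prompt tokens and `max_tokens=300`, this gives 99,000 TPM, roughly the 100k floor above.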

Logs & telemetry

Every session emits the following custom events to Application Insights:

| Event | Payload |
|---|---|
| `session.created` | `tenant_id`, `session_id`, `voice`, `languages` |
| `transcript.final` | `text` (PII-redacted if enabled), `language`, `duration_ms` |
| `response.done` | `tokens_in`, `tokens_out`, `latency_ms` |
| `barge_in.detected` | `spoken_chars`, `at_ms` |
| `tool.call` | `name`, `latency_ms` |
| `error` | `code`, `message` |
| `session.closed` | `duration_ms`, `audio_in_bytes`, `audio_out_bytes` |
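The sample KQL reads `latency_ms` from `customMeasurements`. Numeric payload fields therefore need to be sent as custom measurements, and everything else as custom properties. The hypothetical helper below enforces the event table and performs that split before the event is handed to whichever Application Insights client you use. `EVENT_SCHEMA` and `split_event` are illustrative names, not part of the repo.

```python
import json
from typing import Any

# Required payload keys per event, copied from the event table.
EVENT_SCHEMA: dict[str, frozenset[str]] = {
    "session.created": frozenset({"tenant_id", "session_id", "voice", "languages"}),
    "transcript.final": frozenset({"text", "language", "duration_ms"}),
    "response.done": frozenset({"tokens_in", "tokens_out", "latency_ms"}),
    "barge_in.detected": frozenset({"spoken_chars", "at_ms"}),
    "tool.call": frozenset({"name", "latency_ms"}),
    "error": frozenset({"code", "message"}),
    "session.closed": frozenset({"duration_ms", "audio_in_bytes", "audio_out_bytes"}),
}


def split_event(name: str, payload: dict[str, Any]) -> tuple[dict[str, str], dict[str, float]]:
    """Validate a payload and split it into (properties, measurements)."""
    required = EVENT_SCHEMA.get(name)
    if required is None:
        raise ValueError(f"unknown event {name!r}")
    missing = required - set(payload)
    if missing:
        raise ValueError(f"{name} missing {sorted(missing)}")
    properties: dict[str, str] = {}
    measurements: dict[str, float] = {}
    for key, value in payload.items():
        # Numbers (but not booleans) become measurements so KQL can aggregate them.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            measurements[key] = float(value)
        else:
            # Properties are strings; lists and other values are JSON-encoded.
            properties[key] = value if isinstance(value, str) else json.dumps(value)
    return properties, measurements
```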

Sample KQL:

```kusto
// p50 / p95 first-audio latency per hour over the last 24h
customEvents
| where timestamp > ago(24h)
| where name == "response.done"
| extend latency = todouble(customMeasurements["latency_ms"])
| summarize p50 = percentile(latency, 50), p95 = percentile(latency, 95), n = count() by bin(timestamp, 1h)
| order by timestamp asc
```

Cost guidance

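Approximate per-minute voice cost, based on 2026 retail prices. STT is billed per audio hour, TTS per character and Azure OpenAI per token, so the figures below are converted to a per-minute rate assuming one short LLM reply per minute: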

| Component | USD per minute |
|---|---|
| Azure Speech STT (standard, real-time) | ~$0.017 |
| Azure Speech TTS (neural) | ~$0.026 |
| Azure OpenAI gpt-4.1 (one short reply, ~200 tokens) | ~$0.005 |
| Voice round-trip total | ~$0.05 |

Plus fixed monthly costs:

App Service P1v3: ~$160/mo
Redis Basic C0: ~$16/mo
Azure AI Search Basic: ~$75/mo (if RAG enabled)
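The two cost lists combine into a simple monthly estimate. The sketch hard-codes the approximate prices above (the per-minute rows sum to about $0.048) and treats AI Search as optional. It ignores egress, storage and log ingestion, so read the result as a lower bound.

```python
# Approximate prices from the cost tables (USD).
PER_MINUTE_USD = {"stt": 0.017, "tts": 0.026, "llm": 0.005}
FIXED_MONTHLY_USD = {"app_service_p1v3": 160.0, "redis_basic_c0": 16.0}
SEARCH_BASIC_MONTHLY_USD = 75.0  # only when RAG is enabled


def monthly_cost(voice_minutes: float, rag_enabled: bool = False) -> float:
    """Variable voice cost plus fixed infrastructure, rounded to cents."""
    variable = voice_minutes * sum(PER_MINUTE_USD.values())
    fixed = sum(FIXED_MONTHLY_USD.values()) + (SEARCH_BASIC_MONTHLY_USD if rag_enabled else 0.0)
    return round(variable + fixed, 2)
```

For 10,000 voice minutes a month this gives about $656 without RAG and $731 with it.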

Limits today

| Setting | Default | Where to change |
|---|---|---|
| Max session duration | 30 min | `MAX_SESSION_SECONDS` env var |
| Max audio frame size | 32 KB | `--ws-max-size` uvicorn flag |
| JWT TTL | 5 min | `AUDVOICE_JWT_TTL_SECONDS` env var |
| Sentence-boundary trigger | `.!?؟。…\n` | `apps/orchestrator/audvoice/tts.py:SENTENCE_TERMINATORS` |
| Default silence end-of-turn | 600 ms | `DEFAULT_SILENCE_MS` env var; per session via `session.update.turn_detection.silence_ms` |
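The sentence-boundary trigger lets TTS start speaking before the LLM has finished its reply. The following is a minimal sketch of that idea, assuming the default terminator set above. `sentence_chunks` is a hypothetical stand-in, not the implementation in `tts.py`. It is deliberately naive: it also splits on the period in `3.5` or `e.g.`.

```python
from typing import Iterable, Iterator

# Same characters as the documented default trigger set.
SENTENCE_TERMINATORS = ".!?؟。…\n"


def sentence_chunks(tokens: Iterable[str], terminators: str = SENTENCE_TERMINATORS) -> Iterator[str]:
    """Group streamed LLM text into sentence-sized chunks for TTS."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush everything up to and including the last terminator seen so far.
        cut = max(buffer.rfind(t) for t in terminators)
        if cut >= 0:
            sentence, buffer = buffer[: cut + 1], buffer[cut + 1 :]
            if sentence.strip():
                yield sentence.strip()
    # Whatever remains when the stream ends is spoken as a final chunk.
    if buffer.strip():
        yield buffer.strip()
```

Feeding it `["Hello", " there.", " How are", " you?", " Fine"]` yields `"Hello there."` and `"How are you?"` as soon as each terminator arrives, and `"Fine"` when the stream ends.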

Roadmap

Cosmos DB tenant store (replaces env-var API-key map)
Per-tenant rate limits (concurrent sessions, minutes/month)
PII-redaction and sentiment via Azure AI Language
Telephony channel via Azure Communication Services SIP
Mobile SDKs (iOS / Android) wrapping the WS protocol
Optional disable_llm mode for client-side Agent Framework integration