Operations & residency
Data residency
| Pipeline stage | Where it runs | Region-locked? |
|---|---|---|
| WebSocket / orchestrator | App Service in UAE North | ✅ yes |
| STT (Azure Speech) | AI Services in UAE North | ✅ yes — audio + transcripts stay regional |
| TTS (Azure Speech) | AI Services in UAE North | ✅ yes — text + audio stay regional |
| LLM (Azure OpenAI) | UAE North deployment | ⚠️ See below |
| LLM (Foundry / OpenAI) | Wherever you point it | ❌ Document in your privacy notice |
| Session state (Redis) | UAE North | ✅ yes |
| RAG index (Azure AI Search) | UAE North | ✅ yes |
| Logs (App Insights / Log Analytics) | UAE North | ✅ yes |
The UAE OpenAI caveat
In UAE North, gpt-4.1 is deployable as GlobalStandard SKU only — Microsoft routes inference globally for capacity, so request data may transit other Azure regions. Speech (STT/TTS) is unaffected and stays in UAE North.
Your options:
- Accept GlobalStandard — fastest path, single region in your bill, but document the global routing in your privacy notice and DPA.
- Use a smaller regional Standard model if/when one becomes available in UAE North.
- Move LLM to Sweden Central for EU residency — split deployment, see Deployment / region choice.
- Bring your own model via `LLM_BACKEND=openai` pointed at on-prem vLLM / Foundry Local.
- Sovereign deployment via Core42: separate engagement, not this repo.
Auth
- Client → orchestrator: API key (`X-API-Key`) → JWT (HS256, 5 min TTL) → WebSocket auth header.
- Orchestrator → Azure: `DefaultAzureCredential` (Managed Identity in App Service, `az login` locally). Speech SDK uses `auth_token=aad#<resource_id>#<jwt>`; AOAI uses `azure_ad_token_provider`.
For production, replace the env-var API-key map with a Cosmos DB lookup (planned).
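The client → orchestrator exchange above (API key in, short-lived HS256 JWT out) can be sketched with the standard library alone. This is a minimal illustration, not the repo's implementation: the function names, the `sub` claim carrying the tenant, and the in-memory `api_keys` dict standing in for the env-var API-key map are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

JWT_TTL_SECONDS = 300  # 5 min TTL, per the Auth section


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWT strips."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    """HMAC-SHA256 over 'header.claims', i.e. the HS256 signature."""
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(api_key: str, api_keys: dict, secret: bytes,
                now: Optional[float] = None) -> str:
    """Exchange a known API key for a JWT (hypothetical helper)."""
    tenant_id = api_keys.get(api_key)
    if tenant_id is None:
        raise PermissionError("unknown API key")
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # Assumption: the tenant travels in the standard 'sub' claim.
    claims = {"sub": tenant_id, "iat": issued, "exp": issued + JWT_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check signature, algorithm, and expiry; return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(header_b64 + "." + claims_b64, secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    return claims
```

A client would call something like `issue_token` once per connection attempt and present the result in the WebSocket auth header; with a 5 min TTL the token only needs to outlive the handshake. In production a maintained JWT library is usually a safer choice than hand-rolled signing.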
Quotas to request before launch
Default UAE North limits will not survive production traffic. File quota requests for:
- Azure OpenAI TPM (tokens / minute) on your `gpt-4.1` deployment: minimum 100k for low single-digit concurrent users.
- Azure Speech continuous recognition concurrent sessions: minimum 50 for a small contact center.
- App Service outbound TCP connections per instance — Premium v3 default is 8000.
Logs & telemetry
Every session emits to Application Insights:
| Event | Payload |
|---|---|
| `session.created` | `tenant_id`, `session_id`, `voice`, `languages` |
| `transcript.final` | `text` (PII-redacted if enabled), `language`, `duration_ms` |
| `response.done` | `tokens_in`, `tokens_out`, `latency_ms` |
| `barge_in.detected` | `spoken_chars`, `at_ms` |
| `tool.call` | `name`, `latency_ms` |
| `error` | `code`, `message` |
| `session.closed` | `duration_ms`, `audio_in_bytes`, `audio_out_bytes` |
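To make the event schema above concrete, here is a small sketch that validates a payload against the listed fields and splits it into string properties and numeric measurements, mirroring the `customDimensions` / `customMeasurements` split that Application Insights uses for custom events. The `build_event` helper and the returned dict shape are hypothetical; the actual emitter in the orchestrator may look quite different.

```python
import json
from numbers import Real

# Field names copied from the event table above.
EVENT_FIELDS = {
    "session.created": ("tenant_id", "session_id", "voice", "languages"),
    "transcript.final": ("text", "language", "duration_ms"),
    "response.done": ("tokens_in", "tokens_out", "latency_ms"),
    "barge_in.detected": ("spoken_chars", "at_ms"),
    "tool.call": ("name", "latency_ms"),
    "error": ("code", "message"),
    "session.closed": ("duration_ms", "audio_in_bytes", "audio_out_bytes"),
}


def build_event(event_name: str, payload: dict) -> dict:
    """Validate a payload and split it into properties and measurements."""
    if event_name not in EVENT_FIELDS:
        raise ValueError(f"unknown event: {event_name}")
    expected = EVENT_FIELDS[event_name]
    missing = [field for field in expected if field not in payload]
    unexpected = sorted(set(payload) - set(expected))
    if missing or unexpected:
        raise ValueError(
            f"{event_name}: missing={missing} unexpected={unexpected}"
        )
    properties, measurements = {}, {}
    for field in expected:
        value = payload[field]
        if isinstance(value, Real) and not isinstance(value, bool):
            # Numeric values become measurements so they can be aggregated.
            measurements[field] = float(value)
        elif isinstance(value, str):
            properties[field] = value
        else:
            # Lists such as `languages` are serialized to a JSON string.
            properties[field] = json.dumps(value)
    return {
        "name": event_name,
        "properties": properties,
        "measurements": measurements,
    }
```

Keeping `latency_ms` and similar values numeric is what allows a query to aggregate them (for example with `percentile`) without parsing strings first.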
Sample KQL:
// p50 / p95 first-audio latency per hour over the last 24h
customEvents
| where timestamp > ago(24h)
| where name == "response.done"
| extend latency = todouble(customMeasurements["latency_ms"])
| summarize p50=percentile(latency, 50), p95=percentile(latency, 95), n=count() by bin(timestamp, 1h)
| order by timestamp asc
Cost guidance
Per-minute voice cost (rough, based on 2026 retail prices):
| Component | $/minute (USD) |
|---|---|
| Azure Speech STT (standard real-time) | ~$0.017 |
| Azure Speech TTS (Neural) | ~$0.026 |
| Azure OpenAI gpt-4.1 (one short reply, ~200 tokens) | ~$0.005 |
| Voice round-trip total | ~$0.05 |
Plus fixed:
- App Service P1v3: ~$160/mo
- Redis Basic C0: ~$16/mo
- Azure AI Search Basic: ~$75/mo (if RAG enabled)
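As a worked example of how the figures above combine, the sketch below adds the per-minute components and the fixed monthly items into a single estimate. The numbers are copied from this section; the function name and the assumption that every voice minute includes exactly one short LLM reply are illustrative only, so treat the result as a rough order of magnitude.

```python
# Rough per-minute costs in USD, copied from the table above.
PER_MINUTE_USD = {
    "stt": 0.017,
    "tts": 0.026,
    "llm": 0.005,  # assumes one short reply (~200 tokens) per minute
}

# Fixed monthly costs in USD, copied from the list above.
FIXED_MONTHLY_USD = {
    "app_service_p1v3": 160.0,
    "redis_basic_c0": 16.0,
    "ai_search_basic": 75.0,  # only applies when RAG is enabled
}


def estimate_monthly_cost(voice_minutes: float, rag_enabled: bool = True) -> float:
    """Return an approximate monthly bill in USD (hypothetical helper)."""
    variable = voice_minutes * sum(PER_MINUTE_USD.values())
    fixed = sum(
        cost
        for name, cost in FIXED_MONTHLY_USD.items()
        if rag_enabled or name != "ai_search_basic"
    )
    return round(variable + fixed, 2)


if __name__ == "__main__":
    print(estimate_monthly_cost(10_000))                     # 731.0
    print(estimate_monthly_cost(10_000, rag_enabled=False))  # 656.0
```

At 10,000 voice minutes a month the variable part (about $480) already outweighs the fixed part ($251 with RAG), so speech pricing dominates the bill well before LLM tokens do.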
Limits today
| Setting | Default | Where to change |
|---|---|---|
| Max session duration | 30 min | MAX_SESSION_SECONDS env var |
| Max audio frame size | 32 KB | --ws-max-size uvicorn flag |
| JWT TTL | 5 min | AUDVOICE_JWT_TTL_SECONDS env var |
| Sentence-boundary trigger | `.!?؟。…\n` | `apps/orchestrator/audvoice/tts.py:SENTENCE_TERMINATORS` |
| Default silence end-of-turn | 600 ms | DEFAULT_SILENCE_MS env var, per-session via session.update.turn_detection.silence_ms |
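The sketch below shows one way the settings in this table could fit together: env vars with the documented defaults, a per-session override for the silence window, and a sentence-boundary check over streamed text. Only the env var names, defaults, and terminator characters come from the table; `Limits`, `load_limits`, `silence_ms_for`, `pop_sentence`, and the shape of the `session.update` payload are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Same characters as the sentence-boundary trigger row above.
SENTENCE_TERMINATORS = ".!?؟。…\n"


@dataclass(frozen=True)
class Limits:
    max_session_seconds: int = 30 * 60  # 30 min
    jwt_ttl_seconds: int = 5 * 60       # 5 min
    default_silence_ms: int = 600


def load_limits(env: Optional[Mapping[str, str]] = None) -> Limits:
    """Read limits from env vars, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = Limits()
    return Limits(
        max_session_seconds=int(
            env.get("MAX_SESSION_SECONDS", defaults.max_session_seconds)
        ),
        jwt_ttl_seconds=int(
            env.get("AUDVOICE_JWT_TTL_SECONDS", defaults.jwt_ttl_seconds)
        ),
        default_silence_ms=int(
            env.get("DEFAULT_SILENCE_MS", defaults.default_silence_ms)
        ),
    )


def silence_ms_for(session_update: Mapping, limits: Limits) -> int:
    """Per-session silence window; assumes turn_detection sits at top level."""
    turn_detection = session_update.get("turn_detection") or {}
    return int(turn_detection.get("silence_ms", limits.default_silence_ms))


def pop_sentence(buffer: str) -> Tuple[Optional[str], str]:
    """Split off text up to the first terminator, if one has arrived."""
    for index, char in enumerate(buffer):
        if char in SENTENCE_TERMINATORS:
            return buffer[: index + 1], buffer[index + 1:]
    return None, buffer
```

For example, `pop_sentence("Hello. How")` returns `("Hello.", " How")`, while a buffer with no terminator is returned untouched so the caller can keep accumulating tokens. A naive scan like this also splits on the period in `3.14`; whether the real trigger guards against that is not stated here.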
Roadmap
- Cosmos DB tenant store (replaces env-var API-key map)
- Per-tenant rate limits (concurrent sessions, minutes/month)
- PII-redaction and sentiment via Azure AI Language
- Telephony channel via Azure Communication Services SIP
- Mobile SDKs (iOS / Android) wrapping the WS protocol
- Optional `disable_llm` mode for client-side Agent Framework integration