Deployment

The repo ships two Bicep templates:

| File | Scope | What it provisions |
|---|---|---|
| `infra/minimal.bicep` | Resource group | AI Services + `gpt-4.1` GlobalStandard. For local dev / testing. |
| `infra/main.bicep` | Subscription | RG + AI Services + Search + Redis + Key Vault + App Insights + App Service Linux container. Full v1 production stack. |

Quick: minimal (local dev)

```shell
az login
az group create -n rg-audvoice -l uaenorth
az deployment group create -g rg-audvoice -f infra/minimal.bicep
```

Then grant your user the Azure RBAC roles it needs on the AI Services resource (one-time per resource):

```shell
# Resource ID of the AI Services account created by minimal.bicep (first one in the group)
AI_ID=$(az cognitiveservices account list -g rg-audvoice --query "[0].id" -o tsv)
# Object ID of the signed-in user
OBJ=$(az ad signed-in-user show --query id -o tsv)
for role in "Cognitive Services User" "Cognitive Services OpenAI User" "Cognitive Services Speech User"; do
  az role assignment create --assignee-object-id "$OBJ" --assignee-principal-type User \
    --role "$role" --scope "$AI_ID"
done
```
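To confirm the loop above took effect, a small check can compare what is actually assigned against the three roles. This is a sketch: `missing_roles` is a hypothetical helper, and it expects role names on stdin, one per line, as printed by `az role assignment list ... -o tsv`.

```shell
# The three roles granted by the loop above, one per line.
REQUIRED_ROLES='Cognitive Services User
Cognitive Services OpenAI User
Cognitive Services Speech User'

# missing_roles (hypothetical helper): reads assigned role names on stdin,
# one per line, and prints each required role that is absent.
missing_roles() {
  assigned=$(cat)
  printf '%s\n' "$REQUIRED_ROLES" | while IFS= read -r role; do
    # -x: whole-line match, -F: fixed string (role names contain spaces)
    printf '%s\n' "$assigned" | grep -qxF -- "$role" || printf '%s\n' "$role"
  done
}

# Usage, with AI_ID and OBJ from the previous step (no output = all granted):
#   az role assignment list --assignee "$OBJ" --scope "$AI_ID" \
#     --query "[].roleDefinitionName" -o tsv | missing_roles
```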

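`uvicorn audvoice.main:app` now runs locally against real Azure, authenticating with your `az login` credentials. New role assignments can take a few minutes to propagate, so early permission errors usually clear on their own.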

Production: full stack

1. Build and push the image

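```shell
# ACR names are globally unique: add a random 6-hex-character suffix
ACR=audvoice$(openssl rand -hex 3)
az group create -n rg-audvoice -l uaenorth   # idempotent; already done if you ran the minimal step
az acr create -g rg-audvoice -n "$ACR" --sku Basic
az acr login -n "$ACR"

# Build from the repo root so the infra/ paths in the next step still resolve.
# --platform: App Service Linux runs linux/amd64 images (matters on ARM hosts such as Apple silicon)
docker build --platform linux/amd64 -t "$ACR.azurecr.io/audvoice:0.1.0" apps/orchestrator
docker push "$ACR.azurecr.io/audvoice:0.1.0"
```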
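Registry names must also be 5 to 50 alphanumeric characters, so a hand-picked name can be checked locally before `az acr create` rejects it. A sketch with a hypothetical helper, `valid_acr_name`; global availability can only be checked against Azure, for example with `az acr check-name`.

```shell
# valid_acr_name NAME (hypothetical helper): succeeds if NAME meets the ACR
# naming rule of 5-50 alphanumeric characters. Does not check uniqueness.
valid_acr_name() {
  name=$1
  [ "${#name}" -ge 5 ] && [ "${#name}" -le 50 ] || return 1
  case $name in
    *[!A-Za-z0-9]*) return 1 ;;  # any non-alphanumeric character
  esac
}

# Usage:
#   valid_acr_name "$ACR" || echo "invalid registry name: $ACR" >&2
#   az acr check-name -n "$ACR" --query nameAvailable -o tsv   # global availability
```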

2. Deploy

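```shell
# Run from the repo root. Both secrets end up in Key Vault, not as plaintext app settings.
export AUDVOICE_JWT_SECRET=$(openssl rand -base64 48)
# Comma-separated key:tenant pairs; replace the placeholder keys with your own
export AUDVOICE_API_KEYS="prodkey1:tenantA,prodkey2:tenantB"
az deployment sub create \
  --location uaenorth \
  --template-file infra/main.bicep \
  --parameters infra/main.bicepparam \
  --parameters containerImage="$ACR.azurecr.io/audvoice:0.1.0"
```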
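`AUDVOICE_API_KEYS` is a comma-separated list of `key:tenant` pairs. Rather than shipping the `prodkey1`/`prodkey2` placeholders, a helper along these lines can mint one random key per tenant in that format. It is a sketch: `make_api_keys` is hypothetical (not part of the repo) and assumes tenant names contain no `:` or `,`.

```shell
# make_api_keys TENANT... (hypothetical helper): prints "key:tenant,key:tenant,..."
# with a fresh random key for each tenant.
make_api_keys() {
  pairs=''
  for tenant in "$@"; do
    # 24 random bytes as 48 lowercase hex characters
    key=$(od -An -N24 -tx1 /dev/urandom | tr -d ' \n')
    pairs="${pairs:+$pairs,}$key:$tenant"
  done
  printf '%s\n' "$pairs"
}

# Usage:
#   export AUDVOICE_API_KEYS="$(make_api_keys tenantA tenantB)"
```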

The template:

- Uses a system-assigned managed identity on the App Service.
- Auto-grants that identity the Speech and OpenAI roles on the AI Services resource. If your subscription policy blocks role assignments from templates, grant them manually.
- Reads `AUDVOICE_JWT_SECRET` and `AUDVOICE_API_KEYS` through Key Vault references, so no secrets sit in plaintext in the App Service settings.
- Enables WebSockets, ARR affinity, and HTTP/2, and runs on a Premium v3 (P1v3) plan sized for sustained WebSocket traffic.
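After the deployment, the switches listed above can be read back from the live app. A sketch with hypothetical helpers (`is_true`, `report`, `verify_app_config`); it takes the web app's name (not its host name), for example from `az webapp list -g rg-audvoice --query "[0].name" -o tsv`, and reads the `webSocketsEnabled`, `http20Enabled`, and `clientAffinityEnabled` properties.

```shell
# is_true VALUE: accept the boolean spellings az may print.
is_true() {
  case $1 in
    true|True|TRUE) return 0 ;;
    *) return 1 ;;
  esac
}

# report LABEL VALUE: print one status line; fail if VALUE is not true.
report() {
  if is_true "$2"; then echo "ok   $1"; else echo "OFF  $1"; return 1; fi
}

# verify_app_config RG APP_NAME (hypothetical helper): checks the three settings
# and returns non-zero if any of them is off.
verify_app_config() {
  rg=$1; app=$2; rc=0
  report websockets "$(az webapp config show -g "$rg" -n "$app" --query webSocketsEnabled -o tsv)" || rc=1
  report http2 "$(az webapp config show -g "$rg" -n "$app" --query http20Enabled -o tsv)" || rc=1
  report arr-affinity "$(az webapp show -g "$rg" -n "$app" --query clientAffinityEnabled -o tsv)" || rc=1
  return "$rc"
}
```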

3. Verify

```shell
APP=$(az webapp list -g rg-audvoice --query "[0].defaultHostName" -o tsv)
curl "https://$APP/healthz"                      # → {"status":"ok"}
curl -X POST "https://$APP/v1/sessions" \
  -H "X-API-Key: prodkey1" -H "Content-Type: application/json" -d '{}'
```
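A freshly deployed container can take a while to pull and start, so the first `/healthz` call may fail. A retry loop such as this one waits for the `{"status":"ok"}` body before moving on. It is a sketch: `wait_healthy` is hypothetical, the default attempt count and delay are arbitrary, and it assumes the body is compact JSON exactly as shown above.

```shell
# wait_healthy URL [ATTEMPTS] [DELAY_SECONDS] (hypothetical helper): polls URL
# until the response body contains "status":"ok"; returns 1 if it never does.
wait_healthy() {
  url=$1; attempts=${2:-30}; delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors (4xx/5xx), -s: no progress output
    if curl -fs "$url" 2>/dev/null | grep -q '"status":"ok"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_healthy "https://$APP/healthz"
```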

Open `apps/web-demo/index.html` in a browser, set its Server field to `https://` followed by the host name in `$APP`, and start talking.

CI/CD

`.github/workflows/ci.yml` runs the unit tests on every push. For container build and deploy, add a workflow that:

1. Logs in to Azure with OIDC (`azure/login@v2`).
2. Builds the image and pushes it to ACR.
3. Runs `az webapp config container set` to roll out the new image.

We don't ship that workflow because every team has a different Azure auth model; wire it to your own OIDC federation.
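The build, push, and roll steps of such a workflow map onto a few CLI calls that could run after `azure/login@v2` has signed in. This is a sketch, not a shipped script: `roll_image` and `APP_NAME` are hypothetical, the image name `audvoice` and the `apps/orchestrator` build context come from the manual steps, and `--container-image-name` is the flag in current Azure CLI releases (older releases use `--docker-custom-image-name`).

```shell
# roll_image ACR RG APP_NAME TAG (hypothetical helper): build, push, and point
# the web app at the new tag. Run from the repo root after logging in to Azure.
roll_image() {
  acr=$1; rg=$2; app=$3; tag=$4
  image="$acr.azurecr.io/audvoice:$tag"
  az acr login -n "$acr" &&
    docker build --platform linux/amd64 -t "$image" apps/orchestrator &&
    docker push "$image" &&
    az webapp config container set -g "$rg" -n "$app" --container-image-name "$image"
}

# Usage in a GitHub Actions step (GITHUB_SHA is set by Actions):
#   roll_image "$ACR" rg-audvoice "$APP_NAME" "$GITHUB_SHA"
```

Tagging with the commit SHA instead of a fixed `0.1.0` gives every rollout a distinct, traceable image.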

Region choice

| Need | Pick |
|---|---|
| Strict UAE residency for Speech | `uaenorth` (this is the whole point) |
| Lowest LLM latency | Run AI Services in `swedencentral` and accept that LLM traffic leaves the region |
| Sovereign / classified | A different stack (Core42), not this repo |

You can also split the two: one AI Services resource for Speech in UAE North and a second AI Services resource for OpenAI in Sweden Central. Point `AZURE_OPENAI_ENDPOINT` at the Sweden Central resource and `AZURE_SPEECH_RESOURCE_ID` at the UAE North one. LLM latency improves and STT/TTS data stays in the UAE, but LLM prompts and completions leave it.
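For the split setup, the two variables could be filled in as below. This sketch rests on assumptions: the account names `ais-audvoice-uae` and `ais-audvoice-swe` are hypothetical, `AZURE_SPEECH_RESOURCE_ID` is taken to be the full ARM resource ID of the UAE North account, and `AZURE_OPENAI_ENDPOINT` is taken to be the `https://<name>.openai.azure.com/` endpoint of the Sweden Central account. `az cognitiveservices account show ... --query id -o tsv` returns the resource ID directly if you prefer to query it.

```shell
# arm_account_id SUBSCRIPTION_ID RESOURCE_GROUP ACCOUNT: ARM resource ID of an
# AI Services (Cognitive Services) account.
arm_account_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.CognitiveServices/accounts/%s\n' "$1" "$2" "$3"
}

# openai_endpoint ACCOUNT: Azure OpenAI endpoint for an account's custom
# subdomain (assumes the subdomain equals the account name).
openai_endpoint() {
  printf 'https://%s.openai.azure.com/\n' "$1"
}

# Usage (hypothetical names; Speech stays in UAE North, the LLM runs in Sweden Central):
#   SUB=$(az account show --query id -o tsv)
#   export AZURE_SPEECH_RESOURCE_ID="$(arm_account_id "$SUB" rg-audvoice ais-audvoice-uae)"
#   export AZURE_OPENAI_ENDPOINT="$(openai_endpoint ais-audvoice-swe)"
```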

Quotas & scale

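- Speech: by default about 100 concurrent recognition requests and about 200 synthesis transactions per second per resource. Both are adjustable; see the Azure Speech service quotas and limits documentation.
- Azure OpenAI: quota is tokens per minute (TPM) per deployment. Request an increase before launch.
- App Service P1v3: about 250 concurrent WebSocket sessions per instance with our default config. Scale out horizontally; ARR affinity keeps each session pinned to its instance.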
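For a rough sizing pass, the figures above reduce to simple arithmetic. The sketch assumes each live session holds one Speech recognition stream and uses the approximate 250-sessions-per-P1v3-instance figure; both are estimates, so treat the output as a starting point for load testing. `ceil_div` and `plan_capacity` are hypothetical helpers.

```shell
# ceil_div A B: integer ceiling of A / B (B > 0).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# plan_capacity SESSIONS (hypothetical helper): P1v3 instances needed at ~250
# sessions each, and whether one Speech resource's default ~100 concurrent
# recognitions suffices, assuming one recognition stream per session.
plan_capacity() {
  sessions=$1
  echo "app_service_instances=$(ceil_div "$sessions" 250)"
  if [ "$sessions" -gt 100 ]; then
    echo "speech_recognition=request_quota_increase"
  else
    echo "speech_recognition=default_quota_ok"
  fi
}
```

For example, 600 concurrent sessions works out to 3 instances and a Speech quota increase.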

Observability

App Insights is wired in via `APPLICATIONINSIGHTS_CONNECTION_STRING`. Custom events you'll see:

- `session.created`, `session.closed`, `session.duration_ms`
- `barge_in.detected`: count + `spoken_chars` histogram
- `llm.tokens`: prompt + completion tokens per turn
- `error`: by code

Log Analytics queries are planned for `infra/queries/`.
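Until those queries exist, the custom events can be inspected from the CLI. A sketch, assuming the events land in the `customEvents` table, the `application-insights` Azure CLI extension (which provides `az monitor app-insights query`) is installed, and a hypothetical App Insights resource name `appi-audvoice`; `events_query` is a hypothetical helper that counts the event names listed above.

```shell
# events_query HOURS: KQL that counts the custom events listed above over the last HOURS.
events_query() {
  printf 'customEvents | where timestamp > ago(%sh) | where name in ("session.created", "session.closed", "barge_in.detected", "llm.tokens", "error") | summarize count() by name\n' "$1"
}

# Usage (requires: az extension add --name application-insights):
#   az monitor app-insights query --app appi-audvoice -g rg-audvoice \
#     --analytics-query "$(events_query 1)"
```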