Deployment¶
The repo ships two Bicep templates:
| File | Scope | What it provisions |
|---|---|---|
| infra/minimal.bicep | Resource group | AI Services + gpt-4.1 GlobalStandard. For local dev / testing. |
| infra/main.bicep | Subscription | RG + AI Services + Search + Redis + Key Vault + App Insights + App Service Linux container. Full v1 production stack. |
Quick — minimal (local dev)¶
az login
az group create -n rg-audvoice -l uaenorth
az deployment group create -g rg-audvoice -f infra/minimal.bicep
Then grant your user the Entra roles (one-time, per-subscription):
AI_ID=$(az cognitiveservices account list -g rg-audvoice --query "[0].id" -o tsv)
OBJ=$(az ad signed-in-user show --query id -o tsv)
for role in "Cognitive Services User" "Cognitive Services OpenAI User" "Cognitive Services Speech User"; do
  az role assignment create --assignee-object-id "$OBJ" --assignee-principal-type User \
    --role "$role" --scope "$AI_ID"
done
uvicorn audvoice.main:app now runs locally against real Azure resources, authenticating with your az login credentials.
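Role assignments can take a few minutes to propagate, so it may help to confirm that all three roles are in place before starting the app. The sketch below is one way to do that; `missing_roles` is a hypothetical helper, not something the repo ships. It reads assigned role names on stdin and prints the ones still missing.

```shell
# missing_roles: reads assigned role names on stdin (one per line) and prints
# every role from the walkthrough above that is absent from that list.
# Hypothetical helper. The body runs in a subshell so no variables leak.
missing_roles() (
  assigned=$(cat)
  for role in "Cognitive Services User" \
              "Cognitive Services OpenAI User" \
              "Cognitive Services Speech User"; do
    # -x matches whole lines only, -F treats the role name as a literal string.
    printf '%s\n' "$assigned" | grep -qxF -- "$role" || printf '%s\n' "$role"
  done
)

# Usage (needs az login; AI_ID and OBJ as set in the steps above):
#   az role assignment list --assignee "$OBJ" --scope "$AI_ID" \
#     --query "[].roleDefinitionName" -o tsv | missing_roles
```

Empty output means all three roles are assigned on the AI Services resource.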
Production — full stack¶
1. Build and push the image¶
ACR=audvoice$(openssl rand -hex 3)
az acr create -g rg-audvoice -n $ACR --sku Basic
az acr login -n $ACR
cd apps/orchestrator
docker build -t $ACR.azurecr.io/audvoice:0.1.0 .
docker push $ACR.azurecr.io/audvoice:0.1.0
2. Deploy¶
export AUDVOICE_JWT_SECRET=$(openssl rand -base64 48)
export AUDVOICE_API_KEYS="prodkey1:tenantA,prodkey2:tenantB"
az deployment sub create \
  --location uaenorth \
  --template-file infra/main.bicep \
  --parameters infra/main.bicepparam \
  --parameters containerImage="$ACR.azurecr.io/audvoice:0.1.0"
The template:
- Uses System-Assigned Managed Identity on the App Service.
- Auto-grants the identity Speech + OpenAI roles on the AI Services resource (or grant manually if your subscription policy blocks role assignments from templates).
- Reads AUDVOICE_JWT_SECRET and AUDVOICE_API_KEYS from Key Vault references; no plaintext secrets in App Service.
- Enables WebSockets, ARR affinity, HTTP/2, Premium v3 P1v3 (required for sustained WS).
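A malformed AUDVOICE_API_KEYS value only shows up after the deployment has finished, so a quick pre-flight check can save a round trip. The sketch below assumes the format is a comma-separated list of `key:tenant` pairs, which is inferred from the example value above; the allowed character set is an illustrative choice and the app's real parser may be more permissive. `validate_api_keys` is a hypothetical helper.

```shell
# validate_api_keys: succeeds if $1 looks like "key:tenant[,key:tenant...]".
# Assumed format, inferred from the example value. Keys and tenants are
# limited here to letters, digits, '_' and '-' (illustrative, not from the repo).
# Entries are reported by position so that keys never end up in logs.
validate_api_keys() (
  value=$1
  [ -n "$value" ] || { echo "AUDVOICE_API_KEYS is empty" >&2; exit 1; }
  # Leading, trailing, or doubled commas would produce empty entries.
  case $value in
    ,*|*,|*,,*) echo "empty entry in AUDVOICE_API_KEYS" >&2; exit 1 ;;
  esac
  IFS=,      # split on commas; safe to change because this is a subshell
  set -f     # disable globbing while the value is expanded unquoted
  n=0
  for pair in $value; do
    n=$((n + 1))
    case $pair in
      *[!A-Za-z0-9_:-]*|*:*:*|:*|*:)
        echo "malformed entry #$n" >&2; exit 1 ;;
      *:*) ;;  # exactly one colon with a non-empty key and tenant
      *) echo "entry #$n has no ':tenant' part" >&2; exit 1 ;;
    esac
  done
)

# Usage: validate_api_keys "$AUDVOICE_API_KEYS" && az deployment sub create ...
```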
3. Verify¶
APP=$(az webapp list -g rg-audvoice --query "[0].defaultHostName" -o tsv)
curl https://$APP/healthz # → {"status":"ok"}
curl -X POST https://$APP/v1/sessions \
  -H "X-API-Key: prodkey1" -H "Content-Type: application/json" -d '{}'
Open apps/web-demo/index.html in any browser, point Server at https://$APP, and start talking.
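A freshly deployed container can take a while to pull and start, so the first curl may fail even when the deployment is fine. Here is a minimal sketch of a retrying health check. `is_healthy` and `wait_for_health` are hypothetical helpers, and the attempt count and delay are illustrative defaults rather than values from the repo.

```shell
# is_healthy: succeeds if the response body in $1 reports status "ok".
# Whitespace around the colon is tolerated; anything else counts as unhealthy.
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# wait_for_health: polls https://$1/healthz until it is healthy or the
# attempts run out. $2 = attempts (default 10), $3 = delay in seconds (default 6).
wait_for_health() (
  host=$1; attempts=${2:-10}; delay=${3:-6}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f fails on HTTP errors, -s silences progress, --max-time bounds each try.
    if body=$(curl -fs --max-time 5 "https://$host/healthz" 2>/dev/null) &&
       is_healthy "$body"; then
      echo "healthy after $i attempt(s)"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unhealthy after $attempts attempts" >&2
  exit 1
)

# Usage: wait_for_health "$APP"
```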
CI/CD¶
.github/workflows/ci.yml runs unit tests on every push. For container build + deploy, add a workflow that:
- Logs in to Azure with OIDC (azure/login@v2).
- Builds and pushes the image to ACR.
- Runs az webapp config container set to roll the new image.
We don't ship that workflow because every team has a different Azure auth model; wire it to your own OIDC federation setup.
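The build and roll steps could be sketched as a script that a workflow step calls after the OIDC login. Everything named here is an assumption for illustration: the `image_ref` and `deploy_image` helpers, the variables ACR, RESOURCE_GROUP, WEBAPP_NAME and GIT_SHA, and the choice to tag images with a shortened commit SHA. Older Azure CLI releases call the image flag `--docker-custom-image-name` instead of `--container-image-name`.

```shell
# image_ref: builds "<registry>.azurecr.io/audvoice:<tag>" from an ACR name ($1)
# and a git commit SHA ($2), using the first 12 characters of the SHA as the tag.
image_ref() {
  printf '%s.azurecr.io/audvoice:%s\n' "$1" "$(printf '%s' "$2" | cut -c1-12)"
}

# deploy_image: build, push, and roll, stopping at the first failing step.
# Expects a prior Azure login and the hypothetical variables ACR,
# RESOURCE_GROUP, WEBAPP_NAME, and GIT_SHA. Run from the repo root.
deploy_image() (
  image=$(image_ref "$ACR" "$GIT_SHA") &&
  az acr login -n "$ACR" &&
  docker build -t "$image" apps/orchestrator &&
  docker push "$image" &&
  az webapp config container set -g "$RESOURCE_GROUP" -n "$WEBAPP_NAME" \
    --container-image-name "$image"
)

# Usage in a workflow step, after azure/login@v2:
#   GIT_SHA="$GITHUB_SHA" deploy_image
```

Tagging by commit SHA rather than reusing a fixed tag makes it easier to tell which build an instance is running and to roll back to a known image.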
Region choice¶
| Need | Pick |
|---|---|
| Strict UAE residency for Speech | uaenorth (this is the whole point) |
| Lowest LLM latency | Run AI Services in swedencentral and accept that LLM traffic leaves the region |
| Sovereign / classified | Different stack — Core42, not this repo |
You can also split the two: one AI Services resource for Speech in UAE North and a second one for OpenAI in Sweden Central. Set AZURE_OPENAI_ENDPOINT to the Sweden Central resource and AZURE_SPEECH_RESOURCE_ID to the UAE North one. LLM latency improves and residency for STT/TTS is preserved, but LLM data leaves the UAE.
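For the split setup, the two values can be looked up from the resources themselves. This is a sketch under two assumptions that the text above does not confirm: that AZURE_SPEECH_RESOURCE_ID expects the ARM resource ID of the Speech-side account, and that AZURE_OPENAI_ENDPOINT accepts the account's `properties.endpoint` value. `print_split_env` and the account names in the usage line are hypothetical.

```shell
# print_split_env: prints the two settings for a split Speech/OpenAI setup.
#   $1, $2 = resource group and account name of the UAE North (Speech) resource
#   $3, $4 = resource group and account name of the Sweden Central (OpenAI) resource
# Assumes the app wants an ARM resource ID for Speech and the account
# endpoint URL for OpenAI; check the app's settings before relying on this.
print_split_env() (
  speech_id=$(az cognitiveservices account show -g "$1" -n "$2" \
    --query id -o tsv) || exit 1
  openai_endpoint=$(az cognitiveservices account show -g "$3" -n "$4" \
    --query properties.endpoint -o tsv) || exit 1
  printf 'AZURE_SPEECH_RESOURCE_ID=%s\n' "$speech_id"
  printf 'AZURE_OPENAI_ENDPOINT=%s\n' "$openai_endpoint"
)

# Usage (account names are placeholders):
#   print_split_env rg-audvoice speech-uae rg-audvoice openai-swe
```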
Quotas & scale¶
- Speech: ~100 concurrent recognition + 200 concurrent synthesis per resource by default. See the Speech service quotas and limits documentation.
- Azure OpenAI: tokens/min per deployment. Request increase before launch.
- App Service P1v3: ~250 concurrent WebSocket sessions per instance with our default config. Scale out horizontally; sessions are sticky via ARR affinity.
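As a rough sizing aid, the figures above can be turned into a small calculation. This sketch uses the ~250 sessions per P1v3 instance and ~100 concurrent recognitions quoted in the list; it also assumes each session holds one recognition stream, which is a simplification. Both helpers are hypothetical.

```shell
# instances_needed: smallest instance count that fits $1 concurrent sessions,
# at $2 sessions per instance (default 250, the P1v3 estimate above).
# Uses integer ceiling division: (n + per - 1) / per.
instances_needed() {
  echo $(( ($1 + ${2:-250} - 1) / ${2:-250} ))
}

# within_speech_quota: succeeds if $1 concurrent sessions fit under the
# recognition limit $2 (default 100), assuming one recognition per session.
within_speech_quota() {
  [ "$1" -le "${2:-100}" ]
}

# Usage:
#   instances_needed 1000
#   within_speech_quota 1000 || echo "request a Speech quota increase"
```

At 1000 concurrent sessions this gives 4 instances, but the default Speech recognition quota is exceeded long before App Service capacity is, so the quota request is usually the first thing to sort out.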
Observability¶
App Insights is wired in via APPLICATIONINSIGHTS_CONNECTION_STRING. Custom events you'll see:
- session.created, session.closed, session.duration_ms
- barge_in.detected: count + spoken_chars histogram
- llm.tokens: prompt + completion per turn
- error: by code
Log Analytics queries are planned for infra/queries/.
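In the meantime, an ad hoc query from the CLI can show whether events are arriving. The sketch below assumes the events land in the App Insights `customEvents` table under the names listed above, which is the usual place for custom events but is not confirmed here. `events_query` is a hypothetical helper, and APP_INSIGHTS_NAME stands in for your component name.

```shell
# events_query: prints a KQL query that counts one custom event ($1) per hour
# over the last $2 hours (default 24). Assumes events are in customEvents.
events_query() {
  printf 'customEvents | where timestamp > ago(%sh) | where name == "%s" | summarize count() by bin(timestamp, 1h)\n' \
    "${2:-24}" "$1"
}

# Usage (needs the application-insights CLI extension; APP_INSIGHTS_NAME is a
# placeholder for the component name):
#   az monitor app-insights query --app "$APP_INSIGHTS_NAME" -g rg-audvoice \
#     --analytics-query "$(events_query barge_in.detected 6)"
```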