From Code to Production: Our CI/CD Pipeline with Azure and GitHub Actions
When I co-founded HyrecruitAI, deployments were me SSHing into a VM at 2 AM. That worked for the first month. Then we hired two more engineers, broke production twice in a week, and I knew we needed a real pipeline.
Here is how we built a CI/CD system with GitHub Actions and Azure that lets us ship confidently multiple times a day.
The Pipeline Architecture
Our deployment flows through three stages: build and test, staging, and production. Every pull request triggers the first two. Production only fires on merges to main.
# .github/workflows/deploy.yml
name: Deploy Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install --frozen-lockfile
      - run: bun run lint
      - run: bun run test
      - run: bun run build

  deploy-staging:
    needs: build-and-test
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: azure/webapps-deploy@v3
        with:
          app-name: hyrecruit-staging
          slot-name: preview
The key insight was using Azure deployment slots. Our staging slot runs on the same App Service plan as production, so we catch environment-specific issues before they reach users.
Environment Management
We maintain three environments with distinct configurations:
- Development -- local machines, local PostgreSQL, mock external services
- Staging -- Azure App Service slot, staging database (seeded with anonymized production data), real third-party integrations in sandbox mode
- Production -- Azure App Service, production PostgreSQL on Azure Database for PostgreSQL
Each environment has its own .env template tracked in the repo. The actual values live in GitHub Environments, which scope secrets per deployment target.
Secrets Handling
This is where most teams get it wrong. We follow a strict hierarchy:
- GitHub Environment Secrets for deployment credentials (AZURE_WEBAPP_PUBLISH_PROFILE, DATABASE_URL)
- Azure Key Vault for application secrets that the running app needs (OPENAI_API_KEY, STRIPE_SECRET_KEY)
- Never in code. We run gitleaks as a pre-commit hook and as a CI step
- name: Pull secrets from Key Vault
  id: keyvault
  uses: Azure/get-keyvault-secrets@v1
  with:
    keyvault: hyrecruit-vault
    secrets: 'OPENAI-API-KEY, DATABASE-URL, REDIS-URL'
The app reads secrets at startup and fails fast if any required value is missing. No silent fallbacks to defaults.
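The startup check itself is only a few lines. Here is a minimal TypeScript sketch of the fail-fast pattern -- the module name and the exact variable list are illustrative, not our production code:

// config.ts -- minimal sketch of fail-fast secret validation at startup
// (the variable list here is illustrative, not our exact set)
const REQUIRED = ["OPENAI_API_KEY", "DATABASE_URL", "REDIS_URL"] as const;

export const config = Object.fromEntries(
  REQUIRED.map((key) => {
    const value = process.env[key];
    if (!value) {
      // Crash immediately rather than run with a missing secret
      throw new Error(`Missing required environment variable: ${key}`);
    }
    return [key, value];
  })
) as Record<(typeof REQUIRED)[number], string>;

Because the module throws at import time, a misconfigured deployment dies before it ever serves a request, and the failing health probe keeps it out of rotation.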
Rollback Strategy
Rolling back is not optional -- it is a core feature of the pipeline. We use Azure deployment slots for zero-downtime rollbacks:
- Every production deploy first goes to a pre-production slot
- We run a health check suite against the slot (database connectivity, external API reachability, critical endpoint smoke tests)
- If health checks pass, we swap the slot into production
- If something breaks post-swap, we swap back in under 30 seconds
az webapp deployment slot swap \
  --resource-group hyrecruit-rg \
  --name hyrecruit-prod \
  --slot pre-production \
  --target-slot production
We also tag every production deployment in git and keep the last 10 build artifacts in GitHub Actions. If a slot swap is not sufficient, we can redeploy any previous build within minutes.
Database Migrations
Migrations deserve special attention. We run Drizzle ORM migrations as a separate CI step before the application deploy. The rule is simple: migrations must be backward-compatible. If a migration would break the currently running version, we split it into two releases -- one that adds the new schema, and one that removes the old.
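To make the two-release split concrete, here is a hypothetical example of a column rename in a Drizzle pgTable definition -- the table and column names are invented for illustration:

// schema.ts -- hypothetical example of a backward-compatible column rename
import { pgTable, serial, text } from "drizzle-orm/pg-core";

// Release 1: add the new column alongside the old one. The currently
// running version keeps reading full_name; the new version writes both
// and backfills, so either version works against this schema.
export const candidates = pgTable("candidates", {
  id: serial("id").primaryKey(),
  fullName: text("full_name"), // legacy column, still used by the old version
  name: text("name"),          // new column introduced in release 1
});

// Release 2 (a later deploy, after the backfill completes): drop the
// legacy column and tighten the constraint.
// export const candidates = pgTable("candidates", {
//   id: serial("id").primaryKey(),
//   name: text("name").notNull(),
// });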
Post-Deploy Monitoring
Every production deploy triggers a 10-minute monitoring window. We watch three signals:
- Error rate via Sentry -- alerts if it increases by more than 2x baseline
- p95 latency via Azure Application Insights -- alerts if it exceeds 2 seconds
- Interview completion rate -- our key business metric. If it drops by more than 10%, something user-facing is broken
If any signal crosses its threshold during the monitoring window, we auto-rollback via slot swap. The rollback is triggered by an Azure Monitor alert rule that calls a webhook, which in turn runs the slot swap command. Total time from detection to rollback: under 60 seconds.
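The webhook receiver itself can be tiny. Here is a sketch of the idea in TypeScript on Bun -- the port, path, and token header are hypothetical; the swap command is the same one shown above:

// rollback-webhook.ts -- sketch of a webhook that triggers the slot swap
// (port, path, and token header are hypothetical)
import { $ } from "bun";

Bun.serve({
  port: 8080,
  async fetch(req) {
    if (req.method !== "POST" || new URL(req.url).pathname !== "/rollback") {
      return new Response("Not found", { status: 404 });
    }
    // Shared secret so only the Azure Monitor alert rule can trigger this
    if (req.headers.get("x-rollback-token") !== process.env.ROLLBACK_TOKEN) {
      return new Response("Forbidden", { status: 403 });
    }
    // Slot swaps are symmetric, so rerunning the swap command rolls back
    await $`az webapp deployment slot swap --resource-group hyrecruit-rg --name hyrecruit-prod --slot pre-production --target-slot production`;
    return new Response("Rollback triggered", { status: 200 });
  },
});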
Our health check endpoint (/api/health) validates more than just "the server is running." It checks:
- Database connectivity (a lightweight SELECT 1 query)
- Redis connectivity
- External API reachability (Azure Speech Services, OpenAI)
- Disk space and memory (basic thresholds)
If any check fails, the endpoint returns a 503. The deployment slot health probe calls this endpoint, and a failing probe blocks the slot swap from proceeding.
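Condensed, the handler looks something like this -- the db and redis import paths are hypothetical, and the external API and disk/memory checks are elided for brevity:

// api/health.ts -- condensed sketch of the health handler
// (db and redis imports are hypothetical module paths)
import { sql } from "drizzle-orm";
import { db } from "../db";       // Drizzle instance (hypothetical path)
import { redis } from "../redis"; // Redis client (hypothetical path)

export async function healthHandler(): Promise<Response> {
  const checks: Record<string, boolean> = {};
  try {
    await db.execute(sql`SELECT 1`); // lightweight connectivity probe
    checks.database = true;
  } catch {
    checks.database = false;
  }
  try {
    await redis.ping();
    checks.redis = true;
  } catch {
    checks.redis = false;
  }
  const healthy = Object.values(checks).every(Boolean);
  // 503 on any failure; the slot health probe treats this as failing
  return Response.json(checks, { status: healthy ? 200 : 503 });
}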
Alert routing: critical alerts (error rate spike, health check failure) go to PagerDuty. Warning alerts (latency increase, elevated queue depth) go to a Slack channel. Informational notifications (deploy started, deploy completed, slot swapped) go to the deploy channel.
Canary Deployments for High-Risk Changes
For changes that touch the database schema, AI model configuration, or core interview flow, we use a weighted traffic split instead of an all-or-nothing slot swap:
- Route 10% of traffic to the new version for 1 hour
- If signals are healthy, increase to 50% for 30 minutes
- If still healthy, route 100% (full swap)
Azure App Service supports traffic routing between slots natively:
az webapp traffic-routing set \
  --resource-group hyrecruit-rg \
  --name hyrecruit-prod \
  --distribution pre-production=10
The canary saved us once. A Drizzle ORM minor version update changed query generation for a specific left join pattern, and the 10% canary showed a 3x increase in query duration on dashboard page loads. We caught it within 20 minutes and reverted the canary, so 90% of users were never exposed to the regression.
What We Learned
After eight months of running this pipeline, three things stand out:
- Deployment slots are underrated. The ability to swap and swap back in seconds has saved us multiple times.
- Environment parity matters. Our staging database uses the same PostgreSQL version and extensions as production. No surprises.
- Fast feedback loops change behavior. When deploys take 3 minutes instead of 30, engineers deploy smaller changes more frequently, which means fewer incidents.
The total setup took us about a week. The time saved since then is immeasurable. If you are building on Azure, deployment slots plus GitHub Actions is a combination that punches well above its weight.