SEQN Auth Monitoring and Non-Email Alerts

SEQN Auth should be monitored like a customer-facing SaaS control plane. Because the MVP excludes email delivery, alerting must use non-email channels such as Slack, Discord, ntfy, PagerDuty, Opsgenie, Uptime Kuma, Prometheus Alertmanager or Grafana Alerting, or a private webhook receiver.

Monitoring principles

  • Monitor from outside the VPS for public HTTPS, DNS, and Caddy coverage.
  • Monitor from inside the VPS for container, database, backup, and disk coverage.
  • Keep secret-bearing checks inside trusted runners only.
  • Alert on symptoms first, then use logs and admin audit events for root-cause analysis.
  • Prefer low-noise alerts with clear severity and runbook links.

Synthetic checks

| Check | Location | Cadence | Expected result | Alert when |
| --- | --- | --- | --- | --- |
| Public health | External | 1 minute | GET https://accounts.seqn.in/healthz returns 200 | 2 consecutive failures |
| Public config | External | 5 minutes | GET /v1/config returns safe public JSON | 2 consecutive failures |
| Client config | Trusted synthetic runner | 5 minutes | GET /v1/client/config with a canary pk_live_ returns an active app | 2 consecutive failures |
| Backend key health | Trusted synthetic runner | 5 minutes | GET /v1/backend/application with a canary sk_live_ returns ok: true | 2 consecutive failures |
| Hosted console boundary | External | 5 minutes | Anonymous GET /auth/console redirects to login | Response is 200 with console data or 5xx |
| Admin boundary | External | 5 minutes | Anonymous GET /v1/audit-logs returns 401 | Response is 200 or 5xx |
| OIDC discovery | External | 5 minutes | Authentik issuer discovery returns 200 | 2 consecutive failures |
| TLS certificate | External | Daily | Cert expires in more than 14 days | Expiry under 14 days |
| Docker health | VPS | 1 minute | Authentik, Redis, Postgres, and Silver Auth containers are healthy | Any core service unhealthy for 3 minutes |
| Database | VPS | 1 minute | Postgres healthchecks pass and disk has free space | Healthcheck failure or disk under 20 percent free |
| Backups | VPS | Hourly | Latest local backup is younger than the schedule SLA | Backup age exceeds SLA |
| Offsite backups | VPS or offsite monitor | Hourly | Latest encrypted offsite archive is younger than the schedule SLA | Offsite age exceeds SLA |
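
Several rows above fire only after 2 consecutive failures. A minimal sketch of that debounce for a cron-driven runner, assuming a state file carries the failure streak between runs (the function name, state-file path, and webhook hook are illustrative, not part of SEQN Auth):

```shell
# Alert only after N consecutive failures of a probe command.
# probe_once <state-file> <threshold> <probe-cmd...>
probe_once() {
  local state="$1" threshold="$2"; shift 2
  if "$@" >/dev/null 2>&1; then
    rm -f "$state"                 # any success resets the streak
    return 0
  fi
  local fails
  fails=$(( $(cat "$state" 2>/dev/null || echo 0) + 1 ))
  echo "$fails" > "$state"
  if [ "$fails" -ge "$threshold" ]; then
    echo "ALERT after $fails consecutive failures"   # replace with a webhook POST
  fi
  return 1
}

# Example wiring for the public health row of the table:
# probe_once /var/tmp/healthz.fail 2 curl -fsS --max-time 10 https://accounts.seqn.in/healthz
```

Because each run is independent, this works from any external cron or CI runner without a long-lived monitoring daemon.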

Example probes

Public unauthenticated checks:

```shell
curl -fsS https://accounts.seqn.in/healthz
curl -fsS https://accounts.seqn.in/v1/config
curl -fsSI https://accounts.seqn.in/auth/console
```
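
The daily TLS row can reuse the same external host. A sketch of the 14-day threshold check, assuming GNU date and the openssl CLI on the monitor (the helper name is illustrative):

```shell
# Days until a certificate's notAfter date; alert when under the threshold.
days_until_expiry() {
  # $1 = notAfter date string, e.g. "Jun  1 12:00:00 2030 GMT"
  local expiry_epoch now_epoch
  expiry_epoch=$(date -d "$1" +%s) || return 1
  now_epoch=$(date +%s)
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Pull the live certificate's notAfter field from an external monitor:
# not_after=$(echo | openssl s_client -connect accounts.seqn.in:443 \
#   -servername accounts.seqn.in 2>/dev/null \
#   | openssl x509 -noout -enddate | cut -d= -f2)
# [ "$(days_until_expiry "$not_after")" -gt 14 ] || echo "ALERT: cert expires soon"
```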

Trusted key checks:

```shell
curl -fsS \
  -H "X-SEQN-Publishable-Key: $SEQN_AUTH_PUBLISHABLE_KEY" \
  "$SEQN_AUTH_BASE_URL/v1/client/config"

curl -fsS \
  -H "Authorization: Bearer $SEQN_AUTH_SECRET_KEY" \
  "$SEQN_AUTH_BASE_URL/v1/backend/application"
```

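The backend key check should assert the ok: true shape from the table, not just an HTTP 200. A minimal response assertion that avoids a jq dependency on the runner (the helper name is illustrative):

```shell
# Assert that a /v1/backend/application response body reports ok: true.
backend_ok() {
  # $1 = response body JSON
  printf '%s' "$1" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Usage from a trusted runner (the secret key never leaves the runner's env):
# body=$(curl -fsS -H "Authorization: Bearer $SEQN_AUTH_SECRET_KEY" \
#   "$SEQN_AUTH_BASE_URL/v1/backend/application")
# backend_ok "$body" || echo "ALERT: backend key health failed"
```
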
Container checks:

```shell
cd /srv/silver-auth
docker compose -f deploy/vps-stack.compose.yml --env-file .env ps
```
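
The "any core service unhealthy" rule can be sketched as a filter over that ps output, assuming Compose's plain-text status column and the container name fragments below (both are assumptions for this sketch):

```shell
# Print core containers whose status line does not report (healthy).
core_not_healthy() {
  # stdin: `docker compose ps` output lines, e.g. "authentik Up 2 hours (healthy)"
  grep -E 'authentik|redis|postgres|silver' | grep -v '(healthy)' || true
}

# docker compose -f deploy/vps-stack.compose.yml --env-file .env ps | core_not_healthy
```

Any non-empty output is a candidate alert; the 3-minute hold belongs in the alerting layer, or can reuse the consecutive-failure counter shown earlier.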

Backup freshness check:

```shell
find /srv/silver-auth/backups -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' | sort -n | tail -1
```
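
That command only lists the newest backup; the SLA comparison from the table can be added on top. A sketch assuming GNU find and one directory per backup (the helper name and SLA value are assumptions):

```shell
# Age in seconds of the newest backup directory under the given root.
backup_age_seconds() {
  local newest
  newest=$(find "$1" -mindepth 1 -maxdepth 1 -type d -printf '%T@\n' | sort -n | tail -1)
  [ -n "$newest" ] || return 1     # no backups at all is itself an alert
  echo $(( $(date +%s) - ${newest%.*} ))
}

# SLA_SECONDS=7200   # assumption: hourly schedule plus slack
# age=$(backup_age_seconds /srv/silver-auth/backups) \
#   && [ "$age" -le "$SLA_SECONDS" ] || echo "ALERT: local backup stale"
```

The same function pointed at the offsite mirror (or run by an offsite monitor) covers the offsite-backup row.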

Alert routing

| Severity | Examples | Target | Response target |
| --- | --- | --- | --- |
| P0 | Public health down, auth callback broken, DB unavailable, restore needed | PagerDuty/Opsgenie/phone push plus Slack incident channel | 15 minutes |
| P1 | Backend key health fails, OIDC discovery down, offsite backup stale, sustained 5xx | Slack incident channel plus on-call push | 1 hour |
| P2 | Single synthetic failure, high 429, webhook retry backlog, disk under 30 percent | Slack ops channel or ntfy | Next business day |
| P3 | Documentation drift, pricing-policy cleanup, non-urgent hardening task | Project tracker | Planned work |
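
The routing column can be encoded once so every alert script agrees on targets. A sketch mapping severity to channel names (the channel identifiers are placeholders for real webhook URLs):

```shell
# Map a severity to its delivery channels per the routing table.
route_alert() {
  case "$1" in
    P0) echo "pager slack-incident" ;;
    P1) echo "slack-incident oncall-push" ;;
    P2) echo "slack-ops" ;;
    P3) echo "tracker" ;;
    *)  return 1 ;;
  esac
}
```

Keeping this mapping in one shared function (or config file) avoids the common drift where P1 checks quietly post to a muted channel.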

Non-email alert payload

Use a compact JSON payload for webhook-based alerting:

```json
{
  "service": "seqn-auth",
  "severity": "P1",
  "check": "backend-key-health",
  "status": "firing",
  "summary": "Canary backend key health failed twice",
  "runbook": "docs/seqn-auth/support-admin-runbook.md",
  "startedAt": "2026-05-12T00:00:00Z"
}
```

Never include sk_live_, whsec_, OIDC client secrets, Authentik API tokens, database passwords, session cookies, or backup encryption keys in alert payloads.
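
A sketch of building that payload and screening it before sending (the helper names and webhook variable are assumptions; the prefix list covers only secrets with known prefixes, so the other secret types above still rely on discipline in the summary text):

```shell
# Build the alert JSON; summaries must be plain text with no quotes (no escaping here).
build_alert() {
  # $1 severity, $2 check, $3 summary
  printf '{"service":"seqn-auth","severity":"%s","check":"%s","status":"firing","summary":"%s","runbook":"docs/seqn-auth/support-admin-runbook.md","startedAt":"%s"}' \
    "$1" "$2" "$3" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Refuse to send a payload that carries a secret-looking prefix.
payload_is_safe() {
  ! printf '%s' "$1" | grep -Eq 'sk_live_|whsec_'
}

# p=$(build_alert P1 backend-key-health "Canary backend key health failed twice")
# payload_is_safe "$p" && curl -fsS -X POST -H 'Content-Type: application/json' \
#   -d "$p" "$ALERT_WEBHOOK_URL"
```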

Alert review cadence

  • Daily: check active alerts, recent 5xx, recent 429, and backup freshness.
  • Weekly: review noisy alerts and adjust thresholds.
  • Monthly: test one P1 alert route and one backup freshness alert.
  • Quarterly: run a tabletop restore and incident response drill.