SEQN Auth Monitoring and Non-Email Alerts

SEQN Auth should be monitored like a customer-facing SaaS control plane. Because the MVP excludes email delivery, alerting must use non-email channels such as Slack, Discord, ntfy, PagerDuty, Opsgenie, Uptime Kuma, Prometheus Alertmanager or Grafana Alerting, or a private webhook receiver.

Monitoring principles

  • Monitor from outside the VPS for public HTTPS, DNS, and Caddy coverage.
  • Monitor from inside the VPS for container, database, backup, and disk coverage.
  • Keep secret-bearing checks inside trusted runners only.
  • Alert on symptoms first, then use logs and admin audit events for root-cause analysis.
  • Prefer low-noise alerts with clear severity and runbook links.

Synthetic checks

| Check | Location | Cadence | Expected result | Alert when |
| --- | --- | --- | --- | --- |
| Public health | External | 1 minute | GET https://accounts.seqn.in/healthz returns 200 | 2 consecutive failures |
| Public config | External | 5 minutes | GET /v1/config returns safe public JSON | 2 consecutive failures |
| Client config | Trusted synthetic runner | 5 minutes | GET /v1/client/config with a canary pk_live_ returns an active app | 2 consecutive failures |
| Backend key health | Trusted synthetic runner | 5 minutes | GET /v1/backend/application with a canary sk_live_ returns ok: true | 2 consecutive failures |
| Hosted console boundary | External | 5 minutes | Anonymous GET /auth/console redirects to login | Response is 200 with console data or 5xx |
| Admin boundary | External | 5 minutes | Anonymous GET /v1/audit-logs returns 401 | Response is 200 or 5xx |
| OIDC discovery | External | 5 minutes | Authentik issuer discovery returns 200 | 2 consecutive failures |
| TLS certificate | External | Daily | Cert expires in more than 14 days | Expiry under 14 days |
| Docker health | VPS | 1 minute | Authentik, Redis, Postgres, and Silver Auth containers are healthy | Any core service unhealthy for 3 minutes |
| Database | VPS | 1 minute | Postgres healthchecks pass and disk has free space | Healthcheck failure or disk under 20 percent free |
| Backups | VPS | Hourly | Latest local backup is younger than the schedule SLA | Backup age exceeds SLA |
| Offsite backups | VPS or offsite monitor | Hourly | Latest encrypted offsite archive is younger than the schedule SLA | Offsite age exceeds SLA |
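
Several rows above fire only after 2 consecutive failures. A minimal sketch of that debounce for a cron-driven runner, assuming a state file carries the failure streak between runs (the function name, state-file path, and webhook hook are illustrative, not part of SEQN Auth):

```shell
# Alert only after N consecutive failures of a probe command.
# probe_once <state-file> <threshold> <probe-cmd...>
probe_once() {
  local state="$1" threshold="$2"; shift 2
  if "$@" >/dev/null 2>&1; then
    rm -f "$state"                 # any success resets the streak
    return 0
  fi
  local fails
  fails=$(( $(cat "$state" 2>/dev/null || echo 0) + 1 ))
  echo "$fails" > "$state"
  if [ "$fails" -ge "$threshold" ]; then
    echo "ALERT after $fails consecutive failures"   # replace with a webhook POST
  fi
  return 1
}

# Example wiring for the public health row of the table:
# probe_once /var/tmp/healthz.fail 2 curl -fsS --max-time 10 https://accounts.seqn.in/healthz
```

Because each run is independent, this works from any external cron or CI runner without a long-lived monitoring daemon.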

Example probes

Public unauthenticated checks:

```shell
curl -fsS https://accounts.seqn.in/healthz
curl -fsS https://accounts.seqn.in/v1/config
curl -fsSI https://accounts.seqn.in/auth/console
```
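
The daily TLS row can reuse the same external host. A sketch of the 14-day threshold check, assuming GNU date and the openssl CLI on the monitor (the helper name is illustrative):

```shell
# Days until a certificate's notAfter date; alert when under the threshold.
days_until_expiry() {
  # $1 = notAfter date string, e.g. "Jun  1 12:00:00 2030 GMT"
  local expiry_epoch now_epoch
  expiry_epoch=$(date -d "$1" +%s) || return 1
  now_epoch=$(date +%s)
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Pull the live certificate's notAfter field from an external monitor:
# not_after=$(echo | openssl s_client -connect accounts.seqn.in:443 \
#   -servername accounts.seqn.in 2>/dev/null \
#   | openssl x509 -noout -enddate | cut -d= -f2)
# [ "$(days_until_expiry "$not_after")" -gt 14 ] || echo "ALERT: cert expires soon"
```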

Trusted key checks:

```shell
curl -fsS \
  -H "X-SEQN-Publishable-Key: $SEQN_AUTH_PUBLISHABLE_KEY" \
  "$SEQN_AUTH_BASE_URL/v1/client/config"

curl -fsS \
  -H "Authorization: Bearer $SEQN_AUTH_SECRET_KEY" \
  "$SEQN_AUTH_BASE_URL/v1/backend/application"
```

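The backend key check should assert the ok: true shape from the table, not just an HTTP 200. A minimal response assertion that avoids a jq dependency on the runner (the helper name is illustrative):

```shell
# Assert that a /v1/backend/application response body reports ok: true.
backend_ok() {
  # $1 = response body JSON
  printf '%s' "$1" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Usage from a trusted runner (the secret key never leaves the runner's env):
# body=$(curl -fsS -H "Authorization: Bearer $SEQN_AUTH_SECRET_KEY" \
#   "$SEQN_AUTH_BASE_URL/v1/backend/application")
# backend_ok "$body" || echo "ALERT: backend key health failed"
```
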
Container checks:

```shell
cd /srv/silver-auth
docker compose -f deploy/vps-stack.compose.yml --env-file .env ps
```
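
The "any core service unhealthy" rule can be sketched as a filter over that ps output, assuming Compose's plain-text status column and the container name fragments below (both are assumptions for this sketch):

```shell
# Print core containers whose status line does not report (healthy).
core_not_healthy() {
  # stdin: `docker compose ps` output lines, e.g. "authentik Up 2 hours (healthy)"
  grep -E 'authentik|redis|postgres|silver' | grep -v '(healthy)' || true
}

# docker compose -f deploy/vps-stack.compose.yml --env-file .env ps | core_not_healthy
```

Any non-empty output is a candidate alert; the 3-minute hold belongs in the alerting layer, or can reuse the consecutive-failure counter shown earlier.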

Backup freshness check:

```shell
find /srv/silver-auth/backups -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' | sort -n | tail -1
```
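
That command only lists the newest backup; the SLA comparison from the table can be added on top. A sketch assuming GNU find and one directory per backup (the helper name and SLA value are assumptions):

```shell
# Age in seconds of the newest backup directory under the given root.
backup_age_seconds() {
  local newest
  newest=$(find "$1" -mindepth 1 -maxdepth 1 -type d -printf '%T@\n' | sort -n | tail -1)
  [ -n "$newest" ] || return 1     # no backups at all is itself an alert
  echo $(( $(date +%s) - ${newest%.*} ))
}

# SLA_SECONDS=7200   # assumption: hourly schedule plus slack
# age=$(backup_age_seconds /srv/silver-auth/backups) \
#   && [ "$age" -le "$SLA_SECONDS" ] || echo "ALERT: local backup stale"
```

The same function pointed at the offsite mirror (or run by an offsite monitor) covers the offsite-backup row.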

Alert routing

| Severity | Examples | Target | Response target |
| --- | --- | --- | --- |
| P0 | Public health down, auth callback broken, DB unavailable, restore needed | PagerDuty/Opsgenie/phone push plus Slack incident channel | 15 minutes |
| P1 | Backend key health fails, OIDC discovery down, offsite backup stale, sustained 5xx | Slack incident channel plus on-call push | 1 hour |
| P2 | Single synthetic failure, high 429, webhook retry backlog, disk under 30 percent | Slack ops channel or ntfy | Next business day |
| P3 | Documentation drift, pricing-policy cleanup, non-urgent hardening task | Project tracker | Planned work |
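
The routing column can be encoded once so every alert script agrees on targets. A sketch mapping severity to channel names (the channel identifiers are placeholders for real webhook URLs):

```shell
# Map a severity to its delivery channels per the routing table.
route_alert() {
  case "$1" in
    P0) echo "pager slack-incident" ;;
    P1) echo "slack-incident oncall-push" ;;
    P2) echo "slack-ops" ;;
    P3) echo "tracker" ;;
    *)  return 1 ;;
  esac
}
```

Keeping this mapping in one shared function (or config file) avoids the common drift where P1 checks quietly post to a muted channel.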

Non-email alert payload

Use a compact JSON payload for webhook-based alerting:

```json
{
  "service": "seqn-auth",
  "severity": "P1",
  "check": "backend-key-health",
  "status": "firing",
  "summary": "Canary backend key health failed twice",
  "runbook": "docs/seqn-auth/support-admin-runbook.md",
  "startedAt": "2026-05-12T00:00:00Z"
}
```

Never include sk_live_, whsec_, OIDC client secrets, Authentik API tokens, database passwords, session cookies, or backup encryption keys in alert payloads.
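
A sketch of building that payload and screening it before sending (the helper names and webhook variable are assumptions; the prefix list covers only secrets with known prefixes, so the other secret types above still rely on discipline in the summary text):

```shell
# Build the alert JSON; summaries must be plain text with no quotes (no escaping here).
build_alert() {
  # $1 severity, $2 check, $3 summary
  printf '{"service":"seqn-auth","severity":"%s","check":"%s","status":"firing","summary":"%s","runbook":"docs/seqn-auth/support-admin-runbook.md","startedAt":"%s"}' \
    "$1" "$2" "$3" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Refuse to send a payload that carries a secret-looking prefix.
payload_is_safe() {
  ! printf '%s' "$1" | grep -Eq 'sk_live_|whsec_'
}

# p=$(build_alert P1 backend-key-health "Canary backend key health failed twice")
# payload_is_safe "$p" && curl -fsS -X POST -H 'Content-Type: application/json' \
#   -d "$p" "$ALERT_WEBHOOK_URL"
```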

Alert review cadence

  • Daily: check active alerts, recent 5xx, recent 429, and backup freshness.
  • Weekly: review noisy alerts and adjust thresholds.
  • Monthly: test one P1 alert route and one backup freshness alert.
  • Quarterly: run a tabletop restore and incident response drill.