Operations
Daily checks
- Check the anonymous endpoints /api/health and /api/version.
- Check /api/deploy-check/report and /supervisor/health with an identity that matches the allowlist.
- Review the Queue for unusually long-running jobs.
- If GHCR webhooks are enabled, verify there are no streaks of failed deliveries.
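The Queue and webhook checks above can be scripted once job and delivery records are available. The following is a minimal TypeScript sketch under stated assumptions. The QueueJob and WebhookDelivery shapes, the function names, and any threshold you pass in are hypothetical and are not Dockrev's API. How the records are fetched is deliberately left out.

```typescript
// Hypothetical record shapes; Dockrev's real job and delivery models may differ.
interface QueueJob {
  id: string;
  status: "queued" | "running" | "succeeded" | "failed";
  startedAtMs: number; // epoch milliseconds
}

interface WebhookDelivery {
  id: string;
  ok: boolean; // true if the delivery was accepted
}

// Return running jobs that have been running longer than maxRunMs at time nowMs.
function findLongRunningJobs(jobs: QueueJob[], nowMs: number, maxRunMs: number): QueueJob[] {
  return jobs.filter((j) => j.status === "running" && nowMs - j.startedAtMs > maxRunMs);
}

// Longest run of consecutive failures, with deliveries ordered oldest to newest.
function longestFailureStreak(deliveries: WebhookDelivery[]): number {
  let longest = 0;
  let current = 0;
  for (const d of deliveries) {
    current = d.ok ? 0 : current + 1; // a success resets the streak
    if (current > longest) longest = current;
  }
  return longest;
}
```

What counts as "unusually long" is an operator decision. A threshold somewhat above the slowest normal update job is one reasonable starting point.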
Health checks
- Anonymous liveness: GET /api/health, GET /api/version
- Operator preflight: GET /api/deploy-check/report
- Supervisor health: GET /supervisor/health
- Anonymous requests to protected routes: a 401 auth_required response is expected in the transparent Forward Auth model; treat it as an application boundary signal, not a gateway outage.
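A probe script can encode the rule that an anonymous 401 auth_required on a protected route is a boundary signal rather than a failure. This sketch assumes the status code comes from whatever HTTP client runs the probe and that the auth_required code has already been parsed out of the response. The ProbeKind labels, verdict names, and classifyProbe are illustrative, not Dockrev identifiers.

```typescript
// Illustrative labels for the probes listed above; not Dockrev identifiers.
type ProbeKind =
  | "anonymous-liveness" // GET /api/health, GET /api/version
  | "operator-preflight" // GET /api/deploy-check/report with an allowlisted identity
  | "supervisor-health" // GET /supervisor/health with an allowlisted identity
  | "anonymous-protected"; // any protected route, called without credentials

type ProbeVerdict = "healthy" | "auth-boundary" | "degraded" | "unexpected" | "unreachable";

// status: HTTP status, or null if no response arrived (DNS, TCP, TLS, timeout).
// errorCode: application error code, assumed to be parsed from the response body.
function classifyProbe(kind: ProbeKind, status: number | null, errorCode?: string): ProbeVerdict {
  if (status === null) return "unreachable";
  if (kind === "anonymous-protected") {
    // The request reached the application, which enforced its auth boundary.
    if (status === 401 && errorCode === "auth_required") return "auth-boundary";
    // Anything else (a 2xx leak, a gateway 5xx) deserves a closer look.
    return "unexpected";
  }
  return status >= 200 && status < 300 ? "healthy" : "degraded";
}
```

Keeping "auth-boundary" as its own verdict, instead of folding it into "healthy", keeps dashboards honest: the boundary is working, but nothing past it was exercised.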
Backup strategy
Back up at least:
- SQLite DB (DOCKREV_DB_PATH)
- Supervisor state file (DOCKREV_SUPERVISOR_STATE_PATH)
- Deployment compose files and environment config
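The backup set above can be resolved from the two environment variables plus the deployment files. This is a minimal sketch. Only DOCKREV_DB_PATH and DOCKREV_SUPERVISOR_STATE_PATH come from this guide; planBackup, the BackupPlan shape, and the example paths are hypothetical. If Dockrev runs in a container, the variables likely hold container-side paths that you will need to map to host volumes.

```typescript
// Variable names are from this guide; everything else here is illustrative.
const BACKUP_ENV_VARS = ["DOCKREV_DB_PATH", "DOCKREV_SUPERVISOR_STATE_PATH"] as const;

interface BackupPlan {
  files: string[]; // paths to copy
  missing: string[]; // variables that were unset or blank
}

// Resolve state-bearing files from an env map, then append operator-supplied
// compose files and environment config.
function planBackup(env: Record<string, string | undefined>, deploymentFiles: string[]): BackupPlan {
  const files: string[] = [];
  const missing: string[] = [];
  for (const name of BACKUP_ENV_VARS) {
    const value = env[name]?.trim();
    if (value) files.push(value);
    else missing.push(name); // fail loudly rather than silently skipping state
  }
  return { files: [...files, ...deploymentFiles], missing };
}
```

Copying a SQLite file while Dockrev is writing to it can produce an inconsistent copy. Stopping Dockrev first, or using the sqlite3 shell's .backup command, avoids that.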
Backup execution guidance
- Run during off-peak hours
- Avoid launching risky updates during backup windows
- Periodically validate that backups can actually be restored
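Keeping risky updates out of the backup window can be enforced with a simple time gate. This is a sketch assuming a daily window expressed in whole hours in the operator's time zone. DailyWindow, inWindow, and mayStartRiskyUpdate are hypothetical helpers, and the window values are placeholders for your own off-peak period.

```typescript
// A daily window [startHour, endHour) in whole hours; values are placeholders.
interface DailyWindow {
  startHour: number;
  endHour: number;
}

// True when hour falls inside the window; handles windows that wrap past midnight (e.g. 22 to 3).
function inWindow(hour: number, w: DailyWindow): boolean {
  if (w.startHour === w.endHour) return false; // treat as an empty window
  return w.startHour < w.endHour
    ? hour >= w.startHour && hour < w.endHour
    : hour >= w.startHour || hour < w.endHour;
}

// Gate for risky operations such as self-upgrades: refuse while a backup window is open.
function mayStartRiskyUpdate(hour: number, backupWindow: DailyWindow): boolean {
  return !inWindow(hour, backupWindow);
}
```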
Restore sequence
- Stop Dockrev and Supervisor
- Restore DB and state file
- Restart services
- Verify Queue/Overview consistency
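The restore order above can be captured as a generated checklist so that no step is skipped or reordered. This sketch only builds command strings and runs nothing. The compose service names, file paths, restoreSteps, and shQuote are hypothetical; the paths are assumed to be host-side locations behind the two environment variables.

```typescript
// Hypothetical inputs: compose service names and paths depend on your deployment.
interface RestoreInput {
  services: string[]; // compose services for Dockrev and Supervisor
  dbBackup: string;
  dbPath: string; // host-side path behind DOCKREV_DB_PATH
  stateBackup: string;
  statePath: string; // host-side path behind DOCKREV_SUPERVISOR_STATE_PATH
}

// POSIX single-quote escaping so paths with spaces or quotes survive the shell.
function shQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Steps in the order this guide prescribes: stop, restore, restart, verify.
function restoreSteps(r: RestoreInput): string[] {
  const svc = r.services.map(shQuote).join(" ");
  return [
    `docker compose stop ${svc}`,
    `cp ${shQuote(r.dbBackup)} ${shQuote(r.dbPath)}`,
    `cp ${shQuote(r.stateBackup)} ${shQuote(r.statePath)}`,
    `docker compose start ${svc}`,
    "# manual: confirm Queue and Overview show consistent state",
  ];
}
```

The final step stays manual on purpose: Queue/Overview consistency is a judgment about application state, not something a file copy can confirm.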
Upgrade and rollback
- Standard upgrade: change the image tag and run docker compose up -d
- Self-upgrade: use the Supervisor apply, dry-run, and rollback APIs
- Failure rollback: revert to the previous stable image and, if needed, restore the latest valid backup
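For the standard upgrade path, the tag change can be made mechanical while recording the previous tag for rollback. This is a sketch under assumptions. setImageTag is a hypothetical helper, the repository name ghcr.io/example/dockrev is a placeholder, and it only handles plain `image: repo:tag` lines, not @sha256 digests. The Supervisor apply/dry-run/rollback APIs are not modeled because their routes are not documented here.

```typescript
interface TagChange {
  text: string; // updated compose file content
  previousTag: string | null; // old tag, kept for rollback; null if no line matched
}

// Rewrite `image: <repo>:<tag>` lines for one repository (optionally quoted).
// Assumes plain tags and one image reference per line.
function setImageTag(compose: string, repo: string, newTag: string): TagChange {
  const escaped = repo.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  const re = new RegExp(`^(\\s*image:\\s*["']?${escaped}):([\\w.-]+)`, "gm");
  let previousTag: string | null = null;
  const text = compose.replace(re, (_m: string, prefix: string, tag: string) => {
    previousTag = tag;
    return `${prefix}:${newTag}`;
  });
  return { text, previousTag };
}
```

After writing the updated file, run docker compose up -d as usual. Rolling back is the same call with the recorded previousTag.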
Observability suggestions
- Keep job logs for discovery, check, and update jobs
- Monitor for spikes in 401, 409, and 500 responses
- Watch webhook delivery deduplication behavior for the GHCR integration
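Both observability points above reduce to small counting rules over data you already collect. This is a minimal sketch. Per-interval status counts are assumed to come from your metrics or log pipeline, and webhook deliveries are assumed to carry an ID that deduplication keys on. findSpikes, countDuplicateDeliveries, and the default thresholds are hypothetical, not Dockrev defaults.

```typescript
// Flag watched status codes whose current-interval count clears both an absolute
// floor and `factor` times the baseline count. Thresholds are illustrative.
function findSpikes(
  current: Map<number, number>,
  baseline: Map<number, number>,
  watched: number[] = [401, 409, 500],
  factor = 3,
  minCount = 10,
): number[] {
  return watched.filter((code) => {
    const now = current.get(code) ?? 0;
    const base = baseline.get(code) ?? 0;
    return now >= minCount && now > factor * base;
  });
}

// Count deliveries whose ID was already seen, i.e. how often dedup should have fired.
function countDuplicateDeliveries(deliveryIds: string[]): number {
  const seen = new Set<string>();
  let dupes = 0;
  for (const id of deliveryIds) {
    if (seen.has(id)) dupes++;
    else seen.add(id);
  }
  return dupes;
}
```

The absolute floor keeps a jump from 1 to 4 errors from paging anyone. A rising 401 count on protected routes may still be the expected Forward Auth boundary, so read it alongside the health-check classification.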
Shared testbox regression (optional)
The end-to-end scripts in scripts/testbox/*.e2e.ts can validate key flows on the shared test machine:
- Check job concurrency and restart recovery
- Check version inference SSE continuity
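One way to check SSE continuity is to parse the captured stream and look for gaps in event IDs. This is a sketch, not how the existing .e2e.ts scripts necessarily work. It assumes, hypothetically, that the version inference stream numbers its events with consecutive integer IDs. parseSse is minimal and not spec-complete: it ignores retry and does not carry the last event ID forward to later events.

```typescript
interface SseEvent {
  id?: string;
  event?: string;
  data: string;
}

// Minimal parser for a complete text/event-stream body (not incremental).
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  let cur: SseEvent = { data: "" };
  let hasData = false;
  for (const line of body.split(/\r\n|\r|\n/)) {
    if (line === "") {
      if (hasData) events.push(cur); // a blank line dispatches the pending event
      cur = { data: "" };
      hasData = false;
      continue;
    }
    if (line.startsWith(":")) continue; // comment, e.g. keep-alive
    const idx = line.indexOf(":");
    const field = idx === -1 ? line : line.slice(0, idx);
    let value = idx === -1 ? "" : line.slice(idx + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "data") {
      cur.data = hasData ? cur.data + "\n" + value : value;
      hasData = true;
    } else if (field === "id") cur.id = value;
    else if (field === "event") cur.event = value;
  }
  return events;
}

// Report [previous, next] pairs where integer IDs are not consecutive.
function findIdGaps(events: SseEvent[]): Array<[number, number]> {
  const gaps: Array<[number, number]> = [];
  let prev: number | null = null;
  for (const e of events) {
    if (e.id === undefined) continue;
    const n = Number(e.id);
    if (!Number.isInteger(n)) continue; // non-numeric IDs are out of scope here
    if (prev !== null && n !== prev + 1) gaps.push([prev, n]);
    prev = n;
  }
  return gaps;
}
```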