Production readiness
Security, webhooks, reliability, and deployment checklist — technical launch bar.
Production readiness — review & checklist
This document summarizes gaps and recommendations after reviewing the current codebase (APIs, dashboard, Supabase, approvals, webhooks). Use it as a working checklist before calling the system “production-grade” for customers.
Priority legend: P0 = block or materially risky; P1 = expected for serious production; P2 = scale / polish.
1. Security & abuse resistance
| Priority | Item | Notes |
|---|---|---|
| P0 | SSRF / webhook safety | Mitigated in code: assertWebhookUrlSafeForDispatch runs before evaluate accepts a URL and before dispatch (lib/api/webhookUrlSafety.ts). Production: HTTPS only, blocked hostnames (incl. metadata-style), no private/reserved IPv4/IPv6 literals, DNS resolution (with timeout) to catch names that resolve to non-public IPs. Non-prod allows HTTP and local addresses for testing. Optional: per-workspace hostname allowlist (workspace_integration_settings.webhook_allowed_hostnames, dashboard → integrator webhooks). |
| P0 | Secrets in production | Ensure SUPABASE_SERVICE_ROLE_KEY, APPROVAL_OTP_SECRET (required in production — getOtpPepper() throws if missing when a critical OTP stage runs; non-prod may still fall back to service role), RESEND_API_KEY (approval email), and NEXT_PUBLIC_APP_URL (canonical https:// origin for links; Vercel also has VERCEL_URL) are set on the host and never logged. Checklist + script: .env.example, npm run verify:production-env (runbook). Optionally INTEGRATION_WEBHOOK_HMAC_SECRET, Upstash, Sentry, CRON_SECRET. |
| P1 | Rate limiting | Optional app limits: with Upstash env vars set, Edge middleware applies sliding-window limits per client IP on POST /api/v1/evaluate, POST /api/v1/receipt, and POST /api/approval/* (lib/middleware/rateLimit.ts). POST evaluate/receipt also apply per hashed API key in the Node route (lib/api/apiKeyRateLimit.ts, RATE_LIMIT_PER_API_KEY_PER_MINUTE, default 600/min). Tune with RATE_LIMIT_*_PER_MINUTE; RATE_LIMIT_DISABLED=1 disables both. Still use CDN/WAF as the first line. |
| P1 | Webhook authenticity | Supported: outbound POSTs may include X-AgentNexus-Signature: sha256=<hmac> over the body (lib/api/integrationWebhook.ts). Per-workspace signing secret in workspace_integration_settings (dashboard → Integrator webhooks); else env INTEGRATION_WEBHOOK_HMAC_SECRET. Runbook: runbooks/OPERATIONS.md (signing, hostname allowlist, rotation, replay). |
| P1 | Approval token handling | Tokens are 128-bit hex (reasonable). Ensure links are HTTPS-only and that email-client link prefetching cannot trigger side effects: a GET on the approve link must stay non-mutating (decisions are POST today). |
| P2 | CSP / headers | Baseline + CSP-Report-Only includes script-src 'self', style-src 'self' 'unsafe-inline' (Next-friendly), object-src 'none', frame-ancestors. Still todo: report-to / collector, enforce CSP (drop report-only), HSTS at CDN. |
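To make the SSRF row concrete, here is a minimal sketch of the literal checks it describes (HTTPS only, blocked hostnames, no private or reserved IP literals in production). It is not the code in lib/api/webhookUrlSafety.ts: the function name `checkWebhookUrl`, the reason strings, and the blocklist entries are hypothetical, DNS resolution is omitted, and IPv6 literals are rejected outright as a simplification.

```typescript
// Hypothetical sketch only; the real checks live in lib/api/webhookUrlSafety.ts.
// DNS resolution (with timeout) is deliberately left out of this example.

// Example blocklist entries; the actual list in the codebase is not shown in this document.
const BLOCKED_HOSTNAMES = new Set(["localhost", "metadata.google.internal"]);

/** Parse a dotted-quad IPv4 literal; returns null if the host is not one. */
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255) ? octets : null;
}

/** True for loopback, private, link-local, CGNAT, multicast, and reserved ranges. */
function isNonPublicIpv4(octets: number[]): boolean {
  const [a, b] = octets;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    a >= 224
  );
}

export function checkWebhookUrl(
  raw: string,
  isProduction: boolean,
): { ok: boolean; reason?: string } {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "invalid_url" };
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { ok: false, reason: "unsupported_protocol" };
  }
  // Non-prod allows HTTP and local addresses for testing.
  if (!isProduction) return { ok: true };
  if (url.protocol !== "https:") return { ok: false, reason: "https_required" };

  const host = url.hostname.toLowerCase();
  if (BLOCKED_HOSTNAMES.has(host)) return { ok: false, reason: "blocked_hostname" };
  // Simplification: reject every IPv6 literal instead of classifying ranges.
  if (host.startsWith("[")) return { ok: false, reason: "ipv6_literal" };

  const ipv4 = parseIpv4(host);
  if (ipv4 && isNonPublicIpv4(ipv4)) return { ok: false, reason: "non_public_ip" };
  return { ok: true };
}
```

A literal check like this cannot catch a public-looking name that resolves to a private address, which is why the row above also calls for DNS resolution before dispatch.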
2. Reliability & operations
| Priority | Item | Notes |
|---|---|---|
| P0 | Email dependency | Approvals depend on Resend (and verified sender domain). Runbook: docs/runbooks/OPERATIONS.md (symptoms, logs, Sentry). Retry queue not shipped; failures log approval_email_failed / Sentry (approval_email). |
| P0 | Webhook delivery | Deliveries are recorded in webhook_deliveries and shown in the audit detail UI. To replay a failed delivery, use dashboard Replay delivery (workspace members) or POST /api/cron/replay-webhook with CRON_SECRET (runbook). Next: scheduled multi-id replay, optional admin-only replay. |
| P1 | Approval timeout cron | If policies use approval_timeout_seconds, schedule POST /api/cron/approval-timeouts with CRON_SECRET (APPROVAL_TIMEOUTS.md). Without it, time-boxed approval windows are never enforced. |
| P1 | Observability | Structured JSON logs via apiLogLine on evaluate/receipt success paths; x-request-id on evaluate, receipt, and approval API responses (lib/observability). Optional Sentry: set SENTRY_DSN — instrumentation.ts + sentry.*.config.ts; failed webhook deliveries call Sentry.captureException. Still todo: full APM depth, log aggregation dashboards, server action coverage. |
| P1 | Health checks | GET /api/health — fast liveness by default; GET /api/health?deep=1 runs a trivial Supabase select (503 if DB unreachable). Point uptime checks at the shallow path unless you need DB proof. |
| P1 | Database | Confirm all migrations applied in Supabase prod; enable Point-in-Time Recovery / backups per Supabase plan; document restore drill. |
| P2 | Queue for async work | Long-term: move webhook send and email fan-out to a queue (or Supabase Edge Functions + PGMQ) so API latency and failure modes are cleaner. |
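The health-check row distinguishes a shallow liveness path from a deep path that returns 503 when the database is unreachable. The sketch below shows how an uptime probe might tell those cases apart. The names `healthUrl` and `probeHealth`, the result labels, and the injected fetch-like function are hypothetical; treating any 2xx as healthy is also an assumption, since the document only specifies the 503 case.

```typescript
// Hypothetical probe helper; only GET /api/health, ?deep=1, and the 503-on-DB-failure
// behavior come from this document.

type FetchLike = (url: string) => Promise<{ status: number }>;
type HealthResult = "up" | "db_unreachable" | "down";

/** Build the shallow or deep health URL for a given origin. */
export function healthUrl(baseUrl: string, deep: boolean): string {
  const url = new URL("/api/health", baseUrl);
  if (deep) url.searchParams.set("deep", "1");
  return url.toString();
}

/** Call the health endpoint and classify the outcome. */
export async function probeHealth(
  baseUrl: string,
  deep: boolean,
  fetchImpl: FetchLike,
): Promise<HealthResult> {
  try {
    const res = await fetchImpl(healthUrl(baseUrl, deep));
    if (res.status >= 200 && res.status < 300) return "up"; // assumption: 2xx means healthy
    if (deep && res.status === 503) return "db_unreachable";
    return "down";
  } catch {
    // Network errors and timeouts count as down.
    return "down";
  }
}
```

In a real monitor, `fetchImpl` could be the global `fetch`; passing it in keeps the classification logic testable without a network.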
3. Correctness & product behavior
| Priority | Item | Notes |
|---|---|---|
| P1 | evaluate idempotency | Idempotency-Key header (max 128 chars): unique per workspace; replays return the same evaluation response; mismatched body → 409. Columns idempotency_key, idempotency_fingerprint on evaluations (migration 20260331210000_evaluations_idempotency.sql). |
| P1 | webhook_url optional | webhook_url is optional in the Zod schema; DB column is nullable (migration 20260331203000_evaluations_webhook_optional.sql). When absent, no integrator POST. When present: approval_required (pending human / next stage) and terminal_outcome deliveries per public/openapi.yaml / IntegratorWebhookPayload. |
| P1 | Error contract stability | public/openapi.yaml describes main API surfaces; tune servers to your host. 500 JSON from evaluate/receipt/idempotent replay uses stable code: "server_error"; Supabase/Postgres codes are logged only (logSupabaseClientError). Still: document all error codes in OpenAPI; version breaking changes. |
| P2 | Approval assignment | Today all workspace members with the stage role are emailed. Enterprises may need single assignee, round-robin, or on-call—plan as a roadmap item so marketing doesn’t over-promise. |
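The idempotency row above can be illustrated with a small in-memory model: a key is scoped to a workspace, a replay with the same body returns the stored response, and a replay with a different body yields 409. This is a sketch of the semantics only. The real implementation stores idempotency_key and idempotency_fingerprint on evaluations; the class name, the JSON-string fingerprint, the error code strings, and the 400 for an over-long key are all assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the Idempotency-Key semantics described above.
// The production path uses database columns and a unique constraint instead of a Map.

type Stored = { fingerprint: string; response: unknown };
type Outcome = { status: 200 | 400 | 409; body: unknown };

const MAX_KEY_LENGTH = 128; // from the document: max 128 chars

export class IdempotencyStore {
  private readonly entries = new Map<string, Stored>();

  handle(workspaceId: string, key: string, body: unknown, evaluate: () => unknown): Outcome {
    if (key.length === 0 || key.length > MAX_KEY_LENGTH) {
      // Assumption: the real status and code for an invalid key are not documented here.
      return { status: 400, body: { code: "invalid_idempotency_key" } };
    }
    // Keys are unique per workspace, so the workspace id is part of the lookup key.
    const scopedKey = JSON.stringify([workspaceId, key]);
    // Assumption: a plain JSON string stands in for the stored fingerprint.
    // It is sensitive to property order, which a real fingerprint may normalize.
    const fingerprint = JSON.stringify(body);

    const existing = this.entries.get(scopedKey);
    if (existing) {
      return existing.fingerprint === fingerprint
        ? { status: 200, body: existing.response } // replay: same response, no re-evaluation
        : { status: 409, body: { code: "idempotency_conflict" } }; // same key, different body
    }

    const response = evaluate();
    this.entries.set(scopedKey, { fingerprint, response });
    return { status: 200, body: response };
  }
}
```

An agent that retries after a timeout can therefore resend the same request with the same key and expect the original evaluation back, while a 409 signals that the key was reused for a different request.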
4. Engineering quality
| Priority | Item | Notes |
|---|---|---|
| P1 | Automated tests | Vitest: webhook URL safety; Zod contracts; route tests — evaluate (incl. Idempotency-Key replay + 23505 race), receipt (mockReceiptSupabase), approval approve / reject / request-otp / verify-otp, health. Add: optional Supabase integration tests. |
| P1 | CI pipeline | .github/workflows/ci.yml runs format check, tsc, lint, tests, and next build on push/PR to main. |
| P2 | Dependency hygiene | Scheduled npm audit, pin major deps, Dependabot/Renovate. |
5. Compliance, legal, and go-to-market
| Priority | Item | Notes |
|---|---|---|
| P1 | Privacy & terms | /privacy and /terms stubs + link to engineering inventory docs/DATA_HANDLING.md (non-legal). Todo: counsel-approved policies, DPA, public subprocessors list. |
| P1 | Data retention | Outline in DATA_HANDLING.md. Todo: implement purge jobs / SQL schedules for evaluations, audit_log, webhook_deliveries per contract. |
| P2 | SOC 2 / ISO | Shipped (product aid): dashboard Compliance reports — ZIP (Markdown + PDF summary + CSV samples) or PDF-only; optional COMPLIANCE_EXPORT_ADMIN_ONLY; optional Upstash rate limit on export (COMPLIANCE_REPORTS.md). Still required for attestation: formal control matrix, evidence collection process, and auditor engagement—not replaced by the export. |
Engineering inventory (non-legal): categories for integrator webhook data (URLs, delivery snapshots, optional HMAC / hostname allowlist in workspace_integration_settings) are outlined in DATA_HANDLING.md. Use it with counsel when drafting privacy terms and retention scope.
6. Deployment & configuration (current stack)
| Item | Notes |
|---|---|
| Environment matrix | Maintain .env.example as source of truth (incl. optional CRON_SECRET for webhook replay). Run npm run verify:production-env before go-live. Ops notes: runbooks/OPERATIONS.md. |
| Supabase Auth | Confirm Site URL, redirect URLs, and email templates for production domain. |
| Build | next build should run in CI; address any known trace / platform quirks on the host. |
| Repository | Ensure .env.local is gitignored and no secrets committed. |
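As a rough illustration of what an environment pass checks, the sketch below validates the variables this document names as required in production. It is not the script behind npm run verify:production-env: the function name `checkProductionEnv` and the message wording are hypothetical, and the real script may check more (or differently).

```typescript
// Hypothetical checker; the variable names come from the "Secrets in production" row,
// but this is not the repository's verify:production-env script.

const REQUIRED_IN_PRODUCTION = [
  "SUPABASE_SERVICE_ROLE_KEY",
  "APPROVAL_OTP_SECRET",
  "RESEND_API_KEY",
  "NEXT_PUBLIC_APP_URL",
] as const;

/** Return a list of human-readable problems; an empty list means the pass succeeded. */
export function checkProductionEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_IN_PRODUCTION) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      // Report the variable name only; never print secret values.
      problems.push(`${name} is missing`);
    }
  }
  const appUrl = env["NEXT_PUBLIC_APP_URL"]?.trim();
  if (appUrl && !appUrl.startsWith("https://")) {
    problems.push("NEXT_PUBLIC_APP_URL must be an https:// origin");
  }
  return problems;
}
```

Called as `checkProductionEnv(process.env)` in a pre-deploy step, a non-empty result would be a reason to stop the release.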
7. What is already in good shape (for context)
- API key hashing, workspace resolution from key, service role confined to server routes.
- Policy engine with validation, templates, multi-stage + OTP tier support.
- Audit events and evaluation/stage modeling with a usable dashboard; Governance analytics (/dashboard/analytics) for aggregate approval metrics and rejection breakdown (GOVERNANCE_ANALYTICS.md).
- Receipt path with idempotent semantics (per route implementation).
- UX hardening on key flows (loading states, server vs client component boundaries for filters).
- Archived workspaces and guarded delete flows reduce accidental data loss.
- /api/health for monitors; evaluate idempotency for safe agent retries; baseline security headers on all routes.
- Webhooks: webhook_deliveries + audit UI; replay (dashboard + POST /api/cron/replay-webhook + CRON_SECRET); per-workspace HMAC + optional hostname allowlist (workspace_integration_settings, dashboard Integrator webhooks).
- Production env: .env.example checklist, npm run verify:production-env, npm run verify (remote DB schema via verify:db-schema).
- Optional Upstash rate limits; correlation ids + JSON logs on primary APIs; OpenAPI at /openapi.yaml.
- MCP gateway: POST /api/mcp forwards evaluate to the same host; set NEXT_PUBLIC_APP_URL (or rely on VERCEL_URL) so server-side fetch resolves. Optional MCP_ALLOWED_ORIGINS for browser Origin pinning (MCP_GATEWAY.md).
- Enterprise SSO (SAML / OAuth): login UI and /auth/callback handling when env flags are set; Supabase Pro+ and IdP registration still required per ENTERPRISE_SSO.md.
- Policy-as-Code (GitHub): policy_git_links + signed POST /api/webhooks/github/policy-git/[linkId]; set NEXT_PUBLIC_APP_URL for correct webhook URLs; migration 20260331310000_policy_git_links.sql (POLICY_AS_CODE_GIT.md).
- Official thin SDKs (evaluate + receipt): Python integrations/python/agentnexus, TypeScript integrations/typescript/agentnexus-sdk; framework recipes in FRAMEWORK_INTEGRATIONS.md. Browse all docs on the deployed app at /docs (same markdown, rendered).
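For the webhook signing listed above, an integrator receiving these POSTs could verify the X-AgentNexus-Signature header along the following lines. This is a sketch under stated assumptions: the document gives the header format as sha256=<hmac> over the body, and this example assumes the digest is lowercase hex and computed over the exact raw request bytes. The function names are hypothetical and are not part of the official SDKs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical receiver-side helpers. Assumption: the HMAC digest is hex-encoded;
// confirm the encoding against lib/api/integrationWebhook.ts or the runbook.

/** Compute the expected header value for a raw request body. */
export function signBody(rawBody: string, secret: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

/** Constant-time comparison of the received header against the expected value. */
export function verifySignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signBody(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws when lengths differ, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification should run on the raw body before any JSON parsing, because re-serializing the payload can change bytes and invalidate the signature.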
8. Suggested order of attack
Done (code / runbooks)
- SSRF / webhook URL policy (P0): baseline in webhookUrlSafety.ts; optional per-workspace hostname allowlist + dashboard (Integrator webhooks).
- Webhook authenticity & tenant controls: per-workspace HMAC + global INTEGRATION_WEBHOOK_HMAC_SECRET; signing + allowlist run on evaluate, dispatch, and replay; OPERATIONS.md.
- Webhook delivery visibility & replay: webhook_deliveries, audit UI, dashboard replay, cron route + CRON_SECRET.
- Dedicated APPROVAL_OTP_SECRET: required in production (getOtpPepper()); dev may fall back to service role.
- Production env checklist: .env.example + npm run verify:production-env (+ optional VERIFY_PRODUCTION_ENV_STRICT=1); OPERATIONS.md “Production environment pass”.
- CI + tests: .github/workflows/ci.yml; Vitest includes webhook URL safety, evaluate/receipt/approval/health routes.
- Evaluate idempotency: Idempotency-Key + DB columns (migration 20260331210000_evaluations_idempotency.sql).
- Health endpoint: GET /api/health, optional ?deep=1.
- Remote schema smoke check: npm run verify → verify:db-schema (expects .env.local + applied migrations).
Partial / host configuration
- Rate limiting: optional Upstash (IP middleware + per API key on evaluate/receipt); still use CDN/WAF.
- Observability: structured logs + x-request-id; optional Sentry; still: log aggregation / APM depth if you need them.
- Database ops: confirm all migrations in prod; PITR/backups; restore drill (Supabase + process).
- Supabase Auth: production Site URL / redirect URLs / email templates.
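To show what a sliding-window limit means for the rate-limiting item above, here is a minimal in-memory sketch keyed by client IP or hashed API key. It is an exact timestamp-log variant written for illustration; it is not the Upstash-backed code in lib/middleware/rateLimit.ts or lib/api/apiKeyRateLimit.ts, and the class name and method signature are hypothetical.

```typescript
// Hypothetical in-memory sliding-window limiter. A single-process Map like this does
// not work across serverless instances, which is why a shared store is used in practice.

export class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Record a request for `key` at `nowMs` and report whether it is within the limit. */
  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Example: the documented default of 600 requests per minute per API key.
const perApiKey = new SlidingWindowLimiter(600, 60_000);
perApiKey.allow("hashed-api-key", Date.now());
```

Rejected requests are not recorded here, so a client that keeps hammering the endpoint regains capacity as soon as older requests age out of the window.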
Still open (prioritize by customer promises)
- Webhook automation: batch / multi-delivery_id replay (or scheduled sweep); optional admin-only replay policy.
- Email reliability: retry queue or scheduled re-notify for approval mail (Resend remains single-shot today).
- API contract: document stable error codes in OpenAPI for integrators.
- CSP / HSTS: enforce CSP (beyond report-only where applicable); HSTS at CDN.
- Legal + retention: counsel-approved privacy/terms/DPA, subprocessors; retention jobs for evaluations / audit / webhook tables per contract.
Update this section as items close; keep it aligned with what sales and security actually promise.
9. Full production program (unified ordering)
The numbered “path toward full production” in this repo merges technical readiness (this document) with product/GTM priorities from the March 2026 viability analysis. Use it as the master sequence for planning:
VIABILITY_AND_GTM_REFERENCE.md §8 — Ordered path toward full production
Review reflects repository state; adjust for your hosting and compliance targets.