Backwards compat shim (task 27)
- requireCliAuth() falls back to body.user_id when BROKER_LEGACY_AUTH=1
and no bearer present. Sets Deprecation + Warning headers + bumps a
broker_legacy_auth_hits_total metric so operators can watch the
legacy traffic drain to 0 before removing the shim.
- All handlers parse body BEFORE requireCliAuth so the fallback can
read user_id out of it.
HA readiness (task 29)
- .artifacts/specs/2026-04-15-broker-ha-statelessness-audit.md
documents every in-memory symbol and rollout plan (phase 0-4).
- packaging/docker-compose.ha-local.yml spins up 2 broker replicas
behind Traefik sticky sessions for local smoke testing.
- apps/broker/src/audit.ts now wraps writes in a transaction that
takes pg_advisory_xact_lock(meshId) and re-reads the tail hash
inside the txn. Concurrent broker replicas can no longer fork the
audit chain.
Deploy gate (task 30)
- /health stays permissive (200 even on transient DB blips) so
Docker doesn't kill the container on a glitch.
- New /health/ready checks DB + optional EXPECTED_MIGRATION pin,
returns 503 if either fails. External deploy gate can poll this
and refuse to promote a broken deploy.
Metrics dashboard (task 32)
- packaging/grafana/claudemesh-broker.json: ready-to-import Grafana
dashboard covering active conns, queue depth, routed/rejected
rates, grant drops, legacy-auth hits, conn rejects.
Tests (task 28)
- audit-canonical.test.ts (4 tests) pins canonical JSON semantics.
- grants-enforcement.test.ts (6 tests) covers the member-then-
session-pubkey lookup with default/explicit/blocked branches.
Docs (task 34)
- docs/env-vars.md catalogues every env var the broker + CLI read.
Crypto review prep (task 35)
- .artifacts/specs/2026-04-15-crypto-review-packet.md: reviewer
brief, threat model, scope, test coverage list, deliverables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- apps/cli/ legacy v0 archived as branch 'legacy-cli-archive' and tag
'cli-v0-legacy-final' before deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.
Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):
- 0020_peer-grants.sql adds peer_grants jsonb + GIN index on mesh.member.
- handleSend in broker fetches recipient grant maps once per send, drops
messages silently when sender lacks the required capability.
- POST /cli/mesh/:slug/grants to update from CLI; broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to broker via syncToBroker.
Auto-migrate on broker startup:
- apps/broker/src/migrate.ts runs drizzle migrate with pg_advisory_lock
before the HTTP server binds. Exits non-zero on failure so Coolify
healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as direct broker dep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.
New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
GET /metrics. Tracks connections, messages, queue depth, TTL
sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
fallback) + uptime, surfaced on /health.
Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Exceed →
close(1008, "capacity") + metric increment.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
`ws.maxPayload`. Hook POST bodies cap out with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
postgres:// prefix.
New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.
Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>