Three correctness fixes on top of the M1 schema migration:
1) Fix the drainForMember claim-then-push race
----------------------------------------------------------------
Previously the claim CTE set delivered_at = NOW() *before* the WS
send. If readyState !== OPEN at push time, the row was marked
delivered and the message dropped silently — at-most-once with no
retry hook.
The new flow:
- claim sets (claimed_at, claim_id, claim_expires_at = NOW()+30s)
- delivered_at stays NULL until the recipient acks
- re-eligibility predicate now also accepts rows whose lease
expired, so dropped pushes redeliver (at-least-once)
Adds two helpers:
- markDelivered() — scoped to (mesh_id, recipient pubkey) so a
peer can only ack its own messages
- sweepExpiredClaims() — clears expired (claimed_at, claim_id,
claim_expires_at) every 15s, wired into startSweepers
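
The lease flow above can be modelled in memory. This is only a sketch:
the real broker does this in Postgres, and the row shape, function
signatures, and the `<=` expiry comparison below are assumptions made
for illustration, not the actual implementation.

```typescript
// In-memory stand-in for the message queue table (hypothetical shape).
interface QueuedMessage {
  id: string;
  meshId: string;
  recipientPubkey: string;
  deliveredAt: number | null;
  claimedAt: number | null;
  claimId: string | null;
  claimExpiresAt: number | null;
}

const LEASE_MS = 30_000; // mirrors claim_expires_at = NOW()+30s

// Eligible: not yet delivered, and either unclaimed or the lease has expired.
function isEligible(m: QueuedMessage, now: number): boolean {
  if (m.deliveredAt !== null) return false;
  return m.claimId === null || (m.claimExpiresAt !== null && m.claimExpiresAt <= now);
}

// Claim takes a lease; it never sets deliveredAt.
function claim(
  queue: QueuedMessage[],
  meshId: string,
  recipientPubkey: string,
  claimId: string,
  now: number,
): QueuedMessage[] {
  const claimed = queue.filter(
    (m) => m.meshId === meshId && m.recipientPubkey === recipientPubkey && isEligible(m, now),
  );
  for (const m of claimed) {
    m.claimedAt = now;
    m.claimId = claimId;
    m.claimExpiresAt = now + LEASE_MS;
  }
  return claimed;
}

// Scoped to (meshId, recipientPubkey): a peer can only ack its own messages.
function markDelivered(
  queue: QueuedMessage[],
  meshId: string,
  recipientPubkey: string,
  messageId: string,
  now: number,
): boolean {
  const m = queue.find(
    (x) => x.id === messageId && x.meshId === meshId && x.recipientPubkey === recipientPubkey,
  );
  if (!m || m.deliveredAt !== null) return false;
  m.deliveredAt = now;
  return true;
}

// Clears expired leases on undelivered rows; returns how many were cleared.
function sweepExpiredClaims(queue: QueuedMessage[], now: number): number {
  let cleared = 0;
  for (const m of queue) {
    if (m.deliveredAt === null && m.claimExpiresAt !== null && m.claimExpiresAt <= now) {
      m.claimedAt = null;
      m.claimId = null;
      m.claimExpiresAt = null;
      cleared++;
    }
  }
  return cleared;
}
```

A row claimed at t=0 is skipped by a second claim at t=1s, becomes
claimable again at t=30s, and stops being eligible only once
markDelivered succeeds for the matching recipient.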
2) Accept `client_ack` from recipients
----------------------------------------------------------------
New WS message type, handled in the dispatcher right after `send`.
The acked row is looked up by clientMessageId or brokerMessageId;
either one is sufficient. Until the daemon (apps/cli, separate
worktree) starts emitting acks, leases will simply expire and
re-deliver, which is the desired retry behaviour.
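
A rough sketch of how such an ack might be parsed and resolved to a
lookup key. Only the `client_ack` type and the two id field names come
from the description above; the exact wire shape, the helper names, and
the choice to prefer brokerMessageId when both are present are
assumptions.

```typescript
// Hypothetical wire shape for the ack message.
interface ClientAckMessage {
  type: "client_ack";
  clientMessageId?: string;
  brokerMessageId?: string;
}

type AckLookup = { by: "brokerMessageId" | "clientMessageId"; id: string };

// Parse a raw WS frame; returns null for anything that is not a client_ack.
function parseClientAck(raw: string): ClientAckMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  if (rec["type"] !== "client_ack") return null;

  const msg: ClientAckMessage = { type: "client_ack" };
  const clientId = rec["clientMessageId"];
  const brokerId = rec["brokerMessageId"];
  if (typeof clientId === "string") msg.clientMessageId = clientId;
  if (typeof brokerId === "string") msg.brokerMessageId = brokerId;
  return msg;
}

// Either id is enough; this sketch prefers brokerMessageId (an assumption).
function resolveAckLookup(msg: ClientAckMessage): AckLookup | null {
  if (msg.brokerMessageId) return { by: "brokerMessageId", id: msg.brokerMessageId };
  if (msg.clientMessageId) return { by: "clientMessageId", id: msg.clientMessageId };
  return null; // ack carries no usable id; the lease will expire and redeliver
}
```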
3) Tag presence rows with `role`
----------------------------------------------------------------
handleHello (member-keyed, used by the long-lived daemon WS) →
role: 'control-plane'
handleSessionHello (per-Claude-Code session WS) →
role: 'session'
listPeersInMesh exposes the new field; the peers_list response
surfaces it. WSPeersListMessage type adds an optional `role` plus the
long-undocumented `memberPubkey`. CLI-side filter swap from peerType
to role lands in a follow-up worktree — that's why the CLI is
untouched here per the M1 spec.
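
For orientation, the follow-up CLI filter could look roughly like the
sketch below. The `PeerEntry` shape and `filterPeersByRole` are
hypothetical; only the two role values and the optional `role` and
`memberPubkey` fields come from this change.

```typescript
type PresenceRole = "control-plane" | "session";

// Hypothetical subset of a peers_list entry; role is optional on the wire.
interface PeerEntry {
  memberPubkey?: string;
  role?: PresenceRole;
}

// Keep only peers tagged with the requested role. Entries without a role
// (for example from a broker that predates this change) never match.
function filterPeersByRole(peers: PeerEntry[], role: PresenceRole): PeerEntry[] {
  return peers.filter((p) => p.role === role);
}
```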
Typechecks clean (apps/broker tsc --noEmit, packages/db tsc --noEmit).
Test suite needs a real DB so wasn't run in this worktree; existing
dup-delivery and broker tests use drainForMember positionally and the
new claimerPresenceId arg is optional, so they should continue to pass.
@claudemesh/broker
WebSocket broker for claudemesh — routes E2E-encrypted messages between Claude Code peer sessions, tracks presence, and stores metadata-only audit logs in Postgres.
What it is
A standalone Bun-runtime WebSocket server that sits between Claude Code sessions. Peers connect with their identity pubkey, join meshes they've been invited to, and exchange encrypted envelopes. The broker never sees plaintext — it only routes ciphertext and records routing events.
Running locally
# from the repo root
pnpm --filter=@claudemesh/broker dev # watch mode
pnpm --filter=@claudemesh/broker start # production
Required env vars
| Var | Default | Purpose |
|---|---|---|
| `BROKER_PORT` | 7900 | Single port for HTTP routes + WebSocket upgrade |
| `DATABASE_URL` | (none) | Postgres connection string (shared with apps/web) |
| `STATUS_TTL_SECONDS` | 60 | Flip stuck-"working" peers to idle after this TTL |
| `HOOK_FRESH_WINDOW_SECONDS` | 30 | How long a hook signal beats JSONL inference |
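
As an illustration of how these variables and defaults fit together, here is a minimal config-loading sketch. `BrokerConfig`, `intFromEnv`, and `loadConfig` are hypothetical names, and the validation rules are assumptions; the broker's actual loader may differ.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface BrokerConfig {
  port: number;
  databaseUrl: string;
  statusTtlSeconds: number;
  hookFreshWindowSeconds: number;
}

type Env = Record<string, string | undefined>;

// Read a positive integer from the environment, falling back to a default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): BrokerConfig {
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) throw new Error("DATABASE_URL is required"); // no default
  return {
    port: intFromEnv(env, "BROKER_PORT", 7900),
    databaseUrl,
    statusTtlSeconds: intFromEnv(env, "STATUS_TTL_SECONDS", 60),
    hookFreshWindowSeconds: intFromEnv(env, "HOOK_FRESH_WINDOW_SECONDS", 30),
  };
}
```

In a Bun or Node process this would typically be called once at startup as `loadConfig(process.env)`.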
Routes (single port)
| Path | Protocol | Purpose |
|---|---|---|
| `/ws` | WebSocket | Authenticated peer connections |
| `/hook/set-status` | HTTP POST | Claude Code hook scripts report status |
| `/health` | HTTP GET | Liveness probe |
Depends on
- `@turbostarter/db`: Drizzle/Postgres schema (uses the `mesh` pgSchema)
- `@turbostarter/shared`: cross-package utilities
Deployment
Runs as a separate process (not inside Next.js). Intended deployment targets:
Fly.io, Railway, or Coolify on the surfquant VPS. The WebSocket server must be
reachable at ic.claudemesh.com.
Status
Scaffold only. The broker logic (status detection, message queue, presence
tracking, hook endpoints) will be ported from ~/tools/claude-intercom/broker.ts
in a follow-up step.