Transport (spec)

Status: draft v0.2 (2026-05-17; op-log persistence + log replay + session-seed protocol)

Prerequisite reading: op vocabulary v0.6, identity and trust v0.2

Subsequent work: transport implementation refresh; multi-tab dev harness.

Three concrete additions, all motivated by the identity v0.2 trust model (specifically: enabling anonymous-with-link sessions where the owner may be offline when joiners arrive):

  • Relay holds per-session op log. Previously the relay was a pure forwarder with no state beyond connected peers. v0.2 adds an in-memory op log per session, so late joiners can sync without requiring an online peer who happens to hold the history. This is a meaningful expansion of the relay’s responsibility (it now holds data, not just connections), but a narrow one — the relay still doesn’t verify signatures, sign anything, or interpret op content.
  • Session-seed protocol. The first peer to claim a fresh session ID seeds the session’s metadata (rootKeys, startFen) via an extended HELLO message. Subsequent peers receive the seeded metadata in their WELCOME. This eliminates the v0.1 assumption that the relay’s sessionMetaResolver had pre-knowledge of every session.
  • LOG_REPLAY message type. A new joiner whose log is empty (or significantly behind) requests a replay from the relay. The relay streams all stored ops back, ordered by arrival. The joiner applies them through the standard pipeline (signature + chain verification on every op). Replaces the v0.1 “peer-to-peer snapshot exchange” for the offline-host scenario.

What’s unchanged: the Transport interface, both topologies (star and mesh), the reliable-ordered-delivery requirement, JSON canonical wire format, membership awareness, watermark gossip, op-set summary gossip.

The pull-only snapshot exchange from v0.1 (§4.5) is preserved but demoted: it’s still useful for online-host scenarios (snapshot is more compact than full log replay) and remains in §4.5. For offline-host scenarios, LOG_REPLAY (§4.10) is the primary mechanism.

The op-vocabulary and identity specs define what gets sent on the wire (signed ops, trust state) and how each peer validates what it receives (verify signature, check trust at op-HLC, validate op type). They’re silent on how the wire works: how peers find each other, how messages get delivered, how state syncs, how membership is tracked, how late joiners catch up. That’s this spec.

Reminder of scope: the wire carries OpenFile ops, snapshots, and signaling messages — runtime machinery for a live session. The persistent artifact a session produces is a PGN (see README). The relay’s op log buffer is bounded; once a session’s effects are captured in a snapshot, ops below the watermark are GC’d. Wire formats here are operational, not archival.

OpenFile supports two distinct deployment topologies:

  • Star (relay-based). A central server forwards messages between peers. Production deployments, large sessions, broadcast scenarios.
  • Mesh (P2P). Peers connect directly via WebRTC after a brief signaling handshake. Hostless small-group sessions, “share-a-link-no-server” deployments.

Apps choose. Both topologies use the same wire protocol, the same ops, the same identity model, the same persistence formats. The Transport interface abstracts the difference. The op layer doesn’t know or care which is underneath.

The protocol is deliberately small. WebRTC and WebSocket already solve framing, ordering, reliability, and channel security. This spec specifies what we layer on top of them: the application protocol (op envelopes, membership events, snapshot exchanges, watermark gossip, rejection notifications).


1.1 Two topologies, one Transport interface

OpenFile supports both star and mesh. They’re real product modes with distinct use cases (see Background); we don’t choose one and defer the other.

The mechanism is a Transport interface that the op layer consumes, implemented by two reference impls (WebSocket for star, WebRTC for mesh). Apps instantiate the right transport for their deployment; the rest of the stack is identical.

const target = createOpenFileTarget({
  // Star topology:
  transport: createWebSocketTransport({ url: 'wss://relay.example.com/...' }),
  // OR, for mesh topology (use one transport, not both):
  // transport: createWebRTCTransport({ signalingUrl: 'wss://signaling.example.com/...' }),
  // ... other options (rootKeys, signer, verifier, etc.) are identical
});

The interface (§3) is small: send / receive ops, observe membership, exchange snapshots, manage connection lifecycle. Both impls satisfy the same contract; custom impls (in-process bus for tests, IPC between Electron windows, anything else) plug in identically.

The Transport interface mandates reliable, ordered, eventually-delivered semantics for ops. Both reference impls provide this:

  • WebSocket — TCP-backed; reliable + ordered by default.
  • WebRTC data channel — configured with { ordered: true, maxRetransmits: undefined } (mimics TCP).

Why mandate at the interface: the op-layer’s apply pipeline assumes ops arrive in some sane order. Out-of-order delivery is buffered gracefully via causal-validity rules (op-vocab §4.3), but reliable delivery is required — a permanently-lost op breaks convergence. Both transports underneath are reliable; the interface enforces it explicitly so custom transports can’t surprise the op layer.

(Note: HLC ordering of ops is independent of transport ordering. Ops can arrive out of HLC order; the buffer + apply machinery handles that. What we require is that each op eventually arrives.)

All wire messages use canonical JSON per RFC 8785 (the same encoding mandated by the identity spec for signing). Implications:

  • Op envelopes on the wire are byte-identical to op envelopes at rest (persistence spec §10.6). One canonicalization, three contexts.
  • Wire framing is line-delimited JSON (for WebSocket text frames) or length-prefixed JSON (for WebRTC binary mode); both are implementation choices that don’t affect content.
  • Non-op messages (membership events, snapshot exchanges) are also canonical JSON, with a type discriminator field.

v1 ships JSON only. Binary canonical form (CBOR or similar) is reserved for v2 if profiling demands it — see op-vocab §12 #1.
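As a rough illustration of the encoding and line-delimited framing described above, the sketch below serializes a message with sorted keys and frames it as one line of text. The helper names (`canonicalize`, `frame`, `unframe`) are hypothetical. This is a simplification that leans on JavaScript's built-in number and string serialization and has not been checked against the RFC 8785 test vectors, so a real implementation should use a vetted canonicalization library.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical form: object keys sorted by UTF-16 code units,
// no insignificant whitespace, primitives serialized via JSON.stringify.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const obj = value as { [key: string]: Json };
  const members = Object.keys(obj)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalize(obj[key] as Json));
  return "{" + members.join(",") + "}";
}

// Line-delimited framing: JSON.stringify escapes newlines inside strings,
// so the canonical text never contains a raw "\n" and it is a safe delimiter.
function frame(message: Json): string {
  return canonicalize(message) + "\n";
}

// Split a received text buffer back into individual messages.
function unframe(buffer: string): Json[] {
  return buffer
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Json);
}
```

Because the same `canonicalize` output would be used for signing, persistence, and the wire, a receiver can verify a signature over exactly the bytes it received without re-serializing.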

The Transport interface exposes strong membership — every peer knows the current set of connected peers in the session. Required because:

  • Mesh requires it. A peer broadcasting an op in mesh topology must send to every other peer individually; without a peer list, there’s nowhere to send.
  • Star benefits from it. The relay already tracks who’s connected; exposing this to peers enables UI affordances (“Alice and Bob are here”).
  • Future-proofs presence. A v2+ presence feature (peer cursors, “Alice is looking at move 47”) layers naturally on top of membership events.

Cost is small: the Transport emits peer-join / peer-leave events; peers maintain a Set locally.

When a new peer joins a long-running session, they need to catch up. Two models considered:

  • Push — existing peers (or the relay) proactively send snapshots to joiners.
  • Pull — joiners request a snapshot from a chosen peer at a chosen HLC.

v1 commits to pull. Reasons:

  • Topology-agnostic. Star and mesh both support it identically: the joiner picks a peer (relay or any other peer) and requests.
  • Simpler error handling. If a snapshot request fails, the joiner retries against a different peer. Push would require the sender to know whether the receiver got it.
  • Bandwidth-controlled. Joiner decides when to fetch (e.g., after initial connection is healthy); push would require the sender to manage outbound timing.

Push is a v2 optimization if profiling shows pull is too slow for many simultaneous joiners. Until then, pull.


  • Transport interface — the contract that the op layer consumes.
  • Wire protocol — message types, envelope structure, ordering rules between message types.
  • Two reference Transport impls — WebSocket and WebRTC.
  • Reference signaling server — minimal NAT-traversal endpoint for WebRTC topology.
  • Membership protocol — peer-join / peer-leave events, peer presentation of session secrets.
  • Session-seed protocol (v0.2) — first peer to claim a sessionId seeds metadata via HELLO; relay stores it for future joiners.
  • Per-session op log at the relay (v0.2) — stored bytes, not interpreted content. Enables log replay without a peer being online to provide a snapshot.
  • Log replay protocol (v0.2) — bounded streaming of stored ops to joiners.
  • Snapshot exchange protocol — pull request / response between peers (preserved from v0.1; still useful for online-host cases).
  • Causal stability protocol — watermark gossip; coordinates with op-layer’s GC machinery.
  • Gossip protocol — periodic op-set summaries for censorship-by-withholding detection.
  • Rejection notifications — when receivers reject ops, the emitter learns (with appropriate information-leak care).
  • Sync mode enforcement — at the transport layer, ensure spectator / follow-leader peers don’t broadcast local ops.

2.2 What the op layer + identity layer own

(For clarity — these belong to other specs, not this one.)

  • Op vocabulary, validation, apply rules — op-vocab spec.
  • HLC clock, ordering, tie-breaks — op-vocab spec.
  • Op signing and verification — identity spec.
  • Trust state derivation — identity spec.
  • Persistence formats — op-vocab spec §10.

This spec consumes all of the above. We assume each peer’s op layer + identity layer are working correctly; the transport’s job is to get bytes from one peer’s emit to other peers’ receive.

  • Deployment topology choice — pick the transport at construction.
  • Relay server operation (for star deployments) — hosting, scaling, persistence.
  • Signaling server operation (for mesh deployments) — hosting the minimal NAT-traversal endpoint, or reusing a public one.
  • Session metadata distribution — root keys, secrets, URLs. The link / invite payload that brings peers together.
  • Display name resolution — same as identity spec.
  • UI affordances — connection-status indicators, “X is offline,” reconnect buttons.
  • Persistence wiring — where the bytes go, per op-vocab §10.
  • Bandwidth optimization — delta encoding, op batching beyond natural framing. v2 if profiling demands.
  • Multi-relay federation — star deployments with multiple relays that gossip among themselves. Today: one relay per session.
  • Selective sync — “send me only ops affecting this subtree.” Not a chess-collab need; ops are small enough to ship everything.
  • NAT-traversal fallback to TURN servers — STUN-only signaling works for most home networks; corporate networks behind strict NATs may need TURN relay. App / signaling-server policy.
  • Push-based snapshot distribution — see §1.5.
  • Encrypted application-level payloads — ops sent over already-encrypted TLS / DTLS channels. Encrypting inside (so the relay can’t read content) is a v2+ privacy concern.

The contract the op layer consumes:

interface Transport {
// ── Lifecycle ───────────────────────────────────────────────────
/** Establish the underlying connection (WebSocket, WebRTC, etc.).
* Resolves when ready to send/receive. */
connect(): Promise<void>;
/** Tear down the connection. Idempotent. */
disconnect(): void;
/** Current connection state. */
state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'closed';
/** Subscribe to state transitions. */
onState(handler: (state: TransportState) => void): () => void;
// ── Op exchange ─────────────────────────────────────────────────
/** Broadcast a signed op to all connected peers (subject to
* sync-mode policy — see §10). */
send(op: SignedOp): void;
/** Subscribe to ops arriving from peers. */
onOp(handler: (op: SignedOp) => void): () => void;
// ── Membership ──────────────────────────────────────────────────
/** Currently-connected peers (excluding self). */
getPeers(): PeerInfo[];
/** Subscribe to peer-join events. */
onPeerJoin(handler: (peer: PeerInfo) => void): () => void;
/** Subscribe to peer-leave events. */
onPeerLeave(handler: (peerPublicKey: string) => void): () => void;
// ── Snapshot exchange ───────────────────────────────────────────
/** Request a snapshot from a peer at a given HLC. Resolves with
* the snapshot bytes; rejects on timeout, refused, peer-gone. */
requestSnapshot(
fromPeer: string,
opts?: { atHlc?: number; timeoutMs?: number },
): Promise<Uint8Array>;
/** Subscribe to incoming snapshot requests. The handler must
* produce snapshot bytes (typically by calling target.exportSnapshot)
* or reject. */
onSnapshotRequest(
handler: (req: SnapshotRequest) => Promise<Uint8Array>,
): () => void;
// ── Watermark gossip ────────────────────────────────────────────
/** Broadcast this peer's current local-applied watermark to peers.
* Other peers' transports collect these; the op layer computes
* the session-wide minimum and advances its GC watermark
* accordingly. */
sendWatermark(localHlc: number): void;
/** Subscribe to incoming watermark gossip. */
onWatermark(
handler: (peer: string, theirHlc: number) => void,
): () => void;
// ── Op-set gossip (censorship defense) ──────────────────────────
/** Periodically — typically every N ops or T seconds —
* exchange compact op-set summaries with peers. Discrepancies
* surface when a peer is missing ops they should have. See §11.
* Optional: transports without gossip return a no-op
* implementation; identity spec's threat-model already notes
* censorship-by-withholding as an out-of-protocol concern. */
sendOpSetSummary?(summary: OpSetSummary): void;
onOpSetSummary?(
handler: (peer: string, summary: OpSetSummary) => void,
): () => void;
// ── Rejection notifications ─────────────────────────────────────
/** When this peer receives an op and rejects it (per identity
* spec §4.3), optionally notify the emitting peer so their UX
* can surface "your op was rejected." */
sendRejection?(toPeer: string, rejection: RejectionInfo): void;
onRejection?(handler: (rej: RejectionInfo) => void): () => void;
}
type TransportState = 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'closed';
type PeerInfo = {
publicKey: string; // identity-layer public key
transportId: string; // transport-level peer ID (separate)
joinedAt: number; // local timestamp of join
};
type SnapshotRequest = {
fromPeer: string;
atHlc?: number; // requested snapshot HLC; undefined = current
requestId: string; // for response correlation
};
type OpSetSummary = {
// Implementation choice: Bloom filter, Merkle tree root, sorted
// opId list per author, etc. Both peers must implement the same
// summary format to compare. Reference impls use per-author
// (maxSeq, hash-of-applied-opIds-up-to-maxSeq). See §11.
perAuthor: { [author: string]: { maxSeq: number; hash: string } };
};
type RejectionInfo = {
opId: { author: string; seq: number };
reason: 'invalid-signature' | 'untrusted-author' | 'invalid-op' | 'illegal-move';
// Note: 'invalid-signature' is information-sensitive; see §13.
};

That’s the whole interface. Custom transports (test mocks, in-process buses, anything else) implement exactly this contract; the op layer behaves identically regardless of what’s underneath.


Messages exchanged between peers (or via a relay) over the underlying transport. All messages are canonical JSON with a type discriminator.

| Type | Direction | Purpose |
| --- | --- | --- |
| hello | Peer → relay/peer | Initial handshake on connect |
| welcome | Relay/peer → joiner | Handshake response with session metadata |
| op | Peer → all peers | Broadcast a signed op |
| peer-join | Relay/signal → peers | New peer joined the session |
| peer-leave | Relay/signal → peers | Peer disconnected |
| snapshot-request | Peer → peer | Ask for state snapshot at HLC |
| snapshot-response | Peer → peer | Snapshot bytes (or refusal) |
| log-replay-request (v0.2) | Peer → relay | Request stored op log replay |
| log-replay-chunk (v0.2) | Relay → peer | Streamed op (one per chunk) |
| log-replay-end (v0.2) | Relay → peer | End-of-stream marker |
| watermark | Peer → all peers | Causal-stability gossip |
| op-set-summary | Peer → all peers | Op-set summary for gossip protocol |
| rejection | Peer → emitter | “I rejected your op N for reason X” |

Each message carries type and a messageId (UUID) for response correlation. Some types carry additional fields per their semantics.

{
"type": "op",
"messageId": "uuid",
"op": { /* signed op envelope per identity spec §3.1 */ }
}

The op field is the full signed op — the same bytes used at rest (persistence spec §10) and the same canonicalized payload used for signing.

Broadcast semantics:

  • In star topology, peers send op to the relay; the relay forwards to all other connected peers.
  • In mesh topology, peers send op directly to each connected peer (or via a chosen routing strategy — see §6.3).

The relay (in star) does NOT verify signatures or check trust. It’s a dumb pipe. Verification happens at each endpoint per identity spec §4.2. This keeps the relay simple and means a compromised relay can drop or reorder ops but can’t forge them.

4.3 Handshake (hello / welcome) — extended in v0.2

When a peer connects, it sends hello. v0.2 extends the v0.1 form with an optional seedSessionMeta for the first-peer-to-claim-a-session-id case:

{
"type": "hello",
"messageId": "uuid",
"publicKey": "MCowBQ...", // joiner's identity public key
"sessionId": "session-abc-123", // which session to join
"sessionSecret": "...", // for link-based join (optional)
"seedSessionMeta": { // v0.2 — optional; populates a
"rootKeys": ["MCowBQ..."], // fresh session if relay has
"startFen": "rnbqkbnr/..." // no record of this sessionId yet
},
"protocolVersion": 1
}

Session-seed protocol (v0.2). When the relay receives a HELLO for a sessionId it has no record of:

  • If seedSessionMeta is present, the relay stores it as the session’s metadata for future joiners. The seeding peer becomes the bootstrapping owner.
  • If seedSessionMeta is absent, the relay responds with welcome carrying sessionMeta: null. The joining peer can either:
    • Wait for someone else to seed (uncommon; usually means misconfiguration), OR
    • Disconnect with a session-not-found error.

When the relay receives a HELLO for a sessionId it already knows:

  • seedSessionMeta, if present in the new HELLO, is ignored. The seeded metadata is immutable for the session’s lifetime.
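The seeding rules above can be sketched as a single relay-side lookup. The `sessions` map and the `resolveSessionMeta` helper are hypothetical names, and the `Hello` type carries only the fields this decision needs; a real relay would also validate the sessionId and sessionSecret.

```typescript
type SessionMeta = { rootKeys: string[]; startFen: string };
type Hello = { sessionId: string; publicKey: string; seedSessionMeta?: SessionMeta };

// Hypothetical in-memory store: sessionId -> seeded metadata.
const sessions = new Map<string, SessionMeta>();

// Returns the sessionMeta value the relay would place in its welcome.
function resolveSessionMeta(hello: Hello): SessionMeta | null {
  const existing = sessions.get(hello.sessionId);
  if (existing !== undefined) {
    // Known session: seeded metadata is immutable, so any new seed is ignored.
    return existing;
  }
  if (hello.seedSessionMeta !== undefined) {
    // Fresh session: the first peer's seed becomes the session metadata.
    sessions.set(hello.sessionId, hello.seedSessionMeta);
    return hello.seedSessionMeta;
  }
  // Fresh session and no seed: the welcome carries sessionMeta: null.
  return null;
}
```

Checking for an existing record before looking at the seed is what makes the first claim win: a later HELLO cannot replace rootKeys, whatever it supplies.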

Welcome response:

{
"type": "welcome",
"messageId": "uuid",
"inReplyTo": "uuid-of-hello",
"sessionMeta": {
"rootKeys": ["MCowBQ...root1"],
"startFen": "rnbqkbnr/..."
},
"currentPeers": [
{ "publicKey": "...", "transportId": "...", "joinedAt": 12345 }
],
"watermark": 1730412345000123,
"logSize": 142, // v0.2 — number of ops in the relay's log
"protocolVersion": 1
}

The joiner now has:

  • Session metadata — needed to validate root delegations.
  • Current peer list — needed for mesh broadcast routing.
  • Current watermark — so the joiner knows what HLC range to expect.
  • v0.2: log size — lets the joiner decide whether to request full log replay or a snapshot, based on cost.

After welcome, the joiner typically requests either a snapshot (§4.5, online-host scenario) or a log replay (§4.10, offline-host scenario, also fine for fresh joiners). The standard pattern for the multi-tab / anonymous-with-link UX is log replay — it doesn’t require any peer to be online.

The relay (star) or the signaling/discovery mechanism (mesh) emits these to existing peers when membership changes:

{
"type": "peer-join",
"messageId": "uuid",
"peer": {
"publicKey": "MCowBQ...",
"transportId": "...",
"joinedAt": 12345
}
}
{
"type": "peer-leave",
"messageId": "uuid",
"peerPublicKey": "MCowBQ..."
}

Peers update their local membership Sets on each event. In mesh, peer-join also typically triggers a new WebRTC connection negotiation between the new peer and the existing peer (see §6.3).
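A minimal sketch of the local membership bookkeeping, assuming the two message shapes shown above. The `Membership` class is a hypothetical helper; it uses a Map keyed by public key rather than a plain Set so that the full PeerInfo stays available for getPeers().

```typescript
type PeerInfo = { publicKey: string; transportId: string; joinedAt: number };
type MembershipEvent =
  | { type: "peer-join"; messageId: string; peer: PeerInfo }
  | { type: "peer-leave"; messageId: string; peerPublicKey: string };

// Local view of who is connected, keyed by identity public key.
class Membership {
  private peers = new Map<string, PeerInfo>();

  // Apply one membership event received from the relay or signaling server.
  apply(event: MembershipEvent): void {
    if (event.type === "peer-join") {
      this.peers.set(event.peer.publicKey, event.peer);
    } else {
      this.peers.delete(event.peerPublicKey);
    }
  }

  // Snapshot of the currently connected peers (excluding self).
  list(): PeerInfo[] {
    return Array.from(this.peers.values());
  }
}
```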

{
"type": "snapshot-request",
"messageId": "uuid",
"atHlc": 1730412345000123,
"requestedBy": "MCowBQ..."
}
{
"type": "snapshot-response",
"messageId": "uuid",
"inReplyTo": "uuid-of-request",
"snapshot": { /* per op-vocab §10.3 */ },
"tail": [ /* signed ops with hlc > snapshot.generatedAt */ ]
}

The respondent (typically the relay in star, or any peer in mesh) calls target.exportSnapshot() and optionally target.exportOpLog({ sinceHlc: snapshot.generatedAt }) to assemble the tail. The joiner hydrates via createOpenFileTarget({ snapshot, opLogTail }).

If a peer can’t fulfill a snapshot request (e.g., they don’t have state at the requested HLC, or they’re under load), they respond with:

{
"type": "snapshot-response",
"messageId": "uuid",
"inReplyTo": "uuid-of-request",
"error": "no-snapshot-at-hlc" | "load-shedding" | ...
}

The joiner retries against a different peer.
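One way the retry behaviour could look on the joiner side is sketched below. `SnapshotSource` is a hypothetical name for the slice of the Transport interface that the sketch uses, `pullSnapshot` is a hypothetical helper, and the 5000 ms default timeout is an assumption rather than a value from this spec.

```typescript
// Only the part of the Transport interface this sketch needs.
interface SnapshotSource {
  requestSnapshot(
    fromPeer: string,
    opts?: { atHlc?: number; timeoutMs?: number },
  ): Promise<Uint8Array>;
}

// Try each candidate peer in turn and return the first snapshot that arrives.
async function pullSnapshot(
  transport: SnapshotSource,
  candidates: string[],
  timeoutMs = 5000, // assumed default, not specified by the protocol
): Promise<Uint8Array> {
  let lastError: unknown = new Error("no candidate peers");
  for (const peer of candidates) {
    try {
      return await transport.requestSnapshot(peer, { timeoutMs });
    } catch (err) {
      // Refused, timed out, or peer gone: remember why and try the next peer.
      lastError = err;
    }
  }
  throw lastError;
}
```

If every candidate fails, the caller sees the last error and can fall back to log replay (§4.10).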

{
"type": "watermark",
"messageId": "uuid",
"fromPeer": "MCowBQ...",
"hlc": 1730412345000123
}

Periodically (every N ops or T seconds — implementation choice), each peer broadcasts its current applied-watermark to peers. Other peers track the per-peer values. The session-wide watermark for GC is min(observed peer watermarks) — once that minimum advances, ops below it are GC-eligible (op-vocab §6.2).
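The minimum computation can be sketched as follows. Two details here are assumptions, not requirements from this spec: the local watermark is folded into the minimum, and the result is withheld until every currently connected peer has reported at least once. `sessionWatermark` is a hypothetical helper name.

```typescript
// Latest watermark reported by each peer, keyed by public key.
type WatermarkTable = Map<string, number>;

// Session-wide GC watermark: the minimum over self and all connected peers.
// Returns undefined while any connected peer has not yet reported.
function sessionWatermark(
  localHlc: number,
  observed: WatermarkTable,
  connectedPeers: string[],
): number | undefined {
  let min = localHlc;
  for (const peer of connectedPeers) {
    const hlc = observed.get(peer);
    if (hlc === undefined) {
      return undefined; // unknown peer state: do not advance GC
    }
    min = Math.min(min, hlc);
  }
  return min;
}
```

Holding back while a peer is unreported is the conservative choice: garbage-collecting an op that a silent peer still needs would break convergence for that peer.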

{
"type": "op-set-summary",
"messageId": "uuid",
"fromPeer": "MCowBQ...",
"summary": {
"perAuthor": {
"MCowBQ...alice": { "maxSeq": 47, "hash": "..." },
"MCowBQ...bob": { "maxSeq": 92, "hash": "..." }
}
}
}

Periodic exchange (less frequent than watermarks — every minute, say). Discrepancies between peer summaries surface missing ops; see §11 for the resolution protocol.
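As an illustration of how discrepancies might be surfaced from the per-author format shown above, the sketch below compares a local summary with a peer's. The `Discrepancy` type and `diffSummaries` helper are hypothetical, and treating an absent author as maxSeq -1 is an assumption; the actual resolution protocol is the one in §11.

```typescript
type OpSetSummary = {
  perAuthor: { [author: string]: { maxSeq: number; hash: string } };
};

type Discrepancy = { author: string; kind: "behind" | "ahead" | "diverged" };

// Compare the local summary with a peer's summary, author by author.
function diffSummaries(local: OpSetSummary, remote: OpSetSummary): Discrepancy[] {
  const authors = Array.from(
    new Set([...Object.keys(local.perAuthor), ...Object.keys(remote.perAuthor)]),
  ).sort();
  const out: Discrepancy[] = [];
  for (const author of authors) {
    const mine = local.perAuthor[author];
    const theirs = remote.perAuthor[author];
    const mySeq = mine === undefined ? -1 : mine.maxSeq;
    const theirSeq = theirs === undefined ? -1 : theirs.maxSeq;
    if (mySeq < theirSeq) {
      out.push({ author, kind: "behind" }); // we are missing ops the peer has
    } else if (mySeq > theirSeq) {
      out.push({ author, kind: "ahead" }); // the peer is missing ops we have
    } else if (mine !== undefined && theirs !== undefined && mine.hash !== theirs.hash) {
      out.push({ author, kind: "diverged" }); // same maxSeq, different op sets
    }
  }
  return out;
}
```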

{
"type": "rejection",
"messageId": "uuid",
"toPeer": "MCowBQ...",
"opId": { "author": "...", "seq": 7 },
"reason": "untrusted-author"
}

Sent to the emitting peer when their op was rejected. See §13 for which reasons are safe to share and which aren’t.

The protocolVersion field in hello / welcome declares the protocol version. v1 = 1. If peers’ versions don’t match:

  • Relay (or first-contact peer) responds with welcome carrying error: "version-mismatch". Joiner displays “this app version isn’t compatible with the session.”
  • Future versions may support backward compatibility via downgrade negotiation. v1 just fails cleanly.

Schema changes within v1 are additive only (new optional fields, new message types). Breaking changes bump to v2.

A new joiner whose state is empty (or far behind) requests stored ops from the relay. Unlike snapshot exchange (§4.5), log replay doesn’t depend on any peer being online — the relay serves from its own buffer.

Request:

{
"type": "log-replay-request",
"messageId": "uuid",
"fromHlc": null, // null = from beginning; or an HLC for incremental
"requestedBy": "MCowBQ..."
}

Response stream. The relay sends one log-replay-chunk per op, in arrival order (which is approximately HLC order but the receiver’s causal buffer handles out-of-order anyway):

{
"type": "log-replay-chunk",
"messageId": "uuid",
"inReplyTo": "uuid-of-request",
"seqInReplay": 0, // 0-indexed; lets receiver detect drops
"op": { /* signed op */ }
}

Followed by:

{
"type": "log-replay-end",
"messageId": "uuid",
"inReplyTo": "uuid-of-request",
"totalSent": 142,
"watermark": 1730412345000123 // relay's current watermark
}

Applying the replay. The joiner applies each op through the standard pipeline — signature verification, chain walk, capability check, chess-data validation. Same code path as receiving live ops. The causal buffer handles out-of-order arrival.

Live ops during replay. Ops broadcast to the session while the replay is in flight are also delivered to the joiner via the normal op channel. The joiner’s apply pipeline deduplicates by opId, so overlap with the replay is harmless.
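A joiner-side sketch of the behaviour described above: replayed and live ops share one entry point that deduplicates by opId, and seqInReplay is used to detect dropped chunks. The `ReplayReceiver` class is hypothetical, and the `SignedOp` shape (an `opId` with `author` and `seq`) is a stand-in modelled on RejectionInfo; the real envelope is defined by the identity spec.

```typescript
type OpId = { author: string; seq: number };
type SignedOp = { opId: OpId }; // stand-in for the real signed envelope
type ReplayChunk = { type: "log-replay-chunk"; inReplyTo: string; seqInReplay: number; op: SignedOp };
type ReplayEnd = { type: "log-replay-end"; inReplyTo: string; totalSent: number };

class ReplayReceiver {
  private seen = new Set<string>();
  private nextSeq = 0;
  private applyOp: (op: SignedOp) => void;

  constructor(applyOp: (op: SignedOp) => void) {
    this.applyOp = applyOp; // the standard verify-and-apply pipeline
  }

  // Shared entry point for live and replayed ops; returns false on duplicates.
  receiveOp(op: SignedOp): boolean {
    const key = op.opId.author + ":" + op.opId.seq;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.applyOp(op);
    return true;
  }

  // Chunks must arrive with consecutive seqInReplay values starting at 0.
  receiveChunk(chunk: ReplayChunk): void {
    if (chunk.seqInReplay !== this.nextSeq) {
      throw new Error("replay gap: expected " + this.nextSeq + ", got " + chunk.seqInReplay);
    }
    this.nextSeq += 1;
    this.receiveOp(chunk.op);
  }

  // The end marker confirms that the number of chunks matches totalSent.
  receiveEnd(end: ReplayEnd): void {
    if (end.totalSent !== this.nextSeq) throw new Error("replay incomplete");
  }
}
```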

Replay vs snapshot — when to use which.

  • Replay is the default for the anonymous-with-link / multi-tab case. Doesn’t require an online peer. Cost: O(ops) bytes; for long-lived sessions this can be large.
  • Snapshot (§4.5) is the optimization for online-host scenarios where another peer can serve a compact snapshot of derived state. Snapshots compress repeated mutations to a single per-register entry. Cost: smaller, but requires an online provider.

Apps can choose. The reference WebSocket client (transport implementation) defaults to: try snapshot first if other peers exist, fall back to log replay if no peers or snapshot request times out.

Incremental replay. The fromHlc field supports resume — a peer that disconnects and reconnects can request only ops with hlc > lastSeen. The relay filters its log accordingly.
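On the relay side, the filtering could be sketched as below. It assumes the relay records each op's HLC alongside the stored bytes at arrival time so that it can filter without interpreting the rest of the op; the spec does not say how the relay obtains that value. `LogEntry` and `buildReplay` are hypothetical names, and messageId fields are omitted for brevity.

```typescript
type LogEntry = { hlc: number; op: unknown }; // op is opaque to the relay
type ReplayMessage =
  | { type: "log-replay-chunk"; inReplyTo: string; seqInReplay: number; op: unknown }
  | { type: "log-replay-end"; inReplyTo: string; totalSent: number; watermark: number };

// Build the full response stream for one log-replay-request.
function buildReplay(
  log: LogEntry[],
  requestId: string,
  fromHlc: number | null,
  watermark: number,
): ReplayMessage[] {
  // null means "from the beginning"; otherwise keep ops with hlc > fromHlc.
  // filter() preserves arrival order, which is the order the spec requires.
  const selected = fromHlc === null ? log : log.filter((entry) => entry.hlc > fromHlc);
  const out: ReplayMessage[] = selected.map((entry, index) => ({
    type: "log-replay-chunk" as const,
    inReplyTo: requestId,
    seqInReplay: index,
    op: entry.op,
  }));
  out.push({ type: "log-replay-end", inReplyTo: requestId, totalSent: selected.length, watermark });
  return out;
}
```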

Log GC at the relay. The relay’s op log can be pruned for ops below the session’s watermark — those ops are causally settled and can’t be needed by any future joiner (snapshot would carry them in derived form). Pruning policy is implementation-defined; reference implementation keeps everything in v0.2 and adds policy in v0.3.


§5 WebSocket Transport (reference impl for star)

The reference implementation for star topology. Peers connect to a WebSocket relay server; the relay forwards messages between peers.

const transport = createWebSocketTransport({
url: 'wss://relay.example.com/session/abc-123',
onError: (err) => ...,
reconnect: { enabled: true, maxAttempts: 10, backoffMs: 1000 },
});

The client:

  • Opens a WebSocket to the URL.
  • Sends hello immediately on open; awaits welcome.
  • Receives all subsequent messages and dispatches per type:
    • op → onOp handlers
    • peer-join / peer-leave → update peers set, fire handlers
    • snapshot-response → resolve the pending requestSnapshot promise matching the request ID
    • snapshot-request → invoke the consumer’s snapshot handler; send response
    • watermark → fire onWatermark handlers
    • rejection → fire onRejection handlers
  • Sends outbound messages by serializing canonical JSON + ws.send().
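The per-type dispatch above might be sketched with a discriminated union. Only three message types are modelled to keep the example short, the `Handlers` shape and `dispatch` helper are hypothetical, and silently ignoring unknown types is an assumption that fits the additive-only schema rule in §4.9.

```typescript
type OpRef = { author: string; seq: number };
type WireMessage =
  | { type: "op"; messageId: string; op: unknown }
  | { type: "watermark"; messageId: string; fromPeer: string; hlc: number }
  | { type: "rejection"; messageId: string; toPeer: string; opId: OpRef; reason: string };

type Handlers = {
  onOp: (op: unknown) => void;
  onWatermark: (peer: string, hlc: number) => void;
  onRejection: (opId: OpRef, reason: string) => void;
};

// Parse one inbound text frame and route it; returns false if unhandled.
function dispatch(raw: string, handlers: Handlers): boolean {
  const msg = JSON.parse(raw) as WireMessage;
  switch (msg.type) {
    case "op":
      handlers.onOp(msg.op);
      return true;
    case "watermark":
      handlers.onWatermark(msg.fromPeer, msg.hlc);
      return true;
    case "rejection":
      handlers.onRejection(msg.opId, msg.reason);
      return true;
    default:
      return false; // unknown type: ignored in this sketch
  }
}
```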

The relay is a forwarding switch with a per-session op log, not a logic server. Per session (URL path or query param), it maintains:

  • A connection map: publicKey → WebSocket
  • Session metadata (rootKeys, startFen) — seeded by the first peer’s HELLO; immutable thereafter
  • An ordered list of every signed op broadcast through the session — the session op log. Used to serve log-replay-requests from joiners. The relay never inspects op content.
  • A current watermark — heuristically updated as ops flow through.

For each connection:

  1. On open, await hello. Validate sessionId.
  2. Session seeding (v0.2):
    • If the relay has no record of this sessionId AND hello.seedSessionMeta is present, store the metadata. The seeding peer is the bootstrapping owner.
    • If the relay has a record, the existing metadata is authoritative; ignore any seedSessionMeta in this HELLO.
    • If the relay has no record AND no seedSessionMeta was supplied, respond with welcome carrying sessionMeta: null and let the client decide how to proceed (typically disconnect).
  3. Send welcome with sessionMeta, current peer list, watermark, and log size (v0.2).
  4. Broadcast peer-join to all existing peers in this session; add this peer to the map.
  5. For each subsequent message from this peer:
    • op → append to session op log AND forward to all OTHER peers in this session. Update watermark heuristic.
    • log-replay-request (v0.2) → stream all ops in the session log (filtered by fromHlc if supplied) as log-replay-chunk messages, ending with log-replay-end. See §4.10.
    • snapshot-request → forward to a chosen peer (random, least-loaded). The relay does NOT serve snapshots itself — that requires interpreting op content.
    • snapshot-response → forward to the originally-requesting peer
    • watermark → forward to all OTHER peers
    • rejection → forward to the targeted peer
  6. On disconnect, broadcast peer-leave; remove from map. Session state (op log, metadata) is retained as long as the session has at least one peer connected OR within a configurable TTL of the last peer’s disconnect.
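The op-handling branch of step 5 can be sketched as a small per-session object. `RelaySession` and the `Send` callback type are hypothetical; the callback stands in for a WebSocket's send method so the example runs without a network. The stored and forwarded value is the raw message text, which keeps the relay from modifying or interpreting content.

```typescript
type Send = (raw: string) => void;

class RelaySession {
  private peers = new Map<string, Send>(); // publicKey -> outbound channel
  private log: string[] = []; // raw op messages in arrival order

  join(publicKey: string, send: Send): void {
    this.peers.set(publicKey, send);
  }

  leave(publicKey: string): void {
    this.peers.delete(publicKey);
  }

  // Handle an "op" message: store the bytes, then forward them unchanged
  // to every peer in the session except the sender.
  onOp(fromPublicKey: string, raw: string): void {
    this.log.push(raw);
    this.peers.forEach((send, publicKey) => {
      if (publicKey !== fromPublicKey) send(raw);
    });
  }

  // Value reported as logSize in welcome.
  logSize(): number {
    return this.log.length;
  }
}
```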

The relay does not:

  • Verify signatures (verification happens at endpoints per identity spec v0.2 §4)
  • Walk delegation chains (chain verification happens at endpoints)
  • Interpret op content (the relay never parses chess data or trust state)
  • Modify message contents (canonical JSON is preserved byte-for-byte)
  • Issue trust grants on anyone’s behalf (option 1 from the architectural discussion is explicitly NOT taken; the relay is a storage authority, not a trust authority)
  • Hold a link table (short-token → credentials) for app-style share URLs. URL scheme is app territory — see identity v0.2 §8.5. Apps that want Lichess-style short URLs build that on their own app server, not on the OpenFile relay.

The v0.2 expansion is narrow: the relay now holds bytes, not meanings. The trust model is still anchored in cryptographic delegation chains rooted at the session’s seeded rootKeys; the relay can withhold or reorder ops (a denial-of-service vector) but cannot forge authority.

Durability. v0.2 reference relay keeps the op log in memory. Production relays should persist to disk for crash resilience (SQLite, append-only file, etc.). The interface (log-replay streaming) is identical regardless of storage backend; persistence is a deployment choice.

Bounded buffers. A relay with finite memory needs an eviction policy. Reference implementation v0.2: unbounded (development / small sessions). Production policy: prune ops below the session’s watermark, periodically request a peer-provided snapshot to compress state. v0.3 spec will codify a recommended policy.

When a connection drops:

  • The client transitions to 'reconnecting' state, fires onState.
  • It retries with exponential backoff (initial 1s, doubling up to cap, jitter to avoid thundering herd).
  • On successful reconnect, it sends hello again. The relay treats this as a fresh join (broadcasts peer-join).
  • The reconnected peer requests a fresh snapshot from a peer to catch up on missed ops.
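The backoff schedule could be computed as in the sketch below. The 30-second cap and the "equal jitter" strategy (half the delay fixed, half randomized) are assumptions; the spec only calls for doubling from 1s up to a cap, with jitter. `backoffDelayMs` is a hypothetical helper, and the `random` parameter exists so the function can be tested deterministically.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), in milliseconds.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 30000, // assumed cap; the spec leaves the value open
  random: () => number = Math.random,
): number {
  // Double on each attempt, but never exceed the cap.
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Equal jitter: keep half the delay fixed and randomize the other half,
  // so clients that dropped together do not all retry at the same instant.
  return exponential / 2 + random() * (exponential / 2);
}
```

With the defaults, the first retry lands somewhere between 0.5s and 1s after the drop, and later retries spread out up to the cap.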

(A reconnect-with-resume feature could carry “last-seen HLC” in hello, letting the relay ship just the ops since then. v2 optimization; v1 keeps it simple via snapshot.)


§6 WebRTC Transport (reference impl for mesh)

The reference implementation for mesh topology. Peers connect to a signaling server to discover each other and exchange ICE candidates, then establish direct WebRTC data channels and communicate peer-to-peer thereafter.

The signaling server is a dumb message relay for connection setup only (see §7 for its full spec). For each peer:

  1. Peer connects to signaling server (WebSocket).
  2. Peer sends hello carrying its public key + sessionId.
  3. Signaling server adds peer to session’s connected list; sends welcome with current peers.
  4. For each existing peer in the session, the new peer initiates a WebRTC PeerConnection: creates offer, sends offer via signaling, awaits answer, exchanges ICE candidates via signaling.
  5. Once a WebRTC data channel opens to the existing peer, the signaling channel for that pair is no longer needed. (The signaling connection itself stays open to learn about new joiners.)

The signaling server never sees ops, snapshots, or anything else. It only carries WebRTC setup messages and membership events.

WebRTC data channels are configured for OpenFile’s needs:

const channel = peerConnection.createDataChannel('openfile', {
ordered: true,
maxRetransmits: undefined, // unlimited; mimics TCP reliability
protocol: 'openfile-v1',
});

This gives reliable + ordered delivery, matching the WebSocket transport’s semantics.

After signaling, each peer has direct WebRTC connections to every other peer in the session — a full mesh. To broadcast an op, the peer sends it on every channel.

Connection count: N×(N-1)/2 for N peers. Acceptable for small sessions (≤10 peers). For larger meshes, apps should switch to star topology.
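To make the scaling concrete, here is a small worked calculation. `meshConnections` is a hypothetical helper, not part of the transport API.

```typescript
// Full-mesh sizing for n peers: total links in the session, and the number
// of data channels each individual peer has to maintain.
function meshConnections(n: number): { total: number; perPeer: number } {
  if (n < 2) return { total: 0, perPeer: 0 };
  return { total: (n * (n - 1)) / 2, perPeer: n - 1 };
}
```

For the 10-peer upper bound mentioned above, this gives 45 connections in total and 9 channels per peer, so each broadcast op costs the sender 9 sends.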

Topology changes:

  • A new peer joining triggers WebRTC negotiations with every existing peer. The signaling server brokers them in parallel.
  • A peer leaving (clean disconnect): each remaining peer sees the WebRTC connection close and emits peer-leave locally.
  • A peer leaving (unclean — network drop): WebRTC detects via heartbeat/timeout; behaves the same.

Watermark and gossip messages are sent on every data channel (same broadcast pattern as ops).

Snapshot requests are sent to a single chosen peer (random, least-recently-used, or app-policy). If that peer fails to respond, retry against a different one.

WebRTC’s NAT traversal works for ~85-90% of network conditions but fails on:

  • Symmetric NATs (common on corporate networks)
  • Some carrier-grade NATs
  • Restrictive firewalls

For these cases, options are:

  1. TURN server (relay over UDP/TCP through a stable IP). v2+ feature — apps configure their own TURN servers if needed.
  2. Fallback to WebSocket transport — if WebRTC connection establishment fails for too long, the app prompts the user to reconnect via a relay. Requires the app to operate one; not automatic in v1.

v1 reports the failure via onState('closed') with an error reason; apps surface this in UI. NAT-traversal reliability is a known trade-off for mesh; star is the production answer when reliability across all network conditions is required.


The signaling server is a minimal NAT-traversal endpoint for the WebRTC (mesh) topology. Scope: just enough to get two browsers’ WebRTC connections to establish. Nothing more.

What it does:

  • Accepts WebSocket connections from peers.
  • Groups peers by sessionId.
  • Forwards WebRTC setup messages (offer, answer, ICE candidates) between peers in the same session.
  • Emits peer-join / peer-leave events to existing peers when membership changes.

What it does NOT do:

  • See ops, snapshots, or any OpenFile content.
  • Track session state beyond “which peers are connected.”
  • Verify identity (peer key validation happens at the data-channel level after handshake).
  • Persist anything (it’s all in-memory; restart = sessions reset).
  • Authenticate users (anyone can connect; if the session uses a secret, peers present it in their hello).
  • Match-make (“find me an opponent”). Out of scope; apps build that themselves.

The signaling server is its own tiny wire protocol — separate from the OpenFile transport wire protocol. Messages:

{ "type": "hello", "publicKey": "...", "sessionId": "..." }
{ "type": "welcome", "peers": [...] }
{ "type": "peer-join", "peer": {...} }
{ "type": "peer-leave", "peerPublicKey": "..." }
{ "type": "offer", "fromPeer": "...", "toPeer": "...", "sdp": "..." }
{ "type": "answer", "fromPeer": "...", "toPeer": "...", "sdp": "..." }
{ "type": "ice", "fromPeer": "...", "toPeer": "...", "candidate": "..." }

These match WebRTC’s signaling needs precisely. Once peers have WebRTC data channels, the signaling server’s job for that pair is done.
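A minimal sketch of the routing logic behind these messages, assuming an abstract `send` callback in place of real WebSocket writes. `createSignalingRouter`, `handle`, and `disconnect` are hypothetical names, and the shape of a peer entry (`{ publicKey }`) is an assumption; this is not the reference server.

```typescript
type PeerInfo = { publicKey: string }; // assumed shape
type SignalMessage =
  | { type: 'hello'; publicKey: string; sessionId: string }
  | { type: 'welcome'; peers: PeerInfo[] }
  | { type: 'peer-join'; peer: PeerInfo }
  | { type: 'peer-leave'; peerPublicKey: string }
  | { type: 'offer'; fromPeer: string; toPeer: string; sdp: string }
  | { type: 'answer'; fromPeer: string; toPeer: string; sdp: string }
  | { type: 'ice'; fromPeer: string; toPeer: string; candidate: string };

// `send` stands in for "write this JSON to that peer's WebSocket".
type Send = (toPublicKey: string, msg: SignalMessage) => void;

function createSignalingRouter(send: Send) {
  const members: Record<string, string[]> = Object.create(null); // sessionId -> keys
  const sessionOf: Record<string, string> = Object.create(null); // key -> sessionId

  function disconnect(publicKey: string): void {
    const sessionId = sessionOf[publicKey];
    if (sessionId === undefined) return;
    const remaining = members[sessionId].filter((k) => k !== publicKey);
    delete sessionOf[publicKey];
    if (remaining.length === 0) delete members[sessionId];
    else members[sessionId] = remaining;
    remaining.forEach((k) => send(k, { type: 'peer-leave', peerPublicKey: publicKey }));
  }

  function handle(msg: SignalMessage): void {
    if (msg.type === 'hello') {
      const newcomer = msg.publicKey;
      const sessionId = msg.sessionId;
      // A key that re-hellos into a different session leaves the old one first.
      if (sessionOf[newcomer] !== undefined && sessionOf[newcomer] !== sessionId) {
        disconnect(newcomer);
      }
      const existing = (members[sessionId] || []).filter((k) => k !== newcomer);
      send(newcomer, { type: 'welcome', peers: existing.map((k) => ({ publicKey: k })) });
      existing.forEach((k) => send(k, { type: 'peer-join', peer: { publicKey: newcomer } }));
      members[sessionId] = existing.concat([newcomer]);
      sessionOf[newcomer] = sessionId;
    } else if (msg.type === 'offer' || msg.type === 'answer' || msg.type === 'ice') {
      // Forward setup messages only between peers of the same session.
      const session = sessionOf[msg.fromPeer];
      if (session !== undefined && session === sessionOf[msg.toPeer]) send(msg.toPeer, msg);
    }
    // Anything else is ignored: the server never carries ops or snapshots.
  }

  return { handle, disconnect };
}
```

The router trusts `fromPeer` as given, which matches the stated scope: identity is validated later, at the data-channel level.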

A reference implementation in ~150 lines of Node.js ships in the OpenFile repo under reference-servers/. Apps either:

  • Deploy it as-is (Heroku, or anything else that runs Node; non-Node runtimes such as Cloudflare Workers would need a port).
  • Modify it (add auth, rate limits, observability).
  • Reimplement from this spec (Go, Rust, Python — straightforward).
  • Use a public signaling service (if one exists with compatible protocol).

The signaling server is stateless except for the in-memory peer map. It can scale horizontally with a shared session store (Redis, etc.) — that’s app territory.


For both star and mesh, joining a session looks like:

Fresh-session creator path (the first peer in a session):

  1. Generate session ID (random UUID or similar) and owner keypair.
  2. Connect to transport endpoint.
  3. Send hello with publicKey, sessionId, AND seedSessionMeta: { rootKeys: [ownerPubKey], startFen }.
  4. Receive welcome confirming the seeded metadata.
  5. Construct OpenFileTarget with the sessionMeta + ownerSigner. The target auto-emits the root delegation as op #0 (identity v0.2 §1.7). This op is broadcast and stored in the relay’s op log.
  6. Begin emitting and receiving ops.

Joining-existing-session path (joining via link or invite):

  1. Parse the URL hash for sessionId + link credentials per identity v0.2 §7.
  2. Generate a fresh tab keypair (K_tab).
  3. Connect to transport endpoint.
  4. Send hello with publicKey=K_tab, sessionId. No seedSessionMeta (the session already exists).
  5. Receive welcome with sessionMeta, watermark, and log size.
  6. Send log-replay-request (v0.2) — receives all ops from the relay’s session log, including the root delegation and any intermediate delegations.
  7. Construct OpenFileTarget with the received sessionMeta (no ownerSigner). Apply each replayed op through the standard pipeline (signature + chain verification).
  8. Emit per-tab delegation signed by linkSk, granting K_tab the link’s capability bundle. Broadcast via op.
  9. Begin emitting and receiving chess ops, signed by K_tab.

The joining peer is now a full participant (or scoped per the link’s capability bundle).
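The difference between the two hello messages in the paths above can be sketched as follows. The `Hello` shape follows the appendix with `seedSessionMeta` added as described in the creator path; the exact placement of that field on the wire, the helper names, and passing `messageId` in as an argument are assumptions made for illustration.

```typescript
type SessionMeta = { rootKeys: string[]; startFen: string };

type Hello = {
  type: 'hello';
  messageId: string;
  publicKey: string;
  sessionId: string;
  protocolVersion: 1;
  seedSessionMeta?: SessionMeta; // present only on the creator path
};

// Standard chess starting position, used as a default for illustration.
const STANDARD_START_FEN = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1';

// Creator path, step 3: the owner key doubles as the only root key.
function creatorHello(
  messageId: string,
  sessionId: string,
  ownerPubKey: string,
  startFen: string = STANDARD_START_FEN,
): Hello {
  return {
    type: 'hello',
    messageId: messageId,
    publicKey: ownerPubKey,
    sessionId: sessionId,
    protocolVersion: 1,
    seedSessionMeta: { rootKeys: [ownerPubKey], startFen: startFen },
  };
}

// Joiner path, step 4: a fresh tab key and no seed (the session already exists).
function joinerHello(messageId: string, sessionId: string, tabPubKey: string): Hello {
  return {
    type: 'hello',
    messageId: messageId,
    publicKey: tabPubKey,
    sessionId: sessionId,
    protocolVersion: 1,
  };
}
```

Keeping the seed optional on a single message type means the relay can tell the two paths apart by field presence alone.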

The v0.1 “trust store with addTrustedKey ops” model is replaced by the v0.2 delegation-chain model. The bootstrap question becomes: “how does K_tab earn a delegation chain to a session root?”

Closed-invite session. The owner has pre-emitted a delegate op naming the joiner’s pubkey as audience. The joiner already has the delegation in the log they replay; their ops verify against the chain immediately.

Open-with-link / Powerline session (anonymous-with-link). The URL contains a link keypair. The joiner generates K_tab and uses the link’s private key to sign a per-tab delegation: K_tab inherits the link’s capabilities (or a strict subset).

The relay never participates in the trust decision. It simply stores and forwards the delegation op; existing peers verify it via the same pipeline used for any other op.

Truly-hostless / future P2P. When no relay is online, peers exchange delegations via direct WebRTC. The same envelope and chain walk apply.

See identity v0.2 §7 (Powerline) and §8 (per-link generation lifecycle) for the cryptographic details of these flows.

Clean — peer calls transport.disconnect(). Transport closes the WebSocket / WebRTC connections; relay (or signaling) emits peer-leave to other peers.

Unclean — connection drops (network failure, browser crash). Detection:

  • WebSocket: server’s onclose event fires after the heartbeat times out (~30-60s).
  • WebRTC: the peer connection’s iceConnectionState transitions to disconnected then failed (iceConnectionState is a property of RTCPeerConnection, not of the data channel).

Either way, peer-leave is emitted to remaining peers. The departed peer’s already-applied ops stay; only future ops are blocked (which is moot if the peer is gone).
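For the WebSocket case, server-side heartbeat detection can be sketched as a small tracker that is told when each peer was last heard from and is swept periodically. This is a sketch under assumptions: `createHeartbeatMonitor`, `beat`, and `sweep` are hypothetical names, and time is passed in explicitly so the logic stays deterministic.

```typescript
function createHeartbeatMonitor(timeoutMs: number) {
  // publicKey -> timestamp (ms) of the last heartbeat seen from that peer
  const lastSeen: Record<string, number> = Object.create(null);

  return {
    // Call whenever a heartbeat (or any traffic) arrives from the peer.
    beat(peer: string, nowMs: number): void {
      lastSeen[peer] = nowMs;
    },
    // Returns peers silent for longer than timeoutMs and forgets them.
    // The caller would emit peer-leave for each returned key.
    sweep(nowMs: number): string[] {
      const expired = Object.keys(lastSeen).filter((p) => nowMs - lastSeen[p] > timeoutMs);
      expired.forEach((p) => { delete lastSeen[p]; });
      return expired;
    },
  };
}
```

A relay would typically call `sweep` on a timer and treat every returned key as an unclean leave.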


The joiner requests a snapshot from a chosen peer (or the relay):

joiner ──snapshot-request{atHlc: W}──→ chosen-peer
chosen-peer ──snapshot-response{snapshot, tail}──→ joiner

The response includes:

  • A snapshot at HLC ≤ atHlc.
  • The op-log tail from snapshot’s HLC to the responder’s current HLC.

The joiner hydrates via createOpenFileTarget({ snapshot, opLogTail }). Once hydrated, the joiner’s HLC and trust state match the responder’s view as of the response time.
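On the responder side, choosing the snapshot and cutting the tail might look like the sketch below. The op and snapshot types are reduced to the fields needed here (`state` is a placeholder), `buildSnapshotResponse` is a hypothetical name, and HLCs are treated as plain numbers as in the appendix.

```typescript
type SignedOp = { hlc: number; author: string; seq: number }; // reduced for illustration
type SnapshotData = { generatedAt: number; state: string };   // `state` is a placeholder
type SnapshotResponse = {
  snapshot?: SnapshotData;
  tail?: SignedOp[];
  error?: 'no-snapshot-at-hlc';
};

function buildSnapshotResponse(
  snapshots: SnapshotData[],
  log: SignedOp[],
  atHlc: number,
): SnapshotResponse {
  // Pick the newest snapshot whose HLC does not exceed the requested one.
  let best: SnapshotData | undefined;
  for (const s of snapshots) {
    if (s.generatedAt <= atHlc && (!best || s.generatedAt > best.generatedAt)) best = s;
  }
  if (!best) return { error: 'no-snapshot-at-hlc' };

  // Tail: every op after the snapshot, up to the responder's current log end.
  const base = best.generatedAt;
  const tail = log.filter((op) => op.hlc > base).sort((a, b) => a.hlc - b.hlc);
  return { snapshot: best, tail: tail };
}
```

Note that the tail runs to the responder's current HLC rather than stopping at `atHlc`, which is what lets the joiner go live immediately after hydration.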

Who serves the snapshot depends on topology. In star: typically the relay (if it maintains a snapshot cache), or the host peer if the relay doesn’t. Apps configure which.

In mesh: any connected peer. Strategies:

  • Pick the longest-online peer (most likely to have a comprehensive view).
  • Pick a random peer (load distribution).
  • App-configurable.

If the chosen source fails (timeout, refused, returns error), retry against another. If all sources fail, the joiner enters a “waiting for snapshot” state and the app surfaces a “connection problem” UI.
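Source selection and failover can be sketched as two small functions. The sketch is synchronous for brevity (a real transport would await a response or a timeout), and `SourcePeer`, `connectedAtMs`, `orderSources`, and `firstSuccessful` are hypothetical names introduced only for this example.

```typescript
type SourcePeer = { publicKey: string; connectedAtMs: number }; // assumed shape

// Orders candidate sources by strategy without mutating the input.
function orderSources(
  peers: SourcePeer[],
  strategy: 'longest-online' | 'random',
  rand: () => number = Math.random,
): SourcePeer[] {
  const order = peers.slice();
  if (strategy === 'longest-online') {
    // Earliest connection time first.
    return order.sort((a, b) => a.connectedAtMs - b.connectedAtMs);
  }
  // Fisher-Yates shuffle for load distribution.
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = order[i];
    order[i] = order[j];
    order[j] = tmp;
  }
  return order;
}

// Tries each source in turn; a null result or a thrown error both count
// as failure. Returns null when every source failed, at which point the
// caller would enter the "waiting for snapshot" state.
function firstSuccessful<T>(
  sources: SourcePeer[],
  attempt: (peer: SourcePeer) => T | null,
): T | null {
  for (const peer of sources) {
    try {
      const result = attempt(peer);
      if (result !== null) return result;
    } catch (_err) {
      // Treat an error like a refusal and move on to the next source.
    }
  }
  return null;
}
```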

The joiner’s snapshot reflects the responder’s state at response time. Every op the responder had applied when it generated the snapshot is included in the snapshot, regardless of which peer emitted it. Ops that reach the responder after snapshot generation but before the response is sent are in the tail.

After hydration, the joiner is “live” — receiving fresh ops from peers. Ops with hlc > snapshot.generatedAt arriving from anyone are applied normally; the snapshot got them to a consistent baseline from which divergence is impossible (per HLC monotonicity).


The op-vocab spec defines three sync modes (§9.7): collaborative, follow-leader, spectator. The transport enforces them on the emit side:

  • collaborative — transport’s send(op) actually transmits.
  • follow-leader — transport’s send(op) is a local no-op. The op is applied locally (so the local view is consistent) but never reaches the wire.
  • spectator — same as follow-leader: local-only.

Receive side is unchanged: ops from configured upstream / any peer arrive and apply per the sync-mode acceptance rules.

The transport doesn’t need to know HOW the op was signed (some modes use ephemeral keys for local-only ops, per identity spec). It just routes — or doesn’t — based on construction-time policy.
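The emit-side gate can be sketched as a wrapper chosen once at construction time. `createGatedSend` and `wireSend` are hypothetical names; the boolean return value is an addition for illustration and is not part of the Transport interface.

```typescript
type SyncMode = 'collaborative' | 'follow-leader' | 'spectator';

// Wraps the real wire send. Only collaborative mode transmits; the other
// two modes make send a no-op (the op is still applied locally elsewhere).
// Returns true when the op actually went to the wire.
function createGatedSend<Op>(
  mode: SyncMode,
  wireSend: (op: Op) => void,
): (op: Op) => boolean {
  const transmits = mode === 'collaborative';
  return (op: Op): boolean => {
    if (transmits) wireSend(op);
    return transmits;
  };
}
```

Because the decision is fixed at construction, the gate never inspects the op or its signature.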


The identity spec notes that signatures don’t protect against bad receivers (peers who drop legitimate ops). This section provides the optional mitigation: peers periodically exchange op-set summaries; discrepancies surface drops.

Each peer’s summary lists, for every author other than itself, a (maxSeq, hash) pair:

{
  "perAuthor": {
    "MCowBQ...alice": { "maxSeq": 47, "hash": "h0..." },
    "MCowBQ...bob": { "maxSeq": 92, "hash": "h1..." }
  }
}

Where maxSeq is the highest seq this peer has applied from author and hash is a deterministic hash of the applied-set of author’s ops up to maxSeq (e.g., XOR of opId hashes, or a Merkle root).
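One way to compute such a summary, using the XOR option mentioned above. FNV-1a (32-bit) stands in for a real hash purely to keep the sketch dependency-free; a deployment would more plausibly use a cryptographic hash. The `summarize` and `fnv1a32` names and the `author:seq` hashing input are assumptions, and the input list is assumed to be a set (a duplicated op would cancel itself out under XOR).

```typescript
type OpId = { author: string; seq: number };
type AuthorSummary = { maxSeq: number; hash: string };

// 32-bit FNV-1a over UTF-16 code units (illustrative, not cryptographic).
function fnv1a32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h;
}

function summarize(applied: OpId[]): { perAuthor: Record<string, AuthorSummary> } {
  const acc: Record<string, { maxSeq: number; xor: number }> = Object.create(null);
  for (const op of applied) {
    const cur = acc[op.author] || { maxSeq: op.seq, xor: 0 };
    cur.maxSeq = Math.max(cur.maxSeq, op.seq);
    // XOR is commutative, so the result does not depend on arrival order.
    cur.xor = (cur.xor ^ fnv1a32(op.author + ':' + op.seq)) >>> 0;
    acc[op.author] = cur;
  }
  const perAuthor: Record<string, AuthorSummary> = Object.create(null);
  Object.keys(acc).forEach((author) => {
    const hex = ('0000000' + acc[author].xor.toString(16)).slice(-8);
    perAuthor[author] = { maxSeq: acc[author].maxSeq, hash: hex };
  });
  return { perAuthor: perAuthor };
}
```

Two peers that applied the same ops in different orders produce identical summaries, which is the property the gossip comparison relies on.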

When peer A receives B’s summary:

  • For each author K in both summaries:
    • If A[K].maxSeq < B[K].maxSeq, A is behind on K’s ops; request them from B (regular snapshot or targeted op replay).
    • If A[K].hash != B[K].hash despite the same maxSeq, A and B hold different applied-sets for K, which indicates selective drops.

The detection is approximate: a hash collision (or, with Bloom-filter summaries, a false positive) can hide a real discrepancy, so an individual round may miss a drop. Systematic drops are still caught.
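The comparison rules above can be sketched from peer A's point of view. `compareSummaries` and the `Finding` type are hypothetical names; only authors present in both summaries are examined, as in the rules.

```typescript
type AuthorSummary = { maxSeq: number; hash: string };
type Summary = { perAuthor: Record<string, AuthorSummary> };
type Finding = { author: string; kind: 'behind' | 'divergent' };

// Compares A's summary (mine) with B's (theirs).
function compareSummaries(mine: Summary, theirs: Summary): Finding[] {
  const findings: Finding[] = [];
  Object.keys(theirs.perAuthor).forEach((author) => {
    // Only authors that appear in both summaries are compared.
    if (!Object.prototype.hasOwnProperty.call(mine.perAuthor, author)) return;
    const a = mine.perAuthor[author];
    const b = theirs.perAuthor[author];
    if (a.maxSeq < b.maxSeq) {
      findings.push({ author: author, kind: 'behind' }); // request ops from B
    } else if (a.maxSeq === b.maxSeq && a.hash !== b.hash) {
      findings.push({ author: author, kind: 'divergent' }); // applied-sets differ
    }
  });
  return findings;
}
```

When A is ahead of B on some author, nothing is reported here: B reaches the same conclusion when it runs the comparison on A's summary.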

On detected discrepancy:

  • The lagging peer requests missing ops via snapshot or targeted op-replay.
  • Both peers continue normal operation; convergence resumes once catch-up completes.

If discrepancy persists across multiple gossip rounds with the same peer, that peer may be the bad actor. App-level policy decides: disconnect, flag, audit. The op layer doesn’t punish; it just surfaces.

This is an optional feature on the Transport interface (§3). The reference WebSocket transport may or may not include it (lean: skip in v1; relay-based deployments rarely face censorship since the relay sees all). The reference WebRTC transport may include a minimal version (mesh deployments lack a central witness and benefit more from gossip).

v2 may upgrade to required and standardize the summary format.


Per identity spec §4.3, receivers can reject ops for various reasons. The transport optionally notifies the emitter.

| Reason | Notify? | Why |
| --- | --- | --- |
| invalid-signature | No | Sharing leaks crypto failure detail; potential side channel for attackers. |
| untrusted-author | Yes | Useful UX (“you’re not authorized in this session”). |
| invalid-op (illegal SAN, etc.) | Yes | Useful for debugging emitter bugs. |
| replay / idempotence-dup | No | Silent dedup is correct behavior. |
| below-watermark | Yes | The emitter is too far behind; “fetch snapshot” UX. |

invalid-signature rejections are silent because confirming “your signature was invalid” leaks information about cryptographic state that attackers could probe. Duplicates are dropped silently because dedup is the expected behavior, not an error. The remaining reasons are safe to share.
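The notification policy in the table can be encoded as a lookup. This is a sketch: `NOTIFY_EMITTER` and `rejectionFor` are hypothetical names, the `'replay'` label is shorthand for the table's "replay / idempotence-dup" row, and addressing the notification to the op's author assumes the author is the emitter.

```typescript
type RejectReason =
  | 'invalid-signature'
  | 'untrusted-author'
  | 'invalid-op'
  | 'replay'
  | 'below-watermark';

// Mirrors the Notify? column of the table.
const NOTIFY_EMITTER: Record<RejectReason, boolean> = {
  'invalid-signature': false,
  'untrusted-author': true,
  'invalid-op': true,
  'replay': false,
  'below-watermark': true,
};

type Rejection = {
  type: 'rejection';
  messageId: string;
  toPeer: string;
  opId: { author: string; seq: number };
  reason: RejectReason;
};

// Returns the message to send, or null when the rejection stays silent.
function rejectionFor(
  reason: RejectReason,
  opId: { author: string; seq: number },
  messageId: string,
): Rejection | null {
  if (!NOTIFY_EMITTER[reason]) return null;
  // Assumption: the op's author is the peer that emitted it.
  return { type: 'rejection', messageId: messageId, toPeer: opId.author, opId: opId, reason: reason };
}
```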

Per §4.8.

If multiple peers reject the same op, the emitter may receive multiple notifications. The transport doesn’t deduplicate; apps present a “consensus” rejection UI if multiple peers reject (vs. “one peer’s network issue” if only one does).
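On the app side, telling a "consensus" rejection from a single-peer one comes down to counting distinct rejecting peers for the same op. A minimal sketch, assuming the app collects the sender's key for each rejection it receives (the sender is known from the channel the message arrived on); `classifyRejections` is a hypothetical name.

```typescript
type RejectionVerdict = 'none' | 'single-peer' | 'consensus';

// rejectingPeers: public keys of peers that rejected one particular op.
// Repeated notifications from the same peer count once.
function classifyRejections(rejectingPeers: string[]): RejectionVerdict {
  const distinct: Record<string, true> = Object.create(null);
  rejectingPeers.forEach((p) => { distinct[p] = true; });
  const count = Object.keys(distinct).length;
  if (count === 0) return 'none';
  return count === 1 ? 'single-peer' : 'consensus';
}
```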


A connection can drop for transport-specific reasons:

  • WebSocket: TCP drop, server crash, client browser tab backgrounded for too long.
  • WebRTC: ICE failure, network change, restrictive NAT, peer crash.

In both cases:

  • Transport transitions to reconnecting.
  • Outbound queue holds pending ops.
  • On reconnect, ops in queue are flushed; a fresh snapshot is fetched to catch up on missed inbound ops.
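The queue-then-flush behavior above can be sketched as follows. `createOutboundQueue` and `wireSend` are hypothetical names, and the `'connected'` state name is an assumption (the text names only reconnecting and closed). Fetching the fresh snapshot after reconnect is left to the caller.

```typescript
type TransportState = 'connected' | 'reconnecting' | 'closed';

function createOutboundQueue<Op>(wireSend: (op: Op) => void) {
  let state: TransportState = 'reconnecting';
  const pending: Op[] = [];

  // Sends queued ops in the order they were emitted.
  function flush(): void {
    while (pending.length > 0) wireSend(pending.shift() as Op);
  }

  return {
    send(op: Op): void {
      if (state === 'connected') wireSend(op);
      else if (state === 'reconnecting') pending.push(op); // hold until reconnect
      else throw new Error('transport is closed');
    },
    setState(next: TransportState): void {
      state = next;
      if (next === 'connected') flush();
    },
    pendingCount(): number {
      return pending.length;
    },
  };
}
```

Flushing in FIFO order keeps the emitter's own ops in sequence, which the reliable-ordered-delivery requirement depends on.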

A peer can’t establish a WebRTC connection to another peer due to NAT restrictions:

  • Connection attempt times out.
  • Transport surfaces onState('closed') with reason.
  • App may prompt the user to retry, or recommend switching to a relay-backed deployment.

A crashed peer is detected via WebSocket heartbeat (server-managed timeout) or WebRTC ICE failure, and is treated as an unclean leave.

If the relay crashes, all clients lose their WebSocket. They enter reconnecting and retry. Once the relay is back, they reconnect and re-sync. If the relay was the only snapshot authority and didn’t persist, late joiners can’t catch up until other peers reconnect (since they hold state in-memory).

For production deployments: the relay SHOULD persist (write op log to a database). Apps that don’t need durability can skip persistence; those that do treat the relay as the source of truth.

If the signaling server goes down, existing WebRTC connections are unaffected (they’re peer-to-peer; the signaling server isn’t in the data path). New joiners can’t connect until the signaling server is back, but established peers continue normally.

This is a real architectural strength of mesh: the signaling server is only critical during connection establishment.

If peers split into disjoint groups by network failure, each group continues independently. Ops emitted in group A don’t reach group B. On reconnection (partition heals), peers exchange watermarks and op-set summaries; missing ops in either direction are requested. Eventually-consistent merge resumes.

The CRDT design (op-vocab) handles this naturally — no special partition-tolerance code required at the transport layer.


14.1 Star deployment: tournament broadcast

import { createOpenFileTarget } from 'openfile';
import { createWebSocketTransport } from 'openfile/transport-ws';
import { createWebCryptoSigner, createWebCryptoVerifier } from 'openfile/crypto';

// loadKeypairFromAccount, createGameFromTarget, tournamentId, onTree and
// onCursor are app-provided and not shown here.
const myKeypair = await loadKeypairFromAccount();
const sessionMeta = await fetch('/api/session/' + tournamentId).then(r => r.json());

const target = createOpenFileTarget({
  transport: createWebSocketTransport({
    url: `wss://relay.tournaments.example.com/session/${tournamentId}`,
    publicKey: myKeypair.publicKey,
    sessionId: tournamentId,
  }),
  rootKeys: sessionMeta.rootKeys,
  signer: createWebCryptoSigner(myKeypair),
  verifier: createWebCryptoVerifier(),
  syncMode: 'spectator', // viewers can't broadcast
});

await target.transport.connect();
const game = createGameFromTarget(target, { onTree, onCursor });

Spectator-mode peers receive moves as the players make them; their local analysis stays local. The relay only forwards ops and, as of v0.2, keeps the per-session op log for late joiners; authentication, signature verification, and trust derivation are all client-side.

The mesh equivalent, joined from a shared link:

import { createOpenFileTarget } from 'openfile';
import { createWebRTCTransport } from 'openfile/transport-webrtc';
import { generateEphemeralKeypair, createWebCryptoSigner, createWebCryptoVerifier } from 'openfile/crypto';

// Parse session from link. parseLink, createGameFromTarget, onTree and
// onCursor are app-provided and not shown here.
const { sessionId, rootKey, secret, signalingUrl } = parseLink(window.location);

// Fresh ephemeral keypair for this browser.
const myKeypair = await generateEphemeralKeypair();

const target = createOpenFileTarget({
  transport: createWebRTCTransport({
    signalingUrl,
    publicKey: myKeypair.publicKey,
    sessionId, // was an undefined linkSessionId; now taken from the parsed link
    sessionSecret: secret, // for trust bootstrap
  }),
  rootKeys: [rootKey],
  signer: createWebCryptoSigner(myKeypair),
  verifier: createWebCryptoVerifier(),
  syncMode: 'collaborative',
});

await target.transport.connect();
const game = createGameFromTarget(target, { onTree, onCursor });

Two browsers, no server (besides the public signaling server for NAT traversal). Once WebRTC connects, the signaling server is out of the path. Closing both browsers ends the session (no persistence by default; apps can wire IndexedDB if they want resume).


§15 What’s deliberately NOT in the spec

  • Application-level identity / authentication — identity spec.
  • Op signing / verification details — identity spec.
  • Op vocabulary, validation rules, apply machinery — op-vocab spec.
  • Persistence formats — op-vocab spec §10.
  • TURN server configuration — app/deployment policy.
  • Relay scaling / sharding — deployment territory.
  • Matchmaking — app-level product, not transport.
  • Presence features — peer cursor sharing, “Alice is here” UX. Layered above this spec; uses membership events as substrate.
  • Channel encryption beyond TLS/DTLS — WebSocket-over-WSS and WebRTC-over-DTLS provide channel security. End-to-end encryption inside (so the relay can’t read content) is v2+.

Open questions:

  1. Reconnect-with-resume vs. snapshot-on-reconnect. §5.3 says “fetch a fresh snapshot on reconnect.” A reconnect-with-resume feature carries last-seen HLC in hello; relay ships just the ops since then. Optimization; v2 if profiling shows reconnect is too slow.

  2. Gossip-protocol summary format. §11.1 sketches per-author maxSeq + hash. Bloom filters would be more compact for large per-author seq ranges; trade-off is false-positive complications. Defer to implementation; reference impls pick one and document.

  3. Relay-as-authority snapshot vs peer-as-authority. §9.2 leaves this app-configurable. Best-practice guidance: in star, relays that persist serve snapshots; in mesh, any peer. Worth documenting patterns more concretely as deployment guides accumulate.

  4. Cross-relay federation — multiple relays for one session, gossiping among themselves. v2+ for very-large-session apps.

  5. Backwards-compatible protocol evolution. v1 fails cleanly on version mismatch (§4.9). When v2 lands, do we support v1 ↔ v2 downgrade negotiation? Probably yes; the cost is small and the adoption story is much better.

Decisions settled by this spec:

  • Topology choice → both, behind a pluggable Transport interface (§1.1).
  • WebSocket vs WebRTC → both, as two reference impls.
  • Wire format → JSON canonical (matches signing format from identity spec).
  • Reliability semantics → reliable + ordered required at the Transport interface; both reference impls provide it (§1.2).
  • Membership awareness → strong, baked into the Transport interface (§1.4).
  • Snapshot exchange → pull only (§1.5).
  • Signaling server scope → bare minimum, NAT-traversal only (§7.2).
  • Relay verification responsibilities → none. Relay is a forwarder; verification at endpoints (§5.2).

After this spec is locked:

  1. Reference WebSocket transport — client + relay server. ~300 LOC client, ~200 LOC server.
  2. Reference signaling server — for WebRTC. ~150 LOC Node.js.
  3. Reference WebRTC transport — client. ~500 LOC.
  4. Integration with OpenFileTarget — wire the Transport interface into the op layer’s emit / receive pipeline.
  5. Connection-lifecycle UX — reconnect, snapshot-on-rejoin.
  6. Gossip protocol — optional v1, recommended for mesh.
  7. Sample apps — the §14 examples as runnable demos.

Steps 1-4 are required for any networked OpenFile deployment. Steps 5-7 land progressively as real consumers materialize.


Appendix: protocol version 1 message reference


For quick reference, all v1 message types and their fields:

// Sent peer → relay/peer
type Hello = {
  type: 'hello';
  messageId: string;
  publicKey: string;
  sessionId: string;
  sessionSecret?: string;
  protocolVersion: 1;
};

// Sent relay/peer → joiner
type Welcome = {
  type: 'welcome';
  messageId: string;
  inReplyTo: string;
  sessionMeta: { rootKeys: string[]; startFen: string };
  currentPeers: PeerInfo[];
  watermark: number;
  protocolVersion: 1;
  error?: 'version-mismatch' | 'session-not-found' | 'auth-failed';
};

// Sent peer → all peers
type OpMessage = {
  type: 'op';
  messageId: string;
  op: SignedOp;
};

// Sent relay/signal → peers
type PeerJoin = {
  type: 'peer-join';
  messageId: string;
  peer: PeerInfo;
};

type PeerLeave = {
  type: 'peer-leave';
  messageId: string;
  peerPublicKey: string;
};

// Sent peer → peer
type SnapshotRequest = {
  type: 'snapshot-request';
  messageId: string;
  atHlc: number;
  requestedBy: string;
};

type SnapshotResponse = {
  type: 'snapshot-response';
  messageId: string;
  inReplyTo: string;
  snapshot?: SnapshotData;
  tail?: SignedOp[];
  error?: 'no-snapshot-at-hlc' | 'load-shedding' | 'refused';
};

// Sent peer → all peers
type Watermark = {
  type: 'watermark';
  messageId: string;
  fromPeer: string;
  hlc: number;
};

// Sent peer → all peers (optional)
type OpSetSummary = {
  type: 'op-set-summary';
  messageId: string;
  fromPeer: string;
  summary: { perAuthor: { [author: string]: { maxSeq: number; hash: string } } };
};

// Sent peer → emitter (optional)
type Rejection = {
  type: 'rejection';
  messageId: string;
  toPeer: string;
  opId: { author: string; seq: number };
  reason: 'invalid-signature' | 'untrusted-author' | 'invalid-op' | 'below-watermark';
};