Skip to content

Output Model

Status: all three reducer families built (2026-05-29). The GameSet (game-bitmap) reducer (--output → PMOTE-BM), the aggregate reducer (--heatmap → PMOTE-HM, --group-by pawn-structure --top-n → PMOTE-GB), and now the position-output reducer (--positions ref|fen|both [--positions-unique] [--limit N] → PMOTE-PS / .fen) are live in query-engine. The one remaining spec-only item is Partitioned GameSet (design below, not built). This is the design for the output/reducer axis — the tie-together piece the earlier specs were missing — and the conceptual model below governs all three. A reducer is stage 5 of the execution model (the reduce of a single-pass fold). For exactly what runs, see README → Status.

A scan emits a stream; a reducer folds it. Concretely, a scan walks the candidate games once and produces a stream of matching positions (game, ply, board) — every ply where the position predicate holds, with the replayed board in hand. A reducer is a fold over that stream (EXECUTION-MODEL § stage 5). The three families are three folds: per-game dedups the stream to its distinct games (a bitmap); aggregate accumulates it into a bounded summary; position-output passes it through raw (or deduped to distinct positions). The board is already materialized at each emission, so nothing downstream needs re-derive it unless the reducer discards it.

A query has three axes:

  1. Predicatewhat to look for in a single position (or header). See PREDICATE-LANGUAGE / PREDICATE-LIBRARY.
  2. Quantifierhow per-position results collapse into a verdict on a whole game (Ever, Streak, …). See PREDICATE-LANGUAGE.
  3. Output (the reducer)what you get back. This document.

The first two decide which positions/games match; this one decides what the engine produces from those matches — a game set, a position list, or an aggregate. Getting this axis right is what lets one predicate serve many features, and it is the seam the GPU backend and the collections layer (Shape 2) build on.

The core model: one map, a pluggable reduce

Section titled “The core model: one map, a pluggable reduce”

Every scan, regardless of output, has the same skeleton:

for each game g in candidate_set: # candidate_set = header-tier survivors
board = initial_position
reducer.begin_game(g)
for ply in g:
make_move(board, g[ply]) # advance state (cheapest variant — see MaintenanceNeeds)
if predicate(board): # ── THE MAP ──
if not reducer.accept(g, ply, board): # ── THE REDUCE ── (returns "want more?")
break # reducer declined the rest of this game (early-exit)
reducer.end_game(g)
return reducer.finalize()

The decision this doc commits to: the reducer(s) are a parameter of the scan, not a separate hand-written scan per output type. Predicate (the map) and output (the reduce) are orthogonal; the scan driver is written once and parameterized by both. A scan may attach more than one reducer — the map runs once and each matching position fans out to all of them (see Open decision #7, resolved).

Reducer:
begin_game(game_id) # init per-game state (e.g. a streak counter)
accept(game_id, ply, board) -> bool # per matching position; return false to stop this game
end_game(game_id) # finalize the per-game decision (e.g. Count >= m)
finalize() -> Output # the in-memory artifact (bitmap / stream / aggregate); a sink persists it if asked
needs() -> MaintenanceNeeds # engine state this reducer requires (see below)

The reducer must compile away. accept runs on the order of 10^8–10^9 times per corpus scan; it cannot be a virtual call, and the match stream must never be materialized just to be reduced. The implementation is a compile-time policy (a C++ template parameter) monomorphized and inlined into the scan loop — the generic scan<Pred, Reducer> compiles to exactly the code a hand-written single-purpose loop would. This is the same idiom already proven in the engine’s make_move<mm_opts::…> specializations.

The one place map and reduce are not fully decoupled: accept returns a “want-more” signal, so a reducer can stop a game early (the game-bitmap Ever reducer stops at the first hit). That back-edge from reduce → map is the only coupling.

  • The walk is written once. Game iteration, replay, candidate prefiltering, sharding, threading — one driver, shared by every output. New outputs don’t re-derive (and re-bug) it.
  • The predicate is reused across outputs. “Sicilian with an isolated d-pawn” is one kernel; ask for the games, the positions, or a heatmap without rewriting it. This is the leverage the multi-scan analytical features need.
  • Quantifiers and outputs unify onto one axis (see below), shrinking both the conceptual and the code surface.
  • The GPU backend is a reducer swap, not a rewrite: map = a kernel over positions; reduce = a (segmented) reduction.
  • The cost model composes: cost(scan) ≈ walk(candidates) + Σ map + reduce, the reduce term differing predictably per reducer.

Three built families — per-game (GameSet), position-output, and aggregate — plus one designed-not-built (Partitioned GameSet). Each is a fold over the match stream; they differ in what the fold keeps.

Folds per-game matches into one bit per game — an in-memory dense bitmap (sharded to match the corpus); this is the GameSet value. The fold’s question is “which games?” — it dedups the match stream to its distinct games. It’s persisted only on request, in PMOTE-BM .bm format.

  • Output: bounded + dense — 1 bit/game (~1.25 MB / 10M games), fixed size regardless of match count.
  • Early-exit: yes — once a game’s verdict is decided, skip its remaining plies (quantifier-dependent; see below).
  • Quantifiers live here. The per-game fold is the quantifier (next section).
  • Composition: set algebra (and/or/xor/sub/not) — the Shape-1 layer.
  • Foundation for collections (Shape 2): a collection is a named, persisted game-bitmap with a stored definition.

Built (2026-05-28). query-engine produces this bitmap via --output @name|PATH in PMOTE-BM format (interoperates with bitmap-combine), with --input-bitmap as a cascade-AND prefilter. It was the first reducer built; the per-game fold runs on the BB_ONLY scan path under the CSW condition model.

2. Position-output (the matching positions themselves)

Section titled “2. Position-output (the matching positions themselves)”

Built (2026-05-29). Emits one record per matching position — the “where/what” of each match — rather than folding them away. It is the unbounded/sparse sibling of the GameSet, and it has two orthogonal axes plus a cap:

  • payloadwhat each record carries:
    • ref — a lean (shard, game, ply) reference (12-byte binary record). “Where.”
    • fen — the board materialized as a FEN, one per line. “What position.”
    • both — the ref and the FEN (two sidecar files).
    • hash — a position-identity hash (design; the engine computes this hash already for unique, so exposing it as a payload is a small add — not yet a CLI flag).
  • modehow the stream collapses:
    • stream (default) — every occurrence: each matching ply emits a record. “Where (all of it).”
    • unique (--positions-unique) — dedup to distinct positions by a position-identity hash (a hash set). “What positions appear.”
  • --limit N — stop after N records are emitted (the backpressure cap; see Open decision #6).

The FEN-list is the fen payload (optionally with unique) — not a separate reducer type. So the two headline shapes are: stream answers “where,” and unique + fen answers “what distinct positions appear.”

The FEN is cheap, and cheaper end-to-end than a reference. The scan has the replayed board in hand at the moment of the match, so serializing a FEN is a few dozen bytes of local work. A ref is smaller on the wire, but it defers the cost: any consumer that needs the actual position must replay the game to ply k to reconstruct it. Emitting the FEN now pays once, at the one place the board already exists; emitting a ref pays later, possibly many times, by re-deriving what was discarded. Choose ref when the consumer only needs to locate games/plies (and will go back to the corpus anyway); choose fen when it needs the positions.

  • Output: unbounded + sparse. A loose predicate can match 100M+ positions, so this is a stream/list, never a dense per-position bitmap — and --limit / unique are how you keep it bounded.
  • Early-exit: none from the predicate — every ply of every surviving game is visited; the only early-stop is --limit (which all attached reducers must permit, per #7).
  • Quantifier: none at the game level (you want the positions themselves, not a per-game verdict). The streak-membership case — “emit only positions inside a matched streak” — is the still-open coupling of #5.
  • Composition: not list-intersection. Compose by predicate-AND + a game-set prefilter (--input-bitmap: “positions matching P, within games in bitmap X”).
  • Role: the escape valve — display, app-level consumption, and app-side aggregation for anything the built-in aggregators don’t cover.

As built. query-engine runs the position-output reducer on the same scan_bb_aggregate per-position path the aggregators use (same position predicate, no game-level window, no predicate early-exit). It always prints a summary — positions emitted, plus distinct positions under unique. With --output it writes the records: PMOTE-PS (a 16-byte header magic "PMOTE-PS" | u32 version | u32 count, then count × {u32 shard, u32 game, u32 ply} LE) for ref/both, and a plain-text .fen (one FEN per line) for fen/both. Without --output it counts only — it never materializes the list — which is how the count gate runs over 187 M positions without buffering them. Per-thread accumulation (records, the dedup set) is merged after the join, exactly like the aggregators; --positions-unique reports the exact global-distinct count by merging the per-shard hash sets (the same after-join merge group-by uses).

3. Aggregate (accumulate into a bounded summary)

Section titled “3. Aggregate (accumulate into a bounded summary)”

Built (2026-05-29). A dedicated aggregation scan path (scan_bb_aggregate, separate from the game-bitmap scan_bb_only path) reduces matching positions into a bounded accumulator in-scan, without materializing the position list. It reduces over every matching position — the position-level predicate (DNF-combined C+S; no game-level window quantifier, no early-exit). Game-level filtering is the header prefilter + --input-bitmap, not a quantifier (Open decision #5). Two sub-kinds are live (--heatmap and --group-by pawn-structure); histogram shares the group-by machinery and is not yet exposed as a flag. Sub-kinds:

  • square-heatmap (built — --heatmap) — a per-shard u64 heat[2][6][64] table (color × piece-type × square); each matching position increments every occupied square by its piece type. Output PMOTE-HM (merged to 2×6×64 u64 LE). “Where do the pieces sit in matching positions?”

  • histogram — counts bucketed by a key (result, ply, a derived feature, …). (Same machinery as group-by over a small enum key; not yet exposed as a flag.)

  • group-by (built — --group-by pawn-structure, --top-n K) — counts/aggregates keyed by a derived feature of the position. The key is computed by a small feature kernel — a mini-map feeding the reduce; the first (and currently only) kernel is a pawn-structure signature (a mix of the white/black pawn bitboards). An open-addressing hash-map accumulates key → {count, raw} and finalizes to top-N. Output PMOTE-GB.

  • Output: bounded (fixed for heatmap; key-space-bounded for group-by).

  • Early-exit: no.

  • Composition: terminal — aggregates don’t feed set algebra.

  • Role: the multi-scan analytical features. These almost never want a raw 100M-record list; they want the reduction.

Worked example — the north-star feature. “Extract the pawn structures from my games, compare their frequency, and for the common ones find where masters put their pieces.” Entirely this reducer: (a) group-by structure-signature over my games → frequency histogram; (b) for each common structure S, scan the master corpus with predicate structure == S under a square-heatmap reducer → piece-placement map. One predicate family, two aggregations, dozens of scans — the aggregate-throughput case the whole design targets.

As-built validation (2026-05-29, 10.16M-game corpus). All three reducers reduce over the same position population, which is the cross-check that ties them together. With no predicate, heat[white-king] summed over squares == the group-by grand total == 685,863,447 (the corpus’s true ply count); under --queens-off, heatmap-sum == group-by-total == position-output stream count == 187,591,286 (the per-position reducer rides the identical predicate and emission point). --queens-off --positions-unique182,550,541 distinct positions (≤ 187.6 M — queens-off positions are nearly all distinct); --limit 1000 emits exactly 1000; --positions fen yields valid FENs (no queens, one king per side, round-tripping through the engine’s own loader). The reducers sample POST-move positions (the start layout is never recorded — the top group is the position after 1.e4 e5). The game-bitmap path is unchanged (killer query still reproduces 152,148). Perf over the corpus: heatmap ~1.25 s, group-by ~1.66 s, position-output count ~1.3 s / unique ~2.5 s.

A fourth shape (design): Partitioned GameSet

Section titled “A fourth shape (design): Partitioned GameSet”

Not built — design only. The aggregate family has a group-by that routes each matching position to a counter keyed by a derived feature. Its game-set sibling routes each matching game to a bitmap keyed by a per-game feature: a per-game group-by whose buckets are GameSets, one dense bitmap per key.

group-by : position ─key→ counter[key]++ → counts per key
Partitioned GameSet : game ─key→ bitmap[key].set(game) → a GameSet per key

Example: classify every game by endgame type in one scan — route each game into bitmap["R+P"], bitmap["B vs N"], bitmap["Q+P"], … The result is a family of GameSets, each independently composable with the Shape-1 set algebra and persistable as its own .bm.

  • Why it’s cheap. It is one shared scan (the replay paid once), and K bitmaps cost K bits/game total — at 10 M games, ~1.25 MB per key, so dozens of partitions still fit in RAM. This is the single-fold cost story: many output bitmaps on one scan ≈ free relative to K separate scans.
  • The one new wrinkle — a per-game key needs an assignment rule. A position-keyed group-by keys on the board at that ply; a game-keyed partition must pick which ply’s feature labels the whole game. The rule is a parameter: final position (the endgame it reached), first (its opening character), or ever-featured (route the game into every bucket whose feature it ever exhibited — then the partitions overlap and the bitmaps are not disjoint). Different questions want different rules; the reducer takes it as config.

This slots in as a fourth reducer family with no new machinery beyond the key-assignment rule — a group-by whose accumulator cell is a bit in a per-key bitmap instead of a counter.

The unifier: GameSet and unique-positions are one fold at two grains

Section titled “The unifier: GameSet and unique-positions are one fold at two grains”

The GameSet and the unique position-set are the same fold — “keep each distinct matching thing once” — applied at two granularities:

GameSet (per-game)unique position-output
distinct whatgamespositions
representationdense bitmap (1 bit/game)materialized list (a record per distinct position)
why that representationgames are bounded (one corpus-sized bit vector)positions are unbounded (no fixed universe to index)
dedup keythe game id (intrinsic)a position-identity hash

The only reason they look different is the universe: games have a fixed finite id space, so “distinct games” packs into a dense bitmap indexed by id; positions have no such bounded universe, so “distinct positions” must be a list keyed by a hash. Same intent, two physical shapes forced by boundedness. (And the stream mode is this fold’s identity — keep everything, dedup nothing.) Seeing them as one fold is what makes “give me the games / the positions / the distinct positions” one axis rather than three features.

A quantifier is the game-bitmap reducer’s internal fold — the rule that turns a game’s sequence of per-position results into one yes/no. It is not a peer of the reducer; it lives inside one of them. The other two reducers (position-output, aggregate) take no game-level quantifier.

Every quantifier is the same three-part shape — begin_game (init state), accept (update), end_game (decide) — with an optional early-exit:

Quantifierper-game stateaccept doesearly-exit
Ever(P)set bit, stop1st hit
Never(P)hit ⇒ exclude, stop1st hit
Always(P)¬P ⇒ exclude, stop1st miss
Streak(P, n)run counterinc / reset; n ⇒ set, stopon trigger
Count(P) ≥ mcounterinc; compare at end_gamewhen ≥ m
AtPly(P, k)consider only at ply kafter k
FromPly/UntilPly/BetweenPlyply-windowed Everwindow-dependent
InPhase(P, phase)phase-restricted Ever

So the game-bitmap reducer is really a family parameterized by quantifier; position-output and aggregate are separate families. “Output type” and “quantifier” are the same kind of thing — folds over the per-position predicate results — which is why they share one axis.

GameSet (game-bitmap)Position-outputAggregate
Outputdense bitmap, 1 bit/gamesparse list (ref / fen records)bounded accumulator
Sizefixed (~1.25 MB/10M)unbounded (capped by --limit / unique)fixed / key-bounded
Early-exityes (quantifier)only --limitno
Quantifieryes (the fold)none (stream/unique are the fold)none
Compositionset algebra (Shape-1)predicate-AND + --input-bitmap prefilterterminal
Typical consumerfurther queries, collectionsdisplay / app-side / FEN exportanalytics features
Persistenceopt-in → PMOTE-BM .bmopt-in → PMOTE-PS .ps / .fen textopt-in → PMOTE-HM / PMOTE-GB

Outputs are in-memory by default. Three lifetimes:

  1. Transient — cascade-prefilter steps and the operands of an in-engine set-op. Never written.
  2. Session result — the answer to a one-shot query, or a result the daemon hands back / caches under a handle. In-memory; printed or returned. Persisted only if asked.
  3. Durable / named — saved @name results, shipped accelerator bitmaps, and Shape-2 collections. Persisted to PMOTE-BM on purpose.

Persistence is the exception, not the rule — and for game-bitmaps it is never about capacity: 1 bit/game is ~1.25 MB per 10M games, so hundreds fit in RAM. It is only ever about durability or cross-invocation reuse. (Contrast tree.dat at ~2.4 GB, where size does force on-disk + mmap.)

The in-memory bitmap and the on-disk PMOTE-BM payload are the same bytes (dense per-shard packed bits; the corpus meta supplies the layout). So persist = prepend a 32-byte header + write; load = mmap; and the same libedge set-algebra kernels run on in-memory and mmap’d bits alike. Choosing PMOTE-BM as the format does not commit us to writing files — it just makes writing one a memcpy when a sink asks.

So a reducer yields an in-memory Output; a separate, optional output sink (--output @name / path) serializes it — for the durable cases above, and nothing else.

MaintenanceNeeds: the reducer + predicate declare state

Section titled “MaintenanceNeeds: the reducer + predicate declare state”

The predicate and reducer jointly declare what the inner loop must maintain — and that declaration is the planner’s input for choosing the cheapest ScanPath (PLANNER). make_move is templated by what’s needed:

Needmake_move variantExample
header onlynone (no replay)Ever(result-based) → header-only path (~7 ms)
bitboardsBB_ONLYbishop_pair, material(...)
+ zobrist hashWITH_HASHexact-position / transposition predicates
+ derived tallyBB_ONLY + side-channelmaterial(side) kept as a running sum

Two consequences worth designing for:

  • The reducer’s needs matter, not just the predicate’s. A game-bitmap over a hash-only predicate needs no board reconstruction → header/hash path. A square-heatmap needs the actual piece squares → full board replay. Same predicate, different path depending on the reducer.
  • Incremental derived features. A predicate like material(side) ≤ 13 shouldn’t recompute material every ply: the scan can maintain a running material tally (a capture/promotion updates it by a delta), so the predicate reads a number it already has. This is a maintenance service the planner enables when a predicate declares it needs material — the same mechanism as maintaining the zobrist hash.

Set algebra: composing game-sets (Shape 1)

Section titled “Set algebra: composing game-sets (Shape 1)”

Set algebra (and/or/xor/sub/not) is the composition layer for the game-bitmap output, and only that one (GameSet × GameSet → GameSet). It is one face of a single idea: boolean composition lives at several levels — predicate, quantifier-boundary, and GameSet — and the planner lowers each operator to its cheapest physical form:

  • fused into one predicate (a single scan) — cheapest;
  • a prefilter cascade (AND across scans: scan B over A’s survivors);
  • extensional set algebra on materialized bitmapsrequired only for disjunction and its relatives (OR/XOR/SUB/NOT), which can’t be expressed by narrowing.

Lowering is legal only when it preserves meaning — quantifiers do not generally distribute over a boolean op (Streak(A,5) OR Streak(B,5)Streak(A OR B, 5)), so the level at which an operator sits is part of the query’s meaning, not just an optimization. The set-algebra kernels live in libedge, called both in-engine (set-op AST nodes, combined in memory) and by the standalone bitmap-combine CLI (persisted .bm). See ARCHITECTURE → Bitmap composition and PLANNER.

A query-defined collection is a materialized view: a stored query (predicate + quantifier + game-bitmap reducer) whose result bitmap is persisted and kept current as games are added. It is built directly on the game-bitmap reducer here — which is why getting that reducer and the PMOTE-BM output right is the foundation. Collections, incremental indexing, dedup/delete, and the rest are a separate database-management track, not part of this doc. (Prior art to revisit: TNMP ctx/records.)

1. Reducer in the type system. [RESOLVED 2026-05-28] Not a fourth type — an execution-time choice attached to the Scan node. A Scan carries (corpus, predicate, outputs). Reducers do not compose in the type algebra; composition happens via set-algebra on game-bitmap outputs and the cascade. (outputs is a list — see #7.)

2. Position-output format. [RESOLVED 2026-05-29 — built] Lean binary (game, ply) records by default, with the payload chosen per query. The reducer takes a --positions ref|fen|both axis:

  • refPMOTE-PS, a 16-byte header (magic "PMOTE-PS" | u32 version | u32 count) then count × {u32 shard, u32 game, u32 ply} little-endian — the lean reference; the consumer reconstructs the board by lazy replay to ply k when it needs it.
  • fenplain text, one FEN per line — the board materialized now, while it is in hand at the match (no later replay). The brief’s “snapshot saves a downstream replay” tradeoff, taken explicitly: a FEN is cheap to emit and cheaper end-to-end than a ref whenever the consumer actually needs positions.
  • both → the refs (.ps) and the FENs (.fen).
  • hash payload (position-identity hash) is designed but not yet a CLI flag — the engine already computes it for unique, so it is a small add. No JSON: the lean binary + plain-text FEN cover the wire and the human/pipe cases respectively.

So the resolution is exactly the brief’s “lean records + lazy reconstruction, with an optional richer payload (fen/hash)” — with fen promoted from “optional extra” to a first-class payload because the board is free at emission time.

3. Aggregation surface. [RESOLVED 2026-05-29 — heatmap + group-by built] Three aggregators, one machinery. square-heatmap: a heat[2][6][64] table (color × piece-type × square); each matching position increments every occupied square by its piece type. As built, the per-shard accumulator is u64 (so a single shard can’t overflow); the merged PMOTE-HM payload is 2×6×64 u64 LE. Output PMOTE-HM. group-by: parameterized by a feature-kernel board → u64 key; an open-addressing hash-map key → {count, raw} accumulates, finalizing to top-N (key, count) (the raw feature retained per key for rendering). First kernel: pawn-structure signature (hash of the white/black pawn bitboards). Output PMOTE-GB. histogram is group-by over a small enum key (result, ply-bucket) — same machinery, fixed key space. Game-level filtering is via prefilter, not a quantifier (see #5).

4. Sequential predicates. [RESOLVED 2026-05-28] A predicate is a pure function of the current board + whatever cross-ply state it declares it needs (an extension of MaintenanceNeeds). The cheapest sufficient state, by case: counting results → the quantifier’s counter (no history — e.g. “doubled pawns for ≥5 plies” is just Streak(5) of a one-board predicate); a transition → a 1-bit edge, exposed as Became(P) / Ceased(P) (one prior bit — “a doubled pawn was just created”); a running feature → a maintained tally (e.g. material, like the zobrist hash); and only the rare “compare to the board k plies ago” keeps prior boards. No blanket “every predicate carries a window” — full board-windows are a declared, uncommon need. Ordered gating (Then) is built (single-pass, latched per-game fired-flags, no automaton); While/Until were dropped — continuity tied to an event isn’t a chess need.

5. Quantifier × non-game reducers. [RESOLVED 2026-05-29 — strict] Quantifiers are strictly the game-bitmap reducer’s fold. Aggregation reduces over every matching position (the position-level predicate C+S, DNF-combined; the window quantifier W does not apply); game-level filtering is a separate --input-bitmap prefilter (or a preceding game-bitmap scan). The streak-membership case (stream only positions inside a matched streak) is deferred until a real use case forces it.

6. Position-output backpressure. [RESOLVED 2026-05-29 — built] Three composable controls, not one:

  • --limit N — a hard cap; the scan stops emitting (and, since no later position can be emitted, stops scanning) after N records. Under threads a single shared atomic hands out emission slots 0..N-1, so the kept records total exactly N (validated: --queens-off --limit 1000 → 1000).
  • count-first probe — run the reducer count-only (no --output) to size the result before materializing; with no sink it counts without buffering, so you can probe a 187 M-position result for free, then decide whether to stream, --limit, or tighten the predicate. (The GameSet/aggregate reducers also size their populations the same way.)
  • --positions-unique — collapse the stream to distinct positions, which is the natural cap when you want what positions appear rather than every occurrence.

True streaming-with-backpressure (a bounded channel to a live consumer) is deferred — --limit + count-first + unique cover the batch cases; a streaming sink is a daemon-API concern, not this reducer’s.

7. Multi-output in one pass. [RESOLVED 2026-05-28] Supported: Scan.outputs is a list of reducers. The map (replay + predicate) runs once; each matching position fans out to every attached reducer (set the game bit, append to the stream, tally the heatmap). Two rules: the scan may early-exit only if all reducers permit it (a stream/aggregation forces a full scan; the game-bitmap rides along for free), and the scan maintains the union of the reducers’ MaintenanceNeeds. This is not reducer algebra — just several reducers sharing one pass. (Coupling one reducer’s output to another’s filter — “stream only the positions inside the matched streak” — is the separate open #5.)