Skip to content

Planner

Status: TEMPLATE. Sections below are scaffolds for collaborative authoring. This doc covers the execution model — how query-engine turns a predicate AST into a fast scan.

Implemented today: header predicates + a first slice of position predicates (--queens-off, --bishop-pair, --count) ANDed under --min-streak, on the header-only and BB_ONLY paths, with game-bitmap output. Everything else here is target design. See README → Status.

The planner takes a PREDICATE-LANGUAGE AST and decides:

  • What order to evaluate predicates in
  • Which engine state (none / mailbox / hash) to maintain
  • When to materialize intermediate bitmaps
  • How to parallelize over shards

The goal: minimize wall-clock time for a correct match set, given the known cost structure of the corpus.

Cover: the two-tier cost asymmetry (headers ~7 ms, position scans ~360 ms). Without a planner, position-tier predicates run on the full corpus; with one, they run on the small surviving set after header filtering. The win is ~50× in the typical case.

Cover: the per-predicate cost components and how they’re estimated.

Per-game cost decomposes into:

  • Header read cost — constant. ~7 ns per game (cache-friendly, 36-byte stride). Negligible variation between header predicates.
  • Move-replay cost — proportional to game length × per-ply work. Per-ply work depends on what engine state the predicate needs:
    • None: just decode SMove16 (~3 ns/ply)
    • Bitboards only: make_move<BB_ONLY> (~6 ns/ply)
    • Bitboards + hash: make_move<WITH_HASH> (~10 ns/ply)
  • Per-predicate-evaluation cost — varies wildly. Cheap predicates (bishop_pair: ~2 popcounts) are ~5 cycles; expensive predicates (mate_in_one: full movegen) are thousands.

Predicate metadata declares its engine-state requirement and rough evaluation cost; the planner uses these.

Open question: how do we estimate selectivity? Options:

  • Static hints in PREDICATE-LIBRARY — each predicate declares approximate “matches X% of games” advisory.
  • Per-corpus profile — first run of a predicate writes its actual selectivity to a small cache; subsequent runs use that.
  • No estimate — just use predicate-author-supplied hints.

Cover: how the planner splits the AST into tiers.

For an AST of the form and(p1, p2, p3, ...):

  1. Partition: header_preds = [p for p in preds if isinstance(p, HeaderPred)], game_preds = [p for p in preds if isinstance(p, GamePred)].
  2. Header tier: AND all header_preds into one combined HeaderPred.
  3. Game tier: order game_preds (see below) and chain.

For non-AND root nodes (set algebra at top), the planner descends into each sub-scan independently. Set-algebra operations on resulting bitmaps happen post-scan.

Cover: the single shared header pass.

All header predicates evaluate in one pass over the corpus:

for each game:
read 36-byte header
if all_header_preds(header):
set bit in survivors bitmap

This is bandwidth-bound (header bytes streaming through cache) at ~7 ms for 10M games. Order among header predicates doesn’t matter — they all evaluate per game regardless.

Output: a survivors bitmap (in-memory, one bit per game) holding the games that passed all header predicates.

Cover: the iterative chain over surviving games.

After the header tier, each game-level predicate runs in turn, each filtering the survivor set further:

for each game-pred in ordered(game_preds):
new_survivors = empty bitmap
for each game in survivors:
if game-pred(game):
set game in new_survivors
survivors = new_survivors

Per-game-pred evaluation involves:

  • Mmap the game’s .moves record (via the .gameidx byte offset)
  • Replay plies with the cheapest engine state covering this predicate’s needs
  • Apply the predicate’s quantifier (Ever, Streak, AtPly, etc.) to the ply stream
  • Set the survivor bit iff the predicate’s quantifier accepts

Cover: how the planner orders them.

Each game-pred has:

  • Cost factor: how expensive per ply to evaluate
  • Selectivity hint: fraction of games matching Ever(P)

Greedy ordering: sort by cost / (1 - selectivity) ascending — i.e., “how many CPU cycles do I spend to eliminate one game?”

Open question: when selectivity hints are absent (new predicate, no data), fall back to alphabetical / definition order? Or run an initial cheap-sample pass to get rough numbers? Probably alphabetical for V1; revisit if it shows up as a real loss.

Cover: how the planner picks the cheapest engine state.

Each game-pred declares one of:

  • NONE — works on header alone (shouldn’t be a GamePred, but some quantifiers like AtPly with a header-only predicate could be)
  • BITBOARDS_ONLY — needs Board bitboards updated; uses make_move<BB_ONLY>
  • BITBOARDS_AND_HASH — needs zobrist hash too; uses make_move<WITH_HASH>
  • FULL_MOVES — needs the full Move struct + flags; uses make_move<ALL>

The planner picks the union of all surviving predicates’ requirements. As-built (2026-05-28): MaintenanceNeeds exists but only sets need_replay (true iff a position predicate is present); ScanPath is a two-path dispatch (header-only / BB_ONLY). WITH_HASH and the richer needs (need_hash, need_halfmove) are designed but unwired — and since no position predicate is currently wired, the BB_ONLY path is unreachable in practice. See README → Status.

Cover: cross-query composition. AND, OR, XOR, SUB, NOT on bitmaps.

When the AST root is a set-algebra node (e.g., or(scan(q1), scan(q2))), the planner:

  1. Recursively evaluates each child scan, producing a bitmap.
  2. Applies the set-algebra operation on the bitmaps.

These are the operations that used to live in bitmap-combine.c. As-built (2026-05-28): not implemented in query-engine. Set algebra lives only in the standalone bitmap-combine binary; the engine emits no bitmap to feed it yet. Target: the kernels move to libedge and are called both in-engine (set-op AST nodes, combined in memory) and by the bitmap-combine CLI.

Open question: do we evaluate child scans in parallel? They’re independent. For two-scan OR, parallelism wins. For a deep tree of set algebra, scheduling matters.

Cover: when to keep bitmaps in memory vs spill to disk.

For V1, the simple rule:

  • All bitmaps produced by a query stay in memory for the duration of the query invocation.
  • The user can request persistence via --output @name, which writes the final bitmap to <corpus>/bitmaps/<name>.bm.
  • Intermediate bitmaps in chained predicates are never persisted — they’re transient working state.

Open question: how do we handle “accelerator bitmaps” — pre-computed common subsets (e.g., the engine ships a precomputed @masters alongside the corpus)? They’d be persisted somewhere but not user-managed. (Separate design conversation per the architectural discussion; the planner just consumes whatever’s available.)

Cover: how the planner spawns threads.

Each scan over the corpus parallelizes across shards (one worker per shard, mmap’d independently). The header tier is embarrassingly parallel; the game tier is also per-shard parallel.

Within a single shard, scanning is sequential — games are walked in order.

Across set-algebra operations on independent bitmaps, the operations themselves are O(num_shards × bits_per_shard) byte-bashing — bandwidth- bound, vectorizable, single-threaded today (sub-millisecond at 10M scale, no need to parallelize).

Cover: 3-5 end-to-end traces of how a query becomes work.

Example 1: “Sicilian Najdorf, blitz, both 2500+, white wins, white had bishop pair sustained ≥10 plies starting from move 15.”

Plan:

  1. Header tier (one pass): eco_in(B90-B99) ∧ tc_category(blitz) ∧ both_elo_ge(2500) ∧ result(W) → survivors bitmap, ~N_h games
  2. Game tier (one pass over survivors): Streak(white_bishop_pair, 10) restricted to FromPly(30) Engine state: BB_ONLY (popcounts on bitboards) → final bitmap

Example 2: Set algebra. “Sicilian wins OR French wins, all by master.”

Plan:

  1. Scan A: eco_in(B20-B99) ∧ result(W) ∧ both_elo_ge(2200) → bitmap A
  2. Scan B: eco_in(C00-C19) ∧ result(W) ∧ both_elo_ge(2200) → bitmap B
  3. Set-op: A ∪ B → final bitmap Open question: can the planner detect that scans A and B differ only in one predicate and fuse them into a single scan with a disjunctive header check? Yes for V2; not worth the complexity for V1.

Open list:

  • Selectivity-hint source: static / dynamic / hybrid?
  • Empirical predicate-cost calibration (planner self-tunes)?
  • Parallelism of independent set-algebra sub-scans?
  • Predicate fusion (multiple scans with shared work merged into one)?
  • Predicate-output beyond bitmap (counts, histograms, position-list)?
  • Cancellation / streaming results (if a UI wants progress feedback)?