Planner
Status: TEMPLATE. Sections below are scaffolds for collaborative authoring. This doc covers the execution model — how
query-engineturns a predicate AST into a fast scan.
Implemented today: header predicates + a first slice of position predicates (
--queens-off,--bishop-pair,--count) ANDed under--min-streak, on the header-only and BB_ONLY paths, with game-bitmap output. Everything else here is target design. See README → Status.
The planner takes a PREDICATE-LANGUAGE AST and decides:
- What order to evaluate predicates in
- Which engine state (none / mailbox / hash) to maintain
- When to materialize intermediate bitmaps
- How to parallelize over shards
The goal: minimize wall-clock time for a correct match set, given the known cost structure of the corpus.
Why a planner
Section titled “Why a planner”Cover: the two-tier cost asymmetry (headers ~7 ms, position scans ~360 ms). Without a planner, position-tier predicates run on the full corpus; with one, they run on the small surviving set after header filtering. The win is ~50× in the typical case.
Cost model
Section titled “Cost model”Cover: the per-predicate cost components and how they’re estimated.
Per-game cost decomposes into:
- Header read cost — constant. ~7 ns per game (cache-friendly, 36-byte stride). Negligible variation between header predicates.
- Move-replay cost — proportional to game length × per-ply work.
Per-ply work depends on what engine state the predicate needs:
- None: just decode SMove16 (~3 ns/ply)
- Bitboards only:
make_move<BB_ONLY>(~6 ns/ply) - Bitboards + hash:
make_move<WITH_HASH>(~10 ns/ply)
- Per-predicate-evaluation cost — varies wildly. Cheap predicates
(
bishop_pair: ~2 popcounts) are ~5 cycles; expensive predicates (mate_in_one: full movegen) are thousands.
Predicate metadata declares its engine-state requirement and rough evaluation cost; the planner uses these.
Open question: how do we estimate selectivity? Options:
- Static hints in PREDICATE-LIBRARY — each predicate declares approximate “matches X% of games” advisory.
- Per-corpus profile — first run of a predicate writes its actual selectivity to a small cache; subsequent runs use that.
- No estimate — just use predicate-author-supplied hints.
Predicate partitioning
Section titled “Predicate partitioning”Cover: how the planner splits the AST into tiers.
For an AST of the form and(p1, p2, p3, ...):
- Partition:
header_preds = [p for p in preds if isinstance(p, HeaderPred)],game_preds = [p for p in preds if isinstance(p, GamePred)]. - Header tier: AND all
header_predsinto one combinedHeaderPred. - Game tier: order
game_preds(see below) and chain.
For non-AND root nodes (set algebra at top), the planner descends into each sub-scan independently. Set-algebra operations on resulting bitmaps happen post-scan.
Header-tier execution
Section titled “Header-tier execution”Cover: the single shared header pass.
All header predicates evaluate in one pass over the corpus:
for each game: read 36-byte header if all_header_preds(header): set bit in survivors bitmapThis is bandwidth-bound (header bytes streaming through cache) at ~7 ms for 10M games. Order among header predicates doesn’t matter — they all evaluate per game regardless.
Output: a survivors bitmap (in-memory, one bit per game) holding the
games that passed all header predicates.
Game-tier execution
Section titled “Game-tier execution”Cover: the iterative chain over surviving games.
After the header tier, each game-level predicate runs in turn, each filtering the survivor set further:
for each game-pred in ordered(game_preds): new_survivors = empty bitmap for each game in survivors: if game-pred(game): set game in new_survivors survivors = new_survivorsPer-game-pred evaluation involves:
- Mmap the game’s
.movesrecord (via the.gameidxbyte offset) - Replay plies with the cheapest engine state covering this predicate’s needs
- Apply the predicate’s quantifier (Ever, Streak, AtPly, etc.) to the ply stream
- Set the survivor bit iff the predicate’s quantifier accepts
Ordering game-tier predicates
Section titled “Ordering game-tier predicates”Cover: how the planner orders them.
Each game-pred has:
- Cost factor: how expensive per ply to evaluate
- Selectivity hint: fraction of games matching
Ever(P)
Greedy ordering: sort by cost / (1 - selectivity) ascending — i.e.,
“how many CPU cycles do I spend to eliminate one game?”
Open question: when selectivity hints are absent (new predicate, no data), fall back to alphabetical / definition order? Or run an initial cheap-sample pass to get rough numbers? Probably alphabetical for V1; revisit if it shows up as a real loss.
Engine-state maintenance
Section titled “Engine-state maintenance”Cover: how the planner picks the cheapest engine state.
Each game-pred declares one of:
NONE— works on header alone (shouldn’t be aGamePred, but some quantifiers likeAtPlywith a header-only predicate could be)BITBOARDS_ONLY— needsBoardbitboards updated; usesmake_move<BB_ONLY>BITBOARDS_AND_HASH— needs zobrist hash too; usesmake_move<WITH_HASH>FULL_MOVES— needs the full Move struct + flags; usesmake_move<ALL>
The planner picks the union of all surviving predicates’ requirements.
As-built (2026-05-28): MaintenanceNeeds exists but only sets
need_replay (true iff a position predicate is present); ScanPath is
a two-path dispatch (header-only / BB_ONLY). WITH_HASH and the
richer needs (need_hash, need_halfmove) are designed but unwired —
and since no position predicate is currently wired, the BB_ONLY path is
unreachable in practice. See README → Status.
Set algebra (post-scan)
Section titled “Set algebra (post-scan)”Cover: cross-query composition. AND, OR, XOR, SUB, NOT on bitmaps.
When the AST root is a set-algebra node (e.g., or(scan(q1), scan(q2))),
the planner:
- Recursively evaluates each child scan, producing a bitmap.
- Applies the set-algebra operation on the bitmaps.
These are the operations that used to live in bitmap-combine.c.
As-built (2026-05-28): not implemented in query-engine. Set
algebra lives only in the standalone bitmap-combine binary; the
engine emits no bitmap to feed it yet. Target: the kernels move to
libedge and are called both in-engine (set-op AST nodes, combined in
memory) and by the bitmap-combine CLI.
Open question: do we evaluate child scans in parallel? They’re independent. For two-scan OR, parallelism wins. For a deep tree of set algebra, scheduling matters.
Materialization strategy
Section titled “Materialization strategy”Cover: when to keep bitmaps in memory vs spill to disk.
For V1, the simple rule:
- All bitmaps produced by a query stay in memory for the duration of the query invocation.
- The user can request persistence via
--output @name, which writes the final bitmap to<corpus>/bitmaps/<name>.bm. - Intermediate bitmaps in chained predicates are never persisted — they’re transient working state.
Open question: how do we handle “accelerator bitmaps” — pre-computed common subsets (e.g., the engine ships a precomputed
@mastersalongside the corpus)? They’d be persisted somewhere but not user-managed. (Separate design conversation per the architectural discussion; the planner just consumes whatever’s available.)
Parallelism
Section titled “Parallelism”Cover: how the planner spawns threads.
Each scan over the corpus parallelizes across shards (one worker per shard, mmap’d independently). The header tier is embarrassingly parallel; the game tier is also per-shard parallel.
Within a single shard, scanning is sequential — games are walked in order.
Across set-algebra operations on independent bitmaps, the operations themselves are O(num_shards × bits_per_shard) byte-bashing — bandwidth- bound, vectorizable, single-threaded today (sub-millisecond at 10M scale, no need to parallelize).
Worked execution examples
Section titled “Worked execution examples”Cover: 3-5 end-to-end traces of how a query becomes work.
Example 1: “Sicilian Najdorf, blitz, both 2500+, white wins, white had bishop pair sustained ≥10 plies starting from move 15.”
Plan:
- Header tier (one pass): eco_in(B90-B99) ∧ tc_category(blitz) ∧ both_elo_ge(2500) ∧ result(W) → survivors bitmap, ~N_h games
- Game tier (one pass over survivors): Streak(white_bishop_pair, 10) restricted to FromPly(30) Engine state: BB_ONLY (popcounts on bitboards) → final bitmap
Example 2: Set algebra. “Sicilian wins OR French wins, all by master.”
Plan:
- Scan A: eco_in(B20-B99) ∧ result(W) ∧ both_elo_ge(2200) → bitmap A
- Scan B: eco_in(C00-C19) ∧ result(W) ∧ both_elo_ge(2200) → bitmap B
- Set-op: A ∪ B → final bitmap Open question: can the planner detect that scans A and B differ only in one predicate and fuse them into a single scan with a disjunctive header check? Yes for V2; not worth the complexity for V1.
Open design decisions
Section titled “Open design decisions”Open list:
- Selectivity-hint source: static / dynamic / hybrid?
- Empirical predicate-cost calibration (planner self-tunes)?
- Parallelism of independent set-algebra sub-scans?
- Predicate fusion (multiple scans with shared work merged into one)?
- Predicate-output beyond bitmap (counts, histograms, position-list)?
- Cancellation / streaming results (if a UI wants progress feedback)?