Roadmap

Status of Edge in three columns: what’s built, what’s next, what’s been decided against (and why).

Updates land here as decisions are made. The README links to this file as the source of truth for “what’s the current state.”

Built and exercised

These are validated end-to-end against a 10 M Lichess corpus.

Indexer (`indexer`)

PGN → .scoredb in a single MapReduce pass.
Tiered shard count by PGN size (1 / 4 / 16 / 32 / 64).
Timestamped output directory naming, --out PATH override.
36-byte ScoreHeader with full PGN metadata (result, termination, ECO, ratings, date, TC, player name indices).
16-bit move encoding (SMove16: 4-flag, 4-promotion).
Per-shard player-name dictionaries.
Per-game .gameidx sidecar (uint64 byte offsets) for O(matches) bitmap-filtered scans.
Aggregate edge table persisted as tree.dat (see Tree below).
Chess kernel in position.hpp: piece_at[64] mailbox + templated make_move<Opts> + state-machine SAN parser + NEON-2-way batched movegen + FIDE-2024+ EP semantics (only hash EP when capture is legal — makes transpositions aggregate correctly).

`.scoredb` directory format (v2)

meta key=value file (version, num_shards, total_games, etc.).
shards/N.moves + shards/N.dict + shards/N.gameidx files.
tree.dat at the scoredb root (header + bucket-offset table + packed Edge[] payload).
bitmaps/ subdirectory for stored bitmaps.
See DATA-FORMAT.md. (File extension renamed from .score → .moves in 2026-05; the format itself is unchanged from v2.)

Query engine (`query-engine`) — the CSW model

A query = a boolean combination of conditions over a game’s ply trace, under a global header-predicate AND prefilter. A condition is (C, S, W): composite position-predicate, side-quantifier, window- quantifier. (Grammar: QUERY-LANGUAGE.md; logic: FORMAL-BASIS.md; outputs: OUTPUT-MODEL.md.)

Header predicates (~20): result, ECO range, year, ELO bands, TC category, termination, player names (per-shard dict resolution), --gm/--master. AND-chained; the cheap first tier (~10 ms / 10M).
Position predicates (C): --queens-off, --bishop-pair (the comparative pair), --doubled-pawn, --isolated-pawn, --passed-pawn, --rook-on-seventh (absolute 7th), --rook-on-open-file, --king-castled, and --count EXPR (material mini-language: "QBNqbn=0", "R=r", "P<p", …).
Side quantifiers (S): --side white|black|either|both|neither.
Window quantifiers (W): --streak N/--min-streak, --ever, --always (≥1-ply gated), --never, --at-ply, --from-ply, --until-ply, --between-ply.
Edge transforms: --became / --ceased (1-step-past).
Game-level boolean (DNF): --cond/--and (AND in a group), --or (between groups), --not (per condition).
Per-predicate floors: doubled/bishop default to streak:2.
Scan paths: header-only (~10 ms) and BB_ONLY (~0.3 s / 10M; 6.48 ns/ply hot path) via MaintenanceNeeds + ScanPath. (full-hash path designed, not wired.)
Output: game-bitmap --output @name|PATH (PMOTE-BM); --input-bitmap prefilter (cascade-AND).

Tree (`explorer` + `tree.dat`)

Aggregate edge table persisted by indexer at finalize.
explorer <corpus.scoredb> --fen "FEN" performs FEN → position-hash → bucket scan → moves with W/D/B counts. UCI output today (SAN deferred).
“Explorer” is the chess-player term for a position-tree query tool.
10 M Lichess: tree.dat ~2.4 GB, query ~0.15 ms warm.
FIDE-2024+ EP semantics → transpositions aggregate correctly (e.g., 1.Nf3 d5 2.d4 ≡ 1.d4 d5 2.Nf3).

Bitmap algebra

bitmap-combine: info, count, and, or, xor, sub, not.
PMOTE-BM file format with header, shard table, dense bits.
Byte-identical validation across set operations.

Aggregation reducers (the OUTPUT half — heatmap + group-by)

The model is result = reduce(boolean-combo-of-(C,S,W)). Two of three reducers are built: the game-bitmap reducer (above) and aggregation, on a dedicated per-position scan path (scan_bb_aggregate, separate from the game-bitmap scan_bb_only). The aggregation path reduces over every matching position (the DNF-combined C+S predicate; no game-level window quantifier, no early-exit); game-level filtering stays the header prefilter + --input-bitmap.

square-heatmap (--heatmap): per matching position, increment a heat[color][type][square] table (per-shard u64, overflow-safe). Output PMOTE-HM (merged 2×6×64 u64 LE).
group-by (--group-by pawn-structure, --top-n K): bucket matching plies by a derived feature key via an open-addressing hash-map (key → {count, raw}), finalize to top-N. First feature kernel: pawn-structure signature (mix of the white/black pawn bitboards). Output PMOTE-GB. (histogram = same machinery over a small enum key; not yet exposed.)
Validated on the 10.16M corpus: with no predicate, heat[white-king] summed == group-by grand total == 685,863,447 (the true ply count); under --queens-off, both == 187,591,286. Reducers sample POST-move positions (top group = after 1.e4 e5). Game-bitmap path unchanged (killer 152,148). Perf: heatmap ~1.25 s, group-by ~1.66 s.
position-stream (--positions ref|fen|both, --positions-unique, --limit N): emit a record per matching position — a (shard,game,ply) reference, the board as a FEN (serialized from the board in hand), or both; --output writes PMOTE-PS (binary refs) and/or .fen text. The unbounded sibling of the GameSet reducer. Validated on 10M: --queens-off stream == 187,591,286, unique == 182,550,541 distinct, --limit exact. (All three OUTPUT-MODEL reducer families are now built.)

Validation (M4 Max, 10.16 M-game Lichess corpus)

Corpus: 10,164,006 games / 685,863,447 plies (~67.5 plies/game).
Indexing wall: ~1.79 s (drops 0-move games → 10,164,006 games).
meta total_moves now reports the true ply total (685,863,447) — previously a capped tree-tuple count (199,672,940); the scan/aggregate population matches it exactly.
engine_reset’s start-position init is now a function-local static (C++11 thread-safe), fixing a prior non-atomic lazy-init data race (torn reads under the per-shard worker threads).
Killer rook-endgame query (two --count conditions unioned): 152,148 matches, ~0.3 s per branch (BB_ONLY).
Header-only query (--result W = 5,056,391): ~10 ms.
Tree query (single position via explorer): ~0.15 ms warm.
Engine hot path: 6.48 ns/ply single-thread BB_ONLY.
All 6 standard perft positions pass.
CSW operators pinned by exact cross-checks: never == complement of ever; either + neither == corpus; inclusion-exclusion on white/black/either/both; in-engine OR == external bitmap-combine or (byte-identical); Ever(Became(P)) == Ever(P).

Engine optimization (2026-05-29 campaign)

An 8-commit hardening of query-engine.cpp, every change benchmark- or behavior-gated:

Hot path: the BB-scan gating / per-branch / full-Boolean machinery is monomorphized on a compile-time query shape (<HasGates, HasPbh>), recovering a +21% regression to neutral (queens-off ~0.06 s / 1M).
One scan driver: the game-bitmap and aggregate paths are unified under a single bb_replay driver with BitmapBody / AggregateBody policies (always_inline-flat — a policy over the hot loop costs 27% without it; the benchmark caught it, speculation would have shipped it).
One header parser (the flat CLI routes through build_header_pred), one output-path resolver, dead code removed — net −63 LOC vs pre-campaign.
Latent bug fixed: the PMOTE-HM heatmap truncated counts to u32 (silent overflow past ~60-100M games); widened to u64 (format v2).
Group-by hash measured healthy at scale (avg 1.82 probes over 56.7M keys).

Next — short term

In rough priority order. None are blocked; pick up the one that matches the next use case.

`libedge` extraction

Pull the C primitives out of the four CLI binaries into a shared library (libedge.a + headers). CLI tools become thin wrappers. Enables the daemon and WASM builds without parallel rewrites.

Estimated work: ~300 lines of refactoring + new headers, mostly mechanical.

Daemon

A minimal HTTP loopback service exposing the core operations as JSON-RPC or REST endpoints. Sketch in ARCHITECTURE.md.

Estimated work: ~500 lines for the HTTP shell + JSON encoding. Library-side work is already done; this is purely the integration shell.

Threads decoupled from shards

The indexer currently couples worker threads to shards (each thread writes one shard). For 4-core machines indexing 16-shard corpora, this creates light oversubscription. Switch to a work-queue model: fixed thread pool, chunks pulled by atomic counter, each chunk produces one shard.

Estimated work: ~80 lines for the work queue + chunk-to-buffer decoupling.

Fix indexer game-count drift — RESOLVED (2026-05-28)

The 30,933-game discrepancy was the zero-move (abandoned-before-move-1) games. The indexer now drops move_count==0 games at index, and the .gameidx / .moves / total_games counters are in lockstep (all 10,164,006). This also fixed a latent .gameidx inconsistency (0-move games previously got gameidx entries but weren’t counted).

Aggregation reducers (the OUTPUT half) — BUILT (2026-05-29)

The model is result = reduce(boolean-combo-of-(C,S,W)). square-heatmap (--heatmap) and group-by (--group-by pawn-structure) now run on a dedicated scan_bb_aggregate path, emitting PMOTE-HM / PMOTE-GB — see Built above. The flagship multi-scan analytical feature (extract pawn structures → frequency → where masters place pieces) is now expressible. The third reducer, position-stream (--positions ref|fen|both), also landed (2026-05-29) — all three OUTPUT-MODEL reducer families are now built.

Text-DSL parser (the human query surface) — BUILT (2026-05-29)

--query "<expr>" is a recursive-descent parser for the locked grammar (QUERY-LANGUAGE.md) — nested precedence, the full (C, S, W) condition form, then, value-set brackets — lowering directly to the same engine structs the flat CLI builds (Option 1, no JSON layer), sharing the predicate/header builders so the two surfaces can’t drift. Round-trip validated 9/9 against the flat flags. The canonical JSON AST interchange (Option 2) is deferred until a non-CLI consumer lands.

`then` (ordered gating) + per-branch headers + full-Boolean `C` — BUILT (2026-05-29)

then(P₁,…,Pₙ) adds ordering via single-pass predicate gating (a latched per-game fired-flag per condition; a gated condition counts only at/after its gate fires) — no automaton, no nesting. while/until were dropped: continuity tied to an event isn’t a chess need (streak covers run-length, AND covers co-occurrence). Per-branch headers (a header inside an OR-branch) make the killer union a single (C AND result:W) OR (C AND result:B) query — 152,148 bit-identical, one scan. And a condition’s composite C is now a full Boolean (AND/OR/NOT) over its predicate leaves, not just an AND.

WASM build

Compile libedge to WASM, expose a JS-friendly API. Acknowledge the 4 GB memory ceiling and async filesystem constraints in docs. Smaller- corpora tier; not the canonical delivery target.

Estimated work: ~200 lines of binding code + an Emscripten/wasi build setup. Filesystem access via OPFS or virtual mounting.

Next — medium term

Features that we’ve discussed but haven’t designed in detail.

Match-record streaming

For app-level aggregations beyond the canonical set, query-engine emits a stream of (shard_id, game_id_in_shard, ply, board_hash) per matching position. Apps consume the stream and aggregate however they want.

Open questions: streaming format (binary vs JSON-lines)? Include board hash or board snapshot? How does this interact with min-streak (emit on each matching ply, or only when the streak fires)?

Composable position sequences

Multi-position patterns within a single game. E.g., “find games where white has the bishop pair at one point AND later reaches a queenless endgame.” Currently expressed via bitmap intersection, but within-game sequences aren’t equivalent to game-set intersections (the ordering and same-game constraint matter).

Open design space; no concrete spec yet.

Position-derived opening classification

Instead of relying on the PGN [ECO] tag, classify each game’s opening by the actual moves played. Useful when corpora have missing or inconsistent ECO tags. Requires an opening-table data structure matched at index time.

Estimated work: ~500 lines once we have a position-table representation chosen.

Sparse bitmap encoding

For corpora with many stored very-sparse bitmaps (e.g., 500 named ECO buckets), dense bitmaps waste disk. Add Roaring or sorted-ID encoding alongside dense, choose per-bitmap at write time based on density. Reader auto-detects via a header flag.

Deferred until disk pressure becomes real; the trigger would be ≥30 stored bitmaps with most under 5% density.

Negative position predicates

Sub-FEN currently expresses inclusion (“this square has this piece”); no clean way to express exclusion (“no white pawn on this file”). Workarounds via piece-count rules cover some cases but not all.

Design path: extend sub-FEN with a ! prefix for negation, or add a companion --no-piece-on FILE PIECE flag. Both have ergonomic tradeoffs; defer until a real use case forces the decision.

SAN output in `explorer`

Currently emits UCI moves (e2e4, e7e8q). UCI is unambiguous and trivial to decode from SMove16; SAN needs full legal-move generation + disambiguation logic (~80 lines). Defer until a real consumer asks for SAN output specifically.

Per-bucket sort in `tree.dat`

Currently the linear scan over ~73 K edges per bucket is ~0.15 ms warm. Sorting buckets by prev_hash at write time would drop this to microseconds via binary search. Not worth doing until a workload demands it; 0.15 ms is already faster than any UI cares about.

`score-extract` utility

Walk a bitmap and dump matching games’ PGN text from the source PGN to stdout (or to a new PGN file). Closes the loop on “find these games → see them.”

Estimated work: ~150 lines. Uses pgn_offset from each record’s header.

Folder-mode ingest

Index a directory of multiple PGN files into one .scoredb. Either concat the PGNs into one logical stream during indexing, or process N inputs into N corpora. Decision deferred until a real use case clarifies the desired shape.

Decided against (with reasons)

`--aggregate branch-tree` flag in `query-engine`

Why not: The tree builder already does this aggregation during indexing. Filtered branch trees are the same machinery run over a filtered subset of games (via input bitmap). Adding a flag to the query engine for this would be redundant. Build the filtered-tree runner instead.

Era aliases (`--modern-era`, `--engine-era`, etc.)

Why not: Year ranges are clear enough as primitives. --year 1985-2010 is no harder than --modern-era, and era boundaries are arbitrary. Reviewed at the alias-design discussion; user-vetoed.

Opening-name aliases (`--sicilian`, `--french`, etc.)

Why not: The full set would balloon (every ECO chapter has its own opening name) and the mappings are fuzzy (Sicilian vs Sicilian sub-lines vs transposition cases). Year/ECO ranges suffice for now. Will revisit if a clean named-opening registry emerges.

Material macros (`--queens-off`, …) — REVERSED, now built

Originally declined in favor of --count. Reversed: named position predicates (--queens-off, --bishop-pair, --doubled-pawn, …) are now the readable, discoverable surface, with --count EXPR as the power tool underneath. Both coexist — named predicates encode chess concepts (e.g. bishop_pair is the comparative advantage, not just “two bishops”) that --count can’t express.

Built-in `--rp-endgame-*` rules

Removed. The original implementation baked in five rook-pawn-endgame predicates (entry-strict + permissive variants). Replaced by composition of --count primitives. Validated to match exactly on the permissive variants; entry-strict semantics were experimental and unused.

Multi-bitmap archive format (`.bix`)

Deferred. Initially considered for storing many precomputed bitmaps in a single file. The directory-as-database approach (the .scoredb form) achieves the same outcome — bitmaps live in bitmaps/ as individual files. Archive format adds a directory layer we don’t need.

Roaring bitmaps as default encoding

Deferred. For our hot path (prefilter scan with bit-per-game test), dense bitmaps are faster (cache-friendly, branch-predictable). Roaring would only earn its keep for very-sparse stored bitmaps with disk pressure, which we don’t have yet. Format reserves an upgrade path.

Putting the C engine inside Tabia (as WASM addon)

Decided against. The runtime mismatch (single-game JS vs batch C), the 4 GB WASM ceiling, the threading complexity, and the data-shape mismatch (move tree with annotations vs linear SAN stream) all argue against unification. Edge is a sibling stack project, not a Tabia module. Apps compose both at the app boundary.

Persisting ECO bitmaps at index time

Decided against. ECO is one uint16_t per game in the header; a scan filter takes ~50 ms over 10 M games. Pre-materializing 500 ECO bitmaps would consume disk for negligible speedup. Header-based filters stay header-based.

Generic OR composition inside `query-engine` — REVERSED, now built

Originally decided against (OR at the bitmap layer via bitmap-combine). Reversed once the CSW condition model landed: game-level OR/NOT over conditions (DNF: --or / --not) is now in the engine, byte-identical to the external bitmap-combine or, and avoids the 2-scan-then-combine round-trip for the common disjunctive query. bitmap-combine is still the path for composing persisted bitmaps; per-branch-header unions (headers inside OR-branches) are now in-engine too — a single DNF query.

Open architectural questions

These need a decision but don’t block current work.

Daemon API protocol

REST vs JSON-RPC vs WebSocket-streaming? Probably hybrid: REST for synchronous operations, WebSocket for long-running ones (indexing progress, large scan results).

Tree edge encoding for sparse positions

Most positions in the tree have very few outgoing edges (1–10 for the median game). Tight encoding matters for cache density at lookup time. Choose between sorted-array-per-position vs hash-table-per-position vs trie. Defer until the persistence work.

`libedge` ABI stability

Once we have downstream consumers (daemon, WASM, Node bindings, …), ABI breaks become costly. Versioning policy + header organization matters. Decide before the extraction lands.

Cross-machine `.scoredb` portability

Endianness is the obvious one (all data is little-endian per spec). Less obvious: shard count is a property of the corpus, not the machine — so a .scoredb built on one machine should run on another identically. Validate this once decoupled threading lands.

Versioning

Edge versions independently from PROMOTE’s JS projects. The internal data formats (scoredb v1, PMOTE-BM v1) are stable; bumping a format version requires re-indexing existing corpora.

API breaks in libedge will follow semver once the extraction lands and we ship a 1.0.