
Edge

High-performance chess data primitives. The batch tier of the PROMOTE stack.

Edge — a positional advantage in chess; the edges of a move tree; edge computing, because everything runs on the user’s machine.

A native C toolkit for indexing and querying large PGN collections. Built around two complementary engines:

  • A tree builder that aggregates every position reached across millions of games into a queryable move tree (the classic “opening explorer” data structure).
  • A scan engine that evaluates rich per-position predicates (sub-FEN patterns, piece-count comparators, structural conditions) against arbitrary game subsets and emits set-of-game-ID bitmaps.

Both engines share a single on-disk corpus format (.scoredb), produced in one PGN→corpus pass.
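The aggregation the tree builder performs can be pictured as an edge table keyed by (parent position hash, move), with one win/draw/loss tally per edge. The following is a minimal sketch under stated assumptions: `Edge`, `EdgeTable`, `edge_add`, and `edge_find` are hypothetical names, the fixed-capacity open-addressing table is chosen only for brevity, and none of this reflects the actual tree.dat layout (see DATA-FORMAT.md for that).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory edge: one (position, move) pair with result tallies.
   This is NOT the on-disk tree.dat layout. */
typedef struct {
    uint64_t parent_hash;          /* hash of the position before the move */
    uint32_t white, draw, black;   /* games won by White / drawn / won by Black */
    uint16_t move;                 /* 16-bit encoded move */
    uint8_t  used;                 /* 1 once the slot is occupied */
} Edge;

enum { EDGE_CAP = 1024 };          /* power of two; illustrative size only */
typedef struct { Edge slots[EDGE_CAP]; } EdgeTable;  /* zero-initialise before use */

typedef enum { RES_WHITE, RES_DRAW, RES_BLACK } Result;

/* Linear probing. Returns the matching slot, or the first empty slot on the
   probe path, or NULL if the table is full and the key is absent. */
static Edge *edge_probe(EdgeTable *t, uint64_t parent_hash, uint16_t move) {
    uint64_t h = parent_hash ^ ((uint64_t)move * UINT64_C(0x9E3779B97F4A7C15));
    for (uint64_t i = 0; i < EDGE_CAP; i++) {
        Edge *e = &t->slots[(h + i) & (EDGE_CAP - 1)];
        if (!e->used || (e->parent_hash == parent_hash && e->move == move))
            return e;
    }
    return NULL;
}

/* Record one played move and the result of the game it came from. */
static int edge_add(EdgeTable *t, uint64_t parent_hash, uint16_t move, Result r) {
    assert(t != NULL);
    Edge *e = edge_probe(t, parent_hash, move);
    if (e == NULL) return -1;      /* table full */
    if (!e->used) { e->used = 1; e->parent_hash = parent_hash; e->move = move; }
    if (r == RES_WHITE) e->white++;
    else if (r == RES_DRAW) e->draw++;
    else e->black++;
    return 0;
}

/* Look up the tallies for one edge; NULL if that move was never recorded. */
static const Edge *edge_find(EdgeTable *t, uint64_t parent_hash, uint16_t move) {
    assert(t != NULL);
    Edge *e = edge_probe(t, parent_hash, move);
    return (e != NULL && e->used) ? e : NULL;
}
```

Calling `edge_add` once per ply of every game and then reading all edges that share a `parent_hash` gives the per-move W/D/B counts an opening explorer displays.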

The rest of the PROMOTE stack — Tabia, Motif, Rabbit — is built around one game at a time: parse a PGN, attach annotations, walk a move tree with variations, render a board, listen for clicks. It’s interactive. It’s JS. It’s correct for what it does.

But chess apps also need to answer questions like:

  • “In all 10 million games this month, where does white put knights in the Carlsbad pawn structure?”
  • “Which master games reached a rook endgame where the side with fewer pawns won?”
  • “What does the world play after 1. e4 c5 2. Nf3 d6?”

These are batch questions over millions of games. JavaScript on a single thread can’t keep up. Edge fills that gap with optimized, multi-threaded, native C code, while preserving PROMOTE’s central thesis: world-class primitives so developers don’t have to reinvent the wheel.

graph TD
APPS["Apps<br/>research tools · opening prep · content sites"]
APPS --> JS["PROMOTE — JS runtime<br/>Tabia · Motif · Rabbit · Priyome · OpenFile<br/>single-game · interactive · per-tab (browser)"]
APPS --> EDGE["Edge — native runtime<br/>indexer · scan · tree · combine<br/>libedge / CLI / daemon / WASM"]
classDef edge fill:#2563eb,color:#fff,stroke:#93c5fd,stroke-width:2px;
class EDGE edge;

Edge is a sibling to the JS projects, not a layer above or below. Apps that need both compose them at the app boundary:

Use Edge to find the games you care about → load those games via Tabia → render with Motif/Rabbit.

Edge never reaches into Tabia; Tabia never depends on Edge. They meet in app code.

| Tool | Role |
| --- | --- |
| indexer | PGN → .scoredb directory (.moves records + dict + per-game offsets + aggregate tree). |
| query-engine | Scan engine. MaintenanceNeeds analyzer + ScanPath dispatcher. Built: header predicates, the full CSW model (full-Boolean C, then gating, value-sets, per-branch headers), the --query text-DSL front-end, and all three output reducers (game-bitmap PMOTE-BM, aggregation PMOTE-HM/PMOTE-GB, position-stream PMOTE-PS/.fen). See Status. |
| explorer | Tree query. FEN → position hash → bucket scan over tree.dat; W/D/B counts per move. |
| bitmap-combine | Set algebra over bitmap files (AND, OR, NOT, XOR, SUB). |

Each is a thin CLI shell around position.hpp (the chess kernel — Board, FEN/SAN parsers, make/unmake, NEON movegen) and a small set of file-format primitives for .scoredb / .moves / tree.dat / PMOTE-BM. The libedge extraction is on the roadmap as a prerequisite for the eventual daemon / WASM frontends; currently the format knowledge is duplicated across binaries.

A .scoredb is a directory containing one indexed corpus:

foo.20260524-153045.scoredb/
├── meta                 key=value text: version, num_shards, etc.
├── tree.dat             aggregate edge table (parent_hash → moves + W/D/B counts)
├── shards/
│   ├── 0.moves          per-shard game records (36-byte header + 16-bit moves)
│   ├── 0.dict           per-shard player-name dictionary
│   ├── 0.gameidx        per-game byte offsets into 0.moves (uint64 per game)
│   ├── 1.moves
│   └── ...
└── bitmaps/             set-of-game-ID bitmaps (named, optional)
    ├── sicilian.bm
    └── killer.bm
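To illustrate how a per-game offset index like N.gameidx can be used, here is a small sketch that turns a game number into a byte range inside N.moves. It assumes the offsets are already loaded into memory in ascending order and that a record ends where the next one begins (the last one ending at end of file). Neither assumption is confirmed above, and `RecordSpan` and `game_record_span` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte range of one game record inside a shard's .moves file. */
typedef struct { uint64_t offset, length; } RecordSpan;

/* idx:        the n_games offsets loaded from N.gameidx (assumed ascending).
   moves_size: total size in bytes of N.moves.
   Assumption: game g ends where game g+1 begins; the last game ends at EOF.
   Returns 0 on success, -1 for an out-of-range game or an inconsistent index. */
static int game_record_span(const uint64_t *idx, size_t n_games,
                            uint64_t moves_size, size_t g, RecordSpan *out) {
    assert(idx != NULL && out != NULL);
    if (g >= n_games) return -1;
    uint64_t begin = idx[g];
    uint64_t end = (g + 1 < n_games) ? idx[g + 1] : moves_size;
    if (end < begin || end > moves_size) return -1;
    out->offset = begin;
    out->length = end - begin;
    return 0;
}
```

With a span in hand, a reader can seek straight to one game without touching the rest of the shard, which is what makes random access by game ID cheap.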

A bitmap is a set of matched game IDs, packed dense, sharded to match the corpus. Bitmaps are the connective tissue between queries — the output of one scan becomes the input to the next.
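Set algebra over dense bitmaps reduces to word-wise bit operations. The sketch below shows the five operations on in-memory `uint64_t` arrays, assuming one bit per game ID; the names (`BmOp`, `bm_combine`, `bm_count`) are hypothetical, and the PMOTE-BM file header and sharding are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BM_AND, BM_OR, BM_XOR, BM_SUB, BM_NOT } BmOp;

/* Number of 64-bit words needed to hold n_games bits. */
static size_t bm_words(uint64_t n_games) {
    return (size_t)((n_games + 63) / 64);
}

/* out = a <op> b over a universe of n_games IDs. b is ignored for BM_NOT. */
static void bm_combine(BmOp op, const uint64_t *a, const uint64_t *b,
                       uint64_t *out, uint64_t n_games) {
    assert(a != NULL && out != NULL && (op == BM_NOT || b != NULL));
    size_t n = bm_words(n_games);
    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case BM_AND: out[i] = a[i] & b[i];  break;
        case BM_OR:  out[i] = a[i] | b[i];  break;
        case BM_XOR: out[i] = a[i] ^ b[i];  break;
        case BM_SUB: out[i] = a[i] & ~b[i]; break;  /* in a but not in b */
        case BM_NOT: out[i] = ~a[i];        break;
        }
    }
    /* Clear the bits past n_games in the last word, so NOT cannot
       introduce game IDs that do not exist in the corpus. */
    unsigned tail = (unsigned)(n_games % 64);
    if (n > 0 && tail != 0) out[n - 1] &= (UINT64_C(1) << tail) - 1;
}

/* Number of game IDs present in the set. */
static uint64_t bm_count(const uint64_t *bm, uint64_t n_games) {
    uint64_t total = 0;
    for (size_t i = 0; i < bm_words(n_games); i++)
        for (uint64_t w = bm[i]; w != 0; w &= w - 1) total++;
    return total;
}
```

The tail mask is the one subtle point: without it, complementing a set over a corpus whose size is not a multiple of 64 would report phantom games.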

A rule is a per-position or per-game predicate evaluated during the scan. Position rules AND together under a shared streak window; game-level filters apply before the position scan.
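One possible reading of "AND together under a shared streak window" is that a game matches when every position rule holds simultaneously for at least N consecutive plies. The sketch below implements that reading only; the interpretation, the per-ply bitmask representation, and the name `game_matches` are assumptions, not the engine's actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hits[p] is a bitmask for ply p: bit k is set when position rule k holds.
   required is the mask of rules that must all hold at the same ply.
   Returns 1 if some run of at least min_streak consecutive plies satisfies
   every required rule, else 0. */
static int game_matches(const uint32_t *hits, size_t n_plies,
                        uint32_t required, size_t min_streak) {
    assert(hits != NULL || n_plies == 0);
    if (min_streak == 0) min_streak = 1;   /* a streak of 0 means "any ply" */
    size_t run = 0;
    for (size_t p = 0; p < n_plies; p++) {
        if ((hits[p] & required) == required) {
            if (++run >= min_streak) return 1;   /* early exit on first match */
        } else {
            run = 0;                             /* streak broken, start over */
        }
    }
    return 0;
}
```

Because the streak counter is shared, two rules that each hold for five plies at different times do not match; they have to overlap for the full window.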

Indexing:

$ indexer lichess_2017-02.pgn
# → lichess_2017-02.20260524-153045.scoredb/ (16 shards, ~1.9 s for 10 M games on M4 Max)

Tree query (chess-player’s “explorer”):

$ explorer lichess_2017-02.20260524-153045.scoredb \
--fen "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
# a7a6 84669 games (W:50.4% D:4.7% B:44.9%)
# d7d6 56883 games (W:51.9% D:4.8% B:43.3%)
# g8f6 49270 games (W:52.5% D:4.5% B:43.0%)
# ...
# (default FEN is the starting position)

Query with predicates. Header filters (result, ECO, ELO, year, time-control, player names) run on the ~7 ms header-only path; the full CSW position-predicate model, game-bitmap output (--output, PMOTE-BM), and the aggregation reducers (--heatmap/--group-by) are all built — see Status:

$ query-engine lichess_2017-02.20260524-153045.scoredb --result W
# header-only path → ~7 ms for 10 M games (prints a match count)
$ query-engine CORPUS.scoredb --queens-off --heatmap
# aggregation path → square-heatmap over every matching position (~1.25 s)

See USE-CASES.md for worked examples.

Numbers measured on an M4 Max, 12 cores, 10M-game Lichess corpus (~9.3 GB PGN, ~1.74 GB .moves + ~80 MB .gameidx + 2.4 GB tree.dat after indexing). All warm-cache:

| Operation | Time |
| --- | --- |
| Index PGN → .scoredb (16 shards, incl. tree.dat write) | ~1.79 s |
| Tree query (single position via explorer) | ~0.15 ms |
| Header-only metadata filter (e.g. --result W) | ~7 ms |
| Aggregation: square-heatmap (--heatmap, full corpus) | ~1.25 s |
| Aggregation: group-by (--group-by pawn-structure, full corpus) | ~1.66 s |
| bitmap-combine and/or/not/xor/sub (set algebra over .bm) | ~10 ms |

Position-scan numbers (current query-engine, 10.16M corpus, BB_ONLY replay): --queens-off ~0.30 s · killer branch (4 --count preds + --result + --min-streak 5) ~0.30 s. --input-bitmap prefiltered scans are built; the main predicate still to add is --subfen (arbitrary sub-position matching).

Move-replay scan rate: ~1.87 B plies/sec (the engine hot path is 6.48 ns/ply BB_ONLY). For comparison, Lichess Explorer serves opening-tree lookups in tens of milliseconds over a small fixed tree; Edge serves the same lookup in 0.15 ms over the full corpus tree, and can also produce arbitrary tree subsets over arbitrary filtered game sets within the same performance envelope.
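As a rough consistency check on the two throughput figures, assume (this is not stated above) that 6.48 ns/ply is a per-thread cost and that all 12 cores scan in parallel:

```latex
\[
\frac{1}{6.48\ \text{ns/ply}} \approx 1.543 \times 10^{8}\ \text{plies/s per thread},
\qquad
12 \times 1.543 \times 10^{8}\ \text{plies/s} \approx 1.85 \times 10^{9}\ \text{plies/s}
\]
```

Under that assumption the product lands within about 1% of the quoted ~1.87 B plies/sec.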

Architecture & format

  • EXECUTION-MODEL.md — the conceptual spine: a scan as a single-pass stateful fold (mapAccum) over the ply trace, in five stages (measure → test → track → combine → reduce). The frame the predicate, query, and output models all hang off; a reducer is stage 5.
  • ARCHITECTURE.md — the two engines, their data flow, the indexer, delivery targets, what’s in/out of scope.
  • DATA-FORMAT.md — wire-level reference for .scoredb, .moves, .dict, .gameidx, tree.dat, .bm, the 16-bit move encoding.

The query language (spec — in progress)

  • PREDICATE-LANGUAGE.md — type system, composition rules, modifier vocabulary, canonical AST grammar. The source-of-truth abstraction every frontend (CLI, GUI, NLP) targets.
  • OUTPUT-MODEL.md — the output/reducer axis: one map (predicate) + a pluggable reduce (game-bitmap / position-stream / aggregation); how quantifiers and set-algebra fit. The tie-together piece.
  • PREDICATE-LIBRARY.md — the named chess primitives (bishop_pair, doubled_pawn, etc.) the language composes. Each entry: type, signature, bitboard expression, cost.
  • OPENING-ALIASES.md — opening-name → ECO resolution table ("Sicilian Najdorf" → B90-B99) with resolution rules. Chess-domain reference data.
  • PLANNER.md — how query-engine turns a predicate AST into a fast scan: tiered ordering, engine-state maintenance, materialization, set-algebra dispatch.
  • QUERY-LANGUAGE.md — the CLI surface today; flag → canonical-AST translation table. CLI is one frontend among several planned (GUI, NLP).
  • FORMAL-BASIS.md — the logic underneath the query language (propositional + finite-FO side sugar + LTLf over the ply trace); a correctness oracle + completeness checklist, not a dependency.
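The opening-alias idea (a name resolving to an ECO range, as in "Sicilian Najdorf" → B90-B99) can be sketched as a table lookup plus a range test. Only the one row quoted above is included; the struct, the function names, and the lexicographic range comparison are assumptions for illustration, and the real resolution rules live in OPENING-ALIASES.md.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name, *lo, *hi; } OpeningAlias;

/* Illustrative table: this single row is the example quoted in the text. */
static const OpeningAlias ALIASES[] = {
    { "Sicilian Najdorf", "B90", "B99" },
};

/* A well-formed ECO code is a letter A-E followed by two digits. */
static int eco_valid(const char *c) {
    return c != NULL && strlen(c) == 3 && c[0] >= 'A' && c[0] <= 'E'
        && isdigit((unsigned char)c[1]) && isdigit((unsigned char)c[2]);
}

/* Inclusive range test. Fixed-width codes compare correctly with strcmp. */
static int eco_in_range(const char *code, const char *lo, const char *hi) {
    assert(eco_valid(lo) && eco_valid(hi));
    if (!eco_valid(code)) return 0;
    return strcmp(code, lo) >= 0 && strcmp(code, hi) <= 0;
}

/* Returns 1 if code falls in the alias's range, 0 if not,
   and -1 if the alias is not in the table. */
static int opening_matches(const char *alias, const char *code) {
    for (size_t i = 0; i < sizeof ALIASES / sizeof ALIASES[0]; i++)
        if (strcmp(ALIASES[i].name, alias) == 0)
            return eco_in_range(code, ALIASES[i].lo, ALIASES[i].hi);
    return -1;
}
```

Returning a distinct value for an unknown alias lets a frontend report a bad opening name instead of silently matching nothing.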

Recipes & roadmap

  • USE-CASES.md — worked recipes: opening exploration, the killer query, aggregations the right way, bitmap chaining.
  • ROADMAP.md — built / next / deferred, plus the decision log for “we considered X and chose Y because Z.”

Status

Prototype, validated at 10M-game scale. The C source currently lives at ../experiments/c-explorer/ while the spec stabilizes; once we’re happy with the design, it moves into this directory. Treat the path migration as a low-effort follow-up, not a redesign.

This section is the authoritative implementation status. The spec docs (PREDICATE-LANGUAGE, PREDICATE-LIBRARY, PLANNER, QUERY-LANGUAGE) describe target design; where they and this section disagree, this section is what actually runs today (verified against the code 2026-05-28).

What’s built and exercised:

  • Indexer producing .scoredb with tiered shard counts; persistent aggregate edge table (tree.dat); the chess kernel in position.hpp (NEON-batched movegen, FIDE-2024+ EP semantics)
  • Query-engine: MaintenanceNeeds analyzer + ScanPath dispatcher. Header predicates (~20: result, ECO, ELO bands, year, time-control, termination, player names, per-shard name resolution), AND-chained, header-only path ~10 ms / 10M.
  • Query-engine position predicates (landed 2026-05-28): --queens-off, --bishop-pair white|black, and --count material expressions ("QBNqbn=0", "R=r", "P<p", "B>=2"), plus side-parameterized --doubled-pawn, --isolated-pawn, --passed-pawn, --rook-on-seventh, --rook-on-open-file, --king-castled. These compose under the CSW condition model (per-condition side + window quantifiers; multiple conditions AND’d) on the live BB_ONLY path (~0.3 s / 10M, 2.3 B plies/s).
  • Query-engine bitmap I/O: game-bitmap output via --output @name|PATH (PMOTE-BM; interoperates with bitmap-combine), and an --input-bitmap prefilter (cascade-AND; ~38× on a small input set). Validated: the killer rook-endgame query (two --count scans unioned via bitmap-combine) reproduces the historical 152,148 exactly on the 10M corpus.
  • Query-engine aggregation reducers (landed 2026-05-29): a dedicated scan_bb_aggregate path reduces over every matching position, with --heatmap (square-heatmap → PMOTE-HM) and --group-by pawn-structure --top-n K (→ PMOTE-GB). Validated: no-predicate heat[white-king] sum == group-by total == 685,863,447 (the corpus’s true ply count); --queens-off both == 187,591,286. Perf: heatmap ~1.25 s, group-by ~1.66 s.
  • Query-engine position-output reducer (landed 2026-05-29), on the same scan_bb_aggregate path: --positions ref|fen|both (emit a (shard,game,ply) reference / a FEN serialized from the board in hand / both), --positions-unique (dedup to distinct positions by a position-identity hash), and --limit N; --output writes PMOTE-PS (binary refs) and/or .fen text, and a summary always prints. Validated on the 10M corpus: --queens-off stream count == the aggregators’ 187,591,286; unique == 182,550,541 distinct; --limit 1000 == exactly 1000; fen output round-trips through the engine loader (no queens, one king/side). Perf: count ~1.3 s, unique ~2.5 s.
  • Query language — the human surface (2026-05-29): --query "<expr>" parses the full locked grammar (nested precedence, the (C, S, W) condition form, then ordered gating, value-set brackets field:[A SUB B], per-branch headers, full-Boolean C) and lowers to the same engine structs the flat CLI builds — one set of builders, no drift; round-trip 9/9 vs the flat flags. The killer union now runs as one per-branch query (152,148, bit-identical, one scan).
  • Engine optimization (2026-05-29, 8 commits): the BB-scan gating/per-branch machinery monomorphized on a compile-time query shape; both scan paths unified under one bb_replay driver; one header parser; dead code removed (net −63 LOC). Fixed a latent bug — the on-disk heatmap truncated counts to u32 (v1), now u64 (v2). Every change benchmark- or behavior-gated.
  • Explorer: FEN → move-list with W/D/B counts at ~0.15 ms per position
  • Bitmap-combine — set algebra (and/or/xor/sub/not) over PMOTE-BM .bm files. Fed by query-engine’s --output (byte-identical to the v1 writer’s PMOTE-BM) and the source for --input-bitmap.
  • 10M-game corpus validated end-to-end: indexer, header query, tree, and the killer rook-endgame query reproducing 152,148 on the current query-engine (position preds + bitmap output + a bitmap-combine union)
  • All 6 standard perft positions pass after every engine change
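The --count material expressions listed above ("QBNqbn=0", "R=r", "P<p", "B>=2") suggest a small grammar: a run of FEN piece letters, a comparator, then either a number or another run of letters. The evaluator below is a sketch of that inferred grammar only. The semantics (sum the counts of the named pieces, uppercase for White and lowercase for Black) are a guess from the examples, and `PieceCounts` and `count_expr_eval` are hypothetical names rather than the engine's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Piece counts indexed by FEN letter, e.g. n['Q'] = number of white queens. */
typedef struct { unsigned char n[128]; } PieceCounts;

/* Sum the counts for a run of piece letters and advance *s past them. */
static int sum_letters(const PieceCounts *pc, const char **s) {
    int total = 0;
    while (**s != '\0' && strchr("KQRBNPkqrbnp", **s) != NULL) {
        total += pc->n[(unsigned char)**s];
        (*s)++;
    }
    return total;
}

/* Returns 1 if the expression holds, 0 if it does not, -1 on a syntax error. */
static int count_expr_eval(const char *expr, const PieceCounts *pc) {
    assert(expr != NULL && pc != NULL);
    const char *s = expr, *start = s;
    int lhs = sum_letters(pc, &s);
    if (s == start) return -1;               /* need at least one piece letter */

    enum { EQ, LT, GT, LE, GE } op;
    if      (s[0] == '<' && s[1] == '=') { op = LE; s += 2; }
    else if (s[0] == '>' && s[1] == '=') { op = GE; s += 2; }
    else if (s[0] == '=')                { op = EQ; s += 1; }
    else if (s[0] == '<')                { op = LT; s += 1; }
    else if (s[0] == '>')                { op = GT; s += 1; }
    else return -1;

    int rhs;
    if (isdigit((unsigned char)*s)) {        /* numeric right-hand side */
        char *end;
        rhs = (int)strtol(s, &end, 10);
        s = end;
    } else {                                 /* piece-letter right-hand side */
        start = s;
        rhs = sum_letters(pc, &s);
        if (s == start) return -1;
    }
    if (*s != '\0') return -1;               /* trailing junk */

    switch (op) {
    case EQ: return lhs == rhs;
    case LT: return lhs <  rhs;
    case GT: return lhs >  rhs;
    case LE: return lhs <= rhs;
    case GE: return lhs >= rhs;
    }
    return -1;
}
```

Under this reading, a position with one rook each, no other pieces, and White a pawn down satisfies "QBNqbn=0", "R=r", and "P<p" together, which is the shape of the rook-endgame query described above.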

What’s designed but not yet built:

  • More position predicates: --subfen, exact-set --material "KRkr", and the rest of PREDICATE-LIBRARY (knight outposts, backward pawns, pawn chains, …). The predicate framework is proven; this is vocabulary expansion. (Built so far: queens-off, bishop-pair, doubled/isolated/passed pawns, rook-on-7th, rook-on-open-file, king-castled, --count material exprs, and the became/ceased edge wrappers.)
  • Opening-name literals in value-sets (Opening:[Sicilian] → ECO ranges) and the remaining derived conveniences (between:, …) wired through the parser.
  • The canonical JSON AST interchange (Option 2 — the machine interchange for non-CLI frontends; the engine runs today via direct text→struct lowering).
  • The libedge extraction; daemon (HTTP loopback) and WASM builds.
  • SAN output in explorer (currently UCI).
  • Threads decoupled from shards (work-queue model).

See ROADMAP.md for the running list.

Same conventions as other PROMOTE projects — currently private/personal; release license TBD.