Edge
High-performance chess data primitives. The batch tier of the PROMOTE stack.
Edge — a positional advantage in chess; the edges of a move tree; edge computing, because everything runs on the user’s machine.
A native C toolkit for indexing and querying large PGN collections. Built around two complementary engines:
- A tree builder that aggregates every position reached across millions of games into a queryable move tree (the classic “opening explorer” data structure).
- A scan engine that evaluates rich per-position predicates (sub-FEN patterns, piece-count comparators, structural conditions) against arbitrary game subsets and emits set-of-game-ID bitmaps.
Both engines share a single on-disk corpus format (.scoredb), produced
in one PGN→corpus pass.
Why Edge exists
The rest of the PROMOTE stack (Tabia, Motif, Rabbit) is built around one game at a time: parse a PGN, attach annotations, walk a move tree with variations, render a board, listen for clicks. It’s interactive. It’s JS. It’s correct for what it does.
But chess apps also need to answer questions like:
- “In all 10 million games this month, where does white put knights in the Carlsbad pawn structure?”
- “Which master games reached a rook endgame where the side with fewer pawns won?”
- “What does the world play after 1. e4 c5 2. Nf3 d6?”
These are batch questions over millions of games. JavaScript on a single thread can’t keep up. Edge fills that gap with optimized, multi-threaded, native C code, while preserving PROMOTE’s central thesis: world-class primitives so developers don’t have to reinvent the wheel.
Position in the stack

    graph TD
        APPS["Apps<br/>research tools · opening prep · content sites"]
        APPS --> JS["PROMOTE - JS runtime<br/>Tabia · Motif · Rabbit · Priyome · OpenFile<br/>single-game · interactive · per-tab (browser)"]
        APPS --> EDGE["Edge - native runtime<br/>indexer · scan · tree · combine<br/>libedge / CLI / daemon / WASM"]
        classDef edge fill:#2563eb,color:#fff,stroke:#93c5fd,stroke-width:2px;
        class EDGE edge;

Edge is a sibling to the JS projects, not a layer above or below. Apps that need both compose them at the app boundary:
Use Edge to find the games you care about → load those games via Tabia → render with Motif/Rabbit.
Edge never reaches into Tabia; Tabia never depends on Edge. They meet in app code.
What’s in the toolkit
| Tool | Role |
|---|---|
| `indexer` | PGN → `.scoredb` directory (`.moves` records + dict + per-game offsets + aggregate tree). |
| `query-engine` | Scan engine. `MaintenanceNeeds` analyzer + `ScanPath` dispatcher. Built: header predicates, the full CSW model (full-Boolean `C`, `then` gating, value-sets, per-branch headers), the `--query` text-DSL front-end, and all three output reducers (game-bitmap PMOTE-BM, aggregation PMOTE-HM/PMOTE-GB, position-stream PMOTE-PS/`.fen`). See Status. |
| `explorer` | Tree query. FEN → position hash → bucket scan over `tree.dat`; W/D/B counts per move. |
| `bitmap-combine` | Set algebra over bitmap files (AND, OR, NOT, XOR, SUB). |
Each is a thin CLI shell around position.hpp (the chess kernel — Board,
FEN/SAN parsers, make/unmake, NEON movegen) and a small set of file-format
primitives for .scoredb / .moves / tree.dat / PMOTE-BM. The
libedge extraction is on the roadmap as a prerequisite for the
eventual daemon / WASM frontends; currently the format knowledge is
duplicated across binaries.
Concepts at a glance
A `.scoredb` is a directory containing one indexed corpus:

    foo.20260524-153045.scoredb/
    ├── meta              key=value text: version, num_shards, etc.
    ├── tree.dat          aggregate edge table (parent_hash → moves + W/D/B counts)
    ├── shards/
    │   ├── 0.moves       per-shard game records (36-byte header + 16-bit moves)
    │   ├── 0.dict        per-shard player-name dictionary
    │   ├── 0.gameidx     per-game byte offsets into 0.moves (uint64 per game)
    │   ├── 1.moves
    │   └── ...
    └── bitmaps/          set-of-game-ID bitmaps (named, optional)
        ├── sicilian.bm
        └── killer.bm

A bitmap is a set of matched game IDs, packed dense, sharded to match the corpus. Bitmaps are the connective tissue between queries: the output of one scan becomes the input to the next.
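Because a bitmap is a dense set of game IDs, chaining one scan into the next comes down to word-wise bit operations. The sketch below illustrates the idea, assuming one bit per game ID packed into 64-bit words. The `GameBitmap` type and the `bm_*` helpers are hypothetical names for illustration; they are not the PMOTE-BM on-disk layout and not the `bitmap-combine` source (DATA-FORMAT.md is the reference for the real format).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory bitmap: bit i set means game ID i matched.
   Illustration only; this is not the on-disk .bm format. */
typedef struct {
    uint64_t *words;
    size_t    num_games;
} GameBitmap;

typedef enum { BM_AND, BM_OR, BM_XOR, BM_SUB } BmOp;

size_t bm_words(size_t num_games) { return (num_games + 63) / 64; }

GameBitmap bm_new(size_t num_games) {
    GameBitmap b = { calloc(bm_words(num_games), sizeof(uint64_t)), num_games };
    return b;
}

void bm_free(GameBitmap *b) { free(b->words); b->words = NULL; }

void bm_set(GameBitmap *b, size_t id) { b->words[id / 64] |= 1ULL << (id % 64); }

int bm_test(const GameBitmap *b, size_t id) {
    return (int)((b->words[id / 64] >> (id % 64)) & 1);
}

/* out = a OP b, word by word. All three bitmaps must cover the same games. */
void bm_combine(GameBitmap *out, const GameBitmap *a, const GameBitmap *b, BmOp op) {
    size_t n = bm_words(a->num_games);
    for (size_t i = 0; i < n; i++) {
        uint64_t x = a->words[i], y = b->words[i];
        switch (op) {
        case BM_AND: out->words[i] = x & y;  break;
        case BM_OR:  out->words[i] = x | y;  break;
        case BM_XOR: out->words[i] = x ^ y;  break;
        case BM_SUB: out->words[i] = x & ~y; break;
        }
    }
}

/* out = NOT a. The padding bits past num_games are masked off so that
   they never show up as phantom game IDs. */
void bm_not(GameBitmap *out, const GameBitmap *a) {
    size_t n = bm_words(a->num_games);
    for (size_t i = 0; i < n; i++) out->words[i] = ~a->words[i];
    size_t tail = a->num_games % 64;
    if (n > 0 && tail != 0) out->words[n - 1] &= (1ULL << tail) - 1;
}

/* Number of matched games (portable popcount). */
size_t bm_count(const GameBitmap *b) {
    size_t total = 0, n = bm_words(b->num_games);
    for (size_t i = 0; i < n; i++)
        for (uint64_t w = b->words[i]; w; w &= w - 1) total++;
    return total;
}
```

The one subtle case is NOT: without the tail mask, a corpus whose game count is not a multiple of 64 would gain spurious matches in the final word.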
A rule is a per-position or per-game predicate evaluated during the scan. Position rules AND together under a shared streak window; game-level filters apply before the position scan.
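One way to picture position rules that AND together under a shared streak window: a game matches when every rule holds on the same ply for at least N consecutive plies. The sketch below encodes that reading. It is an assumption about the semantics, not the engine's code, and the `hits` array (one bitmask per ply, bit `r` set when rule `r` holds) and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if some run of at least min_streak consecutive plies satisfies
   every rule in required_mask, else 0. hits[p] has bit r set when rule r
   holds at ply p. min_streak is expected to be >= 1. */
int game_matches_streak(const uint32_t *hits, size_t num_plies,
                        uint32_t required_mask, size_t min_streak) {
    size_t run = 0;
    for (size_t p = 0; p < num_plies; p++) {
        if ((hits[p] & required_mask) == required_mask) {
            run++;                       /* all rules hold: extend the streak */
            if (run >= min_streak) return 1;
        } else {
            run = 0;                     /* any rule failing resets the window */
        }
    }
    return 0;
}
```

Under this reading, a game-level filter would simply skip the call for games it rejects, which matches the statement that game-level filters apply before the position scan.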
Quick start
Indexing:

    $ indexer lichess_2017-02.pgn
    # → lichess_2017-02.20260524-153045.scoredb/ (16 shards, ~1.9 s for 10 M games on M4 Max)

Tree query (chess-player’s “explorer”):

    $ explorer lichess_2017-02.20260524-153045.scoredb \
        --fen "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
    # a7a6 84669 games (W:50.4% D:4.7% B:44.9%)
    # d7d6 56883 games (W:51.9% D:4.8% B:43.3%)
    # g8f6 49270 games (W:52.5% D:4.5% B:43.0%)
    # ...
    # (default FEN is the starting position)

Query with predicates. Header filters (result, ECO, ELO, year, time-control, player names) run on the ~7 ms header-only path; the full CSW position-predicate model, game-bitmap output (`--output`, PMOTE-BM), and the aggregation reducers (`--heatmap`/`--group-by`) are all built; see Status:

    $ query-engine lichess_2017-02.20260524-153045.scoredb --result W
    # header-only path → ~7 ms for 10 M games (prints a match count)

    $ query-engine CORPUS.scoredb --queens-off --heatmap
    # aggregation path → square-heatmap over every matching position (~1.25 s)

See USE-CASES.md for worked examples.
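The explorer lines above pair a move with a game count and a W/D/B percentage split. As a rough sketch of how such a line could be derived from raw per-move counters, the helper below formats one row. The `Wdb` struct and `format_move_line` are hypothetical names, and the counts in the usage note are made up for illustration rather than taken from the corpus.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-move counters: games won by White, drawn, won by Black. */
typedef struct { uint64_t white, draw, black; } Wdb;

/* Writes e.g. "a7a6 1000 games (W:50.4% D:4.7% B:44.9%)" into buf.
   Returns the snprintf result (characters that would have been written). */
int format_move_line(char *buf, size_t cap, const char *uci_move, Wdb c) {
    uint64_t total = c.white + c.draw + c.black;
    double denom = total ? (double)total : 1.0;   /* avoid dividing by zero */
    return snprintf(buf, cap, "%s %llu games (W:%.1f%% D:%.1f%% B:%.1f%%)",
                    uci_move, (unsigned long long)total,
                    100.0 * (double)c.white / denom,
                    100.0 * (double)c.draw  / denom,
                    100.0 * (double)c.black / denom);
}
```

Calling it with the illustrative counts `{504, 47, 449}` and the move `"a7a6"` produces a row in the same shape as the explorer output shown in the quick start.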
Performance reference
Numbers measured on an M4 Max, 12 cores, 10M-game Lichess corpus
(~9.3 GB PGN, ~1.74 GB .moves + ~80 MB .gameidx + 2.4 GB tree.dat
after indexing). All warm-cache:
| Operation | Time |
|---|---|
| Index PGN → `.scoredb` (16 shards, incl. `tree.dat` write) | ~1.79 s |
| Tree query (single position via `explorer`) | ~0.15 ms |
| Header-only metadata filter (e.g. `--result W`) | ~7 ms |
| Aggregation: square-heatmap (`--heatmap`, full corpus) | ~1.25 s |
| Aggregation: group-by (`--group-by pawn-structure`, full corpus) | ~1.66 s |
| `bitmap-combine` and/or/not/xor/sub (set algebra over `.bm`) | ~10 ms |
Position-scan numbers (current `query-engine`, 10.16M corpus, BB_ONLY replay): `--queens-off` ~0.30 s · killer branch (4 `--count` preds + `--result` + `--min-streak 5`) ~0.30 s. `--input-bitmap` prefiltered scans are built; the main predicate still to add is `--subfen` (arbitrary sub-position matching).
Move-replay scan rate: ~1.87 B plies/sec (the engine hot path is 6.48 ns/ply BB_ONLY). For comparison, Lichess Explorer serves opening-tree lookups in tens of milliseconds over a small fixed tree; Edge serves the same lookup in 0.15 ms over the full corpus tree, AND can produce arbitrary tree subsets over arbitrary filtered game sets in the same envelope.
Docs in this directory
Architecture & format
- EXECUTION-MODEL.md: the conceptual spine. A scan is a single-pass stateful fold (`mapAccum`) over the ply trace, in five stages (measure → test → track → combine → reduce). This is the frame the predicate, query, and output models all hang off; a reducer is stage 5.
- ARCHITECTURE.md: the two engines, their data flow, the indexer, delivery targets, what’s in/out of scope.
- DATA-FORMAT.md: wire-level reference for `.scoredb`, `.moves`, `.dict`, `.gameidx`, `tree.dat`, `.bm`, and the 16-bit move encoding.
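To make the "single-pass stateful fold" framing concrete, here is a minimal sketch of one game scanned in a single loop, with the five stage names marked as comments. The mapping of code to stages is an interpretation of the names only; EXECUTION-MODEL.md is the authority. `PlyFacts`, `ScanState`, `ScanResult`, and `scan_game` are hypothetical, and the query (queens off the board for at least `min_streak` plies while rooks remain) is a made-up example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ply snapshot; a real scan would derive this from the board. */
typedef struct {
    uint8_t queens;   /* queens of both colors still on the board */
    uint8_t rooks;    /* rooks of both colors still on the board  */
} PlyFacts;

/* State threaded from ply to ply (the accumulator of the fold). */
typedef struct { size_t queens_off_streak; } ScanState;

/* What the reducer produces for one game. */
typedef struct { uint64_t matching_plies; int game_matched; } ScanResult;

ScanResult scan_game(const PlyFacts *trace, size_t num_plies, size_t min_streak) {
    ScanState  st  = { 0 };
    ScanResult out = { 0, 0 };
    for (size_t i = 0; i < num_plies; i++) {
        /* 1. measure: read raw quantities for this ply */
        unsigned queens = trace[i].queens;
        unsigned rooks  = trace[i].rooks;
        /* 2. test: turn measurements into per-ply truth values */
        int queens_off = (queens == 0);
        int rooks_on   = (rooks > 0);
        /* 3. track: update state that spans plies (here, a streak counter) */
        st.queens_off_streak = queens_off ? st.queens_off_streak + 1 : 0;
        /* 4. combine: AND the tracked condition with the plain one */
        int hit = (st.queens_off_streak >= min_streak) && rooks_on;
        /* 5. reduce: fold the hit into the output */
        out.matching_plies += (uint64_t)hit;
        out.game_matched   |= hit;
    }
    return out;
}
```

Swapping only stage 5 changes the kind of output (a per-game flag, a count, or a stream of positions) while the first four stages stay the same, which is the sense in which a reducer is pluggable.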
The query language (spec, in progress)
- PREDICATE-LANGUAGE.md: type system, composition rules, modifier vocabulary, canonical AST grammar. The source-of-truth abstraction every frontend (CLI, GUI, NLP) targets.
- OUTPUT-MODEL.md: the output/reducer axis: one map (predicate) + a pluggable reduce (game-bitmap / position-stream / aggregation); how quantifiers and set-algebra fit. The tie-together piece.
- PREDICATE-LIBRARY.md: the named chess primitives (`bishop_pair`, `doubled_pawn`, etc.) the language composes. Each entry: type, signature, bitboard expression, cost.
- OPENING-ALIASES.md: opening-name → ECO resolution table (`"Sicilian Najdorf"` → B90-B99) with resolution rules. Chess-domain reference data.
- PLANNER.md: how `query-engine` turns a predicate AST into a fast scan: tiered ordering, engine-state maintenance, materialization, set-algebra dispatch.
- QUERY-LANGUAGE.md: the CLI surface today; flag → canonical-AST translation table. CLI is one frontend among several planned (GUI, NLP).
- FORMAL-BASIS.md: the logic underneath the query language (propositional + finite-FO side sugar + LTLf over the ply trace); a correctness oracle + completeness checklist, not a dependency.
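The opening-alias idea (a name resolving to an ECO range) can be sketched as a small lookup table plus a range check. The example below is illustrative only: the first entry comes from the description above, the second uses the standard ECO range for the Sicilian, and the exact-match lookup stands in for whatever resolution rules OPENING-ALIASES.md actually defines. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: a name mapped to an inclusive ECO range
   within one ECO volume letter (A-E). */
typedef struct {
    const char *name;
    char        letter;
    int         lo, hi;
} EcoAlias;

static const EcoAlias ECO_ALIASES[] = {
    { "Sicilian Najdorf", 'B', 90, 99 },   /* example given in the docs list */
    { "Sicilian",         'B', 20, 99 },   /* standard ECO range, for illustration */
};

/* Exact-match lookup; returns NULL when the name is unknown. */
const EcoAlias *resolve_opening(const char *name) {
    size_t n = sizeof ECO_ALIASES / sizeof ECO_ALIASES[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(ECO_ALIASES[i].name, name) == 0) return &ECO_ALIASES[i];
    return NULL;
}

/* Returns 1 if an ECO code such as "B92" falls inside the alias range. */
int eco_in_range(const char *eco, const EcoAlias *alias) {
    if (strlen(eco) != 3 || eco[0] != alias->letter) return 0;
    if (eco[1] < '0' || eco[1] > '9' || eco[2] < '0' || eco[2] > '9') return 0;
    int code = (eco[1] - '0') * 10 + (eco[2] - '0');
    return code >= alias->lo && code <= alias->hi;
}
```

With a table like this, a value-set such as `Opening:[Sicilian]` could lower to a plain ECO range test on the game header.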
Recipes & roadmap
- USE-CASES.md — worked recipes: opening exploration, the killer query, aggregations the right way, bitmap chaining.
- ROADMAP.md — built / next / deferred, plus the decision log for “we considered X and chose Y because Z.”
Status
Prototype, validated at 10M-game scale. The C source currently
lives at ../experiments/c-explorer/
while the spec stabilizes; once we’re happy with the design, it
moves into this directory. Treat the path migration as a low-effort
follow-up, not a redesign.
This section is the authoritative implementation status. The spec docs (PREDICATE-LANGUAGE, PREDICATE-LIBRARY, PLANNER, QUERY-LANGUAGE) describe target design; where they and this section disagree, this section is what actually runs today (verified against the code 2026-05-28).
What’s built and exercised:
- Indexer producing `.scoredb` with tiered shard counts; persistent aggregate edge table (`tree.dat`); the chess kernel in `position.hpp` (NEON-batched movegen, FIDE-2024+ EP semantics)
- Query-engine: `MaintenanceNeeds` analyzer + `ScanPath` dispatcher.
  - Header predicates (~20: result, ECO, ELO bands, year, time-control, termination, player names, per-shard name resolution), AND-chained, header-only path ~10 ms / 10M.
  - Position predicates landed (2026-05-28): `--queens-off`, `--bishop-pair white|black`, and `--count` material expressions (`"QBNqbn=0"`, `"R=r"`, `"P<p"`, `"B>=2"`), plus side-parameterized `--doubled-pawn`, `--isolated-pawn`, `--passed-pawn`, `--rook-on-seventh`, `--rook-on-open-file`, `--king-castled`, all composed under the CSW condition model (per-condition side + window quantifiers; multiple conditions AND’d) on the live BB_ONLY path (~0.3 s / 10M, 2.3 B plies/s).
  - Game-bitmap output via `--output @name|PATH` (PMOTE-BM; interoperates with `bitmap-combine`), and an `--input-bitmap` prefilter (cascade-AND; ~38× on a small input set). Validated: the killer rook-endgame query (two `--count` scans unioned via `bitmap-combine`) reproduces the historical 152,148 exactly on the 10M corpus.
  - Aggregation reducers landed (2026-05-29): a dedicated `scan_bb_aggregate` path reduces over every matching position, with `--heatmap` (square-heatmap → PMOTE-HM) and `--group-by pawn-structure --top-n K` (→ PMOTE-GB). Validated: no-predicate `heat[white-king]` sum == group-by total == 685,863,447 (the corpus’s true ply count); `--queens-off` both == 187,591,286. Perf: heatmap ~1.25 s, group-by ~1.66 s.
  - Position-output reducer landed (2026-05-29), on the same `scan_bb_aggregate` path: `--positions ref|fen|both` (emit a `(shard,game,ply)` reference / a FEN serialized from the board in hand / both), `--positions-unique` (dedup to distinct positions by a position-identity hash), and `--limit N`; `--output` writes PMOTE-PS (binary refs) and/or `.fen` text, and a summary always prints. Validated on the 10M corpus: `--queens-off` stream count == the aggregators’ 187,591,286; `unique` == 182,550,541 distinct; `--limit 1000` == exactly 1000; `fen` output round-trips through the engine loader (no queens, one king/side). Perf: count ~1.3 s, `unique` ~2.5 s.
- Query language, the human surface (2026-05-29): `--query "<expr>"` parses the full locked grammar (nested precedence, the `(C, S, W)` condition form, `then` ordered gating, value-set brackets `field:[A SUB B]`, per-branch headers, full-Boolean `C`) and lowers to the same engine structs the flat CLI builds; one set of builders, no drift; round-trip 9/9 vs the flat flags. The killer union now runs as one per-branch query (152,148, bit-identical, one scan).
- Engine optimization (2026-05-29, 8 commits): the BB-scan gating/per-branch machinery monomorphized on a compile-time query shape; both scan paths unified under one `bb_replay` driver; one header parser; dead code removed (net −63 LOC). Fixed a latent bug: the on-disk heatmap truncated counts to u32 (v1), now u64 (v2). Every change benchmark- or behavior-gated.
- Explorer: FEN → move-list with W/D/B counts at ~0.15 ms per position
- Bitmap-combine: set algebra (and/or/xor/sub/not) over PMOTE-BM `.bm` files. Fed by `query-engine`’s `--output` (byte-identical to the v1 writer’s PMOTE-BM) and the source for `--input-bitmap`.
- 10M-game corpus validated end-to-end: indexer, header query, tree, and the killer rook-endgame query reproducing 152,148 on the current `query-engine` (position preds + bitmap output + a `bitmap-combine` union)
- All 6 standard perft positions pass after every engine change
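The `--count` material expressions listed above (`"QBNqbn=0"`, `"R=r"`, `"P<p"`, `"B>=2"`) can be read as comparisons between piece counts. The sketch below implements one plausible reading and evaluates it against the piece-placement field of a FEN. The grammar is an assumption inferred from those four examples (uppercase letters are White pieces, lowercase are Black, a run of letters sums their counts, the right side is either letters or an integer); it is not the engine's parser, and `eval_count_expr` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts pieces in the FEN placement field (everything before the first
   space) whose letter appears in set[0..set_len). */
static int count_pieces(const char *fen, const char *set, size_t set_len) {
    int n = 0;
    for (const char *p = fen; *p != '\0' && *p != ' '; p++)
        if (isalpha((unsigned char)*p) && memchr(set, *p, set_len) != NULL) n++;
    return n;
}

/* Value of one side of the expression: digits are an integer literal,
   letters are a summed piece count. */
static int side_value(const char *s, size_t len, const char *fen) {
    if (isdigit((unsigned char)s[0])) {
        int v = 0;
        for (size_t i = 0; i < len && isdigit((unsigned char)s[i]); i++)
            v = v * 10 + (s[i] - '0');
        return v;
    }
    return count_pieces(fen, s, len);
}

/* Returns 1 if the expression holds for the position, 0 if it does not,
   and -1 if the expression is malformed. */
int eval_count_expr(const char *expr, const char *fen) {
    size_t pos = strcspn(expr, "<>=");            /* first operator character */
    if (pos == 0 || expr[pos] == '\0') return -1;
    char op = expr[pos];
    int or_equal = 0;
    size_t rhs = pos + 1;
    if (op != '=' && expr[rhs] == '=') { or_equal = 1; rhs++; }   /* <= or >= */
    if (expr[rhs] == '\0') return -1;
    int a = side_value(expr, pos, fen);
    int b = side_value(expr + rhs, strlen(expr + rhs), fen);
    switch (op) {
    case '=': return a == b;
    case '<': return or_equal ? a <= b : a < b;
    default:  return or_equal ? a >= b : a > b;   /* op is '>' */
    }
}
```

Under this reading, `"QBNqbn=0"` together with `"R=r"` describes a pure rook endgame with equal rooks, and `"P<p"` says White has fewer pawns, which is the shape of the killer rook-endgame query.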
What’s designed but not yet built:
- More position predicates: `--subfen`, exact-set `--material "KRkr"`, and the rest of PREDICATE-LIBRARY (knight outposts, backward pawns, pawn chains, …). The predicate framework is proven; this is vocabulary expansion. (Built so far: queens-off, bishop-pair, doubled/isolated/passed pawns, rook-on-7th, rook-on-open-file, king-castled, `--count` material exprs, and the `became`/`ceased` edge wrappers.)
- Opening-name literals in value-sets (`Opening:[Sicilian]` → ECO ranges) and the remaining derived conveniences (`between:`, …) wired through the parser.
- The canonical JSON AST interchange (Option 2: the machine interchange for non-CLI frontends; the engine runs today via direct text→struct lowering).
- The `libedge` extraction; daemon (HTTP loopback) and WASM builds.
- SAN output in `explorer` (currently UCI).
- Threads decoupled from shards (work-queue model).
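The last item, threads decoupled from shards, is not built yet, so the following is only a sketch of what a work-queue model could look like: work is cut into items smaller than a shard, and any thread claims the next unclaimed item through an atomic counter. Everything here (`WorkItem`, `WorkQueue`, `run_queue`, the 64-thread cap) is hypothetical, and the "scan" is replaced by summing game counts so the example stays self-contained. It uses POSIX threads and C11 atomics.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unit of work: a contiguous range of games inside one shard. */
typedef struct { int shard; size_t first_game; size_t num_games; } WorkItem;

typedef struct {
    const WorkItem      *items;
    size_t               num_items;
    atomic_size_t        next;            /* index of the next unclaimed item */
    atomic_uint_fast64_t games_scanned;   /* stand-in for real scan output */
} WorkQueue;

static void *worker(void *arg) {
    WorkQueue *q = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&q->next, 1);   /* claim one item */
        if (i >= q->num_items) break;               /* queue drained */
        /* A real worker would replay the games of items[i] here. */
        atomic_fetch_add(&q->games_scanned, q->items[i].num_games);
    }
    return NULL;
}

/* Drains the queue with up to num_threads workers (clamped to 1..64) and
   returns the total number of games processed. */
uint64_t run_queue(const WorkItem *items, size_t num_items, int num_threads) {
    WorkQueue q = { .items = items, .num_items = num_items };
    atomic_init(&q.next, 0);
    atomic_init(&q.games_scanned, 0);

    pthread_t tids[64];
    if (num_threads > 64) num_threads = 64;
    if (num_threads < 1)  num_threads = 1;

    int started = 0;
    for (int t = 0; t < num_threads; t++)
        if (pthread_create(&tids[started], NULL, worker, &q) == 0) started++;
    if (started == 0) worker(&q);          /* fall back to the calling thread */
    for (int t = 0; t < started; t++) pthread_join(tids[t], NULL);

    return atomic_load(&q.games_scanned);
}
```

The point of the design is that the thread count and the shard count no longer have to match: a large shard can be split into several items, and a fast thread simply claims more of them.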
See ROADMAP.md for the running list.
License
Same conventions as other PROMOTE projects: currently private/personal; release license TBD.