Architecture
Edge is two engines on one corpus. This document covers what each engine does, how they share state, and where the architectural seams are. For wire-level format details see DATA-FORMAT.md; for the user-facing query surface see QUERY-LANGUAGE.md.
Top-level data flow
Section titled “Top-level data flow”graph TD PGN["PGN corpus<br/>millions of games"] --> IDX["Indexer<br/>one PGN-parse pass"] IDX -->|"per-game .moves records"| SCAN["Scan Engine<br/>per-position predicate matching"] IDX -->|"per-move tuples"| TREE["Tree Builder<br/>MapReduce to tree.dat"] SCAN -->|"game-ID bitmap (.bm)"| BM["Bitmap composition,<br/>position-pattern search"] TREE -->|"position to moves + W/D/L"| EXP["Opening explorer,<br/>branch analytics<br/>(~0.15 ms warm)"] BM -.->|"filtered subset"| TREE classDef engine fill:#2563eb,stroke:#93c5fd,stroke-width:2px,color:#fff; class SCAN,TREE engine;A single PGN-parse pass produces both the per-game records used by the scan engine and the per-move tuples used by the tree builder. The two engines see the same world; they just project it differently.
The Indexer
Section titled “The Indexer”indexer is the single PGN→.scoredb pass. It:
-
Mmaps the PGN and partitions the byte range into N chunks (one per shard).
find_game_boundaryadvances each chunk start to the next[Event "...]header so games aren’t split. -
In parallel, per chunk:
- Parses the PGN headers (Result, Termination, ECO, ratings, time control, date, player names).
- Replays moves through the v10 bitboard engine (mailbox + templated make_move + state-machine SAN parser), producing per-position Zobrist hashes with FIDE-2024+ EP semantics.
- Emits a per-game
.movesrecord (36-byte header + 16-bit SMove16 stream) into a per-shard buffer. - Emits per-move tuples (parent_hash, move_id, result_bits) into 1024 hash-keyed spill buckets.
- Interns player names into a per-shard string dictionary.
-
Sequential phase: writes the per-shard scoredb files:
metafile (key=value: version, num_shards, total_games, etc.)shards/N.moves(the game records)shards/N.dict(the name dictionary, newline-separated)shards/N.gameidx(uint64 byte offsets per game)bitmaps/(created empty)
-
Reduce phase: aggregates the per-bucket tuples across all shards into a unified edge table (one row per unique
(parent_hash, move_id)with summed W/D/B counts). Written totree.datat the scoredb root for querying byexplorer.
Indexing is roughly (M4 Max, 10 M Lichess — 10,164,006 games after the indexer drops zero-move games):
- Map phase: ~1.50 s (engine + tokenize + spill).
- Score (.moves write) phase: ~0.07 s (1.74 GB of
.movesdata). - Reduce + tree.dat write: ~0.20 s.
- Total wall: ~1.79 s.
Sharding
Section titled “Sharding”Shard count is a property of the corpus, not the machine. Default tiers by PGN size:
| PGN size | Default shards | Per-shard data |
|---|---|---|
| < 200 MB | 1 | up to ~35 MB |
| 200 MB – 2 GB | 4 | ~5 – 90 MB |
| 2 – 20 GB | 16 | ~50 – 280 MB |
| 20 – 200 GB | 32 | ~280 MB – 1.4 GB |
| > 200 GB | 64 | as needed |
--shards N overrides the default. A .scoredb indexed on one machine
behaves the same when read on any other: same shard count, same record
boundaries.
Threading model (current and planned)
Section titled “Threading model (current and planned)”Currently: nthreads == nshards. Each thread writes its own shard.
On a 12-core machine indexing a corpus that wants 16 shards, four
threads do double duty (kernel time-slices).
Planned: decouple threads from shards. A work queue dispatches chunks to a fixed pool of worker threads (capped at available cores); each chunk becomes one shard. This lets a 4-core laptop index a 16-shard corpus efficiently. The on-disk artifact is identical either way.
The Tree Builder
Section titled “The Tree Builder”The tree builder runs as the second half of the indexer’s MapReduce
and persists its output as tree.dat at the scoredb root.
What it produces (one Edge per unique (parent_hash, move_id)):
Edge { parent_hash : uint64 // Zobrist of the position before the move // (FIDE-2024+ EP semantics — transpositions // that reach the same legal-move set // hash identically) move_id : uint32 // SMove16 in low 16 bits total : uint32 // W + D + B w, d, b : uint32 // game counts split by result}Total 28 bytes per record; 32-byte stride on disk. Packed by bucket
(parent_hash & (nbuckets - 1)); a bucket-offset table at the head
of tree.dat makes bucket lookup O(1). Reduced from ~200 M raw
spill tuples into ~75 M unique edges at 10 M-game scale.
Query model: explorer --fen "FEN" loads the FEN through the
v10 engine, computes the position hash, indexes into the bucket
table, and linear-scans that bucket (~73 K records) for matching
parent_hash. Returns the moves played from that position across
the corpus, with win/draw/loss counts.
End-to-end query latency: ~0.15 ms warm at 10 M-game scale. This is the opening-explorer primitive.
Filtered tree builds
Section titled “Filtered tree builds”The user-facing question “what’s the branch tree from this position, restricted to GM games?” is not a new engine; it’s the same tree builder run over a filtered subset of games. The pipeline:
- Scan engine produces a bitmap of matching games (e.g. GM Sicilians).
- Tree builder takes the bitmap as a pre-filter and reduces only those games’ tuples into a fresh edge table.
This is why we don’t need a --aggregate branch-tree flag in the scan
engine. The tree builder already does the aggregation; the filtered-
subset variant just needs a bitmap-aware reduce pass that consumes
the scan’s match bitmap as input.
What about other aggregations?
Section titled “What about other aggregations?”See USE-CASES.md → Aggregations the right way. Short version:
- Branch trees (opening explorer): tree builder, optionally filtered.
- Square heatmaps (“where does white put knights here?”) and
group-by (count by a derived feature, e.g. pawn-structure): built —
the in-scan aggregation reducers (
--heatmap/--group-by) on thescan_bb_aggregatepath. - Anything else: match-record streaming + app-level aggregation. The escape valve for the long tail.
The Query Engine
Section titled “The Query Engine”As-built (2026-05-29): the full CSW query model is live — composite position predicates (queens-off, bishop-pair, doubled/isolated/passed pawns, rook-on-7th, rook-on-open-file, king-castled, and
--countmaterial expressions), side quantifiers (--side white|black|either|both|neither), window quantifiers (--streak/--min-streak,--ever,--always,--never,--at-ply/--from-ply/--until-ply/--between-ply), 1-step-past edges (--became/--ceased), and game-level boolean composition in DNF — AND via--cond/--and, OR via--or, NOT via--not. Runs on the BB_ONLY path; game-bitmap output (--output, PMOTE-BM) and the--input-bitmapprefilter (cascade-AND) are both wired. The killer query reproduces 152,148. Aggregation reducers are also built (--heatmap→ PMOTE-HM,--group-by pawn-structure→ PMOTE-GB) on a dedicatedscan_bb_aggregatepath that reduces over every matching position. Still target design below:--subfen(arbitrary sub-position matching), the full-hash scan path, and the position-stream reducer. See README → Status.
query-engine is the predicate-evaluation engine. (The previous
flat-scan v1 implementation is archived at
experiments/c-explorer/old/score-scan-v1.c and serves as the
working reference for the predicate-library port.) The current
engine introduces:
- A predicate IR that decomposes a query into game-level filters
- per-ply predicates.
- A maintenance-needs analyzer that decides which engine state must be maintained (none / mailbox-only / hash too).
- A scan-path dispatcher that chooses the replay path. Live paths:
header-only (no per-ply work), BB_ONLY (mailbox-only via
make_move<mm_opts::BB_ONLY>) for the game-bitmap reducer, and BB_AGGREGATE (the same BB_ONLY replay feeding the per-position aggregation reducers). Both BB paths share onebb_replaydriver withBitmapBody/AggregateBodypolicies, monomorphized on a compile-time query shape (<HasGates, HasPbh>); the publicscan_bb_only/scan_bb_aggregateare thin shape-dispatchers over it. A full-hash (WITH_HASH) path is designed but not yet wired.
The header-only path is ~1000× faster than per-ply replay for
queries that touch only header fields (e.g., --result W).
It:
- Opens a
.scoredbdirectory. - Spawns one worker thread per shard (current model; will become a work queue once threads are decoupled).
- Each worker mmaps its shard’s
.movesfile and walks game records:- Decodes the 36-byte header.
- Applies game-level filters (result, ECO, year, ratings, termination, time control, names): skip the game entirely if any fail. Also applies the input-bitmap pre-filter here.
- For surviving games (and only if some condition carries a position
predicate), replays the move stream through the
position.hppbitboard engine (make_move<mm_opts::BB_ONLY>), evaluating each condition’s position predicates at each ply. - Maintains per-
(condition, side)window state — e.g. a streak counter per condition — and resolves each condition’s window quantifier (streak:N/always/never/at-ply/ …) to a per-game boolean. - Combines the per-condition booleans in disjunctive normal form (OR of AND-groups, with per-condition NOT) to decide the game match.
- Emits a match count and, when
--output @name|PATHis given, the per-shard game-ID bitmap (PMOTE-BM).
The above is the game-bitmap path (BitmapBody). For aggregation
(--heatmap / --group-by / --positions), the same bb_replay driver runs
an AggregateBody instead: same game-level gate (headers + --input-bitmap)
and same BB_ONLY replay, but it reduces over every matching position (the
DNF-combined position predicate without the window fold — no per-game verdict,
no early-exit) into a per-shard accumulator. main() merges the per-shard
accumulators and writes PMOTE-HM (square-heatmap) or
PMOTE-GB (top-N group-by) via --output. See
OUTPUT-MODEL.md.
Position rules vs. game filters
Section titled “Position rules vs. game filters”The two categories have different semantics:
| Position rules | Game filters | |
|---|---|---|
| Evaluated | Once per ply | Once per game |
| Composes | AND within a condition; conditions combine in DNF (--and/--or/--not), each under its own side+window quantifier | ANDed before scan |
| Examples | --count, --queens-off, --bishop-pair, --passed-pawn (+ --subfen, planned) | --result, --eco, --gm, --white-name |
| Cost | Drives the inner loop | Cheap |
For full semantics see QUERY-LANGUAGE.md.
Window quantifiers (streak and friends)
Section titled “Window quantifiers (streak and friends)”Each condition carries a window quantifier that turns its per-ply
position-predicate result into a per-game boolean. --streak N
(alias --min-streak N) is the most common: the condition matches
only when its predicates hold for N consecutive plies. The default
when a condition has position predicates but no explicit window is
streak:floor (the predicate’s minimum meaningful streak); streak:1
means “ever.” The other window quantifiers are --always, --never,
--at-ply k, --from-ply, --until-ply, and --between-ply — and
the edge transforms --became/--ceased capture the 1-ply onset/exit
of a condition.
Streaks matter for queries like “rook endgames” where transient capture-and-recapture sequences would otherwise produce false positives. A 1-ply structural match might be a rook briefly sacrificed mid-combat; a 5-ply streak indicates a sustained endgame phase.
Bitmap composition
Section titled “Bitmap composition”As-built (2026-05-29): mostly live.
query-enginewrites a PMOTE-BM match bitmap via--output @name|PATHand consumes one via--input-bitmap(cascade-AND prefilter); the standalonebitmap-combinedoesand/or/not/xor/subover persisted.bmfiles and reads the v2 writer’s output. In-engine disjunction/negation over conditions is also live (--or/--not, DNF). Still target: set-op AST nodes that compose intermediate bitmaps fully in memory inside onequery-engineinvocation. See README → Status.
The atomic output of the scan engine is a bitmap: a dense
per-shard packed bit array, one bit per game in the corpus, encoded
in a single .bm file with a header + shard table.
Bitmaps are the connective tissue between queries:
- The output of one scan becomes the input of the next via
--input-bitmap. - Set algebra across bitmaps (
bitmap-combine and/or/not/xor/sub) expresses cross-game boolean composition. - Stored bitmaps inside a
.scoredb/bitmaps/directory are referenced by@name; external.bmfiles take filesystem paths.
Boolean composition is the same operators (AND/OR/NOT) applied at several levels, and the planner lowers each to its cheapest physical form:
- Fused into one position predicate — multiple predicates ANDed inside a single condition (one scan).
- In-engine DNF over conditions — AND-groups OR’d together, with
per-condition NOT (
--and/--or/--not), evaluated game-by-game in a single scan with no materialized intermediate bitmaps. This is the default home for disjunction/negation now. - Prefilter cascade — AND across separate scans via
--input-bitmap. - Extensional set algebra on materialized bitmaps —
bitmap-combine and/or/not/xor/sub. Still the right tool when you want to compose persisted, reusable bitmaps, union per-branch header sets that an in-engine OR can’t express in one pass (e.g. the killer query’s two different--resultfilters), or run set ops that the in-engine DNF doesn’t cover (XOR/SUB).
Lowering is legal only when it preserves meaning — quantifiers don’t
always distribute over a boolean op (Streak(A,5) OR Streak(B,5) ≠
Streak(A OR B, 5)), which is why the in-engine OR is over whole
conditions (each with its own quantifier), not over raw predicates.
The set-algebra kernels live in libedge, called both in-engine and
by the standalone bitmap-combine CLI (composing persisted .bm
files).
Why bitmaps, not roaring or sorted-ID lists
Section titled “Why bitmaps, not roaring or sorted-ID lists”The hot path is prefilter scanning: iterate all games in a shard, test “is this game’s bit set” per game. For that pattern, a dense bitmap is optimal (sequential cache access, branch-predictable). For storage of sparse bitmaps, sorted IDs or Roaring would be smaller, but query-side performance is what we’re optimizing.
We may revisit Roaring later if we accumulate many very-sparse stored bitmaps and disk pressure becomes real. The switch is ~200 lines and doesn’t change the public API.
Delivery targets
Section titled “Delivery targets”The C source is the canonical asset. From it, multiple delivery shells share the same primitives:
graph TD LIB["libedge — the C library<br/>indexer · scan · tree · bitmap algebra · corpus"] LIB --> CLI["CLI tools<br/>(today)"] LIB --> DAE["Daemon<br/>HTTP loopback<br/>(primary integration path)"] LIB --> WASM["WASM build<br/>(browser, small corpora)"] LIB --> FFI["FFI<br/>(Node, Python)"] classDef lib fill:#2563eb,stroke:#93c5fd,stroke-width:2px,color:#fff; class LIB lib;Today: four CLI binaries. Library code is inlined into each tool;
the libedge extraction is on the roadmap.
Planned (in priority order):
libedge— extract the shared primitives into a real library (.a+ headers). CLI tools become thin wrappers.- Daemon — HTTP loopback service exposing index / query / tree /
combine endpoints. Apps run a local daemon and talk to it via
http://localhost:PORT. This is the primary integration path for app developers. - WASM build — for embedded/browser-only use, smaller corpora. Same API surface, with the 4 GB memory ceiling and ~30–50 % speed penalty acknowledged.
Daemon API surface (sketch)
Section titled “Daemon API surface (sketch)”Concrete shape is TBD, but the rough endpoints:
POST /index { pgn_path, out_path? } → progress events (WebSocket)POST /query { corpus_path, rules, ... } → bitmap path or match listPOST /tree { corpus_path, position } → edges at positionPOST /combine { op, args[] } → bitmap pathGET /corpus/inspect?path=... → meta + bitmap listJSON request/response. Sync ops via POST; long-running ops (indexing) emit progress events over WebSocket.
What’s IN scope, what’s OUT
Section titled “What’s IN scope, what’s OUT”In scope
Section titled “In scope”- PGN ingest at scale (millions of games)
- Per-position pattern matching and structural conditions
- Set algebra over game ID sets
- Aggregate move-tree construction (with filtering)
- Canonical aggregations: branch tree, square heatmap
- Match-record streaming for app-level aggregations
- Multiple delivery targets sharing one C codebase
Out of scope
Section titled “Out of scope”- Move-tree variations and annotations. Edge strips variations, comments, NAGs, drawables. Games are linear SAN sequences. The full tree-with-annotations model is Tabia’s domain.
- Game rendering, UI, or user interaction. Edge emits IDs and data; rendering is Rabbit/Motif/Tabia’s job.
- Game-playing engine (move generation, evaluation, search). That’s the (future) Engine project.
- Per-game live state (current position, side to move during play). That’s Tabia.
- Schema evolution beyond chess. The
.movesformat is chess-specific; no attempt at variant or rule-extension generality.
Cross-stack boundaries
Section titled “Cross-stack boundaries”| Stack project | Relationship to Edge |
|---|---|
| Tabia | App-level integration. Edge finds games; Tabia displays them. No code-level dependency either direction. |
| Motif | None directly. Apps that consume both use Motif for UI and Edge for data. |
| Rabbit | None directly. Same as Motif. |
| Priyome / OpenFile | None directly. Conceivable future bridge: a Priyome page-level board could highlight positions from an Edge query, but the integration lives in apps. |
| Engine (future) | None planned. Engine analyzes positions; Edge counts them. |
What the scan engine doesn’t compute
Section titled “What the scan engine doesn’t compute”Edge’s scan engine does not understand:
- Variations or annotations from the input PGN (stripped at ingest)
- Engine evaluations or scores
- Opening names (only ECO codes, taken from the PGN tag)
- Player identity beyond the literal name string
- Time-per-move (only the time-control declaration)
These belong in higher layers if they’re needed: the app loads matched
games from the source PGN via the .moves record’s pgn_offset and
parses them with Tabia for the full data shape.
Open architectural questions
Section titled “Open architectural questions”- Threading/sharding decoupling for the indexer (work queue).
- Within-bucket sort in
tree.datto enable binary-search lookup (current linear scan is 0.15 ms warm; not a real bottleneck yet). - Daemon API protocol (REST vs JSON-RPC vs WebSocket-streaming).
- Sparse bitmap encoding (Roaring vs dense, density threshold).
- Folder-mode ingest (concat N PGNs into one
.scoredbor one each). libedgeABI: C-only? C with optional C++ wrapper? Versioning.- Query-defined collections & incremental indexing (a separate
“database-management” track, distinct from query-building set-algebra
but reusing the same bitmap kernels): collections as materialized
views (a stored predicate + its bitmap) auto-maintained by scanning
only newly-added games; incremental
.scoredbappend (new-shard addressing keeps existing game IDs stable), incrementaltree.datmerge, dedup / delete (SUB). Design conversation pending; revisit TNMPctx/records as prior art.
See ROADMAP.md for status on each.