Architecture

Edge is two engines on one corpus. This document covers what each engine does, how they share state, and where the architectural seams are. For wire-level format details see DATA-FORMAT.md; for the user-facing query surface see QUERY-LANGUAGE.md.

Top-level data flow

graph TD
    PGN["PGN corpus<br/>millions of games"] --> IDX["Indexer<br/>one PGN-parse pass"]
    IDX -->|"per-game .moves records"| SCAN["Scan Engine<br/>per-position predicate matching"]
    IDX -->|"per-move tuples"| TREE["Tree Builder<br/>MapReduce to tree.dat"]
    SCAN -->|"game-ID bitmap (.bm)"| BM["Bitmap composition,<br/>position-pattern search"]
    TREE -->|"position to moves + W/D/L"| EXP["Opening explorer,<br/>branch analytics<br/>(~0.15 ms warm)"]
    BM -.->|"filtered subset"| TREE
    classDef engine fill:#2563eb,stroke:#93c5fd,stroke-width:2px,color:#fff;
    class SCAN,TREE engine;

A single PGN-parse pass produces both the per-game records used by the scan engine and the per-move tuples used by the tree builder. The two engines see the same world; they just project it differently.

The Indexer

indexer is the single PGN→.scoredb pass. It:

Mmaps the PGN and partitions the byte range into N chunks (one per shard). find_game_boundary advances each chunk start to the next [Event "...] header so games aren’t split.
In parallel, per chunk:
- Parses the PGN headers (Result, Termination, ECO, ratings, time control, date, player names).
- Replays moves through the v10 bitboard engine (mailbox + templated make_move + state-machine SAN parser), producing per-position Zobrist hashes with FIDE-2024+ EP semantics.
- Emits a per-game .moves record (36-byte header + 16-bit SMove16 stream) into a per-shard buffer.
- Emits per-move tuples (parent_hash, move_id, result_bits) into 1024 hash-keyed spill buckets.
- Interns player names into a per-shard string dictionary.
Sequential phase: writes the per-shard scoredb files:
- meta file (key=value: version, num_shards, total_games, etc.)
- shards/N.moves (the game records)
- shards/N.dict (the name dictionary, newline-separated)
- shards/N.gameidx (uint64 byte offsets per game)
- bitmaps/ (created empty)
Reduce phase: aggregates the per-bucket tuples across all shards into a unified edge table (one row per unique (parent_hash, move_id) with summed W/D/B counts). Written to tree.dat at the scoredb root for querying by explorer.

Indexing is roughly (M4 Max, 10 M Lichess — 10,164,006 games after the indexer drops zero-move games):

Map phase: ~1.50 s (engine + tokenize + spill).
Score (.moves write) phase: ~0.07 s (1.74 GB of .moves data).
Reduce + tree.dat write: ~0.20 s.
Total wall: ~1.79 s.

Sharding

Shard count is a property of the corpus, not the machine. Default tiers by PGN size:

PGN size	Default shards	Per-shard data
< 200 MB	1	up to ~35 MB
200 MB – 2 GB	4	~5 – 90 MB
2 – 20 GB	16	~50 – 280 MB
20 – 200 GB	32	~280 MB – 1.4 GB
> 200 GB	64	as needed

--shards N overrides the default. A .scoredb indexed on one machine behaves the same when read on any other: same shard count, same record boundaries.

Threading model (current and planned)

Currently: nthreads == nshards. Each thread writes its own shard. On a 12-core machine indexing a corpus that wants 16 shards, four threads do double duty (kernel time-slices).

Planned: decouple threads from shards. A work queue dispatches chunks to a fixed pool of worker threads (capped at available cores); each chunk becomes one shard. This lets a 4-core laptop index a 16-shard corpus efficiently. The on-disk artifact is identical either way.

The Tree Builder

The tree builder runs as the second half of the indexer’s MapReduce and persists its output as tree.dat at the scoredb root.

What it produces (one Edge per unique (parent_hash, move_id)):

Edge {
    parent_hash : uint64    // Zobrist of the position before the move
                            //   (FIDE-2024+ EP semantics — transpositions
                            //    that reach the same legal-move set
                            //    hash identically)
    move_id     : uint32    // SMove16 in low 16 bits
    total       : uint32    // W + D + B
    w, d, b     : uint32    // game counts split by result
}

Total 28 bytes per record; 32-byte stride on disk. Packed by bucket (parent_hash & (nbuckets - 1)); a bucket-offset table at the head of tree.dat makes bucket lookup O(1). Reduced from ~200 M raw spill tuples into ~75 M unique edges at 10 M-game scale.

Query model: explorer --fen "FEN" loads the FEN through the v10 engine, computes the position hash, indexes into the bucket table, and linear-scans that bucket (~73 K records) for matching parent_hash. Returns the moves played from that position across the corpus, with win/draw/loss counts.

End-to-end query latency: ~0.15 ms warm at 10 M-game scale. This is the opening-explorer primitive.

Filtered tree builds

The user-facing question “what’s the branch tree from this position, restricted to GM games?” is not a new engine; it’s the same tree builder run over a filtered subset of games. The pipeline:

Scan engine produces a bitmap of matching games (e.g. GM Sicilians).
Tree builder takes the bitmap as a pre-filter and reduces only those games’ tuples into a fresh edge table.

This is why we don’t need a --aggregate branch-tree flag in the scan engine. The tree builder already does the aggregation; the filtered- subset variant just needs a bitmap-aware reduce pass that consumes the scan’s match bitmap as input.

What about other aggregations?

See USE-CASES.md → Aggregations the right way. Short version:

Branch trees (opening explorer): tree builder, optionally filtered.
Square heatmaps (“where does white put knights here?”) and group-by (count by a derived feature, e.g. pawn-structure): built — the in-scan aggregation reducers (--heatmap / --group-by) on the scan_bb_aggregate path.
Anything else: match-record streaming + app-level aggregation. The escape valve for the long tail.

The Query Engine

As-built (2026-05-29): the full CSW query model is live — composite position predicates (queens-off, bishop-pair, doubled/isolated/passed pawns, rook-on-7th, rook-on-open-file, king-castled, and --count material expressions), side quantifiers (--side white|black|either|both|neither), window quantifiers (--streak/--min-streak, --ever, --always, --never, --at-ply/--from-ply/--until-ply/--between-ply), 1-step-past edges (--became/--ceased), and game-level boolean composition in DNF — AND via --cond/--and, OR via --or, NOT via --not. Runs on the BB_ONLY path; game-bitmap output (--output, PMOTE-BM) and the --input-bitmap prefilter (cascade-AND) are both wired. The killer query reproduces 152,148. Aggregation reducers are also built (--heatmap → PMOTE-HM, --group-by pawn-structure → PMOTE-GB) on a dedicated scan_bb_aggregate path that reduces over every matching position. Still target design below: --subfen (arbitrary sub-position matching), the full-hash scan path, and the position-stream reducer. See README → Status.

query-engine is the predicate-evaluation engine. (The previous flat-scan v1 implementation is archived at experiments/c-explorer/old/score-scan-v1.c and serves as the working reference for the predicate-library port.) The current engine introduces:

A predicate IR that decomposes a query into game-level filters
- per-ply predicates.
A maintenance-needs analyzer that decides which engine state must be maintained (none / mailbox-only / hash too).
A scan-path dispatcher that chooses the replay path. Live paths: header-only (no per-ply work), BB_ONLY (mailbox-only via make_move<mm_opts::BB_ONLY>) for the game-bitmap reducer, and BB_AGGREGATE (the same BB_ONLY replay feeding the per-position aggregation reducers). Both BB paths share one bb_replay driver with BitmapBody / AggregateBody policies, monomorphized on a compile-time query shape (<HasGates, HasPbh>); the public scan_bb_only / scan_bb_aggregate are thin shape-dispatchers over it. A full-hash (WITH_HASH) path is designed but not yet wired.

The header-only path is ~1000× faster than per-ply replay for queries that touch only header fields (e.g., --result W).

It:

Opens a .scoredb directory.
Spawns one worker thread per shard (current model; will become a work queue once threads are decoupled).
Each worker mmaps its shard’s .moves file and walks game records:
- Decodes the 36-byte header.
- Applies game-level filters (result, ECO, year, ratings, termination, time control, names): skip the game entirely if any fail. Also applies the input-bitmap pre-filter here.
- For surviving games (and only if some condition carries a position predicate), replays the move stream through the position.hpp bitboard engine (make_move<mm_opts::BB_ONLY>), evaluating each condition’s position predicates at each ply.
- Maintains per-(condition, side) window state — e.g. a streak counter per condition — and resolves each condition’s window quantifier (streak:N / always / never / at-ply / …) to a per-game boolean.
- Combines the per-condition booleans in disjunctive normal form (OR of AND-groups, with per-condition NOT) to decide the game match.
Emits a match count and, when --output @name|PATH is given, the per-shard game-ID bitmap (PMOTE-BM).

The above is the game-bitmap path (BitmapBody). For aggregation (--heatmap / --group-by / --positions), the same bb_replay driver runs an AggregateBody instead: same game-level gate (headers + --input-bitmap) and same BB_ONLY replay, but it reduces over every matching position (the DNF-combined position predicate without the window fold — no per-game verdict, no early-exit) into a per-shard accumulator. main() merges the per-shard accumulators and writes PMOTE-HM (square-heatmap) or PMOTE-GB (top-N group-by) via --output. See OUTPUT-MODEL.md.

Position rules vs. game filters

The two categories have different semantics:

	Position rules	Game filters
Evaluated	Once per ply	Once per game
Composes	AND within a condition; conditions combine in DNF (`--and`/`--or`/`--not`), each under its own side+window quantifier	ANDed before scan
Examples	`--count`, `--queens-off`, `--bishop-pair`, `--passed-pawn` (+ `--subfen`, planned)	`--result`, `--eco`, `--gm`, `--white-name`
Cost	Drives the inner loop	Cheap

For full semantics see QUERY-LANGUAGE.md.

Window quantifiers (streak and friends)

Each condition carries a window quantifier that turns its per-ply position-predicate result into a per-game boolean. --streak N (alias --min-streak N) is the most common: the condition matches only when its predicates hold for N consecutive plies. The default when a condition has position predicates but no explicit window is streak:floor (the predicate’s minimum meaningful streak); streak:1 means “ever.” The other window quantifiers are --always, --never, --at-ply k, --from-ply, --until-ply, and --between-ply — and the edge transforms --became/--ceased capture the 1-ply onset/exit of a condition.

Streaks matter for queries like “rook endgames” where transient capture-and-recapture sequences would otherwise produce false positives. A 1-ply structural match might be a rook briefly sacrificed mid-combat; a 5-ply streak indicates a sustained endgame phase.

Bitmap composition

As-built (2026-05-29): mostly live. query-engine writes a PMOTE-BM match bitmap via --output @name|PATH and consumes one via --input-bitmap (cascade-AND prefilter); the standalone bitmap-combine does and/or/not/xor/sub over persisted .bm files and reads the v2 writer’s output. In-engine disjunction/negation over conditions is also live (--or/--not, DNF). Still target: set-op AST nodes that compose intermediate bitmaps fully in memory inside one query-engine invocation. See README → Status.

The atomic output of the scan engine is a bitmap: a dense per-shard packed bit array, one bit per game in the corpus, encoded in a single .bm file with a header + shard table.

Bitmaps are the connective tissue between queries:

The output of one scan becomes the input of the next via --input-bitmap.
Set algebra across bitmaps (bitmap-combine and/or/not/xor/sub) expresses cross-game boolean composition.
Stored bitmaps inside a .scoredb/bitmaps/ directory are referenced by @name; external .bm files take filesystem paths.

Boolean composition is the same operators (AND/OR/NOT) applied at several levels, and the planner lowers each to its cheapest physical form:

Fused into one position predicate — multiple predicates ANDed inside a single condition (one scan).
In-engine DNF over conditions — AND-groups OR’d together, with per-condition NOT (--and/--or/--not), evaluated game-by-game in a single scan with no materialized intermediate bitmaps. This is the default home for disjunction/negation now.
Prefilter cascade — AND across separate scans via --input-bitmap.
Extensional set algebra on materialized bitmaps — bitmap-combine and/or/not/xor/sub. Still the right tool when you want to compose persisted, reusable bitmaps, union per-branch header sets that an in-engine OR can’t express in one pass (e.g. the killer query’s two different --result filters), or run set ops that the in-engine DNF doesn’t cover (XOR/SUB).

Lowering is legal only when it preserves meaning — quantifiers don’t always distribute over a boolean op (Streak(A,5) OR Streak(B,5) ≠ Streak(A OR B, 5)), which is why the in-engine OR is over whole conditions (each with its own quantifier), not over raw predicates. The set-algebra kernels live in libedge, called both in-engine and by the standalone bitmap-combine CLI (composing persisted .bm files).

Why bitmaps, not roaring or sorted-ID lists

The hot path is prefilter scanning: iterate all games in a shard, test “is this game’s bit set” per game. For that pattern, a dense bitmap is optimal (sequential cache access, branch-predictable). For storage of sparse bitmaps, sorted IDs or Roaring would be smaller, but query-side performance is what we’re optimizing.

We may revisit Roaring later if we accumulate many very-sparse stored bitmaps and disk pressure becomes real. The switch is ~200 lines and doesn’t change the public API.

Delivery targets

The C source is the canonical asset. From it, multiple delivery shells share the same primitives:

graph TD
    LIB["libedge — the C library<br/>indexer · scan · tree · bitmap algebra · corpus"]
    LIB --> CLI["CLI tools<br/>(today)"]
    LIB --> DAE["Daemon<br/>HTTP loopback<br/>(primary integration path)"]
    LIB --> WASM["WASM build<br/>(browser, small corpora)"]
    LIB --> FFI["FFI<br/>(Node, Python)"]
    classDef lib fill:#2563eb,stroke:#93c5fd,stroke-width:2px,color:#fff;
    class LIB lib;

Today: four CLI binaries. Library code is inlined into each tool; the libedge extraction is on the roadmap.

Planned (in priority order):

libedge — extract the shared primitives into a real library (.a + headers). CLI tools become thin wrappers.
Daemon — HTTP loopback service exposing index / query / tree / combine endpoints. Apps run a local daemon and talk to it via http://localhost:PORT. This is the primary integration path for app developers.
WASM build — for embedded/browser-only use, smaller corpora. Same API surface, with the 4 GB memory ceiling and ~30–50 % speed penalty acknowledged.

Daemon API surface (sketch)

Concrete shape is TBD, but the rough endpoints:

POST /index    { pgn_path, out_path? }      → progress events (WebSocket)
POST /query    { corpus_path, rules, ... }  → bitmap path or match list
POST /tree     { corpus_path, position }    → edges at position
POST /combine  { op, args[] }               → bitmap path
GET  /corpus/inspect?path=...               → meta + bitmap list

JSON request/response. Sync ops via POST; long-running ops (indexing) emit progress events over WebSocket.

What’s IN scope, what’s OUT

In scope

PGN ingest at scale (millions of games)
Per-position pattern matching and structural conditions
Set algebra over game ID sets
Aggregate move-tree construction (with filtering)
Canonical aggregations: branch tree, square heatmap
Match-record streaming for app-level aggregations
Multiple delivery targets sharing one C codebase

Out of scope

Move-tree variations and annotations. Edge strips variations, comments, NAGs, drawables. Games are linear SAN sequences. The full tree-with-annotations model is Tabia’s domain.
Game rendering, UI, or user interaction. Edge emits IDs and data; rendering is Rabbit/Motif/Tabia’s job.
Game-playing engine (move generation, evaluation, search). That’s the (future) Engine project.
Per-game live state (current position, side to move during play). That’s Tabia.
Schema evolution beyond chess. The .moves format is chess-specific; no attempt at variant or rule-extension generality.

Cross-stack boundaries

Stack project	Relationship to Edge
Tabia	App-level integration. Edge finds games; Tabia displays them. No code-level dependency either direction.
Motif	None directly. Apps that consume both use Motif for UI and Edge for data.
Rabbit	None directly. Same as Motif.
Priyome / OpenFile	None directly. Conceivable future bridge: a Priyome page-level board could highlight positions from an Edge query, but the integration lives in apps.
Engine (future)	None planned. Engine analyzes positions; Edge counts them.

What the scan engine doesn’t compute

Edge’s scan engine does not understand:

Variations or annotations from the input PGN (stripped at ingest)
Engine evaluations or scores
Opening names (only ECO codes, taken from the PGN tag)
Player identity beyond the literal name string
Time-per-move (only the time-control declaration)

These belong in higher layers if they’re needed: the app loads matched games from the source PGN via the .moves record’s pgn_offset and parses them with Tabia for the full data shape.

Open architectural questions

Threading/sharding decoupling for the indexer (work queue).
Within-bucket sort in tree.dat to enable binary-search lookup (current linear scan is 0.15 ms warm; not a real bottleneck yet).
Daemon API protocol (REST vs JSON-RPC vs WebSocket-streaming).
Sparse bitmap encoding (Roaring vs dense, density threshold).
Folder-mode ingest (concat N PGNs into one .scoredb or one each).
libedge ABI: C-only? C with optional C++ wrapper? Versioning.
Query-defined collections & incremental indexing (a separate “database-management” track, distinct from query-building set-algebra but reusing the same bitmap kernels): collections as materialized views (a stored predicate + its bitmap) auto-maintained by scanning only newly-added games; incremental .scoredb append (new-shard addressing keeps existing game IDs stable), incremental tree.dat merge, dedup / delete (SUB). Design conversation pending; revisit TNMP ctx/records as prior art.

See ROADMAP.md for status on each.