Use Cases
Worked examples and recipes. Each section starts with the question you’re trying to answer, then walks through the queries that answer it.
For per-flag semantics see QUERY-LANGUAGE.md. For the architectural concepts behind these recipes see ARCHITECTURE.md.
API maturity note. Recipes below mix shipped and target API. Much of the core is now live: the composite position predicates (--queens-off, --bishop-pair, --doubled-pawn, --isolated-pawn, --passed-pawn, --rook-on-seventh, --rook-on-open-file, --king-castled, and --count EXPR), side/window quantifiers, 1-step-past edges (--became/--ceased), in-engine boolean composition (AND via --cond/--and, OR via --or, NOT via --not), bitmap I/O (--output/--input-bitmap), and the standalone bitmap-combine. Still aspirational and labeled inline below where used: --subfen (arbitrary sub-position matching) and the aggregation reducers (square-heatmap, match-record streaming). The v1 source (archived at experiments/c-explorer/old/score-scan-v1.c) remains the reference for predicate semantics still being ported. See ROADMAP.md for the current implementation status.
Throughout, $CORPUS refers to your indexed corpus, e.g.
lichess_2017-02.20260524-153045.scoredb.
Setting up a corpus
    # One-time: index a PGN. Output goes into a timestamped .scoredb directory.
    indexer lichess_2017-02.pgn

    # Look at what got produced (just inspect the directory directly).
    ls -lh lichess_2017-02.20260524-153045.scoredb/
    cat lichess_2017-02.20260524-153045.scoredb/meta

The indexer prints summary stats, shard counts, and the path of the
created .scoredb. Save the path for subsequent queries.
For convenience, assign it to a variable for the examples below:

    CORPUS=lichess_2017-02.20260524-153045.scoredb

Recipe 1: Opening exploration
“What does the world play after 1. e4 c5 in this corpus?”
The classic opening-explorer question. For an unfiltered look at the full corpus, query the persisted move tree directly:
    # Position after 1. e4 c5: ask for the moves played from here and counts.
    explorer $CORPUS --fen "rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 2"
    # g1f3 1432104 games (W:51.2% D:4.3% B:44.5%)
    # d2d4 345287 games (W:47.8% D:5.1% B:47.1%)
    # ...

Result lookup is ~0.15 ms at 10 M-game scale; the entire tree
(tree.dat) is mmapped at startup. Output is UCI today (SAN is on
the roadmap).
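If an app wants to consume the explorer output rather than read it by eye, a small parser is enough. The sketch below assumes each move line has exactly the shape of the sample output above (UCI move, game count, then W/D/B percentages); the real output format may differ, and `MoveStat` and `parse_explorer_line` are hypothetical names, not part of the tool.

```python
import re
from typing import NamedTuple, Optional


class MoveStat(NamedTuple):
    uci: str          # move in UCI notation, e.g. "g1f3"
    games: int        # number of games that played this move
    white_pct: float  # share of white wins, in percent
    draw_pct: float   # share of draws, in percent
    black_pct: float  # share of black wins, in percent


# Assumed line shape, copied from the sample output:
#   g1f3 1432104 games (W:51.2% D:4.3% B:44.5%)
_LINE = re.compile(
    r"^\s*(?P<uci>[a-h][1-8][a-h][1-8][qrbn]?)\s+(?P<games>\d+)\s+games\s+"
    r"\(W:(?P<w>[\d.]+)%\s+D:(?P<d>[\d.]+)%\s+B:(?P<b>[\d.]+)%\)\s*$"
)


def parse_explorer_line(line: str) -> Optional[MoveStat]:
    """Parse one move line; return None for lines that do not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    return MoveStat(
        m["uci"], int(m["games"]), float(m["w"]), float(m["d"]), float(m["b"])
    )
```

Returning `None` for unmatched lines lets a caller skip headers or summary lines without special-casing them.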
For a filtered look (e.g., “what does the world play after 1.e4 c5 in master games only?”), the current path is to scan-filter first and then either re-index a filtered subcorpus or use a (future) bitmap-aware tree-builder pass:
    # Find all games tagged Sicilian by ECO.
    query-engine $CORPUS --eco B20-B99
    # → 1,137,517 games matched

Save it as a reusable bitmap:

    query-engine $CORPUS --eco B20-B99 --output @sicilian

Now ask narrower questions:

    # Sicilians in long time controls
    query-engine $CORPUS --input-bitmap @sicilian --tc-category classical
    # → small subset (Lichess is mostly fast TC)

    # Sicilians played by 2200+ on both sides
    query-engine $CORPUS --input-bitmap @sicilian --master
    # → ~16,000 games

Each filtered query takes ~50 ms because the input bitmap pre-filters to the cheapest possible stage.
Recipe 2: The killer query
“In rook-pawn endgames where the two sides had asymmetric pawn counts, which games did the side with fewer pawns win?”
This is the validation case Edge was designed around. It combines:
- A structural condition (only K, R, P on the board; equal rook counts ≥ 1; asymmetric pawn counts)
- A min-streak to filter transient capture-recapture states
- A result-conditional asymmetry (“the side with fewer pawns won”)
The two branches pair a different result with each pawn-asymmetry
(white-fewer + 1-0 vs. black-fewer + 0-1). Because --result is a
game-level header filter that applies to the whole query — it can’t
vary per OR-branch — this case doesn’t collapse into a single
in-engine --or. The natural decomposition is two scans plus a bitmap
union, which also leaves each branch persisted as a reusable bitmap.
    # Branch A: white has fewer pawns, white wins
    query-engine $CORPUS \
      --count "QBNqbn=0" \
      --count "R=r" --count "R>=1" \
      --count "P<p" \
      --result 1-0 \
      --min-streak 5 \
      --output @killer-A

    # Branch B: black has fewer pawns, black wins
    query-engine $CORPUS \
      --count "QBNqbn=0" \
      --count "R=r" --count "R>=1" \
      --count "P>p" \
      --result 0-1 \
      --min-streak 5 \
      --output @killer-B

    # Union them
    bitmap-combine or $CORPUS/bitmaps/killer-A.bm \
      $CORPUS/bitmaps/killer-B.bm \
      -o $CORPUS/bitmaps/killer.bm

On the 10 M Lichess corpus: ~150,000 matches, ~720 ms total.
This is a query no other chess tool can answer at this scale: no mainstream chess database supports result-correlated structural predicates with sustained-pattern semantics.
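To make the structural condition and the streak filter concrete, here is a minimal Python sketch of what Branch A asks of a single game. It is an illustration only, not the engine's implementation. It assumes that `--count "QBNqbn=0"` means the listed piece letters total zero, and that `--min-streak 5` means the condition holds on at least five consecutive plies; all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def piece_counts(placement: str) -> Counter:
    """Count piece letters in a FEN placement field (digits and '/' are skipped)."""
    return Counter(ch for ch in placement if ch.isalpha())


def white_fewer_pawns_rook_ending(placement: str) -> bool:
    """Branch A's structural condition: QBNqbn=0, R=r, R>=1, P<p."""
    c = piece_counts(placement)
    no_queens_or_minors = sum(c[p] for p in "QBNqbn") == 0
    return (
        no_queens_or_minors
        and c["R"] == c["r"]
        and c["R"] >= 1
        and c["P"] < c["p"]
    )


def holds_for_streak(flags: Iterable[bool], min_streak: int) -> bool:
    """True if the per-ply condition holds on min_streak consecutive plies."""
    run = 0
    for ok in flags:
        run = run + 1 if ok else 0  # a failing ply resets the run
        if run >= min_streak:
            return True
    return False
```

An app would evaluate `white_fewer_pawns_rook_ending` on the placement field of every ply, feed the resulting booleans to `holds_for_streak`, and check the game result separately, mirroring how `--result` is applied at game level in the recipe.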
Variations
Restrict to high-rated games:

    # Two-step: build the precondition bitmaps separately
    query-engine $CORPUS --master --output @masters
    # Then run the killer query with that as a pre-filter
    query-engine $CORPUS --input-bitmap @masters \
      --count "QBNqbn=0" --count "R=r" --count "R>=1" --count "P<p" \
      --result 1-0 --min-streak 5 --output @killer-A-masters

Recipe 3: Pawn-structure inquiries
“How often does the Carlsbad pawn structure appear?”
Carlsbad pawn structure (from the Queen’s Gambit Declined Exchange): white pawns on a2, b2, c3, d4, e3, f2, g2, h2; black pawns on a7, b7, c6, d5, e6, f7, g7, h7 — though typically defined by just the characteristic c3/d4 vs c6/d5 + pawn islands.
Use sub-FEN with placement only (planned — --subfen is not yet
built; it’s the next position predicate on the roadmap):
    query-engine $CORPUS --subfen "8/pp3ppp/2p1p3/3p4/3P4/2P1P3/PP3PPP/8" --min-streak 5

The placement string fixes the pawn structure. Non-named squares (kings,
other pieces) are unconstrained. --min-streak 5 ensures the structure
persists for at least 5 consecutive plies, filtering games where pawns
are momentarily there mid-exchange.
For a sustained structure, longer streaks (10–20 plies) tighten the filter further.
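The matching rule described above (named squares must agree, everything else is free) can be sketched in a few lines of Python. Since --subfen is not built yet, this is only one plausible reading of the intended semantics; `expand_placement` and `subfen_matches` are hypothetical helper names.

```python
def expand_placement(placement: str) -> str:
    """Expand a FEN placement field into 64 characters, '.' for an empty square."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("placement must have 8 ranks")
    rows = []
    for rank in ranks:
        # A digit n stands for n consecutive empty squares.
        row = "".join("." * int(ch) if ch.isdigit() else ch for ch in rank)
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 files")
        rows.append(row)
    return "".join(rows)


def subfen_matches(pattern: str, position: str) -> bool:
    """Every square named in pattern must hold the same piece in position.

    Squares left empty in the pattern are unconstrained.
    """
    pat = expand_placement(pattern)
    pos = expand_placement(position)
    return all(p == "." or p == q for p, q in zip(pat, pos))
```

For example, `subfen_matches("8/pp3ppp/2p1p3/3p4/3P4/2P1P3/PP3PPP/8", placement)` would be evaluated once per ply, and the streak requirement applied to the resulting sequence of booleans.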
Composing with other conditions
    # Carlsbad structures in master games
    query-engine $CORPUS --master \
      --subfen "8/pp3ppp/2p1p3/3p4/3P4/2P1P3/PP3PPP/8" --min-streak 5 \
      --output @carlsbad-master

Recipe 4: Material-imbalance studies
“In games with the bishop pair imbalance, who tends to win?”
“Bishop pair imbalance” = one side has two bishops, the other has at
most one. Two branches by which side holds it. Because both branches
filter on position predicates only (no differing header filter), this
collapses into a single in-engine query using --or to OR two
AND-groups:
    # Either side holds the bishop-pair imbalance, in one pass.
    # Group 0 = (B>=2 AND b<2); --or; group 1 = (b>=2 AND B<2).
    query-engine $CORPUS \
      --count "B>=2" --count "b<2" \
      --or \
      --count "b>=2" --count "B<2" \
      --output @bp-imbalance

If you also want each side’s set kept separately (e.g. to tally results per branch, below), run them as two scans and union the persisted bitmaps instead:

    # White has the pair, black doesn't
    query-engine $CORPUS --count "B>=2" --count "b<2" --output @bp-white
    # Black has the pair, white doesn't
    query-engine $CORPUS --count "b>=2" --count "B<2" --output @bp-black

    # Union (equivalent to the single --or pass above)
    bitmap-combine or $CORPUS/bitmaps/bp-white.bm $CORPUS/bitmaps/bp-black.bm \
      -o $CORPUS/bitmaps/bp-imbalance.bm

Then count results in each subset:

    query-engine $CORPUS --input-bitmap @bp-white --result 1-0      # → white wins with the pair
    query-engine $CORPUS --input-bitmap @bp-white --result 0-1      # → white loses despite the pair
    query-engine $CORPUS --input-bitmap @bp-white --result 1/2-1/2  # → draws

Result percentages over the imbalance set inform the question.
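Turning the three printed counts into percentages is simple arithmetic. The sketch below assumes you have copied the three match counts out of the query output by hand. `result_shares` is a hypothetical helper, and the shares are relative to decisive-plus-drawn games only, so games with any other result tag are ignored.

```python
def result_shares(white_wins: int, black_wins: int, draws: int) -> dict:
    """Convert three result counts into percentages of their sum."""
    total = white_wins + black_wins + draws
    if total == 0:
        raise ValueError("no games in the subset")
    return {
        "1-0": 100.0 * white_wins / total,
        "0-1": 100.0 * black_wins / total,
        "1/2-1/2": 100.0 * draws / total,
    }
```

Running the same computation on the @bp-black counts and comparing the two tables shows whether the side holding the pair wins more often than the side without it.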
Recipe 5: Multi-step pipelines
“Build a Sicilian Najdorf reference set, filtered to high-rated games with sustained endgame play.”
A pipeline of three to four bitmaps composed via set algebra:
    # Step 1: Sicilians
    query-engine $CORPUS --eco B20-B99 --output @sicilian

    # Step 2: Specifically the Najdorf (B90-B99)
    query-engine $CORPUS --eco B90-B99 --output @najdorf

    # Step 3: High-rated games
    query-engine $CORPUS --master --output @master

    # Step 4: Games that reach a queenless position for 5+ plies
    query-engine $CORPUS \
      --count "Q=0" --count "q=0" --min-streak 5 \
      --output @queenless

    # Compose
    bitmap-combine and $CORPUS/bitmaps/najdorf.bm $CORPUS/bitmaps/master.bm \
      -o $CORPUS/bitmaps/najdorf-master.bm
    bitmap-combine and $CORPUS/bitmaps/najdorf-master.bm $CORPUS/bitmaps/queenless.bm \
      -o $CORPUS/bitmaps/najdorf-master-endgame.bm

Each step is a precomputed reusable building block. Mix and match later without re-scanning.
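The set algebra behind the compose step can be checked on toy data. The sketch below models each bitmap as a Python integer used as a bitset (bit i set means game i matched). The game ids are made up, and this says nothing about the on-disk .bm format; it only illustrates why the two chained AND steps give the same result in any order, and why @sicilian is not needed once @najdorf exists (B90-B99 is a sub-range of B20-B99).

```python
def bitmap(*game_ids: int) -> int:
    """Pack game ids into an int used as a bitset (bit i set = game i matched)."""
    bits = 0
    for gid in game_ids:
        bits |= 1 << gid
    return bits


def popcount(bits: int) -> int:
    """Number of matched games in a bitset."""
    return bin(bits).count("1")


# Toy data: made-up game ids, not real corpus results.
sicilian = bitmap(0, 1, 2, 3, 4, 5)
najdorf = bitmap(1, 3, 5)        # every Najdorf game is also a Sicilian
master = bitmap(0, 1, 5, 7)
queenless = bitmap(1, 2, 5, 6)

# Same two steps as the bitmap-combine calls in the recipe.
najdorf_master = najdorf & master
najdorf_master_endgame = najdorf_master & queenless
```

Because AND is associative and commutative, intersecting in a different order yields the same final bitmap; the order only matters for which intermediate sets you want to keep around for reuse.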
Recipe 6: Position-pattern search
“Find positions where white has a passed pawn on d6 supported by a knight.”
Sub-FEN with the specific squares named (planned — --subfen is not
yet built; see the roadmap):
    query-engine $CORPUS --subfen "8/8/3P4/8/8/8/8/8" --min-streak 1

Refine with material:

    # Make sure black doesn't have a pawn on c7 (which would defend against d6)
    # This requires negative position evidence, which sub-FEN doesn't currently express.
    # Workaround: use --count to require black pawn count < 8.

(Negative position predicates, i.e. “no piece on square X”, are a known gap. Tracked in ROADMAP.md.)
Aggregations the right way
Most aggregations belong outside Edge. The pattern is:
- Edge scan emits a bitmap of matching games.
- App iterates the bitmap, loads matched games via .moves/.dict reads (or via the source PGN’s pgn_offset field), and computes whatever statistics it wants.
This keeps Edge focused on filtering — the expensive part at scale — and lets app code use chess-typed data structures (move trees, board states with annotations) for the aggregation.
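On the app side, "iterating the bitmap" amounts to walking the set bits. The Python sketch below works on a raw bitset payload and assumes one bit per game, least-significant bit first within each byte. The real .bm files carry a header (see `bitmap-combine info`) whose layout is not described here, so header parsing is deliberately left out, and `iter_set_bits` is a hypothetical helper.

```python
from typing import Iterator


def iter_set_bits(payload: bytes) -> Iterator[int]:
    """Yield the index of every set bit in a raw bitset.

    Assumed layout: bit 0 of byte 0 is game 0, bit 7 of byte 0 is game 7,
    bit 0 of byte 1 is game 8, and so on.
    """
    for byte_index, byte in enumerate(payload):
        while byte:
            low = byte & -byte                       # isolate the lowest set bit
            yield byte_index * 8 + low.bit_length() - 1
            byte ^= low                              # clear it and continue
```

Clearing one bit at a time means the loop does work proportional to the number of matches in each byte, which keeps sparse bitmaps cheap to walk.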
Exceptions: tree builder and planned in-scan aggregations
Two aggregations belong in-engine because they’re chess workhorses and they get a meaningful speedup from being computed during the scan. The branch tree is built today; the square heatmap is planned.
Branch tree (the opening explorer)
The tree builder computes the aggregate edge table during indexing
and persists it as tree.dat at the scoredb root. For unfiltered
queries (“what does the world play at this position?”), explorer
looks up the position hash and returns the moves with W/D/B counts
in ~0.15 ms at 10 M-game scale.
For filtered queries (“what do GM Sicilians play at this position?”), the tree builder will run over the filtered subset of games, using the bitmap as a pre-filter:

    # Planned interface, not yet built:
    #
    #   explorer $CORPUS --input-bitmap @gm-sicilian --position FEN

Both paths use the same reduce machinery; only the input differs.
Square heatmap (planned)
“Where does white put knights in the Carlsbad pawn structure?”
A heatmap is a 64-element array of “how often was piece X on square Y across all matching positions.” Cheap to maintain during the scan (~30 lines of inner-loop code, ~256 bytes output).
    # Planned interface, not yet built:
    #
    #   query-engine $CORPUS \
    #     --subfen "8/pp3ppp/2p1p3/3p4/3P4/2P1P3/PP3PPP/8" \
    #     --aggregate square-heatmap N \
    #     --output-aggregate heatmap.bin

The output is a 256-byte file (64 u32 counters); apps render it as a chessboard with square shading.
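Reading such a file back is straightforward, since 64 counters of 4 bytes each give exactly 256 bytes. The Python sketch below decodes the counters and renders a crude text board. Because the feature is still planned, the byte order (little-endian) and square order (index 0 = a1, index 63 = h8) are assumptions, and `decode_heatmap` and `render` are hypothetical names.

```python
import struct

HEATMAP_FORMAT = "<64I"  # assumption: 64 little-endian unsigned 32-bit counters


def decode_heatmap(blob: bytes) -> list:
    """Unpack a 256-byte heatmap blob into 64 integer counters."""
    expected = struct.calcsize(HEATMAP_FORMAT)
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    return list(struct.unpack(HEATMAP_FORMAT, blob))


def render(counts: list) -> str:
    """Render counters as text shading, rank 8 at the top.

    Assumed square order: index = rank * 8 + file, with a1 = 0 and h8 = 63.
    """
    shades = " .:*#"
    peak = max(counts) or 1  # avoid dividing by zero on an empty heatmap
    rows = []
    for rank in range(7, -1, -1):
        row = "".join(
            shades[counts[rank * 8 + file] * (len(shades) - 1) // peak]
            for file in range(8)
        )
        rows.append(f"{rank + 1} {row}")
    rows.append("  abcdefgh")
    return "\n".join(rows)
```

Scaling every counter against the peak value keeps the shading meaningful regardless of how many positions matched.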
Match-record streaming (the escape valve)
For aggregations beyond the canonical two, Edge is planned to stream
(shard_id, game_id, ply, board_hash) per matching position. Apps
consume the stream and aggregate however they like.
    # Planned interface, not yet built:
    #
    #   query-engine $CORPUS [rules] --output-match-records records.bin

Apps can then mmap each matched game’s record and replay to the indicated ply for further data extraction. This is slower than baked-in aggregation but works for any aggregation shape.
Working with results
After a scan, you have:
- A count of matched games (printed)
- First-N PGN offsets per shard (printed)
- An output bitmap (if requested)
To display matches:
    # Get a few matched-game PGN offsets
    query-engine $CORPUS --input-bitmap @killer 2>&1 | grep "shard"
    # shard 0: 6297 12317 22555 ...
    # shard 1: 774095634 774097748 ...

Then extract the games from the source PGN:

    # Seek to offset 6297 in the original PGN to find the matched game.
    # Apps would mmap the PGN and parse from there with Tabia.

The pgn_offset field points at the [Event ...] header of the
matched game.
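Until a dedicated extraction utility exists, pulling one game out of the source PGN by offset takes only a few lines. The Python sketch below assumes the offset is a byte offset into the original PGN file and that every game starts with an [Event ...] tag, as stated above; `read_game_at` is a hypothetical helper, and real apps would hand the text to a proper PGN parser.

```python
def read_game_at(pgn_path: str, offset: int) -> str:
    """Return the text of the game whose [Event ...] header starts at offset."""
    lines = []
    with open(pgn_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            # The next [Event ...] tag marks the start of the following game.
            if lines and raw.startswith(b"[Event "):
                break
            lines.append(raw)
    text = b"".join(lines).decode("utf-8", errors="replace")
    return text.rstrip() + "\n"
```

Reading in binary mode matters here: byte offsets only line up with `seek` when no newline or encoding translation happens in between.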
To iterate all matches, walk the bitmap. (A small score-extract
utility for this is on the roadmap.)
Corpus management
A .scoredb is a plain directory; standard Unix tools work. (A dedicated
corpus-admin utility is deferred; see ROADMAP.md.)
    # What's in this corpus?
    cat $CORPUS/meta
    ls -lh $CORPUS/shards/
    ls -lh $CORPUS/bitmaps/

    # What bitmaps have been stored?
    ls $CORPUS/bitmaps/*.bm 2>/dev/null
    # → killer.bm sicilian.bm master.bm
    # (or use `bitmap-combine info <file>` for header details)

    # Remove a stored bitmap
    rm $CORPUS/bitmaps/sicilian.bm

Because a .scoredb is just a directory, you can rm -rf it, rsync it, or tar it
like any other filesystem artifact.
When NOT to use Edge
- Single-game playback / interactive analysis. That’s Tabia + Rabbit’s domain. Edge strips variations and annotations on ingest; if you need those, parse the original PGN with Tabia.
- Live game state / move generation. Edge replays moves but doesn’t validate them (the indexer trusts the PGN). For playing positions or generating legal moves, use a chess engine.
- Small corpora (< 1000 games). Edge will work, but the per-PGN startup cost dominates. A direct PGN parse with Tabia is simpler.
- Annotation-aware queries. “Games where the commentator wrote ’!!’ for white’s 18th move” — Edge can’t answer this; the annotations were stripped at ingest. Parse the PGN with Tabia or use a different tool.
Performance sanity checks
Benchmarks on the 10 M Lichess corpus, M4 Max, 12 cores, warm cache. Use these to spot when something looks off:
| Operation | Expected wall |
|---|---|
| Indexing | ~1.8 s |
| Metadata filter | ~50 ms |
| Single-rule position scan | ~250–350 ms |
| Multi-rule position scan | ~300–500 ms |
| Same with --input-bitmap at 10% density | ~50–80 ms |
| Same with --input-bitmap at 1% density | ~30–50 ms |
| bitmap-combine set algebra | < 10 ms |
If a metadata filter takes seconds, the cache is probably cold. If a
position scan takes minutes, something is wrong with the corpus or shard
layout: check cat $CORPUS/meta and ls -lh $CORPUS/shards/.