Query Language
Two layers. DESIGN (normative, §I–II) — what the language is, the gold standard the engine is measured against — followed by IMPLEMENTATION STATUS (§III) — what is actually built today. Design leads; status follows. A feature that is designed but not yet built is a build target listed in §III, never a caveat inside the grammar. The design sections describe the whole language as if finished; §III is the honest ledger of how far the engine has reached.
This document specifies the Edge query language — the textual filter that selects games and positions from a corpus. The companion axes are specified elsewhere and referenced throughout:
- FORMAL-BASIS.md — the LTLf semantics underneath the window quantifiers (the correctness oracle).
- PREDICATE-LANGUAGE.md — the canonical typed AST and type system (the machine interchange a query lowers to).
- PREDICATE-LIBRARY.md — the chess-domain predicate vocabulary the language composes.
- OPENING-ALIASES.md — opening-name → ECO resolution.
- OUTPUT-MODEL.md — the output/reducer axis, which is orthogonal to and outside this language.
- PLANNER.md — how the resulting query is planned and executed.
PART I — THE DESIGN (normative)
The shape of a query
Every query has the form:

    result = reduce( boolean-combination-of conditions )

Two orthogonal axes meet here, and the separation is load-bearing:
- The filter is the boolean-combination-of conditions: which games (or positions) match. This is the query language. Everything in Part I and Part II defines the filter.
- The output is the reduce: what you get back (a set of games, a stream of positions, a heatmap, a group-by). The output is a separate scan attribute, specified alongside the filter but never inside the boolean grammar. See OUTPUT-MODEL.md.
You never write the reducer into the boolean expression. The same filter serves every output; you choose the output independently. The rest of Part I is therefore exclusively about the boolean filter.
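The separation can be pictured with a small Python sketch. Everything here is illustrative: `Game`, `scan`, `game_ids`, and `group_by_eco` are hypothetical names chosen for the example, not the engine's API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass(frozen=True)
class Game:
    """Hypothetical stand-in for a corpus game (only the fields used here)."""
    game_id: int
    result: str  # "W", "B" or "D"
    eco: str

Filter = Callable[[Game], bool]              # the boolean filter
Reducer = Callable[[Iterable[Game]], Any]    # the output axis

def scan(corpus: Iterable[Game], flt: Filter, reduce: Reducer) -> Any:
    """result = reduce(filter): the reducer never appears inside the filter."""
    return reduce(g for g in corpus if flt(g))

def game_ids(games: Iterable[Game]) -> list:
    return [g.game_id for g in games]

def group_by_eco(games: Iterable[Game]) -> Counter:
    return Counter(g.eco for g in games)

corpus = [Game(1, "W", "B90"), Game(2, "D", "C00"),
          Game(3, "W", "B90"), Game(4, "B", "B90")]

def white_won(g: Game) -> bool:
    return g.result == "W"

ids = scan(corpus, white_won, game_ids)          # same filter ...
by_eco = scan(corpus, white_won, group_by_eco)   # ... different output
```

The filter `white_won` is written once and reused unchanged under both reducers, which is the property the design relies on.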
Three kinds of predicate
The filter is built from exactly three kinds of thing:
- Header predicates: per-game facts. A header predicate tests field ∈ value-set, where the field is a scalar property of the game (result, ECO, year, an Elo, a player name, …). Trace-constant: it has no side and no ply-window.
- Position predicates: per-board tests, Board → bool (bishop-pair, queens-off, a material comparator). Evaluated at a single position.
- Conditions: the unit that lifts position predicates to a verdict over a whole game. A condition is a triple (C, S, W): a position-predicate expression C, a side quantifier S, and a window quantifier W, evaluated over the game’s ply trace.
Header predicates and conditions are the leaves of the boolean
combination; position predicates live inside a condition’s C.
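As a rough data-model sketch of the three kinds (an illustration only, not the canonical AST of PREDICATE-LANGUAGE.md): the class names, the `Board` representation, and the toy `bishop_pair` test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Union

Board = dict                                   # hypothetical position representation
PositionPredicate = Callable[[Board], bool]    # Board -> bool

@dataclass(frozen=True)
class HeaderPredicate:
    """Per-game fact: field ∈ value-set. No side, no ply-window."""
    field: str
    values: FrozenSet[str]

@dataclass(frozen=True)
class Condition:
    """The triple (C, S, W), evaluated over a game's ply trace."""
    c: PositionPredicate
    s: str
    w: str = "ever"    # `ever` is the documented default window

# Header predicates and conditions are the leaves of the boolean combination.
Leaf = Union[HeaderPredicate, Condition]

def bishop_pair(board: Board) -> bool:
    # Toy stand-in: treats "two or more white bishops" as the pair.
    return board.get("white_bishops", 0) >= 2

leaves = [
    HeaderPredicate("result", frozenset({"W"})),
    Condition(bishop_pair, "white"),
]
```

Position predicates appear only inside a `Condition`; they are never leaves on their own.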
Header predicates — the value-set model
Every header field is scalar; a header predicate is set membership
Each PGN header field holds one value per game (the result is one of W/B/D; the year is one number; White is one name). A header predicate therefore asks a single question: is the game’s value for this field a member of a given set?

    field ∈ value-set

- A scalar literal is the singleton set: result:W means result ∈ {W}.
- A range is a set: eco:B20-B99 means eco ∈ {B20, …, B99}; year:2010-2020 means year ∈ {2010, …, 2020}.
Framing membership as the one primitive is what makes the value-set algebra below fall out cleanly and lower to a single test.
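A minimal sketch of that single primitive, assuming ECO codes are one letter A to E followed by two digits and that a game's headers are available as a plain dict. The helper names `eco_range`, `literal`, and `header_matches` are hypothetical.

```python
def eco_range(lo: str, hi: str) -> frozenset:
    """Expand an ECO range such as B20-B99 into the set of codes it denotes."""
    def index(code: str) -> int:
        return (ord(code[0]) - ord("A")) * 100 + int(code[1:])

    def code(i: int) -> str:
        return f"{chr(ord('A') + i // 100)}{i % 100:02d}"

    return frozenset(code(i) for i in range(index(lo), index(hi) + 1))

def literal(value) -> frozenset:
    """A scalar literal is the singleton set."""
    return frozenset({value})

def header_matches(game: dict, field: str, values: frozenset) -> bool:
    """The one primitive: field ∈ value-set."""
    return game[field] in values

sicilian_codes = eco_range("B20", "B99")           # eco:B20-B99
years = frozenset(range(2010, 2021))               # year:2010-2020 (inclusive)
game = {"eco": "B90", "result": "W", "year": 2014}

matches = (
    header_matches(game, "eco", sicilian_codes)
    and header_matches(game, "result", literal("W"))
    and header_matches(game, "year", years)
)
```

Literals and ranges both end up as sets, so the scan needs only one kind of test.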
Base scalar fields
The base fields, each scalar:
| Field | Domain |
|---|---|
| result | W \| B \| D (and the convenience decisive = {W, B}) |
| eco | ECO code, or a range of codes |
| year | integer, or a range |
| white-elo | integer, or a range |
| black-elo | integer, or a range |
| white-name (alias White) | player name |
| black-name (alias Black) | player name |
| tc-category | bullet \| blitz \| rapid \| classical \| correspondence |
| termination | normal \| time \| abandoned \| … |
| ply-count | integer; surfaced as the bounds min-ply / max-ply |
Opening names (Opening:Sicilian) are not a separate field — they are
literals that resolve into the eco field’s value-set via
OPENING-ALIASES.md.
Intra-field value-set algebra — exactly three operators
A value-set is written in brackets, field:[ … ], and is built with set algebra minus intersection. There are three operators, and only three:
- Union: , (or the word OR). The common case: build a set from several alternatives.

      Opening:[Sicilian, French]    eco ∈ (Sicilian ∪ French)
      result:[W, D]                 result ∈ {W, D}

- Difference: SUB. Remove a sub-set from a set.

      Opening:[Najdorf SUB B92]     the Najdorf, minus the Opočenský (B92)

- Complement: NOT. The set’s complement within the field’s finite domain.

      Result:[NOT W]                result ∈ {B, D}  (the domain is {W,B,D})
There is no intra-field AND / intersection, by design. On a scalar field a single value cannot be two things at once, so an intra-field intersection is always one of:
- empty, when the operands are disjoint: [W AND D] = ∅ (a game’s result is never both W and D); or
- redundant, when they overlap: the intersection of two ranges is just their common sub-range, itself a range already writable directly.
So intra-field AND carries no information a union/difference/complement
expression cannot already state. The parser rejects AND inside a
value-set and points the author at inter-condition AND (the next
section) — which is what someone reaching for it almost always meant
(“Sicilian games that White won” is Opening:Sicilian AND Result:W, two
fields, not one). Should a true intersection ever be wanted, it is
derivable as NOT(NOT A OR NOT B) and needs no dedicated operator.
Set-algebra completeness: union + difference + complement over a finite domain generate the full Boolean algebra of subsets (intersection included, via De Morgan). The grammar omits intersection as an atom only because on a scalar field it is never the useful thing to type.
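The completeness claim can be checked on the result field with ordinary Python sets. This is a sketch under the assumption that the domain is exactly {W, B, D}; the function names are hypothetical.

```python
DOMAIN = frozenset({"W", "B", "D"})    # the result field's finite domain

def union(a: frozenset, b: frozenset) -> frozenset:
    return a | b                        # "," or OR

def sub(a: frozenset, b: frozenset) -> frozenset:
    return a - b                        # SUB

def complement(a: frozenset, domain: frozenset = DOMAIN) -> frozenset:
    return domain - a                   # NOT, relative to the finite domain

def intersection(a: frozenset, b: frozenset, domain: frozenset = DOMAIN) -> frozenset:
    """Derived, not primitive: NOT(NOT A OR NOT B), by De Morgan."""
    return complement(union(complement(a, domain), complement(b, domain)), domain)

W, B, D = (frozenset({x}) for x in "WBD")

not_w = complement(W)           # result:[NOT W]  -> {B, D}
w_or_d = union(W, D)            # result:[W, D]   -> {W, D}
empty = intersection(W, D)      # disjoint operands -> empty set
only_d = intersection(w_or_d, not_w)
```

The derived `intersection` agrees with Python's built-in `&` on these inputs, which is why no dedicated operator is needed.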
A value-set resolves at plan time to one membership test
A value-set is not a disjunction of branches in the boolean layer. It resolves, at plan time, to a single concrete set, against which the scan runs one membership test:

    Opening:[Sicilian, French] AND Result:[W, D]

is two membership tests AND’d (one per field), not a four-way cartesian expansion (Sicilian∧W) ∨ (Sicilian∧D) ∨ (French∧W) ∨ (French∧D). Each set is computed once; the test is eco ∈ S_eco and result ∈ S_result.
Literals inside a value-set resolve through the existing machinery: opening names → ECO ranges (OPENING-ALIASES.md), raw ECO codes and ranges, result/tc/termination enums, and numeric year/Elo values and ranges.
Derived conveniences — sugar over inter-field combinations
Several everyday questions span more than one base field. These are named derived conveniences. They are not base fields — each is pure sugar that expands into a Boolean combination of the base scalar fields above:
| Convenience | Expansion |
|---|---|
| played-by:X | White:X OR Black:X |
| between:[X, Y] | White:[X, Y] AND Black:[X, Y] |
| both-elo:N | white-elo:≥N AND black-elo:≥N |
| gm | both-elo:2500 |
| master | both-elo:2200 |
| elo-diff:N | cross-field relation ` |
| higher-rated:X | cross-field relation — X is the higher-rated side |
between:[X, Y] deserves a note: because each name field is scalar, White:[X,Y] AND Black:[X,Y] requires White to be one of the two names and Black to be one of the two names. Since a player does not face themselves, that forces {White, Black} = {X, Y}, i.e. “X versus Y, either color.” That is the intended “this pairing met” semantics. Strictly, the two memberships on their own would also admit X-vs-X or Y-vs-Y if the corpus had such a game; played-by:X AND played-by:Y (with X ≠ Y) would not, and selects the same pairing on any real corpus.
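The expansions in the table can be sketched as plain predicates over a header dict. This is illustrative only: the dict keys (`white`, `black`, `white_elo`, `black_elo`) and function names are hypothetical, and the sample games assume White and Black are always different players.

```python
def played_by(x: str):
    # played-by:X  ->  White:X OR Black:X
    return lambda g: g["white"] == x or g["black"] == x

def between(x: str, y: str):
    # between:[X, Y]  ->  White:[X, Y] AND Black:[X, Y]
    pair = frozenset({x, y})
    return lambda g: g["white"] in pair and g["black"] in pair

def both_elo(n: int):
    # both-elo:N  ->  white-elo:>=N AND black-elo:>=N
    return lambda g: g["white_elo"] >= n and g["black_elo"] >= n

gm = both_elo(2500)
master = both_elo(2200)

games = [
    {"white": "Carlsen", "black": "Nakamura", "white_elo": 2830, "black_elo": 2780},
    {"white": "Nakamura", "black": "Carlsen", "white_elo": 2780, "black_elo": 2830},
    {"white": "Carlsen", "black": "Caruana", "white_elo": 2830, "black_elo": 2450},
]

pairing = [between("Carlsen", "Nakamura")(g) for g in games]
carlsen = [played_by("Carlsen")(g) for g in games]
gm_games = [gm(g) for g in games]
```

Each convenience is nothing more than a Boolean combination of base-field membership tests, so it adds no new primitive to the header layer.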
Position conditions — the (C, S, W) triple
A condition reads as ( what holds, for which side, over what ply-window ). The three components:
C — the position-predicate expression
C is a Boolean expression — AND / OR / NOT — of position
predicates, evaluated per ply. The leaves are named predicates from
PREDICATE-LIBRARY.md (bishop-pair,
passed-pawn, rook-on-seventh, …) and material-comparator atoms
(QBNqbn=0, R=r, P<p).
S — the side quantifier
S quantifies over the two sides {white, black}:
| S | meaning | logic |
|---|---|---|
| white / black | that specific side | (specific) |
| either | some side (∃) | C(white) ∨ C(black) |
| both | each side (∀) | C(white) ∧ C(black) |
| neither | no side (∄) | ¬C(white) ∧ ¬C(black) |
side:either binds one consistent shared side through the whole
expression C — the same s ∈ {white, black} feeds every predicate in
C. So (bishop-pair AND passed-pawn, side:either) means one side had
both, never White’s-pair combined with Black’s-passed-pawn.
A predicate may carry its own side to override the shared one —
pred(black) — for a genuinely mixed-side condition (e.g.
bishop-pair(white) AND passed-pawn(black)). Sideless predicates
(queens-off — a property of the whole board) ignore S entirely.
The side quantifiers are finite first-order quantification over a
two-element domain; they expand to pure propositional logic (see
FORMAL-BASIS.md §“layers”). They add no power beyond
AND/OR/NOT — they are convenient, not fundamental.
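A short sketch of that expansion at a single position, assuming C is available as a function of the side. The names `quantify_side`, `facts`, and `has` are hypothetical.

```python
SIDES = ("white", "black")

def quantify_side(c, s: str) -> bool:
    """Expand a side quantifier into propositional logic over C(white), C(black).

    `c` is a callable c(side) -> bool for one fixed position.
    """
    if s in SIDES:
        return c(s)
    if s == "either":
        return c("white") or c("black")
    if s == "both":
        return c("white") and c("black")
    if s == "neither":
        return not c("white") and not c("black")
    raise ValueError(f"unknown side quantifier: {s}")

# Toy position: White has the bishop pair, Black has a passed pawn.
facts = {"bishop-pair": {"white"}, "passed-pawn": {"black"}}

def has(pred: str):
    return lambda side: side in facts[pred]

def pair_and_passer(side: str) -> bool:
    # One shared side feeds every predicate in C.
    return has("bishop-pair")(side) and has("passed-pawn")(side)

same_side_has_both = quantify_side(pair_and_passer, "either")
someone_has_pair = quantify_side(has("bishop-pair"), "either")
```

Because the same `side` is passed to both predicates, White's pair combined with Black's passed pawn does not satisfy the `either` form.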
W — the window quantifier
W quantifies over the ply trace. ever is the default when no window is written.
| W | meaning | LTLf |
|---|---|---|
| ever | holds at some ply (∃); the default | F P |
| always | holds at every ply (∀) | G P |
| never | holds at no ply (∄) | G ¬P |
| streak:N | holds for ≥ N consecutive plies | F(P ∧ XP ∧ … ∧ Xᴺ⁻¹P) |
| at-ply:k | holds at the specific ply k | Xᵏ P |
| from-ply:k | ever, restricted to plies ≥ k | bounded F |
| until-ply:k | ever, restricted to plies ≤ k | bounded F |
| between-ply:a-b | ever, restricted to plies in [a, b] | bounded F |
| became(P) | P holds now and did not one ply ago (rising edge) | P ∧ ¬YP |
| ceased(P) | P held one ply ago and not now (falling edge) | ¬P ∧ YP |
streak:1 is the simultaneity operator: “there exists one ply where C
holds” — i.e. the predicates of C are true together at a single
position. always requires ≥ 1 evaluated ply (the empty trace is not
counted as vacuously satisfying G). The window operators are grounded as
LTLf over the finite trace in FORMAL-BASIS.md; the
edge operators became/ceased use one step of past (PLTLf).
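The window forms can be sketched over a finite list of per-ply truth values. Two assumptions are made for the illustration and are not taken from FORMAL-BASIS.md: plies are indexed from 0, and at the first ply "one ply ago" counts as false.

```python
def ever(trace) -> bool:
    return any(trace)

def always(trace) -> bool:
    # Requires at least one evaluated ply: the empty trace does not satisfy G.
    return len(trace) >= 1 and all(trace)

def never(trace) -> bool:
    return not any(trace)

def streak(trace, n: int) -> bool:
    """True if P holds for at least n consecutive plies."""
    run = 0
    for p in trace:
        run = run + 1 if p else 0
        if run >= n:
            return True
    return False

def between_ply(trace, a: int, b: int) -> bool:
    """`ever`, restricted to plies in [a, b] (0-indexed here by assumption)."""
    return any(trace[a:b + 1])

def became(trace) -> list:
    """Per-ply rising edge: P now, and not one ply ago."""
    return [p and not (i > 0 and trace[i - 1]) for i, p in enumerate(trace)]

trace = [False, True, True, False, True]
rising = became(trace)
```

With these definitions `streak(trace, 1)` and `ever(trace)` coincide, matching the reading of streak:1 as "there exists one ply where C holds".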
Ordering is not a window operator. Ordered-temporal questions (“P,
then later Q”) are not in the window table — W quantifies one
condition over the trace, and ordering relates two conditions. Ordering
is expressed by gating (then), specified in its own subsection after
the composition layer. (while/until are deliberately not in the
language at all — see that subsection.)
The condition is a nested quantification
A condition is the two-dimensional quantification

    [S over sides] [W over plies] C(side, ply)

read outside-in: pick the side(s) per S, then ask the window question W of C along the trace. Examples:

    (king-castled, neither, ever)      ∄side ∃ply: neither side ever castled
    (king-castled, both, ever)         both sides castled (each at some ply)
    (bishop-pair, either, streak:10)   one side held the pair 10 plies straight
    (king-castled, white, never)       White never castled

The four side × simultaneity forms — all expressible
A persistent design question is: can we say “the same side got A and B, but at different plies”? Yes. The full 2×2 of {specific, either} × {simultaneous, independent} is expressible, and the distinction is carried by whether the predicates share one condition (simultaneous) or sit in separate conditions AND’d (independent):
| | simultaneous (same ply) | independent (possibly different plies) |
|---|---|---|
| specific side | (A AND B, white, streak:1) | (A, white, ever) AND (B, white, ever) |
| either side | (A AND B, either, streak:1) | ((A, white, ever) AND (B, white, ever)) OR ((A, black, ever) AND (B, black, ever)) |
The rule that generates the table:
- A multi-predicate condition carrying a modifier (a side or a window) is evaluated per-ply, simultaneously: C must hold as a whole at the relevant ply(s).
- The independent reading (“A happened, and B happened, not necessarily together”) is written as separate conditions AND’d, each with its own ever.
Both readings are first-class. In particular, “same specific side, A and B at different plies” is the top-right cell, and “same (but unspecified) side, A and B at different plies” is the bottom-right cell — neither is a gap in the language.
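The difference between the two columns shows up on a toy trace. The sketch below uses a hypothetical 4-ply game in which White has A at ply 0 and B at ply 2, so the independent reading holds for White while the simultaneous one does not.

```python
def ever(xs) -> bool:
    return any(xs)

# Per-ply truth of predicates A and B for each side (made-up data).
trace = {
    "white": {"A": [True, False, False, False], "B": [False, False, True, False]},
    "black": {"A": [False, False, False, False], "B": [False, True, False, False]},
}

def simultaneous(side: str) -> bool:
    # (A AND B, side, streak:1): C must hold as a whole at one ply.
    t = trace[side]
    return ever(a and b for a, b in zip(t["A"], t["B"]))

def independent(side: str) -> bool:
    # (A, side, ever) AND (B, side, ever): separate conditions, any plies.
    t = trace[side]
    return ever(t["A"]) and ever(t["B"])

# The "either side" row of the table.
either_simultaneous = simultaneous("white") or simultaneous("black")
either_independent = independent("white") or independent("black")
```

Note that `either_independent` still requires one side to have both A and B; Black's B does not combine with White's A.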
Composition — the Boolean layer
Conditions and header predicates combine into the filter with AND /
OR / NOT.
Precedence and grouping
Precedence is NOT > AND > OR. Parentheses group; they do not
change what the operators mean (as in arithmetic). Mixed AND/OR at
one level must be parenthesized — the language rejects ambiguous
A AND B OR C and asks for (A AND B) OR C or A AND (B OR C).
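One way to picture the mixed AND/OR rule is a lexical check that tracks which operators appear in each open parenthesis group. This is a sketch only, not the engine's parser: the function name and tokenizer are hypothetical, bracketed value-sets and quoted names are treated as opaque tokens, and NOT is ignored.

```python
import re

# Quoted names and [value-sets] are single tokens; parentheses stand alone.
TOKEN = re.compile(r'"[^"]*"|\[[^\]]*\]|\(|\)|[^\s()\[\]"]+')

def check_no_mixed_and_or(query: str) -> None:
    """Raise ValueError if AND and OR appear in the same parenthesis group."""
    stack = [set()]                      # operators seen in each open group
    for tok in TOKEN.findall(query):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif tok.upper() in ("AND", "OR"):   # operators are case-insensitive
            stack[-1].add(tok.upper())
            if len(stack[-1]) == 2:
                raise ValueError("mixed AND/OR at one level: add parentheses")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")

def rejects(query: str) -> bool:
    try:
        check_no_mixed_and_or(query)
    except ValueError:
        return True
    return False
```

Under this check `A AND B OR C` is rejected, while `(A AND B) OR C` and `A AND (B OR C)` pass, and an intra-field `result:[W OR D]` does not count as a Boolean-layer OR.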
What the operators mean
- AND = independent (game-level). X AND Y means both held in the game, at possibly different plies. This is the cheap, early-exit-friendly default; simultaneity is opt-in, expressed inside a single condition via its modifier (above), never by AND between conditions.
- OR = union.
- NOT = negation, subject to precedence above.
AND is always inter-predicate — it combines whole predicates, never
values within a field. This includes same-field conjunctions, which are
the inter-condition form of the intra-field SUB:

    Opening:Najdorf AND NOT Opening:B92   ≡  Opening:[Najdorf SUB B92]
    played-by:X AND played-by:Y           both players appear (either color)

Headers are first-class in every branch
Header predicates are not a separate pre-filter layer in the language. They are leaves of the same Boolean expression and may appear inside any branch (per-branch headers):

    (Opening:Sicilian AND bishop-pair(white)) OR (Opening:French AND master)

Each branch carries its own header context; the planner is free to run headers as a cheap tier per branch (an execution choice; see PLANNER.md), but the language treats them uniformly.
neither / never are sugar, with their dualities
neither and never are sugar over NOT:

    (C, neither, …)  ≡  NOT (C, either, …)
    (C, …, never)    ≡  NOT (C, …, ever)

and they carry the expected cross-dimension duality

    (C, neither, ever)  ≡  (C, both, never)

(see FORMAL-BASIS.md §“Algebraic laws”). Synonyms are welcome; the language does not police which form an author writes.
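These identities can be checked by brute force over every small game. The sketch below enumerates all truth assignments for two sides over three plies and evaluates conditions with the side quantified outside and the window inside; the never-versus-ever law is checked for a fixed side. The names `holds`, `window`, and `all_games` are hypothetical.

```python
from itertools import product

SIDES = ("white", "black")

def window(trace, w: str) -> bool:
    if w == "ever":
        return any(trace)
    if w == "never":
        return not any(trace)
    raise ValueError(w)

def holds(c: dict, s: str, w: str) -> bool:
    """Evaluate (C, S, W); c[side] is C's per-ply truth for that side."""
    if s in SIDES:
        return window(c[s], w)
    per_side = [window(c[side], w) for side in SIDES]
    if s == "either":
        return any(per_side)
    if s == "both":
        return all(per_side)
    if s == "neither":
        return not any(per_side)
    raise ValueError(s)

def all_games(n_plies: int = 3):
    """Every assignment of per-ply truth values to both sides."""
    for bits in product((False, True), repeat=2 * n_plies):
        yield {"white": bits[:n_plies], "black": bits[n_plies:]}

neither_is_not_either = all(
    holds(c, "neither", "ever") == (not holds(c, "either", "ever"))
    for c in all_games()
)
never_is_not_ever = all(
    holds(c, side, "never") == (not holds(c, side, "ever"))
    for c in all_games() for side in SIDES
)
cross_duality = all(
    holds(c, "neither", "ever") == holds(c, "both", "never")
    for c in all_games()
)
```

An exhaustive check on short traces is not a proof, but it is a cheap regression test for an evaluator that claims these laws.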
Ordering — gating (then)
Ordering — “P held, and then later Q held” — is the one temporal
relation that spans two conditions, so it lives neither in W (which
quantifies a single condition over the trace) nor in the Boolean layer
(whose AND is order-free, game-level co-occurrence). It is expressed by
gating.
A gate modulates when a condition’s window may count
A condition P = (C, S, W) may carry an optional gate: a reference to
another condition Q. The gate has one effect — P’s window counter
does not advance until Q has fired:
- before Q fires, P is pinned: its per-ply truth is treated as false, so its window records no progress (a streak counter stays at 0, an ever never triggers);
- Q’s firing is latched: once Q’s window is satisfied at some ply, it stays satisfied for the rest of the game;
- after Q has fired, P counts C under S normally.

Consequently a gated P can only be satisfied at a ply at or after the ply at which Q fired, and that is ordering. Gating is the whole mechanism; it adds no new window kind, only a precondition on when the existing window is allowed to count.
then(P₁, …, Pₙ) is a gate chain
then(...) is the surface spelling of gating: then(P₁, P₂, …, Pₙ) gates
P₂ on P₁, P₃ on P₂, …, Pₙ on Pₙ₋₁ — a dependency chain.
Binary then(P, Q) is the n = 2 case (“P, then later Q”). Every
condition in a then(...) is in the same AND-group: then is an
ordered AND — all of P₁ … Pₙ must hold in the game, and each after
the one before it. Gates must be acyclic (a self-gate or a cycle is a
clear error).
Single-pass, no nesting
Gating is single-pass, O(1) state per condition: each condition keeps its existing window state plus one latched “has fired” flag, evaluated in one trace walk with the gate resolved before its dependents. There is no nesting and no product automaton: a gate only modulates when the counter may count, reusing the same window machinery. This is exactly why windowed-then-windowed composes flat:

    then( (rook-on-seventh(white), streak:3), queens-off )

is two flat conditions linked by a gate: a 3-ply rook-on-the-seventh
streak, then (later) a queenless board — not a window nested inside a
window. The same flatness gives the recovery / comeback class directly:
a condition gated on an earlier adverse one reads as “Q happened after
having been in state P,” e.g. material even after having been a queen
down, king safe after the opponent had a sustained attack.
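The gating mechanism described above can be sketched in one loop. This is an illustration under simplifying assumptions, not the engine's code: each condition is given as a precomputed per-ply truth list (C already evaluated under S), every window is a streak:N (N = 1 behaving like ever), and the names `Cond`, `then_chain`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cond:
    truth: list        # per-ply truth of C under S (precomputed for the sketch)
    streak: int = 1    # window streak:N; N = 1 behaves like `ever`
    gate: int = -1     # index of the gating condition, -1 = ungated

def then_chain(*conds: Cond) -> list:
    """then(P1, ..., Pn): gate each condition on the one before it."""
    for i, c in enumerate(conds):
        c.gate = i - 1               # P1 gets -1 (ungated)
    return list(conds)

def evaluate(conds: list) -> bool:
    """One walk over the trace; state per condition is a run length and a flag."""
    run = [0] * len(conds)
    fired = [False] * len(conds)
    for ply in range(len(conds[0].truth)):
        for i, c in enumerate(conds):            # a gate precedes its dependents
            gate_open = c.gate == -1 or fired[c.gate]
            # While the gate is closed the condition is pinned: no progress.
            run[i] = run[i] + 1 if (gate_open and c.truth[ply]) else 0
            if run[i] >= c.streak:
                fired[i] = True                  # latched for the rest of the game
    return all(fired)                            # ordered AND

passed_pawn = [True, True, False, False]
queens_off = [False, False, True, True]
rook_on_7th = [True, True, True, False]

pawn_then_queens = evaluate(then_chain(Cond(passed_pawn), Cond(queens_off)))
queens_then_pawn = evaluate(then_chain(Cond(queens_off), Cond(passed_pawn)))
windowed = evaluate(then_chain(Cond(rook_on_7th, streak=3), Cond(queens_off)))
```

On this toy trace the two orderings give different answers even though both predicates occur in the game, and the streak:3 operand needs no nested window: it is just another flat condition in the chain.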
while / until are deliberately excluded
The language has no while/until. Continuity-tied-to-an-event is not
a chess need: run-length is already streak, and co-occurrence is already
AND. The questions an until(P, Q) would answer (“P held every ply
until Q”) are either restatements of a streak/always over a ply range
or, in practice, ordering questions better asked with then. Excluding
them keeps the temporal surface to the two things chess actually asks —
run-length (streak) and order (then).
Output is orthogonal (restated)
The query text is the filter only. The output — game-bitmap, square-heatmap, group-by, position-stream — is a separate scan attribute (a flag/clause alongside the query; a list, since a single scan may attach several outputs and fan each match out to all of them). It is specified per OUTPUT-MODEL.md and is not part of the Boolean grammar. This keeps one filter reusable across every output.
PART II — THE --query SURFACE SYNTAX
Part I defines the language abstractly. This part fixes one concrete
textual surface: the --query "<text>" string accepted by
query-engine. (Other frontends — a GUI, an NLP translator — are expected
to target the canonical AST of
PREDICATE-LANGUAGE.md directly; this is the
human-typed surface.)
Lexical and structural rules
- Boolean operators AND / OR / NOT are case-insensitive, and group with parentheses. Comma , is a separator (inside a condition’s attribute list and inside a value-set), never a synonym for AND.
- Conditions are parenthesized:

      ( pred-expr [, side:S] [, window] )

  where pred-expr is an AND/OR/NOT expression of position predicates, side:S is one of white|black|either|both|neither, and window is one of the §I window forms.
- Position predicates are named in lowercase, with an optional per-predicate side override in parentheses: bishop-pair(white), passed-pawn(black). Material-comparator atoms are bare: QBNqbn=0, R=r, R>=1, P<p (uppercase = White, lowercase = Black).
- Header predicates are colon-labeled: field:value for a single value or range, or field:[ value-set-expr ] for a bracketed set. The value-set-expr uses , / OR, SUB, and NOT over literals; ranges are written B20-B99, 2010-2020, 2400-2600. Names are quoted where they contain commas/spaces: played-by:"Carlsen, Magnus". Bare aliases gm, master stand alone.
- Edge windows became(P) / ceased(P) wrap a single predicate; the attribute form (P, became) is equivalent on a single-predicate condition.
- Ordering then(c₁, c₂, …, cₙ) is a prefix, n-ary form whose operands are whole conditions (each a pred-expr with its own optional side: / window, parenthesized when it carries attributes, e.g. (rook-on-seventh(white), streak:3)). The operands are comma-separated at then’s own paren depth (an operand’s own attribute commas sit inside its parentheses). It builds a gate chain in one AND-group (§I). Operands must be conditions: an OR, a NOT, a bare header, or a nested then is rejected with a clear message; then of a single condition is an error (an ordering needs ≥ 2).
- Attributes are labeled (side:, streak:, …) so they are never confused with predicates.
- Case rule for names. Operators are case-insensitive; predicate and header names are lowercase. A mis-cased or unknown name is a clear “unrecognized term” error, never a silently wrong count.
Worked examples
The killer union — per-branch headers
One side reached a sustained rook endgame and won, expressed as the union of the two colors, each with its own result header in its branch:

    ( (QBNqbn=0 AND R=r AND R>=1 AND P<p, streak:5) AND result:W )
    OR
    ( (QBNqbn=0 AND R=r AND R>=1 AND P>p, streak:5) AND result:B )

Each branch: no queens or minor pieces, equal rook counts, at least one rook each, the winning side behind on pawns, all held for 5 consecutive plies, AND that side won. The full union is validated at 152,148 games on the 10M-game corpus.
The four side × simultaneity forms

    (bishop-pair AND passed-pawn, white, streak:1)                    specific + simultaneous
    (bishop-pair, white, ever) AND (passed-pawn, white, ever)         specific + independent
    (bishop-pair AND passed-pawn, either, streak:1)                   either + simultaneous
    ((bishop-pair, white, ever) AND (passed-pawn, white, ever))
      OR ((bishop-pair, black, ever) AND (passed-pawn, black, ever))  either + independent

A genuinely mixed-side condition, via per-predicate override:

    (bishop-pair(white) AND passed-pawn(black))

Value-set examples

    Opening:[Sicilian, French] AND Result:[W, D]

Two membership tests AND’d: Sicilian-or-French games that did not end in a Black win. Not four branches.

    Opening:[Najdorf SUB B92]

Najdorf games except the Opočenský (B92). Equivalent inter-condition form: Opening:Najdorf AND NOT Opening:B92.

    between:[Carlsen, Nakamura]

Carlsen-versus-Nakamura games, either color. Expands to the base fields:

    White:[Carlsen, Nakamura] AND Black:[Carlsen, Nakamura]

(with two distinct players per game, the scalar fields force {White, Black} = {Carlsen, Nakamura}).
Headers in branches, mixed with conditions

    (Opening:Sicilian AND bishop-pair(white)) OR (Opening:French AND master)

    (Opening:Sicilian OR Opening:French)
      AND ( (bishop-pair, either, streak:10) OR (rook-on-seventh(white), streak:5) )

Ordering via gating (then)
then(...) takes whole conditions as operands; each operand carries its
own side/window. The operands form a gate chain (each gated on the one
before) in a single AND-group.

    then(passed-pawn(white), queens-off)

White had a passed pawn at some ply, and then later the queens came off:
the passed pawn carried into a queenless endgame, in that order. Reversing
the operands (then(queens-off, passed-pawn(white))) is a different
query — the passed pawn must appear only after the board was queenless —
and both are a subset of the order-free passed-pawn(white) AND queens-off.

    then( (rook-on-seventh(white), streak:3), queens-off )

Windowed-then-windowed: White held a rook on the seventh for 3 consecutive plies, and then later the queens left the board. The first operand is a full windowed condition; the gate makes the second count only after it fired. No window is nested inside another: two flat conditions, one gate.
PART III — IMPLEMENTATION STATUS
This section measures the built engine against the design above. It is the only place implementation limits appear. For the canonical running list, see README → Status; for sequencing, see ROADMAP.md.
Built and verified
The --query parser lexes and parses the surface syntax of Part II and
lowers directly to the engine’s query structs — the same structs the
flat CLI flags build, sharing the predicate/header builders so the two
surfaces cannot drift. This is Option 1 (direct lowering, no JSON).
The flat CLI is simply the degenerate, flat subset of the parser.
The executable subset that runs end-to-end today:
- Filter shape: a DNF of conditions, with header predicates either global (an AND-prefilter over every group) or per-branch (inside a specific OR-group). A global fast-path skips replay for games failing global headers; per-branch headers cut replay per group.
- C: a full Boolean expression (AND/OR/NOT) over position predicates, evaluated per ply (AND-only C keeps a fast flat-fold path).
- S: full. white / black / either / both / neither, plus per-predicate side override.
- W: full. ever, always (≥1-ply gated), never, streak:N, at-ply:k, from-ply:k, until-ply:k, between-ply:a-b, and the one-step-past edges became / ceased.
- Ordering, i.e. gating (then): built. then(P₁, …, Pₙ) (prefix, n-ary; each operand a full condition) lowers to one AND-group with a gate chain (Pᵢ.gate = Pᵢ₋₁). Single-pass: each condition carries a latched per-game “fired” flag; a gated condition’s window is held at 0 until its gate has fired, then counts normally, so it can only be satisfied at a ply at or after the gate. The per-condition update orders a gate before its dependents, so a chain propagates in one ply. Cycles / self-gates are rejected (the chain is acyclic by construction). An ungated condition (gate = -1) is byte-identical to before. Windowed-then-windowed and the recovery/comeback class fall out flat (no nesting, no automaton).
- Per-condition NOT (a condition may be negated).
- Headers: value-set membership, covering single value, range, and bracketed value-sets field:[…] (intra-field {OR, SUB, NOT} → one membership test: result:[W, D], eco:[B90-B99 SUB B92]) over raw literals, plus the scalar/alias forms (result:W, eco:B20-B99, both-elo:N, tc:blitz, played-by / white-name / black-name, gm / master). Opening-name literals in a value-set are still pending (see below).
- Outputs: game-bitmap (PMOTE-BM), the aggregation reducers (heatmap, group-by), and the position-stream reducer (--positions ref|fen|both, PMOTE-PS / .fen): all three families, per OUTPUT-MODEL.md.
Validation. The parser is round-trip validated against the flat
flags (9/9 exact) — every parser query produces a bitmap identical to
the equivalent flat invocation. The killer query’s single branch
(… AND result:W) reproduces its historical 77,462 games; the full
two-branch union now runs as a single query (per-branch result:W /
result:B) and reproduces 152,148 — bit-identical to the old external
set-algebra assembly — on the 10M-game corpus. Gating (then) is validated
on the same corpus: then(Q=q, queens-off) = 4,478,229 (= ever
queens-off, since equal-queen-count holds from the start and fires before
any queens-off ply); the two orderings then(passed-pawn(white), queens-off) = 3,152,252 and then(queens-off, passed-pawn(white)) =
2,967,652 differ and are both below the order-free
passed-pawn(white) AND queens-off = 3,165,892 (pre-gate occurrences
are suppressed); windowed-then-windowed then((rook-on-seventh(white), streak:3), queens-off) = 798,425 (≤ the AND of its parts, 803,536);
and the chain then(passed-pawn(white), queens-off, rook-on-seventh(white))
= 586,118 (≤ every pairwise then). The regressions hold exactly
(queens-off = 4,478,229; result:[W, D] = 5,465,968).
Safety. Anything the engine cannot yet run is rejected with a clear error — never a crash, never a silently wrong answer. The current rejections are exactly the build targets below.
Build targets — to fully realize the design
Each item below is a designed feature (Part I/II) not yet built. They extend the engine toward the full language; none is a revision of the design.
- Opening-name literals in value-sets: Opening:[Sicilian, French], Opening:[Najdorf SUB B92]. The value-set bracket machinery (intra-field {OR, SUB, NOT} → one membership test) is built for raw literals (ECO codes/ranges, result/tc/termination enums, year/Elo ranges); only opening-name → ECO resolution remains (OPENING-ALIASES.md).
- Derived-convenience completeness: between:, higher-rated:, and the rest of the convenience expansions wired through the parser.
- Option 2: the canonical JSON AST interchange (PREDICATE-LANGUAGE.md) as the machine interchange for non-CLI frontends (GUI, NLP, daemon). Future; the EBNF there predates the C-S-W model and will be redefined when a JSON consumer lands.
Also rejected today (and resolved by the targets above): a header carrying a modifier (a header is per-game — it has no window), and conflicting per-predicate sides within one condition.
Notes for the planner and parser
- Flag order is immaterial to the resulting query; execution ordering is the planner’s job (PLANNER.md).
- AND with vs without a modifier (the rule that generates the §I side×simultaneity table): A AND B with no modifier lowers to two independent conditions (A ever ∧ B ever; flat A --cond B); any modifier binds the group into one condition evaluated simultaneously, per-ply. A multi-predicate condition is thus always per-ply-simultaneous; the independent reading uses separate conditions (validated by the 9/9 round-trip equality).
- Negative-form early-exit (never / neither) bails on the first occurrence of the forbidden event; positives bail on the first satisfaction. (The negative early-refute optimization is tracked in FORMAL-BASIS.md checklist #6.)
Cross-references
- FORMAL-BASIS.md — LTLf semantics; the completeness checklist this language is measured against.
- PREDICATE-LANGUAGE.md — the canonical typed AST (Option 2 interchange).
- PREDICATE-LIBRARY.md — the position/header predicate vocabulary.
- OPENING-ALIASES.md — opening-name → ECO resolution.
- OUTPUT-MODEL.md — the orthogonal output/reducer axis.
- PLANNER.md — planning and execution of the lowered query.
- ROADMAP.md — build sequencing for the targets above.