Predicate Language

Status: TEMPLATE. Sections below are scaffolds for collaborative authoring. The “open questions” lines mark spots where we still need to make decisions; remove them as they’re resolved.

Status (2026-05-29): the C-S-W engine + the --query text-DSL parser are built (QUERY-LANGUAGE); the parser lowers directly to the engine’s query structs (Option 1). This doc’s canonical JSON AST is the future interchange (Option 2) — the EBNF below predates the C-S-W condition model (no side-quantifiers / simultaneity / DNF / became-ceased) and will be redefined when a JSON consumer (workbench GUI / daemon) lands. For what actually runs, see README → Status.

The Edge query language is an algebra of typed predicates over the corpus. This doc defines the canonical form — the abstract syntax tree (AST) that every frontend (CLI flags, GUI widgets, NLP translation) produces. The engine evaluates the canonical form; the surface syntax is downstream.

Sibling docs:

Cover: composability as the main design pressure; the predicate language is the source of truth and frontends translate to it; the type system enforces unambiguity; every downstream consumer (CLI / GUI / NLP) produces the canonical AST.

Cover: the four core types and what they represent.

  • PositionPred: Board → bool. A check against a single board state. Examples: bishop_pair, doubled_pawn(file=d).
  • HeaderPred: GameHeader → bool. A check against the 36-byte PGN header. Examples: eco_in(B20-B99), both_elo_ge(2500).
  • GamePred: Game → bool. A check against the whole game (a ply sequence). Produced by lifting a PositionPred with a quantifier. Examples: Ever(bishop_pair), Streak(passed_pawn, 8).
  • GameSet — a set of matched games. The output of a query; the input to set algebra.
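As a rough illustration of how the four types relate, here is a minimal Python sketch. The `Board`, `GameHeader`, and `Game` classes, their fields, and the helper names `lift_ever` and `scan` are placeholders invented for this example; the engine's real structs are not described in this doc, and the bishop-pair check is deliberately simplified to a bishop count.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical stand-ins; the real Board / GameHeader layouts are not specified here.
@dataclass(frozen=True)
class Board:
    white_bishops: int = 2
    black_bishops: int = 2

@dataclass(frozen=True)
class GameHeader:
    eco: str = "A00"
    white_elo: int = 0
    black_elo: int = 0

@dataclass(frozen=True)
class Game:
    header: GameHeader
    boards: Tuple[Board, ...]  # one Board per ply

PositionPred = Callable[[Board], bool]
HeaderPred = Callable[[GameHeader], bool]
GamePred = Callable[[Game], bool]
GameSet = FrozenSet[int]  # ids (here: corpus indices) of matched games

def white_bishop_pair(board: Board) -> bool:
    # Simplified: counts bishops only, ignores square colours.
    return board.white_bishops >= 2

def both_elo_ge(threshold: int) -> HeaderPred:
    return lambda h: h.white_elo >= threshold and h.black_elo >= threshold

def lift_ever(p: PositionPred) -> GamePred:
    # Lifts a PositionPred to a GamePred: true if p holds on any ply.
    return lambda g: any(p(b) for b in g.boards)

def scan(corpus: List[Game], pred: GamePred) -> GameSet:
    # Evaluates a GamePred over the corpus and returns the matching ids.
    return frozenset(i for i, g in enumerate(corpus) if pred(g))
```

A `GameSet` only ever comes out of `scan`, which is what lets the set-algebra layer stay independent of how each predicate was evaluated.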

Resolved 2026-05-28: no MovePred type. A predicate is a pure function of the current board + declared cross-ply state — adjacent-ply (“move”) conditions are a 1-bit Became(P) / Ceased(P) edge over a position predicate. The output/reducer is likewise not a type; it’s a Scan attribute (a list — multi-output). See OUTPUT-MODEL.md #1, #4, #7.
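The Became/Ceased edge can be pictured as one remembered bit per predicate. The sketch below is an assumption-laden illustration, not the engine's code: the function names are hypothetical, and it assumes the first ply never produces an edge because it has no predecessor.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

B = TypeVar("B")  # stands in for the board type

def became(p: Callable[[B], bool], boards: Iterable[B]) -> List[bool]:
    """Per ply: True where p is true now and was false on the previous ply."""
    prev: Optional[bool] = None  # the single bit of cross-ply state
    out = []
    for board in boards:
        cur = p(board)
        # Assumption: no edge on the first ply (prev is None).
        out.append(prev is not None and cur and not prev)
        prev = cur
    return out

def ceased(p: Callable[[B], bool], boards: Iterable[B]) -> List[bool]:
    """Per ply: True where p is false now and was true on the previous ply."""
    prev: Optional[bool] = None
    out = []
    for board in boards:
        cur = p(board)
        out.append(prev is not None and prev and not cur)
        prev = cur
    return out
```

Both functions still return a per-ply truth track, so the result can be lifted by any quantifier just like an ordinary PositionPred.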

Cover: which operators preserve which types. The legal-compositions table.

Operator                Inputs                  → Output type
──────────────────────  ──────────────────────  ──────────────
∧, ∨, ¬                 PositionPred*           → PositionPred
∧, ∨, ¬                 HeaderPred*             → HeaderPred
quantifier (see below)  PositionPred            → GamePred
∧                       GamePred × GamePred     → GamePred
∧                       HeaderPred × GamePred   → GamePred (composes across tiers)
set algebra             GameSet × GameSet       → GameSet
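The table can be read as a small type checker. The following sketch encodes exactly the rows above and nothing more; the operator strings and the `result_type` name are hypothetical, and mixing PositionPred with HeaderPred is rejected here only because that case is still an open question.

```python
POSITION = "PositionPred"
HEADER = "HeaderPred"
GAME = "GamePred"
GAMESET = "GameSet"

def result_type(op: str, inputs: list) -> str:
    """Return the output type of a legal composition, or raise TypeError."""
    kinds = set(inputs)
    if op in ("and", "or", "not"):
        if kinds == {POSITION}:
            return POSITION
        if kinds == {HEADER}:
            return HEADER
    if op == "quantifier" and inputs == [POSITION]:
        return GAME
    # AND across tiers: GamePred with GamePred, or HeaderPred with GamePred.
    if op == "and" and GAME in kinds and kinds <= {HEADER, GAME}:
        return GAME
    if op == "set" and inputs == [GAMESET, GAMESET]:
        return GAMESET
    raise TypeError(f"illegal composition: {op} over {inputs}")
```

For example, `result_type("and", [HEADER, GAME])` yields `"GamePred"`, while `result_type("and", [POSITION, HEADER])` raises `TypeError`.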

Open question: do we allow mixing PositionPred and HeaderPred at the position-level AND? E.g., “on a ply where bishop pair AND game’s ECO is B20-B99” is degenerate (the second is constant per-game), but what about OnPly(P, n) AND HeaderPred? It probably collapses.

Quantifiers — lifting PositionPred → GamePred

Cover: the modifier vocabulary. Each entry: name, signature, semantics, performance notes.

Existential / universal:

  • Ever(P) — ∃ ply where P holds. The default if no quantifier specified.
  • Never(P) — ∀ plies: ¬P holds.
  • Always(P) — ∀ plies: P holds.

Streak (min-consecutive):

  • Streak(P, n) — ∃ ≥n consecutive plies where P holds. (Inherited from v1’s --min-streak.)
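To make the semantics above concrete, here is a minimal sketch that evaluates each quantifier over a per-ply truth track, meaning a list with one boolean per ply recording whether P held. The list representation and function names are assumptions for illustration; the engine presumably streams plies rather than materialising a list.

```python
from typing import Sequence

def ever(holds: Sequence[bool]) -> bool:
    return any(holds)

def never(holds: Sequence[bool]) -> bool:
    return not any(holds)

def always(holds: Sequence[bool]) -> bool:
    # Vacuously true for a game with no plies.
    return all(holds)

def streak(holds: Sequence[bool], n: int) -> bool:
    """True if P holds on at least n consecutive plies (assumes n >= 1)."""
    run = 0  # length of the current run of consecutive true plies
    for h in holds:
        run = run + 1 if h else 0
        if run >= n:
            return True  # early exit: no need to read the rest of the game
    return False
```

`streak` needs only a single counter of cross-ply state, and `ever` is the special case `streak(holds, 1)`.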

Positional bounds:

  • AtPly(P, n) — P holds at the specific ply n.
  • FromPly(P, n): Ever(P) restricted to plies ≥ n.
  • UntilPly(P, n): Ever(P) restricted to plies ≤ n.
  • BetweenPly(P, lo, hi): Ever(P) restricted to plies in [lo, hi].
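The positional bounds can be sketched as slices of a per-ply truth track. This example assumes plies are numbered from 1 (inferred from "move N = plies 2N−1, 2N" elsewhere in this doc), so `holds[0]` is ply 1; the function names and the list representation are hypothetical.

```python
from typing import Sequence

def at_ply(holds: Sequence[bool], n: int) -> bool:
    # False when the game has no ply n.
    return 1 <= n <= len(holds) and holds[n - 1]

def from_ply(holds: Sequence[bool], n: int) -> bool:
    # Ever(P) over plies >= n.
    return any(holds[max(n, 1) - 1:])

def until_ply(holds: Sequence[bool], n: int) -> bool:
    # Ever(P) over plies <= n.
    return any(holds[:max(n, 0)])

def between_ply(holds: Sequence[bool], lo: int, hi: int) -> bool:
    # Ever(P) over plies in the closed interval [lo, hi].
    return any(holds[max(lo, 1) - 1:max(hi, 0)])
```

A bound that falls outside the game simply yields an empty window, so the Ever-style bounds return False rather than raising.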

Phase-based:

  • InPhase(P, phase): Ever(P) restricted to a phase classifier (opening / middlegame / endgame). Phase definition lives in PREDICATE-LIBRARY.

Temporal sequence (harder — V2?):

  • Then(P, Q) — ∃ plies i < j: P at i AND Q at j.
  • While(P, Q) — ∃ plies i < j: Q at j AND P holds at all plies in [i, j].
  • Until(P, Q) — ∃ ply i: P holds at all plies < i, Q at i.
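Each of the three definitions above can be evaluated in a single pass with a small amount of state, which gives a feel for the "state machine over the ply stream" cost. The sketch below takes two parallel per-ply truth tracks, one for P and one for Q; the function names and list representation are assumptions for illustration only.

```python
from typing import Sequence

def then_seq(p: Sequence[bool], q: Sequence[bool]) -> bool:
    """Exists i < j with P at i and Q at j."""
    seen_p = False
    for p_i, q_i in zip(p, q):
        if seen_p and q_i:  # checked before updating, so i < j is strict
            return True
        seen_p = seen_p or p_i
    return False

def while_seq(p: Sequence[bool], q: Sequence[bool]) -> bool:
    """Exists i < j with Q at j and P on every ply in [i, j]."""
    run = 0  # consecutive plies, ending at the current one, where P holds
    for p_i, q_i in zip(p, q):
        run = run + 1 if p_i else 0
        # i = j - 1 is the weakest witness, so a run of 2 is enough.
        if q_i and run >= 2:
            return True
    return False

def until_seq(p: Sequence[bool], q: Sequence[bool]) -> bool:
    """Exists i with Q at i and P on every ply before i."""
    for p_i, q_i in zip(p, q):
        if q_i:
            return True
        if not p_i:
            return False  # P broke before Q ever held
    return False
```

Read literally, the While definition reduces to "P held on two consecutive plies and Q held on the second", which may be weaker than intended and is worth checking when these quantifiers are specified for V2.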

Open question: temporal-sequence quantifiers double the implementation complexity (state machines over the ply stream). Worth gating to V2?

Open question: streak vs Always-within-range. Streak(P, n) is “P for n consecutive plies somewhere”; Always is “every ply”. Is there a use case for “every ply for ≥n consecutive starting from move M”?

Set algebra — GameSet × GameSet → GameSet

Cover: the cross-query composition layer. AND, OR, XOR, SUB, NOT. When does the user reach for this vs in-query AND-chain?

  • In-query AND-chain: when all conditions must hold within one game.
  • Set algebra: when composing INDEPENDENT query results.

The “Sicilian OR French” case → run two queries, OR the bitmaps. The “Sicilian AND master” case → one query, both predicates AND’d.
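A minimal sketch of the set-algebra layer over bitmaps, using a Python integer as the bitmap (bit i set means game i matched). The function names are hypothetical and the doc does not say how the engine stores bitmaps. NOT takes the corpus size because a complement is only meaningful relative to a corpus, matching the "not" Corpus form in the grammar.

```python
from typing import Iterable, List

def bitmap(ids: Iterable[int]) -> int:
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits

def members(bits: int) -> List[int]:
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

def set_and(a: int, b: int) -> int:
    return a & b

def set_or(a: int, b: int) -> int:
    return a | b

def set_xor(a: int, b: int) -> int:
    return a ^ b

def set_sub(a: int, b: int) -> int:
    # Games in a that are not in b.
    return a & ~b

def set_not(corpus_size: int, a: int) -> int:
    # Complement relative to a corpus of corpus_size games.
    return ((1 << corpus_size) - 1) & ~a

# "Sicilian OR French": two independent query results, OR the bitmaps.
sicilian = bitmap([0, 2, 5])  # made-up game ids
french = bitmap([1, 2])
either = members(set_or(sicilian, french))
```

Here `either` is `[0, 1, 2, 5]`; the operands never need to know which predicates produced them.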

Cover: the formal grammar of the canonical form. EBNF or similar. This is what frontends produce and what the planner consumes.

Query         ::= GameSetExpr
GameSetExpr   ::= Scan | SetOp
Scan          ::= "scan" Corpus Predicate
SetOp         ::= "and" GameSetExpr+
                | "or" GameSetExpr+
                | "sub" GameSetExpr GameSetExpr
                | "xor" GameSetExpr GameSetExpr
                | "not" Corpus GameSetExpr
Predicate     ::= GamePred | PredicateAnd
PredicateAnd  ::= "and" Predicate+
GamePred      ::= HeaderPred | LiftedPred
LiftedPred    ::= Quantifier PositionPred
Quantifier    ::= "ever" | "never" | "always" | "streak" Number
                | "at-ply" Number | "from-ply" Number | "until-ply" Number
                | "between-ply" Number Number
                | "in-phase" Phase
                | "then" Predicate Predicate
                | "while" Predicate Predicate
PositionPred  ::= NamedPositionPred (Param*) | PositionAnd | PositionOr | PositionNot
HeaderPred    ::= NamedHeaderPred (Param*) | HeaderAnd | HeaderOr | HeaderNot

Open question: do we use a Lisp-y s-expression form, JSON, or a custom syntax? The canonical form is what AST processors operate on; the surface form is downstream. (For NLP-translation purposes, JSON is the obvious target.)

Cover: 5-10 worked examples translating natural-language queries into canonical form. These also serve as test cases for any frontend.

Example: “Sicilian Najdorf, both 2500+, white wins, blitz.”

and(
  header_eco_in(B90-B99),
  header_both_elo_ge(2500),
  header_result(W),
  header_tc_category(blitz)
)

Example: “Games where white had the bishop pair sustained ≥10 plies starting from move 15.”

and(
  streak(white_bishop_pair, 10) within from_ply(30, ...)
)

Open question: how do we cleanly express the composition of streak + from-ply? Two quantifiers on the same PositionPred. Is the right shape Streak(FromPly(P, 30), 10) (apply bound first, then streak over the restricted range)?

(More examples to follow.)

Resolved 2026-05-28 (output-axis decisions in OUTPUT-MODEL.md):

  • Canonical unit: ply (half-move). “Move N” is a frontend input/display convenience (move N = plies 2N−1, 2N).
  • Surface form: JSON is the canonical AST — machine interchange, what frontends emit and the engine/daemon consume. Ergonomic complex-query input (a text DSL / GUI) is a frontend concern, deferred until a consumer needs it.
  • No MovePred type. Predicate = current board + declared cross-ply state (quantifier counter / 1-bit Became/Ceased edge / maintained tally / rarely, prior boards). See OUTPUT-MODEL #4.
  • Stacked quantifiers: bound first, then streak: Streak(FromPly(P, k), n) (JSON nests streak { of: from_ply { of: P }}).
  • Side is a predicate parameter, not a variant: isolated_pawn(side, …).
  • Temporal-sequence quantifiers (Then/While/Until): V2.
  • Output/reducer is a Scan attribute (a list — multi-output), not a core type. See OUTPUT-MODEL #1, #7.
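Two of the decisions above, the ply unit and the "bound first, then streak" nesting, can be sketched together. The JSON field names used here (`op`, `of`, `k`, `n`, `name`) are assumptions made for this example, since the JSON schema is not defined yet, and plies are taken to be numbered from 1.

```python
import json
from typing import Dict, List, Tuple

def move_to_plies(move: int) -> Tuple[int, int]:
    # Move N = plies 2N-1 (white) and 2N (black).
    return (2 * move - 1, 2 * move)

def window(node: dict, tracks: Dict[str, List[bool]]) -> List[bool]:
    """Per-ply truth values visible to an enclosing quantifier."""
    if node["op"] == "pred":
        return tracks[node["name"]]
    if node["op"] == "from_ply":
        inner = window(node["of"], tracks)
        # Mask instead of slicing so ply numbering is preserved.
        return [h and ply >= node["k"] for ply, h in enumerate(inner, start=1)]
    raise ValueError(f"unsupported node: {node['op']}")

def evaluate(node: dict, tracks: Dict[str, List[bool]]) -> bool:
    if node["op"] == "streak":
        run = 0
        for h in window(node["of"], tracks):
            run = run + 1 if h else 0
            if run >= node["n"]:
                return True
        return False
    raise ValueError(f"unsupported node: {node['op']}")

# Hypothetical JSON for Streak(FromPly(white_bishop_pair, 4), 3).
query = json.loads(
    '{"op": "streak", "n": 3, "of": {"op": "from_ply", "k": 4,'
    ' "of": {"op": "pred", "name": "white_bishop_pair"}}}'
)
tracks = {"white_bishop_pair": [True, True, True, False, True, True, True]}
matched = evaluate(query, tracks)
```

With this track, the run on plies 1 to 3 is masked out by the bound and the match comes from plies 5 to 7, which is the behaviour "apply bound first, then streak over the restricted range" asks for.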

Still open:

  • Phase classifier: piece-count / move-number / material / hybrid? (Chess-domain call — decide when the first phase-dependent predicate is built.)
  • Type for “headerless” mode (no ECO / tags in the PGN)?
  • Mixing PositionPred and HeaderPred at the position level (likely collapses; confirm).