Vol. 001 Jeffs Brain·Documentation

Query DSL

Grammar, tokenisation, stopwords, alias expansion, FTS5 compilation.

Every query to memory search, memory recall, or memory_search runs through the query DSL. It’s a deliberately small grammar that normalises input, filters noise, and compiles to SQLite FTS5 MATCH syntax while preserving intent.

Pipeline

normalise (NFC, whitespace) → parse → stopword filter → alias expand → FTS5 compile
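
The first stage can be sketched as follows. This is a minimal illustration, not the SDK's code: the function name `normalise` is hypothetical, and it assumes that "whitespace" normalisation means collapsing runs of whitespace and trimming the ends.

```typescript
// Hypothetical helper; the source names the stage, not this function.
function normalise(input: string): string {
  return input
    .normalize("NFC") // compose combining sequences into single code points
    .replace(/\s+/g, " ") // assumption: collapse whitespace runs to one space
    .trim(); // assumption: drop leading and trailing whitespace
}

// "e" followed by U+0301 (combining acute) composes to U+00E9 under NFC.
const normalised = normalise("  cafe\u0301   winter\t");
console.log(normalised === "caf\u00e9 winter"); // true
```

Normalising first means the later stages compare tokens, stopwords, and aliases in a single Unicode form.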

Grammar

query     := clause ( WS clause )*
clause    := term | phrase | not
term      := WORD ( "*" )?
phrase    := '"' WORD ( WS WORD )* '"'
not       := "NOT" clause
  • WORD is a maximal run of characters excluding whitespace and the reserved operators.
  • Operators are uppercase only: AND, OR, NOT. Lowercase and, or, not are treated as ordinary terms.
  • Juxtaposition is implicit AND: hedgehog winter is the same as hedgehog AND winter.
  • A trailing * on a term is a prefix wildcard: hedge* matches hedgehog, hedgerow, etc.
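
The rules above can be sketched as a small recursive parser. The AST shape and the names `ParsedClause` and `parseQuery` are assumptions made for illustration; the normative grammar lives in the spec. The sketch records AND / OR as operator nodes and does no control-character stripping.

```typescript
// Hypothetical AST shape; these names are not taken from the spec.
type ParsedClause =
  | { kind: "term"; word: string; prefix: boolean }
  | { kind: "phrase"; words: string[] }
  | { kind: "op"; op: "AND" | "OR" }
  | { kind: "not"; clause: ParsedClause };

function parseQuery(input: string): ParsedClause[] {
  // Lex into quoted phrases and maximal runs of non-whitespace.
  const lexemes: string[] = input.match(/"[^"]*"|\S+/g) || [];
  let pos = 0;

  function parseClause(): ParsedClause | null {
    const lex = lexemes[pos];
    if (lex === undefined) return null;
    pos += 1;
    // Operators are recognised in uppercase only.
    if (lex === "AND") return { kind: "op", op: "AND" };
    if (lex === "OR") return { kind: "op", op: "OR" };
    if (lex === "NOT") {
      const inner = parseClause();
      return inner === null ? null : { kind: "not", clause: inner };
    }
    if (lex.length >= 2 && lex.charAt(0) === '"' && lex.charAt(lex.length - 1) === '"') {
      const words = lex.slice(1, -1).split(/\s+/).filter((w) => w.length > 0);
      return words.length > 0 ? { kind: "phrase", words } : null;
    }
    // A trailing * marks a prefix wildcard.
    const prefix = lex.length > 1 && lex.charAt(lex.length - 1) === "*";
    return { kind: "term", word: prefix ? lex.slice(0, -1) : lex, prefix };
  }

  const clauses: ParsedClause[] = [];
  while (pos < lexemes.length) {
    const clause = parseClause();
    if (clause !== null) clauses.push(clause);
  }
  return clauses;
}

// Lowercase "and" stays an ordinary term; NOT wraps the clause that follows it.
console.log(JSON.stringify(parseQuery('hedge* NOT "urban fox" and')));
```

Juxtaposed clauses are returned as siblings in the array, which leaves the implicit AND to the compile step.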

Example queries

hedgehog winter                    # implicit AND
"gardens and parks"                # phrase, bypasses stopwords
hedge*                             # prefix wildcard
NOT urban                          # negation
hedgehog OR shrew                  # explicit OR

Tokenisation

Tokens are normalised to NFC and folded to lowercase before parsing. FTS5 control characters (*, except as a trailing wildcard; backtick; parentheses; and double quotes outside phrases) are stripped so that user input cannot smuggle operators into the compiled query.
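
A minimal sketch of the stripping rule for a single bare token follows. The name `sanitiseBareToken` is hypothetical, and the sketch assumes that operator keywords have already been recognised before case folding, a detail the text above does not spell out.

```typescript
// Hypothetical helper for one bare (non-phrase, non-operator) token.
function sanitiseBareToken(token: string): string {
  // Only a trailing * on a non-empty stem counts as a wildcard.
  const wildcard = token.length > 1 && token.charAt(token.length - 1) === "*";
  const stem = wildcard ? token.slice(0, -1) : token;
  // Strip the listed control characters: * ` ( ) and double quotes.
  const cleaned = stem.replace(/[*`()"]/g, "").toLowerCase();
  // Re-attach the wildcard only if something survived the stripping.
  return cleaned.length > 0 && wildcard ? cleaned + "*" : cleaned;
}

console.log(sanitiseBareToken("Hedge*")); // hedge*
console.log(sanitiseBareToken("fo(o)`bar")); // foobar
```

A token made up entirely of control characters comes back as an empty string, which the caller can drop.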

Stopword filter

Stopwords are removed only from bare-word queries. A query bypasses the filter if it uses an explicit AND / OR / NOT, contains a phrase, or ends up empty after stripping. The English and Dutch stopword lists are the canonical JSON files at spec/fixtures/stopwords/en.json and nl.json, and every SDK loads the same files.
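
The bypass rules can be sketched as below. The function name `filterStopwords` is hypothetical, the inline set is a tiny stand-in for the canonical lists rather than their real contents, and "ends up empty after stripping" is read here as "keep the original tokens when every token is a stopword".

```typescript
// Hypothetical sketch of the stopword filter and its bypass conditions.
function filterStopwords(query: string, stopwords: ReadonlySet<string>): string[] {
  const tokens = query.split(/\s+/).filter((t) => t.length > 0);
  const hasOperator = tokens.some((t) => t === "AND" || t === "OR" || t === "NOT");
  const hasPhrase = query.indexOf('"') !== -1;
  // Explicit operators or a phrase: leave the query untouched.
  if (hasOperator || hasPhrase) return tokens;
  const kept = tokens.filter((t) => !stopwords.has(t.toLowerCase()));
  // Assumption: if filtering would empty the query, keep the original tokens.
  return kept.length > 0 ? kept : tokens;
}

// Stand-in list for illustration only.
const demoStopwords: ReadonlySet<string> = new Set<string>(["the", "in", "and"]);

console.log(filterStopwords("the hedgehog in winter", demoStopwords).join(" ")); // hedgehog winter
console.log(filterStopwords("hedgehog OR the", demoStopwords).join(" ")); // hedgehog OR the
```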

Alias expansion

Before compiling to FTS5, the DSL expands caller-supplied aliases. An alias map takes each term to a list of expansion terms, typically { "ts": ["typescript"], "go": ["golang"] }. Alias expansion runs only on bare terms, never inside phrases, so "ts deep dive" stays unchanged.

The alias map is an in-memory ReadonlyMap<string, readonly string[]> only; there is no on-disk alias-table format in v1.0.
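
A minimal sketch of expansion over an already parsed query follows. The clause types and the name `expandAliases` are hypothetical. Keeping the original term alongside its aliases is also an assumption, since the text above does not say whether aliases replace the term or are added to it.

```typescript
// Hypothetical input and output shapes, for illustration only.
type BareOrPhrase =
  | { kind: "term"; word: string }
  | { kind: "phrase"; words: string[] };

type ExpandedClause =
  | { kind: "term"; alternatives: string[] }
  | { kind: "phrase"; words: string[] };

function expandAliases(
  clauses: readonly BareOrPhrase[],
  aliases: ReadonlyMap<string, readonly string[]>
): ExpandedClause[] {
  return clauses.map((clause): ExpandedClause => {
    // Phrases pass through untouched.
    if (clause.kind === "phrase") return { kind: "phrase", words: clause.words.slice() };
    const extra = aliases.get(clause.word) || [];
    // Assumption: the original term is kept as the first alternative.
    return { kind: "term", alternatives: [clause.word].concat(extra) };
  });
}

const demoAliases = new Map<string, readonly string[]>([
  ["ts", ["typescript"]],
  ["go", ["golang"]],
]);

console.log(
  JSON.stringify(
    expandAliases(
      [
        { kind: "term", word: "ts" },
        { kind: "phrase", words: ["ts", "deep", "dive"] },
      ],
      demoAliases
    )
  )
);
```

The bare term `ts` picks up `typescript`, while the same word inside the phrase is left alone.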

Temporal expansion

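Three English recognisers append hint tokens to the query. The two date recognisers append (around YYYY/MM/DD), so phrases like “2 weeks ago” and “last Thursday” route to documents stamped near the date resolved from the anchor date the caller supplies. The ordering recogniser appends an ordinal token such as (ordinal 1):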

Phrase family      Example              Appended token
Relative offset    3 days ago           (around 2026/04/16)
Last weekday       last Thursday        (around 2026/04/16)
Ordering hint      the first meeting    (ordinal 1)
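
The relative-offset row can be sketched as follows. The name `relativeOffsetToken` is hypothetical, only "N days ago" and "N weeks ago" are handled, and the anchor date 2026/04/19 is an assumption, chosen because it reproduces the 3 days ago row.

```typescript
// Hypothetical recogniser for the "relative offset" family only.
function relativeOffsetToken(phrase: string, anchor: Date): string | null {
  const match = /^(\d+)\s+(day|week)s?\s+ago$/i.exec(phrase.trim());
  if (match === null) return null;
  const count = Number(match[1]);
  const unit = (match[2] || "").toLowerCase();
  const days = unit === "week" ? count * 7 : count;
  // Date.UTC normalises out-of-range day values, so subtracting past the
  // start of a month rolls back into the previous one.
  const target = new Date(
    Date.UTC(anchor.getUTCFullYear(), anchor.getUTCMonth(), anchor.getUTCDate() - days)
  );
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return (
    "(around " +
    target.getUTCFullYear() + "/" +
    pad(target.getUTCMonth() + 1) + "/" +
    pad(target.getUTCDate()) + ")"
  );
}

// Assumed anchor: 19 April 2026 (months are zero-based in Date.UTC).
const demoAnchor = new Date(Date.UTC(2026, 3, 19));
console.log(relativeOffsetToken("3 days ago", demoAnchor)); // (around 2026/04/16)
console.log(relativeOffsetToken("2 weeks ago", demoAnchor)); // (around 2026/04/05)
```

Working in UTC keeps the result independent of the machine's local timezone.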

FTS5 compilation

The AST compiles to SQLite FTS5 MATCH syntax term-by-term:

  • term → term
  • term* → term*
  • phrase → "phrase"
  • AND / OR → AND / OR
  • NOT → NOT

If a query collapses to empty after normalisation and filtering, the pipeline short-circuits to an empty result without issuing FTS5.
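
The mapping and the empty-query short-circuit can be sketched as below. The AST shape and the names `AstNode`, `compileNode`, and `compileQuery` are hypothetical, and returning `null` to signal "skip FTS5" is an illustrative choice. Sibling clauses are joined with a space because FTS5 treats juxtaposition as an implicit AND. FTS5 documents NOT as a binary operator, so the handling of a query that begins with NOT is left to the spec.

```typescript
// Hypothetical AST shape; not taken from the spec.
type AstNode =
  | { kind: "term"; word: string; prefix: boolean }
  | { kind: "phrase"; words: string[] }
  | { kind: "op"; op: "AND" | "OR" }
  | { kind: "not"; clause: AstNode };

function compileNode(node: AstNode): string {
  switch (node.kind) {
    case "term":
      return node.prefix ? node.word + "*" : node.word;
    case "phrase":
      // FTS5 strings escape an embedded double quote by doubling it.
      return '"' + node.words.join(" ").replace(/"/g, '""') + '"';
    case "op":
      return node.op;
    case "not":
      return "NOT " + compileNode(node.clause);
  }
}

// Returns null when there is nothing to match, so the caller can return an
// empty result without issuing a query.
function compileQuery(nodes: readonly AstNode[]): string | null {
  if (nodes.length === 0) return null;
  return nodes.map(compileNode).join(" ");
}

console.log(
  compileQuery([
    { kind: "term", word: "hedge", prefix: true },
    { kind: "not", clause: { kind: "term", word: "urban", prefix: false } },
  ])
); // hedge* NOT urban
```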

Why a DSL

A tiny grammar means every SDK implements the same parser and every caller gets predictable behaviour across TypeScript, Go, and Python. The alternative, passing the raw string straight to FTS5, fails badly on operator characters in ingested content and makes the stopword-exemption rule impossible to express.

Reference

See spec/QUERY-DSL.md for the normative grammar, stopword list, and FTS5 escaping rules. Canonical parser fixtures live at spec/fixtures/query-parser/cases.json.