# Retrieval
Hybrid RRF, the five-rung retry ladder, the unanimity shortcut, tail-preserving reranking, and intent-aware reweighting.
Every jeffs-brain SDK implements the same retrieval pipeline, specified in `spec/ALGORITHMS.md`. At a glance:
- Parse the query into an AST (see query DSL).
- Expand caller-supplied aliases.
- Run BM25 over FTS5.
- In parallel, embed the query and run a vector search (when an embedder is configured).
- Fuse the two lists with Reciprocal Rank Fusion.
- If BM25 returned zero hits, walk the retry ladder before fusion.
- Apply intent-aware reweighting based on query type (English only).
- Optionally rerank the top N with a cross-encoder, preserving the tail.
## Reciprocal Rank Fusion
RRF is order-independent and tuning-free. For each candidate across each input list:
```
score = 1 / (k + rank + 1)   # rank is 0-based; k = 60 by default
```
A candidate’s final score is the sum of its contributions across every list it appears in. Ties break on path ascending for determinism.
The key property: the order of the two input lists doesn’t matter. BM25-first or vector-first produces the same ranking.
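As a minimal sketch (illustrative names, not the SDK's actual API), the fusion and tie-break can be written as:

```python
def rrf_fuse(lists, k=60):
    """Fuse ranked candidate lists with Reciprocal Rank Fusion."""
    scores = {}
    for hits in lists:
        for rank, path in enumerate(hits):  # rank is 0-based, hence the +1
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank + 1)
    # Highest score first; ties break on path ascending for determinism.
    return sorted(scores, key=lambda path: (-scores[path], path))

bm25_hits = ["a.md", "b.md", "c.md"]
vector_hits = ["b.md", "a.md", "d.md"]
# Order-independence: swapping the input lists leaves the ranking unchanged.
assert rrf_fuse([bm25_hits, vector_hits]) == rrf_fuse([vector_hits, bm25_hits])
```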
## Modes
- `bm25` — FTS5 only. Cheapest; always available.
- `semantic` — vectors only. Requires an embedder.
- `hybrid` — both, fused via RRF. Requires an embedder; falls back to `bm25` silently when the embedder isn’t available and records `fellBackToBM25`.
- `auto` — resolves to `hybrid` when an embedder is configured, `bm25` otherwise.
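A sketch of the resolution rules above. The error for `semantic` without an embedder is an assumption (the spec text only says an embedder is required); function and variable names are illustrative:

```python
def resolve_mode(mode, has_embedder):
    """Return (effective_mode, fell_back_to_bm25) for a requested search mode."""
    if mode == "auto":
        mode = "hybrid" if has_embedder else "bm25"
    if mode == "hybrid" and not has_embedder:
        return "bm25", True  # silent fallback, surfaced via fellBackToBM25
    if mode == "semantic" and not has_embedder:
        # Assumption: the doc only states that semantic requires an embedder.
        raise ValueError("semantic mode requires an embedder")
    return mode, False
```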
## Retry ladder
When BM25 returns zero candidates, the pipeline retries with progressively loosened queries. The first rung that returns hits wins; the rest are skipped. Every SDK mirrors the five rungs in order so traces stay comparable.
- `initial` — the original, fully-formed query.
- `strongest_term` — the single strongest term from the query, chosen by length and frequency signals.
- `force-refresh` — a deliberate no-op kept only for trace parity across SDK ports. SDKs MUST preserve the rung even though it issues no new search. Historically this slot reinvalidated a cache; the cache is gone, but the rung stays.
- `refreshed_sanitised` — the normalised, stopword-filtered, alias-expanded form, with operator characters stripped.
- `refreshed_strongest` — the strongest term from the sanitised form.
- `trigram_fuzzy` — trigram Jaccard similarity over indexed chunk content, keeping hits with Jaccard ≥ 0.3. This catches typos and near-misses that exact-term FTS can’t reach.
Each attempt is bounded so the ladder terminates fast on genuinely empty indexes.
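The final rung’s similarity test can be sketched like this (lower-casing and the exact trigram window are assumptions; the normative details live in `spec/ALGORITHMS.md`):

```python
def trigrams(text):
    """Set of 3-character windows over the lower-cased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def trigram_jaccard(a, b):
    """Jaccard similarity of the two trigram sets; 0.0 when either is empty."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def trigram_fuzzy_rung(query, chunks, threshold=0.3):
    """Keep indexed chunks whose trigram Jaccard similarity is >= threshold."""
    return [c for c in chunks if trigram_jaccard(query, c) >= threshold]
```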
## Unanimity shortcut
When the BM25 leg and the vector leg agree on at least two of their top-three ids, the reranker is skipped entirely. Both legs already concur on what’s relevant, so paying for a cross-encoder wouldn’t change the answer. This is a cost-and-latency optimisation with a bounded quality risk.
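The agreement check is small enough to show in full (a sketch; `unanimous` is an illustrative name, not the SDK’s):

```python
def unanimous(bm25_ids, vector_ids):
    """True when the two legs share at least two of their top-three ids."""
    return len(set(bm25_ids[:3]) & set(vector_ids[:3])) >= 2

# When this returns True, the cross-encoder rerank step is skipped.
```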
## Rerank with tail preservation
When a reranker is wired in (a cross-encoder or an LLM-as-judge), it reorders only the top `rerankTopN` candidates (default 20). The tail is appended unchanged, so a borderline result outside the top `rerankTopN` is never silently dropped just because the reranker never saw it.
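Tail preservation is the whole trick, so a sketch makes it concrete (`score_fn` stands in for the cross-encoder; names are illustrative):

```python
def rerank_preserving_tail(candidates, score_fn, rerank_top_n=20):
    """Reorder the head by reranker score; append the tail untouched."""
    head = sorted(candidates[:rerank_top_n], key=score_fn, reverse=True)
    return head + candidates[rerank_top_n:]
```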
## Intent-aware reweighting
After fusion, the retrieval layer inspects the query for intent signals and applies multiplicative boosts or dampers to specific candidate sources. Two intent families are recognised today:
- Preference queries — “what do I prefer?”, “my favourite X”. Memory-derived notes are boosted; raw document chunks are damped.
- Concrete-fact queries — “what’s the version of X?”, “when did Y happen?”. Concise factual snippets are boosted; long prose is damped.
Multipliers compose. A result matching more than one rule lands on the product of its multipliers (for example, 0.45 × 0.75 = 0.3375).
Reweighting is English-only in v1.0. Queries in other languages flow through without adjustment.
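Multiplicative composition can be sketched as follows. The intent names, source tags, and multiplier values below are illustrative; the normative table is in `spec/ALGORITHMS.md`:

```python
# Hypothetical (intent, source-tag) -> multiplier table.
MULTIPLIERS = {
    ("preference", "memory_note"): 1.3,      # memory-derived notes boost
    ("preference", "doc_chunk"): 0.75,       # raw document chunks damp
    ("concrete_fact", "fact_snippet"): 1.25, # concise factual snippets boost
    ("concrete_fact", "long_prose"): 0.45,   # long prose damps
}

def reweight(score, intents, tags):
    """Apply every matching rule; multipliers compose as a product."""
    for (intent, tag), m in MULTIPLIERS.items():
        if intent in intents and tag in tags:
            score *= m
    return score

# A result matching two damping rules lands on the product:
# 0.45 * 0.75 = 0.3375.
```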
## Where to look next
- `spec/ALGORITHMS.md` — the normative spec with pseudocode and the verbatim regex sources SDK ports must mirror.
- `spec/QUERY-DSL.md` — query grammar and compilation rules.
- Retrieval MCP tools — `memory_search`, `memory_recall`, `memory_ask`.