Idioms & drift
An idiom is a convention your codebase actually follows — not one you wrote down. Unyform mines them straight from your code graph: recurring naming clusters and module-organization patterns where one canonical choice has won. Each idiom names that canonical example and reports how widely it's adopted. Drift is new code that diverges from an established idiom — reaching for a near-synonym, a legacy alternative, or a one-off when a canonical already exists.
What an idiom looks like
Take handler functions. Your codebase has dozens, and 47 of them return
Result<HttpResponse, ApiError> while a handful of older ones return
HttpResponse directly. Unyform sees the cluster, picks the dominant member as
the canonical, and records the rest as alternatives:
```text
Idiom: handlers return Result<HttpResponse, ApiError>
  family:       handler
  canonical:    create_session (crates/auth/src/handlers.rs)
  adoption:     87% · 47 call sites
  alternatives: legacy_login, health_check   ← drift
```
That's the whole shape of an idiom: a purpose, a canonical symbol, an adoption percentage, and the alternatives callers sometimes reach for instead.
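As a rough mental model, the card above could be held in a record like the one below. This is a sketch only: the class and field names are assumptions made for illustration, not Unyform's actual schema; the values are copied from the example card.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Idiom:
    """Illustrative record; field names are hypothetical, not Unyform's schema."""
    purpose: str                       # what the convention is for
    family: str                        # grouping such as "handler" or "hook"
    canonical: str                     # the symbol that won by usage
    canonical_path: str                # file that defines the canonical
    adoption_pct: float                # canonical's share of the cluster's usage
    call_sites: int                    # incoming calls to the canonical
    alternatives: Tuple[str, ...] = () # symbols that drift from the idiom


# Values taken from the example card above.
handler_idiom = Idiom(
    purpose="handlers return Result<HttpResponse, ApiError>",
    family="handler",
    canonical="create_session",
    canonical_path="crates/auth/src/handlers.rs",
    adoption_pct=87.0,
    call_sites=47,
    alternatives=("legacy_login", "health_check"),
)
```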
Note
Idioms are descriptive, not prescriptive. They report what your codebase does — the canonical is simply the choice that won by usage, not one a maintainer decreed.
How mining works
Mining runs automatically during code analysis, in three passes.
```mermaid
flowchart LR
    A[Code graph<br/>functions · modules · calls] --> B[Cluster<br/>by name + module]
    B --> C[Score adoption<br/>≥70% · ≥3 callers]
    C --> D[LLM naming<br/>purpose + rationale]
    D --> E[Idioms<br/>with canonical + drift]
```
Cluster every function, class, and module by the first and last word of
its name. format_date joins both the format cluster and the date cluster;
TextField joins text and field; useDebouncedValue joins use and
value. Module organization gets its own pass — "where do error types live?",
"which module is the API layer?" — by clustering on directory and cross-module
calls. There's no fixed catalogue of patterns, so idioms surface on Rust, TS,
Python, and Go alike without anyone enumerating conventions by hand. Generic
verbs (get, set, is) and trivially short tokens are blocklisted so we
don't cluster on noise.
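The name-clustering pass can be sketched in a few lines of Python. This is an illustration under assumptions, not Unyform's implementation: the word-splitting regex, the `MIN_TOKEN_LEN` cutoff, and every function name are hypothetical, and the blocklist contains only the three verbs the text mentions.

```python
import re
from collections import defaultdict

# Assumed values: the text names get/set/is but does not give the full
# blocklist or the length cutoff for "trivially short" tokens.
BLOCKLIST = {"get", "set", "is"}
MIN_TOKEN_LEN = 3

# Matches acronym runs (HTTP), capitalized or lowercase words, and digit runs.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(name):
    """Split snake_case, camelCase, or PascalCase into lowercase words."""
    return [word.lower() for word in _WORD.findall(name)]


def cluster_keys(name):
    """Return the first and last word of a name, minus blocklisted tokens."""
    words = split_words(name)
    keys = []
    for word in words[:1] + words[-1:]:
        if word in BLOCKLIST or len(word) < MIN_TOKEN_LEN or word in keys:
            continue
        keys.append(word)
    return keys


def build_clusters(names):
    """Group symbol names under every cluster key they produce."""
    clusters = defaultdict(list)
    for name in names:
        for key in cluster_keys(name):
            clusters[key].append(name)
    return dict(clusters)


clusters = build_clusters(
    ["format_date", "parse_date", "TextField", "useDebouncedValue"]
)
```

Here `format_date` lands in both the `format` and `date` clusters, and `parse_date` joins it in `date`, which is the kind of overlap the scoring pass then has to sort out.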
Score each cluster by incoming calls — a member's usage is its in-degree.
The canonical is the most-used member. A cluster only becomes an idiom when the
canonical clears two thresholds: it owns ≥70% of the cluster's usage
(min_adoption_pct) and has ≥3 callers (min_adoption_count). The
percentage gate drops split clusters where the codebase has genuinely
diverged; the count gate drops "100% adoption" idioms that are really a sample
of one. Both miners (naming and module organization) then de-duplicate, keeping
the higher-confidence idiom when the same canonical surfaces twice.
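The two gates are easy to picture in code. The sketch below assumes a cluster arrives as a mapping from member name to in-degree; `ScoredCluster` and `score_cluster` are hypothetical names, and the in-degrees for the two alternatives are made up so that the canonical's share matches the 87% on the example card. Only the threshold names and values come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MIN_ADOPTION_PCT = 70.0   # canonical must own at least this share of usage
MIN_ADOPTION_COUNT = 3    # canonical must have at least this many callers


@dataclass(frozen=True)
class ScoredCluster:
    canonical: str
    adoption_pct: float
    callers: int
    alternatives: Tuple[str, ...]


def score_cluster(in_degree: Dict[str, int]) -> Optional[ScoredCluster]:
    """Return the scored cluster if it qualifies as an idiom, else None.

    in_degree maps each cluster member to its number of incoming calls.
    """
    total = sum(in_degree.values())
    if total == 0:
        return None
    canonical = max(in_degree, key=in_degree.get)  # most-used member
    callers = in_degree[canonical]
    pct = 100.0 * callers / total
    if pct < MIN_ADOPTION_PCT or callers < MIN_ADOPTION_COUNT:
        return None
    alternatives = tuple(m for m in in_degree if m != canonical)
    return ScoredCluster(canonical, pct, callers, alternatives)


# Hypothetical in-degrees: 47 of 54 calls go to the canonical.
handlers = score_cluster(
    {"create_session": 47, "legacy_login": 4, "health_check": 3}
)
split = score_cluster({"parse_v1": 5, "parse_v2": 5})   # 50% share
sample_of_one = score_cluster({"lonely_helper": 2})     # 100% share, 2 callers
```

The split cluster fails the percentage gate and the two-caller cluster fails the count gate, so only `handlers` survives as an idiom.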
Name the survivors with an LLM pass. Mining knows what the canonical is;
it doesn't know why. A temperature-0 model reads the canonical's signature
and docstring plus its alternatives and sample callers, then writes an
imperative purpose ("Capture text input from the user") and a short
rationale. The structural adoption metric is always the floor: model
confidence can raise an idiom's confidence above it but never push it below. Measured usage outranks opinion.
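One way to read the floor rule is as a simple `max`. The function below is a sketch that assumes both signals are expressed on a 0 to 1 scale and that a missing model score falls back to the structural value; neither detail is stated in the text, and `idiom_confidence` is a hypothetical name.

```python
from typing import Optional


def idiom_confidence(adoption_pct: float,
                     model_confidence: Optional[float]) -> float:
    """Combine measured adoption with the naming model's confidence.

    Assumes both values share a 0..1 scale once adoption is divided by 100.
    """
    structural = adoption_pct / 100.0
    if model_confidence is None:
        return structural
    # The measured value is the floor: the model can only raise the result.
    return max(structural, model_confidence)
```

With 87% adoption, a model score of 0.95 lifts the result to 0.95, while a model score of 0.50 leaves it at 0.87.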
Tip
The thresholds are deliberately conservative. Unyform would rather surface fewer, high-confidence idioms than flood the board with weak signals. On our own workspace, mining surfaced ~164 idioms in ~157ms.
Drift detection
Once an idiom exists, every symbol in its cluster is either conforming or
drifting. The canonical and the call sites that reach for it earn
FollowsIdiom edges; the alternatives — the near-synonyms and legacy holdouts —
earn DriftsFromIdiom edges. Drift isn't an error; it's a signal. A 90%-adopted
idiom with a few drift sites usually means a half-finished migration or a corner
the convention hasn't reached yet.
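The edge assignment described above can be sketched as follows. The edge kinds `FollowsIdiom` and `DriftsFromIdiom` come from the text; the tuple layout, the function names, the idiom identifier, and the caller name `routes::login` are assumptions for illustration.

```python
from typing import NamedTuple


class IdiomEdge(NamedTuple):
    source: str   # symbol or call site the edge starts from
    kind: str     # "FollowsIdiom" or "DriftsFromIdiom"
    idiom: str    # identifier of the idiom the edge points at


def classify(idiom_id, canonical, canonical_callers, alternatives):
    """Emit one edge per conforming or drifting symbol in the cluster."""
    edges = [IdiomEdge(canonical, "FollowsIdiom", idiom_id)]
    edges += [IdiomEdge(c, "FollowsIdiom", idiom_id) for c in canonical_callers]
    edges += [IdiomEdge(a, "DriftsFromIdiom", idiom_id) for a in alternatives]
    return edges


def drift_sites(edges):
    """List the symbols that diverge from the idiom."""
    return [e.source for e in edges if e.kind == "DriftsFromIdiom"]


edges = classify(
    idiom_id="handler-result",            # hypothetical identifier
    canonical="create_session",
    canonical_callers=["routes::login"],  # hypothetical call site
    alternatives=["legacy_login", "health_check"],
)
```

Calling `drift_sites(edges)` then yields the two alternatives, which is the list a drift report would surface.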
Warning
High drift on a low-adoption idiom (just over the 70% line) often means the convention is still contested — two patterns competing, neither clearly canonical yet. Read those as "the team hasn't decided" rather than "someone broke the rule."
Where idioms show up
Mined idioms aren't a report you read once. They flow into the three surfaces that shape how code gets written.
On the blueprint (dashboard)
The Idiom Board renders idioms grouped by family (component, hook,
handler, error, data, …). Each card shows the purpose, an adoption
chip (green ≥85%, amber from 70% up to 85%), the canonical symbol deep-linked to its exact
line on GitHub, a call count, and an expandable drift list of the
alternatives. Filter by family, search by name or rationale, and click straight
through to the source of truth.
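The chip colors and the family grouping amount to two small functions. This sketch assumes idioms arrive as plain dictionaries with a `family` key and that anything under 70% gets no chip because it never becomes an idiom; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def adoption_chip(adoption_pct: float) -> Optional[str]:
    """Map an adoption percentage to the chip color described above."""
    if adoption_pct >= 85:
        return "green"
    if adoption_pct >= 70:
        return "amber"
    return None  # below the mining threshold, so no card is expected


def group_by_family(idioms):
    """Bucket idiom records by their family for board rendering."""
    board = defaultdict(list)
    for idiom in idioms:
        board[idiom["family"]].append(idiom)
    return dict(board)


board = group_by_family([
    {"family": "handler", "canonical": "create_session", "adoption_pct": 87.0},
    {"family": "data", "canonical": "format_date", "adoption_pct": 72.0},
])
```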
In gateway context
When a request flows through a gateway, the attached blueprint's idioms ride along as injected context — so the model generates code that matches your canonical patterns, not generic stdlib snippets.
As MCP tools
Tools that speak MCP can query idioms directly, on demand:
- _gw_codegraph_list_idioms: List the canonical artifacts this codebase has standardized on, with adoption metrics. Answers "how do we do X here?", optionally filtered by family.
- _gw_codegraph_idiom_lookup: Find the canonical artifact for a free-form purpose ("format a date", "log an audit event"). Returns the canonical symbol, its adoption metrics, and the rationale. Meant to be called before writing new code so suggestions follow your conventions.
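MCP tool invocations travel as JSON-RPC 2.0 `tools/call` requests, so a lookup might be built like the sketch below. The envelope follows the MCP convention; the argument keys `purpose` and `family` are assumptions, since this page does not document either tool's input schema.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names below are hypothetical; check the tool's published schema.
lookup = tool_call_request(
    1, "_gw_codegraph_idiom_lookup", {"purpose": "format a date"}
)
listing = tool_call_request(
    2, "_gw_codegraph_list_idioms", {"family": "handler"}
)

payload = json.dumps(lookup)  # what a client would send over the transport
```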
Why this beats a hand-written style guide
Style guides drift the moment they're committed. They describe what someone intended, go stale as the code moves on, and carry no evidence — a rule is just as loud whether the codebase follows it or not.
Idioms are mined from reality and come with receipts:
- Mined, not declared. No one maintains them. Re-run analysis and they track the code as it stands today.
- Backed by adoption. Every idiom carries a percentage and a call count. "87% adoption across 47 call sites" is a fact, not an opinion.
- Drift is visible. A style guide can't tell you which files violate it. The code graph can, down to the line.
- They reach the model. Conventions only help if the AI writing code actually sees them — idioms flow into gateway context and MCP tools, closing the loop your wiki never could.
The result: AI tools generate code that looks like your code, because the patterns they follow were measured from your code in the first place.