Idioms & drift

An idiom is a convention your codebase actually follows — not one you wrote down. Unyform mines them straight from your code graph: recurring naming clusters and module-organization patterns where one canonical choice has won. Each idiom names that canonical example and reports how widely it's adopted. Drift is new code that diverges from an established idiom — reaching for a near-synonym, a legacy alternative, or a one-off when a canonical already exists.

What an idiom looks like

Take handler functions. Your codebase has dozens, and 47 of them return Result<HttpResponse, ApiError> while a handful of older ones return HttpResponse directly. Unyform sees the cluster, picks the dominant member as the canonical, and records the rest as alternatives:

Idiom: handlers return Result<HttpResponse, ApiError>
  family:        handler
  canonical:     create_session  (crates/auth/src/handlers.rs)
  adoption:      87%  ·  47 call sites
  alternatives:  legacy_login, health_check   ← drift

That's the whole shape of an idiom: a purpose, a family, a canonical symbol, an adoption percentage with its call count, and the alternatives callers sometimes reach for instead.

Note

Idioms are descriptive, not prescriptive. They report what your codebase does — the canonical is simply the choice that won by usage, not one a maintainer decreed.

How mining works

Mining runs automatically during code analysis, in three passes.

flowchart LR
    A[Code graph<br/>functions · modules · calls] --> B[Cluster<br/>by name + module]
    B --> C[Score adoption<br/>≥70% · ≥3 callers]
    C --> D[LLM naming<br/>purpose + rationale]
    D --> E[Idioms<br/>with canonical + drift]

Cluster every function, class, and module by the first and last word of its name. format_date joins both the format cluster and the date cluster; TextField joins text and field; useDebouncedValue joins use and value. Module organization gets its own pass — "where do error types live?", "which module is the API layer?" — by clustering on directory and cross-module calls. There's no fixed catalogue of patterns, so idioms surface on Rust, TS, Python, and Go alike without anyone enumerating conventions by hand. Generic verbs (get, set, is) and trivially short tokens are blocklisted so we don't cluster on noise.
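The clustering rule above can be sketched in a few lines. This is a minimal illustration, not Unyform's implementation: the word-splitting regex, the `MIN_TOKEN_LEN` cutoff, and the function names are assumptions, and the blocklist contains only the three verbs named above.

```python
import re
from collections import defaultdict

# Assumption: only the generic verbs named in the text; the real list is likely longer.
BLOCKLIST = {"get", "set", "is"}
# Assumption: the text does not say what counts as "trivially short".
MIN_TOKEN_LEN = 3


def split_words(name: str) -> list[str]:
    """Split snake_case, camelCase, and PascalCase names into lowercase words."""
    words: list[str] = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def cluster_keys(name: str) -> set[str]:
    """Clusters a symbol joins: its first and last word, minus blocklisted noise."""
    words = split_words(name)
    if not words:
        return set()
    candidates = {words[0], words[-1]}
    return {w for w in candidates if w not in BLOCKLIST and len(w) >= MIN_TOKEN_LEN}


def build_clusters(names: list[str]) -> dict[str, set[str]]:
    """Group symbol names under every cluster key they produce."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for name in names:
        for key in cluster_keys(name):
            clusters[key].add(name)
    return dict(clusters)
```

With these assumptions, `cluster_keys("format_date")` yields the `format` and `date` clusters, `cluster_keys("useDebouncedValue")` yields `use` and `value`, and `get_user` lands only in `user` because `get` is blocklisted.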

Score each cluster by incoming calls: a member's usage is its in-degree, and the canonical is the most-used member. A cluster only becomes an idiom when the canonical clears two thresholds: it owns ≥70% of the cluster's usage (min_adoption_pct) and has ≥3 callers (min_adoption_count). The percentage gate drops split clusters where the codebase has genuinely diverged; the count gate drops "100% adoption" idioms that are really a sample of one. The naming miner and the module-organization miner then de-duplicate their results, keeping the higher-confidence idiom when the same canonical surfaces twice.
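As a rough sketch of the two gates, assuming a cluster arrives as a map from member name to in-degree: the `ScoredIdiom` type and `score_cluster` function are hypothetical names, and only the two threshold values come from the text. De-duplication is left out.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ADOPTION_PCT = 70.0  # canonical must own at least this share of cluster usage
MIN_ADOPTION_COUNT = 3   # and have at least this many callers


@dataclass
class ScoredIdiom:
    canonical: str
    adoption_pct: float
    call_count: int
    alternatives: list[str]


def score_cluster(in_degree: dict[str, int]) -> Optional[ScoredIdiom]:
    """Return an idiom if the most-used member clears both gates, else None."""
    total = sum(in_degree.values())
    if total == 0:
        return None
    # The canonical is the member with the most incoming calls.
    canonical = max(in_degree, key=lambda member: in_degree[member])
    calls = in_degree[canonical]
    pct = 100.0 * calls / total
    if pct < MIN_ADOPTION_PCT or calls < MIN_ADOPTION_COUNT:
        return None
    alternatives = sorted(m for m in in_degree if m != canonical)
    return ScoredIdiom(canonical, pct, calls, alternatives)
```

Fed `{"create_session": 47, "legacy_login": 4, "health_check": 3}` (the split of the non-canonical calls is made up for illustration), the canonical owns 47 of 54 calls, which rounds to 87%. An even split such as `{"a": 5, "b": 5}` fails the percentage gate, and `{"only": 2}` fails the count gate despite 100% adoption.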

Name the survivors with an LLM pass. Mining knows what the canonical is; it doesn't know why. A temperature-0 model reads the canonical's signature and docstring plus its alternatives and sample callers, then writes an imperative purpose ("Capture text input from the user") and a short rationale. The structural adoption metric is always the floor: model confidence can raise an idiom's confidence above it but never lower it. Measured usage outranks opinion.
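One way to read the floor rule is as a simple maximum. The sketch below assumes both signals are expressed on the same 0 to 1 scale; the function name and that scale are assumptions, not documented behavior.

```python
def idiom_confidence(structural_adoption: float, model_confidence: float) -> float:
    """Combine the two signals so measured adoption acts as a floor.

    Assumption: both values are on the same 0.0-1.0 scale.
    """
    # The model can push confidence up, but never below measured adoption.
    return max(structural_adoption, model_confidence)
```

Under this reading, a model that is lukewarm about an 87%-adopted idiom leaves its confidence at 0.87, while a model that rates it 0.95 lifts it to 0.95.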

Tip

The thresholds are deliberately conservative. Unyform would rather surface fewer, high-confidence idioms than flood the board with weak signals. On our own workspace, mining surfaced ~164 idioms in ~157ms.

Drift detection

Once an idiom exists, every symbol in its cluster is either conforming or drifting. The canonical and the call sites that reach for it earn FollowsIdiom edges; the alternatives — the near-synonyms and legacy holdouts — earn DriftsFromIdiom edges. Drift isn't an error; it's a signal. A 90%-adopted idiom with a few drift sites usually means a half-finished migration or a corner the convention hasn't reached yet.
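The edge assignment can be pictured as follows. Only the edge kinds `FollowsIdiom` and `DriftsFromIdiom` come from the text; the tuple layout, the `idiom_id` argument, and the function name are hypothetical.

```python
def idiom_edges(
    idiom_id: str,
    canonical: str,
    alternatives: list[str],
    callers_of_canonical: list[str],
) -> list[tuple[str, str, str]]:
    """Return (symbol, edge_kind, idiom_id) triples for one idiom."""
    edges = [(canonical, "FollowsIdiom", idiom_id)]
    # Call sites that reach for the canonical conform to the idiom.
    edges += [(caller, "FollowsIdiom", idiom_id) for caller in callers_of_canonical]
    # Near-synonyms and legacy holdouts are recorded as drift, not as errors.
    edges += [(alt, "DriftsFromIdiom", idiom_id) for alt in alternatives]
    return edges
```

For the handler example, `create_session` and its callers would each get a `FollowsIdiom` edge, while `legacy_login` and `health_check` would each get a `DriftsFromIdiom` edge.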

Warning

High drift on a low-adoption idiom (just over the 70% line) often means the convention is still contested — two patterns competing, neither clearly canonical yet. Read those as "the team hasn't decided" rather than "someone broke the rule."

Where idioms show up

Mined idioms aren't a report you read once. They flow into the three surfaces that shape how code gets written.

On the blueprint (dashboard)

The Idiom Board renders idioms grouped by family (component, hook, handler, error, data, …). Each card shows the purpose, an adoption chip (green ≥85%, amber ≥70%), the canonical symbol deep-linked to its exact line on GitHub, a call count, and an expandable drift list of the alternatives. Filter by family, search by name or rationale, and click straight through to the source of truth.
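The chip colors and family grouping amount to a threshold check and a group-by. A small sketch, assuming idioms are plain dictionaries with `family` and `adoption_pct` keys (a hypothetical shape, not the dashboard's actual data model):

```python
from collections import defaultdict
from typing import Optional


def adoption_chip(adoption_pct: float) -> Optional[str]:
    """Map an adoption percentage to the chip color described above."""
    if adoption_pct >= 85:
        return "green"
    if adoption_pct >= 70:
        return "amber"
    return None  # below the mining threshold, so no card is expected


def group_by_family(idioms: list[dict]) -> dict[str, list[dict]]:
    """Group idiom cards by family, attaching each card's chip color."""
    board: dict[str, list[dict]] = defaultdict(list)
    for idiom in idioms:
        card = {**idiom, "chip": adoption_chip(idiom["adoption_pct"])}
        board[idiom["family"]].append(card)
    return dict(board)
```

The 87%-adopted handler idiom would get a green chip; an idiom sitting at 72% would get an amber one.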

In gateway context

When a request flows through a gateway, the attached blueprint's idioms ride along as injected context — so the model generates code that matches your canonical patterns, not generic stdlib snippets.

As MCP tools

Tools that speak MCP can query idioms directly, on demand:

MCP_gw_codegraph_list_idioms

List the canonical artifacts this codebase has standardized on, with adoption metrics. Answers "how do we do X here?" — optionally filtered by family.

MCP_gw_codegraph_idiom_lookup

Find the canonical artifact for a free-form purpose ("format a date", "log an audit event"). Returns the canonical symbol, its adoption metrics, and the rationale — meant to be called before writing new code so suggestions follow your conventions.
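For clients that talk to an MCP server directly, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds the request payloads. The tool names are copied as they appear above, but the argument names `family` and `purpose` are assumptions, so check the tool schemas your server advertises before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumed argument names: "family" and "purpose" are illustrative only.
list_request = tool_call("MCP_gw_codegraph_list_idioms", {"family": "handler"})
lookup_request = tool_call("MCP_gw_codegraph_idiom_lookup", {"purpose": "format a date"})
```

An agent would typically send the lookup request before generating new code, then use the returned canonical symbol instead of inventing a helper.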

Why this beats a hand-written style guide

Style guides drift the moment they're committed. They describe what someone intended, go stale as the code moves on, and carry no evidence — a rule is just as loud whether the codebase follows it or not.

Idioms are mined from reality and come with receipts:

Mined, not declared. No one maintains them. Re-run analysis and they track the code as it stands today.
Backed by adoption. Every idiom carries a percentage and a call count. "87% adoption across 47 call sites" is a fact, not an opinion.
Drift is visible. A style guide can't tell you which files violate it. The code graph can, down to the line.
They reach the model. Conventions only help if the AI writing code actually sees them — idioms flow into gateway context and MCP tools, closing the loop your wiki never could.

The result: AI tools generate code that looks like your code, because the patterns they follow were measured from your code in the first place.
