# ADR 001: Lint the Source, Not the Build

**Status:** Active (Genesis Decision)
**Decider:** Architecture Lead
**Date:** 2026-01-01 (founding principle, pre-v0.1.0)
## Context
When Zenzic was conceived, the dominant approach to documentation validation was output-based analysis: tools like `linkchecker` and `htmlproofer` fetch or parse the HTML generated by the build engine, then traverse the rendered page structure to verify link targets, image paths, and anchor IDs.
This approach has a fundamental structural flaw: the validator is downstream of the build. Validation can only run after the build succeeds. If the build fails — due to a syntax error, a missing plugin, or an engine version mismatch — no validation occurs at all. The pipeline produces silence where it should produce a diagnostic.
Three compounding problems emerge in CI environments:
- **Build coupling.** A documentation validator that requires a successful build cannot be the first gate in the pipeline. It must be placed after `mkdocs build` or `npm run build`, adding 2–10 minutes of build overhead before a single link is checked.
- **Engine fragility.** Build engines change how they generate anchor IDs, URL slugs, and asset paths between minor versions. A validator calibrated to the output of MkDocs 1.5 may silently miss broken links under MkDocs 1.6 because the ID generation scheme changed. The validator is, in effect, testing the engine's output rather than the author's intent.
- **Engine lock-in.** A validator that understands HTML from one engine cannot validate HTML from another without engine-specific adaptation. This creates a validation ecosystem that fragments along engine lines rather than converging on universal documentation quality standards.
The "MkDocs Crisis" — a period during Zenzic's early development when the reference documentation lost all link validity due to an MkDocs upgrade that changed slug generation — crystallised the cost of output-based validation. The error was not in the Markdown source; it was in the mismatch between the source and the engine's new URL convention. An output-based validator would have caught this only after the broken site was deployed.
## Decision
Zenzic analyzes raw Markdown source files and static configuration files exclusively. It never inspects, fetches, or depends on HTML build output.
The implementation vehicle for this decision is the Virtual Site Map (VSM) — a complete in-memory projection of the final site, constructed from source files alone, using engine-specific knowledge encoded in adapters (see ADR 005, ADR 007).
The VSM allows Zenzic to answer questions that previously required a live site:
- "Does this anchor `#installation` exist in the target page?" — answered by parsing the Markdown heading structure, not the rendered HTML.
- "Is this path `/docs/reference/finding-codes` a valid route?" — answered by the VSM's route graph, which models i18n fallbacks and versioned slugs without executing the build.
- "Is this asset referenced in `docusaurus.config.ts` present on disk?" — answered by static parsing of the TypeScript config file, not by starting a Node.js process.
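The idea behind these questions can be sketched in a few lines. The names below (`VirtualSiteMap`, `slugify`) and the simplified anchor rule are illustrative assumptions, not Zenzic's actual API:

```python
# Minimal VSM sketch: routes and anchors derived from Markdown source alone.
# (Hypothetical names; the slug rule is a simplification.)
import re
from dataclasses import dataclass, field

_HEADING = re.compile(r"^(#{1,6})\s+(.*)$", re.MULTILINE)

def slugify(text: str) -> str:
    """Simplified anchor slug: lowercase, punctuation stripped, spaces to hyphens."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"\s+", "-", text.strip())

@dataclass
class VirtualSiteMap:
    # route -> set of anchor slugs derived from Markdown headings
    routes: dict[str, set[str]] = field(default_factory=dict)

    def add_page(self, route: str, markdown: str) -> None:
        self.routes[route] = {slugify(m.group(2)) for m in _HEADING.finditer(markdown)}

    def has_route(self, route: str) -> bool:
        return route in self.routes

    def has_anchor(self, route: str, anchor: str) -> bool:
        return anchor in self.routes.get(route, set())

vsm = VirtualSiteMap()
vsm.add_page("/docs/install", "# Setup\n\n## Installation\n\nRun the tool.")
assert vsm.has_anchor("/docs/install", "installation")  # anchor exists in source
assert not vsm.has_anchor("/docs/install", "usage")     # would be a broken link
```

Everything the sketch needs comes from bytes on disk: no HTTP fetch, no rendered DOM, no build engine.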
## Rationale
### 1. Pre-Build Error Prevention
A broken link discovered before the build is a developer warning. A broken link discovered after a 10-minute build is a CI failure that blocks the PR queue. Zenzic's position in the pipeline is always before the build — it is the gate that certifies the source is structurally sound before any build resource is consumed.
### 2. Engine Agnosticism by Design
By analyzing source files rather than build output, Zenzic is inherently engine-agnostic. The same `check links` command validates an MkDocs project, a Docusaurus site, and a Zensical wiki — because all three share the same raw Markdown format. Engine-specific URL conventions are encoded in the adapter layer (not in the validator), making the core engine permanently portable.
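The adapter split can be illustrated as follows. The slug rules here are simplified stand-ins, not the real MkDocs or Docusaurus algorithms, and the class names are hypothetical:

```python
# Adapter-layer sketch: engine-specific anchor rules live outside the core
# validator. Slug rules below are simplified stand-ins, not real engine code.
import re

class EngineAdapter:
    def anchor_for(self, heading: str) -> str:
        raise NotImplementedError

class MkDocsAdapter(EngineAdapter):
    def anchor_for(self, heading: str) -> str:
        # stand-in rule: drop punctuation, then hyphenate whitespace
        return re.sub(r"\s+", "-", re.sub(r"[^\w\s-]", "", heading).strip().lower())

class DocusaurusAdapter(EngineAdapter):
    def anchor_for(self, heading: str) -> str:
        # stand-in rule: hyphenate every non-alphanumeric run instead
        return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")

def check_link(adapter: EngineAdapter, headings: list[str], anchor: str) -> bool:
    """Core logic is engine-neutral: it only asks the adapter for anchors."""
    return anchor in {adapter.anchor_for(h) for h in headings}

# The same heading can yield different anchors under different conventions:
assert MkDocsAdapter().anchor_for("What's New?") == "whats-new"
assert DocusaurusAdapter().anchor_for("What's New?") == "what-s-new"
```

`check_link` never branches on the engine; swapping engines means swapping adapters, which is the portability property the section describes.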
### 3. Deterministic Analysis
Source files are static. A given set of Markdown files produces the same analysis results regardless of which machine runs Zenzic, which Python version is installed, or which timezone the CI runner is in. Build-output validators introduce non-determinism through engine version drift, network-fetched pages, and CDN caching. Zenzic's source-based analysis is a pure function of the repository state — identical input, identical output, always.
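One practical consequence of "pure function of the repository state" is that the scanner must impose its own ordering, since filesystem iteration order is not stable across machines. A minimal sketch (illustrative, not Zenzic's code):

```python
# Determinism sketch: fingerprint a docs tree as a pure function of its
# contents, independent of filesystem enumeration order or machine.
import hashlib
import tempfile
from pathlib import Path

def repo_fingerprint(root: Path) -> str:
    """Hash every Markdown file in sorted, platform-independent order."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.md")):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.md").write_text("# Home\n")
    (root / "guide.md").write_text("# Guide\n")
    first = repo_fingerprint(root)
    second = repo_fingerprint(root)
# identical input, identical output: first == second on any machine
```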
### 4. The Ghost Route Capability
The VSM models routes that do not exist as physical files on disk: i18n fallback routes, versioned documentation slugs, and engine-generated index pages. An output-based validator can only test routes that the build produces. Zenzic's VSM models the intent of the documentation architecture, catching structural errors in routes that the author planned but hasn't yet published.
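Ghost-route expansion can be sketched as a pure set transformation. The function name and the locale-prefix convention below are illustrative assumptions, not Zenzic's implementation:

```python
# Ghost-route sketch: expand physical routes with i18n fallback routes that
# the build would generate but that have no file on disk. (Illustrative.)
def expand_ghost_routes(physical: set[str], locales: list[str],
                        default_locale: str = "en") -> set[str]:
    """Every route under the default locale also exists, as a fallback,
    under every other configured locale."""
    routes = set(physical)
    for route in physical:
        for locale in locales:
            if locale != default_locale:
                routes.add(f"/{locale}{route}")
    return routes

site = expand_ghost_routes({"/docs/intro"}, locales=["en", "fr", "de"])
# "/fr/docs/intro" is now a valid link target even though no fr/ file exists
```

A link to a planned-but-unpublished fallback route validates against the modeled intent, exactly the property the section claims for the VSM.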
## Invariants (Non-Negotiable)
- Zenzic's validation logic (`core/validator.py`, `core/scanner.py`) must never issue an HTTP request, launch a browser, or parse HTML. All analysis operates on bytes read from the filesystem.
- The VSM (`models/vsm.py`) is the canonical source of route truth. No validator may compute a route by invoking the build engine — even as a subprocess.
- Adapters may read static configuration files (`.ts`, `.yml`, `.toml`) using pure-Python text parsing. They must not execute those files (see ADR 002).
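The third invariant, parsing a TypeScript config as text rather than executing it, can be sketched like this. The regex-based extraction is a deliberately simplified assumption; a real adapter would need a more robust parse:

```python
# Invariant in practice: read a TypeScript config as plain text, never
# execute it. Simplified sketch -- a real adapter needs sturdier parsing.
import re

CONFIG_TS = """
export default {
  title: 'Zenzic Docs',
  favicon: 'img/favicon.ico',
  themeConfig: { image: 'img/social-card.png' },
};
"""

def referenced_assets(config_text: str) -> set[str]:
    """Collect quoted string values that look like asset paths."""
    pattern = r"['\"]([\w./-]+\.(?:png|ico|svg|jpg))['\"]"
    return {m.group(1) for m in re.finditer(pattern, config_text)}

assets = referenced_assets(CONFIG_TS)
# each entry can now be checked against the filesystem, no Node.js required
```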
## Consequences
- Zenzic's analysis performance is content-dependent. Measured against the real `zenzic-doc` project (59 MDX pages with JSX, frontmatter, and tables): ~420 ms of pure analysis time on a warm Python process. Simple Markdown projects with minimal frontmatter and no JSX can scan 200 files in ~100 ms. End-to-end wall time on a cold `uvx` invocation adds ~2–8 s of Python interpreter startup on top of analysis time. Run `python scripts/benchmark.py --repo <path>` to measure your own project.
- Zenzic can be placed as the first step in any CI pipeline, before `npm install`, before `pip install`, before the build engine is even available.
- Engine-specific quirks (Docusaurus anchor generation, MkDocs nav contracts, Zensical slug conventions) are isolated in the adapter layer. The core engine is permanently engine-neutral.
- The VSM provides a testable, inspectable data structure for documentation architecture — enabling future capabilities like structural diffing, coverage metrics, and ghost route detection without modifying the analysis core.