Skip to main content

Philosophy

Documentation is infrastructure

Source code has compilers, type checkers, and linters that prevent broken code from reaching production. Documentation has historically had none of these. A broken link, a leaked credential, or a stub page that was never completed can ship to users without any automated gate catching it.

Zenzic exists to close that gap. Documentation health should be measurable, deterministic, and rigorously enforced — caught with the same mathematical certainty as a type error in source code.


The three pillars

Every design decision in Zenzic follows three rules. When in doubt about an implementation choice, these are the tie-breakers. If a line of code violates any of these pillars, it is instant technical debt.

1. Lint the source, not the build. Zenzic analyses raw Markdown files under docs/ and configuration files (mkdocs.yml, docusaurus.config.ts, zensical.toml). It never waits for an HTML build. Analysis at the source level is faster, generator-agnostic, and always reproducible. Where a build engine defines resolution semantics (e.g. i18n fallback), Zenzic reads the configuration as plain text and emulates the semantics — without importing or executing the engine.

2. No subprocesses in the core engine. The core library is 100% pure Python. It never calls subprocess.run or os.system to invoke external tools. This guarantees security (zero execution of untrusted code), total portability, and extreme speed. Link validation is implemented natively with a Markdown link extractor. Zenzic is self-contained and testable without installing any documentation generator.

3. Pure functions first. All validation logic lives in pure functions: no file I/O, no network access, no terminal output. I/O happens only at the edges — in CLI wrappers that read files from disk. Pure functions are trivially testable and can be composed freely. This is why Zenzic supports parallelism without race conditions.


The Safe Harbor principle

The documentation tooling ecosystem is not stable. MkDocs introduces breaking changes. New build systems emerge. Teams migrate from one generator to another. In this environment, a quality tool that is tightly coupled to a specific build engine creates a dangerous dependency: when the engine changes, the quality gate breaks, and projects are left without verification during the exact window when mistakes are most likely.

Zenzic is designed to be a Safe Harbor — a fixed, stable point that remains valid before, during, and after a build engine migration. This is not a coincidental feature; it is the core architectural commitment.

The implementation of this commitment is absolute engine-agnosticism:

  • Zenzic reads raw Markdown files and configuration as plain data. It never imports or executes a documentation framework.
  • Engine-specific knowledge (nav structure, i18n conventions, locale fallback rules) is encapsulated in adapters — thin, replaceable components that translate engine semantics into a neutral protocol. The Core never sees a MkDocsAdapter or ZensicalAdapter — it sees only a BaseAdapter that answers five questions.
  • Third-party adapters install as Python packages and are discovered at runtime via entry-points. Adding support for a new engine (Hugo, Docusaurus, Sphinx) requires no Zenzic release.

The practical consequence: a project migrating from MkDocs to Zensical — or to Hugo, Docusaurus, or any future generator — can run zenzic check all continuously against both configurations simultaneously. A project that has not yet decided on a build engine can still validate its documentation quality today, using Vanilla mode with zero configuration.

Documentation ecosystems evolve constantly — build engines introduce breaking changes, configuration formats shift, plugin APIs are redesigned. Zenzic's position is intentionally conservative and user-protective:

  • Every supported engine receives the same level of validation fidelity.
  • Existing configuration files are validated as source contracts, not as runtime promises.
  • Unknown or future configuration keys are treated with tolerant parsing, not hard failures.

In practice, this means a team can defer migration decisions without losing quality gates. Zenzic keeps the verification layer alive while the ecosystem around it evolves.


The frictionless sentinel

A linter is only as good as its false-positive rate. When tools blindly raise errors, developers learn to ignore them — and a tool that is ignored provides no safety.

Zenzic's architecture is designed around a single constraint: every finding must be actionable. This shapes several decisions:

  • In-memory multi-pass algorithms. The Two-Pass Reference Pipeline separates definition harvesting from usage checking, eliminating the false positives caused by forward references that a naive single-pass scanner would produce.
  • i18n fallback awareness. A link from a translated page to a default-locale asset is not a broken link — the build engine will serve the fallback at runtime. Zenzic suppresses it.
  • Vanilla mode for nav-agnostic projects. When Zenzic has no nav declaration to compare against, it skips the orphan check entirely rather than flagging every file as an orphan.

The goal is a tool that is utterly silent when your documentation is sound, and surgically precise when it is not.


The validation pipeline

Every Zenzic run follows the same four-stage pipeline, regardless of the documentation engine:

Source — Zenzic reads raw .md/.mdx files and zenzic.toml directly from disk. No HTML build required.

Shield IO — Every frontmatter line passes through safe_read_line() before any parser sees it. If a credential pattern matches (OpenAI key, GitHub token, AWS key, etc.), Zenzic raises ShieldViolation and aborts with Exit 2 — the line is never stored or parsed. This makes the Shield an IO middleware: it sits between disk reads and metadata extraction.

Metadatabuild_metadata_cache() performs single-pass eager harvesting of YAML frontmatter across all Markdown files, producing a FileMetadata per file (slug, draft, unlisted, tags). This data feeds both the Adapter and the VSM without re-reading any file.

Adapter — selected by build_context.engine (or --engine). Translates engine-specific knowledge into engine-agnostic RouteMetadata (canonical URL, status, slug, route base path, proxy flag) via the get_route_info() API. Four built-in adapters ship with Zenzic; third-party adapters install via pip and are discovered through the zenzic.adapters entry-point group. Legacy adapters that only implement map_url() + classify_route() are supported via automatic fallback in build_vsm().

Rule Engine — applies all active checks to the raw Markdown text: built-in checks (orphans, snippets, placeholders, assets, references), project-specific [[custom_rules]] from zenzic.toml, and any installed plugin rules (Python classes registered under zenzic.rules).

Findings — results are aggregated into typed IntegrityReport objects (per file) and rendered as Rich terminal output, JSON, or a 0–100 quality score.

Scalability profile

Site sizePhase 1Phase 1.5Phase 2Total
100 pages, 500 links< 5 ms< 1 ms< 2 ms~ 8 ms
1 000 pages, 5 000 links~ 30 ms~ 8 ms~ 15 ms~ 55 ms
10 000 pages, 50 000 links~ 300 ms~ 80 ms~ 150 ms~ 530 ms
50 000 pages, 250 000 links~ 1.5 s~ 400 ms~ 750 ms~ 2.6 s

All measurements are single-threaded on a mid-range CI runner (2 vCPU, 4 GB RAM). The Shield scan (Phase 1, overlapping) adds < 10% overhead regardless of site size because it is a single regex pass per file.


Determinism as an architectural guarantee

Quality scores are worthless if they are not reproducible. Zenzic guarantees that for any given documentation state, it always produces the same score — no nondeterminism from network state, build artifacts, or filesystem ordering.

This determinism is what makes zenzic diff reliable. Tracking a score over time only makes sense if the score is computed by a pure function of the source files. The --save / diff workflow is built entirely on this guarantee.


Security is not optional

Credential leaks in documentation are a real attack vector. A developer copies an API key into a Markdown example and commits it. The reference pipeline's Shield component addresses this directly: every reference URL is scanned for known credential patterns during Pass 1, before any HTTP request is issued and before any downstream processing touches the URL.

Exit code 2 is reserved exclusively for Shield events (credential detection). It can never be suppressed by --exit-zero, --strict, or any other flag. A Shield detection is a build-blocking security incident — by design.

Exit code 3 — the Blood Sentinel — guards against system-path traversal. When a Markdown link references a path that escapes the documentation root (e.g. ../../etc/passwd), Zenzic terminates immediately with exit code 3. Like the Shield, the Blood Sentinel cannot be suppressed or downgraded. These two non-negotiable exit codes form Zenzic's security perimeter: no configuration, no flag, and no adapter can silence them.


The Headless Architecture

Zenzic v0.6.0 introduced the Headless Architecture: the verification engine (zenzic) and the documentation site (zenzic-doc) live in separate repositories.

This separation is deliberate and permanent:

  • The engine has no opinion on how its own documentation is rendered. The core repository contains Python source, tests, and example fixtures. It does not contain a documentation framework, a theme, or a CSS file.
  • The documentation site is a sovereign Docusaurus project. It has its own package.json, its own i18n system, its own React components. It is not a subdirectory of the engine — it is a peer.
  • Zenzic validates its own documentation. The docs repo carries a zenzic.toml that configures the Docusaurus adapter. Every documentation pull request is linted by the tool it documents. This is the recursive proof that engine agnosticism works.

The Headless Architecture is the structural implementation of the Safe Harbor principle: the verification layer and the presentation layer have no shared dependencies, no shared build steps, and no shared failure modes.


Cross-ecosystem reach

Zenzic is written in Python, but its reach is not limited to Python documentation.

The Docusaurus adapter demonstrates this concretely: Docusaurus is a Node.js framework built on React. Zenzic analyses its configuration (docusaurus.config.ts) by reading it as text and extracting structural data with pattern matching — no Node.js runtime, no npm install, no JavaScript evaluation.

Support levelEngineSSG languageAdapter
NativeDocusaurusNode.jsDocusaurusAdapter
NativeMkDocsPythonMkDocsAdapter
NativeZensicalPythonZensicalAdapter
AgnosticVanillaanyVanillaAdapter
ExtensibleHugo, Sphinx, Jekyll, AstroanyThird-party via entry-point

The adapter protocol is intentionally lean — five core methods — enabling developers to integrate new engines with surgical precision and minimal architectural overhead, rather than requiring months of reverse-engineering.


Why "Obsidian Glass"

Obsidian is volcanic glass — formed under extreme pressure, naturally transparent, and harder than steel. It is the material that ancient civilisations used for surgical instruments because nothing else could match its edge.

Zenzic aspires to be obsidian: transparent (deterministic, reproducible, source-only), hard (non-suppressible security gates, strict mode by default in CI), and precise (zero false positives as the design target, not the marketing claim).

The name is also a reminder: glass is fragile if you strike it wrong. A quality tool that cannot be trusted is worse than no tool at all. Every design decision in Zenzic is made with this in mind — the tool must earn the right to interrupt a developer's workflow by being unfailingly correct when it does.


Zenzic is proudly built by PythonWoods. Although its heart beats in Python, its mission is universal: to provide a Safe Harbor for technical documentation, regardless of the language or framework chosen for the build.