
Core Mechanics

Zenzic's validation engine rests on a few core architectural components that together guarantee deterministic, zero-false-positive results without executing a full site build.

The Virtual Site Map (VSM)

When Zenzic validates your links, it does not simply check whether a target file exists on disk. Instead, it builds a Virtual Site Map (VSM) — a pure in-memory projection of what your build engine will actually serve to readers.

The VSM maps every canonical URL to a Route entry:

| Field | Meaning |
|---|---|
| url | The URL a browser would request, e.g. /guide/install/ |
| source | The source file that produces this URL, e.g. guide/install.md |
| status | Whether the page is reachable, orphaned, ignored, or in conflict |
| anchors | Heading anchors pre-computed from the source file |
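A minimal Python sketch of a Route entry follows, for illustration only. The `url_for`, `slugify`, and `precompute_anchors` helpers are hypothetical: build engines differ in how they derive directory-style URLs and heading slugs, so these rules are assumptions rather than Zenzic's implementation.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Route:
    url: str  # canonical URL a browser would request
    source: str  # source file that produces the URL
    status: str  # REACHABLE, ORPHAN_BUT_EXISTING, IGNORED or CONFLICT
    anchors: frozenset[str] = field(default_factory=frozenset)


def url_for(source: str) -> str:
    """Hypothetical rule: guide/install.md -> /guide/install/, guide/index.md -> /guide/."""
    parts = list(PurePosixPath(source).with_suffix("").parts)
    if parts and parts[-1] == "index":
        parts.pop()  # an index file serves its directory's URL
    return "/" + "".join(part + "/" for part in parts)


def slugify(heading: str) -> str:
    """Hypothetical slug rule: lowercase, drop punctuation, whitespace becomes hyphens."""
    text = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", text)


def precompute_anchors(markdown: str) -> frozenset[str]:
    """Slug every ATX heading (# to ######), dropping an optional closing ## sequence.

    Simplification: headings inside fenced code blocks are not skipped.
    """
    headings = re.findall(
        r"^#{1,6}[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", markdown, flags=re.MULTILINE
    )
    return frozenset(slugify(h) for h in headings)


source_text = "# Install\n\n## Using pip\n\nSome prose.\n"
route = Route(
    url=url_for("guide/install.md"),
    source="guide/install.md",
    status="REACHABLE",
    anchors=precompute_anchors(source_text),
)
```

Pre-computing anchors while the map is built means a link such as /guide/install/#using-pip can later be checked with a set lookup instead of re-reading the source file.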

Each route carries a status that tells Zenzic how to treat links pointing to it:

| Status | Meaning | Link result |
|---|---|---|
| REACHABLE | Page is listed in navigation or is a locale route | Valid |
| ORPHAN_BUT_EXISTING | File exists on disk but is not in site navigation | Z103 error |
| IGNORED | Excluded by configuration (e.g. README files, private directories) | Z101 error |
| CONFLICT | Two source files produce the same canonical URL | Z101 error |
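The status rules can be sketched as a single classification step. Everything here is hypothetical: the `build_vsm` name, its inputs (pairs of canonical URL and source file, a set of navigation URLs, an ignore predicate, and locale URLs), and the precedence order (ignored sources are dropped first, then a conflict wins over navigation membership). The `LINK_RESULT` mapping mirrors the Link result column of the status table.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable, Iterable, Optional


class Status(Enum):
    REACHABLE = "REACHABLE"
    ORPHAN_BUT_EXISTING = "ORPHAN_BUT_EXISTING"
    IGNORED = "IGNORED"
    CONFLICT = "CONFLICT"


# Mirrors the "Link result" column; None means the link is valid.
LINK_RESULT: dict[Status, Optional[str]] = {
    Status.REACHABLE: None,
    Status.ORPHAN_BUT_EXISTING: "Z103",
    Status.IGNORED: "Z101",
    Status.CONFLICT: "Z101",
}


def build_vsm(
    pages: Iterable[tuple[str, str]],  # (canonical URL, source file) pairs
    nav_urls: set[str],
    is_ignored: Callable[[str], bool],
    locale_urls: Iterable[str] = (),
) -> dict[str, Status]:
    sources_by_url: dict[str, list[str]] = defaultdict(list)
    for url, source in pages:
        sources_by_url[url].append(source)

    vsm: dict[str, Status] = {}
    for url, sources in sources_by_url.items():
        kept = [s for s in sources if not is_ignored(s)]
        if not kept:
            vsm[url] = Status.IGNORED  # every source excluded by configuration
        elif len(kept) > 1:
            vsm[url] = Status.CONFLICT  # several files claim one canonical URL
        elif url in nav_urls:
            vsm[url] = Status.REACHABLE
        else:
            vsm[url] = Status.ORPHAN_BUT_EXISTING  # on disk, not in navigation

    # Locale routes are reachable even without a source file of their own.
    for url in locale_urls:
        vsm.setdefault(url, Status.REACHABLE)
    return vsm


vsm = build_vsm(
    pages=[
        ("/guide/install/", "guide/install.md"),
        ("/guide/install/", "guide/install/index.md"),
        ("/guide/faq/", "guide/faq.md"),
        ("/blog/", "blog/index.md"),
        ("/README/", "README.md"),
    ],
    nav_urls={"/guide/install/", "/blog/"},
    is_ignored=lambda source: source.endswith("README.md"),
    locale_urls=["/it/"],
)
```

In this example /guide/install/ is listed in navigation yet still classified as CONFLICT, because two source files claim the same URL, while /it/ is REACHABLE with no file behind it.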

Why this matters: A file can exist on your filesystem and still be IGNORED in the VSM. A URL can be REACHABLE in the VSM without having a corresponding file on disk (for example, locale index routes). The VSM is the authority — Zenzic checks reachability, not just file existence.

This design means that zenzic check links catches problems that a naive file-existence check would miss: pages removed from navigation, conflicting routes, and orphaned content that readers cannot discover through normal browsing.

The Credential Scanner Architecture

The Zenzic credential scanner uses a dual-stream architecture to ensure that no part of a file escapes credential scanning.

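When the Reference Scanner processes a file, it creates two independent streams: a Content stream, which feeds reference and link parsing, and a credential scanner stream, which feeds secret detection.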

The two streams have opposite filtering rules by design. The Content stream must skip YAML frontmatter to avoid parsing metadata like author: Jane Doe as a broken reference definition. The credential scanner stream must see frontmatter because a key like aws_key: AKIA... hiding in YAML metadata is a real secret that must be caught. The streams never share a data source — merging them would create a blind spot.
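The split can be sketched as two generators that each read the raw text independently, so filtering in one can never hide a line from the other. The sketch assumes frontmatter is a leading block delimited by `---` lines; the function names are hypothetical, and the key in the sample document is the placeholder value AWS uses in its own documentation.

```python
from typing import Iterator


def _frontmatter_end(lines: list[str]) -> int:
    """Index of the first line after a leading --- frontmatter block, or 0."""
    if not lines or lines[0].strip() != "---":
        return 0
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return i + 1
    return 0  # unterminated block: treat the file as having no frontmatter


def content_stream(text: str) -> Iterator[tuple[int, str]]:
    """Lines for reference parsing: YAML frontmatter is skipped."""
    lines = text.splitlines()
    for lineno in range(_frontmatter_end(lines), len(lines)):
        yield lineno + 1, lines[lineno]


def credential_stream(text: str) -> Iterator[tuple[int, str]]:
    """Lines for secret detection: every line, frontmatter included."""
    yield from enumerate(text.splitlines(), start=1)


doc = (
    "---\n"
    "author: Jane Doe\n"
    "aws_key: AKIAIOSFODNN7EXAMPLE\n"
    "---\n"
    "# Title\n"
    "[id]: https://example.com\n"
)
```

Both generators keep original 1-based line numbers, so a finding from either stream points at the same line in the source file.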

Pre-Scan Normalizer. Before running detection patterns, the credential scanner normalises each line to defeat obfuscation. Inline code backticks are unwrapped, concatenation operators are removed, and table pipe characters are collapsed. This means a secret broken across Markdown table columns — such as an AWS key split into `AKIA` + `suffix` — is reassembled before scanning. Both the raw and normalised forms are checked, and a deduplication set prevents double-reporting.
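The normalization steps can be illustrated with a single detection rule. This is a sketch under assumptions: the exact rewrite rules, the `normalize` and `scan_line` names, and the choice of the AWS access key ID shape (`AKIA` plus 16 uppercase letters or digits) as the only pattern are illustrative; Zenzic's real rule set and normalizer may differ.

```python
import re

# Illustrative rule: AWS access key ID shape, "AKIA" + 16 uppercase letters/digits.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def normalize(line: str) -> str:
    """Hypothetical normalizer mirroring the three steps described above."""
    line = line.replace("`", "")  # unwrap inline code backticks
    line = re.sub(r"\s*\+\s*", "", line)  # drop concatenation operators
    line = re.sub(r"\s*\|\s*", "", line)  # collapse table pipes and cell padding
    return line


def scan_line(lineno: int, line: str, seen: set[tuple[int, str]]) -> list[tuple[int, str]]:
    """Scan both the raw and the normalized form; `seen` suppresses duplicates."""
    findings = []
    for candidate in (line, normalize(line)):
        for match in AWS_KEY.finditer(candidate):
            key = (lineno, match.group())
            if key not in seen:
                seen.add(key)
                findings.append(key)
    return findings
```

A table row such as | `AKIA` | `IOSFODNN7EXAMPLE` | matches nothing in its raw form but normalizes to AKIAIOSFODNN7EXAMPLE, while a key written inline matches in both forms and is still reported once.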

ReDoS Protection. Custom regex patterns declared in [[custom_rules]] are compiled through RE2 compatibility gates at load time. Unsupported constructs (for example backreferences or lookarounds) are rejected before any scan begins. Separately, the parallel worker watchdog still emits Z902: RULE_TIMEOUT when a worker stalls at runtime. Such a stall points to a systemic hang (for example I/O or coordinator starvation); the watchdog is not intended as a canary for regex backtracking.
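A load-time gate can be approximated by walking the pattern text and rejecting constructs RE2 does not support. This is a simplified, hypothetical check (`find_unsupported` and `load_custom_rule` are illustrative names, and character classes are not tracked), not Zenzic's gate. Python's `re` is used only to confirm the pattern compiles: it is a backtracking engine, so a pattern such as (a+)+ passes this gate yet gets none of RE2's linear-time guarantees under `re`.

```python
import re

# Lookaround openers that RE2 does not support.
LOOKAROUNDS = {
    "(?=": "lookahead",
    "(?!": "negative lookahead",
    "(?<=": "lookbehind",
    "(?<!": "negative lookbehind",
}


def find_unsupported(pattern: str) -> list[str]:
    """Report backreferences and lookarounds found in a pattern's source text.

    Simplification: character classes are not tracked, so "[(?=]" is flagged too.
    """
    problems: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "123456789":  # \1..\9 are numbered backreferences
                problems.append("backreference \\" + pattern[i + 1])
            i += 2  # skip the escaped character either way
            continue
        for opener, name in LOOKAROUNDS.items():
            if pattern.startswith(opener, i):
                problems.append(name)
        if pattern.startswith("(?P=", i):
            problems.append("named backreference")
        i += 1
    return problems


def load_custom_rule(pattern: str) -> re.Pattern:
    """Reject unsupported constructs at load time, before any scan begins."""
    problems = find_unsupported(pattern)
    if problems:
        raise ValueError("rule rejected: " + ", ".join(problems))
    # Python's re only validates syntax here; it still backtracks at match time.
    return re.compile(pattern)
```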

Circular Links

Documentation is a Knowledge Graph: a densely interconnected network where cross-linking between pages is expected and desirable. If a Tutorial links to a Reference page for technical details, it is natural and beneficial for that Reference page to link back to the Tutorial as a working example. Circular link patterns are therefore structural data points, not defects.

Cycle detection runs once, as an iterative depth-first search during resolver construction (Pass 1.5, Θ(V+E) for V pages and E links). Every Pass 2 membership lookup against the resulting cycle registry is then O(1).

Why the engine computes cycles at all. The DFS traversal is a mechanical requirement of the Virtual Site Map builder: without identifying cycles, a walk that follows links from page to page would never terminate. Detection exists so that the resolver terminates; it is not a quality check on the links themselves.
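One way to build such a registry in Θ(V+E) is an iterative form of Tarjan's strongly-connected-components algorithm, which is itself a depth-first search with an explicit stack. This is a swapped-in textbook technique, not necessarily the traversal Zenzic uses; the graph shape (a dict from page to linked pages) and the `cycle_members` name are assumptions. Collecting only the pages on the DFS path at each back edge can miss cycle members (C in A→B→A plus A→C→B), so the sketch treats a page as cyclic when its component holds more than one page or the page links to itself.

```python
from typing import Iterator


def cycle_members(graph: dict[str, list[str]]) -> frozenset[str]:
    """Return every page that lies on at least one link cycle (iterative Tarjan SCC)."""
    index: dict[str, int] = {}  # discovery order of each page
    low: dict[str, int] = {}  # smallest discovery index reachable from the page
    stack: list[str] = []  # pages whose component is not yet closed
    on_stack: set[str] = set()
    work: list[tuple[str, Iterator[str]]] = []  # explicit DFS stack, no recursion
    members: set[str] = set()

    def discover(page: str) -> None:
        index[page] = low[page] = len(index)
        stack.append(page)
        on_stack.add(page)
        work.append((page, iter(graph.get(page, []))))

    # Pages that are only link targets still need a node (dict keeps order).
    nodes = dict.fromkeys(graph)
    for targets in graph.values():
        nodes.update(dict.fromkeys(targets))

    for root in nodes:
        if root in index:
            continue
        discover(root)
        while work:
            page, successors = work[-1]
            for target in successors:
                if target not in index:
                    discover(target)  # tree edge: descend before continuing
                    break
                if target in on_stack:
                    low[page] = min(low[page], index[target])  # back edge
            else:
                # All links of `page` are handled: retreat to its parent.
                work.pop()
                if work:
                    parent = work[-1][0]
                    low[parent] = min(low[parent], low[page])
                if low[page] == index[page]:  # page roots a component
                    component = []
                    while True:
                        member = stack.pop()
                        on_stack.discard(member)
                        component.append(member)
                        if member == page:
                            break
                    if len(component) > 1 or page in graph.get(page, []):
                        members.update(component)
    return frozenset(members)


links = {
    "tutorial": ["reference"],
    "reference": ["tutorial", "faq"],
    "faq": [],
    "index": ["tutorial"],
}
registry = cycle_members(links)  # built once, before any lookups
is_cyclic = "reference" in registry  # later lookups are hash-set membership tests
```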

Three-Pass Reference Pipeline

To ensure accurate link validation that supports out-of-order reference definitions, Zenzic executes a strict Three-Pass Pipeline:

| Pass | Name | What happens |
|---|---|---|
| 1 | Harvest | Streams every line; records `[id]: url` definitions; runs the credential scanner on every URL and line |
| 2 | Cross-Check | Resolves every `[text][id]` usage against the complete ReferenceMap; flags unresolvable IDs |
| 3 | Integrity Report | Computes a per-file integrity score; appends Dead Definition and alt-text warnings |
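The three passes can be sketched for a single file as follows. Assumptions: the regexes cover only simple `[id]: url` definitions and full or collapsed `[text][id]` usages, labels are matched case-insensitively with `str.lower()` (CommonMark uses full Unicode case folding), and the integrity score, alt-text checks, and the Pass 1 credential scan are left out because their details are not described here. `run_pipeline` and `Report` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns: [id]: url definitions and [text][id] / [text][] usages.
DEFINITION = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)")
USAGE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


@dataclass
class Report:
    unresolved: list[tuple[int, str]] = field(default_factory=list)  # Pass 2
    dead_definitions: list[str] = field(default_factory=list)  # Pass 3


def run_pipeline(lines: list[str]) -> Report:
    # Pass 1 (Harvest): collect every definition before resolving anything,
    # so a definition may appear after its first usage.
    reference_map: dict[str, str] = {}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            # First definition wins, as in CommonMark.
            reference_map.setdefault(match.group(1).lower(), match.group(2))

    # Pass 2 (Cross-Check): resolve usages against the complete map.
    report = Report()
    used: set[str] = set()
    for lineno, line in enumerate(lines, start=1):
        if DEFINITION.match(line):
            continue
        for text, ref_id in USAGE.findall(line):
            label = (ref_id or text).lower()  # [text][] uses the text as its label
            if label in reference_map:
                used.add(label)
            else:
                report.unresolved.append((lineno, label))

    # Pass 3 (Integrity Report, partial): definitions nothing refers to.
    report.dead_definitions = sorted(set(reference_map) - used)
    return report


lines = [
    "See the [install guide][install] and the [FAQ][faq].",
    "Also read the [Changelog][].",
    "",
    "[install]: /guide/install/",
    "[changelog]: /changelog/",
    "[unused]: https://example.com/old",
]
report = run_pipeline(lines)
```

Because the harvest finishes before the cross-check starts, the `[install]` definition on line 4 resolves the usage on line 1, while `[faq]` is flagged and `[unused]` is reported as a dead definition.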

Pass 2 always runs after the Pass 1 harvest completes. Security findings from Pass 1 determine the exit code (exit code 2), but they do not cause the Pass 2 cross-check to be skipped.
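The ordering guarantee and exit behavior can be sketched as below. The callables stand in for the real passes and are hypothetical; exit code 2 for security findings comes from the text, while returning 1 for other findings and giving security findings precedence when both occur are assumptions made for this example.

```python
from typing import Callable

Findings = list[str]


def run_check(
    harvest: Callable[[], Findings],  # Pass 1: returns security findings
    cross_check: Callable[[], Findings],  # Pass 2: returns link findings
) -> int:
    security_findings = harvest()
    # No early return: Pass 2 runs even when Pass 1 found a secret.
    link_findings = cross_check()
    if security_findings:
        return 2  # exit code 2 for security findings
    return 1 if link_findings else 0  # assumption: 1 for other findings


calls: list[str] = []


def recording_cross_check() -> Findings:
    calls.append("pass 2 ran")
    return ["broken link"]


exit_code = run_check(
    harvest=lambda: ["secret found on line 3"],
    cross_check=recording_cross_check,
)
```

Running Pass 2 unconditionally means a single invocation reports both leaked secrets and broken links, instead of hiding link problems until the secret is fixed.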