Zenzic v0.16.0: Engineering Determinism into Documentation Pipelines

Zenzic v0.16.0 "Magnetite" marks the transition from an experimental toolchain to production infrastructure. The changes in this release address three classes of problems that accumulate silently in long-running documentation pipelines: malformed configuration that allows analysis to start on an invalid basis, unused allowlist entries that widen the security perimeter without purpose, and inconsistent exception types that force CI adapters to implement defensive workarounds.

1. The Bootstrap Gate: Z001 CORE_CONFIG_STRUCTURE

Problem statement

Prior to v0.16.0, a type error in .zenzic.toml could cause Zenzic to start analysis, produce partial output, and exit with an ambiguous message — or, in some code paths, exit 0. A CI pipeline consuming that output would record a clean run on top of a structurally invalid configuration.

The specific failure mode is a type mismatch in a numeric field:

# examples/z001-config-error/.zenzic.toml
[governance]
suppression_cap = "high"   # invalid: must be an integer

suppression_cap expects an integer. When the value is a string, the configuration parser previously applied a best-effort coercion path that masked the error. Analysis proceeded with an unvalidated governance ceiling.

The fix

Version 0.16.0 introduces ZenzicConfigError, a typed subclass of ConfigurationError, hard-coded to code Z001:

# src/zenzic/core/exceptions.py
from typing import Any


class ZenzicConfigError(ConfigurationError):
    """Raised when the project configuration structure is semantically invalid (Z001)."""

    def __init__(self, message: str, context: dict[str, Any] | None = None) -> None:
        # The code is fixed here so callers cannot raise this error under another code.
        super().__init__(message, context, code="Z001")

When this exception is raised, cli_main() intercepts it and — if the output format is sarif or json — serializes a structured error report before exiting:

{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": { "driver": { "name": "zenzic", "version": "0.16.0", "rules": [] } },
    "invocations": [{
      "executionSuccessful": false,
      "toolExecutionNotifications": [{
        "descriptor": { "id": "Z001" },
        "level": "error",
        "message": { "text": "suppression_cap: expected int, got str ('high')" }
      }]
    }],
    "results": []
  }]
}

The executionSuccessful: false field in the SARIF invocation block is the critical element. The GitHub Code Scanning Security tab reads this field and marks the run as failed — even when results is empty. Z001 surfaces in the Security tab with the exact configuration file and message, without requiring any finding in the scanned documents.
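The report shown above could be assembled from an exception's code and message roughly as follows. This is a minimal sketch, not Zenzic's actual serializer: the function name `sarif_error_report` and its parameters are assumptions made for illustration, and only the output shape is taken from the example above.

```python
import json
from typing import Any


def sarif_error_report(code: str, message: str, tool_version: str) -> dict[str, Any]:
    """Build a SARIF 2.1.0 log for a failed invocation with no results.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {"name": "zenzic", "version": tool_version, "rules": []}
                },
                "invocations": [
                    {
                        # False marks the run as failed, independent of findings.
                        "executionSuccessful": False,
                        "toolExecutionNotifications": [
                            {
                                "descriptor": {"id": code},
                                "level": "error",
                                "message": {"text": message},
                            }
                        ],
                    }
                ],
                # No document was scanned, so there are no findings to report.
                "results": [],
            }
        ],
    }


if __name__ == "__main__":
    report = sarif_error_report(
        "Z001", "suppression_cap: expected int, got str ('high')", "0.16.0"
    )
    print(json.dumps(report, indent=2))
```

Keeping the error in `toolExecutionNotifications` rather than `results` separates "the tool could not run" from "the tool found a problem in a document", which is the distinction this section relies on.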

Key guarantee: Z001 fires before any Markdown file is opened. Analysis either starts from a validated configuration or does not start at all.

$ zenzic check all examples/z001-config-error/
# Exit: 1 (Z001 — configuration guard raised before analysis begins)
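The ordering behind this guarantee can be sketched as a validate-then-scan gate. The names `ConfigGateError`, `validate_suppression_cap`, and `run_check` are hypothetical stand-ins, and the sketch takes an already-parsed configuration dictionary; Zenzic's real loader and validation rules may differ.

```python
from typing import Any


class ConfigGateError(Exception):
    """Hypothetical stand-in for ZenzicConfigError; always carries code Z001."""

    code = "Z001"


def validate_suppression_cap(config: dict[str, Any]) -> None:
    """Reject a non-integer governance.suppression_cap instead of coercing it."""
    governance = config.get("governance", {})
    if not isinstance(governance, dict):
        raise ConfigGateError("governance: expected table")
    value = governance.get("suppression_cap")
    if value is None:
        return  # key absent: nothing to validate in this sketch
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ConfigGateError(
            f"suppression_cap: expected int, got {type(value).__name__} ({value!r})"
        )


def run_check(config: dict[str, Any], markdown_files: list[str]) -> int:
    """Validate first; reach the scan loop only if validation succeeded."""
    try:
        validate_suppression_cap(config)
    except ConfigGateError as exc:
        print(f"{exc.code}: {exc}")
        return 1  # exit before any file is opened
    for _path in markdown_files:
        pass  # placeholder for the real scan of each file
    return 0


if __name__ == "__main__":
    bad_config = {"governance": {"suppression_cap": "high"}}
    print(run_check(bad_config, ["docs/index.md"]))
```

The point of the design is that no coercion path exists: a value is either of the declared type or the run ends with Z001.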

ADR-020 (Law of the Mirror): every diagnostic code must be reproducible via a physical fixture. The examples/z001-config-error/ directory provides the canonical reproduction case for Z001.


2. Configuration Hygiene: Z110 STALE_ALLOWLIST_ENTRY

Problem statement

absolute_path_allowlist in .zenzic.toml permits specific absolute path prefixes in links without triggering Z105 ABSOLUTE_PATH. This allowlist exists for legacy migration paths. Once the migration is complete, the entries in the allowlist are no longer referenced by any link in the documentation.

An unused allowlist entry is a security-relevant configuration item. It represents a declared exemption from a rule for a path prefix that no longer exists in the codebase. It widens the rule's blind spot without benefit.

The fix

After every link scan, Z110 STALE_ALLOWLIST_ENTRY compares the allowlisted prefixes against the absolute path prefixes actually used in links. Any allowlist entry that matched zero links emits a warning:

# examples/z110-stale-allowlist/.zenzic.toml
absolute_path_allowlist = ["/legacy/unused/path/"]

.zenzic.toml:1: Stale absolute_path_allowlist entry:
  '/legacy/unused/path/' is never referenced in links.

| Code | Severity | DQS Penalty | Exit (default) | Exit (--strict) |
|------|----------|-------------|----------------|-----------------|
| Z110 | warning | 1.0 (structural) | 0 | 1 |

Z110 is a warning by default. Repositories running strict = true in .zenzic.toml exit 1, blocking the CI pipeline. A configuration item that serves no function should be removed, not tolerated.

Fix: remove the stale entry from .zenzic.toml:

# Remove the unused entry
# absolute_path_allowlist = ["/legacy/unused/path/"]
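The staleness check described in this section reduces to a set comparison between allowlisted prefixes and the link targets seen during a scan. The sketch below assumes plain prefix matching with `str.startswith`; the function name `find_stale_allowlist_entries` is hypothetical, and Zenzic's actual matching rule may be stricter.

```python
def find_stale_allowlist_entries(
    allowlist: list[str], link_targets: list[str]
) -> list[str]:
    """Return allowlist prefixes that match none of the scanned link targets."""
    return [
        prefix
        for prefix in allowlist
        if not any(target.startswith(prefix) for target in link_targets)
    ]


if __name__ == "__main__":
    allowlist = ["/legacy/unused/path/", "/legacy/api/"]
    links = ["/legacy/api/v1/index.html", "guide/setup.md"]
    for entry in find_stale_allowlist_entries(allowlist, links):
        print(f"Stale absolute_path_allowlist entry: {entry!r}")
```

Because the check needs the complete set of links, it can only run after the link scan finishes, which matches the ordering described above.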

3. Unified Exception Hierarchy

Problem statement

Before v0.16.0, distinct code paths in the scanner raised different exception types for semantically identical conditions — a rule violation. CI adapters implemented catch-by-type logic that required updates every time a new exception subclass was introduced.

The fix

ZenzicViolation is now the canonical exception for any condition where a rule or policy is violated. The full hierarchy is:

ZenzicError
├── ConfigurationError       — missing / malformed config
│   ├── ZenzicConfigError    — semantic invalidity (Z001)
│   └── EngineError          — engine absent or incompatible
├── ZenzicViolation          — rule or policy violation
├── CheckError               — internal check machinery failure
├── NetworkError             — transport-level HTTP failure
└── PluginContractError      — plugin serialisability violation

The cli_main() top-level handler catches ZenzicError in a single branch. The structured context dictionary and code attribute on every exception provide the information needed to produce valid SARIF without pattern-matching on subclass names. Any CI adapter that catches ZenzicError at the boundary receives the full diagnostic payload regardless of which subclass was raised.


4. The Great Decoupling: ADR-075

The Zenzic Core engine produces SARIF, JSON, and text output, and reports its status through a stable, platform-agnostic exit-code contract:

| Exit | Meaning | Suppressible? |
|------|---------|---------------|
| 0 | All checks passed | |
| 1 | Documentation findings | Yes (--exit-zero) |
| 2 | Credential detected (Z201/Z204) | Never |
| 3 | Path traversal (Z202/Z203) | Never |

The Core contains no code that produces GitHub Annotations, calls the GitHub API, reads GITHUB_OUTPUT, or writes to GITHUB_STEP_SUMMARY. This is a design constraint documented in ADR-075:

Logic that maps Zenzic output to a CI platform's native format must live in the Adapter, never in the Core. Exit codes 2 and 3 propagate unchanged through every adapter.

The Zenzic Action v2.3.0 is the official GitHub Adapter. It reads the Core's exit code and SARIF output, then performs all GitHub-specific operations. The same Core binary can be consumed by a GitLab CI adapter, a Bitbucket Pipeline script, or a local pre-commit hook without any changes to the Core.
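An adapter that honors this contract could map exit codes as in the sketch below. The function `adapter_exit_code` and its `exit_zero` parameter are hypothetical and only model the table above; they are not taken from the Zenzic Action's source.

```python
SUPPRESSIBLE_EXIT_CODES = {1}  # documentation findings
NON_SUPPRESSIBLE_EXIT_CODES = {2, 3}  # credential detected, path traversal


def adapter_exit_code(core_exit: int, exit_zero: bool = False) -> int:
    """Map the Core's exit code to the exit code the adapter reports."""
    if core_exit in NON_SUPPRESSIBLE_EXIT_CODES:
        # Security exits propagate unchanged, whatever the adapter options are.
        return core_exit
    if core_exit in SUPPRESSIBLE_EXIT_CODES and exit_zero:
        return 0
    return core_exit


if __name__ == "__main__":
    for code in (0, 1, 2, 3):
        print(code, adapter_exit_code(code, exit_zero=True))
```

Checking the non-suppressible set first means no combination of options can downgrade a credential or path-traversal exit.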


5. RE2 and DFA Guarantees

The Zenzic scanner uses an RE2-compatible regex engine for all pattern matching. RE2 compiles patterns to a Deterministic Finite Automaton (DFA), which provides O(N) time complexity with respect to input length N.

Adding new rules to the scanner does not change the worst-case scan time per file. A repository with 10,000 Markdown files scanned against 50 rules takes the same time per-file as a repository scanned against 5 rules — scan time scales linearly with file count, not rule count.

Plugin rules are additionally bounded by the Z902 RULE_TIMEOUT guard, which enforces a per-file time limit: a plugin rule that introduces a backtracking-capable pattern and exceeds that limit is rejected.

| Property | Guarantee |
|----------|-----------|
| Time complexity per file | O(N), linear in input length |
| Regex engine | RE2/DFA, no backtracking |
| Rule timeout guard | Z902, emitted if per-file limit exceeded |
| New rules affect per-file time | No |
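The linear bound follows from how a DFA executes: one table lookup per input character, with no backtracking. The sketch below is not RE2 and not Zenzic's scanner; it is a hand-built DFA for the textbook expression (a|b)*abb, included only to show that the number of transitions equals the input length.

```python
# Transition table for a DFA accepting strings over {a, b} that end in "abb",
# i.e. the regular expression (a|b)*abb.
TRANSITIONS = {
    0: {"a": 1, "b": 0},  # start state
    1: {"a": 1, "b": 2},  # last symbol was "a"
    2: {"a": 1, "b": 3},  # last symbols were "ab"
    3: {"a": 1, "b": 0},  # last symbols were "abb" (accepting)
}
ACCEPTING = {3}


def dfa_match(text: str) -> tuple[bool, int]:
    """Run the DFA over text; return (accepted, number of transitions taken)."""
    state = 0
    steps = 0
    for char in text:
        row = TRANSITIONS[state]
        if char not in row:
            return False, steps  # symbol outside the alphabet: reject
        state = row[char]
        steps += 1
    return state in ACCEPTING, steps


if __name__ == "__main__":
    for sample in ("aabb", "abab", "babb"):
        print(sample, dfa_match(sample))
```

Each character is consumed exactly once, so doubling the input doubles the work, regardless of how the pattern was written.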

6. Zenzic Action v2.3.0 and the Integration Blueprints

Aligned with Core v0.16.0, the Action introduces three documented CI blueprints.

Blueprint 1 — Baseline Check: standard link and topology validation on every push and PR.

- name: Run Zenzic Baseline
  uses: PythonWoods/zenzic-action@v2
  with:
    version: "0.16.0"
    format: text
    fail-on-error: "true"

Blueprint 2 — Security Hardening: credential and path-traversal pre-gate with SARIF upload to GitHub Code Scanning.

# `permissions` is a job-level (or workflow-level) key; it is not valid on a step.
permissions:
  contents: read
  security-events: write
steps:
  - name: Run Hardened Zenzic Audit
    uses: PythonWoods/zenzic-action@v2
    with:
      version: "0.16.0"
      format: sarif
      upload-sarif: "true"
      guard-scan: "true"

Blueprint 3 — PR Governance: DQS regression blocking against a baseline snapshot, with inline annotations and step summary output.

- name: Run Zenzic PR Quality Gate
  id: zenzic
  uses: PythonWoods/zenzic-action@v2
  with:
    version: "0.16.0"
    format: json
    diff-base: .zenzic-score.json
    fail-on-error: "true"

- name: Report DQS
  if: always()
  run: |
    echo "**DQS:** ${{ steps.zenzic.outputs.score }}/100" >> "$GITHUB_STEP_SUMMARY"
    echo "**Debt:** ${{ steps.zenzic.outputs.suppression-debt-pts }} pts" >> "$GITHUB_STEP_SUMMARY"

7. The Road to v0.17.0

Version 0.17.0 will introduce the AST Walker, a traversal layer that operates on the parsed abstract syntax tree of each Markdown file rather than on raw text. This changes the scanning model from pattern matching over strings to structural queries over a typed document model, providing access to node-level metadata not available to a regex scanner: heading hierarchy, list nesting depth, table cell count, and link context.

The RE2/DFA O(N) guarantee is preserved. The AST Walker does not introduce pattern matching; it provides a structured input layer that the existing rule engine traverses in O(N) with respect to document node count.
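A traversal that is linear in node count can be illustrated with an explicit-stack depth-first walk. The `Node` type and `walk` function below are hypothetical and say nothing about the actual v0.17.0 data model; the sketch only shows that each node is visited exactly once.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical document node; not the v0.17.0 data model."""

    kind: str
    children: list["Node"] = field(default_factory=list)


def walk(root: Node) -> Iterator[tuple[Node, int]]:
    """Yield (node, depth) in document order, visiting each node exactly once."""
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        yield node, depth
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))


if __name__ == "__main__":
    document = Node(
        "document",
        [
            Node("heading"),
            Node("list", [Node("list_item", [Node("list", [Node("list_item")])])]),
        ],
    )
    for node, depth in walk(document):
        print("  " * depth + node.kind)
```

Every node is pushed and popped once, so the cost of the walk grows with the number of nodes rather than with the number of structural queries layered on top of it.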

No release timeline is committed. The AST Walker becomes a merge candidate when its integration test suite achieves the same coverage level as the current text-based scanner.

