
Zenzic — Architectural Gaps & Roadmap

"What is not documented, does not exist; what is documented poorly, is an ambush."

This page tracks gaps that were closed during the v0.7.0 cycle and those that remain open for the v0.8.0 roadmap. It is a living document — updated each sprint.


Open — Target v0.8.0

GAP-001 — Auto-Fix Engine

Component: cli/_check.py, new core/fixer.py

Description: Zenzic detects but does not repair. A contributor who receives a Z501 (placeholder) or Z502 (short content) finding must locate and edit the file manually. An Auto-Fix engine would apply safe, reversible patches directly to source files — replacing placeholder tokens, stubbing short sections, and reporting what was changed.

Planned semantics:

zenzic fix all # dry-run by default: shows diff, writes nothing
zenzic fix all --apply # writes changes; staged via git diff
zenzic fix links # fixes only Z101/Z104 (dead links) — renames or stubs

Design constraints:

  • Auto-fix must never touch files that triggered Z201 (Shield secret) — those require human judgment.

  • Exit code semantics are unchanged: --apply still exits 1 if unfixed findings remain.

  • Pure Python, no subprocess (Pillar 2).

Status: Design phase. No code merged.


GAP-002 — Dynamic Navbar/Footer Plugin Support

Component: core/adapters/_docusaurus.py, _parse_config_navigation()

Description: Docusaurus supports navbar items declared via @docusaurus/plugin-* plugins (e.g. plugin-content-docs multi-instance, custom navbar components). When the navbar is populated dynamically at build time, Zenzic's static regex parser cannot extract those paths — it falls back to treating all files as REACHABLE.

Impact: Low false-positive risk (the fallback is conservative), but some true orphans may be missed in plugin-heavy configurations.

Planned resolution: A structured warning (::warning annotation in CI mode) when Zenzic detects dynamic navbar plugins, indicating that orphan detection may be incomplete. The user can suppress it with dynamic_nav_plugins = true in zenzic.toml.

Status: Tracked. RFC open.


Closed in v0.7.0 — Operation Obsidian Stress

What was the Operation?

Before the v0.7.0 release, four AI agents were instructed to break Zenzic's Shield (credential scanner) using realistic bypass techniques. They found four real vectors. All were closed before stable release.

See the full technical post-mortem: AI Red Team Attacks Code Linter

ZRT-001 — Unicode Normalization Bypass (Shield)

Identified by: AI Red Team agent "Alpha" during Operation Obsidian Stress

Component: core/shield.py, scan_lines_with_lookback()

Description: The Shield's regex patterns matched ASCII credential shapes. An attacker controlling a Markdown file could insert a Unicode lookalike character (e.g. ghp_… using fullwidth Latin letters) into what appeared to be a token. The Shield would not fire because the byte sequence did not match the ASCII pattern.

Resolution: Unicode normalization (NFKC) is applied to each line before pattern matching. unicodedata.normalize("NFKC", line) collapses fullwidth, superscript, enclosed, and other Unicode lookalikes to their ASCII canonical form. The original line content is preserved for output; only the normalized copy is matched against.
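The shape of the fix, reduced to its essentials (the token regex is simplified for illustration):

```python
import re
import unicodedata

TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{8,}")  # simplified credential shape

def shield_match(line: str):
    """Match against the NFKC-normalized copy; the original line is what gets reported."""
    normalized = unicodedata.normalize("NFKC", line)
    return TOKEN_RE.search(normalized)
```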

Lesson: Regex-based credential detection must normalize input. The attack surface is not the pattern — it is the encoding.

Closed in: v0.7.0 sprint D038.


ZRT-002 — Lookback Buffer Escape (Shield)

Identified by: AI Red Team agent "Bravo" during Operation Obsidian Stress

Component: core/shield.py, scan_lines_with_lookback()

Description: The Shield's lookback buffer was used to detect multi-line credential constructs (e.g. a password: key on one line, the value on the next). Agent Bravo inserted a sufficiently long "filler" block (> buffer size) between the key and value lines. The buffer emptied before the value line was scanned, breaking the association and suppressing the Z201 finding.

Resolution: The lookback buffer size was validated against the maximum known multi-line credential pattern length in the registry (codes.py). The buffer is now guaranteed to span the maximum pattern window. Additionally, the buffer is flushed on file boundaries only — never mid-file.
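A reduced sketch of a buffer sized from the pattern registry rather than from a guess (the registry entries and spans below are illustrative):

```python
import re
from collections import deque

# Hypothetical registry entries: (key pattern, value pattern, max line span).
MULTILINE_PATTERNS = [
    (re.compile(r"password\s*:\s*$", re.I), re.compile(r"\S{8,}"), 4),
]

# Sized from the registry's worst case: a guarantee, not "large enough".
BUFFER_SIZE = max(span for _, _, span in MULTILINE_PATTERNS)

def scan_lines_with_lookback(lines):
    findings = []
    buffer = deque(maxlen=BUFFER_SIZE)  # flushed on file boundaries only
    for lineno, line in enumerate(lines, 1):
        for key_re, value_re, _ in MULTILINE_PATTERNS:
            if value_re.search(line) and any(key_re.search(p) for p in buffer):
                findings.append(("Z201", lineno))
        buffer.append(line)
    return findings
```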

Lesson: Buffer-based detection requires formal sizing against the worst-case pattern. An informal "large enough" buffer is not a security guarantee.

Closed in: v0.7.0 sprint D039.


ZRT-003 — HTML Entity Obfuscation (Shield)

Identified by: AI Red Team agent "Charlie" during Operation Obsidian Stress

Component: core/shield.py

Description: The Shield scanned raw Markdown bytes. Agent Charlie used HTML entity encoding (e.g. &#103;hp_… in place of ghp_…) inside fenced code blocks. The Shield's patterns did not match the entity-encoded form, allowing a fake credential to pass undetected.

Resolution: A lightweight HTML entity decoder is applied to each line before Shield pattern matching (after NFKC normalization). The decoder handles numeric (&#103; for g) and named (&amp; for &) entities. XML/HTML character references are normalized to their Unicode codepoints before the regex runs.
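Reduced to its essentials, the layered pipeline looks like this (token regex simplified for illustration):

```python
import html
import re
import unicodedata

TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{8,}")  # simplified credential shape

def normalize_for_shield(line: str) -> str:
    line = unicodedata.normalize("NFKC", line)  # layer 1: Unicode lookalikes
    return html.unescape(line)                  # layer 2: HTML/XML entities

def shield_match(line: str):
    return TOKEN_RE.search(normalize_for_shield(line))
```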

Lesson: Multi-encoding defense requires layered normalization. A single normalization pass (NFKC only) is insufficient when HTML rendering is part of the content pipeline.

Closed in: v0.7.0 sprint D040.


ZRT-004 — Fenced Block Scope Confusion (Shield)

Identified by: AI Red Team agent "Delta" during Operation Obsidian Stress

Component: core/shield.py, fenced block state machine

Description: The Shield originally skipped scanning inside triple-backtick fenced blocks, reasoning that code examples are not live secrets. Agent Delta embedded a ghp_ pattern inside a bash fenced block. The Shield did not fire.

Resolution after deliberation: The "skip fenced blocks" heuristic was reversed. The Shield now scans all lines, including fenced code blocks. The rationale: a documentation file that leaks a real credential inside a bash block is still leaking a real credential. The example nature of the block is irrelevant to the security outcome.

A # zenzic: ignore-next-line comment is the authorized mechanism for authors who need to include a credential-shaped string in a documented example (e.g. showing the format of a GitHub token without using a real one). The examples/matrix/red-team/ fixtures demonstrate this pattern.
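A sketch of the scan-everything default with the explicit opt-out (the directive string matches the documented comment; the surrounding logic is simplified):

```python
import re

IGNORE_DIRECTIVE = "zenzic: ignore-next-line"
TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{8,}")  # simplified credential shape

def scan_all_lines(lines):
    """Scan every line, fenced blocks included; honour explicit opt-outs only."""
    findings = []
    skip_next = False
    for lineno, line in enumerate(lines, 1):
        if skip_next:
            skip_next = False
            continue
        if IGNORE_DIRECTIVE in line:
            skip_next = True
            continue
        if TOKEN_RE.search(line):
            findings.append(lineno)
    return findings
```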

Lesson: Heuristic scope exclusions that reduce false positives often create false negatives in adversarial conditions. Security-critical passes should scan everything by default and authorize exceptions explicitly.

Closed in: v0.7.0 sprint D041.


Closed Earlier (Pre-v0.7.0)

ZRT-005 — Bootstrap Paradox

Component: core/scanner.py

Description: zenzic init crashed with a configuration error when invoked in an empty directory. The find_repo_root() function had no fallback, making it impossible to initialize a project that did not yet have a .git or zenzic.toml marker.

Resolution: fallback_to_cwd=True parameter added to find_repo_root(), used exclusively by zenzic init. See ADR 003.

Closed in: v0.6.0a4.
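A sketch of the shape of that fallback (marker names are taken from the description; the real signature may differ):

```python
from pathlib import Path

MARKERS = (".git", "zenzic.toml")

def find_repo_root(start: Path, fallback_to_cwd: bool = False) -> Path:
    """Walk upward for a repo marker; fall back only when explicitly asked."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in MARKERS):
            return candidate
    if fallback_to_cwd:
        return start  # empty directory: lets `zenzic init` bootstrap here
    raise FileNotFoundError("no .git or zenzic.toml marker found")
```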


Component: core/validator.py — Phase 2 link validation loop

Description: When a Docusaurus project declares routeBasePath-owned prefixes (e.g. /blog/) via get_absolute_url_prefixes(), the validator suppresses Z105 (ABSOLUTE_PATH) for links starting with those prefixes. The suppression was implemented as a bare continue, which exited the per-link iteration before the VSM lookup — making Z001 impossible to fire on absolute prefix-owned links.

A second compounding issue: DocusaurusAdapter.set_slug_map() was never called during validate_links_async(), so the slug map was empty at VSM construction time. Blog posts declaring slug: my-post in frontmatter were routed via filename derivation instead (e.g. 2026-04-29-my-post routed to /blog/my-post/), producing a VSM that diverged from the URLs Docusaurus actually served.

Combined effect: A link /blog/wrong-slug where the real slug was /blog/correct-slug produced no finding from Zenzic, while docusaurus build failed with a broken-link error. The sentinel was blind to the most common post-rename failure mode.

Resolution: Two coordinated fixes in core/validator.py:

  1. Lifecycle ordering: adapter.set_slug_map(md_contents) is now called (via hasattr guard for cross-engine safety) immediately before build_vsm(). The VSM is built on the correct virtual identity, not the physical filename.

  2. Scoped VSM lookup — After Z105 suppression, the validator checks whether the matched prefix has at least one route in the VSM (_scanned_vsm_prefixes). If so, it performs a dict.get() lookup and reports FILE_NOT_FOUND when the route is absent. Prefixes with no VSM entries (sibling plugins whose markdown is outside the scan scope) retain the unconditional bypass — Zero-Config invariant preserved.

Cross-engine impact: MkDocs, Zensical, and Standalone adapters do not implement set_slug_map(). The hasattr guard makes the call a no-op for those engines — no behaviour change.
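The scoped lookup reduces to a small decision function (names follow the description above; the vsm mapping of route to source file is illustrative):

```python
def check_absolute_link(link, prefixes, vsm, scanned_vsm_prefixes):
    """Return a finding code for an absolute link, or None if it is clean."""
    prefix = next((p for p in prefixes if link.startswith(p)), None)
    if prefix is None:
        return "Z105"                  # absolute path outside any owned prefix
    if prefix not in scanned_vsm_prefixes:
        return None                    # sibling plugin out of scan scope: bypass
    # The prefix has VSM routes, so a missing entry is a real broken link.
    return None if link in vsm else "FILE_NOT_FOUND"
```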

Regression lock: tests/test_docusaurus_blog_vsm.py — class TestAbsoluteSlugMismatch — two new tests:

  • test_absolute_broken_blog_link_is_detected — wrong slug raises FILE_NOT_FOUND
  • test_correct_absolute_slug_link_is_clean — correct slug produces no error

Closed in: v0.7.0.

D100 — Privacy Gate Migration: .zenzic.dev.toml → .zenzic.local.toml

Component: cli/_standalone.py, models/config.py, core/shield.py, core/codes.py

Description: The original D002 Environmental Privacy Gate (_scaffold_dev_toml) created .zenzic.dev.toml with a [development_gate] table that held forbidden_patterns for export redaction. This file was not integrated into the Shield scanning pipeline — it served only as a local redaction hint for export tooling. The patterns were never checked against documentation content, so a developer could inadvertently publish a document containing a forbidden code-name without any Zenzic warning.

This gap created a false sense of security: users configured forbidden_patterns expecting Zenzic to block those terms from documentation, but the scan never happened.

Resolution (Sprint D100 — v0.7.0):

  1. New canonical file: .zenzic.local.toml replaces .zenzic.dev.toml as the machine-local, git-ignored privacy configuration. It is a flat TOML file with a top-level forbidden_patterns = [...] key.

  2. Automatic .gitignore management: zenzic init now always scaffolds .zenzic.local.toml and appends the filename to .gitignore when .gitignore exists and the entry is absent. No manual step required.

  3. Config deep-merge: ZenzicConfig.load() performs an additive merge of forbidden_patterns from .zenzic.local.toml after loading the primary config (zenzic.toml or [tool.zenzic]). Duplicates are removed; insertion order is preserved.

  4. Z204 FORBIDDEN_TERM — Exit 2: scan_line_for_forbidden_terms() in core/shield.py performs a case-insensitive verbatim substring scan against the merged forbidden_patterns list. Any match on any line of any documentation file is emitted as a SecurityFinding with secret_type="FORBIDDEN_TERM". The scanner bridges this to Z204 (not Z201), preserving clear separation between credential leaks and forbidden-term violations.

  5. Backward compatibility: _scaffold_dev_toml() is retained as a shim that delegates to _scaffold_local_toml(). No external callers need updating.
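Points 3 and 4 reduce to two small functions; a sketch under the semantics above (bodies are illustrative, not the shipped code):

```python
def merge_forbidden_patterns(primary, local):
    """Additive merge: duplicates removed, insertion order preserved."""
    merged = []
    for pattern in (*primary, *local):
        if pattern not in merged:
            merged.append(pattern)
    return merged

def scan_line_for_forbidden_terms(line, patterns):
    """Case-insensitive verbatim substring scan; each hit bridges to Z204."""
    lowered = line.lower()
    return [p for p in patterns if p.lower() in lowered]
```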

Brand Integrity Shield — Two-Layer Design: The Z204 Privacy Gate and the Z905 Brand Obsolescence Guard form a two-layer architecture:

  • Z204 (forbidden_patterns in .zenzic.local.toml): exit 2, non-suppressible. Designed for private terms that must never appear in any published doc.
  • Z905 (obsolete_names in zenzic.toml): exit 1, suppressible with zenzic:ignore Z905. Designed for deprecated brand terms where historical references in CHANGELOG files are acceptable.
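Illustrative configuration for the two layers (pattern values are placeholders; key placement follows the descriptions above and may differ from the shipped schema):

```toml
# --- .zenzic.local.toml (git-ignored, machine-local; drives Z204, exit 2)
forbidden_patterns = ["internal-codename", "staging-hostname"]

# --- zenzic.toml (committed; drives Z905, exit 1, suppressible)
obsolete_names = ["OldProductName"]
```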

Closed in: v0.7.0 sprint D100.