Skip to content

Zenzic v0.19.0: The AST Foundations & Atomic Auto-Fix

Zenzic v0.19.0 is arguably the most significant structural milestone for the engine since its inception. Until today, Zenzic operated entirely as a read-only static analyzer, relying on heavily optimized regular expressions to validate Markdown. With v0.19.0, we have laid the architectural foundation for a new era of document mutation and auto-fixing.

This release represents a fundamental shift in how Zenzic understands and interacts with Markdown content.

The Shift from Regex to a Lossless AST

Relying exclusively on regex is exceptionally fast but ultimately constrained. Complex Markdown constructs—like nested blockquotes, tables, and layered emphasis—cannot be safely understood or mutated using flat string matching.

To solve this, we implemented a full Abstract Syntax Tree (AST) using the Composite Pattern. Zenzic now parses Markdown documents into a hierarchical structure of Node, BlockNode, and InlineNode objects. This allows the engine to understand the semantic intent of the document (e.g., distinguishing a link inside a code block from a link inside a paragraph).

Most importantly, our AST is lossless. When Zenzic serializes the tree back into a string, it guarantees byte-for-byte identity with the original input. This is the cornerstone of safe document mutation: Zenzic can surgically alter a specific node without altering a single space or newline anywhere else in the document.

The O(N) Inline Tokenizer

Markdown parsing is notoriously vulnerable to catastrophic backtracking, particularly when handling unclosed tags or nested emphasis (e.g., ***text***).

To preserve our strict performance guarantees, we engineered a custom inline tokenizer that operates entirely without backtracking. By utilizing a character-by-character state machine, the tokenizer scans the string linearly. This ensures that the engine processes even severely malformed Markdown in strictly $O(N)$ time, preserving Zenzic's resilience against ReDoS (Regular expression Denial of Service) and maintaining its incredible speed.

zenzic fix and the Atomic Write Barrier

With a lossless AST in place, v0.19.0 introduces the zenzic fix command. For the first time, Zenzic can automatically repair the issues it finds, starting with Z108 (Empty Link Text).

However, modifying source files programmatically introduces the risk of data destruction during unexpected crashes or power failures. To mitigate this, zenzic fix is guarded by an Atomic Write Barrier:

  1. Memory-Only Mutator: The AST is mutated in-memory.
  2. Temporary File: The updated document is serialized and written to a temporary .tmp file in the same directory.
  3. Atomic Replacement: Only after the OS confirms the write was successful does Zenzic issue an atomic os.replace() call.

This guarantees that a source file is never partially written or corrupted. The original file remains completely intact until the exact microsecond it is safely replaced by the corrected version.

Hostile Precision in File Mutation

Zenzic's philosophy of "Hostile Precision" remains central to the new mutation engine. We refuse to guess intent or apply heuristic fixes.

By default, zenzic fix operates as a dry-run, outputting a clear, unified diff of the proposed changes. To commit changes to disk, the user must explicitly provide the --apply flag. Furthermore, when Zenzic auto-fixes an empty link (Z108), it injects a deterministic [MISSING LINK LABEL] marker. It does not attempt to scrape URLs or guess appropriate labels. It reliably converts a structural accessibility error into a visible, trackable content debt (Z501) that the author must deliberately resolve.

Zenzic v0.19.0 proves that extreme performance and uncompromising safety can coexist in a mutation engine. Welcome to the era of the AST.