Credential Scanner Obligations
This page documents the four security obligations that apply to every PR touching
src/zenzic/core/. A PR that resolves a bug without satisfying all four will be rejected
by the Architecture Lead.
These rules exist because a security review demonstrated that four individually reasonable
design choices — each correct in isolation — composed into four distinct attack vectors.
See docs/internal/security/shattered_mirror_report.md for the full post-mortem.
Obligation 1 — The Security Tax (Worker Timeout)
Any PR that modifies ProcessPoolExecutor usage in scanner.py must preserve the
future.result(timeout=_WORKER_TIMEOUT_S) call. The current timeout is 30 seconds.
```python
# ✅ Required form: always use submit() + wait(FIRST_COMPLETED) + result(timeout=...)
futures_map = {executor.submit(_worker, item): item[0] for item in work_items}
raw: list[IntegrityReport] = []
_pending: set[concurrent.futures.Future[IntegrityReport]] = set(futures_map)
while _pending:
    done, _pending = concurrent.futures.wait(
        _pending,
        timeout=_WORKER_TIMEOUT_S,
        return_when=concurrent.futures.FIRST_COMPLETED,
    )
    if not done:
        # ZRT-002 deadlock guard: no worker completed within the timeout window
        for fut in _pending:
            raw.append(_make_timeout_report(futures_map[fut]))  # Z902 finding
            fut.cancel()
        break
    for fut in done:
        raw.append(fut.result())
```
```python
# ❌ Forbidden: blocks indefinitely on ReDoS or deadlocked workers
raw = list(executor.map(_worker, work_items))
```
The Z902 finding (WORKER_TIMEOUT) is not a crash — it surfaces in the standard report
UI. A worker that times out does not kill the scan; the coordinator continues with the
remaining workers.
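The guard loop above can be exercised in isolation. The following is a minimal sketch, not the scanner's code: it swaps ProcessPoolExecutor for ThreadPoolExecutor so the demo runs without pickling concerns, reduces the timeout report to a plain string, and uses hypothetical names (demo_worker, run_with_deadlock_guard, DEMO_TIMEOUT_S).

```python
import concurrent.futures
import time

DEMO_TIMEOUT_S = 0.1  # hypothetical short window for the demo; the page quotes 30 s


def demo_worker(item: tuple[str, float]) -> str:
    """Hypothetical worker: sleeps for the given delay, then reports success."""
    name, delay = item
    time.sleep(delay)
    return f"{name}: ok"


def run_with_deadlock_guard(work_items, timeout_s=DEMO_TIMEOUT_S):
    """Collect results with submit() + wait(FIRST_COMPLETED), never blocking forever."""
    # Assumption: threads instead of processes, purely to keep the sketch portable.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(work_items)))
    try:
        futures_map = {executor.submit(demo_worker, item): item[0] for item in work_items}
        results: list[str] = []
        pending = set(futures_map)
        while pending:
            done, pending = concurrent.futures.wait(
                pending,
                timeout=timeout_s,
                return_when=concurrent.futures.FIRST_COMPLETED,
            )
            if not done:
                # Nothing finished inside the window: flag every remaining item.
                for fut in pending:
                    results.append(f"{futures_map[fut]}: Z902 WORKER_TIMEOUT")
                    fut.cancel()
                break
            for fut in done:
                results.append(fut.result())
        return sorted(results)
    finally:
        # Do not wait for the stuck worker when tearing down the pool.
        executor.shutdown(wait=False, cancel_futures=True)


report = run_with_deadlock_guard([("fast.md", 0.0), ("slow.md", 0.6)])
print(report)
```

Note that Future.cancel() cannot stop a task that is already running; it only prevents queued tasks from starting. The guard therefore bounds how long the coordinator waits, not how long a stuck worker lives.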
If your change requires a longer timeout, increase _WORKER_TIMEOUT_S and include both a
comment explaining the cost and a benchmark on the worst-case input that justifies the new value.
Obligation 2 — The Regex-Canary Protocol
Every [[custom_rules]] entry that specifies a pattern is subject to the
Regex-Canary, a POSIX SIGALRM-based stress test that runs at AdaptiveRuleEngine
construction time.
```python
# _assert_regex_canary() in rules.py: runs automatically for every CustomRule
_CANARY_STRINGS = (
    "a" * 30 + "b",  # classic (a+)+ trigger
    "A" * 25 + "!",  # uppercase variant
    "1" * 20 + "x",  # numeric variant
)
_CANARY_TIMEOUT_S = 0.1  # 100 ms
```
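To make the mechanism concrete, here is a minimal sketch of a SIGALRM-based canary built only on the standard library. It is an illustration under stated assumptions, not the code in rules.py: the names canary_passes and CanaryTimeout are hypothetical, the check returns a boolean instead of raising PluginContractError, and it uses re.search, which may differ from how the real canary applies the pattern.

```python
import re
import signal
import threading

# Same canary inputs and budget as quoted above.
CANARY_STRINGS = ("a" * 30 + "b", "A" * 25 + "!", "1" * 20 + "x")
CANARY_TIMEOUT_S = 0.1


class CanaryTimeout(Exception):
    """Hypothetical exception raised by the alarm handler when the budget is spent."""


def _on_alarm(signum, frame):
    raise CanaryTimeout()


def canary_passes(pattern: str, timeout_s: float = CANARY_TIMEOUT_S) -> bool:
    """Return False if matching any canary string exceeds the time budget."""
    compiled = re.compile(pattern)
    # SIGALRM is POSIX-only, and handlers can only be installed from the main
    # thread. Everywhere else this sketch degrades to a no-op.
    if not hasattr(signal, "SIGALRM"):
        return True
    if threading.current_thread() is not threading.main_thread():
        return True
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for text in CANARY_STRINGS:
            signal.setitimer(signal.ITIMER_REAL, timeout_s)  # arm the timer
            try:
                compiled.search(text)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # always disarm
        return True
    except CanaryTimeout:
        return False
    finally:
        # Restore whatever handler was installed before the check.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGALRM, restore)


print(canary_passes(r"\bTODO\b"))
```

On a POSIX main thread, a nested-quantifier pattern with a failing suffix, such as `(a+)+$`, would be expected to exhaust the 100 ms budget on the first canary string. An unanchored `(a+)+` matches immediately under re.search, so how the pattern is applied matters.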
Test your pattern before committing:
```python
from zenzic.core.exceptions import PluginContractError
from zenzic.core.rules import CustomRule, _assert_regex_canary

rule = CustomRule(
    id="MY-001",
    pattern=r"your-pattern-here",
    message="Found.",
    severity="warning",
)

try:
    # Raises PluginContractError if the pattern exceeds the canary time budget
    _assert_regex_canary(rule)
    print("✅ Canary passed: pattern is safe for production")
except PluginContractError as e:
    print(f"❌ Canary failed, ReDoS risk detected:\n{e}")
```
Patterns to avoid (catastrophic backtracking triggers):
| Pattern | Why dangerous |
|---|---|
| `(a+)+` | Nested quantifiers: exponential paths |
| `(a\|aa)+` | Alternation with overlap |
| `(a*)*` | Nested star: infinite empty matches |
| `.+foo.+bar` | Greedy multi-wildcard with suffix |
Patterns that are always safe:
| Pattern | Notes |
|---|---|
| `TODO` | Literal match, O(n) |
| `^(DRAFT\|WIP):` | Anchored alternation, O(1) at each position |
| `[A-Z]{3}-\d+` | Bounded character classes |
| `\bfoo\b` | Word-boundary anchored |
Platform note:
_assert_regex_canary() uses signal.SIGALRM, which is only available on POSIX systems (Linux, macOS). On Windows, the canary is a no-op. The worker timeout (Obligation 1) is the universal backstop.
Obligation 3 — The Dual-Stream Invariant
The credential scanner stream and the Content stream in ReferenceScanner.harvest() must
never share a generator. This is the architectural lesson from ZRT-001.
```python
# ✅ CORRECT: independent generators, independent filtering contracts
with file_path.open(encoding="utf-8") as fh:
    for lineno, line in enumerate(fh, start=1):  # Credential scanner: ALL lines
        list(scan_line_for_secrets(line, file_path, lineno))
for lineno, line in _iter_content_lines(file_path):  # Content: filtered
    ...
```
```python
# ❌ FORBIDDEN: sharing a generator silently drops frontmatter from credential scanner
with file_path.open(encoding="utf-8") as fh:
    shared = _skip_frontmatter(fh)
    for lineno, line in shared:
        list(scan_line_for_secrets(...))  # ← blind to frontmatter
    for lineno, line in shared:  # ← already exhausted
        ...
```
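Both failure modes of the shared generator (frontmatter blindness and exhaustion) can be reproduced with a few lines of standard-library Python. This is a toy sketch: skip_frontmatter and find_secrets are hypothetical stand-ins for _skip_frontmatter and scan_line_for_secrets, and the "secret" detector is a plain substring check.

```python
import io
from typing import Iterable, Iterator

DOC = """---
api_key: sk-demo-0000
---
# Title
Body text
"""


def skip_frontmatter(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (lineno, line) pairs, omitting a leading '---' delimited block."""
    in_frontmatter = False
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if lineno == 1 and stripped == "---":
            in_frontmatter = True
            continue
        if in_frontmatter:
            if stripped == "---":
                in_frontmatter = False
            continue
        yield lineno, line


def find_secrets(numbered_lines: Iterable[tuple[int, str]]) -> list[int]:
    """Toy detector: report line numbers that mention 'api_key'."""
    return [lineno for lineno, line in numbered_lines if "api_key" in line]


# Shared generator: the secret scan never sees line 2, and the second
# consumer receives nothing because the generator is already exhausted.
shared = skip_frontmatter(io.StringIO(DOC))
shared_secret_hits = find_secrets(shared)
shared_content = list(shared)

# Independent streams: the secret scan reads every raw line, and the
# content stream applies its own filtering on a fresh iterator.
independent_secret_hits = find_secrets(enumerate(io.StringIO(DOC), start=1))
independent_content = [line.rstrip("\n") for _, line in skip_frontmatter(io.StringIO(DOC))]

print(shared_secret_hits, shared_content)
print(independent_secret_hits, independent_content)
```

Neither failure raises an error in the shared version, which is why the invariant has to be enforced by review and tests.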
Performance baseline: The dual-scan (raw + normalised line) runs at approximately
235,000 lines/second (12.74 ms median for 3,000 lines over 20 iterations). If a PR
refactors harvest() and CI throughput drops below 100,000 lines/second, investigate
before merging.
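The baseline figure follows directly from the quoted numbers: 3,000 lines in 12.74 ms is roughly 235,000 lines per second. A small harness along these lines could reproduce the measurement locally. It is a sketch only: toy_scan is a hypothetical stand-in for the real dual scan, and the project's actual benchmark may be structured differently.

```python
import statistics
import time
from typing import Callable, Sequence

MIN_LINES_PER_SECOND = 100_000  # investigation threshold quoted above


def lines_per_second(line_count: int, elapsed_ms: float) -> float:
    """Convert a line count and an elapsed time in milliseconds to throughput."""
    return line_count / (elapsed_ms / 1000.0)


def median_scan_ms(
    scan: Callable[[Sequence[str]], object],
    lines: Sequence[str],
    iterations: int = 20,
) -> float:
    """Median wall-clock time of scan(lines) in milliseconds over several runs."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        scan(lines)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def toy_scan(lines: Sequence[str]) -> int:
    """Hypothetical stand-in for the scanner: counts lines mentioning 'api_key'."""
    return sum(1 for line in lines if "api_key" in line)


# Reproduce the documented baseline from its two inputs.
baseline = lines_per_second(3_000, 12.74)

# Measure the stand-in on 3,000 synthetic lines, 20 iterations.
sample_lines = ["plain documentation text\n"] * 3_000
measured = lines_per_second(len(sample_lines), median_scan_ms(toy_scan, sample_lines))
needs_investigation = measured < MIN_LINES_PER_SECOND

print(f"baseline={baseline:,.0f} lines/s, measured={measured:,.0f} lines/s")
```

Using the median rather than the mean keeps a single slow iteration (for example, one hit by a garbage-collection pause) from skewing the result.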
Obligation 4 — Mutation Score ≥ 90% for Core Changes
Any PR that modifies src/zenzic/core/ must maintain or improve the mutation score on
the affected module. The current baseline for rules.py is 86.7% (242/279 mutants
killed). Target for rc1: ≥ 90%.
```bash
nox -s mutation
```
The session targets rules.py, credentials.py, and reporter.py. Any PR touching the
_map_credentials_to_finding() conversion function, the SECURITY_BREACH severity path
in ZenzicReporter, or the exit-code routing in cli.py must kill all three mandatory
mutants:
| Mutant name | What is changed | Test that must kill it |
|---|---|---|
| The Invisible | severity="security_breach" → severity="warning" in _map_credentials_to_finding() | test_map_always_emits_security_breach_severity |
| The Amnesiac | _obfuscate_secret() returns raw instead of the redacted form | test_obfuscate_never_leaks_raw_secret |
| The Silencer | _map_credentials_to_finding() returns None instead of a Finding | test_pipeline_appends_breach_finding_to_list |
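As an illustration of what "killing" a mutant means, the sketch below pairs a toy redaction function with its Amnesiac mutant and a check that passes on the original and fails on the mutant. Everything here is hypothetical: the real _obfuscate_secret() and test_obfuscate_never_leaks_raw_secret may redact and assert differently.

```python
from typing import Callable


def obfuscate_secret(raw: str) -> str:
    """Hypothetical redaction: keep the first 4 characters, mask the rest."""
    if len(raw) <= 4:
        return "*" * len(raw)
    return raw[:4] + "*" * (len(raw) - 4)


def obfuscate_secret_amnesiac(raw: str) -> str:
    """The Amnesiac mutant: forgets to redact and returns the raw secret."""
    return raw


def never_leaks_raw_secret(obfuscate: Callable[[str], str]) -> bool:
    """The property a killing test asserts: the raw secret never survives redaction."""
    raw = "sk-demo-0123456789"
    redacted = obfuscate(raw)
    return raw not in redacted


original_ok = never_leaks_raw_secret(obfuscate_secret)
mutant_killed = not never_leaks_raw_secret(obfuscate_secret_amnesiac)

print(f"original passes: {original_ok}, mutant killed: {mutant_killed}")
```

A test that only checked, say, that the output is a non-empty string would pass on both versions and leave the mutant alive, which is exactly the gap the mutation score measures.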
ResolutionContext pickle validation: Any PR that adds a field to ResolutionContext must
include:
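```python
import pickle
from pathlib import Path

# ResolutionContext must also be imported from the module that defines it.


def test_resolution_context_is_pickleable():
    ctx = ResolutionContext(docs_root=Path("/docs"), source_file=Path("/docs/a.md"))
    assert pickle.loads(pickle.dumps(ctx)) == ctx
```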
Reporting integrity: A secret that is detected but not correctly reported is a CRITICAL bug — indistinguishable from a secret that was never detected at all.