Reproducibility cuts both ways. Every Verdict carries an audit_trail_id that pins it to a specific engine and prompt version, so we owe you a public log of every version that pointer can reference.
Current production · engine v3.8.14 · Engine version 41
Engine version 41 · Live in production
2026-05-22
Recall engine-variant scope hardening
§0.2.5 NEW — Recall engine-variant scope rule. NHTSA returns recalls cohort-level by year/make/model; a single cohort can ship with multiple engine variants. The new rule requires the engine to read each recall’s Summary + Component fields before citing, surface the engine-variant qualifier (8.1L, 5.7L Hemi, etc.) alongside every citation, and explicitly state “in cohort but not in scope” when the VIN’s engine variant doesn’t match.
output-validator §5.7 NEW — Deterministic safety net. New category recall_engine_variant_overclaim walks every NHTSA campaign-ID position, checks ±400-char windows for absolute-applicability phrases without a hedge or variant qualifier, and rewrites them to verification-hedge form.
Versioning scheme resync. Engine versions are now a single monotonic integer (Engine version 41, 40, 39 …), replacing both the earlier semver run (which ended at v3.10.5) and the short-lived commit-count scheme (V26/V27).
Supporting-system work landed alongside: Mini Verdict S1-S6 safety guardrails, 730+ failure-mode database (270 → 733), D1 BEV powertrain gate, BMW 7-Series engine-resolver strict matching, NHTSA recall “com..” truncation hygiene, eyebrow/badge/body prevention-stage parity assertion, parenthetical sanitizer across all rendered surfaces.
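For readers who want a concrete picture of the §0.2.5 scope rule and the §5.7 window check in this entry, here is a minimal Python sketch. It is not the production validator: the campaign-ID pattern, the variant pattern, the phrase lists, the "variant not stated" label, and the function names are all assumptions made for illustration. The sketch only detects overclaims; the rewrite to verification-hedge form is left out.

```python
import re

# Assumed engine-variant qualifier shape, e.g. "8.1L" or "5.7L" (illustrative only).
VARIANT = re.compile(r"\b\d\.\dL\b")
# Assumed NHTSA campaign-ID shape, e.g. "20V682000" (illustrative only).
CAMPAIGN_ID = re.compile(r"\b\d{2}[VETC]\d{3}(?:000)?\b")
# Hypothetical phrase lists; the real validator's lists are not published in this log.
ABSOLUTE = re.compile(r"\b(?:applies to your vehicle|your vehicle is affected)\b", re.I)
HEDGE = re.compile(r"\b(?:may apply|verify|confirm with)\b", re.I)
WINDOW = 400  # characters checked on each side of a campaign ID


def recall_scope(summary: str, component: str, vin_variant: str) -> str:
    """Classify a cohort-level recall against the VIN's engine variant."""
    named = set(VARIANT.findall(f"{summary} {component}"))
    if not named:
        # Hypothetical label for recalls that name no engine variant at all.
        return "in cohort, variant not stated"
    if vin_variant in named:
        return "in scope"
    return "in cohort but not in scope"


def find_overclaims(text: str) -> list[str]:
    """Return campaign IDs whose window holds an unhedged, unqualified absolute claim."""
    flagged = []
    for match in CAMPAIGN_ID.finditer(text):
        start = max(0, match.start() - WINDOW)
        window = text[start : match.end() + WINDOW]
        has_cover = HEDGE.search(window) or VARIANT.search(window)
        if ABSOLUTE.search(window) and not has_cover:
            flagged.append(match.group())
    return flagged
```

A deterministic pass like this runs after the prose is written, so it catches an overclaim even when the prompt rule is not followed.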
v3.10.0
2026-05-17
Veteran-mechanic voice transformation pass
§0.27.C expanded — anti-punt banned-phrase list grows from 5 to 19, catching the Sonnet leak modes still slipping through v3.9.5 (“could be a number of things”, “depending on a few factors”, “best to have a pro look at it”, “in some cases”, “many factors at play”).
§0.28 NEW — Veteran-mechanic voice discipline. The writer IS the veteran mechanic and never describes one in the third person. Bans “a mechanic would tell you”, “any qualified tech”, and the general pattern of describing the expert instead of being them.
§0.29 NEW — Counter-argument discipline. For every primary hypothesis at confidence ≥ medium, the prose must explicitly state what would disprove it AND why the engine ruled the alternative out. Biggest single credibility lever in v3.10.0.
§11.8 NEW — Probability-stack discipline. Ranked-cause lists MUST use ~55% / ~30% / ~12% / ~3% format, sum to ~100, carry a discriminator + tell per candidate.
§15.7c NEW — Coverage opportunity 5-question sweep on every Watching+ Verdict. Ford ESP, Subaru CVT extension, GM Theta II, Toyota goodwill, Honda 9-speed, Hyundai/Kia Theta II, Nissan CVT, and Tesla drive unit are explicitly enumerated, along with federal emissions 8yr/80k, state lemon-law, and NHTSA investigations.
§15.13 NEW — Cost-spread sophistication. Dollar ranges MUST surface Independent vs Dealer pricing and apply regional COL adjustment when ZIP is in the audit trail.
§17.3 NEW — Predictive maintenance roadmap. Mandatory for vehicles past 100K mi. “At 175K expect X, at 200K expect Y.”
§25.5 NEW — Diagnostic “tells” requirement. Every named root cause must carry an italic Tell: line — the cold-start tick, the wet-weather miss, the brake-pedal sink.
§29.5 NEW — “How an experienced mechanic would approach this” walkthrough. Numbered diagnostic steps in the order a senior shop foreman would actually run them, with reasoning per step.
§15.11.1 NEW — What-we-don’t-know discipline mandatory on every Verdict (not just confident-looking ones). 3–5 specific limits drawn from the actual inputs.
No score-math changes. The deterministic pipeline is identical to v3.9.5. The same audit_trail still produces the same score; the new sections are audit-trail-driven, so they remain reproducible.
Prompt size: 2,546 → 3,783 lines (+48.6%).
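The §11.8 probability-stack format in this entry lends itself to a mechanical check. The Python sketch below is one way such a check could look, not the engine's actual implementation: the function name, the two-point tolerance on the sum, and the requirement that values appear in descending order are assumptions. It does not check the discriminator or the tell for each candidate.

```python
import re

# Matches the "~55%" form used in ranked-cause lists.
PCT = re.compile(r"~(\d{1,3})%")


def check_probability_stack(line: str, tolerance: int = 2) -> bool:
    """True if the line holds at least two ~N% values, ranked high to low, summing to about 100."""
    values = [int(v) for v in PCT.findall(line)]
    if len(values) < 2:
        return False
    # Assumption: "ranked" means each value is no larger than the one before it.
    ranked = all(a >= b for a, b in zip(values, values[1:]))
    return ranked and abs(sum(values) - 100) <= tolerance
```

For example, "~55% / ~30% / ~12% / ~3%" passes, while a stack that sums to 110 or lists a larger value after a smaller one does not.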
v3.9.5
2026-05-15
Prose edge-case hardening
Five new prose-discipline sections — §26.1 sub-40 framing, §31.5 EV/PHEV patterns, negotiation context view, §14.5.1 finding-ranking priority tiers, and audit-pass lens sync for §16.5 healthy-vehicle suppression. Closes audit findings from the 2026-05-15 review wave.
v3.9.4
2026-05-11
Anti-punt + deep-reasoning override
Cal West Auto’s first real customer Verdict (Dan Drew) is a deliberately tricky case; defaulting to “get a second opinion” on hard inputs kills the product. New §0.27 establishes a master rule: WE ARE the second opinion. The engine must do the deep mechanical work — symptom translation, system isolation, failure-mode hypothesis ranking — and produce a primary diagnostic theory with explicit confidence BEFORE recommending any external referral. Second-opinion suggestions are now an exception (catastrophic safety items + truly unresolvable contradictions), not a default.
v3.9.3
2026-05-11
Shop-relationship hardening
New §15.12 codifies “never contest a shop’s opinion directly; defer or suggest a second opinion at most.” The tool is increasingly used inside shops where the owner hands the customer a quote and then runs a Verdict against it; the prose must never frame the engine as adversarial to the shop that just spoke. Score math unchanged — only HOW divergent findings are framed in customer-facing prose.
v3.9
2026-05-09
Five new prose-discipline sections
§15.7 modification disclosure prose handling, §15.8 flood / water-damage suspect handling, §15.9 race / track use disclosure, §15.10 coverage hunt standardized format, §15.11 score-anomaly detection & graceful refusal. Hardened over the v3.8.7 → v3.8.10 engine-side iterations.
v3.8.3
2026-05-08
Two follow-up calibrations after v3.8.2 round-3
Tightened Tier 7 single-major-documented floor (58 documented / 60 sparse). Added !hasCritical guard to Tier 5 floors so Critical-presence cases don’t ride upward through Healthy gates.
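The effect of the !hasCritical guard can be shown in a few lines. This is a simplified Python sketch, not the engine's scoring code: the function name and the floor value used in the example are hypothetical, and the real Tier 5 gates involve conditions this log does not describe.

```python
def apply_tier5_floor(score: float, floor: float, has_critical: bool) -> float:
    """Lift a score to the floor only when no Critical finding is present."""
    if has_critical:
        # Guard: a Critical finding blocks the floor, so the raw score stands.
        return score
    return max(score, floor)


# Hypothetical floor of 70: the same raw score is lifted only without a Critical finding.
lifted = apply_tier5_floor(52, 70, has_critical=False)
blocked = apply_tier5_floor(52, 70, has_critical=True)
```

Without the guard, both calls would return the floor, which is the upward ride through Healthy gates this calibration closes off.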
v3.8.2
2026-05-08
Output discipline hardening
Round-3 audit findings closed against the Cal West reference set. §0 OUTPUT DISCIPLINE strengthened. Customer-facing band vocabulary aligned with §11 — “Strong / Healthy / Sound / Watching / Needs Attention / Repair Window / Major Concerns” (7-band system replacing the legacy 6-band “Strong / Solid / Moderate / Mixed / Cautious / Limited”).