Engine changelog

Every prompt revision, dated and scoped.

Reproducibility cuts both ways. Every Verdict carries an audit_trail_id that pins it to a specific engine + prompt version, so we owe you a public log of every version that pointer can reference.

Current production · engine v3.8.14 · Engine version 41

Engine version 41 · Live in production
2026-05-22
Recall engine-variant scope hardening
- §0.2.5 NEW — Recall engine-variant scope rule. NHTSA returns recalls cohort-level by year/make/model; a single cohort can ship with multiple engine variants. The new rule requires the engine to read each recall’s Summary + Component fields before citing, surface the engine-variant qualifier (8.1L, 5.7L Hemi, etc.) alongside every citation, and explicitly state “in cohort but not in scope” when the VIN’s engine variant doesn’t match.
- output-validator §5.7 NEW — Deterministic safety net. New category recall_engine_variant_overclaim walks every NHTSA campaign-ID position, checks ±400-char windows for absolute-applicability phrases without a hedge or variant qualifier, and rewrites them to verification-hedge form.
- Versioning scheme resync. Single monotonic integer (Engine version 41, 40, 39 …). Replaces both the earlier semver run (which ended at v3.10.5) and the short-lived commit-count scheme (V26/V27).
- Supporting-system work landed alongside: Mini Verdict S1-S6 safety guardrails, 730+ failure-mode database (270 → 733), D1 BEV powertrain gate, BMW 7-Series engine-resolver strict matching, NHTSA recall “com..” truncation hygiene, eyebrow/badge/body prevention-stage parity assertion, parenthetical sanitizer across all rendered surfaces.
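The windowed check described for recall_engine_variant_overclaim can be pictured roughly as follows. This is a minimal sketch of the detection half only (it does not rewrite anything), and it is not the production validator: the campaign-ID pattern, the absolute-applicability phrases, the hedge phrases, and the function name find_overclaims are all assumptions made for illustration. Only the ±400-character window and the hedge-or-variant exemption come from the entry above.

```python
import re

# Every pattern here is an illustrative assumption, not the production list.
CAMPAIGN_ID = re.compile(r"\b\d{2}[VETC]\d{6}\b")  # assumed ID shape, e.g. 21V123000
ABSOLUTE = re.compile(r"\b(applies to|affects) (your|this) (vehicle|truck|car)\b", re.I)
HEDGE = re.compile(r"\b(verify|confirm|may apply|in cohort but not in scope)\b", re.I)
VARIANT = re.compile(r"\b\d\.\dL\b")  # engine-variant qualifier such as 8.1L or 5.7L
WINDOW = 400  # characters inspected on each side of a campaign ID


def find_overclaims(text: str) -> list[str]:
    """Return campaign IDs whose surrounding window makes an absolute
    applicability claim with neither a hedge nor a variant qualifier."""
    flagged = []
    for match in CAMPAIGN_ID.finditer(text):
        start = max(0, match.start() - WINDOW)
        window = text[start:match.end() + WINDOW]
        has_cover = HEDGE.search(window) or VARIANT.search(window)
        if ABSOLUTE.search(window) and not has_cover:
            flagged.append(match.group())
    return flagged
```

Under these assumed patterns, a sentence like "Recall 21V123000 applies to your vehicle." would be flagged, while the same sentence with an 8.1L qualifier or a "verify with the dealer" hedge inside the window would pass.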
v3.10.0
2026-05-17
Veteran-mechanic voice transformation pass
- §0.27.C expanded — anti-punt banned-phrase list grows from 5 to 19, catching the Sonnet leak modes still slipping through v3.9.5 (“could be a number of things”, “depending on a few factors”, “best to have a pro look at it”, “in some cases”, “many factors at play”).
- §0.28 NEW — Veteran-mechanic voice discipline. The writer IS the veteran mechanic, never describes one in third person. Bans “a mechanic would tell you”, “any qualified tech”, the general pattern of describing-the-expert-instead-of-being-them.
- §0.29 NEW — Counter-argument discipline. For every primary hypothesis at confidence ≥ medium, the prose must explicitly state what would disprove it AND why the engine ruled the alternative out. Biggest single credibility lever in v3.10.0.
- §11.8 NEW — Probability-stack discipline. Ranked-cause lists MUST use ~55% / ~30% / ~12% / ~3% format, sum to ~100, carry a discriminator + tell per candidate.
- §15.7c NEW — Coverage opportunity 5-question sweep on every Watching+ Verdict. Ford ESP, Subaru CVT extension, GM Theta II, Toyota goodwill, Honda 9-speed, Hyundai/Kia Theta II, Nissan CVT, Tesla drive unit explicitly enumerated; federal emissions 8yr/80k, state lemon-law, NHTSA investigations.
- §15.13 NEW — Cost-spread sophistication. Dollar ranges MUST surface Independent vs Dealer pricing and apply regional COL adjustment when ZIP is in the audit trail.
- §16.5 hardened — Cost-summary table MANDATORY (not conditional) when any repair cost exists. Columns: Item / Priority / Independent / Dealer / Coverage path.
- §17.3 NEW — Predictive maintenance roadmap. Mandatory for vehicles past 100K mi. “At 175K expect X, at 200K expect Y.”
- §25.5 NEW — Diagnostic “tells” requirement. Every named root cause must carry an italic Tell: line — the cold-start tick, the wet-weather miss, the brake-pedal sink.
- §29.5 NEW — “How an experienced mechanic would approach this” walkthrough. Numbered diagnostic steps in the order a senior shop foreman would actually run them, with reasoning per step.
- §15.11.1 NEW — What-we-don’t-know discipline mandatory on every Verdict (not just confident-looking ones). 3–5 specific limits drawn from the actual inputs.
- Audit-pass adds 6th lens — credibility_of_claims catches voice violations, missing counter-arguments, missing tells, hedge-as-content, LLM tells (“In conclusion”, “It’s important to note”).
- No score-math changes. The deterministic pipeline is identical to v3.9.5. Same audit_trail still produces same score; new sections are audit-trail-driven so they remain reproducible.
- Prompt size: 2,546 → 3,783 lines (+48.6%).
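As an illustration of the §11.8 probability-stack format, a check along these lines could confirm that a ranked-cause line sums to roughly 100. It is a sketch under stated assumptions: the function name check_probability_stack, the 2-point tolerance, and the requirement that values appear in descending order are hypothetical choices, not details taken from the prompt itself.

```python
import re

# Matches entries written in the "~55%" style used by the probability stack.
PERCENT = re.compile(r"~(\d+(?:\.\d+)?)%")


def check_probability_stack(line: str, tolerance: float = 2.0) -> bool:
    """Return True when the line holds at least two ~N% entries that are
    in descending order and sum to 100 within the assumed tolerance."""
    values = [float(v) for v in PERCENT.findall(line)]
    if len(values) < 2:
        return False
    ranked = all(a >= b for a, b in zip(values, values[1:]))
    return ranked and abs(sum(values) - 100.0) <= tolerance
```

The example stack from the entry, ~55% / ~30% / ~12% / ~3%, sums to exactly 100 and passes; a stack such as ~60% / ~30% / ~25% would fail the sum check.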
v3.9.5
2026-05-15
Prose edge-case hardening
Five new prose-discipline sections — §26.1 sub-40 framing, §31.5 EV/PHEV patterns, negotiation context view, §14.5.1 finding-ranking priority tiers, and audit-pass lens sync for §16.5 healthy-vehicle suppression. Closes audit findings from the 2026-05-15 review wave.
v3.9.4
2026-05-11
Anti-punt + deep-reasoning override
Cal West Auto’s first real customer Verdict (Dan Drew) is a deliberately tricky case; defaulting to “get a second opinion” on hard inputs kills the product. New §0.27 establishes a master rule: WE ARE the second opinion. The engine must do the deep mechanical work — symptom translation, system isolation, failure-mode hypothesis ranking — and produce a primary diagnostic theory with explicit confidence BEFORE recommending any external referral. Second-opinion suggestions are now an exception (catastrophic safety items + truly unresolvable contradictions), not a default.
v3.9.3
2026-05-11
Shop-relationship hardening
New §15.12 codifies “never contest a shop’s opinion directly; defer or suggest a second opinion at most.” The tool is increasingly used inside shops where the owner hands the customer a quote and then runs a Verdict against it; the prose must never frame the engine as adversarial to the shop that just spoke. Score math unchanged — only HOW divergent findings are framed in customer-facing prose.
v3.9
2026-05-09
Five new prose-discipline sections
§15.7 modification disclosure prose handling, §15.8 flood / water-damage suspect handling, §15.9 race / track use disclosure, §15.10 coverage hunt standardized format, §15.11 score-anomaly detection & graceful refusal. Hardened over the v3.8.7 → v3.8.10 engine-side iterations.
v3.8.3
2026-05-08
Two follow-up calibrations after v3.8.2 round-3
Tightened Tier 7 single-major-documented floor (58 documented / 60 sparse). Added !hasCritical guard to Tier 5 floors so Critical-presence cases don’t ride upward through Healthy gates.
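One way to read the !hasCritical guard is as a condition on when a score floor may lift a score. The sketch below assumes a floor is a minimum that raises lower scores; the function name apply_tier5_floor and the floor value of 70 in the usage line are hypothetical, and the real Tier 5 logic may be structured differently.

```python
def apply_tier5_floor(score: float, floor: float, has_critical: bool) -> float:
    """Lift the score to the floor only when no Critical finding is present.

    With a Critical finding the score is returned unchanged, so it cannot
    be carried upward by the floor.
    """
    if has_critical:
        return score
    return max(score, floor)


# Hypothetical values: a 52 is lifted to an assumed floor of 70 only without a Critical.
lifted = apply_tier5_floor(52, 70, has_critical=False)
held = apply_tier5_floor(52, 70, has_critical=True)
```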
v3.8.2
2026-05-08
Output discipline hardening
Round-3 audit findings closed against the Cal West reference set. §0 OUTPUT DISCIPLINE strengthened. Customer-facing band vocabulary aligned with §11 — “Strong / Healthy / Sound / Watching / Needs Attention / Repair Window / Major Concerns” (7-band system replacing the legacy 6-band “Strong / Solid / Moderate / Mixed / Cautious / Limited”).

