Vortic team · 11 min read

Insurance AI compliance: NAIC model governance explained

NAIC model governance bulletins require insurers to document AI decision-making. Learn what the regulations require and how to build compliant AI underwriting systems.

NAIC model governance bulletins require insurers and MGAs to document the data sources, decision logic, and outputs of any AI system used in underwriting or claims decisions. Compliance means maintaining version-controlled audit trails at the model level, not just the decision level — so you can reconstruct exactly what the AI considered, and why, for any individual bind or decline decision. Systems that cannot produce this record are non-compliant regardless of how accurate their outputs are.

Key Takeaways

  • NAIC model governance applies to any AI or algorithmic system that influences underwriting, rating, or claims decisions
  • The core requirement is reproducibility: for any decision, you must be able to show which data, which model version, and which logic produced the output
  • State DOI examination priorities in 2025–2026 are focused on adverse action explanations, protected-class proxies, and audit trail completeness
  • Compliant AI systems are built on four pillars: prompt versioning, model logging, input/output archiving, and human decision documentation
  • [Vortic's solutions](/solutions) are designed to meet these requirements out of the box; see also our [delegated underwriting authority glossary entry](/resources/glossary/delegated-underwriting-authority) for the DUA governance context

What NAIC model governance bulletins actually require

The NAIC's Model Bulletin on the Use of Artificial Intelligence Systems (adopted 2023, with state implementation continuing through 2025–2026) establishes governance expectations across six areas:

1. Governance and accountability. The insurer must designate a responsible executive (typically a Chief Underwriting Officer or Chief Risk Officer) who is accountable for AI system performance. Governance documentation must identify which systems influence which decisions.

2. Risk management. AI systems must be subject to ongoing monitoring for performance degradation, bias, and distributional shift. "Set and forget" deployments do not satisfy this requirement.

3. Transparency and explainability. For any individual adverse action — a declination, a coverage restriction, a premium loading — the insurer must be able to produce an explanation that is meaningful to the affected party and reviewable by a regulator. "The model said so" is not an explanation.

4. Third-party management. If you use a vendor's AI system, you are responsible for ensuring it meets the same governance standards as an in-house system. Vendor agreements must include provisions for audit access and model documentation.

5. Data governance. Training data and input data must be documented. The use of protected-class proxies — ZIP code used as a proxy for race, name used as a proxy for national origin — must be identified and managed.

6. Consumer protection. Adverse action notices must meet existing regulatory requirements (state UDAP, federal ECOA where applicable) and must reflect the actual reasons for the decision, not generic language.

What state DOIs are examining in 2025–2026

State departments of insurance have been conducting market conduct examinations with an explicit AI focus since 2024. The three areas generating the most examination findings are:

Audit trail gaps. The most common finding is that MGAs and carriers cannot reproduce the inputs and logic that generated a specific underwriting decision. The AI system made a decision; nobody logged what version of the model, what prompt, or what input data was active at that moment. This is a documentation failure, not an AI failure.

Adverse action explanation quality. When examiners request adverse action explanations for declined submissions, they are finding generic responses ("does not meet underwriting guidelines") rather than specific, decision-grounded explanations ("flood zone AE, base flood elevation below ground floor, TIV exceeds treaty attachment point"). Specific explanations require systems that archive the reasoning, not just the output.

Protected-class proxy analysis. Examiners are requesting disparity analyses comparing approval rates, premium levels, and coverage terms across demographic segments. MGAs using geocoded location data in risk scoring need to demonstrate that their models do not produce disparate outcomes attributable to protected-class proxies.

The four technical pillars of a compliant AI underwriting system

Building a compliant system is not primarily a governance document exercise. It requires four technical capabilities that must be designed in from the start — they are very difficult to retrofit.

Pillar 1: Prompt versioning

Every AI system in an underwriting pipeline runs on a system prompt that defines how the model reasons about the input. That prompt must be versioned, timestamped, and logged against every decision it influences. When a prompt is updated — to improve accuracy, add a new exclusion, or reflect a guideline change — the version change must be documented with a rationale.

Why this matters for compliance: if a DOI examiner asks "did your AI treat this submission the same way it would treat a similar submission from a different ZIP code?", the answer requires knowing which prompt version was active and being able to reproduce the model's reasoning on both submissions.
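To make this concrete, here is a minimal sketch in Python of a versioned prompt record and the log entry that ties it to a decision. The names and fields are illustrative assumptions, not a prescribed schema; hashing the prompt text makes silent, unversioned edits detectable:

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptVersion:
    prompt_id: str      # which prompt in the pipeline, e.g. "property-triage"
    version: str        # bumped on every change, e.g. "2026-01-14.1"
    text: str           # the full system prompt text
    rationale: str      # why this version was introduced
    effective_at: str   # ISO 8601 timestamp the version went live

    @property
    def content_hash(self) -> str:
        # Hashing the text makes unversioned edits detectable.
        return hashlib.sha256(self.text.encode()).hexdigest()

def log_prompt_for_decision(submission_id: str, prompt: PromptVersion) -> dict:
    """Append-only record tying a decision to the exact prompt that shaped it."""
    return {
        "submission_id": submission_id,
        "prompt_id": prompt.prompt_id,
        "prompt_version": prompt.version,
        "prompt_hash": prompt.content_hash,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```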

Pillar 2: Model logging

For each submission, you must log which model (and which version of that model) processed each step of the pipeline. This is non-trivial in practice because model providers release updates frequently, and the same model identifier can produce different outputs before and after an update.

Compliant logging means: for each submission, you have a record of the exact model identifier (including version or snapshot hash where available), the API provider, and the timestamp of the call. This record is immutable — it cannot be overwritten when the model is updated.
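A minimal sketch of what such a record might look like, assuming the records are written to an append-only store (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)  # frozen: the record cannot be mutated after creation
class ModelCallRecord:
    submission_id: str
    pipeline_step: str             # e.g. "loss-run-extraction"
    provider: str                  # the API provider's name
    model_id: str                  # the exact model identifier used
    model_snapshot: Optional[str]  # snapshot hash where the provider exposes one
    called_at: str                 # ISO 8601 timestamp of the API call

def record_model_call(submission_id: str, step: str, provider: str,
                      model_id: str,
                      snapshot: Optional[str] = None) -> ModelCallRecord:
    return ModelCallRecord(
        submission_id=submission_id,
        pipeline_step=step,
        provider=provider,
        model_id=model_id,
        model_snapshot=snapshot,
        called_at=datetime.now(timezone.utc).isoformat(),
    )
```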

Pillar 3: Input and output archiving

The full input to each AI agent — the structured data passed in, the retrieved documents, the tool call results — and the full output must be archived per submission. Not summaries; the actual inputs and outputs.

This sounds obvious but is frequently omitted in production systems that cache or discard intermediate agent outputs for cost efficiency. A compliant architecture accepts the storage cost of archiving intermediate outputs because the alternative — being unable to reproduce a decision under examination — is far more expensive.
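One way to sketch this is a write-once archive with one file per agent step per submission; the `ARCHIVE_ROOT` path and file layout below are assumptions for illustration:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_ROOT = Path("/var/archive/submissions")  # hypothetical write-once store

def archive_agent_step(submission_id: str, agent: str,
                       inputs: dict, output: dict) -> Path:
    """Archive the full input and output of one agent step, verbatim.
    Each call writes a new timestamped file; nothing is overwritten."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = ARCHIVE_ROOT / submission_id / f"{ts}_{agent}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "submission_id": submission_id,
        "agent": agent,
        "inputs": inputs,    # structured data, retrieved docs, tool results
        "output": output,    # the agent's full response, not a summary
        "archived_at": ts,
    }
    path.write_text(json.dumps(record, indent=2, default=str))
    return path
```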

Pillar 4: Human decision documentation

The AI's output is not the decision. The underwriter's bind or decline action — including any manual overrides of AI recommendations — must be logged with the underwriter's identity, timestamp, and any documented rationale for deviations.

This is important for two reasons. First, it maintains human accountability for the bind decision (the AI recommends; the underwriter decides). Second, it creates the dataset needed to monitor whether underwriters are systematically overriding AI recommendations in ways that correlate with protected characteristics — itself a compliance risk.
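A minimal sketch of a decision record that refuses to accept an override without a documented rationale; the fields and the `record_decision` helper are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class UnderwriterDecision:
    submission_id: str
    underwriter_id: str
    ai_recommendation: str   # e.g. "decline"
    final_action: str        # e.g. "bind"
    override: bool           # True when the action differs from the AI
    override_rationale: Optional[str]
    decided_at: str

def record_decision(submission_id: str, underwriter_id: str,
                    ai_recommendation: str, final_action: str,
                    rationale: Optional[str] = None) -> UnderwriterDecision:
    override = final_action != ai_recommendation
    if override and not rationale:
        raise ValueError("Overrides require a documented rationale")
    return UnderwriterDecision(
        submission_id=submission_id,
        underwriter_id=underwriter_id,
        ai_recommendation=ai_recommendation,
        final_action=final_action,
        override=override,
        override_rationale=rationale,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
```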

Common compliance failures to avoid

Using AI outputs as the final record. Storing only the AI-generated memo (the readable output) without the underlying agent traces means you can explain what the AI said but not exactly how it produced that output. DOI examiners increasingly know to ask for the traces.

Treating the AI system as a black box. "We use a third-party AI platform" is not a governance answer. You must be able to describe how the system works, what data it uses, and how it produces its outputs — even if the underlying model weights are proprietary.

Batch-deleting intermediate outputs. Deleting parser outputs and agent traces after the memo is generated reduces storage costs but destroys the audit trail. A retention policy for AI decision records should mirror your retention policy for underwriting files — typically seven years for commercial lines.

Not testing for disparate impact. Building a system that is accurate on average but produces disparate outcomes for protected-class proxies is a UDAP risk. Compliant systems include regular disparity testing as part of model monitoring.
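As one illustration of what regular disparity testing can mean in code, here is a sketch of an adverse impact ratio check. The four-fifths threshold is a widely used heuristic rather than a legal bright line, and the segment labels and counts are made up:

```python
from typing import Dict, Tuple

def adverse_impact_ratios(approvals_by_segment: Dict[str, Tuple[int, int]],
                          threshold: float = 0.8) -> Dict[str, dict]:
    """approvals_by_segment maps a segment label to (approved, total) counts.
    Flags any segment whose approval rate falls below `threshold` times the
    highest segment's rate (the four-fifths heuristic)."""
    rates = {seg: approved / total
             for seg, (approved, total) in approvals_by_segment.items() if total}
    benchmark = max(rates.values()) or 1.0  # guard the degenerate all-zero case
    return {seg: {"approval_rate": round(rate, 3),
                  "impact_ratio": round(rate / benchmark, 3),
                  "flagged": rate / benchmark < threshold}
            for seg, rate in rates.items()}

# Illustrative usage with made-up counts:
report = adverse_impact_ratios({"segment_a": (412, 500), "segment_b": (310, 450)})
```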

How delegated underwriting authority complicates this

MGAs operating under [delegated underwriting authority](/resources/glossary/delegated-underwriting-authority) have a governance obligation that runs in two directions: to the state DOI that regulates their market conduct, and to the capacity provider that delegated the authority. DUA agreements increasingly include explicit AI governance provisions, requiring MGAs to:

  • Disclose which AI systems influence bind decisions
  • Provide audit access to decision records on request
  • Notify the capacity provider of material changes to AI systems (new model, significant prompt update)
  • Maintain loss model documentation that supports treaty pricing adequacy reviews

MGAs that deploy AI without governance structures aligned to their DUA agreement terms face both regulatory and contractual exposure.

How Vortic approaches this

[Vortic's solutions](/solutions) are built on the four technical pillars described above. Every submission run through the platform generates an immutable decision pack: prompt versions, model identifiers, full input/output archives for each agent, and the underwriter's bind decision with timestamp. This pack is exportable in the format required for DOI examination and reinsurer audit.
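For illustration only (this is not Vortic's actual export schema), the four pillars combine into a single per-submission record along these lines:

```python
def build_decision_pack(submission_id: str, prompt_logs: list,
                        model_calls: list, archive_paths: list,
                        underwriter_decision: dict) -> dict:
    """Assemble the per-submission export record. Illustrative sketch only."""
    return {
        "submission_id": submission_id,
        "prompt_versions": prompt_logs,                # pillar 1
        "model_calls": model_calls,                    # pillar 2
        "io_archive_paths": archive_paths,             # pillar 3
        "underwriter_decision": underwriter_decision,  # pillar 4
    }
```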

For a step-by-step walkthrough of how the pipeline produces this record, see [how does AI underwriting work](/blog/how-does-ai-underwriting-work). For the governance terminology, see our [delegated underwriting authority glossary entry](/resources/glossary/delegated-underwriting-authority).
