Financial Services & Fintech · Case study

An AI assistant a bank's auditors actually approved.

Outcome

RAG on approved sources. NIST AI RMF + ISO/IEC 42001 governance. Zero material audit findings at rollout review.

Industry: Financial Services & Fintech
Updated: April 2026
Outcomes

Numbers the CFO will actually defend.

Average handle time (90 days post-rollout): − min
Material findings (post-implementation audit): Zero
Governance framework (ISO/IEC 42001 aligned): NIST AI RMF
Answer-citation rate (every answer grounded): 31%

Quick answer
A mid-to-large bank needed a generative AI assistant for frontline staff — answering policy, product, and procedure questions without pulling agents off calls. NUUN AI built a RAG assistant grounded on approved documents, governed against OSFI guidance and NIST AI RMF, and measured against handle time and first-contact resolution. Outcome: -minute handle-time reduction and zero material audit findings at rollout.

THE CHALLENGE

The bank's frontline agents worked across a mountain of policy, procedure, and product documentation — some current, some archived, much of it contradictory. Tenured agents knew where to look; new hires didn't, and the ramp cost was visible in handle time and escalations. An off-the-shelf chatbot pilot had gone sideways — plausible answers to sensitive questions, no provenance, internal audit uncomfortable.

Leadership wanted a bank-grade AI assistant: grounded in approved content, auditable per response, measured against real ops outcomes, and reviewed by second and third lines of defence before it ever touched a customer-facing interaction.

THE APPROACH

  1. Governance before features. A steering committee spanning risk, compliance, privacy, legal, and operations agreed the use-case scope, redlines, and kill criteria in writing. Framework grounded in NIST AI RMF and ISO/IEC 42001. OSFI guideline E-23 considerations documented.
  2. Curated knowledge base. Approved policy, product, and procedure sources cleaned, chunked, and indexed into a managed vector store. Deprecated documents blocked. Citation and provenance required in every response.
  3. RAG architecture on Azure. Retrieval, reranking, grounding, and generation decoupled so each component could be evaluated and upgraded independently. Content redaction and PII screening on both sides of every call.
  4. Evaluation harness and red-team. A living evaluation set covered accuracy, hallucination rate, refusal behaviour, bias checks, and jailbreak resistance. Red-team loops ran continuously; evals were a pre-deployment gate.
  5. Staged rollout with instrumentation. Pilot in a low-risk contact stream; operator-in-the-loop controls; telemetry on every interaction. Only after the pilot cleared pre-agreed thresholds did scope expand.
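Steps 2–4 above can be sketched as a decoupled pipeline. Everything in this sketch is illustrative: the in-memory chunk list, the regex-based PII screen, and the term-overlap scoring are stand-ins for the managed vector store, redaction service, and reranker the case describes, not the production stack.

```python
from dataclasses import dataclass
import re

@dataclass
class Chunk:
    doc_id: str
    text: str
    deprecated: bool = False

# Toy approved knowledge base; the real system uses a managed vector store.
KB = [
    Chunk("POL-001", "Wire transfers above the daily limit require a supervisor override."),
    Chunk("POL-002", "Deprecated: old override procedure.", deprecated=True),
]

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-like strings, for illustration

def screen_pii(text: str) -> str:
    """Redact PII on both sides of every call."""
    return PII_PATTERN.sub("[REDACTED]", text)

def retrieve(query: str, kb=KB):
    """Stage 1: retrieval. Deprecated documents are blocked at the source."""
    terms = set(query.lower().split())
    return [c for c in kb
            if not c.deprecated and terms & set(c.text.lower().split())]

def rerank(query: str, chunks):
    """Stage 2: rerank by naive term overlap (stand-in for a cross-encoder)."""
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(terms & set(c.text.lower().split())))

def generate(query: str, chunks):
    """Stage 3: grounded generation. Refuse rather than answer without a source."""
    if not chunks:
        return {"answer": "I can't answer that from approved sources.", "citations": []}
    top = chunks[0]
    return {"answer": screen_pii(top.text), "citations": [top.doc_id]}

def answer(query: str):
    query = screen_pii(query)  # screen the inbound side too
    return generate(query, rerank(query, retrieve(query)))
```

Keeping retrieval, reranking, and generation as separate functions is what lets each component be evaluated and upgraded independently; the refusal path in `generate` is what enforces the citation-per-response requirement.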

THE RESULTS

  • -minute average handle-time reduction on in-scope inquiry types within 90 days of full rollout.
  • 13-point first-contact-resolution lift on target inquiry categories (matched-cohort analysis).
  • 26% reduction in tenure-to-proficiency time for new-hire agents.
  • Zero material findings at the internal audit post-implementation review.
  • 38% answer-citation rate — every answer grounded in approved, retrievable source material.
  • Hallucination rate on the living evaluation set trending down quarter over quarter.
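The hallucination and citation numbers above come out of the evaluation harness used as a pre-deployment gate. A minimal sketch of such a gate follows; the threshold values are hypothetical placeholders, not the client's actual redlines.

```python
def evaluation_gate(results, max_hallucination_rate=0.01, min_citation_rate=0.95):
    """Pre-deployment gate: fail closed if any metric misses its threshold.

    results: one dict per evaluation case, with boolean fields
    'hallucinated' and 'cited'. Thresholds are illustrative.
    """
    n = len(results)
    halluc = sum(r["hallucinated"] for r in results) / n
    cited = sum(r["cited"] for r in results) / n
    return {
        "hallucination_rate": halluc,
        "citation_rate": cited,
        "pass": halluc <= max_hallucination_rate and cited >= min_citation_rate,
    }
```

Running the gate on every refresh of the evaluation set is what turns "evals were a pre-deployment gate" from a policy statement into an enforceable check.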

CLIENT QUOTE

"Second and third line were the real customers of this project. Getting their sign-off changed the whole economics." — Senior leader, anonymized, Anonymized leadership

METHODOLOGY & MEASUREMENT

Handle time and FCR measured via matched-cohort analysis against pre-deployment baselines. Accuracy, citation, and hallucination metrics tracked on a versioned evaluation set refreshed quarterly. Governance artifacts — risk assessments, DPIA, model cards, runbook — available under NDA on client and auditor request.
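The matched-cohort comparison can be illustrated in a few lines. This is a minimal sketch assuming a normal-approximation 95% interval on the difference in means; the actual study design (matching variables, interval method, cohort construction) is the client's and is not reproduced here.

```python
import statistics as stats

def handle_time_delta(baseline, treatment):
    """Difference in mean handle time between matched cohorts, with a
    normal-approximation 95% confidence interval.

    baseline / treatment: handle times in minutes for matched agent
    cohorts on in-scope inquiry types only, not the whole contact center.
    """
    diff = stats.mean(treatment) - stats.mean(baseline)
    se = (stats.variance(baseline) / len(baseline)
          + stats.variance(treatment) / len(treatment)) ** 0.5
    return diff, (diff - 1.96 * se, diff + 1.96 * se)
```

Reporting the interval alongside the point estimate is what makes the handle-time claim defensible: a reduction whose confidence interval excludes zero is a result, not a fluctuation.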

CASE FAQ

How do you deploy generative AI in a regulated bank?
Governance first, architecture second. A steering committee spanning risk, compliance, privacy, legal, and operations agrees scope, redlines, and kill criteria in writing — grounded in NIST AI RMF and ISO/IEC 42001 — before a single model call is made in production.
What is a RAG AI assistant for banking?
Retrieval-Augmented Generation pairs a language model with a curated, auditable knowledge base. For a bank, that means approved policy, product, and procedure documents indexed in a vector store; every answer grounded in retrievable source material with citations.
How do you govern AI in financial services?
Against NIST AI RMF and ISO/IEC 42001 at the framework level, and against OSFI Guideline E-23 (Canada) or SR 11-7 (US) at the regulatory level. Governance artifacts — risk assessments, DPIA, model cards, runbooks — are auditor-ready from day one.
What's the difference between a chatbot and a RAG assistant?
A chatbot generates plausible text. A RAG assistant retrieves approved content, grounds its answer in that content, and cites the source. For regulated use-cases, the difference is whether internal audit can defend the system to an external regulator.
How long does it take to deploy a bank AI assistant?
Approximately 6 months from discovery through production deployment with human-in-the-loop review, plus an additional 6-month governance and expansion phase. Use-cases with stricter regulatory overlay extend the governance phase.
What does "measurable handle-time reduction" actually mean?
Matched-cohort analysis of agent handle time on in-scope inquiry types, compared against a pre-deployment baseline, with confidence intervals reported. Not average handle time on the whole contact center — that would bury the signal.

Deploy AI That Passes Audit

Bring the regulated use-case. We'll bring the governance, architecture, and evaluation discipline.