
What an AI audit actually is

McKinsey tested twenty-five organisational factors to find what most determines whether AI delivers meaningful business results. The answer was not the model. It was not the data. It was not the size of the investment.

It was whether the organisation had fundamentally redesigned its workflows before deploying AI.

Only twenty-one per cent had done so. Nearly eighty per cent were layering AI on top of processes they had never properly examined.

That single finding explains most of what I see going wrong.

In the previous post in this sequence, I described the diagnosis gap. Several people responded with the same question: what does the diagnostic work actually look like in practice? The expanded answer below draws on the exchange that followed the original post. Several of the framings here were sharpened, or in some cases first surfaced, by contributors whose comments made the analysis stronger than I could have produced alone. Where their framings have shaped the thinking, I have named them.

What “AI audit” usually means, and why that is not enough

Most people hear “AI audit” and picture a technology assessment or a compliance exercise that produces a report nobody reads. That is not what I mean. An AI audit done properly is an organisational reality check. It answers one question: do we actually understand how our own operation works well enough to change it safely?

Every audit I have conducted starts with the people who do the work: understanding what they actually do, how decisions get made, and where the undocumented workarounds live. That is where the real process lives. It rarely matches the documented version.

The invisible judgement layer

In one regulated organisation, I mapped a compliance-sensitive process that leadership believed was largely automated. It was not. Three experienced people were quietly intervening every day to catch exceptions the system could not handle. Those interventions existed only in their judgement. If that process had been handed to an AI model based on the documented version, the model would have operated on an understanding of the workflow that had not been accurate for years.

That invisible judgement layer is what the audit surfaces. Once it is visible, the AI question changes from “where can we apply AI?” to “where does AI create genuine value, where does it introduce risk, and where would we be automating something we do not yet understand well enough to hand over?”

Swapnil Kaushik, a Principal AI Product and Solutions Architect, captured the failure mode this exposes more sharply than most consulting decks manage: teams who skip the diagnostic step do not fail at the AI part. They fail because they automated a process that did not exist in the form they thought it did. The output looks correct, passes every test, and quietly produces wrong answers in the exact cases those three people used to catch.

That is the most dangerous failure mode. It is invisible until the consequences surface. By then, the people who would have caught it may no longer be in the loop.

Why the audit reveals more than AI

The audit separates quick wins from larger moves that need preparation. It produces a roadmap for leadership, not a technical audience, because these are business decisions, not engineering ones.

What consistently surprises people is what the audit finds beyond AI itself. Process debt. Ownership ambiguity. Compliance obligations met through workarounds rather than design. In some cases, the most valuable outcome is not an AI recommendation at all. It is the clarity of mapping operational reality for the first time.

Michel Chapuis, advisor to Swiss banks and asset managers, named the second-order finding that most organisations miss. At one Swiss bank, leadership called a process “fully automated.” In reality, two people had been catching edge cases for years. Nobody had documented it. But, in his words, the harder finding was not the process gap. It was that nobody had the authority to say: we need to redesign this before we touch it with AI. The diagnostic reveals the truth. The org chart determines whether anyone acts on it.

Why the diagnostic creates organisational permission

That second-order finding points to the most important outcome of the audit work, and it is one that does not appear in the deliverables list.

A proper diagnostic gives organisations something they have never had before: documented, structured, board-ready evidence that makes it professionally safe to say what people inside the operation already know. It turns an opinion that nobody wants to put in front of their boss into a finding that demands a response.

This is why the real value of the audit is not the AI roadmap it produces. It is the organisational permission it creates.

The KPMG 2026 AI in Risk and Compliance report shows that fifty-five per cent of Swiss financial institutions cite cultural resistance as the biggest barrier to scaling AI. That finding is, I would argue, mislabelled. It is not resistance. It is rational caution from experienced people whose concerns have never been given a formal channel. The diagnostic work provides that channel.

Eugene Chan, PhD, a behavioural scientist working on AI trust, made the same observation from a different angle. AI stumbles because it gets layered onto workflows people navigate through judgement, shortcuts, and quiet fixes that never make it into the documentation. When the diagnostic is skipped, the deployment runs on an idealised version of the work, and users immediately sense the mismatch. That sensed mismatch is the first signal that the diagnostic step was missed, and the behavioural response to it is rational, not obstructionist.

This connects directly to work Eugene has been developing through his AI Trust Audit (AITA) framework, a behavioural diagnostic that measures where and why trust breaks down at the human level before deployment. It is a complementary lens to the operational diagnostic I describe here. The combination of both is where organisations will find the most complete picture of their readiness.

Trust is not built by the capability of the system. It is built by the accuracy of the understanding underneath it.

What the audit does not solve

I want to be honest about what the audit does not solve.

It answers the pre-deployment question well. It does not answer what happens once AI moves from recommendation into action: executing decisions, processing transactions, modifying records. That is a harder problem, and one most organisations will face next.

Tim Zlomke, founder of Moral Clarity AI, drew that boundary cleanly in the thread. Even with well-mapped workflows, clear ownership, and validated processes, execution can still occur under conditions that no longer hold in the moment. So the gap becomes more than understanding the system well enough to deploy AI safely. It becomes ensuring the system is only allowed to act when the current state can actually carry the outcome. At that point, failure is not coming from misunderstanding the workflow. It is coming from allowing execution under conditions the system was not designed to handle in real time. That is where the boundary moves from diagnosis to execution.

Jonathan Dunn, an AI Governance Architect, named the same shift in different language. Once an AI system moves from recommendation to execution, it crosses what he calls Governance Escape Velocity: the point where it moves faster than a human committee can intervene. At that point, policy has to be replaced by architecture.

The diagnostic comes first. Without it, everything that follows is built on assumptions that have never been tested. But it is the foundation, not the destination. The execution problem is the next post in this sequence.

The closing point

Most organisations that surface the process debt still try to lean on manual oversight first. The instinct is understandable. They have just discovered how much undocumented human judgement their processes depend on, and the immediate response is to make that judgement more visible rather than to replace it with automated enforcement.

The problem is that manual oversight does not survive the speed transition. The moment an AI system moves from recommendation into execution at operational speed, human review becomes either a bottleneck that defeats the purpose of deployment or a rubber stamp that defeats the purpose of governance. Neither outcome is acceptable in a regulated environment.

The diagnostic is the foundation for designing an enforcement architecture that can operate at execution speed without requiring a human in every loop. The circuit breakers have to be built into the system before deployment, not bolted on afterwards.

That is what an AI audit actually is.