An AI governance checklist for skeptics

The state of AI governance

Most 'AI governance' today is a slide deck and a steering committee. The frameworks are real (NIST AI RMF, ISO/IEC 42001, EU AI Act), but the gap between a framework and a governed system is enormous, and most enterprises are sitting in that gap, not crossing it.

The questions below are what I actually ask when reviewing an AI feature. They are derived from the NIST AI Risk Management Framework Core (Govern, Map, Measure, Manage) but written for engineers and product people. The first four come with a 'good answer' and a 'red flag' answer. If there is even one you cannot answer, you do not have a governed system. You have a project.

The questions

1. What decision does this model make, and who can override it?

Good answer: 'The model assigns a fraud risk score from 0 to 100. Scores above 80 freeze the transaction and route to a human analyst, who can release or escalate. The analyst decision is logged with the model score and feature contributions.'

Red flag: 'The model helps the team make better decisions.' This is not a governed system. It is a tool with no accountability boundary. If something goes wrong, you cannot tell whether the model caused it or the human did.
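To make the accountability boundary concrete, here is a minimal sketch of the routing described in the good answer. The threshold of 80 comes from that answer; the names `DecisionRecord`, `route`, and `record_override` are hypothetical, and a real system would persist the record instead of keeping it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

FREEZE_THRESHOLD = 80  # from the example answer: scores above 80 freeze


@dataclass
class DecisionRecord:
    """One row of the audit trail: what the model did and what the human did."""
    transaction_id: str
    score: int
    feature_contributions: Dict[str, float]
    model_action: str                       # "allow" or "freeze"
    analyst_id: Optional[str] = None
    analyst_action: Optional[str] = None    # "release" or "escalate"
    decided_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def route(transaction_id: str, score: int,
          feature_contributions: Dict[str, float]) -> DecisionRecord:
    """Apply the model's decision rule and start the audit record."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    action = "freeze" if score > FREEZE_THRESHOLD else "allow"
    return DecisionRecord(transaction_id, score, feature_contributions, action)


def record_override(record: DecisionRecord, analyst_id: str,
                    analyst_action: str) -> DecisionRecord:
    """Attach the analyst's decision to the same record as the model score."""
    if record.model_action != "freeze":
        raise ValueError("only frozen transactions are routed to an analyst")
    if analyst_action not in {"release", "escalate"}:
        raise ValueError("analyst_action must be 'release' or 'escalate'")
    record.analyst_id = analyst_id
    record.analyst_action = analyst_action
    return record
```

Because the model action and the analyst action live in one record, a later review can tell which of the two caused an outcome.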

2. What is the input distribution at training versus at inference?

Good answer: 'Trained on transactions from 2022 to 2024 across our top five markets. We monitor inference inputs daily for drift on the top twenty features against the training distribution. Alerts fire when drift exceeds two standard deviations on any monitored feature, with a runbook for retrain or rollback.'

Red flag: 'We retrained recently.' Recent training does not mean the distribution matches. Markets shift, fraud patterns evolve, customer behavior changes after a product launch. If you are not watching drift, the model is silently degrading.
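One possible reading of the two-standard-deviation rule in the good answer is sketched below: compare each feature's mean over the day's inference inputs with its training mean, measured in training standard deviations. That interpretation is an assumption, as are the function names. Population stability index or a Kolmogorov-Smirnov test are common alternatives that look at the whole distribution and not only the mean.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

DRIFT_THRESHOLD_SD = 2.0  # from the example answer: two standard deviations


def training_baseline(
    training: Dict[str, List[float]]
) -> Dict[str, Tuple[float, float]]:
    """Store (mean, standard deviation) per monitored feature."""
    return {name: (mean(vals), pstdev(vals)) for name, vals in training.items()}


def drifted_features(
    baseline: Dict[str, Tuple[float, float]],
    todays_inputs: Dict[str, List[float]],
    threshold: float = DRIFT_THRESHOLD_SD,
) -> Dict[str, float]:
    """Return features whose daily mean moved more than `threshold` SDs."""
    alerts = {}
    for name, (mu, sigma) in baseline.items():
        values = todays_inputs.get(name)
        if not values or sigma == 0:
            # Nothing to compare against; a real monitor should flag this too.
            continue
        shift = abs(mean(values) - mu) / sigma
        if shift > threshold:
            alerts[name] = shift
    return alerts
```

A non-empty result is what would trigger the alert and the retrain-or-rollback runbook.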

3. What is the monitored failure mode, and who gets the page?

Good answer: 'We track precision, recall, and downstream business impact (false-positive customer complaints, false-negative loss). PagerDuty fires when precision drops below 0.85 on a 24-hour window. The on-call rotation owns initial response; the model owner owns the postmortem.'

Red flag: 'We have a dashboard.' Dashboards do not wake people up. If no human is being notified when the model misbehaves, the model is unsupervised.
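The difference between a dashboard and a page can be sketched in a few lines. The 0.85 floor and the 24-hour window come from the good answer. The `PrecisionMonitor` class and the `pager` callable are hypothetical stand-ins for whatever alerting integration you use, and the sketch assumes labelled outcomes arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Callable, Optional

PRECISION_FLOOR = 0.85        # from the example answer
WINDOW = timedelta(hours=24)  # from the example answer


class PrecisionMonitor:
    """Tracks precision of flagged transactions over a rolling window."""

    def __init__(self, pager: Callable[[str], None]):
        self._pager = pager       # hypothetical: sends a page to on-call
        self._outcomes = deque()  # (timestamp, was_true_positive)

    def record(self, when: datetime, was_true_positive: bool) -> None:
        """Record the confirmed outcome of one transaction the model flagged."""
        self._outcomes.append((when, was_true_positive))

    def precision(self, now: datetime) -> Optional[float]:
        """Precision over the window ending at `now`, or None if no data."""
        cutoff = now - WINDOW
        while self._outcomes and self._outcomes[0][0] < cutoff:
            self._outcomes.popleft()  # drop outcomes older than the window
        if not self._outcomes:
            return None
        hits = sum(1 for _, tp in self._outcomes if tp)
        return hits / len(self._outcomes)

    def check(self, now: datetime) -> bool:
        """Page a human if precision is below the floor; return True if paged."""
        p = self.precision(now)
        if p is not None and p < PRECISION_FLOOR:
            self._pager(f"precision {p:.2f} below {PRECISION_FLOOR} over 24h")
            return True
        return False
```

The point of `check` is that it calls something that notifies a person. Rendering the same number on a chart would not satisfy the question.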

4. What is the documented procedure for a model recall?

Good answer: 'We can roll back to the previous model version in under fifteen minutes via the feature-flag system. The communications template for affected customers is in the incident runbook. The postmortem template captures root cause, customer impact, and the framework control affected (for example NIST AI RMF Manage 4.1).'

Red flag: 'We would just redeploy.' Recall is not redeployment. It includes customer communication, regulatory notification if applicable, retraining or replacement, and a postmortem. If you do not have a procedure, the first recall will be improvised at the worst possible moment.
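The rollback step of a recall is the part that lends itself to code. Below is a minimal sketch, assuming the feature-flag system can be modelled as an object holding the active and previous model versions. `ModelFlag` and `RollbackEvent` are hypothetical names, and customer communication and regulatory notification stay in the runbook, outside this code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RollbackEvent:
    """What the postmortem needs to know about the rollback itself."""
    from_version: str
    to_version: str
    reason: str
    framework_control: str  # for example "NIST AI RMF Manage 4.1"
    rolled_back_at: datetime


class ModelFlag:
    """Stand-in for a feature-flag entry that selects the serving model."""

    def __init__(self, active: str, previous: Optional[str]):
        self.active = active
        self.previous = previous

    def rollback(self, reason: str, framework_control: str) -> RollbackEvent:
        """Switch serving to the previous version and return an audit event."""
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        event = RollbackEvent(
            from_version=self.active,
            to_version=self.previous,
            reason=reason,
            framework_control=framework_control,
            rolled_back_at=datetime.now(timezone.utc),
        )
        self.active, self.previous = self.previous, None
        return event
```

Returning an event from `rollback` means the postmortem starts with a timestamped record instead of someone's memory of the incident.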

Three more, if you have time

5. What inputs are you logging, and for how long? Without logs you cannot reconstruct a decision after the fact. With too many logs you create a privacy liability. There is a right answer for your domain; you need to have chosen it deliberately and written it down.

6. Who signed off on the model in production? Not 'the team agreed.' A named person, with a date, against a documented criterion. If something goes wrong, this is who explains the decision to the regulator.

7. What is the worst-case harm if this model is wrong, and who bears it? A recommender system getting your weekend wrong is different from a credit model denying you a mortgage. The framework treatment should scale with the harm. If you cannot describe the harm, the framework is theatrical.

The point

These questions look like audit questions because they are audit questions. They are also engineering questions, and product questions, and risk questions. Governance is not a separate workstream; it makes visible the decisions you have already made and the ones you have not. The model is governed when these questions have answers a regulator could check. Until then, you have a project.