
Audit Messages on AI Governance

The newly released Auditor-General’s report on AI governance at the ATO is a good read. While I won’t comment on the specific cases, the broader lessons for government entities (pages 13-14) are valuable and ring true. Through CSIRO’s Data61’s contributions to the National Assurance Framework for Government Use of AI (released by the DTA) and Australia’s AI Safety Standard, we have observed many interesting and sensitive AI use cases across government and industry. Three key messages stand out:

𝐀𝐈-𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜 𝐠𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐢𝐬 𝐜𝐫𝐮𝐜𝐢𝐚𝐥. Many organisations still have only superficially “AI-specific” governance frameworks. A simple litmus test: if you can replace the word “AI” (or “training data” or “models”) with another technology term and the framework still reads sensibly, it is not AI-specific enough.

𝐍𝐞𝐰 𝐚𝐧𝐝 𝐞𝐯𝐨𝐥𝐯𝐢𝐧𝐠 𝐫𝐢𝐬𝐤𝐬 𝐞𝐱𝐢𝐬𝐭. A common argument is that AI introduces no new risks because all ultimate harms (physical, psychological, financial, and societal) already exist, and therefore existing controls suffice. While technically true at a high level, this reasoning misses the point. AI systems introduce novel intermediary risks (risk factors): deepfakes, biased decisions, systemic errors that differ in nature from human mistakes, and more. These intermediary risks are the ones that can be directly controlled and that require tailored mitigation, even if the ultimate harms remain the same but become less directly controllable.

𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐦𝐨𝐧𝐢𝐭𝐨𝐫𝐢𝐧𝐠 𝐚𝐫𝐞 𝐮𝐧𝐝𝐞𝐫𝐝𝐞𝐯𝐞𝐥𝐨𝐩𝐞𝐝. People often assume that human benchmarks exist against which AI can be compared during evaluation and monitoring. In practice, such benchmarks often do not exist, even at the outcome level, let alone for the intermediary steps needed for understanding and risk control. Consider hiring: when humans review thousands of CVs, there are often no thresholds or benchmarks for shortlisting quality or hiring outcomes. AI, however, demands evaluation at both the outcome level and every intermediate stage, because AI does not simply replicate human performance: it makes different mistakes and introduces new risks.
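To make the stage-wise evaluation concrete, here is a minimal sketch in Python. The pipeline stages, toy data, disparity metric, and the 80% threshold are all illustrative assumptions on my part (not drawn from the audit report or the frameworks above); the point is only that each stage gets its own metric and threshold, rather than evaluating the final hiring outcome alone.

```python
# Minimal, illustrative sketch: stage-wise evaluation of a hypothetical
# AI-assisted hiring pipeline. Stage names, toy data, the disparity metric,
# and the 80% threshold are assumptions for illustration only.

def selection_rate(decisions, group):
    """Share of applicants in `group` with a positive decision at this stage."""
    in_group = [d for d in decisions if d["group"] == group]
    return sum(d["selected"] for d in in_group) / len(in_group)

def evaluate_stage(stage_name, decisions, groups, min_ratio=0.8):
    """Compare each group's selection rate to the best group's rate.

    Flags a finding if any group falls below `min_ratio` of the top rate
    (an illustrative threshold, not a prescribed standard).
    """
    rates = {g: selection_rate(decisions, g) for g in groups}
    best = max(rates.values())
    findings = []
    for g, r in rates.items():
        if best > 0 and r / best < min_ratio:
            findings.append(f"{stage_name}: group '{g}' rate {r:.2f} is below "
                            f"{min_ratio:.0%} of the top rate {best:.2f}")
    return rates, findings

# Toy decisions at two separate stages: shortlisting and final offers.
shortlist = [
    {"group": "A", "selected": True}, {"group": "A", "selected": True},
    {"group": "B", "selected": True}, {"group": "B", "selected": False},
]
offers = [
    {"group": "A", "selected": True}, {"group": "A", "selected": False},
    {"group": "B", "selected": False}, {"group": "B", "selected": False},
]

# Evaluate every stage, not only the final outcome.
for name, stage in [("shortlisting", shortlist), ("offers", offers)]:
    rates, findings = evaluate_stage(name, stage, groups={"A", "B"})
    print(name, rates, findings or "within illustrative threshold")
```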

Another lesson concerns contestability and impact assessment: if a human team of reviewers makes errors in 5% of fraud cases but processes only 100 cases per month, around 5 individuals a month can contest mistakes through manual review. If an AI system reduces the error rate to 1% (obviously better, many would argue) but scales to 100,000 cases per month, errors now affect 1,000 people a month, overwhelming contestability mechanisms. Ironically, inefficiency can act as a safeguard. Sometimes the benefit/risk tradeoff makes sense in proportional terms; at other times the absolute scale of harm and the human rights at stake matter in themselves and cannot be traded off against proportional benefits. This makes risk analysis and threshold determination more challenging.
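The arithmetic behind this example can be laid out explicitly. In the sketch below, the error rates and case volumes come from the paragraph above, while the monthly manual-review capacity is a hypothetical figure added only to show how proportional improvement and absolute harm can diverge.

```python
# Illustrative arithmetic for the contestability example above.
# Error rates and volumes are from the text; review capacity is hypothetical.

def affected(error_rate, cases_per_month):
    """Expected number of people affected by errors each month."""
    return error_rate * cases_per_month

human_errors = affected(0.05, 100)      # 5% of 100 cases   -> 5 people/month
ai_errors = affected(0.01, 100_000)     # 1% of 100,000     -> 1,000 people/month

review_capacity = 50  # hypothetical: cases a manual-review team can re-examine per month

print(f"Human process: {human_errors:.0f} affected per month")
print(f"AI system:     {ai_errors:.0f} affected per month")
print(f"The error rate improved 5x, but absolute harm grew "
      f"{ai_errors / human_errors:.0f}x and exceeds review capacity by "
      f"{ai_errors / review_capacity:.0f}x.")
```

Under these assumptions, a five-fold better error rate still produces 200 times more affected individuals each month, which is exactly the proportional-versus-absolute tension described above.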

To advance research and practice in this space, we are keen to talk to government agencies and industry partners with interesting AI use cases where risk assessment and assurance require expert input. Feel free to reach out.

