At the NZ Government Data Summit, I shared three important, often overlooked insights from real-world AI deployments:
– AI alone often outperforms AI-human collaboration. But where and how human oversight is introduced can either reduce or increase overall risk.
– Evaluating AI also reveals human and process errors, which can generate resistance, especially when current processes were never rigorously assessed.
– Getting AI to explain its recommendations or conclusions isn't enough. What matters is whether those explanations meet recognised expert standards and make sense to human reviewers, not whether they reflect how the AI works internally.
I demonstrated these with two case studies:
– 𝐀𝐈 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐨𝐫: How to assess whether AI lowers or raises risk, even when the absolute risk of the current human process is unknown (using marginal risk evaluation techniques).
– 𝐀𝐈 𝐏𝐞𝐞𝐫 𝐑𝐞𝐯𝐢𝐞𝐰𝐞𝐫: How to ensure AI explanations are meaningful to humans, focusing not on whether they mirror the AI's internal logic, but on whether they align with expert judgement and procedural fairness (what we call external reasoning faithfulness).
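For readers curious what a marginal risk evaluation can look like in practice, here is a minimal sketch. It assumes a paired setup that is not spelled out in the post: the human process and the AI decide the same cases, and experts adjudicate only the cases where the two disagree. All names and data below are illustrative, not from the talk.

```python
def marginal_risk(pairs, adjudicate):
    """Estimate the marginal risk of replacing a human process with AI.

    pairs: list of (case, human_decision, ai_decision) tuples.
    adjudicate: expert oracle called only on disagreements; returns
                the correct decision for a case.

    Returns (AI-only errors - human-only errors) / total cases.
    A negative value means AI lowers risk relative to the current
    process, even though the absolute error rate of either is never
    measured: errors shared by both sides cancel out of the difference.
    """
    ai_only_errors = human_only_errors = 0
    for case, human, ai in pairs:
        if human == ai:
            continue  # agreement: any shared error cancels, skip adjudication
        correct = adjudicate(case)
        if ai != correct:
            ai_only_errors += 1
        if human != correct:
            human_only_errors += 1
    return (ai_only_errors - human_only_errors) / len(pairs)


# Illustrative usage with made-up decisions and expert labels.
pairs = [
    ("c1", "approve", "approve"),  # agreement: never adjudicated
    ("c2", "approve", "deny"),     # AI wrong (expert says approve)
    ("c3", "deny", "approve"),     # human wrong
    ("c4", "deny", "approve"),     # human wrong
]
truth = {"c2": "approve", "c3": "approve", "c4": "approve"}

print(marginal_risk(pairs, lambda c: truth[c]))  # -0.25: AI reduces risk
```

The design point this illustrates is why the technique sidesteps the unknown absolute risk: expert labels are only needed on the disagreement subset, and cases where human and AI make the same call, right or wrong, contribute nothing to the difference.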
Selected slides: https://www.dropbox.com/scl/fi/5hfxx1ae0kswj0dflaot7/20250508-AI-for-Evaluation-NZ-Gov.pdf?rlkey=wsufwwph4dl73zdkxaicx0suy&dl=0