When “Risk-Based” AI Becomes an Empty Promise

Risk-based AI policy/regulation sounds obvious. But it rests on a fragile assumption: that we can actually measure risk.
Yet risk assessment is often put forward as the first question to answer, as if it’s the easy part.

Some frameworks push “use case–based” solutions with a predefined list of high-risk use cases, but then smuggle in the magic word “significant”, effectively saying that the listed high-risk use cases are only high-risk if they cause… significant risk. A tautology that collapses under scrutiny. Look closely, and “low-risk” use cases are never really low-risk:
1. Music recommendation seems harmless until bias systematically sidelines minority artists.
2. Grammar checking looks safe until a single digit or word change in a medical note alters a diagnosis or dosage.

Yesterday at the Department of Health, I used Digital Scribe as a concrete example of navigating some of these challenges.

At CSIRO’s Data61, we’re moving beyond vague likelihood–consequence matrices and building approaches that work in practice:
• Precise definitions of consequence, severity, scale, and impact
• Separating intrinsic from design risks: what’s intrinsic to the task in any system (whether a human, traditional software, or AI performs it) vs what emerges from AI-specific design choices
• Dynamic tools that cut through noise and thousands of best practices to surface the few risks that matter and their corresponding treatments
• Oversight models that test effectiveness, not just presence
• Marginal risk comparisons: evaluating AI against existing systems even without full ground truth
• Applying all of these in public administration, health, and high-stakes decision making, where decisions range from fact-finding to rule-applying to deep deliberation and discretionary judgment

I’ve shared a few redacted slides below. I’m always happy to discuss with others wrestling with the hardest question: if we want AI to be risk-based, how do we stop “risk” from being a vague label?

https://www.linkedin.com/feed/update/urn:li:activity:7376738527293685762


About Me

Director/Head of CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au