ICMI 2025 Keynote – Future of Human Oversight

It was a great pleasure to deliver a keynote yesterday at the 27th ACM International Conference on Multimodal Interaction, where I explored the evolving nature and future of human oversight.

Recent reports such as OpenAI's GDPval and METR's evaluations show that AI systems can now autonomously perform complex tasks and often surpass human experts on some of them, with productivity gains on the order of hundreds of times faster or cheaper. Yet this promise collapses if the way experts oversee AI outputs is not carefully designed. GDPval's results show that oversight effort alone can consume one-third of the time a human would take to do the task from scratch, reducing the net productivity gain (fixing included) to only around 20%.

The next frontier for productivity gains is not just more capable AI, but more efficient human oversight.

In my talk, I argued for:

– ๐—˜๐˜…๐—ฝ๐—น๐—ผ๐—ถ๐˜ ๐˜๐—ต๐—ฒ โ€œ๐˜€๐—ผ๐—น๐˜ƒ๐—ฒ-๐˜ƒ๐—ฒ๐—ฟ๐—ถ๐—ณ๐˜† ๐—ฎ๐˜€๐˜†๐—บ๐—บ๐—ฒ๐˜๐—ฟ๐˜†.โ€ Many problems inherently exhibit this asymmetry (for example, factoring large numbers versus verifying the result through simple multiplication); in other cases, the asymmetry can be intentionally designed. The secret sauce behind much of AIโ€™s productivity lies in ensuring that verification is far cheaper than solving, without sacrificing rigour.

– ๐—ฆ๐—ต๐—ถ๐—ณ๐˜ ๐—ณ๐—ผ๐—ฐ๐˜‚๐˜€ ๐—ฏ๐—ฒ๐˜†๐—ผ๐—ป๐—ฑ ๐—”๐—œ ๐—ผ๐˜‚๐˜๐—ฝ๐˜‚๐˜๐˜€. Oversight should not centre on the task output itself but on the additional artefacts generated around it. Effective oversight depends on the tools, rationale/evidence artefact bundles, and multimodal signals that make verification easier and more meaningfulโ€”not on simply showing humans the raw output and asking, โ€œDoes this look right?โ€

– ๐—จ๐˜€๐—ฒ ๐—ฟ๐—ถ๐—ด๐—ผ๐—ฟ๐—ผ๐˜‚๐˜€ ๐—ฒ๐˜ƒ๐—ฎ๐—น๐˜‚๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐˜๐—ผ ๐˜๐—ฟ๐—ถ๐—ฎ๐—ด๐—ฒ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ๐˜€๐—ถ๐—ด๐—ต๐˜ ๐—ป๐—ฒ๐—ฒ๐—ฑ๐˜€. Identify subtasks or task types where AI performance is reliably above human level and low risk if wrong; these may not require active human oversight and can instead rely on LLM-as-judge or post-hoc sampling and monitoring.

– ๐—ฆ๐—ฒ๐—ฒ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ๐˜€๐—ถ๐—ด๐—ต๐˜ ๐—ฎ๐˜€ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด, ๐—ป๐—ผ๐˜ ๐—ด๐˜‚๐—ฎ๐—ฟ๐—ฑ๐—ถ๐—ป๐—ด. Oversight should move beyond catching AI mistakes toward understanding, steering, and upskilling human capabilities. Properly designed oversight can enhance human expertise more effectively than repetitive task execution, turning oversight into a mechanism for upskilling and knowledge retention.

Our research at CSIRO's Data61 is working deeply on these issues: quantifying oversight efficiency, designing oversight-enabling tools, and developing insights to identify and engineer "solve-verify asymmetry."

We are seeking partners across industry and government who want to rethink human oversight, not merely for compliance or safety, but as a strategic capability that unlocks both dramatic productivity gains and accelerated workforce learning.

Slides: https://www.linkedin.com/feed/update/urn:li:activity:7384708499882160128/


About Me

Director/Head of CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au