It was a great pleasure to deliver a keynote yesterday at the 27th ACM International Conference on Multimodal Interaction, where I explored the evolving nature and future of human oversight.
Recent reports such as OpenAI's GDPVal and METR's studies show that AI systems can now autonomously perform complex solving tasks, often surpassing human experts. The raw productivity gains can be on the order of hundreds of times faster or cheaper. Yet this promise collapses if the way experts oversee AI outputs is not carefully designed: GDPVal's results show that oversight effort alone can consume one-third of the time a human would take to do the task from scratch, reducing the net productivity gain (fixing included) to only around 20%.
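A back-of-envelope sketch of why oversight cost dominates. The split between review and fixing time below is an assumption chosen purely to illustrate how a ~20% net gain can emerge; it is not GDPVal's exact accounting:

```python
# Illustrative arithmetic only; the fixing_time value is an assumption,
# not a figure from GDPVal.
human_time = 1.0              # doing the task from scratch = 1 unit of time
oversight_time = 1.0 / 3      # reviewing the AI output: ~one-third of the task
fixing_time = 0.5             # assumed time spent correcting AI errors

total_with_ai = oversight_time + fixing_time
net_gain = human_time / total_with_ai - 1
print(f"net productivity gain: {net_gain:.0%}")  # → net productivity gain: 20%
```

Even with the AI producing its draft essentially for free, the human-side costs cap the speedup.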
The next frontier for productivity gains is not just more capable AI, but more efficient human oversight.
In my talk, I argued for:
– Exploit the "solve-verify asymmetry." Many problems inherently exhibit this asymmetry (for example, factoring large numbers versus verifying the result through simple multiplication); in other cases, the asymmetry can be intentionally designed. The secret sauce behind much of AI's productivity lies in ensuring that verification is far cheaper than solving, without sacrificing rigour.
– Shift focus beyond AI outputs. Oversight should not centre on the task output itself but on the additional artefacts generated around it. Effective oversight depends on the tools, rationale/evidence artefact bundles, and multimodal signals that make verification easier and more meaningful, not on simply showing humans the raw output and asking, "Does this look right?"
– Use rigorous evaluation to triage oversight needs. Identify subtasks or task types where AI performance is reliably above human level and where errors carry low risk; these may not require active human oversight and can instead rely on LLM-as-judge or post-hoc sampling and monitoring.
– See oversight as learning, not guarding. Oversight should move beyond catching AI mistakes toward understanding, steering, and upskilling human capabilities. Properly designed oversight can enhance human expertise more effectively than repetitive task execution, turning it into a mechanism for upskilling and knowledge retention.
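The factoring example from the first point can be made concrete. A minimal sketch (trial division is used purely for illustration; real factoring workloads are far harder, which is exactly the asymmetry being exploited):

```python
def verify_factors(n: int, p: int, q: int) -> bool:
    """Verification is a single multiplication plus trivial checks."""
    return 1 < p and 1 < q and p * q == n

def solve_factors(n: int) -> tuple[int, int]:
    """Solving requires search: trial division up to sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return 1, n  # n is prime

p, q = solve_factors(3127)          # search finds 53 * 59
assert verify_factors(3127, p, q)   # checking it is one multiplication
```

The overseer's job (verify) costs a constant-time check, while the solver's job grows with problem size; oversight workflows that preserve this gap keep verification far cheaper than redoing the work.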
Our research at CSIRO's Data61 is working deeply on these issues: quantifying oversight efficiency, designing oversight-enabling tools, and developing insights to identify and engineer "solve-verify asymmetry."
We are seeking partners across industry and government who want to rethink human oversight, not merely for compliance or safety, but as a strategic capability that unlocks both dramatic productivity gains and accelerated workforce learning.
Slides: https://www.linkedin.com/feed/update/urn:li:activity:7384708499882160128/

