A great panel yesterday at the Aus Gov Data Summit. Here are my key messages.
Why do we call AI a black box when we can inspect every parameter, activation, and optimisation step? The challenge is not access but understanding: even with full visibility, we still cannot explain why the system behaves the way it does. Yet we have long operated under partial understanding:
– Software is just logic over 1s and 0s. In theory, correctness can be “proven”. In practice, testing relies on a small, best-effort subset of cases because the interaction space is too large to cover exhaustively.
– Aircraft are often seen as the pinnacle of engineering certainty. In reality, fluid dynamics is approximated, explanations of lift are still debated, and turbulence remains difficult to model. Flight is safe because uncertainty is bounded through testing, certification, and redundancy.
– Then there is the human brain. Its internal workings are largely inaccessible, yet society functions through institutions, peer review, procedures, and accountability.
Across these domains, the common requirement is not full understanding but reliable behaviour under controlled conditions.
This leads to two practical directions that CSIRO focuses on.
First, evaluation is not just measurement; it is design. Evaluation separates reliable behaviour from unreliable behaviour. Once the uncertain edges are identified, systems are designed accordingly: scalable automated guardrails where possible, and targeted human-in-the-loop processes with the right tools where necessary. Not everywhere, but where it adds value.
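To make the routing idea concrete, here is a minimal, hypothetical sketch in Python. It assumes a document-classification task where evaluation has already identified which categories the model handles reliably and at what confidence. The model is stubbed so the example runs, and every name and threshold is illustrative rather than anything CSIRO has built.

```python
from dataclasses import dataclass
import random

# Hypothetical sketch of evaluation-driven routing. Offline evaluation has
# marked which categories the model handles reliably and the confidence
# floor that holds there; at run time, reliable cases take the automated
# path and everything else is routed to targeted human review.

RELIABLE_CATEGORIES = {"invoice", "receipt"}  # identified during evaluation
CONFIDENCE_FLOOR = 0.90                       # threshold set during evaluation

@dataclass
class Prediction:
    label: str
    confidence: float

def stub_model(document: str) -> Prediction:
    # Stand-in for a real classifier so the sketch runs end to end.
    label = random.choice(["invoice", "receipt", "contract"])
    return Prediction(label, random.uniform(0.5, 1.0))

def route(document: str) -> str:
    pred = stub_model(document)
    inside_reliable_region = (
        pred.label in RELIABLE_CATEGORIES
        and pred.confidence >= CONFIDENCE_FLOOR
    )
    if inside_reliable_region:
        return f"auto-processed as {pred.label}"  # scalable automated guardrail
    # Targeted human-in-the-loop: only the uncertain edges reach a person.
    return f"human review (model guess: {pred.label}, {pred.confidence:.2f})"

if __name__ == "__main__":
    for doc in ["doc-1", "doc-2", "doc-3"]:
        print(doc, "->", route(doc))
```

The design point is that the reliable set and the threshold come out of evaluation, not intuition. Evaluation literally draws the boundary of the system.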
Second, system-level control matters more than model-level fixes.
Complex systems are rarely made reliable by attempting to control them from the inside, particularly when internal behaviour is not well understood. Instead, they are managed from the outside through monitoring, guardrails, redundancy, and governance. AI follows the same pattern.
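As a companion sketch, again purely illustrative with invented function names, this shows control applied from the outside: the model is treated as opaque, and reliability comes from the wrapper around it, i.e. monitoring via logging, an output guardrail, and a redundant fallback path.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-wrapper")

def primary_model(query: str) -> str:
    # Stand-in for an opaque AI system whose internals we do not control.
    return f"answer to {query!r}"

def fallback_model(query: str) -> str:
    # Redundancy: a simpler, well-understood path used when guardrails trip.
    return "safe default response"

def passes_guardrail(answer: str) -> bool:
    # Output-side check; a real system would validate schema, policy, ranges.
    return bool(answer) and len(answer) < 500

def answer(query: str) -> str:
    result = primary_model(query)
    log.info("primary output: %s", result)  # monitoring, without opening the box
    if passes_guardrail(result):
        return result
    log.warning("guardrail tripped; falling back")
    return fallback_model(query)

if __name__ == "__main__":
    print(answer("example query"))
```

Nothing here reaches inside the model. Every control point sits at the boundary, which is exactly where monitoring, guardrails, redundancy, and governance operate.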
I ended the panel with a point that often makes people pause.
As AI systems outperform humans across more tasks, the instinct is to search for areas where humans still have the advantage. That is understandable, but it narrows the question.
If a system, with proper evaluation and guardrails, is demonstrably reliable and exceeds human performance on a task or domain, then insisting on a human in the loop beyond design and evaluation does not automatically improve safety. It can introduce noise, inconsistency, and false assurance.
A more useful question is what humans should do within domains where AI already outperforms them, especially when the underlying mechanisms remain unclear.
The role shifts. Humans move from being the primary doer and error catcher to setting the values that guide what to explore, interpreting outcomes, and translating AI’s superior performance into human-understandable knowledge that advances the boundaries of what we know.