What do We Mean by Scalable AI?

At the recent GovAI Summit, I spoke on “Engineering Trustworthy AI at Scale: How CSIRO Enables Responsible AI Across Government.”

I have given similar talks before, but this one was a bit different. I focused more directly on a question that is often misunderstood: What do we actually mean by scalable AI?

Most discussions interpret scalability as simply larger systems or broader adoption. In practice, that is not the core issue. The real challenge is the journey from a single prototype that supports one use case, often built and evaluated with significant effort, to a system that supports many use cases, emerging risks, and continuously evolving AI capabilities, requirements, and constraints.

Along that journey, the problem is not scale in volume, but scale in friction.
Each additional use case often feels like starting again. Risk assessments take months and need to be repeated. Testing and evaluation struggle to keep up with rapidly evolving AI capabilities. Human-in-the-loop processes become a bottleneck.

A system that truly scales should behave differently. Adding one more use case should cost less than building from scratch. Model updates should not trigger full re-evaluation. Human oversight should be supported by well-designed, purpose-built tools rather than amounting to redoing the work.

In the talk, I shared three approaches we have been applying across government case studies.

First is what I call verification-first design. Instead of inserting AI into existing workflows and asking humans to check mixed-quality AI outputs, we redesign the workflow and the verification layer itself. Outputs that can be automatically verified are separated from the genuinely hard parts. With automated verification as a goal, human effort is focused where it actually matters.
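The separation above can be sketched as a simple triage loop. This is a minimal illustration, not CSIRO's actual tooling: the check functions, names, and rules are hypothetical stand-ins for whatever automated verifiers a real workflow would define.

```python
# Sketch of verification-first triage: outputs that pass all automated
# checks skip human review; everything else is routed to a person.
# All check logic here is hypothetical and purely illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Output:
    id: str
    text: str

def is_valid_format(o: Output) -> bool:
    # Example automated check: non-empty and under a length limit.
    return 0 < len(o.text) <= 500

def cites_known_source(o: Output) -> bool:
    # Example automated check: references an approved source tag.
    return "[source:" in o.text

AUTOMATED_CHECKS: list[Callable[[Output], bool]] = [
    is_valid_format,
    cites_known_source,
]

def triage(outputs: list[Output]) -> tuple[list[Output], list[Output]]:
    """Split outputs into auto-verified and needs-human-review."""
    verified, for_humans = [], []
    for o in outputs:
        if all(check(o) for check in AUTOMATED_CHECKS):
            verified.append(o)
        else:
            for_humans.append(o)  # human effort concentrates here
    return verified, for_humans
```

The design point is that the verification layer, not the model, decides where human attention goes, so adding checks shrinks the human queue rather than growing it.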

Second is context-specific risk assessment, developed with the NSW Government Office for AI (Daniel Roelink). Rather than working through long lists of generic risks and controls, we narrow down to the small set that truly matters in a given AI design context, dramatically reducing the time spent on risk assessment. This also ensures effort is focused where it has real impact.

Third is how we think about use case prioritisation, in collaboration with the Audit Office of New South Wales (Xiaoyan Lu). Instead of evaluating each use case independently, we group them by shared technology patterns and invest in the underlying stack, along with reusable risk assessment/testing. Once that foundation is in place, the marginal cost of adding new use cases drops significantly.
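The economics of grouping by shared technology patterns can be made concrete with a toy cost model. The use cases, pattern names, and cost figures below are invented for illustration; the point is only that paying the pattern-level evaluation cost once per group drives down the marginal cost of each additional use case.

```python
# Toy model of use case prioritisation by shared technology pattern.
# Use cases, patterns, and costs are hypothetical and illustrative only.
from collections import defaultdict

use_cases = [
    {"name": "triage-emails",   "pattern": "rag-over-docs"},
    {"name": "policy-qa",       "pattern": "rag-over-docs"},
    {"name": "form-extraction", "pattern": "doc-extraction"},
]

# Assumed costs: evaluating a shared pattern once is expensive;
# adding a use case on an already-evaluated pattern is cheap.
PATTERN_COST, MARGINAL_COST = 10.0, 1.0

def total_cost(cases: list[dict]) -> float:
    groups = defaultdict(list)
    for c in cases:
        groups[c["pattern"]].append(c)
    # Pay the pattern cost once per group, marginal cost per use case.
    return sum(PATTERN_COST + MARGINAL_COST * len(g)
               for g in groups.values())

# Evaluating each use case independently repays the pattern cost every time.
independent = len(use_cases) * (PATTERN_COST + MARGINAL_COST)  # 33.0
shared = total_cost(use_cases)                                  # 23.0
```

Under these assumed numbers, the grouped approach already wins at three use cases, and the gap widens with every additional case on an existing pattern.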

What becomes clear across these examples is that AI does not automatically deliver system-level productivity gains. It often creates local improvements while exposing bottlenecks elsewhere. Without redesign, those bottlenecks will dominate.

Scalable AI, in practice, is about identifying and removing these constraints early, so the system can grow with low marginal cost.


About me – According to AI

Director/Head of CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
