
Takeaways on AI Safety from a Week in San Francisco

⭐️✨ It’s been an incredible week in San Francisco discussing AI Safety with experts across AI labs, civil society, government, and academia. A few key learnings:
πŸ” π€πˆ π’πšπŸπžπ­π² 𝐎𝐧π₯𝐲 𝐌𝐚𝐭𝐭𝐞𝐫𝐬 𝐚𝐭 𝐭𝐑𝐞 π’π²π¬π­πžπ¦ π‹πžπ―πžπ₯: Real capabilities emerge only when AI is tested in realistic environments with access to powerful tools, real-world settings, agentic planning, and ample inference time/self-correction. The new US-UK joint testing of Sonet-3.5 highlights this, particularly in tool use ablation analysis showing the capability difference.
βš™οΈ 𝐈𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 π“π’π¦πž π’πœπšπ₯𝐒𝐧𝐠 > π“π«πšπ’π§π’π§π  π“π’π¦πž π’πœπšπ₯𝐒𝐧𝐠?: Scaling at inference time, including temporary model updates, raw compute, and agent iteration numbers, is shaping up to be potentially more powerful than just training scaling laws. This opens up new possibilities for β€œsovereign AI.” No one entity controls the full β€œsovereign AI” stackβ€”focusing on strategic elements like inference capabilities/stack is more vital than ever for many.
πŸ§ͺ π“π‘πž π’πœπ’πžπ§πœπž 𝐨𝐟 π€πˆ π’πšπŸπžπ­π²: Robust evaluation is core to AI Safety. No amount of ad hoc red-teaming or probing will suffice. It demands a deep, multidisciplinary scientific approachβ€”measurement science, risk quantification, and the study of complex systems.

All of this deeply resonates with CSIRO’s Data61 approach (which I presented on various occasions) to AI safety engineering https://lnkd.in/gPhid9tX:
➡️ End-to-end system evaluation
➡️ Focus on inference-time capabilities as key sovereign AI leverage
➡️ Multidisciplinary scientific rigor for AI safety evaluation

And while in SF, I couldn’t miss fully autonomous driving: every trip was in a Waymo! 🚘✨ The ride? Smooth and confidently decisive, not the overly cautious or jerky experience I had worried about. From the outside, it’s still a spectacle at times: people smiling, staring, waving, taking photos.

But one intriguing thing: some social behaviours around autonomous vehicles are emerging. Jaywalkers or people crossing backstreets seem more comfortable stepping in front of an AV than in front of a human-driven car. Is it that they trust the AV more, or just don’t feel the need for politeness towards a machine? 🤔 Interesting times!


About Me

Research Director, CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
