Our Pioneering AI Safety Work Featured in the Latest Scientific Report

As foreshadowed in my last post, the much-anticipated scientific report on the safety of advanced AI is out, led by Yoshua Bengio for the AI Safety Summit in Seoul. https://lnkd.in/gkmKEYUC It’s a balanced and thoughtful report. CSIRO’s Chief Scientist, Bronwyn Fox, is on the Expert Advisory Panel, and I had the privilege of reviewing the draft. I’m happy to see CSIRO’s Data61 cited, highlighting three key approaches we have been championing internationally:

1. 𝐌𝐨𝐫𝐞 𝐀𝐈 𝐬𝐚𝐟𝐞𝐭𝐲 𝐫𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐝𝐨𝐧𝐞 𝐚𝐭 𝐭𝐡𝐞 𝐀𝐈 𝐬𝐲𝐬𝐭𝐞𝐦/𝐚𝐠𝐞𝐧𝐭-𝐥𝐞𝐯𝐞𝐥, 𝐛𝐞𝐲𝐨𝐧𝐝 𝐭𝐡𝐞 𝐀𝐈 𝐦𝐨𝐝𝐞𝐥 𝐥𝐞𝐯𝐞𝐥. Out-of-model guardrails, or the lack thereof (from smart input/output filters to sophisticated risk-mitigation components), and an AI model’s access to tools, knowledge bases, and other environment affordances play an outsized role in overall risk compared to the model itself. Our work on the system architecture of foundation-model-based agents was cited (a minimal sketch of the idea follows the citation):
Qinghua Lu, Liming Zhu, Xiwei (Sherry) Xu, Zhenchang Xing, Stefan Harrer, PhD, Jon Whittle, “𝑇𝑜𝑤𝑎𝑟𝑑𝑠 𝑅𝑒𝑠𝑝𝑜𝑛𝑠𝑖𝑏𝑙𝑒 𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑣𝑒 𝐴𝐼: 𝐴 𝑅𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝐴𝑟𝑐ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 𝑓𝑜𝑟 𝐷𝑒𝑠𝑖𝑔𝑛𝑖𝑛𝑔 𝐹𝑜𝑢𝑛𝑑𝑎𝑡𝑖𝑜𝑛 𝑀𝑜𝑑𝑒𝑙 𝑏𝑎𝑠𝑒𝑑 𝐴𝑔𝑒𝑛𝑡𝑠” (to be presented in two weeks at 𝐼𝐶𝑆𝐴 2024; https://lnkd.in/gjgsE7RK).
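
To make the system-level point concrete, here is a minimal Python sketch of out-of-model guardrails wrapped around one agent step. The filter policies, tool allow-list, and model stub are my illustrative assumptions, not the reference architecture from the paper:

```python
# Minimal sketch of out-of-model guardrails around an agent step.
# The policies below are toy assumptions, not the paper's design.

BLOCKED_TERMS = {"credit card", "password"}   # toy input-filter policy
ALLOWED_TOOLS = {"search", "calculator"}      # toy environment affordances

def input_filter(prompt: str) -> str:
    """Reject risky prompts before they ever reach the model."""
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        raise ValueError("prompt blocked by input guardrail")
    return prompt

def call_model(prompt: str) -> str:
    """Stand-in for the underlying foundation model (assumption)."""
    return f"model answer to: {prompt}"

def output_filter(answer: str) -> str:
    """Post-hoc check on model output, independent of the model."""
    return "[redacted]" if "password" in answer.lower() else answer

def invoke_tool(name: str, args: str) -> str:
    """Mediate all tool access through an allow-list guardrail."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{name}' not permitted")
    return f"{name}({args}) result"

def agent_step(prompt: str) -> str:
    # The risk controls live outside the model: filter in, filter out.
    return output_filter(call_model(input_filter(prompt)))

if __name__ == "__main__":
    print(agent_step("What is the capital of Australia?"))
```

The point of the sketch is that the filters and the tool mediator sit outside the model, so they can be audited, tightened, or swapped without retraining anything.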

2. 𝐌𝐨𝐫𝐞 𝐀𝐈 𝐬𝐚𝐟𝐞𝐭𝐲 𝐫𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐝𝐨𝐧𝐞 𝐟𝐫𝐨𝐦 𝐚𝐧 𝐞𝐧𝐝-𝐭𝐨-𝐞𝐧𝐝 𝐩𝐨𝐢𝐧𝐭 𝐨𝐟 𝐯𝐢𝐞𝐰. Many existing research efforts focus on a single step in the AI lifecycle, such as training. Yet the same risk often propagates through the entire lifecycle, with each step mitigating or amplifying it. Without seeing these steps side by side, gaps and duplicated effort can occur. Our work on a lifecycle view of privacy and copyright in generative AI was cited (a toy end-to-end trace follows the citation):
David Zhang et al., “𝑃𝑟𝑖𝑣𝑎𝑐𝑦 𝑎𝑛𝑑 𝐶𝑜𝑝𝑦𝑟𝑖𝑔ℎ𝑡 𝑃𝑟𝑜𝑡𝑒𝑐𝑡𝑖𝑜𝑛 𝑖𝑛 𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑣𝑒 𝐴𝐼: 𝐴 𝐿𝑖𝑓𝑒𝑐𝑦𝑐𝑙𝑒 𝑃𝑒𝑟𝑠𝑝𝑒𝑐𝑡𝑖𝑣𝑒” (𝐶𝐴𝐼𝑁 2024; https://lnkd.in/ghNMM8sg).
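
As a toy illustration of why the end-to-end view matters, the sketch below traces one privacy risk across lifecycle stages, with each stage amplifying or mitigating the residual risk. The stages and factors are invented for illustration, not figures from the paper:

```python
# Toy end-to-end trace of a single privacy risk across the lifecycle.
# A factor > 1 amplifies the residual risk; a factor < 1 mitigates it.
STAGES = [
    ("data collection", 1.0),   # PII enters the training corpus
    ("training", 1.3),          # memorisation amplifies leakage risk
    ("fine-tuning", 0.8),       # filtering / unlearning mitigates it
    ("deployment", 0.5),        # output filters catch some leaks
]

risk = 1.0  # normalised baseline risk at the start of the lifecycle
for stage, factor in STAGES:
    risk *= factor
    print(f"{stage:16s} factor={factor:.1f} residual risk={risk:.2f}")
# Seeing all stages together shows where mitigation effort pays off.
```

Optimising the training step alone would miss that, in this toy trace, deployment-time filters do much of the mitigation.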

3. 𝐌𝐨𝐫𝐞 𝐀𝐈 𝐬𝐚𝐟𝐞𝐭𝐲 𝐫𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐝𝐨𝐧𝐞 𝐟𝐫𝐨𝐦 𝐚 𝐦𝐮𝐥𝐭𝐢-𝐫𝐢𝐬𝐤 𝐭𝐫𝐚𝐝𝐞𝐨𝐟𝐟 𝐩𝐨𝐢𝐧𝐭 𝐨𝐟 𝐯𝐢𝐞𝐰, 𝐞𝐧𝐚𝐛𝐥𝐢𝐧𝐠 𝐬𝐭𝐚𝐤𝐞𝐡𝐨𝐥𝐝𝐞𝐫𝐬 𝐭𝐨 𝐦𝐚𝐤𝐞 𝐜𝐨𝐧𝐭𝐞𝐱𝐭-𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜 𝐝𝐞𝐜𝐢𝐬𝐢𝐨𝐧𝐬. Many existing research efforts take a single aspect and optimise it with a one-size-fits-all approach for the one set of stakeholders concerned about that risk. Our work on trading off privacy, fairness, and utility was cited (a toy tradeoff calculation follows the citation):
David Zhang et al., “𝑇𝑜 𝑏𝑒 𝑓𝑜𝑟𝑔𝑜𝑡𝑡𝑒𝑛 𝑜𝑟 𝑡𝑜 𝑏𝑒 𝑓𝑎𝑖𝑟: 𝑢𝑛𝑣𝑒𝑖𝑙𝑖𝑛𝑔 𝑓𝑎𝑖𝑟𝑛𝑒𝑠𝑠 𝑖𝑚𝑝𝑙𝑖𝑐𝑎𝑡𝑖𝑜𝑛𝑠 𝑜𝑓 𝑚𝑎𝑐ℎ𝑖𝑛𝑒 𝑢𝑛𝑙𝑒𝑎𝑟𝑛𝑖𝑛𝑔 𝑚𝑒𝑡ℎ𝑜𝑑𝑠,” 𝐴𝐼 𝑎𝑛𝑑 𝐸𝑡ℎ𝑖𝑐𝑠 4, 83–93 (2024). https://lnkd.in/gXfY49NZ
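
And as a toy illustration of the multi-risk tradeoff, the sketch below scores two hypothetical unlearning methods on privacy, fairness, and utility, then lets different stakeholders weight those dimensions for their own context. All names, scores, and weights are made up:

```python
# Toy multi-risk tradeoff: score hypothetical unlearning methods on
# privacy, fairness, and utility (all numbers invented), then let each
# stakeholder weight the dimensions for their own context.
METHODS = {
    "retrain-from-scratch": {"privacy": 0.95, "fairness": 0.90, "utility": 0.60},
    "approximate-unlearn":  {"privacy": 0.75, "fairness": 0.55, "utility": 0.90},
}
STAKEHOLDERS = {
    "regulator":     {"privacy": 0.6, "fairness": 0.3, "utility": 0.1},
    "product-owner": {"privacy": 0.2, "fairness": 0.2, "utility": 0.6},
}

def best_method(weights: dict) -> str:
    """Pick the method with the highest weighted score for a context."""
    score = lambda m: sum(weights[k] * v for k, v in METHODS[m].items())
    return max(METHODS, key=score)

for who, weights in STAKEHOLDERS.items():
    print(f"{who}: {best_method(weights)}")
```

Different weightings select different methods: there is no single context-free optimum, which is why stakeholders need the tradeoff surface rather than one pre-optimised answer.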

Overall, our vision is a more nuanced and balanced approach than a blind “shift-left” methodology that advocates comprehensive risk elimination early or purely by design. One cannot effectively mitigate uncertainty at a high level of abstraction without downstream artifacts, data, and feedback. The key is balance, supported by measurement science and mutually reinforcing technical mitigations across the AI supply chain.


For more information, see here:
Book: https://lnkd.in/gsQz5swy
Science: https://lnkd.in/gPhid9tX


About Me


Research Director, CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
