Our Pioneering AI Safety Work Featured in the Latest Scientific Report

As foreshadowed in my last post, the most anticipated scientific report on the safety of advanced AI is out, led by Yoshua Bengio for the AI Safety Summit in Seoul: https://lnkd.in/gkmKEYUC . It’s a balanced and thoughtful report. CSIRO’s Chief Scientist, Bronwyn Fox, is on the Expert Advisory Panel, and I had the privilege of reviewing the draft. I’m happy to see CSIRO’s Data61 cited, highlighting three key approaches we have been championing internationally:

1. More AI safety research must be done at the AI system/agent level, beyond the AI model level. Out-of-model guardrails, or the lack thereof (from smart input/output filters to sophisticated risk mitigation components), and the AI model’s access to tools, knowledge bases, and other environment affordances play an outsized role in risk compared to the AI model itself. Our work on the system architecture of foundation-model-based agents was cited:
Qinghua Lu, Liming Zhu, Xiwei (Sherry) Xu, Zhenchang Xing, Stefan Harrer, PhD, Jon Whittle, “Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents” (to be presented in two weeks’ time at ICSA 2024; https://lnkd.in/gjgsE7RK ).

2. More AI safety research must be done from an end-to-end point of view. Many existing research efforts focus on a single step in the AI lifecycle, such as training. Often the same risk propagates through the entire lifecycle, with each step mitigating or amplifying it. Without viewing these steps side by side, gaps or wasted effort can go unnoticed. Our work on a lifecycle view of privacy and copyright in Generative AI was cited:
David Zhang et al., “Privacy and Copyright Protection in Generative AI: A Lifecycle Perspective” (CAIN 2024; https://lnkd.in/ghNMM8sg).

3. More AI safety research must be done from a multi-risk tradeoff point of view, enabling stakeholders to make context-specific decisions. Many existing research efforts take a single risk and optimise for it with a general approach, serving only the stakeholders concerned about that risk. Our work on trading off privacy, fairness, and utility was cited:
David Zhang et al., “To be forgotten or to be fair: unveiling fairness implications of machine unlearning methods,” AI and Ethics 4, 83–93 (2024). https://lnkd.in/gXfY49NZ

Overall, our vision is a more nuanced and balanced approach than the blind “shift-left” methodology, which advocates for comprehensive risk elimination early or by design. One cannot effectively mitigate uncertainties too early, in the abstract, without downstream artifacts, data, and feedback. The key is balance, supported by measurement science and mutually reinforcing technical mitigations across the supply chain.


For more information, see here:
Book: https://lnkd.in/gsQz5swy
Science: https://lnkd.in/gPhid9tX


About Me

Research Director, CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
