As foreshadowed in my last post, the most anticipated scientific report on the safety of advanced AI is out, led by Yoshua Bengio for the AI Safety Summit in Seoul. https://lnkd.in/gkmKEYUC It’s a balanced and thoughtful report. CSIRO’s Chief Scientist, Bronwyn Fox, is on the Expert Advisory Panel, and I had the privilege of reviewing the draft. I’m happy to see CSIRO’s Data61 cited, highlighting three key approaches we have been championing internationally:
1. More AI safety research must be done at the AI system/agent level, beyond the AI model level. Out-of-model guardrails, or the lack thereof (from smart input/output filters to sophisticated risk-mitigation components), together with an AI model’s access to tools, knowledge bases, and other environment affordances, play an outsized role in overall risk compared with the AI model itself. Our work on the system architecture of foundation-model-based agents was cited:
Qinghua Lu, Liming Zhu, Xiwei (Sherry) Xu, Zhenchang Xing, Stefan Harrer, PhD, Jon Whittle, “Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents” (to be presented in two weeks at ICSA 2024; https://lnkd.in/gjgsE7RK).
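To make the system-level point concrete, here is a toy sketch (my own illustration, not from the cited paper) of an out-of-model guardrail: input and output filters wrap whatever model is behind `call_model`, so the risk controls live in the system architecture rather than in the model weights. The blocklist and filter logic are deliberately simplistic placeholders.

```python
# Illustrative sketch of an out-of-model guardrail. `call_model` is a
# hypothetical stand-in for any LLM API; the blocklist is a placeholder
# for real input/output risk-mitigation components.

BLOCKED_TERMS = {"credit card number", "home address"}

def input_filter(prompt: str):
    """Refuse prompts that mention blocked content; return None to refuse."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return None
    return prompt

def output_filter(response: str) -> str:
    """Redact blocked terms that slip through the model anyway."""
    for term in BLOCKED_TERMS:
        response = response.replace(term, "[REDACTED]")
    return response

def guarded_call(prompt: str, call_model) -> str:
    """System-level guardrail: filters sit outside the model itself."""
    safe_prompt = input_filter(prompt)
    if safe_prompt is None:
        return "Request refused by input guardrail."
    return output_filter(call_model(safe_prompt))
```

The point of the sketch is that `guarded_call` works for any model: swapping the model does not change the guardrail, which is exactly why safety analysis at the system level differs from analysis of the model alone.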
2. More AI safety research must be done from an end-to-end point of view. Many existing research efforts focus on a single step in the AI lifecycle, such as training. Often the same risk propagates through the entire lifecycle end to end, with each step mitigating or amplifying it. Without seeing these steps next to each other, gaps and wasteful duplication of effort can occur. Our work on a lifecycle view of privacy and copyright in Generative AI was cited:
David Zhang et al., “Privacy and Copyright Protection in Generative AI: A Lifecycle Perspective” (CAIN 2024; https://lnkd.in/ghNMM8sg).
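The end-to-end point can be caricatured in a few lines. This is a deliberately toy model of my own, not anything from the cited paper: a single risk passes through every lifecycle stage, and each stage multiplies it by an amplifying or mitigating factor, so judging any one stage in isolation says little about the residual risk.

```python
# Toy model of risk propagation across hypothetical AI lifecycle stages.
# Stage names and multiplicative factors are invented for illustration:
# factors > 1 amplify the risk, factors < 1 mitigate it.

LIFECYCLE = [
    ("data collection",   1.5),  # broad scraping amplifies exposure
    ("training",          0.8),  # dedup/filtering mitigates somewhat
    ("fine-tuning",       1.2),  # narrow data can re-amplify
    ("deployment filter", 0.3),  # output guardrail mitigates strongly
]

def residual_risk(initial: float, stages=LIFECYCLE) -> float:
    """Carry one risk through every stage, end to end."""
    risk = initial
    for _name, factor in stages:
        risk *= factor
    return risk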
3. More AI safety research must be done from a multi-risk trade-off point of view, enabling stakeholders to make context-specific decisions. Many existing research efforts take a single risk and optimise it with a general-purpose approach for the one set of stakeholders concerned about that risk. Our work on trading off privacy, fairness, and utility was cited:
David Zhang et al., “To be forgotten or to be fair: unveiling fairness implications of machine unlearning methods,” AI and Ethics 4, 83–93 (2024). https://lnkd.in/gXfY49NZ
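To illustrate the multi-risk trade-off point (again a hypothetical sketch of my own, not the method in the cited paper): if each candidate mitigation carries privacy, fairness, and utility scores, then different stakeholder weightings legitimately select different candidates, which is why these decisions must be context-specific rather than globally optimised.

```python
# Illustrative sketch: choosing among mitigation configurations that trade
# off privacy, fairness, and utility. All names, scores, and weights are
# invented; different stakeholders supply different weights.

CANDIDATES = {
    "strong-unlearning": {"privacy": 0.9, "fairness": 0.5, "utility": 0.6},
    "mild-unlearning":   {"privacy": 0.6, "fairness": 0.8, "utility": 0.8},
    "no-unlearning":     {"privacy": 0.2, "fairness": 0.9, "utility": 0.95},
}

def best_for(weights: dict) -> str:
    """Pick the candidate maximising this stakeholder's weighted score."""
    def score(scores: dict) -> float:
        return sum(weights[k] * scores[k] for k in weights)
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))
```

A regulator weighting privacy heavily and a product team weighting utility heavily will pick different rows of the same table, and under this framing neither choice is wrong: the trade-off surface is shared, the decision is contextual.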
Overall, our vision is a more nuanced and balanced approach than a blind “shift-left” methodology, which advocates eliminating risks comprehensively early on or by design. One cannot effectively mitigate uncertainties too early, at a high level of abstraction, without downstream artifacts, data, and feedback. The key is balance, supported by measurement science and by mutually reinforcing technical mitigations across the supply chain.
For more information, see here:
Book: https://lnkd.in/gsQz5swy
Science: https://lnkd.in/gPhid9tX