Bridging Gaps in AI Safety: New Framework for Comprehensive Evaluation

I’m excited to share our new paper on AI safety evaluation!

One of the biggest challenges I’ve encountered in my work is the confusion caused when different communities use the same terms to mean very different things. For example, “AI testing” means one thing to policymakers, another to AI system/software testers, and yet another to AI model testers. Add terms like evaluation, validation, verification, and assessment into the mix, and the misunderstandings multiply. Even authoritative sources like the OECD, AI safety summits, and government agencies often make glaring mistakes by approaching these terms from a single perspective.

Simply defining these terms won’t solve the problem, as each community has its own authoritative definitions and practical meanings. This divergence can hinder efforts to assure AI model safety and AI system safety, leading to gaps, wasteful overlap, and confusion among AI model developers, AI system developers, and AI system deployers.

Our new paper, “An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping” (Boming Xia, Qinghua Lu, Zhenchang Xing), aims to bridge these gaps by positioning each community’s distinctive understanding in relation to the others. We also cover concepts like capability evaluation, benchmarks, red teaming, test data, test cases, and test suites to enhance AI safety from both model-level and system-level perspectives.

With the upcoming AI Safety Summit in South Korea and the imminent release of the International Scientific Report on Advanced AI Safety: Interim Report, led by Yoshua Bengio, this field is poised for significant growth. (I had the privilege of reviewing the report and seeing CSIRO’s Data61 work cited multiple times.) We hope our work clarifies directions, fosters better communication across communities, and contributes to Australia’s AI Safety journey (National AI Centre).

Paper: https://arxiv.org/abs/2404.05388


About Me

Research Director, CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
