Runtime Guardrails for Foundation Models

The word “guardrail” has recently become very popular as a major approach for AI developers, deployers, and regulators to achieve responsible and safe AI. But what does it mean exactly? 🤔

For some, it means anything that helps safeguard and achieve responsible and safe AI: governance practices, stakeholder engagement, design, testing/assessment/evaluation and transparency mechanisms before deployment, and post-deployment runtime controls. However, this might be stretching the word a bit far. 🛤️

Taken literally, guardrails, i.e., protective, derailment-preventing rails, are less about making the train itself safer and more about stopping the train from derailing at runtime through outside control.

This is especially relevant for advanced, accelerating trains we do not fully understand and find hard to steer, a.k.a. AI. Once you deploy an AI model or AI system, whether developed by others or by yourself, you need to control, steer, and safeguard it via runtime guardrails built for your specific organisational context, risk appetite, and risk profiles. This is particularly crucial for the many organisations using third-party AI models, especially the less controllable/steerable foundation models. 🔧
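To make this concrete, here is a minimal, hypothetical sketch in Python of one kind of runtime guardrail: a wrapper that intercepts a deployed model's inputs and outputs and enforces an organisation-specific policy at inference time. The names (GuardrailPolicy, with_runtime_guardrails) and the toy rules are illustrative assumptions, not the framework from our paper.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical organisational policy: topics the deployed model must not handle,
# and output patterns (here, email addresses as a stand-in for PII) that must
# never leave the system. Tune these to your own context and risk appetite.
@dataclass
class GuardrailPolicy:
    blocked_topics: List[str] = field(default_factory=lambda: ["medical advice", "legal advice"])
    blocked_output_patterns: List[str] = field(default_factory=lambda: [r"[\w.+-]+@[\w-]+\.[\w.]+"])
    refusal_message: str = "Sorry, I can't help with that request."

def with_runtime_guardrails(model: Callable[[str], str], policy: GuardrailPolicy) -> Callable[[str], str]:
    """Wrap any text-in/text-out model (local or third-party API) with
    input and output checks that run at inference time."""
    def guarded(prompt: str) -> str:
        # Input guardrail: refuse prompts touching disallowed topics.
        lowered = prompt.lower()
        if any(topic in lowered for topic in policy.blocked_topics):
            return policy.refusal_message
        response = model(prompt)
        # Output guardrail: redact patterns the policy forbids from leaving.
        for pattern in policy.blocked_output_patterns:
            response = re.sub(pattern, "[REDACTED]", response)
        return response
    return guarded

if __name__ == "__main__":
    # Stand-in for a foundation model call (e.g., a hosted API).
    def toy_model(prompt: str) -> str:
        return f"Echoing your request: {prompt}. Contact us at support@example.com."

    guarded_model = with_runtime_guardrails(toy_model, GuardrailPolicy())
    print(guarded_model("Summarise this report"))   # allowed; email redacted
    print(guarded_model("Give me medical advice"))  # blocked at input
```

The wrapper pattern is the point: the checks sit outside the model, so an organisation can update them to match its own risk profile without retraining, or even having access to, the underlying third-party model.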

While CSIRO’s Data61 works on some very specific guardrails for AI and agentic AI, we have just released a general paper on runtime guardrails and all the associated concepts. 📄✨ https://lnkd.in/guSScDcg


