FGEA Symposium Talk: Data Trust at Scale and RAI

It was a pleasure to deliver a talk last week at The Future Generation Enterprise Architecture (FGEA) Symposium hosted by ACS (Australian Computer Society) and organised by Asif Gill.

I focused on data trust at scale, challenging some conventional views on data quality in distributed, at-scale, AI-amenable environments. Some lessons come from CSIRO’s Data61 technology powering data.gov.au, the early years of hosting the Data Standards Body for Australia’s Consumer Data Right (e.g. open banking), and cross-organisation/border/supply chain data flow projects, not to mention the science and tech development for data/AI safety. Here are a few key points and selected slides:

๐“๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐ƒ๐š๐ญ๐š โ€“ ๐๐ž๐ฒ๐จ๐ง๐ ๐๐ข๐š๐ฌ ๐š๐ง๐ ๐‘๐ž๐ฉ๐ซ๐ž๐ฌ๐ž๐ง๐ญ๐š๐ญ๐ข๐จ๐ง
* Distributed Trust: Ensuring data integrity across various organisations or on-device data without direct oversight.
* Zero Trust: Unsupervised learning from online wild data is susceptible to data poisoning attacks and can’t be easily cleaned up.
* Trusted “License”: It’s about data rights and value redistribution, not just ownership or copyright.
* Trusted Artificial: Synthetic data can be useful, but when should we trust it?

๐“๐ž๐ฌ๐ญ๐ข๐ง๐ /๐•๐š๐ฅ๐ข๐๐š๐ญ๐ข๐จ๐ง ๐ƒ๐š๐ญ๐š โ€“ ๐๐ž๐ฒ๐จ๐ง๐ ๐‡๐ฎ๐ฆ๐š๐ง ๐…๐ž๐ž๐๐›๐š๐œ๐ค ๐š๐ฌ ๐ญ๐ก๐ž ๐†๐จ๐ฅ๐ ๐’๐ญ๐š๐ง๐๐š๐ซ๐

* Trust in Evaluation Data: Accidental data leaks can invalidate testing outcomes, so tread carefully.
* Trust in Human Feedback: Human feedback is often unreliable, so using it necessitates nuanced evaluation.
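
To make the evaluation-data point concrete, here is a minimal sketch of one simple leak check: flagging evaluation examples that also appear, after light normalisation, in the training data. The function names and the sample strings are hypothetical and purely illustrative; they are not from the talk.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a whitespace- and case-normalised copy of the text."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_leaked_examples(train_texts, eval_texts):
    """Return evaluation examples whose fingerprint also occurs in the training data."""
    train_hashes = {fingerprint(t) for t in train_texts}
    return [e for e in eval_texts if fingerprint(e) in train_hashes]


# Hypothetical sample data, for illustration only.
train = ["The cat sat on the mat.", "Data trust at scale"]
evaluation = ["the cat  sat on the mat.", "A genuinely unseen example"]

print(find_leaked_examples(train, evaluation))
```

A check like this only catches near-verbatim duplicates. Paraphrased or partially overlapping leaks would need fuzzier matching, so a clean result here should not be read as proof that the evaluation set is uncontaminated.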

๐’๐ฒ๐ฌ๐ญ๐ž๐ฆ-๐‹๐ž๐ฏ๐ž๐ฅ ๐ƒ๐š๐ญ๐š – ๐๐ž๐ฒ๐จ๐ง๐ ๐Œ๐จ๐๐ž๐ฅ ๐ƒ๐š๐ญ๐š
* Trusted Knowledge: It’s not just about training data. Addressing inconsistencies within your inference-time data sources and between external knowledge and AI knowledge is crucial.
* Trusted Trade-off: It’s never about single-dimension optimisation. Balancing privacy, fairness, and accuracy requires stakeholder involvement before and after deployment in context.
* Trusted Provenance: Ensuring data provenance throughout the data lifecycle is essential to combat low-quality decisions and misinformation.

Slides – see LinkedIn post: https://www.linkedin.com/feed/update/urn:li:activity:7214394551518519296/


About Me

Research Director, CSIRO’s Data61
Conjoint Professor, CSE UNSW

For other roles, see LinkedIn & Professional activities.

If you’d like to invite me to give a talk, please see here & email liming.zhu@data61.csiro.au
