Beyond Training: Expanding AI Capabilities Without Massive Compute Costs

I often get asked which underlying foundation model to choose for a given use case, and whether an organisation or a country should even train its own, given the immense compute and talent challenges. It’s a complex space to navigate. 🌐

The question has only become more perplexing in light of some intriguing recent findings, such as:
1. Picking and merging existing models without significant compute can outperform more resource-intensive approaches (a minimal merging sketch follows this list). https://lnkd.in/gtCYmAZE
2. Scaling inference-time compute can yield better results than scaling model size. https://lnkd.in/gQJizp8q
3. Simple scaling strategies like “more agents/XYZ-is-all-you-need” (e.g., https://lnkd.in/gAqbdbj3) can outperform convoluted scaffolding and prompt designs; see the voting sketch below.
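
To make point 1 concrete, here is a minimal sketch of the simplest form of model merging: uniform weight averaging in the spirit of “model soups”. It is illustrative only, not the method from our paper, and it assumes all checkpoints share one architecture; the function name and file paths are mine.

```python
# Minimal sketch: weight-space model merging by uniform averaging
# ("model soup" style). Assumes every checkpoint shares the same
# architecture and parameter names. Illustrative, not the paper's method.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average several PyTorch state dicts, optionally weighted per model."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Usage with hypothetical checkpoint files:
# sds = [torch.load(p, map_location="cpu") for p in ["a.pt", "b.pt", "c.pt"]]
# model.load_state_dict(merge_state_dicts(sds))
```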

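Points 2 and 3 can be just as simple at the code level. Here is a toy sketch of sampling-and-voting (the “more agents” idea): ask the model the same question several times and keep the most common answer. `query_model` is a hypothetical stand-in for whatever LLM call you use, not a real API.

```python
# Toy sketch of inference-time scaling via sampling and voting.
# `query_model` is a hypothetical callable wrapping any LLM API.
import random
from collections import Counter

def majority_vote(question, query_model, n_samples=10):
    """Sample n answers independently and return the most common one."""
    answers = [query_model(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Demo with a fake "model" that is right only 60% of the time; voting
# across 25 samples is correct far more often than any single call.
fake_model = lambda q: "42" if random.random() < 0.6 else random.choice(["41", "43"])
print(majority_vote("What is 6 * 7?", fake_model, n_samples=25))
```
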
We just released a paper to help you navigate the architecture options for building foundation-model-based agents and applications. It covers everything from design-time model building, selection, and merging to runtime model/agent coordination, external capability access, and the usual reflection, planning, and learning mechanisms. 🛠️

One tantalising takeaway is that an organisation or country might not always need to spend huge resources to “train” a model or settle for adapting at the edge for specific uses. There are many options to 𝐬𝐢𝐠𝐧𝐢𝐟𝐢𝐜𝐚𝐧𝐭𝐥𝐲 𝐠𝐫𝐨𝐰 𝐭𝐡𝐞 𝐠𝐞𝐧𝐞𝐫𝐚𝐥 𝐜𝐚𝐩𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬 𝐨𝐟 𝐚𝐧 𝐀𝐈 𝐬𝐲𝐬𝐭𝐞𝐦, 𝐰𝐡𝐞𝐭𝐡𝐞𝐫 𝐬𝐨𝐯𝐞𝐫𝐞𝐢𝐠𝐧 𝐨𝐫 𝐨𝐭𝐡𝐞𝐫𝐰𝐢𝐬𝐞. What we need is more careful research and analysis. 🚀

Paper link: https://lnkd.in/gUj_wbmg


