I often get asked which underlying foundation model to choose for a given use case, and whether an organisation or a country should even train one, given the immense compute and talent challenges involved. It's a complex space to navigate.
The question becomes even more perplexing in light of some intriguing recent findings, such as:
1. Picking and merging existing models without significant compute can outperform more resource-intensive approaches (a toy merging sketch follows this list). https://lnkd.in/gtCYmAZE
2. Scaling inference-time compute can yield better results than scaling model size (see the sampling-and-voting sketch below). https://lnkd.in/gQJizp8q
3. Simple scaling strategies in the "more agents/XYZ is all you need" vein can outperform convoluted scaffolding and prompt designs, e.g. https://lnkd.in/gAqbdbj3
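
For the curious, the simplest form of finding 1 is just parameter-wise interpolation between two compatible checkpoints. A toy Python sketch, not the method from any of the linked papers; the checkpoint paths and the 50/50 weighting are purely illustrative:

import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    # Interpolate parameter-wise between two same-architecture state dicts.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Hypothetical checkpoint files; any two models with identical architecture work.
sd_a = torch.load("model_a.pt")
sd_b = torch.load("model_b.pt")
merged = merge_state_dicts(sd_a, sd_b)  # no gradient steps, no training compute
torch.save(merged, "merged_model.pt")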
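
Findings 2 and 3 share a simple core: spend inference compute by sampling the same model several times and taking a majority vote over the answers. Another toy sketch, where generate() is a stochastic stand-in for any real model call, not a specific API:

import random
from collections import Counter

def generate(prompt: str) -> str:
    # Toy stand-in for a stochastic LLM call; swap in a real model API.
    return random.choice(["42", "42", "41"])

def sample_and_vote(prompt: str, n: int = 8) -> str:
    # More samples means more inference compute, and usually a better answer.
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(sample_and_vote("What is 6 * 7?"))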
We just released a paper to help you navigate the architecture options for building foundation model-based agents and applications. It covers everything from design-time model building, selection, and merging, to runtime model/agent coordination, external capability access, and the usual reflection, planning, and learning considerations.
One tantalising takeaway is that an organisation or country might not always need to spend huge resources to "train" a model, or settle for adapting one at the edge for a specific use. There are many options to significantly grow the general capabilities of an AI system, whether sovereign or otherwise. What we need is more careful research and analysis.
Paper link: https://lnkd.in/gUj_wbmg