vessenes
So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminative scoring model on very high-quality data (JEST is mostly concerned with multimodal tasks, it seems, so think image/text caption pairs), have that model score ‘maximally learnable’ batches from a larger, lower-quality dataset, then train the big model using those scores.
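
For intuition, here is a minimal sketch (plain Python/NumPy, my own illustration rather than the paper's code) of the kind of learnability scoring described above: score each candidate in a large super-batch by learner loss minus reference-model loss, then keep the top scorers for the actual training step. The function names, batch sizes, and loss values are all assumptions for illustration; JEST's real selection is done jointly over sub-batches and is considerably more involved.

```python
import numpy as np

def learnability_scores(learner_loss, reference_loss):
    """Score each candidate example as (loss under the big learner) minus
    (loss under the small reference model trained on high-quality data).
    High scores roughly mean: the learner still finds this hard, but the
    reference model finds it easy, i.e. 'maximally learnable' data."""
    return learner_loss - reference_loss

def select_batch(learner_loss, reference_loss, batch_size):
    """Pick the top-scoring candidates out of a larger super-batch."""
    scores = learnability_scores(learner_loss, reference_loss)
    return np.argsort(scores)[-batch_size:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pretend per-example losses for a super-batch of 4096 candidates
    # (purely synthetic numbers, just to exercise the selection logic).
    learner_loss = rng.gamma(2.0, 1.0, size=4096)
    reference_loss = rng.gamma(2.0, 0.5, size=4096)
    chosen = select_batch(learner_loss, reference_loss, batch_size=512)
    print(f"selected {len(chosen)} of 4096 candidates for this training step")
```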

This turns out to be a significant FLOPs and quality win, even accounting for the initial scoring-model training and the scoring itself: they claim roughly a 10x improvement in the quality/FLOP tradeoff, and they show numbers significantly beating SOTA on some tasks at their model size.

The bad part, to me, is that this takes some significant engineering: it requires known high-quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This is not a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others turn it into a production pipeline, though that might take a while. It looks relatively gnarly in the details on the data and scoring side.

morbicer
Nice. Google scientists come up with a groundbreaking idea, then Google's PMs bungle the chance to bring it to market and productize it, and someone like OpenAI or Anthropic swoops in to reap the rewards. And the cycle repeats.

Google people invent transformers, and then they watch everyone laugh at Bard, or whatever it's called nowadays, because product and engineering lost the plot. Kodak is paging you from the grave, Google; read the message.

eutropia
https://arxiv.org/pdf/2406.17711 - link to the paper
kelseyfrog
Great, improvements in efficiency will lead to greater resource consumption due to Jevons Paradox[1].

1. https://en.wikipedia.org/wiki/Jevons_paradox

ricopags
Pretty similar to Cappy: https://arxiv.org/abs/2311.06720
swax
AI advancement is coming at us from both directions: orders of magnitude more compute, combined with orders of magnitude more efficiency. Hyper-exponential.