diggan
> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (first of its kind [from a larger company] maybe even?), includes all you need to be able to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM

n_ary
Now this here is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will soon see some real adaptation and development in the next few thousand days. At this moment, I see OAI as the Yahoo of the pre-Google era.
highfrequency
Looks like they are using sixteen $13k GPUs [1] (around $210k of hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...
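A quick back-of-the-envelope on the numbers above. The $2.50/GPU-hour rental rate is a placeholder, not a real quote from any provider:

```python
# Hardware cost cited above: sixteen GPUs at ~$13k each
gpu_count = 16
gpu_unit_price = 13_000
hardware_cost = gpu_count * gpu_unit_price   # 208,000 -- the "~$210k" figure

# Rental estimate for the 6-day run, at an ASSUMED $2.50 per GPU-hour
assumed_hourly_rate = 2.50
train_hours = 6 * 24
rental_estimate = gpu_count * train_hours * assumed_hourly_rate

print(hardware_cost, rental_estimate)
```

So at that assumed rate, renting for one run would be a few thousand dollars, versus ~$210k to buy.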

benterix
I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to make this kind of development and I hope they continue.

luyu_wu
The section on speculative execution is interesting. "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!
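For anyone unfamiliar with the technique the quote refers to (usually called speculative decoding): a small draft model proposes several tokens cheaply, and the large target model verifies them, accepting a prefix of the proposal. A minimal toy sketch, with stand-in functions instead of real models:

```python
import random

def draft_model(ctx, k):
    """Toy stand-in for a small, fast draft model: proposes k next tokens."""
    rng = random.Random(len(ctx))
    return [rng.randint(0, 9) for _ in range(k)]

def target_model(ctx):
    """Toy stand-in for the large model's greedy next-token choice."""
    return (sum(ctx) * 7 + 3) % 10

def speculative_decode(ctx, k=4, steps=8):
    out = list(ctx)
    while steps > 0:
        proposal = draft_model(out, k)
        for tok in proposal:
            # In a real system the target model scores all k draft
            # positions in ONE forward pass; calling it per token here
            # just keeps the sketch simple.
            if tok != target_model(out):
                break               # first mismatch: discard the rest
            out.append(tok)         # draft token accepted "for free"
            steps -= 1
            if steps == 0:
                return out
        out.append(target_model(out))  # fall back to one target-model token
        steps -= 1
    return out
```

Note the output is identical to decoding with the target model alone; the win is fewer target-model forward passes, not "orders of magnitude" end-to-end speedup, which does sound overstated.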

craftkiller
I see multiple mentions of NPU on this page, but it's still not clear to me: is this something that can finally use the NPU on my processor?
loufe
It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.
bjt12345
> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well with AMD hardware, and I had read of many people using JAX instead?
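PyTorch does ship official ROCm builds for AMD GPUs these days (and the quoted line confirms AMD trained with PyTorch FSDP). One way to check which backend your own PyTorch build targets — ROCm builds set `torch.version.hip`, while CUDA builds leave it `None`:

```python
def rocm_pytorch_available():
    """Return True if this PyTorch build targets AMD's ROCm/HIP stack."""
    try:
        import torch
    except ImportError:
        return False  # no PyTorch installed at all
    # ROCm builds of PyTorch populate torch.version.hip; on those builds
    # AMD GPUs are still exposed through the torch.cuda API.
    return getattr(torch.version, "hip", None) is not None

print(rocm_pytorch_available())
```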

Decabytes
Since most people can’t run these LLMs locally, I wonder what a system would look like where we have hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc. You have a director model that interprets which downstream model should be used and then runs it. That way you can run the models locally without needing beefy GPUs. It’s a trade-off of using more disk space vs needing more VRAM.
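The "director" stage above could be sketched like this. The keyword matching and the model names are made up for illustration; a real router would itself be a small classifier model:

```python
def route(prompt: str) -> str:
    """Hypothetical director: pick a specialist model for the prompt.

    Crude keyword matching stands in for what would really be a small
    classification model; the specialist names are invented.
    """
    p = prompt.lower()
    if any(w in p for w in ("function", "compile", "bug", "stack trace")):
        return "code-specialist"
    if any(w in p for w in ("poem", "story", "essay")):
        return "prose-specialist"
    return "generalist"
```

Only the chosen specialist needs to be resident in VRAM at a time, which is exactly the disk-for-VRAM trade-off described above.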
rsolva
Can this model run on Ollama?
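It isn't in Ollama's model library, but Ollama can import a local GGUF file via a Modelfile — assuming you first convert the released weights to GGUF yourself (e.g. with llama.cpp's conversion script; the filename below is hypothetical):

```shell
# Assumes amd-llama-135m.gguf was produced by converting the released
# weights to GGUF -- the file and tag names here are made up.
cat > Modelfile <<'EOF'
FROM ./amd-llama-135m.gguf
EOF

ollama create amd-135m -f Modelfile
ollama run amd-135m "Hello"
```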