Addition is All You Need for Energy-efficient Language Models

datapalo

3h ago

huggingface.co

5

akrymski

Fantastic result, on par with another similar effort: https://arxiv.org/pdf/2406.02528

It seems to me that we've stumbled upon this method of GPU-heavy matrix multiplication in deep neural nets, and have only scratched the surface of alternative methods, such as Tsetlin Machines and Hyperdimensional Vectors, that are actually optimized for current CPU architectures.