In 3 years, we will see LLM ASICs on a USB stick.
This paper eliminates the need for costly matrix multiplications in LLMs, claiming more than a 10x reduction in memory use at inference time. If a 70B model can run in the memory footprint of a 7B model, we will be running these things on phones.
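The rough idea: constrain the weights to {-1, 0, +1}, so a "matmul" against a weight matrix collapses into additions and subtractions of activation values, and each weight needs under 2 bits of storage instead of 16. Here is a minimal Python sketch of that idea, not the paper's actual code; the ternary_linear helper is hypothetical and just illustrates the multiplication-free accumulation:

```python
import numpy as np

def ternary_linear(x, w_ternary):
    """x: (d_in,) activations; w_ternary: (d_in, d_out) with entries in {-1, 0, +1}."""
    out = np.zeros(w_ternary.shape[1], dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        col = w_ternary[:, j]
        # No multiplications: add where the weight is +1, subtract where it is -1.
        out[j] = x[col == 1].sum() - x[col == -1].sum()
    return out

# Quick sanity check against an ordinary matmul with the same ternary weights.
rng = np.random.default_rng(0)
x = rng.standard_normal(8).astype(np.float32)
w = rng.integers(-1, 2, size=(8, 4)).astype(np.float32)
assert np.allclose(ternary_linear(x, w), x @ w, atol=1e-5)
```

The memory win comes from the weight representation (ternary instead of 16-bit floats), and the compute win from replacing multiply-accumulate with pure accumulate, which is exactly the kind of operation a cheap ASIC handles well.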


arXiv.org: Scalable MatMul-free Language Modeling