https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
The Gemma 3n model has so many optimizations that it’s hard to keep track of them. Take a read through the post and you’ll be up to speed with nearly every memory and compute optimization that’s been invented in the last few years.
I think the most interesting feature is the MatFormer architecture. Just as Matryoshka embeddings let you lop off the tail of an embedding and pick a size based on how much accuracy you need, MatFormer lets you shrink the model itself on the fly depending on how much memory and compute you have available (rough sketch of the idea below).
AFAIK this is a novel architecture for LLMs.
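Here's a toy numpy sketch of the nesting trick as I understand it. The names, shapes, and the plain ReLU FFN are my own illustration, not Gemma 3n's actual implementation; the point is just that one set of FFN weights is trained so that prefixes of the hidden dimension also work on their own, so you can slice out a smaller sub-model at inference time:

```python
import numpy as np

def ffn_forward(x, w_in, w_out, hidden_dims):
    """Run the feed-forward layer using only the first `hidden_dims` hidden units."""
    h = np.maximum(x @ w_in[:, :hidden_dims], 0.0)  # ReLU over a sliced up-projection
    return h @ w_out[:hidden_dims, :]               # matching slice of the down-projection

d_model, d_hidden = 64, 256
rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.02, size=(d_model, d_hidden))
w_out = rng.normal(scale=0.02, size=(d_hidden, d_model))
x = rng.normal(size=(1, d_model))

full = ffn_forward(x, w_in, w_out, hidden_dims=256)  # full-width sub-model
small = ffn_forward(x, w_in, w_out, hidden_dims=64)  # quarter-width, cheaper sub-model
print(full.shape, small.shape)  # both (1, 64): same interface, different cost
```

In the real thing the nested sub-models are trained jointly so the smaller slices are actually good, which is what makes the whole trick useful rather than just lossy truncation.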
Also, side note: it's kind of embarrassing that after all the stupid games Meta played to get their huge, crappy Llama 4 models to the top of LMArena, Gemma 3n still beat them with a tiny model.