https://deepmind.google/models/gemini-diffusion/
This is really exciting stuff. Google released a diffusion language model. The speeds you can get with this type of architecture are absolutely nuts.
It's also good to see AI labs experimenting with different architectures. I've worried for a while that AI development may have fallen into a local minimum and that we haven't investigated other approaches deeply enough. Perhaps Transformers aren't the best possible architecture for language models. If you iterate on a suboptimal solution long enough, you can end up outperforming alternatives that start out worse but would eventually lead to much better results if pursued with the same effort. Narrowing the search space too soon and focusing only on Transformer-like architectures risks missing out on those better solutions. Trying other architectures expands the search space.