That's not really DeepSeek R1; it's a distilled version of Alibaba's Qwen-32B architecture, enhanced using synthetic outputs from the larger DeepSeek R1 model. Quite useful, but not the same thing.

Replies (2)

Johnathan Corgan 10 months ago
It *is* R1, which is the name for the distilled version as you describe. The bigger model is called V3.