That's not really DeepSeek R1. It's a distilled version built on Alibaba's Qwen-32B architecture, fine-tuned on synthetic outputs from the larger DeepSeek R1 model.
Quite useful, but not the same thing.
Replies (2)
Yes, it's all described in the model choice.
It *is* R1, which is the name for the distilled version you describe. The bigger model is called V3.