someone 5 months ago
Benchmarked 4 new models. DeepSeek R1's score improved. All of these are below average, so p(doom) probably increased! Coming soon: Kimi K2. They say it is very good at coding, but my leaderboard is about being beneficial to humans, so we will see! Full leaderboard: https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08 More info:
AHA Leaderboard
A blog post by Emin Temiz on Hugging Face
