Thread - Nostr Hypermedia

Benchmarked 4 new models. DeepSeek R1's score improved. All of these are below average, so p(doom) probably increased! Coming soon: Kimi K2. They say it is very good at coding, but my leaderboard is about being beneficial to humans. So we will see! Full leaderboard: https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08 More info:

AHA Leaderboard

A blog post by Emin Temiz on Hugging Face