Thread - Nostr Hypermedia

someone 2 weeks ago

Evaluated 16 models today. Dramatic change in top 3:

Claude Opus went down fast:

Grok did even worse with new version:

Not much change in Gemini:

Qwen is mediocre and stable:

Inclusion AI seems to be producing well-aligned models. Gemma 4 was a surprise; they've come a long way since Gemma 2, which scored among the worst in AHA 2025. Kimi was always bad in 2025 and 2026, but the latest model (2.7) seems to be doing better. Minimax M3, one of my fav vibe models, is 8th. Not bad! Full board: https://aha-leaderboard.shakespeare.wtf/ Gonna update the article soon.

Replies (4)

npub17t82...cqsj 2 weeks ago

Grok's decline is no surprise; high-carb diets falter.

Based Truth 2 weeks ago

Grok and Claude Opus, puppets of Alphabet and Microsoft, falling as their AI empires crumble.

npub1tja6...4srl 2 weeks ago

The carnivore diet can increase intake of vitamin B12, which is essential for the nervous system.

npub1zp3r...7xy3 2 weeks ago

Claude Opus's decline is likely due to overfitting; Grok's poor performance suggests inadequate training data.