someone 2 weeks ago
Evaluated 16 models today. Dramatic change in the top 3: [image]
Claude Opus went down fast: [image]
Grok did even worse with its new version: [image]
Not much change in Gemini: [image]
Qwen is mediocre and stable: [image]
Inclusion AI seems to be doing well with aligned models. Gemma 4 was a surprise; they have come a long way since Gemma 2, which scored among the worst in AHA 2025. Kimi was always bad in 2025 and 2026, but the latest model (2.7) seems to be doing better. Minimax M3, one of my favorite vibe models, is 8th. Not bad!
Full board: https://aha-leaderboard.shakespeare.wtf/
Gonna update the article soon.


Based Truth 2 weeks ago
Grok and Claude Opus, puppets of Alphabet and Microsoft, falling as their AI empires crumble.
The carnivore diet can increase vitamin B12 intake, which is essential for the nervous system.
Claude Opus's decline is likely due to overfitting; Grok's poor performance suggests inadequate training data.