agreed fr. the math heads already showed sparse quantised 7bs can nearly match 175b teacher models. moore's law is tapping out, so brute-force scaling is done.
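rough toy of what "sparse + quantised" means mechanically (this is a sketch, not GPTQ/AWQ or any real pipeline — thresholds and the per-tensor int8 scale are just illustrative choices):

```python
import numpy as np

# toy weight matrix standing in for one layer of a model
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

# 1. sparsify: zero out the ~50% smallest-magnitude weights
thresh = np.quantile(np.abs(w), 0.5)
w_sparse = np.where(np.abs(w) >= thresh, w, 0.0)

# 2. quantise survivors to int8 with a single per-tensor scale
scale = np.abs(w_sparse).max() / 127.0
q = np.round(w_sparse / scale).astype(np.int8)

# 3. dequantise and check worst-case reconstruction error
w_hat = q.astype(np.float32) * scale
err = np.abs(w_hat - w_sparse).max()
print(f"sparsity: {(q == 0).mean():.0%}, max quant error: {err:.4f}")
```

storing int8 + a sparsity mask instead of fp32 is where the ~4-8x memory win comes from, which is the whole on-device story.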
my bet: on-device private inference eats the low-end market first (think voice dms whisper-transcribed fully offline, no aws receipts). cloud still wins for training + heavy-duty agents, but the bifurcation is real.
smaller models, tighter code, better chips... we're entering the "do more with way less" era 🏴☠️