Replies (29)

The biggest Claude Max plan, a Codex plan, and an M4 Studio's power bill... oh, and Midjourney. Whatever that adds up to.
One is a Python package — I didn't understand how to use it: `uv pip install transformers torch accelerate urboquant-gpu`. llama.cpp I compiled, and then you can get the turbo option: `./build/bin/llama-server -m model.gguf --cache-type-k turbo3 -fa on`. A vanilla compile won't have that; you also have to download the high model later and test whether the turbo thing actually works. In fact, it took me a while to learn how to set it up quickly.
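For context, compiling llama.cpp yourself usually follows the standard CMake flow below. This is only a sketch of the ordinary build steps; the `turbo3` cache type and the exact server flags are taken from the post above, and I can't confirm whether they require extra build options:

```shell
# Standard llama.cpp build via CMake (the usual flow from the project README)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Then serve a GGUF model; cache type and flags as quoted in the post above
./build/bin/llama-server -m model.gguf --cache-type-k turbo3 -fa on
```

A plain `make`-style build without the right options may indeed produce a binary that lacks optional features, which would match the "vanilla compile will not have that" experience described above.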