One option is the Python package - didn't understand how to use "uv pip install transformers torch accelerate urboquant-gpu"
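In case it helps, the usual uv workflow is roughly the following (package names copied from the post as-is; I haven't verified that the last one exists on PyPI):

```shell
# create an isolated virtual environment with uv and activate it
uv venv .venv
source .venv/bin/activate

# install the packages from the post into the active environment
uv pip install transformers torch accelerate urboquant-gpu
```

The point of `uv venv` first is that `uv pip install` then targets that environment instead of your system Python.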
llama.cpp - compiled it myself - then you get the turbo option
./build/bin/llama-server -m model.gguf --cache-type-k turbo3 -fa on
a vanilla compile will not have that
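For reference, the standard CMake build of llama.cpp looks roughly like this; whatever enables the extra cache type is presumably a build-time option, so the `-DLLAMA_TURBO=ON` define below is a hypothetical placeholder, not a real flag:

```shell
# fetch and build llama.cpp from source
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# hypothetical define standing in for whatever enables the turbo cache types;
# leaving it out gives you the vanilla build without that option
cmake -B build -DLLAMA_TURBO=ON
cmake --build build --config Release -j

# the server binary lands in ./build/bin/llama-server
```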
still have to download a bigger model later and test whether the turbo thing actually works
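One quick way to sanity-check the server once it's running: llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default), so you can probe it with curl before doing any real benchmarking:

```shell
# check the server came up and finished loading the model
curl http://localhost:8080/health

# send a minimal chat request to confirm tokens actually come back
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 8}'
```

Whether the turbo cache type helps would then come down to comparing generation speed with and without the flag on the same model and prompt.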
took a while to learn how to prep it fast