ggml : add CPU TurboQuant KV cache types (TBQ3_0 / TBQ4_0) by elusznik · Pull Request #21089 · ggml-org/llama.cpp
Summary
This PR adds CPU-only TurboQuant KV-cache support for two new cache types:
tbq3_0
tbq4_0
The scope is intentionally narrow for the first ...