PSA: if you’re running oQ4-quantized MoE LLMs on your Mac, do yourself a favor and upgrade to oQ6. Yes, it uses a bit more RAM, but oQ4 seems to be just far enough over the quality cliff that the models struggle to stay out of reasoning loops. A model that keeps falling into reasoning loops is also more likely to put garbage in its responses. Dense models are too slow for my liking on a MacBook Pro (M4 Pro, 48 GB RAM), but oQ4 probably holds up better on them.
If you’re on a Mac and not running MLX models, you’re unwittingly giving up RAM and token-generation speed. For instance, Gemma 4 26b at Q4_K_M in GGUF format used around 21 GB of RAM, while the same model as an MLX oQ4 quantization (oQ quants are MLX) used around 16 GB, roughly 24% less, and generated tokens about 30% faster. Same model, just optimized for Apple Silicon. It makes a difference.
Here’s a pretty bad #npm supply-chain attack involving #Axios.
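Check whether you have Axios 1.14.1 or 0.30.4 installed. If you do, dig further. For global installs, check with “npm list -g axios”. The -g flag only covers globally installed packages, so also run “npm ls axios” inside each project to check its dependency tree.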
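If you have many projects, it may be quicker to scan lockfiles directly. Here’s a minimal sketch, assuming the project uses npm with a package-lock.json (yarn and pnpm lockfiles are not handled). It only flags the two version strings named above. A match means “dig further,” and no match does not prove the machine is clean. The function names find_bad_axios and scan_lockfile are my own, not part of any npm tool.

```python
import json
from pathlib import Path

# The two Axios versions named in the post; extend this set if more are reported.
BAD_AXIOS_VERSIONS = {"1.14.1", "0.30.4"}


def find_bad_axios(lock: dict) -> list:
    """Return (location, version) pairs for flagged axios copies in a parsed package-lock.json."""
    hits = []
    if "packages" in lock:
        # lockfileVersion 2/3: a flat map keyed by install path, e.g. "node_modules/foo/node_modules/axios".
        for path, meta in lock["packages"].items():
            if path == "node_modules/axios" or path.endswith("/node_modules/axios"):
                version = meta.get("version", "")
                if version in BAD_AXIOS_VERSIONS:
                    hits.append((path, version))
        return hits

    # lockfileVersion 1: a nested "dependencies" tree, walked recursively.
    def walk(deps: dict, prefix: str) -> None:
        for name, meta in deps.items():
            here = f"{prefix}/{name}" if prefix else name
            if name == "axios" and meta.get("version") in BAD_AXIOS_VERSIONS:
                hits.append((here, meta["version"]))
            walk(meta.get("dependencies", {}), here)

    walk(lock.get("dependencies", {}), "")
    return hits


def scan_lockfile(path) -> list:
    """Load a package-lock.json from disk, print any flagged axios copies, and return them."""
    lock = json.loads(Path(path).read_text(encoding="utf-8"))
    hits = find_bad_axios(lock)
    for location, version in hits:
        print(f"FLAGGED: {location} is axios {version}, dig further")
    if not hits:
        print(f"No flagged axios versions in {path}")
    return hits


if __name__ == "__main__":
    # Scan the lockfile in the current directory, if there is one.
    default_lock = Path("package-lock.json")
    if default_lock.exists():
        scan_lockfile(default_lock)
    else:
        print("No package-lock.json in the current directory")
```

Run it from a project root, or call scan_lockfile("path/to/package-lock.json") for each project you want to check. It reads the lockfile rather than node_modules, so it reflects what npm recorded at install time.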