For fully local functionality, you need to run something like Ollama on your device. You can then add localhost:11434/v1 as a custom provider in Shakespeare, and everything will run 100% on device. In your current setup, Shakespeare itself is running on your device, but the AI provider isn't.
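
For reference, here's a minimal sketch of what that custom provider setup amounts to under the hood: an OpenAI-compatible client pointed at Ollama's local endpoint. The model name ("llama3.2") is just an example; substitute whatever model you've actually pulled.

```python
# Minimal sketch: talking to a local Ollama server through its
# OpenAI-compatible endpoint -- the same endpoint you'd register as a
# custom provider in Shakespeare. Assumes Ollama is running locally and
# a model has already been pulled (e.g. `ollama pull llama3.2`).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",  # required by the client library, but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",  # example model name; use any model you've pulled
    messages=[{"role": "user", "content": "Hello from a fully local setup!"}],
)
print(response.choices[0].message.content)
```

If that script prints a reply without any network traffic leaving your machine, the same base URL will work as a custom provider in Shakespeare.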


Note that running local models requires a computer with a very powerful graphics card, and even then you won't get close to the performance of models like Claude, or even GLM. Getting anything remotely close to GLM would take on the order of $1M worth of graphics cards, not even counting electricity.