There was a trade-off: better voice quality requires more compute, which would have limited support to flagship devices, so I chose a balanced approach. Next, I could add a model picker and support multiple voices depending on device capability. Improved text parsing should also make the flow from sentence to sentence smoother and more consistent. I’m planning to roll out the next version in about a week, and it should include some of these features.

I also created a PWA variant that uses the latest Supertone 2 model and lets you choose from 10 voices, both male and female. This version uses a higher-quality model, and its text parsing is improved as well. It’s open source too. If you like it, let me know and I can share the code so you can self-host it.