Convert documents into “human-like” audio using fully on-device VITS inference. No cloud, no tracking, no subscriptions. Fully open-source with unlimited usage. Play Store release may take around a month due to closed testing & review. APK available now on GitHub:

Replies (13)

On-device TTS is easy to implement and works reliably, but it tends to sound too robotic. In this case, I used a VITS model with a Piper-based implementation. I could have pushed for more natural-sounding output, but that would significantly increase processing requirements, limiting it to flagship devices. So I chose a middle ground between quality and performance.
I can build a Linux version if you want; it compiles to Linux easily, and I already have macOS, Windows and iOS versions. I can also enable model selection, so you can pick models with more parameters if your hardware can handle it.
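A model picker like the one described could gate larger models on device capability. Below is a minimal sketch, assuming total RAM is the deciding factor and that models come in quality tiers (Piper voices are published in levels such as `low`, `medium` and `high`). The thresholds, `DeviceInfo` and `pick_quality` are hypothetical and are not taken from the app.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimum total RAM (MiB) per quality tier, ordered best-first.
# These thresholds are illustrative only; the app's real criteria are not stated.
TIERS = [
    (6144, "high"),    # flagship-class devices
    (3072, "medium"),  # a "middle ground" tier
    (0, "low"),        # fallback for constrained hardware
]


@dataclass
class DeviceInfo:
    total_ram_mib: int  # hypothetical capability signal


def pick_quality(device: DeviceInfo, requested: Optional[str] = None) -> str:
    # Tiers this device is allowed to run, best first.
    allowed = [name for min_ram, name in TIERS if device.total_ram_mib >= min_ram]
    # Honor the user's choice only if the device can handle it.
    if requested is not None and requested in allowed:
        return requested
    # Otherwise fall back to the best tier the device supports.
    return allowed[0]
```

For example, a 4 GiB device asking for `"high"` would be given `"medium"`, while an 8 GiB device could pick any tier.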
Thank you, Friend.🙏😀🫂💖 This is brilliant and much appreciated work that will go to good use.💥❤️‍🔥🚀 I especially appreciate that you put it in .apk form, ready and easy to load.💖🥰
Crow 🦅 2 weeks ago
This on-device TTS sounds like a privacy win for reading Nostr notes aloud without Big Tech ears. Perfect for maximalists who zap for open-source freedom. Tag your go-to Nostr maxi who'd zap this quest in the replies below. Best entries get paid from the sats/zap-powered pool.
Wondered what you've been up to since you've been quiet around here. Only one voice option for now? It's not bad. I have been using SherpaTTS via Librera, and my issue with that one is the inconsistency of the voice from sentence to sentence. Lower tone in one sentence, higher tone in the next, like it is constantly switching between three or four similar voice models. Your app definitely has a more consistent voice from sentence to sentence, making it easier to focus when listening. Would definitely like to have a male voice option, though. The voice you have is pretty good! Only slightly robotic, and not distractingly so!
There was a trade-off. Better voice quality means more compute, which would limit support to flagship devices, so I chose a balanced approach. I can add a model picker and support multiple voices depending on device capability, and improving text parsing should further enhance sentence-to-sentence flow and consistency. I’m planning to roll out the next version in about a week, which should include some of these features.
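Better text parsing mostly means feeding the model whole, reasonably sized sentences instead of arbitrary fragments. Here is a minimal sketch of that idea, assuming simple punctuation-based splitting; `chunk_for_tts` and its `min_chars`/`max_chars` limits are hypothetical and are not the app's actual parser.

```python
import re


def chunk_for_tts(text: str, min_chars: int = 20, max_chars: int = 300) -> list[str]:
    # Collapse line breaks and repeated spaces left over from document extraction.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    for sentence in sentences:
        # Merge very short sentences (e.g. "Yes.") into the previous chunk
        # so the voice does not restart on a tiny fragment.
        if chunks and len(sentence) < min_chars and len(chunks[-1]) + 1 + len(sentence) <= max_chars:
            chunks[-1] += " " + sentence
            continue
        # Hard-wrap overlong sentences at the last space before max_chars.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space found: cut mid-word as a last resort
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if sentence:
            chunks.append(sentence)
    return chunks
```

Each chunk would then be synthesized in order. Keeping chunk sizes similar is one plausible way to reduce the sentence-to-sentence pitch drift mentioned above, though that effect is an assumption, not a measured result.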
I also created a PWA variant that uses the latest Supertone 2 model and lets you choose between 10 voices, both male and female. This version uses a higher-quality model, and the parsing is also improved. It’s open source as well. If you like it, let me know and I can share the code so you can self-host it.