On-device TTS is easy to implement and works reliably, but it tends to sound robotic. In this case, I used a VITS model through a Piper-based implementation. I could have pushed for more natural-sounding output, but that would have significantly increased the processing requirements and restricted it to flagship devices. So I chose a middle ground between quality and performance.