On-device TTS is easy to implement and works reliably, but it tends to sound too robotic.
In this case, I used a VITS model with a Piper-based implementation. I could have pushed for more natural-sounding output, but that would have significantly increased processing requirements and limited it to flagship devices.
So I chose a middle ground between quality and performance.
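
To try the same kind of trade-off on a desktop, here is a minimal sketch. It assumes the original `piper` command-line binary is on your PATH and that you have already downloaded a voice file; `en_US-lessac-medium.onnx` below is only an example name. The helper names `build_piper_command` and `synthesize` are made up for illustration. This is not the on-device implementation described above.

```python
import subprocess

# Piper voices are published in these quality tiers.
# Higher tiers generally sound better but need more compute.
QUALITY_TIERS = ("x_low", "low", "medium", "high")


def build_piper_command(model_path: str, output_path: str) -> list[str]:
    """Build the argument list for the piper CLI (text is read from stdin)."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def synthesize(text: str, model_path: str, output_path: str) -> None:
    """Pipe text into piper and write a WAV file.

    Assumes the piper binary is installed; raises CalledProcessError
    if piper exits with a non-zero status.
    """
    subprocess.run(
        build_piper_command(model_path, output_path),
        input=text.encode("utf-8"),
        check=True,
    )
```

Calling `synthesize("Hello there", "en_US-lessac-medium.onnx", "hello.wav")` should write `hello.wav` if the assumptions hold. Swapping in a `high` voice for the `medium` one is the simplest way to trade speed for quality.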
Replies (1)
Is there anything I can do to run it on Linux and go all out on quality?