Whisper-WebUI works, but it's not simple to set up. It produces a subtitle file and can include word-level timestamps and speaker diarization, but you'd need a separate tool to create a video with the subtitles hardcoded (burned into the frames).
https://github.com/jhj0517/Whisper-WebUI
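For the burn-in step, one common option (not part of Whisper-WebUI) is ffmpeg's `subtitles` filter. Here's a minimal Python sketch that just wraps the ffmpeg command. It assumes ffmpeg is on your PATH and was built with libass, which the `subtitles` filter needs. It also assumes the subtitle path contains no characters that ffmpeg's filter syntax treats specially (colons, commas, quotes). The function names are my own, for illustration.

```python
import shlex
import subprocess


def build_burn_in_command(video: str, subs: str, output: str) -> list[str]:
    """Build an ffmpeg command that hardcodes (burns in) subtitles.

    Assumptions: ffmpeg is on PATH and built with libass, and `subs` is a
    plain path without characters that need escaping in filter syntax.
    """
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={subs}",  # render the subtitle file onto the video frames
        "-c:a", "copy",              # copy the audio stream without re-encoding
        output,
    ]


def burn_in(video: str, subs: str, output: str) -> None:
    """Run the ffmpeg command; raises CalledProcessError if ffmpeg fails."""
    cmd = build_burn_in_command(video, subs, output)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Usage would be something like `burn_in("input.mp4", "input.srt", "output_subbed.mp4")`, where `input.srt` is the subtitle file Whisper-WebUI produced (file names hypothetical). The video gets re-encoded because the subtitles are drawn onto every frame, so expect it to take a while on long videos.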