
I absolutely love how good the voices are in Coqui's VCTK VITS model (109 of them!). Coqui was easy to install on WSL, and it uses CUDA and the GPU quite effectively. My picks are p236 (male) and p237 (female), but holy cow, 109 quality voices still blow my mind. Crazy how you had to pay for a good TTS just a year ago, and now it's commoditized. Hope you find this useful:

    CUDA_VISIBLE_DEVICES="0" python TTS/server/server.py --model_name tts_models/en/vctk/vits --use_cuda True


    import threading
    import winsound  # Windows-only standard-library module

    # A single permit means only one clip plays at a time
    semaphore = threading.Semaphore(1)

    def play_sound(response):
        # Lesson learned: serialize calls to winsound.PlaySound() with a semaphore;
        # it fails with "Failed to play sound" if you try to play 2 clips at once
        with semaphore:
            # The permit is always released, even if PlaySound raises an exception
            winsound.PlaySound(response.content, winsound.SND_MEMORY | winsound.SND_NOSTOP)
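
For anyone wondering where `response` comes from, here is a minimal sketch of the client side. It assumes the server is reachable on its default port 5002, that it exposes the `/api/tts` endpoint with `text` and `speaker_id` query parameters, and that the `requests` package is installed. `build_tts_url` and `fetch_speech` are hypothetical helper names I made up for illustration, not part of Coqui.

```python
from urllib.parse import urlencode

# Assumption: the Coqui demo server listens on localhost:5002 by default
DEFAULT_BASE_URL = "http://localhost:5002"


def build_tts_url(text, speaker_id="p236", base_url=DEFAULT_BASE_URL):
    """Build the GET URL for one synthesis request (hypothetical helper)."""
    # urlencode escapes spaces and punctuation so arbitrary text is safe in the URL
    query = urlencode({"text": text, "speaker_id": speaker_id})
    return f"{base_url}/api/tts?{query}"


def fetch_speech(text, speaker_id="p236", base_url=DEFAULT_BASE_URL):
    """Request WAV audio from the server; the result exposes the bytes as .content."""
    import requests  # third-party; imported lazily so build_tts_url has no dependencies

    response = requests.get(build_tts_url(text, speaker_id, base_url), timeout=60)
    response.raise_for_status()  # surface HTTP errors instead of playing garbage
    return response
```

With that in place, something like `play_sound(fetch_speech("Hello from WSL", "p237"))` should speak the line in the p237 voice, assuming the server from the command above is running.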