You can check out PicoVoice Eagle (paid product): https://picovoice.ai/docs/eagle/

You feed N PCM frames through their trainer, and once the reported enrollment percentage reaches the required level you can export a speaker embedding and save it.

Then you can run new audio against the set of enrolled speakers, and it returns a match percentage for each one.
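
To make the enroll-then-identify flow concrete, here is a rough sketch in plain Python. It does not call the Eagle SDK: `ToyProfiler`, `toy_features`, `identify`, `FRAME_LENGTH`, `FRAMES_NEEDED` and the frequency-bin "embedding" are all hypothetical stand-ins I made up so the example runs on its own. A real speaker embedding comes from a trained model, not a handful of DFT magnitudes, so treat this only as an illustration of the workflow shape.

```python
import cmath
import math
from typing import Dict, List, Sequence

FRAME_LENGTH = 512       # hypothetical samples per PCM frame
FRAMES_NEEDED = 4        # hypothetical frame count that counts as 100% enrolled
BINS = (4, 8, 12, 16)    # hypothetical DFT bins used as a toy "embedding"


def toy_features(frame: Sequence[int]) -> List[float]:
    """Toy feature vector: DFT magnitude of the frame at a few fixed bins."""
    n = len(frame)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame)))
        for k in BINS
    ]


class ToyProfiler:
    """Stand-in for an enrollment object: accumulates frames, reports progress."""

    def __init__(self) -> None:
        self._sums = [0.0] * len(BINS)
        self._frames = 0

    def enroll(self, frame: Sequence[int]) -> float:
        """Add one frame and return the enrollment percentage so far."""
        for i, value in enumerate(toy_features(frame)):
            self._sums[i] += value
        self._frames += 1
        return min(100.0, 100.0 * self._frames / FRAMES_NEEDED)

    def export(self) -> List[float]:
        """Return the averaged feature vector; this is what you would save."""
        if self._frames < FRAMES_NEEDED:
            raise RuntimeError("enrollment incomplete")
        return [value / self._frames for value in self._sums]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(frame: Sequence[int], profiles: Dict[str, List[float]]) -> Dict[str, float]:
    """Score one frame against every saved profile, as a 0-100 percentage."""
    feats = toy_features(frame)
    return {name: 100.0 * cosine(feats, emb) for name, emb in profiles.items()}


def tone(bin_index: int, amplitude: int = 10000) -> List[int]:
    """Synthetic 16-bit PCM frame: a pure tone standing in for a speaker's voice."""
    return [
        round(amplitude * math.sin(2 * math.pi * bin_index * i / FRAME_LENGTH))
        for i in range(FRAME_LENGTH)
    ]


def enroll_speaker(bin_index: int) -> List[float]:
    """Feed frames until the profiler reports 100%, then export the embedding."""
    profiler = ToyProfiler()
    percentage = 0.0
    while percentage < 100.0:
        percentage = profiler.enroll(tone(bin_index))
    return profiler.export()


profiles = {"alice": enroll_speaker(4), "bob": enroll_speaker(12)}
scores = identify(tone(4), profiles)
print(scores)
```

With the real product you would swap the toy pieces for the vendor's enrollment and recognition objects and persist the exported profile to disk. Check their docs for the actual call names and the required frame length and sample rate.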