You pass N PCM frames through their trainer, and once the trainer reaches a certain percentage you can extract an embedding and save it.
Then you can run audio against the set of saved speakers, and it returns a percentage match for each one.
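
To make the flow concrete, here is a rough sketch of the enroll-then-identify loop. It does not use their SDK: `ToyEnroller`, `identify`, `tone`, the frame size, and the 20-frame enrollment target are all made up for illustration. The "embedding" is just averaged loudness and zero-crossing rate, where a real library would use a learned speaker model.

```python
import math
from typing import Dict, List, Sequence

SAMPLE_RATE = 16000   # assumed sample rate, in Hz
FRAME_LENGTH = 512    # assumed frame size, in samples
FRAMES_NEEDED = 20    # assumed number of frames for 100% enrollment

Frame = Sequence[int]  # one frame of 16-bit PCM samples


def frame_features(frame: Frame) -> List[float]:
    # Toy stand-in for a learned embedding: RMS level and zero-crossing rate.
    rms = math.sqrt(sum(s * s for s in frame) / len(frame)) / 32768.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (len(frame) - 1)]


def mean_features(frames: Sequence[Frame]) -> List[float]:
    # Average the per-frame features into one fixed-size vector.
    feats = [frame_features(f) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


class ToyEnroller:
    """Hypothetical trainer: collects frames and reports progress in percent."""

    def __init__(self) -> None:
        self._frames: List[Frame] = []

    def enroll(self, frame: Frame) -> float:
        self._frames.append(frame)
        return min(100.0, 100.0 * len(self._frames) / FRAMES_NEEDED)

    def export(self) -> List[float]:
        if len(self._frames) < FRAMES_NEEDED:
            raise RuntimeError("enrollment has not reached 100% yet")
        return mean_features(self._frames)


def identify(frames: Sequence[Frame],
             profiles: Dict[str, List[float]]) -> Dict[str, float]:
    # Toy scoring: map the distance to each saved embedding onto 0..100.
    probe = mean_features(frames)
    return {name: 100.0 * math.exp(-10.0 * math.dist(probe, emb))
            for name, emb in profiles.items()}


def tone(freq_hz: float, amplitude: int, n_frames: int,
         start: int = 0) -> List[List[int]]:
    # Synthetic "voice": a sine wave split into fixed-size PCM frames.
    samples = [int(amplitude * math.sin(2 * math.pi * freq_hz * (start + i) / SAMPLE_RATE))
               for i in range(n_frames * FRAME_LENGTH)]
    return [samples[i:i + FRAME_LENGTH] for i in range(0, len(samples), FRAME_LENGTH)]


# Enroll two made-up speakers, keeping one embedding per speaker.
profiles: Dict[str, List[float]] = {}
for name, freq, amp in [("alice", 200.0, 8000), ("bob", 1000.0, 20000)]:
    enroller = ToyEnroller()
    progress = 0.0
    for frame in tone(freq, amp, FRAMES_NEEDED):
        progress = enroller.enroll(frame)
    profiles[name] = enroller.export()  # this is what you would save

# Score a short clip that resembles "alice" against every saved profile.
scores = identify(tone(200.0, 8000, 5, start=12345), profiles)
print(scores)
```

The shape is the part that carries over: feed frames until the trainer reports it is done, export and store one embedding per speaker, then score incoming audio against all stored embeddings at once.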