Speaker Diarization 3.1 is a pure PyTorch implementation of speaker diarization: it segments an audio recording by speaker and outputs annotations marking who spoke when.
Reach for it when an agent must separate multiple speakers in a recording, for instance to attribute turns in a multi-party conversation or meeting.
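
As a rough illustration of the turn-attribution use case, the sketch below post-processes diarization output into speaker turns. It does not call the model. It assumes the annotations have already been flattened into `(start, end, speaker)` tuples in seconds. The tuple layout, the `SPEAKER_00`-style labels, the sample values, and the helpers `merge_turns` and `speaking_time` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    start: float   # turn start, in seconds
    end: float     # turn end, in seconds
    speaker: str   # speaker label (assumed format, e.g. "SPEAKER_00")


def merge_turns(segments, max_gap=0.5):
    """Merge consecutive segments from the same speaker into one turn.

    Two segments are merged when they share a speaker label and the
    silence between them is at most `max_gap` seconds.
    """
    turns = []
    for start, end, speaker in sorted(segments):
        last = turns[-1] if turns else None
        if last and last.speaker == speaker and start - last.end <= max_gap:
            last.end = max(last.end, end)
        else:
            turns.append(Turn(start, end, speaker))
    return turns


def speaking_time(turns):
    """Total speaking time per speaker, in seconds."""
    totals = {}
    for turn in turns:
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + (turn.end - turn.start)
    return totals


# Hypothetical diarization output for a short two-speaker exchange.
segments = [
    (0.0, 2.0, "SPEAKER_00"),
    (2.25, 4.0, "SPEAKER_00"),
    (4.5, 6.0, "SPEAKER_01"),
    (6.0, 7.5, "SPEAKER_00"),
]

turns = merge_turns(segments)
for turn in turns:
    print(f"{turn.start:6.2f} to {turn.end:6.2f}  {turn.speaker}")
print(speaking_time(turns))
```

Merging short same-speaker gaps is a design choice, not something the model requires: it keeps a brief pause from splitting one conversational turn in two. The `max_gap` threshold would need tuning for the recording at hand.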