Looking ahead, updating the library is the best course of action, but given your current setup the migration is more involved than a simple version bump:
Path B — later migration: use Community-1 and pyannote.audio 4.x
Short version
Path B means intentionally leaving the old pyannote.audio==3.3.0 recovery stack and moving to the newer pyannote stack:
pyannote.audio 4.x
pyannote/speaker-diarization-community-1
Pipeline.from_pretrained(..., token=...)
output.speaker_diarization
output.exclusive_speaker_diarization
TorchCodec-backed audio decoding
FFmpeg installed
This is not just a one-line model change.
It is a real migration because your current brouhaha dependency pins:
pyannote-audio==3.3.0
while the newer Community-1 examples expect the newer pyannote API surface:
Pipeline.from_pretrained(
"pyannote/speaker-diarization-community-1",
token="<HUGGINGFACE_ACCESS_TOKEN>",
)
The current pyannote README shows this community-1 + token=... style and says FFmpeg must be installed because TorchCodec handles audio decoding.
Why you should not do Path B casually
Your current stack has two separate constraints:
brouhaha==0.9.0
↓
requires pyannote-audio==3.3.0
and:
Community-1 / pyannote 4.x examples
↓
use token=...
use output.speaker_diarization
use output.exclusive_speaker_diarization
expect TorchCodec/FFmpeg audio decoding
Those are different worlds.
The pyannote 3.3 recovery world uses:
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-3.1",
use_auth_token="<HUGGINGFACE_ACCESS_TOKEN>",
)
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
...
The pyannote 4 / Community-1 world uses:
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-community-1",
token="<HUGGINGFACE_ACCESS_TOKEN>",
)
output = pipeline("audio.wav")
for turn, speaker in output.speaker_diarization:
...
And, when available, the newer path also gives:
output.exclusive_speaker_diarization
That exclusive_speaker_diarization output is especially relevant for your transcription project because the Community-1 model card describes it as simplifying reconciliation between diarization timestamps and transcription timestamps.
What Path B is for
Choose Path B if you want one or more of these:
- the newer pyannote.audio API;
- the open-source pyannote/speaker-diarization-community-1 pipeline;
- better diarization quality than the old speaker-diarization-3.1 baseline;
- easier reconciliation with transcripts using exclusive_speaker_diarization;
- a forward-looking stack instead of living on TorchAudio 2.8 deprecation warnings;
- a cleaner long-term project layout.
Do not choose Path B if your immediate goal is only:
make the old script run with the least changes
For the least-change recovery path, stay with:
pyannote.audio==3.3.0
pyannote/speaker-diarization-3.1
use_auth_token=...
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*
Path B is the better long-term migration, but the worse emergency fix.
The main blocker: brouhaha
The problem
Your resolver already told you:
brouhaha==0.9.0 depends on pyannote-audio==3.3.0
So this cannot work:
"pyannote.audio>=4,<5",
"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
unless you change something about brouhaha.
The resolver is correct. If brouhaha requires exactly:
pyannote-audio==3.3.0
then the environment cannot also contain:
pyannote.audio>=4
Your options
You have five realistic choices.
| Option | What it means | Good if | Risk |
| --- | --- | --- | --- |
| Remove brouhaha | Delete it from dependencies and remove/replace its VAD calls. | You do not strictly need Brouhaha VAD. | You may lose the current VAD behavior. |
| Replace brouhaha | Use pyannote's own diarization behavior, faster-whisper VAD, Silero VAD, or another VAD stage. | You only used Brouhaha as a helper. | May change segmentation and final transcript quality. |
| Fork/edit brouhaha | Change its dependency metadata from pyannote-audio==3.3.0 to a looser or newer version. | You control the local package and can test it. | Its code may actually depend on pyannote 3.3 internals. |
| Split environments | Run Brouhaha preprocessing in one script/env, then run pyannote 4 diarization in another script/env. | You need Brouhaha but also want Community-1. | More moving parts and file handoff. |
| Stay on Path A | Do not migrate now. Keep pyannote 3.3. | You want stability first. | You do not get Community-1 yet. |
My recommendation: do not start by editing brouhaha dependency metadata blindly.
First inspect why it pins pyannote:
grep -R "pyannote" -n /home/user/diarization/repos/.venv/brouhaha-vad
Look for files like:
pyproject.toml
setup.py
setup.cfg
requirements.txt
Then inspect imports:
grep -R "from pyannote\|import pyannote" -n /home/user/diarization/repos/.venv/brouhaha-vad
If Brouhaha only uses public, stable APIs, loosening the pin might work. If it uses pyannote internals or pyannote 3.x-specific output structures, expect breakage.
Recommended migration strategy
Do not migrate the production script all at once.
Use a three-stage migration.
Stage 1: build a tiny Community-1 proof-of-life script
Stage 2: port only diarization code
Stage 3: reintegrate transcription, VAD, and speaker-label alignment
This prevents one common failure mode:
changed model + changed pyannote version + changed TorchCodec + changed FFmpeg + changed CUDA + changed VAD + changed transcript alignment
↓
too many variables
↓
impossible to tell what broke
Stage 1 — prove Community-1 works by itself
Create a new test file, separate from diaritranscribe3.py.
For example:
check_pyannote4_community1.py
Use this as a minimal proof-of-life script:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "pyannote.audio>=4,<5",
# "torch",
# "torchaudio",
# "torchcodec",
# ]
# ///
import os
from importlib.metadata import version
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook
MODEL_ID = "pyannote/speaker-diarization-community-1"
AUDIO_PATH = "audio.wav"
token = os.environ.get("HF_TOKEN")
if not token:
raise RuntimeError("Set HF_TOKEN before running this script.")
print("pyannote.audio:", version("pyannote.audio"))
print("torch:", torch.__version__)
print("torch cuda build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("torchaudio:", version("torchaudio"))
print("torchcodec:", version("torchcodec"))
pipeline = Pipeline.from_pretrained(
MODEL_ID,
token=token,
)
if torch.cuda.is_available():
pipeline.to(torch.device("cuda"))
with ProgressHook() as hook:
output = pipeline(AUDIO_PATH, hook=hook)
print("\nRegular diarization:")
for turn, speaker in output.speaker_diarization:
print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
print("\nExclusive diarization:")
if hasattr(output, "exclusive_speaker_diarization"):
for turn, speaker in output.exclusive_speaker_diarization:
print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
else:
print("exclusive_speaker_diarization is not available on this output.")
Run it like:
export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"
uv run --refresh --script check_pyannote4_community1.py
Before running it, make sure:
- you accepted the Community-1 user conditions;
- your token can access the model;
- FFmpeg is installed;
- the test file audio.wav exists.
Stage 2 — choose a coherent Torch/TorchCodec version family
The current pyannote project metadata says the modern branch requires:
Python >=3.10
torch >=2.8.0
torchaudio >=2.8.0
torchcodec >=0.7.0
But “greater than or equal” does not mean every arbitrary combination is equally good.
TorchCodec publishes a compatibility table. Current table highlights include:
torchcodec 0.7 ↔ torch 2.8
torchcodec 0.8 ↔ torch 2.9
torchcodec 0.9 ↔ torch 2.9
torchcodec 0.10 ↔ torch 2.10
torchcodec 0.11 ↔ torch 2.11
So do not mix randomly.
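A cheap guard against accidental mixing is to assert the pairing at startup. This sketch hard-codes the table above; adjust the mapping when the TorchCodec README adds new rows:

```python
from importlib.metadata import version

# Expected torch (major, minor) for each torchcodec (major, minor),
# mirroring the compatibility table above.
TORCHCODEC_TO_TORCH = {
    (0, 7): (2, 8),
    (0, 8): (2, 9),
    (0, 9): (2, 9),
    (0, 10): (2, 10),
    (0, 11): (2, 11),
}

def parse_minor(ver: str) -> tuple[int, int]:
    """Return (major, minor) from a version string like '2.8.0+cu121'."""
    major, minor = ver.split(".")[:2]
    return int(major), int(minor.split("+")[0])

def check_torch_torchcodec() -> None:
    import torch
    tc = parse_minor(version("torchcodec"))
    t = parse_minor(torch.__version__)
    expected = TORCHCODEC_TO_TORCH.get(tc)
    if expected is None:
        print(f"torchcodec {tc} not in the known table; check the README.")
    elif expected != t:
        raise RuntimeError(
            f"torchcodec {tc} expects torch {expected}, found torch {t}"
        )
```

Calling check_torch_torchcodec() at the top of the migration script turns a silent mismatch into an immediate, readable error.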
Conservative modern family
This is the least aggressive Community-1 migration target:
pyannote.audio>=4,<5
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*
Pros:
- close to the minimum modern pyannote requirements;
- avoids jumping all the way to newer Torch/TorchAudio generations;
- TorchCodec 0.7 matches Torch 2.8;
- likely easier if the rest of your audio stack was stabilized around Torch 2.8.
Cons:
- still close to the old TorchAudio transition boundary;
- may not represent the newest pyannote-tested stack.
Newer Torch family
A newer family might look like:
pyannote.audio>=4,<5
torch==2.9.*
torchaudio==2.9.*
torchcodec==0.9.*
or:
pyannote.audio>=4,<5
torch==2.10.*
torchaudio==2.10.*
torchcodec==0.10.*
Pros:
- more aligned with the post-TorchAudio-2.9 world;
- better long-term direction if your other dependencies support it.
Cons:
- may expose TorchCodec/FFmpeg issues;
- may conflict with faster-whisper/CTranslate2 expectations;
- may require more careful PyTorch CUDA wheel/index selection.
Practical advice
For a migration branch, start with the conservative modern family:
"pyannote.audio>=4,<5",
"torch==2.8.0",
"torchaudio==2.8.0",
"torchcodec==0.7.*",
Then, after Community-1 works, decide whether to move Torch upward.
Do not solve every modernization problem at once.
Stage 3 — remove or isolate brouhaha
Because brouhaha pins pyannote 3.3, your Community-1 test script should not include Brouhaha.
For Path B, the dependency block should start without it:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "pyannote.audio>=4,<5",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# ]
# ///
Only after Community-1 works should you decide what to do with Brouhaha.
If you remove Brouhaha
Delete:
"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
and remove code like:
import brouhaha
or any function calls into Brouhaha.
Then rely on pyannote diarization directly, or use another VAD/preprocessing layer.
If you fork Brouhaha
Edit its dependency metadata.
For example, if its pyproject.toml contains:
dependencies = [
"pyannote-audio==3.3.0",
]
you could test:
dependencies = [
"pyannote-audio>=4,<5",
]
or, if Brouhaha does not actually need pyannote at runtime after your refactor:
dependencies = []
But do this only in a branch or copy.
Then run its own tests, or at least import it:
uv run --refresh --script check_brouhaha_import.py
where:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# "pyannote.audio>=4,<5",
# ]
# ///
import brouhaha
from importlib.metadata import version
print("brouhaha import OK")
print("pyannote.audio:", version("pyannote.audio"))
If this fails, Brouhaha is not pyannote-4-compatible yet.
If you split environments
Use two scripts.
First script:
vad_preprocess.py
uses Brouhaha and pyannote 3.3 if needed.
Second script:
diarize_community1.py
uses pyannote 4 and Community-1.
The handoff should be a file: JSON, RTTM, or a plain timestamp list. This is clunkier, but it avoids forcing incompatible libraries into one dependency graph.
Stage 4 — update the pyannote call
Old Path A code:
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-3.1",
use_auth_token=tokens["diarization"],
)
diarization = pipeline(audio_path)
for turn, _, speaker in diarization.itertracks(yield_label=True):
...
New Path B code:
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-community-1",
token=tokens["diarization"],
)
output = pipeline(audio_path)
for turn, speaker in output.speaker_diarization:
...
And, for transcript alignment, prefer testing:
for turn, speaker in output.exclusive_speaker_diarization:
...
The current Community-1 model card says exclusive_speaker_diarization is provided on top of regular diarization and is meant to simplify reconciliation with transcription timestamps.
Stage 5 — rewrite speaker/transcript alignment around exclusive diarization
This is the most important practical benefit for your script.
Your final goal is not just diarization. Your goal is:
audio file
↓
transcript segments or words
↓
speaker labels
↓
speaker-attributed transcript
Old diarization can produce fine-grained, overlapping, or awkward speaker turns. That can be hard to align to Whisper/faster-whisper transcript segments.
Community-1 adds:
output.exclusive_speaker_diarization
Use that first for transcript alignment.
Basic maximum-overlap assignment
Use this when your ASR gives segment-level timestamps.
def overlap_seconds(a_start, a_end, b_start, b_end):
return max(0.0, min(a_end, b_end) - max(a_start, b_start))
def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):
best_speaker = None
best_overlap = 0.0
for turn_start, turn_end, speaker in diarization_turns:
overlap = overlap_seconds(segment_start, segment_end, turn_start, turn_end)
if overlap > best_overlap:
best_overlap = overlap
best_speaker = speaker
return best_speaker or "UNKNOWN"
def diarization_to_turns(exclusive_speaker_diarization):
turns = []
for turn, speaker in exclusive_speaker_diarization:
turns.append((float(turn.start), float(turn.end), str(speaker)))
return turns
Then:
turns = diarization_to_turns(output.exclusive_speaker_diarization)
for segment in whisper_segments:
speaker = assign_speaker_to_segment(segment.start, segment.end, turns)
print(f"[{segment.start:.2f}-{segment.end:.2f}] {speaker}: {segment.text}")
Word-level assignment
If faster-whisper returns word timestamps, word-level assignment is usually better.
Conceptually:
for each word:
find the speaker turn with max overlap
assign that speaker to the word
then merge adjacent words with the same speaker
This handles speaker changes inside a long ASR segment better than assigning one speaker to the whole segment.
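The pseudocode above could be sketched like this, assuming words arrive as plain (start, end, text) tuples (faster-whisper word objects would need unpacking into that shape first):

```python
def assign_words_to_speakers(words, turns):
    """words: iterable of (start, end, text); turns: (start, end, speaker).
    Returns merged (start, end, speaker, text) spans."""
    def overlap(a0, a1, b0, b1):
        return max(0.0, min(a1, b1) - max(a0, b0))

    # Step 1: label each word with the max-overlap speaker turn.
    labeled = []
    for w_start, w_end, text in words:
        best, best_ov = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(w_start, w_end, t_start, t_end)
            if ov > best_ov:
                best, best_ov = speaker, ov
        labeled.append((w_start, w_end, best, text))

    # Step 2: merge adjacent words with the same speaker into one span.
    merged = []
    for start, end, speaker, text in labeled:
        if merged and merged[-1][2] == speaker:
            prev = merged[-1]
            merged[-1] = (prev[0], end, speaker, prev[3] + " " + text)
        else:
            merged.append((start, end, speaker, text))
    return merged
```

Because the merge happens after labeling, a speaker change in the middle of one long ASR segment produces two output spans instead of one mislabeled span.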
Stage 6 — verify FFmpeg and TorchCodec
Community-1 uses TorchCodec-backed decoding. The pyannote README explicitly says FFmpeg must be installed because TorchCodec handles audio decoding.
Check FFmpeg:
ffmpeg -version
Check TorchCodec import:
import torchcodec
print("torchcodec import OK")
Check versions:
from importlib.metadata import version
import torch
print("torch:", torch.__version__)
print("torchcodec:", version("torchcodec"))
TorchCodec supports FFmpeg major versions in [4, 8], and on Windows it needs FFmpeg builds with separate shared libraries. The TorchCodec README also provides the TorchCodec/Torch/Python compatibility table.
If TorchCodec fails
Common error shapes:
RuntimeError: Could not load libtorchcodec
FFmpeg is not properly installed
No compatible FFmpeg found
Likely causes:
- FFmpeg missing;
- FFmpeg installed but not visible on PATH;
- Windows FFmpeg build is not a shared build;
- TorchCodec version does not match Torch version;
- Python version is outside the wheel’s supported range;
- unsupported architecture, especially Linux ARM64/aarch64.
Check the compatibility table before changing random packages.
Stage 7 — choose uv layout: inline script vs project
You can do Path B with inline script metadata, but a project layout is cleaner once you are juggling:
pyannote.audio
torch
torchaudio
torchcodec
faster-whisper
ctranslate2
ffmpeg
CUDA
tokens
local packages
Inline script version
Good for quick experiments:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "pyannote.audio>=4,<5",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# ]
# ///
from pyannote.audio import Pipeline
Lock after success:
uv lock --script check_pyannote4_community1.py
Project version
Better for the real app.
pyproject.toml:
[project]
name = "diaritranscribe"
version = "0.1.0"
requires-python = ">=3.10,<3.14"
dependencies = [
"pyannote.audio>=4,<5",
"faster-whisper",
"numpy",
"scikit-learn",
"omegaconf",
"torch==2.8.0",
"torchaudio==2.8.0",
"torchcodec==0.7.*",
]
[tool.uv]
required-version = ">=0.5.3"
Then:
uv lock
uv sync
uv run python scripts/diaritranscribe4.py
If you need explicit CUDA PyTorch indexes, use uv’s PyTorch guide:
PyTorch packaging is unusual because CPU and CUDA builds may live on different indexes and use local version specifiers such as +cpu or +cu130.
Stage 8 — update token handling
Use environment variables rather than hardcoding tokens.
export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"
Python:
import os
token = os.environ.get("HF_TOKEN")
if not token:
raise RuntimeError("Set HF_TOKEN.")
Then:
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-community-1",
token=token,
)
Make sure the token’s Hugging Face account has accepted the model conditions.
Missing access usually gives errors like:
401 Unauthorized
403 Forbidden
Repository not found
gated repo
Those are different from the old unexpected keyword argument 'token' error.
Stage 9 — account for telemetry
Current pyannote docs mention optional telemetry. The README says it tracks privacy-preserving information such as pipeline origin, pipeline class, file duration, and speaker-count parameters, and documents ways to control it.
Disable for the current process if desired:
export PYANNOTE_METRICS_ENABLED=0
Or in Python:
from pyannote.audio.telemetry import set_telemetry_metrics
set_telemetry_metrics(False)
Stage 10 — test accuracy and runtime before deleting Path A
Do not delete the working pyannote 3.3 path until you compare:
- same audio file;
- same hardware;
- same preprocessing;
- same transcript segments;
- same speaker-label assignment policy;
- same output format.
Compare:
speaker count
number of turns
total diarization time
overlap behavior
transcript speaker-label quality
GPU memory use
runtime
failure rate on long files
A migration is successful only if the final speaker-attributed transcript improves or remains acceptable.
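For the quantitative checks, a small helper that reduces each path's turn list to comparable numbers may be enough; this sketch covers speaker count, turn count, and total speech time (the remaining metrics, such as GPU memory and runtime, need separate instrumentation):

```python
def summarize_turns(turns):
    """turns: list of (start, end, speaker) tuples from either path."""
    speakers = {speaker for _, _, speaker in turns}
    total_speech = sum(end - start for start, end, _ in turns)
    return {
        "num_speakers": len(speakers),
        "num_turns": len(turns),
        "total_speech_seconds": round(total_speech, 3),
    }

def compare_paths(path_a_turns, path_b_turns):
    """Print side-by-side metrics for the 3.3 and Community-1 runs."""
    a, b = summarize_turns(path_a_turns), summarize_turns(path_b_turns)
    for key in a:
        print(f"{key}: path A={a[key]}  path B={b[key]}")
```

Large divergences in speaker count or total speech time on the same file are the first thing to investigate before trusting the new path.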
Suggested branch layout
Keep two scripts for a while:
diaritranscribe3.py # recovery path, pyannote 3.3
diaritranscribe4.py # migration path, pyannote 4 / Community-1
Keep two lockfiles if using inline scripts:
diaritranscribe3.py.lock
diaritranscribe4.py.lock
This prevents accidentally breaking the known-good path while testing the new one.
Minimal diaritranscribe4.py starting point
This is a clean starting point for just the diarization part.
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "pyannote.audio>=4,<5",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# ]
# ///
import argparse
import os
from importlib.metadata import version
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook
MODEL_ID = "pyannote/speaker-diarization-community-1"
def print_versions():
print("pyannote.audio:", version("pyannote.audio"))
print("torch:", torch.__version__)
print("torch cuda build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("torchaudio:", version("torchaudio"))
print("torchcodec:", version("torchcodec"))
def load_pipeline(token: str):
pipeline = Pipeline.from_pretrained(
MODEL_ID,
token=token,
)
if torch.cuda.is_available():
pipeline.to(torch.device("cuda"))
return pipeline
def run_diarization(audio_path: str):
token = os.environ.get("HF_TOKEN")
if not token:
raise RuntimeError("Set HF_TOKEN before running this script.")
print_versions()
print(f"Loading {MODEL_ID}...")
pipeline = load_pipeline(token)
with ProgressHook() as hook:
output = pipeline(audio_path, hook=hook)
return output
def print_diarization(output):
print("\nRegular speaker diarization:")
for turn, speaker in output.speaker_diarization:
print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
print("\nExclusive speaker diarization:")
if hasattr(output, "exclusive_speaker_diarization"):
for turn, speaker in output.exclusive_speaker_diarization:
print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
else:
print("Not available.")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("audio_path")
args = parser.parse_args()
output = run_diarization(args.audio_path)
print_diarization(output)
if __name__ == "__main__":
main()
Run:
export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"
uv run --refresh --script diaritranscribe4.py audio.wav
Lock after it works:
uv lock --script diaritranscribe4.py
Adding faster-whisper back later
After Community-1 works by itself, add faster-whisper back.
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "pyannote.audio>=4,<5",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# "faster-whisper",
# "numpy",
# "scikit-learn",
# "omegaconf",
# ]
# ///
Then test faster-whisper separately before combining:
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)
for segment in segments:
print(segment.start, segment.end, segment.text)
If faster-whisper fails with CUDA/cuDNN/CTranslate2 errors, that is separate from pyannote.
Common Path B failure modes
Failure: No solution found
Usually means you still have a dependency pin like:
brouhaha -> pyannote-audio==3.3.0
Fix:
- remove Brouhaha from the pyannote 4 environment;
- fork/update Brouhaha;
- split environments.
Failure: unexpected keyword argument 'token'
This means you are still on old pyannote.
Check:
from importlib.metadata import version
print(version("pyannote.audio"))
If it prints 3.3.0, you are not on Path B yet.
Failure: unexpected keyword argument 'use_auth_token'
This means you are probably on newer pyannote but still using old code.
Use:
token="<HUGGINGFACE_ACCESS_TOKEN>"
not:
use_auth_token="<HUGGINGFACE_ACCESS_TOKEN>"
Failure: Could not load libtorchcodec
Check:
- TorchCodec/Torch version compatibility;
- FFmpeg installation;
- Python version;
- platform wheel availability.
Failure: model access denied
Check that you accepted the model conditions and used a valid token.
Failure: CUDA not available
Check PyTorch install:
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
Use uv’s PyTorch guide for accelerator-specific builds.
Recommended Path B checklist
- Create diaritranscribe4.py.
- Remove brouhaha from that script.
- Use pyannote.audio>=4,<5.
- Start with a coherent Torch/TorchAudio/TorchCodec family.
- Install FFmpeg.
- Accept Community-1 model conditions.
- Set HF_TOKEN.
- Load with token=....
- Use output.speaker_diarization.
- Prefer output.exclusive_speaker_diarization for transcript alignment.
- Test pyannote alone.
- Add faster-whisper back only after pyannote works.
- Rebuild speaker assignment around maximum overlap or word-level timestamps.
- Lock the migrated script.
- Keep the pyannote 3.3 script until the new output is verified.
Bottom line
Path B is not:
change speaker-diarization-3.1 to speaker-diarization-community-1
and it is not:
change use_auth_token= to token=
It is:
remove or isolate the Brouhaha pyannote 3.3 pin
↓
move to pyannote.audio 4.x
↓
use Community-1
↓
install/verify TorchCodec and FFmpeg
↓
change the output parsing code
↓
use exclusive diarization for transcript alignment
↓
lock the new environment
For your project, the safest approach is to keep:
diaritranscribe3.py
as the recovery script and create:
diaritranscribe4.py
as the Community-1 migration script.
Do not merge them until Community-1 works alone, faster-whisper works alone, and the speaker-attributed transcript is at least as good as your pyannote 3.3 path.