23 voices across 6 engines · Rate 1-10, review aggregates, pipeline tracks moves · M4 Pro 64GB
Click a score from 1 to 10 under each card to rate it. Ratings persist locally (localStorage). Sort and filter controls are at the top, but the listening order stays shuffled until you opt in.
Aggregated from your ratings. Updates live.
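As a rough illustration of how per-voice ratings could roll up into a per-engine review, here is a minimal Python sketch. The function name `aggregate_by_engine`, the two dictionaries, and the voice and engine IDs are all hypothetical; the page's actual aggregation logic is not shown in these notes, and the scores below are placeholders, not real ratings.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_engine(ratings, voice_engine):
    """Return {engine: (mean_rating, rated_count)} for voices that have a rating.

    ratings:      {voice_id: score}, score expected in 1..10 (hypothetical shape)
    voice_engine: {voice_id: engine_name} (hypothetical shape)
    """
    by_engine = defaultdict(list)
    for voice_id, score in ratings.items():
        if not 1 <= score <= 10:
            raise ValueError(f"rating for {voice_id!r} is outside 1-10: {score}")
        by_engine[voice_engine[voice_id]].append(score)
    # Unrated voices never appear in `ratings`, so they do not drag the mean down.
    return {
        engine: (round(mean(scores), 2), len(scores))
        for engine, scores in by_engine.items()
    }


# Placeholder data: IDs and scores are made up for the example.
example_ratings = {"voice-a": 8, "voice-b": 6, "voice-c": 9}
example_engines = {"voice-a": "engine-x", "voice-b": "engine-x", "voice-c": "engine-y"}
summary = aggregate_by_engine(example_ratings, example_engines)
```

Recomputing the whole summary after each click is cheap at 23 voices, which is one simple way to get the "updates live" behavior.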
Per-engine: what it is, how it works, license, Apple Silicon viability, the invocation that produced its sample.
fishaudio/s2-pro, 906 downloads, last updated Mar 2026). Zero-shot clone works.
nanovllm-voxcpm (CUDA fork). Clone mode produces "fluent-foreigner accent" (HF discussion #14).
ja-JP-Chirp3-HD-{Name}. Voices tested: Charon, Kore, Aoede, Zephyr, Achernar.
speaker_kv_scale) - first public test of the post-PR-#18 configuration next
HF discussion #14 + VoxCPM issue #222 maintainer-confirmed:
--mode default --instruct "...") not clone
--cfg-value 1.5)
speaker_kv_scale for ref-adherence vs naturalness
<laugh>, <breath>, <sigh>, +7
lang="na"
[whisper], [excited], [angry], etc.
Default: Google Chirp 3 HD Charon / Kore. You own output, $1.50 for whole catalog at scale.
Voice variety: VoxCPM2 Voice Design with character-instructs for non-narrator lines.
Avoid: AivisSpeech / SBV2 (transitive AGPL).
Default: top-rated MIT/Apache engine from your blind A/B.
For personal viewing, AGPL is acceptable if needed, but Irodori (MIT) and Supertonic (MIT) are cleaner defaults.
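The "top-rated MIT/Apache engine" rule can be sketched as a filter followed by a max. This is only an illustration: `pick_default`, the record layout, and the mean ratings are hypothetical placeholders, and the license strings simply mirror the notes above (Irodori MIT, Supertonic MIT, AivisSpeech AGPL).

```python
PERMISSIVE = frozenset({"MIT", "Apache-2.0"})


def pick_default(engines, allowed=PERMISSIVE):
    """Return the name of the highest-rated engine whose license is allowed.

    engines: list of dicts with "name", "license", "mean_rating" keys
             (hypothetical layout). Returns None if nothing qualifies.
    """
    candidates = [e for e in engines if e["license"] in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["mean_rating"])["name"]


# Placeholder ratings, NOT real A/B results.
engines = [
    {"name": "AivisSpeech", "license": "AGPL", "mean_rating": 8.5},
    {"name": "Irodori", "license": "MIT", "mean_rating": 7.5},
    {"name": "Supertonic", "license": "MIT", "mean_rating": 7.0},
]

strict_choice = pick_default(engines)
# Personal viewing: widen the allowed set to admit AGPL.
personal_choice = pick_default(engines, allowed=PERMISSIVE | {"AGPL"})
```

Keeping the allowed-license set as a parameter lets the same rule serve both the strict case and the personal-viewing case where AGPL is tolerated.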
Default: Google Chirp 3 HD Charon — calm, polished, no fiddling. Personal use, no license worry either way.
If offline matters: top-rated local engine from A/B.
Air-gapped Silo machine: no cloud. Local-only mandate.
Bet on top-rated local Apache/MIT engine. Chunk per-character with distinct captions/instructs.
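One way to read "chunk per-character with distinct captions/instructs" is to group consecutive script lines by speaker and attach that speaker's instruct to each chunk before synthesis. The sketch below assumes a script already split into (speaker, text) pairs; `chunk_by_character`, the speaker names, and the instruct strings are hypothetical, and the actual engine call is left out.

```python
from itertools import groupby


def chunk_by_character(lines, instructs, default_instruct="neutral narrator"):
    """Merge consecutive lines by the same speaker into one chunk.

    lines:     list of (speaker, text) tuples in reading order
    instructs: {speaker: instruct/caption string} (hypothetical values)
    Returns a list of dicts with "speaker", "instruct", and "text".
    """
    chunks = []
    # groupby only merges adjacent items, so reading order is preserved.
    for speaker, group in groupby(lines, key=lambda line: line[0]):
        text = " ".join(part for _, part in group)
        chunks.append(
            {
                "speaker": speaker,
                "instruct": instructs.get(speaker, default_instruct),
                "text": text,
            }
        )
    return chunks


# Placeholder script and instructs for illustration only.
script = [
    ("Narrator", "The door opened."),
    ("Narrator", "Nobody moved."),
    ("Aoi", "Who is there?"),
    ("Narrator", "Silence."),
]
instructs = {"Aoi": "young woman, tense, quiet"}
chunks = chunk_by_character(script, instructs)
```

Each chunk can then be sent to the chosen local engine with its own instruct, so a speaker change always starts a new synthesis call.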
Things that aren't changing soon.
ml-explore/mlx team has zero in-flight TTS work in 2026.