Speech Studio

Open-source desktop app for local voice cloning and multi-speaker dialog generation, with builds for macOS, Windows, and Linux. Drop a voice sample, clone it, write a scene, and synthesize, all on your laptop. No API keys, no cloud, no per-character pricing.

A 30-second blind test: a real voice, the same voice cloned locally by Speech Studio, and the same voice cloned by ElevenLabs in the cloud. Can you tell which is which?

What it does

Requirements

Install

Download the build for your platform from GitHub Releases (macOS .dmg, Windows .msi/.exe, or Linux .deb/.AppImage), then launch it.

The builds are unsigned: on macOS open via right-click → Open (or System Settings → Privacy & Security → Open anyway); on Windows choose More info → Run anyway in SmartScreen. First launch downloads the VoxCPM2 speech model (~2.75 GB on macOS, ~4.6 GB on Windows/Linux) and caches it; later launches reuse the cache.

Prefer the CLI?

The same voice cloning pipeline ships in the speech CLI. Install it with brew install speech, then run speech speak --engine voxcpm2 --voxcpm2-ref-audio reference.wav -o cloned.wav "Hello, this is my cloned voice." to clone the voice in reference.wav and write the result to cloned.wav. This is useful for scripting or pre-rendering batches. See the voice cloning guide for the full flow.
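For batch pre-rendering, a small shell wrapper around that command might look like the sketch below. It assumes the speech CLI is on PATH and uses only the flags quoted above. The render_batch function, the DRY_RUN switch, and the line-NNN.wav naming are hypothetical conveniences for illustration, not part of the CLI.

```shell
#!/bin/sh
# Sketch: render one WAV per non-empty line of a text file with a cloned voice.
# Assumption: `speech speak` accepts exactly the flags shown in this README.
# render_batch, DRY_RUN, and the line-NNN.wav naming are hypothetical.

render_batch() {
    ref_audio=$1   # reference voice sample, e.g. reference.wav
    script=$2      # text file with one utterance per line
    out_dir=$3     # directory that receives the rendered WAV files

    mkdir -p "$out_dir" || return 1
    n=0
    # The `|| [ -n "$line" ]` part keeps a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue          # skip blank lines
        n=$((n + 1))
        out=$(printf '%s/line-%03d.wav' "$out_dir" "$n")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of running it.
            printf 'speech speak --engine voxcpm2 --voxcpm2-ref-audio %s -o %s "%s"\n' \
                "$ref_audio" "$out" "$line"
        else
            # stdin is redirected so the CLI cannot consume the script file.
            speech speak --engine voxcpm2 --voxcpm2-ref-audio "$ref_audio" \
                -o "$out" "$line" < /dev/null || return 1
        fi
    done < "$script"
}
```

Running DRY_RUN=1 render_batch reference.wav lines.txt out prints the commands without invoking the CLI, which is a cheap way to check the file list before a long render.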

Status

Speech Studio is in active preview (v0.0.4), with installers for macOS, Windows, and Linux — macOS clones via MLX, Windows and Linux via speech-core's LiteRT VoxCPM2 engine. The source repo at github.com/soniqo/speech-studio tracks the GUI app; star/watch it for release notifications.

What it's built on

Speech Studio is a thin GUI on top of speech-swift, the open-source Swift library that ships every model used in the demo.

Roadmap

Feedback

Open an issue at github.com/soniqo/speech-studio/issues — every one gets read.