If you’ve been frustrated by cloud-dependent speech APIs or CPU-bound audio processing on your M-series Mac, MLX-Audio is worth a look. It brings high-quality text-to-speech, speech-to-text, and voice cloning directly to Apple Silicon, with no internet connection required and fast inference that actually exploits the chip’s GPU and unified memory through Apple’s MLX framework.

The library supports multiple high-quality models, including Kokoro (82M parameters, 8 languages), Alibaba’s Qwen3-TTS, and voice cloning via CSM. What sets it apart is quantization support (down to 3-bit) for memory efficiency, an OpenAI-compatible API server for easy migration, and even a Swift package for iOS integration. The command-line interface is dead simple, while the Python API gives you full control over voice selection, speed adjustment, and real-time generation.
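To give a flavor of the CLI, here is a sketch based on the project’s README; treat the exact flag names, model identifier, and voice name as assumptions that may differ between versions:

```shell
# Install (assumes an Apple Silicon Mac; the package name is mlx-audio)
pip install mlx-audio

# One-shot TTS from the command line; model path and voice are
# example values from the README and may change across releases
python -m mlx_audio.tts.generate \
    --text "Hello from Apple Silicon" \
    --model prince-canuma/Kokoro-82M \
    --voice af_heart \
    --speed 1.2

# Start the local OpenAI-compatible server (hypothetical host/port flags)
python -m mlx_audio.server --host 127.0.0.1 --port 8000
```

Because the server speaks the OpenAI API shape, existing clients can often be pointed at the local endpoint by swapping the base URL, which is what makes migration from a paid cloud TTS service mostly a configuration change.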

With 5.7k stars and active development, this is becoming the go-to solution for developers building voice features on Apple platforms. Whether you’re prototyping an AI assistant, building accessibility tools, or just want to stop paying per-character for TTS, this runs everything locally with the performance Apple Silicon was designed for.


⭐ Stars: 5777
💻 Language: Python
🔗 Repository: Blaizzy/mlx-audio