Local voice AI with a 235 billion parameter LLM

July 27, 2025

Local voice AI with a 235 billion parameter LLM. ✅

- smart-turn v2
- MLX Whisper (large-v3-turbo-q4)
- Qwen3-235B-A22B-Instruct-2507-3bit-DWQ
- Kokoro

All models running locally on an M4 Mac. Max RAM usage ~110GB.
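
For a rough sense of where that memory goes, here is a back-of-the-envelope estimate of the LLM weights alone. It is a sketch under assumptions, not a measurement: it assumes grouped quantization with 64 weights per group and 32 bits of scale/bias metadata per group, and it ignores any layers kept at higher precision, the KV cache, and the other three models. The function name and defaults are illustrative.

```python
def quantized_weight_gb(n_params, bits, group_size=64, overhead_bits_per_group=32):
    """Approximate weight footprint in decimal GB.

    group_size and overhead_bits_per_group are assumptions about the
    quantization layout, not values taken from this particular quant.
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


# 235B parameters at a nominal 3 bits each, with no per-group metadata.
raw = quantized_weight_gb(235e9, 3, overhead_bits_per_group=0)

# Same, plus the assumed per-group scale/bias overhead (3.5 bits per weight).
with_scales = quantized_weight_gb(235e9, 3)

print(f"{raw:.1f} GB raw, {with_scales:.1f} GB with group metadata")
```

Under these assumptions the weights come to roughly 88 to 103 GB, which is at least in the same ballpark as the ~110GB peak once the smaller models and runtime buffers are added.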

Voice-to-voice latency is ~950ms. There are a couple of relatively easy ways to carve another ~100ms off that number. But it's not a bad start!
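
To find out which stage to trim, it helps to time each one separately. The sketch below shows one way to do that. The four stage functions are hypothetical stubs standing in for turn detection, speech-to-text, the LLM, and text-to-speech; they are not the actual code or any library's API. It also runs the stages strictly one after another, whereas a real pipeline would likely stream between them, so treat the total as a simplification.

```python
import time


# Hypothetical stand-ins for the four local models. A real stage would call
# into its own library instead of returning a canned value.
def detect_end_of_turn(audio: bytes) -> bool:
    return True


def transcribe(audio: bytes) -> str:
    return "hello"


def generate_reply(text: str) -> str:
    return f"you said: {text}"


def synthesize(text: str) -> bytes:
    return text.encode()


def timed(name, fn, arg, timings):
    """Run fn(arg) and record its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    out = fn(arg)
    timings[name] = (time.perf_counter() - start) * 1000
    return out


def voice_turn(audio: bytes):
    """Run one user turn through all stages and return (speech, timings)."""
    timings = {}
    if not timed("turn_detection", detect_end_of_turn, audio, timings):
        return None, timings  # user is still speaking
    text = timed("stt", transcribe, audio, timings)
    reply = timed("llm", generate_reply, text, timings)
    speech = timed("tts", synthesize, reply, timings)
    timings["total"] = sum(timings.values())
    return speech, timings


speech, timings = voice_turn(b"\x00\x00")
for stage, ms in timings.items():
    print(f"{stage}: {ms:.3f} ms")
```

With real models plugged in, the per-stage numbers show where a ~100ms saving is most likely to come from.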

Code is here:

This was made possible by the quant @ivanfioravanti posted today and his advice about changing the memory limits for this big model.

And @Prince_Canuma's work on mlx-audio made implementing…