March 12, 2026
Hoisting this up to a top-level thread because I'd like advice about Qwen3.5 27B ...
I'm still figuring out 27b. I *want* to talk more about it, because it's clearly a good model in a bunch of ways. But it falls into a middle category that's not super useful for me. Maybe a skill issue on my part. But:
1. So far I don't have a vLLM/SGLang configuration with a TTFT low enough for the conversational loop part of voice AI. With thinking disabled it's not good enough at tool calling. With thinking enabled, TTFT to first non-thinking token is >1,000ms.
2. It does not do well on the sub-agent tasks I'm most interested in, which are long, multi-turn, and include structured data inputs.
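For anyone who wants to compare numbers on point 1, here is a rough sketch of how "TTFT to first non-thinking token" could be measured from a streamed response. It assumes the thinking text arrives inline between `<think>` and `</think>` tags; if the server is run with a reasoning parser that puts thinking in a separate field, the tag handling would need to change. The names `answer_text` and `measure_ttft` are made up for illustration, and timestamps are plain milliseconds supplied by the caller.

```python
from typing import Iterable, Optional, Tuple

# Assumption: thinking is delimited inline by these tags in the streamed text.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def answer_text(buffer: str) -> str:
    """Return the non-thinking text seen so far (empty if none yet)."""
    s = buffer.lstrip()
    # Nothing yet, or we may still be in the middle of receiving "<think>".
    if THINK_OPEN.startswith(s):
        return ""
    if s.startswith(THINK_OPEN):
        _, closed, after = s.partition(THINK_CLOSE)
        # Only text after a complete closing tag counts as the answer.
        return after.strip() if closed else ""
    # No thinking block at all: everything is answer text.
    return s


def measure_ttft(
    chunks: Iterable[Tuple[int, str]], start_ms: int
) -> Tuple[Optional[int], Optional[int]]:
    """chunks yields (arrival_time_ms, text) pairs.

    Returns (ms to first token of any kind, ms to first non-thinking token).
    Either value is None if that event never happened.
    """
    first_any: Optional[int] = None
    first_answer: Optional[int] = None
    buffer = ""
    for arrived_ms, text in chunks:
        if not text:
            continue
        elapsed = arrived_ms - start_ms
        if first_any is None:
            first_any = elapsed
        # Accumulate so tags split across chunk boundaries are still detected.
        buffer += text
        if answer_text(buffer):
            first_answer = elapsed
            break
    return first_any, first_answer


# Hypothetical stream: the opening tag is split across two chunks.
example = [
    (50, "<thi"),
    (60, "nk>let me"),
    (900, " check</think>"),
    (1100, "\n\n"),
    (1250, "Sure"),
]
first_any, first_answer = measure_ttft(example, start_ms=0)
```

The second number is the one that matters for the conversational loop: the first value only says the model started thinking, while the user hears nothing until the first answer token arrives.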
@kwindla U should talk more about 27b thinking..
Benchmark that 27b (thinking) is surprisingly good at, except for TTFT: [1]
Benchmark that 27b (thinking) is surprisingly bad at: [2]