Hoisting this up to a top-level thread because I'd like advice about Qwen3.5…

March 12, 2026

Hoisting this up to a top-level thread because I'd like advice about Qwen3.5 27B ...

I'm still figuring out 27b. I *want* to talk more about it, because it's clearly a good model in a bunch of ways. But it falls into a middle category that's not super useful for me. Maybe a skill issue on my part. But:

1. So far I don't have a vLLM/SGLang configuration with a TTFT low enough for the conversational loop part of voice AI. With thinking disabled, it's not good enough at tool calling. With thinking enabled, time to the first non-thinking token is >1,000ms.

2. It does not do well on the sub-agent tasks I'm most interested in, which are long, multi-turn, and include structured data inputs.
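For anyone who wants to compare numbers on point 1, here's a minimal sketch of what "time to first non-thinking token" could mean in practice. It's an assumption-heavy illustration, not how the benchmarks below measure it: it assumes the server streams the reasoning inline, wrapped in `<think>...</think>` tags, and the function names (`visible_text`, `time_to_first_answer_token`) are made up for this example. If your server is set up to return reasoning in a separate field, you'd timestamp the first non-empty content delta instead.

```python
from typing import Iterable, Optional, Tuple

# Assumption: thinking is streamed inline between these tags.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def visible_text(buffer: str) -> str:
    """Return the part of the accumulated output that is not thinking."""
    s = buffer.lstrip()
    if s.startswith(THINK_OPEN):
        end = s.find(THINK_CLOSE)
        if end == -1:
            return ""  # still inside the thinking block
        return s[end + len(THINK_CLOSE):]
    if THINK_OPEN.startswith(s):
        return ""  # empty, or a partial "<think>" tag split across chunks
    return s  # no thinking block at all


def time_to_first_answer_token(
    chunks: Iterable[Tuple[float, str]],
    request_start: float,
) -> Optional[float]:
    """Seconds from request_start to the first non-whitespace, non-thinking text.

    chunks: (arrival_time_seconds, text_delta) pairs in arrival order.
    Returns None if the stream ends before any answer text arrives.
    """
    buffer = ""
    for arrival, delta in chunks:
        buffer += delta  # accumulate so tags split across chunks still match
        if visible_text(buffer).strip():
            return arrival - request_start
    return None


# Hypothetical stream: tags arrive split across chunk boundaries.
thinking_stream = [
    (0.10, "<thi"),
    (0.20, "nk>plan the tool call"),
    (0.90, "</thi"),
    (1.25, "nk>\n\n"),
    (1.30, "Sure"),
]
latency = time_to_first_answer_token(thinking_stream, request_start=0.0)
```

The point of buffering the whole output is that plain TTFT (first chunk, here 0.10s) and the number that matters for a voice loop (first speakable text, here the chunk containing "Sure") can be far apart when thinking is on.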

Sunny@sunnypause

@kwindla U should talk more about 27b thinking..

Benchmark that 27b (thinking) is surprisingly good at, except for TTFT^[1]

Benchmark that 27b (thinking) is surprisingly bad at:^[2]

^[1] https://github.com/kwindla/aiewf-eval
^[2] https://github.com/pipecat-ai/gb-benchmarks