AI Model Roundup — Qwen 3.6, SGLang 0.5, and RTX 3090 Inference Benchmarks

Qwen 3.6 Family Update

Qwen released the 3.6 model family this week, including a 27B-parameter model that, at Q4_K_M quantization, runs comfortably on a single RTX 3090. Key highlights:

Benchmarks (RTX 3090, single GPU)

| Model | Tokens/sec | VRAM | Quality Tier |
|-------|-----------|------|-------------|
| Qwen 3.6 27B Q4 | ~18 tok/s | 17 GB | High |
| Qwen 3.6 27B Q5 | ~14 tok/s | 21 GB | Higher |
| Qwen 3.6 27B Q8 | ~8 tok/s | 29 GB | Max |
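
To put the table's numbers in practical terms, here is a minimal Python sketch that computes VRAM headroom and generation time for each row. It assumes the VRAM column reports peak usage and that the card offers its nominal 24 GB; the helper names (`headroom_gb`, `seconds_for`) and the 1,000-token response length are illustrative choices, not part of any benchmark tooling.

```python
# Nominal VRAM of a single RTX 3090, in GB.
RTX_3090_VRAM_GB = 24

# (approx. tokens/sec, VRAM in GB), copied from the benchmark table.
BENCHMARKS = {
    "Qwen 3.6 27B Q4": (18, 17),
    "Qwen 3.6 27B Q5": (14, 21),
    "Qwen 3.6 27B Q8": (8, 29),
}


def headroom_gb(vram_gb, capacity_gb=RTX_3090_VRAM_GB):
    """VRAM left over after loading; a negative value means it does not fit."""
    return capacity_gb - vram_gb


def seconds_for(tokens, tokens_per_sec):
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    for name, (tps, vram) in BENCHMARKS.items():
        print(
            f"{name}: headroom {headroom_gb(vram):+d} GB, "
            f"{seconds_for(response_tokens, tps):.0f} s per {response_tokens} tokens"
        )
```

Under these assumptions Q4 leaves 7 GB free for context and Q5 leaves 3 GB, while the Q8 figure is 5 GB above what one card holds.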

SGLang v0.5 Release

SGLang reached v0.5 with significant performance improvements:

Throughput is 23% higher than v0.4 on RTX 3090 hardware for multi-request workloads.
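
A 23% throughput gain does not translate into 23% less wall-clock time. The short sketch below works through the arithmetic; the baseline of 100 requests per minute is a hypothetical number chosen for illustration, not a measured result.

```python
# v0.5 throughput relative to v0.4, taken from the 23% figure.
THROUGHPUT_RATIO = 1.23


def scaled_throughput(baseline, ratio=THROUGHPUT_RATIO):
    """Throughput after the upgrade, in the same units as `baseline`."""
    return baseline * ratio


def time_saved_fraction(ratio=THROUGHPUT_RATIO):
    """Fraction of wall-clock time saved on a fixed amount of work."""
    return 1 - 1 / ratio


if __name__ == "__main__":
    baseline_rpm = 100.0  # hypothetical v0.4 requests per minute
    print(f"v0.5 throughput: {scaled_throughput(baseline_rpm):.0f} req/min")
    print(f"time saved on a fixed batch: {time_saved_fraction():.1%}")
```

For a fixed batch of requests, the same work finishes in 1/1.23 of the time, which is a saving of roughly 19%.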

RTX 3090 vs RTX 3080 Inference Comparison

Benchmarks running Qwen 3.6 27B Q4_K_M on dual GPU hardware:

Recommendation: For dual-GPU inference, matching GPUs is essential. In a mixed configuration, the faster card wastes its bandwidth waiting for the slower one.
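
The reasoning behind this recommendation can be illustrated with a toy model. The sketch assumes the work for each token is split evenly across the cards and that every card must finish its share before the next step begins; it ignores communication overhead. The relative speeds (1.0 and 0.8) are hypothetical placeholders, not measured 3090 or 3080 figures.

```python
def step_time(speeds, work=1.0):
    """Time for one synchronized step: the slowest card sets the pace."""
    share = work / len(speeds)
    return max(share / s for s in speeds)


def idle_fractions(speeds, work=1.0):
    """Fraction of each step that every card spends waiting for the others."""
    share = work / len(speeds)
    total = step_time(speeds, work)
    return [1 - (share / s) / total for s in speeds]


if __name__ == "__main__":
    matched = [1.0, 1.0]  # two identical cards (hypothetical relative speed)
    mixed = [1.0, 0.8]    # one faster, one slower card (hypothetical)

    for label, speeds in (("matched", matched), ("mixed", mixed)):
        idle = ", ".join(f"{f:.0%}" for f in idle_fractions(speeds))
        print(f"{label}: step time {step_time(speeds):.3f}, idle per card: {idle}")
```

In this model the mixed pair runs no faster than two of the slower cards would, and the faster card's idle share equals the speed gap between them. Uneven work splits can narrow the gap in practice, so treat this as an upper bound on the waste rather than a prediction.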

Notable Mentions


Data sourced from Hugging Face model hub, SGLang GitHub releases, and local benchmarking on RTX 3090/3080 hardware. All benchmarks run with SGLang v0.5 and llama.cpp v3.5.
