Model Intelligence — 2026-06-06
🤖 Model Trends
No brand-new model families this cycle, but existing models show strong organic growth:
- FLUX.1-dev (+61 likes, now 13,074) — closing fast on #1 DeepSeek-R1, only 300 likes behind
- FLUX.1-schnell (+46) — strong momentum on the fast variant
- Qwen3.6-27B (+47) and Qwen3.6-35B-A3B (+42) — the Qwen3.6 family gaining real traction
- Gemma-4-31B-it (+54) — approaching 3K likes, Google's latest instruct model
- Kokoro-82M TTS (+21) — text-to-speech gaining community interest
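Qwen3.6-35B-A3B MoE (35B params with only 3B active) continues to be the standout for local GPU deployment. Only about 3B parameters are used per token, so it generates at roughly the speed of a small dense model, although all 35B parameters still have to fit in memory.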
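A rough sketch of that MoE trade-off: weight memory scales with total parameters, while per-token compute scales with active parameters. The helper name `moe_footprint`, the 4-bit weight assumption, and the "2 FLOPs per active parameter per token" rule of thumb are illustrative assumptions, not measurements of this model.

```python
def moe_footprint(total_params: float, active_params: float,
                  bits_per_weight: float = 4.0) -> tuple[float, float]:
    """Return (weight memory in GB, approximate forward FLOPs per token).

    Assumptions: weights only (no KV cache or activations), and the common
    rule of thumb of ~2 FLOPs per active parameter per generated token.
    """
    weight_gb = total_params * bits_per_weight / 8 / 1e9
    flops_per_token = 2 * active_params
    return weight_gb, flops_per_token


# Nominal sizes read off the model name: 35B total, 3B active.
weight_gb, flops = moe_footprint(35e9, 3e9)
print(f"weights: ~{weight_gb:.1f} GB, compute: ~{flops / 1e9:.0f} GFLOPs/token")
```

Under these assumptions the model needs the memory of a 35B model but only the per-token compute of a 3B one, which is why it tends to suit setups with plenty of memory and modest compute.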
🆕 Google QAT (Quantization-Aware Training) Models
Google released pre-quantized Gemma 3 models trained with QAT. They retain much higher quality at 4-bit than models quantized after training:
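- gemma-3-27b-it-qat-q4_0: 399 likes, 27B at 4-bit (~14GB VRAM for weights)
- gemma-3-12b-it-qat-q4_0: 277 likes, 12B at 4-bit (~6.5GB VRAM for weights)
- gemma-3-4b-it-qat-q4_0: 263 likes, 4B at 4-bit (~2.5GB VRAM for weights)
- gemma-3-1b-it-qat-q4_0: 129 likes, 1B at 4-bit (~500MB VRAM for weights)
Why this matters: QAT simulates low-precision weights during fine-tuning, so the model learns to tolerate rounding error and loses far less quality at 4-bit than a model quantized after training. The 27B model needs roughly 14GB for its 4-bit weights, which suits a 16GB or 24GB card such as an RTX 3090; the 12B model, at roughly 6.5GB, is the one in RTX 3060 territory. Context (KV cache) and runtime overhead come on top of these figures. This is still a meaningful shift for local deployment.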
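The VRAM figures above can be sanity-checked with a weights-only estimate: parameters times bits per weight, divided by 8. This is a minimal sketch under stated assumptions. The parameter counts are the nominal sizes from the model names, `weight_memory_gb` is a hypothetical helper, and 4.5 bits per weight reflects the GGML q4_0 block layout (32 four-bit values plus one fp16 scale). KV cache, activations, and runtime overhead are not included, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


# Nominal parameter counts taken from the model names (assumption).
GEMMA3_SIZES_B = {"27b": 27.0, "12b": 12.0, "4b": 4.0, "1b": 1.0}

for name, size in GEMMA3_SIZES_B.items():
    bf16 = weight_memory_gb(size, 16.0)   # unquantized bfloat16 baseline
    ideal = weight_memory_gb(size, 4.0)   # exactly 4 bits per weight
    q4_0 = weight_memory_gb(size, 4.5)    # 4-bit values + fp16 scale per 32 weights
    print(f"gemma-3-{name}: bf16 ~{bf16:.1f} GB, "
          f"4-bit ~{ideal:.1f} GB, q4_0 ~{q4_0:.1f} GB")
```

The estimate is deliberately coarse. Actual file sizes depend on which tensors are kept at higher precision and on components such as embeddings, so treat the output as a lower bound when picking a GPU.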
⚙️ Inference Engine Updates
This cycle's biggest story: inference engines are shipping at an extraordinary pace.
- llama.cpp: 53 new builds in 3 days (b9492→b9544) — ~18 builds/day, suggesting a major feature cycle (possibly new model architecture support or quantization improvements)
- Ollama: 4 new releases (v0.30.3→v0.30.6), rapid v0.30.x iteration
- vLLM: v0.22.1 patch released June 5
🤔 Worth Watching
- Qwen3.6-35B-A3B MoE — 35B params with only 3B active, excellent for local GPU
- Gemma-3n-E4B-it — new ultra-efficient 4B, potential king of <6GB VRAM deployment
- llama.cpp velocity — 53 builds suggests major architecture support or quantization work
- FLUX.1-dev approaching #1: only 300 likes behind DeepSeek-R1, so an image-generation model may soon overtake a reasoning model as the most-liked on Hugging Face
Sources: