Model Intelligence — 2026-06-01

🤖 Model Landscape

Qwen3.6 family — steady growth continues (HuggingFace):

| Model | Likes | Δ | VRAM (Q4) | Notes |
|---|---|---|---|---|
| Qwen/Qwen3.6-35B-A3B | 1,971 | +11 | ~8GB | MoE: only 3B active params/token |
| Qwen/Qwen3.6-27B | 1,568 | +16 | ~17GB | Dense model, 32K context |
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,864 | | ~17GB | Community reasoning-distilled variant |
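
The VRAM (Q4) column can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 4.5 bits per weight for a Q4-style quantization and counts weights only, ignoring KV cache and runtime overhead. Both the bits-per-weight figure and the helper name `q4_weight_gb` are illustrative assumptions, not values from the data sources.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4-style formats;
    actual files vary by quantization scheme.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter counts taken from the model names in the table above.
dense_27b = q4_weight_gb(27)        # all 27B parameters
moe_35b_total = q4_weight_gb(35)    # every expert of the 35B MoE
moe_35b_active = q4_weight_gb(3)    # only the 3B parameters active per token

print(f"27B dense:        {dense_27b:.1f} GB")
print(f"35B MoE (total):  {moe_35b_total:.1f} GB")
print(f"35B MoE (active): {moe_35b_active:.1f} GB")
```

Under these assumptions the 27B dense model comes out near 15GB of weights, which is in line with the ~17GB in the table once cache and overhead are added. The full 35B MoE comes out near 20GB, so the ~8GB figure presumably relies on keeping most expert weights in system RAM rather than VRAM. That reading is an inference, not something the sources state.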

Gemma 4 family (Google on HF):

Trending highlights (HF trending):

⚙️ Inference Engine Updates

llama.cpp — 5 more builds in 24 hours (releases):

| Build | Time (UTC) |
|---|---|
| b9466 | Jun 2, 02:42 |
| b9464 | Jun 1, 19:57 |
| b9460 | Jun 1, 19:23 |
| b9459 | Jun 1, 18:58 |
| b9458 | Jun 1, 18:31 |
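
A count like "5 builds in 24 hours" can be reproduced from a release list. The sketch below assumes the releases have already been fetched and decoded from the GitHub releases API, where each entry carries `tag_name` and `published_at` fields. The sample data mirrors the table above with seconds set to zero, and both the scan time `now` and the helper name `builds_in_window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def builds_in_window(releases, now, hours=24):
    """Return tag names of releases published in the last `hours` before `now`."""
    cutoff = now - timedelta(hours=hours)
    recent = []
    for rel in releases:
        # GitHub timestamps end in "Z"; swap it for an explicit UTC offset
        # so datetime.fromisoformat accepts it on older Python versions.
        published = datetime.fromisoformat(rel["published_at"].replace("Z", "+00:00"))
        if cutoff <= published <= now:
            recent.append(rel["tag_name"])
    return recent


# Sample entries mirroring the table (year taken from the newsletter date).
sample = [
    {"tag_name": "b9466", "published_at": "2026-06-02T02:42:00Z"},
    {"tag_name": "b9464", "published_at": "2026-06-01T19:57:00Z"},
    {"tag_name": "b9460", "published_at": "2026-06-01T19:23:00Z"},
    {"tag_name": "b9459", "published_at": "2026-06-01T18:58:00Z"},
    {"tag_name": "b9458", "published_at": "2026-06-01T18:31:00Z"},
]

now = datetime(2026, 6, 2, 3, 0, tzinfo=timezone.utc)  # hypothetical scan time
recent = builds_in_window(sample, now)
print(len(recent), recent)
```

Filtering on `published_at` rather than on tag numbers keeps the count correct even when build numbers skip, as they do between b9460 and b9464.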

Stable since last scan (no new releases since May 31):

📊 Worth Noting

🖥️ Hardware Sweet Spots

| GPU | Best Model | Notes |
|---|---|---|
| RTX 3060 12GB | Qwen3.6-35B-A3B | MoE efficiency, ~8GB VRAM |
| RTX 3090/4090 24GB | Gemma 4-31B-it or Qwen3.6-27B | Both fit Q4, pick based on use case |
| Dual 24GB | DeepSeek-R1, gpt-oss-120b | Multi-GPU required |
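
The pairings above amount to a fit check: a model's Q4 footprint plus some headroom must stay within the card's VRAM. The sketch below uses only the two Qwen footprints listed earlier in this issue. The 2GB headroom for KV cache and runtime overhead and the helper name `models_that_fit` are assumptions for illustration.

```python
def models_that_fit(vram_gb, models, headroom_gb=2.0):
    """Return model names whose footprint plus headroom fits in vram_gb.

    headroom_gb is an assumed allowance for KV cache and runtime overhead;
    the real requirement depends on context length and backend.
    """
    return [name for name, need in models.items() if need + headroom_gb <= vram_gb]


# Approximate Q4 VRAM figures as listed in this issue.
q4_vram_gb = {
    "Qwen/Qwen3.6-35B-A3B": 8.0,
    "Qwen/Qwen3.6-27B": 17.0,
}

fits_12 = models_that_fit(12, q4_vram_gb)  # e.g. RTX 3060 12GB
fits_24 = models_that_fit(24, q4_vram_gb)  # e.g. RTX 3090/4090 24GB
print(fits_12)
print(fits_24)
```

With these inputs only the MoE model clears a 12GB card, while both clear 24GB, which matches the recommendations in the table.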

Data sourced from the HuggingFace API and the llama.cpp, vLLM, SGLang, and Ollama GitHub repositories.

Tags: model-releases, inference, llama.cpp, qwen, gemma, sentence-transformers