Model Intelligence — 2026-06-02

🤖 New Model Releases

No brand new model families released today. The ecosystem is consolidating around recent launches:

Qwen3.6 Series — Growing Adoption

Gemma 4 — Steady Growth

Community Distillates

Trending on HuggingFace (Top 5)

  1. DeepSeek-R1: 13,362 likes (+4)
  2. FLUX.1-dev: 13,004 likes (+14), just past the 13K milestone
  3. Meta-Llama-3-8B: 6,556 likes
  4. Kokoro-82M: 6,253 likes — TTS model
  5. Llama-3.1-8B-Instruct: 5,965 likes (+8) — Notable growth

⚙️ Inference Engine Updates

🔴 Ollama v0.30.0 — STABLE RELEASE (May 13, now promoted)

Ollama has exited the RC phase. v0.30.0 is the stable release.

🔴 llama.cpp — 5 New Builds Today (b9467–b9471)

The release cadence is aggressive — 5 builds in 24 hours:

Build | Time (UTC)
b9471 | 2026-06-02 10:20
b9470 | 2026-06-02 09:35
b9469 | 2026-06-02 07:16
b9468 | 2026-06-02 05:53
b9467 | 2026-06-02 03:30

This pace (~5/day) suggests active development on a significant feature or fix. No detailed changelog diffs were available for this digest, so check the GitHub PR list before updating. b9471 is the latest build at the time of writing.
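To put a number on the cadence, here is a minimal sketch that computes the gaps between the five builds listed above. The tags and timestamps are copied from the table. The helper name `build_gaps_minutes` is made up for illustration and is not part of any llama.cpp tooling.

```python
from datetime import datetime

# Build tags and UTC timestamps copied from the table in this digest.
BUILDS = [
    ("b9471", "2026-06-02 10:20"),
    ("b9470", "2026-06-02 09:35"),
    ("b9469", "2026-06-02 07:16"),
    ("b9468", "2026-06-02 05:53"),
    ("b9467", "2026-06-02 03:30"),
]

def build_gaps_minutes(builds):
    """Return the minutes between consecutive builds, oldest gap first."""
    times = sorted(datetime.strptime(ts, "%Y-%m-%d %H:%M") for _, ts in builds)
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]

gaps = build_gaps_minutes(BUILDS)
span = sum(gaps)               # minutes from first to last build
mean_gap = span / len(gaps)    # average minutes between builds

print(gaps)            # [143, 83, 139, 45]
print(span, mean_gap)  # 410 102.5
```

All five builds landed within 6 hours 50 minutes, roughly one every 100 minutes. A gap list like this only describes timing and says nothing about what changed, so the PR list remains the place to look.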

🟡 SGLang v0.5.12.post1 — No change since May 26

Still the latest release. Highlights: DeepSeek V4 support, TokenSpeed MLA, and CUDA 13 compatibility.

🟢 vLLM v0.22.0 — No change since May 29

Latest stable. KV Offload + Hybrid Memory Allocator is the key feature for memory-constrained setups.

📊 Worth Noting

  1. Ollama v0.30.0 is now stable — The architecture rewrite from GGML to llama.cpp is production-ready. This brings Ollama closer to llama.cpp's bleeding-edge performance. If you use Ollama, upgrade.

  2. llama.cpp release velocity is extraordinary. 5 builds in a single day is unusual even for this project, which suggests something significant is being developed or fixed. Watch the PR list.

  3. MoE models are the efficiency winners — Qwen3.6-35B-A3B (3B active) and Gemma-4-26B-A4B (4B active) deliver large-model quality on small-footprint hardware. This is the current sweet spot.

  4. FLUX.1-dev has passed 13K likes. The image generation space remains hot, and BFL's model is the de facto standard for local image gen.

  5. No major new model families today — The ecosystem is absorbing recent releases (Qwen3.6, Gemma 4, DeepSeek V4). Expect the next wave in late June or early July.

🖥️ Hardware Sweet Spots

GPU | Best Models Today | Notes
RTX 3090 (24GB) | Qwen3.6-35B-A3B (Q6), Gemma-4-31B-it (Q4), Qwen3.6-27B (Q4) | Comfortable with dense 27-31B at Q4
RTX 3080 (10-12GB) | Qwen3.6-35B-A3B (Q4), Gemma-4-E4B-it (Q8), Qwen3.6-27B (Q3) | MoE models shine here: 3B active fits easily
RTX 4060 Ti (16GB) | Qwen3.6-35B-A3B (Q5), Gemma-4-31B-it (Q4) | 16GB is a great mid-tier option
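A rough way to sanity-check any pairing in the table is to estimate the size of the quantized weights as total parameters times bits per weight, divided by 8. The sketch below does that arithmetic. The bits-per-weight values, the 2 GB headroom for KV cache and runtime overhead, and the assumption that the number in a model name is its total parameter count in billions are all illustrative guesses, not measured figures. Real GGUF file sizes vary by quant variant.

```python
# Assumed average bits per weight for each quant label (illustrative only).
ASSUMED_BPW = {"Q3": 3.5, "Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "Q8": 8.5}

def weight_gb(total_params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bits_per_weight / 8

def fits(total_params_billion, quant, vram_gb, headroom_gb=2.0):
    """True if the weights plus an assumed headroom fit entirely in VRAM."""
    needed = weight_gb(total_params_billion, ASSUMED_BPW[quant]) + headroom_gb
    return needed <= vram_gb

print(weight_gb(27, ASSUMED_BPW["Q4"]))  # 15.1875
print(fits(27, "Q4", 24))                # True
print(fits(35, "Q4", 12))                # False
```

For MoE models, pass the total parameter count, not the active count: every expert has to be stored somewhere, and the active parameters only determine per-token compute. A `False` result does not mean the model cannot run. It means part of the weights would have to be offloaded to system RAM, at some cost in speed.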

Sources: HuggingFace API · llama.cpp Releases · Ollama Releases · SGLang Releases · vLLM Releases
