Model Intelligence — 2026-06-01
🤖 Model Landscape
Qwen3.6 family — steady growth continues (HuggingFace):
| Model | Likes | Δ | VRAM (Q4) | Notes |
|---|---|---|---|---|
| Qwen/Qwen3.6-35B-A3B | 1,971 | +11 | ~8GB | MoE: only 3B active params/token |
| Qwen/Qwen3.6-27B | 1,568 | +16 | ~17GB | Dense model, 32K context |
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,864 | — | ~17GB | Community reasoning-distilled variant (built on Qwen3.5, not 3.6) |
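- Both Qwen3.6 models are showing consistent daily growth. The 35B-A3B MoE variant remains the efficiency pick for 12GB GPUs, with one caveat: all 35B parameters must still be stored somewhere, and at 4 bits per weight that is at least about 17.5GB, so the ~8GB figure presumably assumes most expert weights are offloaded to system RAM.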
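As a rough cross-check on the VRAM column, weight memory can be estimated from parameter count and bits per weight. The sketch below assumes about 4.5 bits per weight for a Q4-style quantization (an assumption, not a figure from the table) and ignores KV cache and runtime overhead. `estimate_weight_gb` is a hypothetical helper, not part of any library.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB (1 GB = 10^9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4-style quantization
    (4-bit values plus per-block scales); real files vary by quant type.
    KV cache and runtime buffers are NOT included.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Parameter counts are read off the model names in the table.
    for name, params_b in [
        ("Qwen3.6-27B (dense)", 27),
        ("Qwen3.6-35B-A3B (all experts)", 35),
        ("Qwen3.6-35B-A3B (active per token)", 3),
    ]:
        print(f"{name}: ~{estimate_weight_gb(params_b):.1f} GB of weights")
```

Under these assumptions the dense 27B lands near 15 GB before cache overhead, which is in line with the ~17GB table entry, while the MoE model's total weights are much larger than its per-token active slice.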
Gemma 4 family (Google on HF):
- gemma-4-31B-it — 2,846 likes (+13). Now approaching Qwen/QwQ-32B territory. Best 24GB GPU pick from Google.
- gemma-4-E4B-it — 1,160 likes (+5). Lightweight option with any-to-any modality support.
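To put "approaching" in numbers, here is a minimal sketch that projects how long gemma-4-31B-it would need to match Qwen/QwQ-32B. It uses the 2,923 figure quoted for QwQ-32B in the Worth Noting section and assumes, unrealistically, that QwQ-32B stays flat and that Gemma keeps gaining a constant +13 per day. `days_to_parity` is a hypothetical helper.

```python
import math


def days_to_parity(current: int, target: int, daily_gain: float) -> int:
    """Whole days until `current` reaches `target` at a constant daily gain.

    Assumes linear growth and a static target, which is a simplification.
    """
    if current >= target:
        return 0
    if daily_gain <= 0:
        raise ValueError("daily_gain must be positive to ever reach the target")
    return math.ceil((target - current) / daily_gain)


if __name__ == "__main__":
    # 2,846 likes at +13/day versus 2,923 for Qwen/QwQ-32B (figures from this digest).
    print(days_to_parity(2846, 2923, 13))
```

With these inputs the gap is 77 likes, so the projection comes out at about six days.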
Trending highlights (HF trending):
- sentence-transformers/all-MiniLM-L6-v2 — 4,873 likes. Updated today (June 1). The workhorse embedding model remains essential for RAG pipelines.
- DeepSeek-R1 — 13,358 likes (+3). Still the #1 most-liked LLM on HF.
- OpenAI/gpt-oss-120b — 4,836 likes (+6). OpenAI's open-weights model continues climbing.
- Tongyi-MAI/Z-Image-Turbo — 4,723 likes (+5). Fast image generation, diffusers-compatible.
⚙️ Inference Engine Updates
llama.cpp — 5 more builds in 24 hours (releases):
| Build | Time (UTC) |
|---|---|
| b9466 | Jun 2, 02:42 |
| b9464 | Jun 1, 19:57 |
| b9460 | Jun 1, 19:23 |
| b9459 | Jun 1, 18:58 |
| b9458 | Jun 1, 18:31 |
- 9 builds in 2 days: 4 on May 31 and 5 in this scan window (four on Jun 1, plus b9466 at 02:42 UTC on Jun 2). That is an unusually fast pace, and the project is likely wrapping a major feature or preparing a milestone release.
- The four Jun 1 builds landed 25 to 34 minutes apart, which suggests a mature, largely automated CI/CD pipeline.
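The spacing between builds can be recomputed directly from the timestamps in the table. This is a small sketch using only the standard library; the year is taken from the digest date, and `gaps_in_minutes` is a hypothetical helper name.

```python
from datetime import datetime

# Build tags and UTC timestamps copied from the table above, oldest first.
BUILDS = [
    ("b9458", "2026-06-01 18:31"),
    ("b9459", "2026-06-01 18:58"),
    ("b9460", "2026-06-01 19:23"),
    ("b9464", "2026-06-01 19:57"),
    ("b9466", "2026-06-02 02:42"),
]


def gaps_in_minutes(builds):
    """Return the minutes elapsed between each pair of consecutive builds."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for _, ts in builds]
    return [
        int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for (tag, _), gap in zip(BUILDS[1:], gaps_in_minutes(BUILDS)):
        print(f"{tag}: {gap} min after the previous build")
```

The first three gaps fall in the 25 to 34 minute range; the last build follows after a pause of several hours.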
Stable since last scan (no new releases since May 31):
- vLLM v0.22.0 (May 29; releases): two weeks between v0.21.0 and v0.22.0.
- SGLang v0.5.12.post1 (May 26; releases): stability patch for DeepSeek V4 support.
- Ollama v0.30.0 (May 13; releases): major llama.cpp integration release.
📊 Worth Noting
- llama.cpp's firehose pace: 9 builds in 48 hours works out to about 4.5 per day. Against a typical cadence of 2-3 per day, that is roughly 1.5x to 2.25x the usual rate, which signals major development, possibly a v4.x preparation sprint. Users who want the bleeding edge should track b9466; those who want stability should wait for the next tagged release.
- Gemma 4-31B closing the gap: At 2,846 likes and growing +13/day, it is now only 77 likes behind Qwen/QwQ-32B (2,923). The Google model's 24GB GPU fit makes it a strong contender for workstation deployments.
- OpenAI gpt-oss-120b at 4,836: The open-weights OpenAI model continues to gain community interest. It still requires multi-GPU on consumer cards, but the trajectory is notable.
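- all-MiniLM-L6-v2 updated June 1: Even foundational embedding models still receive updates. A repository update can be limited to the model card or config, so check whether the weights actually changed; if they did, regenerate any cached embeddings.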
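One way to decide whether cached embeddings are stale is to store the model revision next to the cache and compare it with the current one. The sketch below is an assumption about how such a check could look, not a prescribed workflow: `needs_reembedding` and `fetch_current_revision` are hypothetical helpers, and the lookup relies on `HfApi().model_info(...).sha` from the `huggingface_hub` package, which needs network access.

```python
from typing import Optional


def fetch_current_revision(repo_id: str) -> Optional[str]:
    """Return the latest commit hash of a Hub repo (requires network access)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from huggingface_hub import HfApi

    return HfApi().model_info(repo_id).sha


def needs_reembedding(cached_revision: Optional[str],
                      current_revision: Optional[str]) -> bool:
    """Conservative staleness check for an embedding cache.

    Any new commit changes the hash, including README-only edits, so a True
    result means "inspect the commit history", not "the weights changed".
    """
    if cached_revision is None or current_revision is None:
        return True  # unknown provenance: treat the cache as stale
    return cached_revision != current_revision


if __name__ == "__main__":
    # Placeholder hashes for illustration only; they are not real revisions.
    print(needs_reembedding("rev-a", "rev-a"))
    print(needs_reembedding("rev-a", "rev-b"))
```

In practice the stored revision would come from whatever metadata the cache keeps, and the current one from `fetch_current_revision("sentence-transformers/all-MiniLM-L6-v2")`.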
🖥️ Hardware Sweet Spots
| GPU | Best Model | Notes |
|---|---|---|
| RTX 3060 12GB | Qwen3.6-35B-A3B | MoE efficiency, ~8GB VRAM |
| RTX 3090/4090 24GB | Gemma 4-31B-it or Qwen3.6-27B | Both fit Q4, pick based on use case |
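| Dual 24GB | DeepSeek-R1, gpt-oss-120b | Multi-GPU required; neither fits fully in 48GB at 4-bit (671B and ~117B parameters), so expect CPU offload or more GPUs |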
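The table's logic can be sketched as a simple filter: keep the models whose Q4 VRAM figure, plus some headroom for KV cache and buffers, fits the card. The VRAM numbers below are taken at face value from the Qwen table in this digest; the 2 GB default headroom and the `models_that_fit` helper are assumptions for illustration.

```python
# Q4 VRAM figures (GB) as listed in the Qwen table of this digest.
Q4_VRAM_GB = {
    "Qwen/Qwen3.6-35B-A3B": 8,
    "Qwen/Qwen3.6-27B": 17,
    "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled": 17,
}


def models_that_fit(vram_gb: float, headroom_gb: float = 2.0) -> list:
    """Return model names whose listed VRAM plus headroom fits in vram_gb.

    headroom_gb is an assumed allowance for KV cache and runtime buffers;
    long contexts may need considerably more.
    """
    return sorted(
        name for name, need in Q4_VRAM_GB.items()
        if need + headroom_gb <= vram_gb
    )


if __name__ == "__main__":
    for card, vram in [("RTX 3060", 12), ("RTX 3090/4090", 24)]:
        print(card, "->", models_that_fit(vram))
```

Raising `headroom_gb` is the simplest way to model longer contexts, since KV cache grows with sequence length.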
Data sourced from HuggingFace API, llama.cpp GitHub, vLLM GitHub, SGLang GitHub, Ollama GitHub