Model Intelligence — 2026-06-14
🔥 Top Stories
1. Rio de Janeiro's "Homegrown" LLM Exposed as Model Merge
Hacker News is dissecting Rio3.5, the Rio de Janeiro city government's model, which was claimed to "beat Qwen3.7 in recent benchmarks" (that claim drew 123 pts on HN). The reality check: it appears to be a merge of an existing model, not a trained-from-scratch system. The thread reporting the merge scored 210 points on HN, and the community's response is appropriately skeptical.
Signal: Government "AI sovereignty" announcements often mask simple model adaptations. The benchmark claims are worth examining but the merge origin significantly reduces the technical novelty. If you're tracking sovereign AI efforts, this is a cautionary tale about marketing vs. substance.
2. llama.cpp Blazes Through 3 Releases Today — Cohere2MoE Support Arrives
llama.cpp shipped 3 builds today alone (b9635, b9636, b9637), bringing the total to 9 builds since yesterday's scan (b9628 → b9637). The highlight is Cohere2MoE (North Code) parser support: a dedicated chat template parser for the Cohere MoE architecture. Combined with Jinja count/d/e filter aliases and a fix for preserved tokens not being copied correctly, this release cycle addresses real template and tokenization edge cases.
Signal: Cohere2MoE support in llama.cpp means the Cohere MoE models can now be run locally with proper chat formatting. If you've been curious about Cohere's MoE models on consumer hardware, the plumbing is now there.
3. DeepSeek-V4-Pro Hits 3.07M Downloads — Now #14 Trending
deepseek-ai/DeepSeek-V4-Pro is now at 4,828 likes and 3,075,369 downloads, cementing its place at #14 in the HF trending leaderboard. With vLLM v0.23.0's TRTLLM kernel and SGLang's full inference path, this model is getting serious production treatment from the inference community.
Signal: V4-Pro is the practical DeepSeek choice for local inference. The download count tells the real story: people are running it, not just liking it. The leaderboard below is ordered by likes, and at its current pace (+15 likes today) it could crack the top 10 within weeks.
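As a rough check on the "within weeks" estimate, here is a minimal sketch using the like counts from this scan. It assumes the ranking is by likes, that V4-Pro keeps gaining 15 likes per day, and that bloom (#10, 5,011 likes) stays flat. All three are simplifying assumptions, and the function name is illustrative.

```python
def days_to_overtake(own_likes, target_likes, own_rate, target_rate=0):
    """Days until own_likes strictly exceeds target_likes at constant daily rates.

    Returns 0 if already ahead and None if the gap is not closing.
    """
    gap = target_likes - own_likes
    if gap < 0:
        return 0
    closing_rate = own_rate - target_rate
    if closing_rate <= 0:
        return None
    # Integer division gives the days needed to (at best) tie; one more day to pass.
    return gap // closing_rate + 1


# DeepSeek-V4-Pro (4,828 likes, +15/day) vs. bloom at #10 (5,011 likes, flat)
print(days_to_overtake(4828, 5011, 15))  # prints 13
```

Under these assumptions the gap of 183 likes closes in about two weeks, which is consistent with the estimate above. Any movement among the models currently ranked 10 to 13 would change the result.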
📊 Model Trends
HuggingFace Top 15
| Rank | Model | Likes | Downloads | Likes change (1d) |
|---|---|---|---|---|
| 1 | DeepSeek-R1 | 13,390 | 4.4M | +1 |
| 2 | FLUX.1-dev | 13,199 | 586K | +11 |
| 3 | SDXL 1.0 | 7,815 | 1.0M | — |
| 4 | SD 1.4 | 7,020 | 298K | — |
| 5 | Llama-3-8B | 6,577 | 905K | — |
| 6 | Kokoro-82M | 6,323 | 11.1M | — |
| 7 | Llama-3.1-8B-Instruct | 6,075 | 6.6M | — |
| 8 | Whisper-large-v3 | 5,817 | 4.0M | — |
| 9 | FLUX.1-schnell | 5,123 | 227K | — |
| 10 | bloom | 5,011 | 3.3K | — |
| 13 | gpt-oss-120b | 4,883 | 2.8M | +3 |
| 14 | DeepSeek-V4-Pro | 4,828 | 3.07M | +15 ⬆️ |
| 15 | Z-Image-Turbo | 4,803 | 600K | +3 |
Notable: The leaderboard is remarkably stable — same 15 models. DeepSeek-V4-Pro and gpt-oss-120b are the steady climbers. Kokoro-82M at 11.1M downloads continues to be the silent giant of TTS models.
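One way to make the "running it, not just liking it" distinction concrete is a downloads-per-like ratio. The sketch below uses a handful of rows from the table above (download figures are the rounded values shown). The ratio itself is an illustrative metric rather than anything HuggingFace publishes, and it assumes download counts cover a recent window while likes accumulate over a model's lifetime.

```python
# (likes, downloads) taken from the table above; downloads are rounded.
models = {
    "DeepSeek-R1": (13_390, 4_400_000),
    "FLUX.1-dev": (13_199, 586_000),
    "Kokoro-82M": (6_323, 11_100_000),
    "bloom": (5_011, 3_300),
    "DeepSeek-V4-Pro": (4_828, 3_075_369),
}


def downloads_per_like(likes, downloads):
    """Rough usage-to-popularity ratio; higher means more use per like."""
    return downloads / likes


# Sort model names from most to least downloads per like.
ranked = sorted(models, key=lambda m: downloads_per_like(*models[m]), reverse=True)
for name in ranked:
    print(f"{name:16s} {downloads_per_like(*models[name]):8.1f}")
```

On these numbers Kokoro-82M comes out far ahead, DeepSeek-V4-Pro is second, and bloom is last with fewer downloads than likes.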
Qwen Family
| Model | Likes | Downloads | Notes |
|---|---|---|---|
| QwQ-32B | 2,930 | 38K | Top Qwen model by likes |
| Qwen3.5-27B-Claude-4.6 (distilled) | 2,879 | 87K | Community distillation |
| Qwen-Image | 2,511 | 124K | Text-to-image |
| Qwen-Image-Edit | 2,424 | 41K | Image editing |
| Qwen3.6-35B-A3B | 2,101 | 3.37M | ⬆️ MoE, 3B active params |
Qwen3.6-35B-A3B continues to be the practical king. 3.37M downloads and counting. The MoE architecture (3B active params out of 35B total) makes this a real option for consumer hardware.
Gemma Family
| Model | Likes | Downloads | Notes |
|---|---|---|---|
| gemma-7b | 3,356 | 22K | Classic |
| gemma-4-31B-it | 2,979 | 7.3M | Latest flagship |
| gemma-3-27b-it | 1,978 | 1.0M | |
| gemma-3n-E4B-it-litert-preview | 1,484 | 0 | Edge-ready |
| gemma-3-4b-it | 1,367 | 1.1M | |
Gemma 4 31B-it at 7.3M downloads — Google's latest flagship is finding massive adoption. With Ollama's Gemma 4 QAT weights support from v0.30.6, deployment is straightforward.
⚙️ Engine Updates
llama.cpp — b9637 (June 14) ⬆️ +9 builds since yesterday
- Cohere2MoE (North Code) dedicated chat parser — proper formatting for Cohere MoE models
- Jinja filter aliases: count, d, e (template flexibility improvements)
- Fixed preserved tokens not copying in CLI (#24258), a real bug fix for token handling
- 9 builds since yesterday's scan (b9628 → b9637), 3 of them today: the aggressive release cadence continues
Ollama — v0.30.8 (June 12) — no change
- Prompt caching decoupled from context shift
- MLX inference hardened
vLLM — v0.23.0 (June 12) — no change
- DeepSeek-V4 hardening with TRTLLM kernel, EPLB, XPU support
- Model Runner V2 default for Llama + Mistral
SGLang — v0.5.13 (June 13) — no change
- Nemotron 3 Ultra day-0 support
- 7 diffusion models added
- Spec V2 tree drafting now default
📰 AI News (HN)
- [210 pts] Rio de Janeiro's "homegrown" LLM appears to be a merge — GitHub Issue — Community tears apart the "sovereign AI" narrative
- [123 pts] Rio3.5 beats Qwen3.7 in recent benchmarks — Twitter — Benchmark claims under scrutiny
🔄 What Changed Since Yesterday
| Area | Yesterday (Jun 13) | Today (Jun 14) | Change |
|---|---|---|---|
| llama.cpp | b9628 | b9637 | +9 builds 🔥 |
| Ollama | v0.30.8 | v0.30.8 | — |
| vLLM | v0.23.0 | v0.23.0 | — |
| SGLang | v0.5.13 | v0.5.13 | — |
| DeepSeek-V4-Pro | 4,813 likes | 4,828 likes | +15 ⬆️ |
| DeepSeek-V4-Pro | 3.03M downloads | 3.07M downloads | +40K ⬆️ |
| FLUX.1-dev | 13,188 likes | 13,199 likes | +11 |
| gpt-oss-120b | 4,880 likes | 4,883 likes | +3 |
| Z-Image-Turbo | 4,800 likes | 4,803 likes | +3 |
| Qwen3.6-35B-A3B | 2,098 likes | 2,101 likes | +3 |
| gemma-4-31B-it | 2,975 likes | 2,979 likes | +4 |
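A table like the one above can be produced by diffing two daily snapshots. The following is a minimal sketch, assuming each snapshot is a plain dictionary of metric name to value. The function name and snapshot layout are hypothetical and not the pipeline actually used for this scan.

```python
def diff_snapshots(yesterday, today):
    """Return a human-readable change for every key in today's snapshot."""
    changes = {}
    for key, new in today.items():
        old = yesterday.get(key)
        if old is None:
            changes[key] = "new"
        elif isinstance(old, int) and isinstance(new, int):
            delta = new - old
            changes[key] = f"{delta:+d}" if delta else "no change"
        else:
            # Non-numeric values such as version strings: report only if different.
            changes[key] = "no change" if old == new else f"{old} -> {new}"
    return changes


# Values copied from the table above; build numbers are stored as integers.
jun13 = {"llama.cpp build": 9628, "Ollama": "v0.30.8", "DeepSeek-V4-Pro likes": 4813}
jun14 = {"llama.cpp build": 9637, "Ollama": "v0.30.8", "DeepSeek-V4-Pro likes": 4828}

for metric, change in diff_snapshots(jun13, jun14).items():
    print(f"{metric}: {change}")
```

Storing build numbers and like counts as integers keeps the deltas exact. Rounded figures such as "3.07M" would need to be stored as raw counts to be diffed the same way.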
Local Inference Recommendations
RTX 3060 (12GB):
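- Qwen3.6-35B-A3B at Q4_K_M: MoE with 3B active params, still the value king. The quantized weights are larger than 12GB, so expect part of the model to sit in system RAM
- Gemma-4-31B-it at Q4: 7.3M downloads point to strong adoption, but a dense 31B model at Q4 also exceeds 12GB and needs partial CPU offload, at a noticeable speed cost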
- Gemma 3n-E4B — purpose-built for edge, runs comfortably
RTX 3090 (24GB):
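- Qwen3.6-35B-A3B at Q6/Q8: near-full quality, though 35B of weights at those precisions exceeds 24GB, so part of the model has to sit in system RAM. Q4_K_M is the largest common quant likely to fit entirely in VRAM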
- DeepSeek-V4-Pro quantized — vLLM TRTLLM kernel makes this practical now
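- Gemma-4-31B-it at Q6: high quality, but the weights alone come to roughly 24GB or more, leaving little or no room for context. A Q4 or Q5 quant is the safer fit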
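Whether a given quant fits on a card can be estimated from the parameter count and the average bits per weight. The sketch below is a back-of-the-envelope check, not a measurement. It assumes the parameter counts implied by the model names (35B and 31B), approximate bits-per-weight figures for llama.cpp quant types, and a hypothetical 2GB reserve for KV cache and runtime overhead.

```python
# Approximate average bits per weight for common llama.cpp quant types (assumed values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}


def weight_gb(params_billion, quant):
    """Estimated size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion, quant, vram_gb, reserve_gb=2.0):
    """True if weights plus a fixed reserve fit in the given VRAM."""
    return weight_gb(params_billion, quant) + reserve_gb <= vram_gb


for name, params in [("Qwen3.6-35B-A3B", 35), ("gemma-4-31B-it", 31)]:
    for quant in BITS_PER_WEIGHT:
        size = weight_gb(params, quant)
        print(f"{name:16s} {quant:7s} ~{size:5.1f} GB  "
              f"12GB: {fits_in_vram(params, quant, 12)}  "
              f"24GB: {fits_in_vram(params, quant, 24)}")
```

Under these assumptions neither model fits entirely on a 12GB card at any listed quant, and only the Q4_K_M variants fit on a 24GB card. For an MoE model with 3B active parameters, keeping some weights in system RAM tends to cost less speed than it does for a dense model, which is why partial offload is still a reasonable option there.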
Key takeaway: llama.cpp's Cohere2MoE parser is the day's practical addition. If you haven't updated your inference engines since yesterday, the 9 new llama.cpp builds are worth grabbing — especially if you're experimenting with Cohere models.
Scan completed: 2026-06-14 | Sources: HuggingFace API, llama.cpp GitHub, Ollama GitHub, vLLM GitHub, SGLang GitHub