Model Intelligence — 2026-06-14

🔥 Top Stories

1. Rio de Janeiro's "Homegrown" LLM Exposed as Model Merge

Hacker News is dissecting Rio3.5, the Rio de Janeiro city government's model, which was claimed to "beat Qwen3.7 in recent benchmarks" (123 pts on HN). The reality check: it appears to be a merge of an existing model, not a trained-from-scratch system. The story scored 210 points on HN, and the community's response is appropriately skeptical.

Signal: Government "AI sovereignty" announcements often mask simple model adaptations. The benchmark claims are worth examining but the merge origin significantly reduces the technical novelty. If you're tracking sovereign AI efforts, this is a cautionary tale about marketing vs. substance.

2. llama.cpp Blazes Through 3 Releases Today — Cohere2MoE Support Arrives

llama.cpp shipped 3 builds today alone (b9635, b9636, b9637), reaching b9637. The highlight is Cohere2MoE (North Code) parser support — adding a dedicated chat template parser for the Cohere MoE architecture. Combined with jinja count/d/e filter aliases and a fix for preserved tokens not copying correctly, this release cycle is addressing real template and tokenization edge cases.

Signal: Cohere2MoE support in llama.cpp means the Cohere MoE models can now be run locally with proper chat formatting. If you've been curious about Cohere's MoE models on consumer hardware, the plumbing is now there.
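For readers unfamiliar with the filter aliases mentioned above: in Jinja, `count`, `d`, and `e` are short names for the `length`, `default`, and `escape` filters. The sketch below only illustrates what alias resolution means. It is not llama.cpp's implementation; `FILTERS`, `ALIASES`, and `apply_filter` are hypothetical names, `None` stands in for Jinja's "undefined", and `html.escape` is a stand-in for Jinja's own escaping.

```python
import html

# Canonical filters (simplified stand-ins for the Jinja built-ins).
FILTERS = {
    "length": lambda value: len(value),
    # Simplification: None plays the role of an undefined template variable.
    "default": lambda value, fallback="": fallback if value is None else value,
    "escape": lambda value: html.escape(str(value)),
}

# Short aliases that resolve to the canonical filter names.
ALIASES = {"count": "length", "d": "default", "e": "escape"}


def apply_filter(name, value, *args):
    """Resolve an alias to its canonical filter, then apply it."""
    canonical = ALIASES.get(name, name)
    if canonical not in FILTERS:
        raise KeyError(f"unknown filter: {name}")
    return FILTERS[canonical](value, *args)


print(apply_filter("count", ["system", "user", "assistant"]))  # same as "length"
print(apply_filter("d", None, "no system prompt"))             # same as "default"
print(apply_filter("e", "<tool_call>"))                        # same as "escape"
```

A chat template that uses `messages | count` instead of `messages | length` fails to render on an engine that lacks the alias, which is why small additions like these matter for template compatibility.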

3. DeepSeek-V4-Pro Hits 3.07M Downloads — Now #14 Trending

deepseek-ai/DeepSeek-V4-Pro is now at 4,828 likes and 3,075,369 downloads, cementing its place at #14 in the HF trending leaderboard. With vLLM v0.23.0's TRTLLM kernel and SGLang's full inference path, this model is getting serious production treatment from the inference community.

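Signal: V4-Pro is the practical DeepSeek choice for local inference. The download count tells the real story: people are running it, not just liking it. The Top 15 table below is ordered by likes, though, so rank depends on likes rather than downloads. At +15 likes per day against a 183-like gap to #10 (bloom, 5,011), cracking the top 10 would take roughly two weeks if the current pace holds.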
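The top-10 projection can be sanity-checked from the like counts reported in this briefing. The sketch below assumes the Top 15 table is ranked by likes, that each model keeps today's daily gain, and that a blank Change cell means a gain of zero. `days_to_overtake` is a hypothetical helper, and ranks 11 and 12 are not listed in the table, so they are left out.

```python
import math


def days_to_overtake(chaser_likes, chaser_rate, target_likes, target_rate=0.0):
    """Whole days until the chaser has strictly more likes than the target.

    Assumes both models keep gaining likes at a constant daily rate.
    Returns None if the chaser never catches up at these rates.
    """
    gap = target_likes - chaser_likes
    if gap < 0:
        return 0  # already ahead
    closing_rate = chaser_rate - target_rate
    if closing_rate <= 0:
        return None
    # Smallest whole d with d * closing_rate > gap.
    return math.floor(gap / closing_rate) + 1


v4_pro_likes, v4_pro_rate = 4828, 15  # DeepSeek-V4-Pro, +15 likes today

# gpt-oss-120b (#13): 4,883 likes, +3 today
print(days_to_overtake(v4_pro_likes, v4_pro_rate, 4883, 3))  # 5

# bloom (#10): 5,011 likes, assumed flat
print(days_to_overtake(v4_pro_likes, v4_pro_rate, 5011, 0))  # 13
```

On these assumptions V4-Pro passes gpt-oss-120b in about five days and bloom's like count in about thirteen, which is consistent with "within weeks" as long as the daily gain does not slow.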

📊 Model Trends

HuggingFace Top 15

Rank | Model | Likes | Downloads | Change (likes)
1 | DeepSeek-R1 | 13,390 | 4.4M | +1
2 | FLUX.1-dev | 13,199 | 586K | +11
3 | SDXL 1.0 | 7,815 | 1.0M |
4 | SD 1.4 | 7,020 | 298K |
5 | Llama-3-8B | 6,577 | 905K |
6 | Kokoro-82M | 6,323 | 11.1M |
7 | Llama-3.1-8B-Instruct | 6,075 | 6.6M |
8 | Whisper-large-v3 | 5,817 | 4.0M |
9 | FLUX.1-schnell | 5,123 | 227K |
10 | bloom | 5,011 | 3.3K |
13 | gpt-oss-120b | 4,883 | 2.8M | +3
14 | DeepSeek-V4-Pro | 4,828 | 3.07M | +15 ⬆️
15 | Z-Image-Turbo | 4,803 | 600K | +3

Notable: The leaderboard is remarkably stable — same 15 models. DeepSeek-V4-Pro and gpt-oss-120b are the steady climbers. Kokoro-82M at 11.1M downloads continues to be the silent giant of TTS models.

Qwen Family

Model | Likes | Downloads | Notes
QwQ-32B | 2,930 | 38K | Top Qwen model
Qwen3.5-27B-Claude-4.6 (distilled) | 2,879 | 87K | Community distillation
Qwen-Image | 2,511 | 124K | Text-to-image
Qwen-Image-Edit | 2,424 | 41K | Image editing
Qwen3.6-35B-A3B | 2,101 | 3.37M ⬆️ | MoE, 3B active params

Qwen3.6-35B-A3B continues to be the practical king. 3.37M downloads and counting. The MoE architecture (3B active params out of 35B total) makes this a real option for consumer hardware.
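A rough back-of-envelope for the consumer-hardware claim: in an MoE model only the active parameters are computed per token, but all 35B parameters still have to be stored somewhere (VRAM, or system RAM with offloading). The sketch below estimates weight size only. The bits-per-weight figures are illustrative assumptions, not measured sizes of any published quantization, and KV cache and activation memory are ignored.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


TOTAL_B = 35   # total parameters, from the model name
ACTIVE_B = 3   # active parameters per token, from the model name

# Assumed precisions: 16-bit, 8-bit, and roughly 4.5 bits for a 4-bit quant with overhead.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("~4.5-bit quant", 4.5)]:
    total = weight_memory_gb(TOTAL_B, bits)
    active = weight_memory_gb(ACTIVE_B, bits)
    print(f"{label:>15}: all weights {total:6.2f} GB, active per token {active:5.2f} GB")

# Compare the assumed ~4.5-bit footprint against the two cards listed in this briefing.
quantized = weight_memory_gb(TOTAL_B, 4.5)
for card, vram_gb in [("RTX 3060", 12), ("RTX 3090", 24)]:
    verdict = "fits" if quantized < vram_gb else "needs offloading"
    print(f"{card} ({vram_gb} GB): weights {verdict}")
```

Under these assumptions the quantized weights come to about 19.7 GB: inside a 24 GB card before counting the KV cache, and beyond a 12 GB card without offloading part of the model to system RAM. The small active set is what keeps per-token compute manageable in the offloaded case.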

Gemma Family

Model | Likes | Downloads | Notes
gemma-7b | 3,356 | 22K | Classic
gemma-4-31B-it | 2,979 | 7.3M | Latest flagship
gemma-3-27b-it | 1,978 | 1.0M |
gemma-3n-E4B-it-litert-preview | 1,484 | 0 | Edge-ready
gemma-3-4b-it | 1,367 | 1.1M |

Gemma 4 31B-it at 7.3M downloads — Google's latest flagship is finding massive adoption. With Ollama's Gemma 4 QAT weights support from v0.30.6, deployment is straightforward.

⚙️ Engine Updates

llama.cpp — b9637 (June 14) ⬆️ +9 builds since yesterday

Ollama — v0.30.8 (June 12) — no change

vLLM — v0.23.0 (June 12) — no change

SGLang — v0.5.13 (June 13) — no change

📰 AI News (HN)

🔄 What Changed Since Yesterday

Area | Yesterday (Jun 13) | Today (Jun 14) | Change
llama.cpp | b9628 | b9637 | +9 builds 🔥
Ollama | v0.30.8 | v0.30.8 |
vLLM | v0.23.0 | v0.23.0 |
SGLang | v0.5.13 | v0.5.13 |
DeepSeek-V4-Pro | 4,813 likes | 4,828 likes | +15 ⬆️
DeepSeek-V4-Pro | 3.03M downloads | 3.07M downloads | +40K ⬆️
FLUX.1-dev | 13,188 likes | 13,199 likes | +11
gpt-oss-120b | 4,880 likes | 4,883 likes | +3
Z-Image-Turbo | 4,800 likes | 4,803 likes | +3
Qwen3.6-35B-A3B | 2,098 likes | 2,101 likes | +3
gemma-4-31B-it | 2,975 likes | 2,979 likes | +4
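The Change column can be reproduced from two snapshots. The sketch below is one way to do it, using only the figures in the table above. The helper names (`build_number`, `like_deltas`) are hypothetical and do not describe how this briefing's scanner actually works.

```python
def build_number(tag):
    """Parse a build tag such as 'b9637' into the integer 9637."""
    if not tag.startswith("b") or not tag[1:].isdigit():
        raise ValueError(f"unexpected build tag: {tag!r}")
    return int(tag[1:])


def like_deltas(before, after):
    """Per-model change in likes for models present in both snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


# Like counts copied from the table (Jun 13 vs. Jun 14).
yesterday = {
    "DeepSeek-V4-Pro": 4813, "FLUX.1-dev": 13188, "gpt-oss-120b": 4880,
    "Z-Image-Turbo": 4800, "Qwen3.6-35B-A3B": 2098, "gemma-4-31B-it": 2975,
}
today = {
    "DeepSeek-V4-Pro": 4828, "FLUX.1-dev": 13199, "gpt-oss-120b": 4883,
    "Z-Image-Turbo": 4803, "Qwen3.6-35B-A3B": 2101, "gemma-4-31B-it": 2979,
}

print("llama.cpp builds:", build_number("b9637") - build_number("b9628"))  # 9
for model, delta in like_deltas(yesterday, today).items():
    print(f"{model}: {delta:+d}")
```

Run against the table's numbers, this yields +9 builds and like deltas of +15, +11, +3, +3, +3, and +4, matching the Change column.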

Local Inference Recommendations

RTX 3060 (12GB):

RTX 3090 (24GB):

Key takeaway: llama.cpp's Cohere2MoE parser is the day's practical addition. If you haven't updated your inference engines since yesterday, the 9 new llama.cpp builds are worth grabbing — especially if you're experimenting with Cohere models.


Scan completed: 2026-06-14 | Sources: HuggingFace API, llama.cpp GitHub, Ollama GitHub, vLLM GitHub, SGLang GitHub
