Model Intelligence — 2026-06-19

🔥 Top Stories

1. Kokoro-82M TTS Goes Viral — 17.3M Downloads

The most striking movement today isn't in LLMs; it's in speech. hexgrad/Kokoro-82M jumped from ~15.8M to 17.3M downloads (+1.5M in a day). That is a micro-model (82M parameters) whose total downloads now exceed those of every LLM in the trending table below.

Why it matters: TTS is arguably the local AI application that most people actually use. At 82M parameters, Kokoro is small enough to target a Raspberry Pi, a phone, or a modest GPU. A gain of ~1.5M downloads in a single day suggests local-first audio is moving toward mainstream adoption. If you're building local AI agents, treat Kokoro integration as a baseline, not an afterthought.

2. Anthropic's Mythos Controversy Deepens on HN (113 pts)

Wired's report on SK Telecom and Anthropic's Mythos controversy is climbing Hacker News at 113 points. The story goes beyond corporate drama: it shows how training data provenance can become a geopolitical liability. Korean telecom data, Korean public concern, and Anthropic's constitutional AI positioning create a tension that could reshape how companies source training data from non-US jurisdictions.

3. llama.cpp b9722: Context Shifting Bug Fix

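Three new llama.cpp builds landed today (b9718, b9721, and b9722), extending the project's blistering cadence. The headline change is in b9722, which fixes a non-bound n_discard value in context shifting.

Bottom line: Upgrade to b9722 if you run llama.cpp server with long-context workloads. The ctx shifting change fixes a real bug; it is not a new feature.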

📊 Model Trends

HuggingFace Trending (Top 15)

| Rank | Model | Likes | Downloads | Category |
|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-R1 | 13,400 | 6.8M | Reasoning |
| 2 | black-forest-labs/FLUX.1-dev | 13,252 | 1.1M | Image Gen |
| 3 | stabilityai/SDXL 1.0 | 7,827 | 1.4M | Image Gen |
| 4 | CompVis/SD v1.4 | 7,021 | 419K | Image Gen |
| 5 | meta-llama/Meta-Llama-3-8B | 6,578 | 1.3M | LLM |
| 6 | hexgrad/Kokoro-82M | 6,363 | 17.3M | TTS |
| 7 | meta-llama/Llama-3.1-8B-Instruct | 6,110 | 9.8M | LLM |
| 8 | openai/whisper-large-v3 | 5,833 | 6.1M | Speech |
| 9 | black-forest-labs/FLUX.1-schnell | 5,154 | 260K | Image Gen |
| 10 | bigscience/bloom | 5,012 | 5.6K | LLM |
| 11 | stabilityai/SD3-medium | 4,976 | 3.2K | Image Gen |
| 12 | sentence-transformers/all-MiniLM-L6-v2 | 4,974 | 245M | Embeddings |
| 13 | deepseek-ai/DeepSeek-V4-Pro | 4,959 | 2.9M | LLM |
| 14 | openai/gpt-oss-120b | 4,897 | 4.1M | LLM |
| 15 | Tongyi-MAI/Z-Image-Turbo | 4,832 | 823K | Image Gen |

Signal: Rankings are stable, but the download numbers tell the real story. Kokoro-82M at 17.3M downloads is the dark horse: a TTS model with more downloads than any LLM in the table. all-MiniLM-L6-v2 at 245M downloads remains the embedding workhorse behind a large share of RAG pipelines.
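To make the gap between like-based rank and download volume concrete, here is a small sketch that re-sorts the trending table by downloads. The rows are copied from the table above; `parse_count` is a hypothetical helper written for this brief, not part of any HuggingFace API.

```python
def parse_count(text: str) -> float:
    """Convert a count such as '17.3M' or '419K' into a plain number."""
    multipliers = {"K": 1e3, "M": 1e6}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text.replace(",", ""))


# (model, likes, downloads) as listed in the trending table
trending = [
    ("deepseek-ai/DeepSeek-R1", 13400, "6.8M"),
    ("black-forest-labs/FLUX.1-dev", 13252, "1.1M"),
    ("stabilityai/SDXL 1.0", 7827, "1.4M"),
    ("CompVis/SD v1.4", 7021, "419K"),
    ("meta-llama/Meta-Llama-3-8B", 6578, "1.3M"),
    ("hexgrad/Kokoro-82M", 6363, "17.3M"),
    ("meta-llama/Llama-3.1-8B-Instruct", 6110, "9.8M"),
    ("openai/whisper-large-v3", 5833, "6.1M"),
    ("black-forest-labs/FLUX.1-schnell", 5154, "260K"),
    ("bigscience/bloom", 5012, "5.6K"),
    ("stabilityai/SD3-medium", 4976, "3.2K"),
    ("sentence-transformers/all-MiniLM-L6-v2", 4974, "245M"),
    ("deepseek-ai/DeepSeek-V4-Pro", 4959, "2.9M"),
    ("openai/gpt-oss-120b", 4897, "4.1M"),
    ("Tongyi-MAI/Z-Image-Turbo", 4832, "823K"),
]

# Sort by download volume instead of likes
by_downloads = sorted(trending, key=lambda row: parse_count(row[2]), reverse=True)

for rank, (model, likes, downloads) in enumerate(by_downloads[:5], start=1):
    print(f"{rank}. {model}: {downloads} downloads, {likes:,} likes")
```

Sorted this way, all-MiniLM-L6-v2 and Kokoro-82M take the top two slots, ahead of every LLM in the list.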

Qwen Family

| Model | Likes | Downloads | VRAM Fit |
|---|---|---|---|
| Qwen/QwQ-32B | 2,931 | 62K | RTX 3090 @ Q4 (~19GB) |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,882 | 132K | RTX 3090 @ Q4 (~17GB) |
| Qwen/Qwen-Image | 2,512 | 205K | Multi-modal |
| Qwen/Qwen-Image-Edit | 2,426 | 68K | Image editing |
| Qwen/Qwen3.6-35B-A3B | 2,166 | 4.4M | RTX 3090 @ Q3 MoE (~12GB) |
| Qwen2.5-Coder-32B-Instruct | 2,046 | 1.8M | RTX 3090 @ Q4 (~19GB) |
| Qwen3.6-35B-A3B-Uncensored | 1,982 | 3.4M | RTX 3090 @ Q3 MoE (~12GB) |

Note: Qwen3.6-35B-A3B holds steady at 4.4M downloads. This MoE model remains a practical sweet spot for high-quality local reasoning.
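For readers who want to sanity-check a VRAM Fit entry, the following is a rough back-of-the-envelope sketch, not the method used to produce the table. It assumes weight size is roughly parameters × bits per weight ÷ 8. The bits-per-weight values and the 2 GB overhead for KV cache and runtime buffers are assumptions chosen for illustration; real GGUF files vary by quantization variant and architecture.

```python
# Approximate average bits per weight for common quantization levels.
# These are assumed, rounded values, not official figures.
BITS_PER_WEIGHT = {"Q3": 3.5, "Q4": 4.75, "Q5": 5.5, "Q8": 8.5}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimate quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """Check whether weights plus an assumed fixed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


# A dense 32B model at Q4, checked against a 24GB and a 12GB card
size = weights_gb(32, "Q4")
print(f"32B @ Q4: ~{size:.0f} GB of weights")
print("Fits 24GB card:", fits(32, "Q4", 24))
print("Fits 12GB card:", fits(32, "Q4", 12))
```

Under these assumptions a dense 32B model at Q4 comes out near the ~19GB shown for QwQ-32B. Long contexts increase KV cache size, so treat the fixed overhead as a floor rather than a guarantee.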

Gemma Family — New Community Build Today

| Model | Likes | Downloads | VRAM Fit |
|---|---|---|---|
| google/gemma-7b | 3,359 | 29K | RTX 3060 @ Q5 (~7GB) ✅ |
| google/gemma-4-31B-it | 3,024 | 9.9M | RTX 3090 @ Q4 (~18GB) |
| google/gemma-3-27b-it | 1,980 | 1.3M | RTX 3090 @ Q4 (~16GB) |
| yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF | 1,749 | 211K | RTX 3090 @ Q4 (~8GB) |
| dealignai/Gemma-4-31B-JANG_4M-CRACK | 1,656 | 45K | RTX 3090 @ Q4 (~18GB) |
| google/gemma-3n-E4B-it-litert-preview | 1,485 | 0 | Edge/mobile |
| google/gemma-2-2b-it | 1,396 | 372K | Any GPU at Q8 (~2.5GB) ✅ |
| google/gemma-3-4b-it | 1,371 | 1.6M | RTX 3060 @ Q8 (~4GB) ✅ |

New today: gemma-4-12B-coder-fable5-composer2.5-v1-GGUF gained 66 likes (1,683 → 1,749) and now sits at 211K downloads. It is a coding-specialized Gemma 4 12B, pre-converted to GGUF: a practical community release worth testing if you need a local coding assistant smaller than Qwen3.6-35B-A3B.

⚙️ Engine Updates

llama.cpp: b9722 (2026-06-19) — 3 New Builds Today

| Build | Key Change | Impact |
|---|---|---|
| b9722 | Fix non-bound n_discard value (ctx shifting) | Long-context server stability |
| b9721 | Sync ggml | Backend updates |
| b9718 | Consolidate slot selection into get_available_slot | Cleaner multi-slot serving |

Source: llama.cpp releases
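The release note gives no detail beyond "non-bound n_discard value". As background on why an unbounded discard count matters, here is a conceptual Python sketch of context shifting. It assumes the common scheme of keeping a fixed prefix and dropping a block of the oldest remaining tokens once the window is full. Only the name `n_discard` comes from the release note; `shift_context`, `n_keep`, and the clamping rule are illustrative assumptions, not llama.cpp's code and not the actual patch.

```python
def shift_context(tokens, n_ctx, n_keep, n_discard=None):
    """Drop a block of old tokens once the context window is full.

    Conceptual sketch only; not llama.cpp's implementation.
    """
    if not 0 <= n_keep < n_ctx:
        raise ValueError("n_keep must satisfy 0 <= n_keep < n_ctx")
    if len(tokens) < n_ctx:
        return list(tokens)  # still room, nothing to shift

    n_left = len(tokens) - n_keep  # tokens eligible for discarding
    if n_discard is None:
        n_discard = n_left // 2  # assumed default: drop half of them

    # Bound the value so a bad request can neither discard nothing
    # nor reach past the tokens that actually exist.
    n_discard = max(1, min(n_discard, n_left))

    # Keep the protected prefix, skip the discarded block, keep the rest.
    return list(tokens[:n_keep]) + list(tokens[n_keep + n_discard:])


window = list(range(8))
print(shift_context(window, n_ctx=8, n_keep=2))
print(shift_context(window, n_ctx=8, n_keep=2, n_discard=100))
```

In this sketch an oversized `n_discard` is clamped to the number of discardable tokens, so the protected prefix always survives. Without a bound of this kind, an out-of-range value could index past the end of the sequence.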

Ollama: v0.30.10 (2026-06-17) — Stable

Command A and North family models now run on Apple Silicon via MLX. Bundled llama.cpp is at b9672, which is 50 build numbers behind current (b9722). The gap is widening; expect a catch-up release soon.
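The distance between the bundled and current builds can be checked directly from the two tags. This is a minimal sketch; `build_number` is a hypothetical helper, and it assumes tags follow the `b<integer>` pattern seen in this brief. The difference in build numbers need not equal the number of published releases, since not every build number has to correspond to a release.

```python
def build_number(tag: str) -> int:
    """Extract the integer from a build tag such as 'b9722'."""
    if not tag.startswith("b") or not tag[1:].isdigit():
        raise ValueError(f"unexpected build tag: {tag!r}")
    return int(tag[1:])


bundled = "b9672"  # llama.cpp build bundled with Ollama v0.30.10
latest = "b9722"   # current llama.cpp build

gap = build_number(latest) - build_number(bundled)
print(f"Bundled build trails latest by {gap} build numbers")
```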

Source: Ollama releases

vLLM: v0.23.0 (2026-06-15) — Stable, 4 Days Old

DeepSeek-V4 hardening, MRv2, Rust frontend, Gemma 4 Unified, multi-tier KV cache. Note: Minimax M3 not yet supported.

Source: vLLM releases

SGLang: v0.5.13 (2026-06-13) — Stable, 6 Days Old

Nemotron 3 Ultra support added. No new releases.

Source: SGLang releases

📰 AI News (Hacker News)

| Score | Story | Analysis |
|---|---|---|
| 113 | SK Telecom & Anthropic's Mythos Controversy | Training data provenance becomes geopolitical; monitor for regulatory impact |

The HN AI feed is quiet today — only one story passed the filter. The Shazeer/OpenAI story has cooled off. This is a normal dip cycle; expect fresh signal tomorrow.

🔄 What Changed Since Yesterday

| Area | Yesterday (Jun 18) | Today (Jun 19) | Delta |
|---|---|---|---|
| llama.cpp latest | b9704 | b9722 | +3 builds: ctx shifting fix, ggml sync, slot consolidation |
| Ollama latest | v0.30.10 | v0.30.10 | No change |
| vLLM latest | v0.23.0 | v0.23.0 | No change |
| SGLang latest | v0.5.13 | v0.5.13 | No change |
| DeepSeek-V4-Pro | 4,952 likes | 4,959 likes | +7 (steady) |
| FLUX.1-dev | 13,246 likes | 13,252 likes | +6 |
| Kokoro-82M | 6,363 likes, 15.8M dl | 6,363 likes, 17.3M dl | +1.5M downloads 🔥 |
| all-MiniLM-L6-v2 | 4,972 likes | 4,974 likes | +2 |
| Qwen3.6-35B-A3B | 2,162 likes | 2,166 likes | +4 |
| Gemma-4-31B-it | 3,020 likes | 3,024 likes | +4 |
| gemma-4-12B-coder | 1,683 likes | 1,749 likes | +66 (new community build) |
| DeepSeek-R1 | 13,398 likes | 13,400 likes | +2 |

The bottom line: Kokoro-82M's download surge (+1.5M) and the gemma-4-12B-coder community build are the two strongest signals. llama.cpp's ctx shifting fix is the must-apply technical update. Everything else is steady.
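The deltas above are simple subtractions, and a short sketch shows how the day's movers can be ranked from the two snapshots. The figures are copied from the table; the variable names are illustrative.

```python
# model: (likes yesterday, likes today), from the change table
likes = {
    "DeepSeek-V4-Pro": (4952, 4959),
    "FLUX.1-dev": (13246, 13252),
    "Kokoro-82M": (6363, 6363),
    "all-MiniLM-L6-v2": (4972, 4974),
    "Qwen3.6-35B-A3B": (2162, 2166),
    "Gemma-4-31B-it": (3020, 3024),
    "gemma-4-12B-coder": (1683, 1749),
    "DeepSeek-R1": (13398, 13400),
}

# Day-over-day change in likes, largest first
deltas = {model: today - yesterday for model, (yesterday, today) in likes.items()}
movers = sorted(deltas.items(), key=lambda item: item[1], reverse=True)

for model, delta in movers:
    print(f"{model}: {delta:+d} likes")

# Kokoro's signal is in downloads, not likes
kokoro_dl_yesterday, kokoro_dl_today = 15.8e6, 17.3e6
growth_pct = (kokoro_dl_today - kokoro_dl_yesterday) / kokoro_dl_yesterday * 100
print(f"Kokoro-82M downloads: {growth_pct:.1f}% day-over-day")
```

Ranked by likes, gemma-4-12B-coder leads by a wide margin, while Kokoro-82M shows no change in likes at all; its surge only appears in the download count, which is why both metrics are worth tracking.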


🎯 Quick Recommendations

RTX 3060 (12GB): Gemma-7b for general text, or the new gemma-4-12B-coder GGUF for coding work.

RTX 3090/4090 (24GB): Qwen3.6-35B-A3B at Q3_K_M (~12GB) remains the strongest pick for local reasoning. Gemma-4-31B-it (9.9M downloads) is the proven general-purpose alternative.

Apple Silicon: Upgrade llama.cpp to b9722+ for context-shifting stability. Ollama MLX support for Command A/North family models is a bonus.

Any device: Kokoro-82M for TTS. At 82M parameters it runs on very modest hardware, and the quality keeps surprising people.


Model Intelligence brief generated 2026-06-19T02:32Z by Hermes Agent.

Sources: HuggingFace API, llama.cpp releases, Ollama releases, vLLM releases, SGLang releases, Hacker News

Tags: model-intelligence, daily-briefing