Model Intelligence — 2026-06-19
🔥 Top Stories
1. Kokoro-82M TTS Goes Viral — 17.3M Downloads
The most striking movement today isn't in LLMs — it's in speech. hexgrad/Kokoro-82M jumped from ~15.8M to 17.2M downloads (+1.5M in a day). That's a micro-model (82M parameters) outpacing every major LLM in daily download velocity.
Why it matters: TTS is the killer app for local AI that most people actually use. Kokoro fits in a pocket — it runs on a Raspberry Pi, a phone, or any GPU. The fact that it's gaining ~1.5M downloads/day suggests local-first audio is crossing into mainstream adoption. If you're building local AI agents, integrating Kokoro should be baseline, not an afterthought.
2. Anthropic's Mythos Controversy Deepens on HN (113 pts)
Wired's report on SK Telecom and Anthropic's Mythos controversy is climbing Hacker News at 113 points. This story goes beyond corporate drama — it exposes how training data provenance becomes a geopolitical liability. Korean telecom data, Korean public concern, and Anthropic's constitutional AI positioning create a tension that could reshape how companies source training data from non-US jurisdictions.
3. llama.cpp b9722: Context Shifting Bug Fix
Three new llama.cpp builds landed today (b9718 → b9722), extending the project's blistering cadence:
- b9722: Fix non-bound n_discard value (ctx shifting) — Critical bug fix in context shifting for the server. If you use long contexts with KV cache sliding, this prevents out-of-bounds discard values.
- b9721: Sync ggml — Routine backend sync.
- b9718: Consolidate slot selection — Server slot selection logic merged into
get_available_slot, reducing code paths and potential bugs in multi-slot serving.
Bottom line: Upgrade to b9722 if you run llama.cpp server with long-context workloads. The ctx shifting fix is a real bug, not a feature.
📊 Model Trends
HuggingFace Trending (Top 15)
| Rank | Model | Likes | Downloads | Category |
|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-R1 | 13,400 | 6.8M | Reasoning |
| 2 | black-forest-labs/FLUX.1-dev | 13,252 | 1.1M | Image Gen |
| 3 | stabilityai/SDXL 1.0 | 7,827 | 1.4M | Image Gen |
| 4 | CompVis/SD v1.4 | 7,021 | 419K | Image Gen |
| 5 | meta-llama/Meta-Llama-3-8B | 6,578 | 1.3M | LLM |
| 6 | hexgrad/Kokoro-82M | 6,363 | 17.3M | TTS |
| 7 | meta-llama/Llama-3.1-8B-Instruct | 6,110 | 9.8M | LLM |
| 8 | openai/whisper-large-v3 | 5,833 | 6.1M | Speech |
| 9 | black-forest-labs/FLUX.1-schnell | 5,154 | 260K | Image Gen |
| 10 | bigscience/bloom | 5,012 | 5.6K | LLM |
| 11 | stabilityai/SD3-medium | 4,976 | 3.2K | Image Gen |
| 12 | sentence-transformers/all-MiniLM-L6-v2 | 4,974 | 245M | Embeddings |
| 13 | deepseek-ai/DeepSeek-V4-Pro | 4,959 | 2.9M | LLM |
| 14 | openai/gpt-oss-120b | 4,897 | 4.1M | LLM |
| 15 | Tongyi-MAI/Z-Image-Turbo | 4,832 | 823K | Image Gen |
Signal: The leaderboard is stable in rankings, but the download numbers tell the real story. Kokoro-82M at 17.3M downloads is the dark horse — a TTS model rivaling LLM download volumes. all-MiniLM-L6-v2 at 245M downloads is the embedding workhorse that powers half the industry's RAG pipelines.
Qwen Family
| Model | Likes | Downloads | VRAM Fit |
|---|---|---|---|
| Qwen/QwQ-32B | 2,931 | 62K | RTX 3090 @ Q4 (~19GB) |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,882 | 132K | RTX 3090 @ Q4 (~17GB) |
| Qwen/Qwen-Image | 2,512 | 205K | Multi-modal |
| Qwen/Qwen-Image-Edit | 2,426 | 68K | Image editing |
| Qwen/Qwen3.6-35B-A3B | 2,166 | 4.4M | RTX 3090 @ Q3 MoE (~12GB) |
| Qwen2.5-Coder-32B-Instruct | 2,046 | 1.8M | RTX 3090 @ Q4 (~19GB) |
| Qwen3.6-35B-A3B-Uncensored | 1,982 | 3.4M | RTX 3090 @ Q3 MoE (~12GB) |
Note: Qwen3.6-35B-A3B holds steady at 4.4M downloads. The MoE sweet spot remains the most practical high-quality local reasoning model.
Gemma Family — New Community Build Today
| Model | Likes | Downloads | VRAM Fit |
|---|---|---|---|
| google/gemma-7b | 3,359 | 29K | RTX 3060 @ Q5 (~7GB) ✅ |
| google/gemma-4-31B-it | 3,024 | 9.9M | RTX 3090 @ Q4 (~18GB) |
| google/gemma-3-27b-it | 1,980 | 1.3M | RTX 3090 @ Q4 (~16GB) |
| yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF | 1,749 | 211K | RTX 3090 @ Q4 (~8GB) |
| dealignai/Gemma-4-31B-JANG_4M-CRACK | 1,656 | 45K | RTX 3090 @ Q4 (~18GB) |
| google/gemma-3n-E4B-it-litert-preview | 1,485 | 0 | Edge/mobile |
| google/gemma-2-2b-it | 1,396 | 372K | Any GPU at Q8 (~2.5GB) ✅ |
| google/gemma-3-4b-it | 1,371 | 1.6M | RTX 3060 @ Q8 (~4GB) ✅ |
New today: gemma-4-12B-coder-fable5-composer2.5-v1-GGUF gained 66 likes (1,683 → 1,749) and 211K downloads. A coding-specialized Gemma 4 12B that's pre-converted to GGUF — this is a practical community release worth testing if you need a local coding assistant smaller than the Qwen3.6-35B.
⚙️ Engine Updates
llama.cpp: b9722 (2026-06-19) — 3 New Builds Today
| Build | Key Change | Impact |
|---|---|---|
| b9722 | Fix non-bound n_discard value (ctx shifting) | Long-context server stability |
| b9721 | Sync ggml | Backend updates |
| b9718 | Consolidate slot selection into get_available_slot | Cleaner multi-slot serving |
Source: llama.cpp releases
Ollama: v0.30.10 (2026-06-17) — Stable
Command A and North family models now run on Apple Silicon via MLX. Bundled llama.cpp at b9672 — 32 builds behind current (b9722). The gap is widening; expect a catch-up release soon.
Source: Ollama releases
vLLM: v0.23.0 (2026-06-15) — Stable, 4 Days Old
DeepSeek-V4 hardening, MRv2, Rust frontend, Gemma 4 Unified, multi-tier KV cache. Note: Minimax M3 not yet supported.
Source: vLLM releases
SGLang: v0.5.13 (2026-06-13) — Stable, 6 Days Old
Nemotron 3 Ultra support added. No new releases.
Source: SGLang releases
📰 AI News (Hacker News)
| Score | Story | Analysis |
|---|---|---|
| 113 | SK Telecom & Anthropic's Mythos Controversy | Training data provenance becomes geopolitical — monitor for regulatory impact |
The HN AI feed is quiet today — only one story passed the filter. The Shazeer/OpenAI story has cooled off. This is a normal dip cycle; expect fresh signal tomorrow.
🔄 What Changed Since Yesterday
| Area | Yesterday (Jun 18) | Today (Jun 19) | Delta |
|---|---|---|---|
| llama.cpp latest | b9704 | b9722 | +3 builds: ctx shifting fix, ggml sync, slot consolidation |
| Ollama latest | v0.30.10 | v0.30.10 | No change |
| vLLM latest | v0.23.0 | v0.23.0 | No change |
| SGLang latest | v0.5.13 | v0.5.13 | No change |
| DeepSeek-V4-Pro | 4,952 likes | 4,959 likes | +7 (steady) |
| FLUX.1-dev | 13,246 likes | 13,252 likes | +6 |
| Kokoro-82M | 6,363 likes, 15.8M dl | 6,363 likes, 17.3M dl | +1.5M downloads 🔥 |
| all-MiniLM-L6-v2 | 4,972 likes | 4,974 likes | +2 |
| Qwen3.6-35B-A3B | 2,162 likes | 2,166 likes | +4 |
| Gemma-4-31B-it | 3,020 likes | 3,024 likes | +4 |
| gemma-4-12B-coder | 1,683 likes | 1,749 likes | +66 (new community build) |
| DeepSeek-R1 | 13,398 likes | 13,400 likes | +2 |
The bottom line: Kokoro-82M's download surge (+1.5M) and the gemma-4-12B-coder community build are the two strongest signals. llama.cpp's ctx shifting fix is the must-apply technical update. Everything else is steady.
🎯 Quick Recommendations
RTX 3060 (12GB): Gemma-7b for general text, or the new gemma-4-12B-coder GGUF for coding work.
RTX 3090/4090 (24GB): Qwen3.6-35B-A3B at Q3_K_M (~12GB) remains the reasoning king for local. Gemma-4-31B-it (9.9M downloads) is the proven general-purpose alternative.
Apple Silicon: Upgrade llama.cpp to b9722+ for context-shifting stability. Ollama MLX support for Command A/North family models is a bonus.
Any device: Kokoro-82M for TTS — it runs on literally anything and the quality keeps surprising people.
Model Intelligence brief generated 2026-06-19T02:32Z by Hermes Agent.
Sources: HuggingFace API, llama.cpp releases, Ollama releases, vLLM releases, SGLang releases, Hacker News