Model Intelligence — 2026-06-15

🔥 Top Stories

1. vLLM v0.23.0 — DeepSeek-V4 Gets Production Green Light

vLLM v0.23.0 dropped today with 408 commits from 200 contributors. The headline is the TRTLLM-gen attention kernel for DeepSeek-V4, which makes DeepSeek-V4 serving on vLLM genuinely production-ready.

Key additions:

Signal: The TRTLLM kernel is the missing piece for production DeepSeek-V4 serving. Between vLLM v0.23.0 and SGLang v0.5.13's full V4 inference path, the ecosystem now has two battle-tested options. Minimax M3 is not yet supported; expect it in v0.24.

2. llama.cpp Hits b9660 — Chat & Tool Calling Hardened

llama.cpp reached b9660 today (from b9637 yesterday), spanning 13+ commits. Five builds landed today alone:

Also in the day's span: Metal bf16 repeat (Mac Studio M-series), SYCL Level Zero optimization (Intel GPUs), WebGPU i-quants performance, HEIC/HEIF image support, thinking/reasoning block rendering as markdown (#24611).

Signal: The tool call parsing hardening is quietly critical — if you're running function-calling models through llama.cpp, b9656+ is the update to grab. The reasoning-block markdown rendering matters for DeepSeek-R1 and chain-of-thought workflows.
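A minimal sketch of checking a build tag against the b9656 threshold mentioned above. The helper names (`build_number`, `has_tool_call_hardening`) are hypothetical, and the code assumes tags follow the `b<number>` pattern used in this briefing. How you read the tag from your own install is left out.

```python
import re


def build_number(tag: str) -> int:
    """Extract the integer from a tag such as 'b9660' (assumed format)."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a recognised build tag: {tag!r}")
    return int(match.group(1))


def has_tool_call_hardening(tag: str, minimum: str = "b9656") -> bool:
    """True if the build is at or past the threshold named in this briefing."""
    return build_number(tag) >= build_number(minimum)


print(has_tool_call_hardening("b9660"))  # True
print(has_tool_call_hardening("b9637"))  # False
```

By this comparison b9660 clears the threshold and b9637 does not.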

3. Ollama v0.30.9-rc1 — Rolling Up to b9637

Ollama v0.30.9-rc1 landed today, updating the bundled llama.cpp to b9637. This means the Cohere2MoE parser and recent template fixes from yesterday's llama.cpp sprint are now in the Ollama pipeline.

Signal: Release candidates typically become stable releases within a week. If you're on Ollama for production, the v0.30.8→v0.30.9 upgrade brings 2+ days of llama.cpp fixes into a stable package.

📊 Model Trends

HuggingFace Top 15

| Rank | Model | Likes | Downloads |
|---|---|---|---|
| 1 | DeepSeek-R1 | 13,394 | 4.5M |
| 2 | FLUX.1-dev | 13,208 | 587K |
| 3 | SDXL 1.0 | 7,819 | 1.0M |
| 4 | SD 1.4 | 7,020 | 304K |
| 5 | Llama-3-8B | 6,579 | 896K |
| 6 | Kokoro-82M | 6,335 | 11.7M 🔥 |
| 7 | Llama-3.1-8B-Instruct | 6,086 | 6.6M |
| 8 | Whisper-large-v3 | 5,824 | 4.1M |
| 9 | FLUX.1-schnell | 5,130 | 220K |
| 10 | bloom | 5,011 | 3.3K |
| 11 | SD3-medium | 4,975 | 2.0K |
| 12 | MiniLM-L6-v2 | 4,950 | 167M |
| 13 | gpt-oss-120b | 4,888 | 2.9M |
| 14 | DeepSeek-V4-Pro | 4,867 | 2.9M |
| 15 | Z-Image-Turbo | 4,808 | 637K |

Standouts: Kokoro-82M jumped +600K downloads to 11.7M, which is embedded-in-production velocity, not hobbyist testing. DeepSeek-V4-Pro is climbing steadily at 4,867 likes. MiniLM-L6-v2 at 167M downloads remains the silent workhorse of embedding.

Qwen Family

| Model | Likes | Notes |
|---|---|---|
| QwQ-32B | 2,931 | Top Qwen |
| Qwen3.5-27B Claude-4.6 distill | 2,881 | Community reasoning distill |
| Qwen-Image | 2,511 | Text-to-image |
| Qwen-Image-Edit | 2,424 | Image editing |
| Qwen3.6-35B-A3B | 2,121 | MoE value king, 3.3M downloads |

Qwen3.6-35B-A3B remains the practical recommendation: MoE with 3B active params out of 35B total, runs on consumer hardware.
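A rough back-of-the-envelope sketch of what "35B total, 3B active" means for memory. In an MoE model the active parameter count drives per-token compute, but all 35B weights still have to live somewhere (VRAM or offloaded system RAM). The function name and the bit widths below are illustrative assumptions, and the estimate covers weights only: it ignores KV cache, activations, and quantization overhead.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed precisions for illustration; real quantized files vary in size.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(35, bits):.1f} GB")
# 16-bit: 70.0 GB
# 8-bit: 35.0 GB
# 4-bit: 17.5 GB
```

Under these assumptions, 4-bit weights come to about 17.5 GB, which is more than a 12 GB card holds on its own, so part of the model would need to sit in system RAM on that class of hardware.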

Gemma Family

| Model | Likes | Downloads |
|---|---|---|
| gemma-7b | 3,357 | 24K |
| gemma-4-31B-it | 2,992 | 7.5M |
| gemma-3-27b-it | 1,979 | 971K |
| gemma-3-4b-it | 1,368 | 1.1M |

gemma-4-31B-it at 7.5M downloads — Google's flagship with solid local inference support via Ollama's QAT weights.

⚙️ Engine Updates

| Engine | Version | Date | Status |
|---|---|---|---|
| llama.cpp | b9660 | Jun 15 | ⬆️ +5 today |
| Ollama | v0.30.9-rc1 | Jun 15 | ⬆️ NEW RC |
| vLLM | v0.23.0 | Jun 15 | ⬆️ RELEASED TODAY |
| SGLang | v0.5.13 | Jun 13 | |

Three engines shipped today — the busiest single day in recent weeks. If you haven't updated any inference stack in the last 48 hours, do it now.

📰 AI News (Hacker News)

The coding replacement thread (449 pts) is the most useful discussion — real practitioners sharing what works for local-first dev workflows.

🔄 What Changed Since Yesterday

| Area | Yesterday (Jun 14) | Today (Jun 15) | Change |
|---|---|---|---|
| llama.cpp | b9637 | b9660 | +5 builds |
| Ollama | v0.30.8 | v0.30.9-rc1 | +1 RC |
| vLLM | v0.23.0 | v0.23.0 | no change (same release) |
| SGLang | v0.5.13 | v0.5.13 | |
| Kokoro-82M downloads | 11.1M | 11.7M | +600K 🔥 |
| DeepSeek-V4-Pro likes | 4,824 | 4,867 | +43 |
| DeepSeek-R1 likes | 13,390 | 13,394 | +4 |
| FLUX.1-dev likes | 13,194 | 13,208 | +14 |
| Llama-3.1-8B-Instruct likes | 6,075 | 6,086 | +11 |
| gpt-oss-120b likes | 4,883 | 4,888 | +5 |
| QwQ-32B likes | 2,930 | 2,931 | +1 |
| gemma-4-31B-it likes | 2,979 | 2,992 | +13 |
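The like deltas in the table above can be reproduced with a small snapshot diff. This is only a sketch: the `like_deltas` helper and the dictionary layout are hypothetical, the counts are copied from the table, and fetching them from the HuggingFace API is not shown.

```python
# Like counts copied from the table above (Jun 14 vs Jun 15).
yesterday = {
    "DeepSeek-V4-Pro": 4824,
    "DeepSeek-R1": 13390,
    "FLUX.1-dev": 13194,
    "Llama-3.1-8B-Instruct": 6075,
    "gpt-oss-120b": 4883,
    "QwQ-32B": 2930,
    "gemma-4-31B-it": 2979,
}
today = {
    "DeepSeek-V4-Pro": 4867,
    "DeepSeek-R1": 13394,
    "FLUX.1-dev": 13208,
    "Llama-3.1-8B-Instruct": 6086,
    "gpt-oss-120b": 4888,
    "QwQ-32B": 2931,
    "gemma-4-31B-it": 2992,
}


def like_deltas(prev: dict[str, int], curr: dict[str, int]) -> dict[str, int]:
    """Day-over-day change for every model present in both snapshots."""
    return {name: curr[name] - prev[name] for name in curr if name in prev}


deltas = like_deltas(yesterday, today)
# Print the largest movers first.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {delta:+d}")
```

Sorting by delta rather than by absolute likes surfaces momentum: DeepSeek-V4-Pro leads the day at +43 despite ranking 14th overall.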

Local Inference Recommendations

RTX 3060 (12GB):

RTX 3090 (24GB):

Key takeaway: Three engines shipping today makes this a must-update day. vLLM v0.23.0's TRTLLM kernel is the most actionable release: it unlocks production DeepSeek-V4 serving. llama.cpp's tool call parsing hardening (b9656) matters for function-calling workflows. Ollama's RC brings llama.cpp fixes through b9637 into a stable packaging pipeline.


Scan completed: 2026-06-15 | Sources: HuggingFace API, llama.cpp GitHub, Ollama GitHub, vLLM GitHub, SGLang GitHub, Hacker News
