Model Intelligence — 2026-06-18

🔥 Top Stories

1. Ollama v0.30.10 — Apple Silicon MLX Engine Now Covers Command A and North Family

Yesterday's stable release of Ollama v0.30.10 landed with Cohere2MoE support, but today's signal is the Apple Silicon MLX engine expansion. Command A and North family models now run natively on M-series hardware via MLX, not just the fallback llama.cpp CPU path.

Why it matters: MLX delivers near-metal performance on Apple Silicon. If you're running a Mac Studio or MacBook Pro with an M3/M4 chip, models in the Command A and North families are now first-class citizens instead of degraded CPU fallbacks. This broadens the practical "local inference on Mac" category significantly.
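Since the MLX coverage arrives with v0.30.10, a quick check before pulling a Command A or North model on a Mac is whether the local install is at least that version. The following is a minimal sketch, assuming Ollama is serving on its default local port (11434) and that its /api/version endpoint returns a JSON object with a "version" field. The helper names are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_VERSION_URL = "http://localhost:11434/api/version"


def parse_version(text):
    """Turn 'v0.30.10' or '0.30.10' into a comparable tuple such as (0, 30, 10)."""
    core = text.strip().lstrip("v").split("-")[0]  # drop any '-rc1' style suffix
    return tuple(int(part) for part in core.split("."))


def is_at_least(installed, required):
    """True if the installed version equals the required version or is newer."""
    return parse_version(installed) >= parse_version(required)


def local_ollama_version(url=OLLAMA_VERSION_URL, timeout=5):
    """Ask the local Ollama server for its version string (needs a running server)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["version"]
```

With a server running, `is_at_least(local_ollama_version(), "v0.30.10")` reports whether the install is new enough. Versions are compared as integer tuples because a plain string comparison would rank "0.30.9" above "0.30.10".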

2. llama.cpp b9698 — CI Hardening, Self-Update Controls

Three new builds shipped today: b9694, b9697, b9698. The development pace slowed from yesterday's 10-build sprint and shifted toward infrastructure: self-update gating, a CI message parsing fix, and a Windows OpenVINO release link fix.

Signal: The shift from feature work to CI/release pipeline hardening suggests the team is prepping for a stable tag cut. This often precedes a version bump.

3. Meta-Llama-3-8B-Instruct Refreshed on HuggingFace

Meta-Llama-3-8B-Instruct shows an update timestamp of today (2026-06-18) on HuggingFace. At 4,612 likes and 1.27M downloads, this remains the workhorse open LLM for 8B-class tasks. A fresh upload typically indicates a bugfix, card update, or license clarification — worth watching for a follow-up announcement.
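For readers who want to watch the timestamp themselves, here is a rough sketch of polling the Hub for a repository's last-modified date. It assumes the public endpoint https://huggingface.co/api/models/<repo_id> returns JSON containing a "lastModified" ISO-8601 field, and that no access token is required for this metadata. Both are assumptions worth verifying, especially for gated repositories. The function names and the sample timestamp in the usage note are illustrative.

```python
import json
import urllib.request
from datetime import date, datetime

# Assumption: this public Hub endpoint returns JSON with a "lastModified" field.
HF_MODEL_API = "https://huggingface.co/api/models/{repo_id}"


def modified_on(last_modified, day):
    """True if an ISO-8601 timestamp (UTC, trailing 'Z') falls on the given date."""
    parsed = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    return parsed.date() == day


def fetch_last_modified(repo_id, timeout=10):
    """Fetch the repository's last-modified timestamp string (needs network access)."""
    url = HF_MODEL_API.format(repo_id=repo_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["lastModified"]
```

For example, `modified_on("2026-06-18T09:30:00.000Z", date(2026, 6, 18))` evaluates to true. The comparison is done in UTC, so an upload late in the evening in a western timezone may register as the following day.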

📊 Model Trends

HuggingFace Trending (Top 15)

| Rank | Model | Likes | 24h Δ | Category |
|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-R1 | 13,394 | | Reasoning |
| 2 | black-forest-labs/FLUX.1-dev | 13,234 | +3 | Image Gen |
| 3 | stabilityai/SDXL 1.0 | 7,825 | +2 | Image Gen |
| 4 | CompVis/SD v1.4 | 7,021 | | Image Gen |
| 5 | meta-llama/Meta-Llama-3-8B | 6,578 | | LLM |
| 6 | hexgrad/Kokoro-82M | 6,358 | +1 | TTS |
| 7 | meta-llama/Llama-3.1-8B-Instruct | 6,105 | +1 | LLM |
| 8 | openai/whisper-large-v3 | 5,828 | +1 | Speech |
| 9 | black-forest-labs/FLUX.1-schnell | 5,151 | +1 | Image Gen |
| 10 | bigscience/bloom | 5,011 | | LLM |
| 11 | stabilityai/SD3-medium | 4,976 | | Image Gen |
| 12 | sentence-transformers/all-MiniLM-L6-v2 | 4,966 | +2 | Embeddings |
| 13 | deepseek-ai/DeepSeek-V4-Pro | 4,932 | +6 | LLM |
| 14 | openai/gpt-oss-120b | 4,894 | | LLM |
| 15 | Tongyi-MAI/Z-Image-Turbo | 4,826 | +1 | Image Gen |

Signal: The leaderboard is extremely stable, with almost no movement. DeepSeek-V4-Pro continues its slow crawl (+6, now at 4,932) and posts the largest 24h gain in the top 15. gpt-oss-120b went flat for the first time, suggesting its novelty curve has leveled off. Image generation models (FLUX.1-dev, FLUX.1-schnell, SDXL 1.0, SD v1.4, SD3-medium, Z-Image-Turbo) occupy 6 of 15 spots, making image generation the largest single category in the HF top 15 right now.

Qwen Ecosystem

| Model | Likes | 24h Δ | VRAM Fit |
|---|---|---|---|
| Qwen/QwQ-32B | 2,931 | | RTX 3090 @ Q4 (~19GB) |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,882 | | RTX 3090 @ Q4 (~17GB) |
| Qwen/Qwen-Image | 2,512 | +1 | Multi-modal |
| Qwen/Qwen-Image-Edit | 2,425 | | Multi-modal |
| Qwen/Qwen3.6-35B-A3B | 2,157 | | RTX 3090 @ Q3 MoE (~12GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | 2,045 | | RTX 3090 @ Q4 (~19GB) |
| Qwen3.6-35B-A3B-Uncensored | 1,947 | +6 | RTX 3090 @ Q3 MoE (~12GB) |

Note: Growth has cooled across the Qwen family. The uncensored variant (+6) is still growing faster than the official Qwen3.6-35B-A3B (flat), and the gap is narrowing: the uncensored variant is now only 210 likes behind the official model (1,947 vs. 2,157). Community interest remains strong, but the exponential phase appears to be over.
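The VRAM Fit figures in the Qwen table can be sanity-checked with back-of-the-envelope arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is illustrative only. The 4.75 bits-per-weight value for a Q4-class quant and the flat 2 GB allowance for KV cache and runtime buffers are assumptions, not figures from this brief. For MoE models, use the total parameter count rather than the active count, since all experts have to be resident in memory.

```python
def estimate_weights_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead_gb=2.0):
    """Rough fit check: weights plus a flat allowance for KV cache and buffers."""
    weights_gb = estimate_weights_gb(params_billion, bits_per_weight)
    return weights_gb + overhead_gb <= vram_gb


# Assumed inputs: 4.75 bits/weight stands in for a Q4-class quant.
print(estimate_weights_gb(32, 4.75))       # 32B dense model -> 19.0
print(fits_in_vram(32, 4.75, vram_gb=24))  # 24 GB card -> True
print(fits_in_vram(32, 4.75, vram_gb=12))  # 12 GB card -> False
```

Under these assumptions a 32B model lands at about 19 GB, in line with the table's estimate for the 32B entries. Real usage varies with context length and the exact quantization recipe.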

Gemma Ecosystem

| Model | Likes | 24h Δ | VRAM (Q4_K_M) |
|---|---|---|---|
| google/gemma-7b | 3,359 | | ~4.5GB ✅ |
| google/gemma-4-31B-it | 3,016 | +2 | ~17GB |
| google/gemma-3-27b-it | 1,980 | | ~15GB |
| google/gemma-3n-E4B-it-litert-preview | 1,485 | | ~2.4GB ✅ |
| google/gemma-2-2b-it | 1,393 | +1 | ~1.3GB ✅ |
| google/gemma-3-4b-it | 1,371 | | ~2.3GB ✅ |
| google/gemma-4-E4B-it | 1,257 | | ~2.4GB ✅ |
| google/gemma-7b-it | 1,247 | | ~4.5GB ✅ |

New entry: A community GGUF build appeared today — yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF. This is a specialized coding variant of Gemma 4, already GGUF-formatted for local inference. Early signal for a coding-focused Gemma 4 derivative.

⚙️ Engine Updates

llama.cpp — b9698 (3 builds today: b9694, b9697, b9698)

Today's output was infrastructure-focused: self-update gating, CI parsing fixes, and an OpenVINO link correction. No new backend features. This slowdown from yesterday's 10-build pace is a typical pre-release stabilization pattern.

| Build | Key Change | Impact |
|---|---|---|
| b9698 | Self-update only via llama-install.sh | Security hardening |
| b9697 | CI message parsing fix | Infrastructure |
| b9694 | Windows OpenVINO release link fix | Build fix |

Bottom line: No reason to upgrade from b9692 unless you hit the specific OpenVINO Windows issue. Watch for a version tag drop in the next 1-3 days.

Source: llama.cpp releases

Ollama — v0.30.10 (June 17, no new release)

Still the latest. The Apple Silicon MLX engine expansion for Command A and North family models is the standout feature. Cohere2MoE support remains the primary payload from yesterday's stable release.

Source: Ollama releases

vLLM — v0.23.0 (June 15, no new release)

Three days old and still the latest. DeepSeek-V4 hardening, MRv2, Rust frontend, Gemma 4 Unified, and multi-tier KV cache remain the headline features. No release activity since June 15.

Source: vLLM releases

SGLang — v0.5.13 (June 13, no new release)

Five days since the last release. Nemotron 3 Ultra autoregressive support is the primary addition. Quiet period suggests pre-release work on the next feature drop.

Source: SGLang releases

📰 AI News (Hacker News)

No AI-specific stories passed the HN filter today.

🔄 What Changed Since Yesterday

| Area | Yesterday (Jun 17) | Today (Jun 18) | Delta |
|---|---|---|---|
| llama.cpp latest | b9692 | b9698 | 3 new builds (tag number +6): CI hardening, self-update gating, OpenVINO fix |
| Ollama latest | v0.30.10 | v0.30.10 | No change; Apple Silicon MLX for Command A/North models confirmed |
| DeepSeek-V4-Pro | 4,926 likes | 4,932 likes | +6, still climbing |
| FLUX.1-dev | 13,231 likes | 13,234 likes | +3 |
| gpt-oss-120b | 4,894 likes | 4,894 likes | Flat (first stall) |
| Meta-Llama-3-8B-Instruct | updated 2025-06-18 | updated 2026-06-18 | Fresh HF upload today |
| Gemma ecosystem | | New: gemma-4-12B-coder GGUF | Community coding variant appears |
| vLLM | v0.23.0 | v0.23.0 | No change (3 days old) |
| SGLang | v0.5.13 | v0.5.13 | No change (5 days old) |

The key takeaway: Today is a consolidation day. llama.cpp shifted from feature sprints to CI hardening, the model leaderboard is remarkably static, and the serving stacks (vLLM, SGLang) are quiet. The most notable fresh data points are the Meta-Llama-3-8B-Instruct HF update and the community Gemma 4 12B coding GGUF.
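The like-count deltas in the "What Changed Since Yesterday" table are simple differences between two daily snapshots. A small sketch of that calculation follows, using the three like counts reported in the table. The function name and the dictionary layout are illustrative, not a description of how the brief is actually generated.

```python
def like_deltas(yesterday, today):
    """Per-model change in likes between two snapshots; models new today map to None."""
    deltas = {}
    for model, likes in today.items():
        previous = yesterday.get(model)
        deltas[model] = None if previous is None else likes - previous
    return deltas


# Like counts taken from the "What Changed Since Yesterday" table.
yesterday = {"DeepSeek-V4-Pro": 4926, "FLUX.1-dev": 13231, "gpt-oss-120b": 4894}
today = {"DeepSeek-V4-Pro": 4932, "FLUX.1-dev": 13234, "gpt-oss-120b": 4894}

for model, delta in like_deltas(yesterday, today).items():
    if delta is None:
        label = "new"
    elif delta == 0:
        label = "flat"
    else:
        label = f"{delta:+d}"
    print(f"{model}: {label}")
```

Run as written, this prints +6 for DeepSeek-V4-Pro, +3 for FLUX.1-dev, and "flat" for gpt-oss-120b, matching the table.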


🎯 Quick Recommendations for Your GPU

RTX 3060 (12GB):

RTX 3090/4090 (24GB):

Apple Silicon (M-series):


Model Intelligence brief generated 2026-06-18 by Hermes Agent.

Sources: HuggingFace API, llama.cpp releases, Ollama releases, vLLM releases, SGLang releases, Hacker News
