Model Intelligence — 2026-06-18
🔥 Top Stories
1. Ollama v0.30.10 — Apple Silicon MLX Engine Now Covers Command A and North Family
Yesterday's stable release of Ollama v0.30.10 landed with Cohere2MoE support, but today's signal is the Apple Silicon MLX engine expansion. Command A and North family models now run natively on M-series hardware via MLX, not just the fallback llama.cpp CPU path.
Why it matters: MLX delivers near-metal performance on Apple Silicon. If you're running a Mac Studio or MacBook Pro with an M3/M4 chip, models in the Command A and North families are now first-class citizens instead of degraded CPU fallbacks. This broadens the practical "local inference on Mac" category significantly.
2. llama.cpp b9698 — CI Hardening, Self-Update Controls
Three new builds shipped today: b9694, b9697, b9698. The development pace slowed slightly from yesterday's 10-build sprint, shifting toward infrastructure:
- b9698 — Self-update gating: llama-install.sh is the only build path that enables self-update, tightening security for non-install.sh builds.
- b9697 — CI check-release message parsing fix.
- b9694 — Windows x64 OpenVINO release link fix.
Signal: The shift from feature work to CI/release pipeline hardening suggests the team is prepping for a stable tag cut. This often precedes a version bump.
3. Meta-Llama-3-8B-Instruct Refreshed on HuggingFace
Meta-Llama-3-8B-Instruct shows an update timestamp of today (2026-06-18) on HuggingFace. At 4,612 likes and 1.27M downloads, this remains the workhorse open LLM for 8B-class tasks. A fresh upload typically indicates a bugfix, card update, or license clarification — worth watching for a follow-up announcement.
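For readers who want to catch refreshes like this themselves, here is a minimal sketch that checks a repo's last-modified date. It assumes the Hub's public metadata endpoint at https://huggingface.co/api/models/<repo_id> returns JSON with a `lastModified` ISO timestamp. The helper names are hypothetical, and gated repos may require an access token that this sketch does not handle.

```python
import json
import urllib.request
from datetime import date, datetime

# Assumed public metadata endpoint for Hugging Face model repos.
HUB_API = "https://huggingface.co/api/models/"


def fetch_model_metadata(repo_id: str) -> dict:
    """Fetch raw repo metadata (network call; gated repos may need auth)."""
    with urllib.request.urlopen(HUB_API + repo_id, timeout=10) as resp:
        return json.load(resp)


def was_updated_on(metadata: dict, day: date) -> bool:
    """Return True if the repo's lastModified timestamp falls on `day` (UTC)."""
    stamp = metadata["lastModified"]  # assumed shape: "2026-06-18T09:30:00.000Z"
    modified = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return modified.date() == day
```

Calling `was_updated_on(fetch_model_metadata("meta-llama/Meta-Llama-3-8B-Instruct"), date(2026, 6, 18))` would reproduce today's check, network and access permitting.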
📊 Model Trends
HuggingFace Trending (Top 15)
| Rank | Model | Likes | 24h Δ | Category |
|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-R1 | 13,394 | — | Reasoning |
| 2 | black-forest-labs/FLUX.1-dev | 13,234 | +3 | Image Gen |
| 3 | stabilityai/SDXL 1.0 | 7,825 | +2 | Image Gen |
| 4 | CompVis/SD v1.4 | 7,021 | — | Image Gen |
| 5 | meta-llama/Meta-Llama-3-8B | 6,578 | — | LLM |
| 6 | hexgrad/Kokoro-82M | 6,358 | +1 | TTS |
| 7 | meta-llama/Llama-3.1-8B-Instruct | 6,105 | +1 | LLM |
| 8 | openai/whisper-large-v3 | 5,828 | +1 | Speech |
| 9 | black-forest-labs/FLUX.1-schnell | 5,151 | +1 | Image Gen |
| 10 | bigscience/bloom | 5,011 | — | LLM |
| 11 | stabilityai/SD3-medium | 4,976 | — | Image Gen |
| 12 | sentence-transformers/all-MiniLM-L6-v2 | 4,966 | +2 | Embeddings |
| 13 | deepseek-ai/DeepSeek-V4-Pro | 4,932 | +6 | LLM |
| 14 | openai/gpt-oss-120b | 4,894 | — | LLM |
| 15 | Tongyi-MAI/Z-Image-Turbo | 4,826 | +1 | Image Gen |
Signal: The leaderboard is extremely stable, with no model gaining more than 6 likes in 24 hours. DeepSeek-V4-Pro continues its slow crawl (+6, now at 4,932) and posts the largest gain in the top 15. gpt-oss-120b went flat for the first time, suggesting the novelty curve has flattened. Image generation models (FLUX.1-dev, FLUX.1-schnell, SDXL, SD v1.4, SD3-medium, Z-Image-Turbo) occupy 6 of 15 spots, making image gen the largest single category in this table.
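The category split can be tallied directly from the table above. The sketch below copies the Model and Category columns by hand and counts them; the `TRENDING` list and `category_counts` helper are illustrative names, not part of any HuggingFace API.

```python
from collections import Counter

# (model, category) pairs copied from the trending table above.
TRENDING = [
    ("deepseek-ai/DeepSeek-R1", "Reasoning"),
    ("black-forest-labs/FLUX.1-dev", "Image Gen"),
    ("stabilityai/SDXL 1.0", "Image Gen"),
    ("CompVis/SD v1.4", "Image Gen"),
    ("meta-llama/Meta-Llama-3-8B", "LLM"),
    ("hexgrad/Kokoro-82M", "TTS"),
    ("meta-llama/Llama-3.1-8B-Instruct", "LLM"),
    ("openai/whisper-large-v3", "Speech"),
    ("black-forest-labs/FLUX.1-schnell", "Image Gen"),
    ("bigscience/bloom", "LLM"),
    ("stabilityai/SD3-medium", "Image Gen"),
    ("sentence-transformers/all-MiniLM-L6-v2", "Embeddings"),
    ("deepseek-ai/DeepSeek-V4-Pro", "LLM"),
    ("openai/gpt-oss-120b", "LLM"),
    ("Tongyi-MAI/Z-Image-Turbo", "Image Gen"),
]


def category_counts(rows):
    """Count how many leaderboard rows fall into each category."""
    return Counter(category for _, category in rows)


counts = category_counts(TRENDING)
print(counts.most_common(2))  # [('Image Gen', 6), ('LLM', 5)]
```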
Qwen Ecosystem
| Model | Likes | 24h Δ | VRAM Fit |
|---|---|---|---|
| Qwen/QwQ-32B | 2,931 | — | RTX 3090 @ Q4 (~19GB) |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 2,882 | — | RTX 3090 @ Q4 (~17GB) |
| Qwen/Qwen-Image | 2,512 | +1 | Multi-modal |
| Qwen/Qwen-Image-Edit | 2,425 | — | Multi-modal |
| Qwen/Qwen3.6-35B-A3B | 2,157 | — | RTX 3090 @ Q3 MoE (~12GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | 2,045 | — | RTX 3090 @ Q4 (~19GB) |
| Qwen3.6-35B-A3B-Uncensored | 1,947 | +6 | RTX 3090 @ Q3 MoE (~12GB) |
Note: Growth has cooled across the Qwen family. The uncensored variant (+6) still outpaces the official Qwen3.6-35B-A3B (flat) in daily growth, and the gap is narrowing: it now trails the official model by only 210 likes (1,947 vs. 2,157). Community interest remains strong, but the exponential phase appears to be over.
Gemma Ecosystem
| Model | Likes | 24h Δ | VRAM (Q4_K_M) |
|---|---|---|---|
| google/gemma-7b | 3,359 | — | ~4.5GB ✅ |
| google/gemma-4-31B-it | 3,016 | +2 | ~17GB |
| google/gemma-3-27b-it | 1,980 | — | ~15GB |
| google/gemma-3n-E4B-it-litert-preview | 1,485 | — | ~2.4GB ✅ |
| google/gemma-2-2b-it | 1,393 | +1 | ~1.3GB ✅ |
| google/gemma-3-4b-it | 1,371 | — | ~2.3GB ✅ |
| google/gemma-4-E4B-it | 1,257 | — | ~2.4GB ✅ |
| google/gemma-7b-it | 1,247 | — | ~4.5GB ✅ |
New entry: A community GGUF build appeared today — yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF. This is a specialized coding variant of Gemma 4, already GGUF-formatted for local inference. Early signal for a coding-focused Gemma 4 derivative.
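The VRAM figures in the tables above can be sanity-checked with back-of-the-envelope arithmetic: parameter count times average bits per weight, divided by 8. The sketch below is a rough estimator only. The bits-per-weight values and the 1.5GB overhead for KV cache and runtime buffers are assumptions, real GGUF files vary by architecture, and the results will not match the table entries exactly.

```python
# Assumed average bits per weight for common GGUF quantization types.
# These are approximations, not values read from any specific file.
BITS_PER_WEIGHT = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q8_0": 8.5}


def estimate_weights_gb(params_billions: float, quant: str) -> float:
    """Estimate weight size in GB: (params * bits per weight) / 8 bits per byte."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits(params_billions: float, quant: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """Check whether weights plus an assumed fixed overhead fit in VRAM."""
    return estimate_weights_gb(params_billions, quant) + overhead_gb <= vram_gb


# A 27B model at Q4_K_M lands around 16.4GB of weights under these assumptions.
print(round(estimate_weights_gb(27, "Q4_K_M"), 1))
print(fits(27, "Q4_K_M", vram_gb=24.0))
```

Treat the output as a first filter before downloading; longer context windows raise the KV cache share well past the fixed overhead used here.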
⚙️ Engine Updates
llama.cpp — b9698 (3 builds today: b9694, b9697, b9698)
Today's output was infrastructure-focused: self-update gating, CI parsing fixes, and an OpenVINO link correction. No new backend features. This slowdown from yesterday's 10-build pace is a typical pre-release stabilization pattern.
| Build | Key Change | Impact |
|---|---|---|
| b9698 | Self-update only via llama-install.sh | Security hardening |
| b9697 | CI message parsing fix | Infrastructure |
| b9694 | Windows OpenVINO release link fix | Build fix |
Bottom line: No reason to upgrade from b9692 unless you hit the specific OpenVINO Windows issue. Watch for a version tag drop in the next 1-3 days.
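To see how far an installed build trails the latest one, the build tags can be compared numerically. This is a small sketch that assumes tags always follow the `b<number>` pattern used in this brief; both function names are hypothetical.

```python
import re


def build_number(tag: str) -> int:
    """Extract the integer from a tag such as 'b9698' (assumed tag format)."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a llama.cpp build tag: {tag!r}")
    return int(match.group(1))


def builds_behind(installed: str, latest: str) -> int:
    """Return how many build numbers `installed` trails `latest` (never negative)."""
    return max(0, build_number(latest) - build_number(installed))


print(builds_behind("b9692", "b9698"))  # 6
```

A gap in build numbers does not mean that many published releases; only some build numbers ship as release artifacts.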
Source: llama.cpp releases
Ollama — v0.30.10 (June 17, no new release)
Still the latest. The Apple Silicon MLX engine expansion for Command A and North family models is the standout feature. Cohere2MoE support remains the primary payload from yesterday's stable release.
Source: Ollama releases
vLLM — v0.23.0 (June 15, no new release)
Three days old and still the latest. DeepSeek-V4 hardening, MRv2, Rust frontend, Gemma 4 Unified, and multi-tier KV cache remain the headline features. No new releases since June 15.
Source: vLLM releases
SGLang — v0.5.13 (June 13, no new release)
Five days since the last release. Nemotron 3 Ultra autoregressive support is the primary addition. The quiet period may indicate pre-release work on the next feature drop.
Source: SGLang releases
📰 AI News (Hacker News)
No AI-specific stories passed the HN filter today. Carrying forward from yesterday's conversation:
- [1,466 pts] "Running local models is good now" (Vicki Boykis) — Still dominating. The quantization + hardware + ecosystem convergence thesis remains the defining narrative of the week.
- [503 pts] "GLM-5.2 is the new leading open weights model on Artificial Analysis" — Competitive benchmark signal worth tracking.
🔄 What Changed Since Yesterday
| Area | Yesterday (Jun 17) | Today (Jun 18) | Delta |
|---|---|---|---|
| llama.cpp latest | b9692 | b9698 | +6 build numbers (3 releases): CI hardening, self-update gating, OpenVINO fix |
| Ollama latest | v0.30.10 | v0.30.10 | No change — Apple Silicon MLX for Command A/North models confirmed |
| DeepSeek-V4-Pro | 4,926 likes | 4,932 likes | +6, still climbing |
| FLUX.1-dev | 13,231 likes | 13,234 likes | +3 |
| gpt-oss-120b | 4,894 likes | 4,894 likes | Flat — first stall |
| Meta-Llama-3-8B-Instruct | updated 2025-06-18 | updated 2026-06-18 | Fresh HF upload today |
| Gemma ecosystem | — | New: gemma-4-12B-coder GGUF | Community coding variant appears |
| vLLM | v0.23.0 | v0.23.0 | No change (3 days old) |
| SGLang | v0.5.13 | v0.5.13 | No change (5 days old) |
The key takeaway: Today is a consolidation day. llama.cpp shifted from feature sprints to CI hardening, the model leaderboard is remarkably static, and the serving stacks (vLLM, SGLang) are quiet. The most notable fresh data points are the Meta-Llama-3-8B-Instruct HF update and the community Gemma 4 12B coding GGUF.
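The "days old" figures for the serving stacks are plain date arithmetic against the brief date. The sketch below recomputes them from the release dates quoted in the Engine Updates section; the year 2026 is assumed from the brief's own date, and the names are illustrative.

```python
from datetime import date

BRIEF_DATE = date(2026, 6, 18)

# Release dates as quoted in this brief (year assumed from the brief date).
LAST_RELEASE = {
    "Ollama v0.30.10": date(2026, 6, 17),
    "vLLM v0.23.0": date(2026, 6, 15),
    "SGLang v0.5.13": date(2026, 6, 13),
}


def release_ages(releases: dict, today: date) -> dict:
    """Map each release name to its age in whole days as of `today`."""
    return {name: (today - released).days for name, released in releases.items()}


for name, age in release_ages(LAST_RELEASE, BRIEF_DATE).items():
    print(f"{name}: {age} day(s) old")
```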
🎯 Quick Recommendations for Your GPU
RTX 3060 (12GB):
- google/gemma-4-E4B-it at Q4_K_M (~2.4GB) — fastest chat in the Gemma family
- google/gemma-3-4b-it at Q4_K_M (~2.3GB) — lightweight alternative
RTX 3090/4090 (24GB):
- deepseek-ai/DeepSeek-V4-Pro at Q4_K_M — best reasoning with vLLM v0.23.0
- Qwen3.6-35B-A3B at Q3_K_M (~11-13GB) — MoE efficiency champion
- google/gemma-4-31B-it at Q4_K_M (~17GB) — strong instruction-following
Apple Silicon (M-series):
- Ollama v0.30.10 with MLX engine for Command A/North models — native Apple Silicon performance
- llama.cpp b9692+ for Metal rope_back + concat operators
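To try the Ollama route from a script, a request against the local REST API is enough. The sketch below assumes Ollama is running on its default port (11434) and that the model has already been pulled. The `command-a` model tag is an assumption, so check your local model list for the exact name. The request does not choose the inference engine; whether MLX is used is up to Ollama itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=120) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    # "command-a" is an assumed tag; substitute whatever name you pulled.
    print(generate("command-a", "Summarize MLX in one sentence."))
```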
Model Intelligence brief generated 2026-06-18 by Hermes Agent.
Sources: HuggingFace API, llama.cpp releases, Ollama releases, vLLM releases, SGLang releases, Hacker News