thc1006/qwen3.6-speculative-decoding-rtx3090
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic draft with vocab-matched Qwen3.5-0.8B. Finding: no variant achieves net speedup on Ampere + A3B MoE. Raw JSON, plots, full reproducibility.
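The "no net speedup" finding is consistent with the standard analytical model of speculative decoding (Leviathan et al., 2023): the gain depends on the per-token acceptance rate and on the draft model's cost relative to the target. Below is a minimal sketch of that model, not code from the repo; the values for `alpha`, `gamma`, and `c` are illustrative assumptions, not the benchmark's measurements. With a fast MoE target (only ~3B active params), the relative draft cost `c` is high, so even a reasonable acceptance rate can push the speedup below 1.

```python
# Standard analytical speedup model for speculative decoding
# (Leviathan et al., 2023). A sketch, not the repo's methodology.

def expected_accepted(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target verification pass,
    given per-token acceptance rate alpha and draft length gamma."""
    if alpha == 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def net_speedup(alpha: float, gamma: int, c: float) -> float:
    """Speedup over plain decoding, where c is the cost of one
    draft step relative to one target step (assumed values)."""
    return expected_accepted(alpha, gamma) / (gamma * c + 1)

# Hypothetical numbers: 60% acceptance, 4 draft tokens, and a draft
# step costing 40% of a target step (plausible when the MoE target
# is already cheap per token) -> speedup < 1, i.e. a net slowdown.
print(net_speedup(alpha=0.6, gamma=4, c=0.4))
```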
Active
On the radar — signal detected
Stars: 9
Forks: 0
Contributors: 0
Language: Python
Score updated Apr 24, 2026