cenconq25/delta-compress-llm
Exploiting temporal coherence in LLM inference: delta encoding for KV cache compression and weight-skip prediction. Achieves F16-quality KV cache at Q4_0 compression ratios with zero perplexity loss on llama.cpp.
Status: Active
On the radar — signal detected
Stars: 2
Forks: 0
Contributors: 0
Language: C++
Score updated Mar 23, 2026