GIT_FEED

cenconq25/delta-compress-llm

Exploiting temporal coherence in LLM inference: delta encoding for KV cache compression and weight-skip prediction. Achieves F16-quality KV cache at Q4_0 compression ratios with zero perplexity loss on llama.cpp.
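
The feed card carries no code, so here is a minimal sketch of the core idea only, not the repo's implementation: consecutive KV-cache rows change little between decode steps, so quantizing the residual against the previous row retains near-F16 fidelity at a fraction of the bits. The sketch uses 8-bit residuals with a per-row scale to stay readable; the repo targets Q4_0-level ratios, which would pack residuals into 4 bits with per-block scales. All function names below are hypothetical.

// Illustrative sketch, not code from cenconq25/delta-compress-llm.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Encode the current cache row as a quantized residual against the
// previous time step. Temporal coherence keeps residuals small, so they
// survive coarse quantization better than the raw values would.
static std::vector<int8_t> delta_encode(const std::vector<float>& prev,
                                        const std::vector<float>& curr,
                                        float& scale_out) {
    // Per-row scale chosen from the residual range (Q8-style).
    float max_abs = 1e-8f;
    for (size_t i = 0; i < curr.size(); ++i)
        max_abs = std::max(max_abs, std::fabs(curr[i] - prev[i]));
    scale_out = max_abs / 127.0f;

    std::vector<int8_t> out(curr.size());
    for (size_t i = 0; i < curr.size(); ++i)
        out[i] = (int8_t)std::lround((curr[i] - prev[i]) / scale_out);
    return out;
}

// Reconstruct the row from the previous step plus the scaled residual.
static std::vector<float> delta_decode(const std::vector<float>& prev,
                                       const std::vector<int8_t>& deltas,
                                       float scale) {
    std::vector<float> out(deltas.size());
    for (size_t i = 0; i < deltas.size(); ++i)
        out[i] = prev[i] + deltas[i] * scale;
    return out;
}

int main() {
    // Two consecutive key vectors that differ only slightly (coherence).
    std::vector<float> k_prev = {0.10f, -0.52f, 0.33f, 0.07f};
    std::vector<float> k_curr = {0.11f, -0.50f, 0.31f, 0.08f};

    float scale;
    auto enc = delta_encode(k_prev, k_curr, scale);
    auto dec = delta_decode(k_prev, enc, scale);

    for (size_t i = 0; i < dec.size(); ++i)
        std::printf("orig %+.4f  recon %+.4f\n", k_curr[i], dec[i]);
    return 0;
}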

Active

On the radar — signal detected

Stars: 2
Forks: 0
Contributors: 0
Language: C++

Score updated Mar 23, 2026

// SUBSCRIBE

The repos that moved this week, why they matter, and what to watch next. One email. No noise.