🐕 [Cache Compression] Entropy-aware Cache Compression
Published 2025/07/05
Tags: AI, cache, LLM, tech

Key Contributions

Compress Data around L1/L2/HBM
- L1 -> L2 -> Compress -> HBM: Compression happens only on an L2 cache write miss, not on every write to L2.
- L1 <- Decompress <- L2 <- HBM: Decompression happens after the data is loaded into L2, to minimize L2 cache memory pressure.
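The two data paths above can be sketched as a small toy model. This is only an illustration under stated assumptions, not the scheme's actual implementation: `ToyHierarchy`, `LINE_BYTES`, and the counters are hypothetical names, `zlib` stands in for whatever compressor the hardware would use, a write miss is assumed to be write-no-allocate, and L2 eviction is not modeled.

```python
import zlib
from dataclasses import dataclass, field

LINE_BYTES = 128  # assumed cache-line size for this toy model


@dataclass
class ToyHierarchy:
    """Toy L2/HBM model; L1 is represented by the values returned to the caller."""
    l2: dict = field(default_factory=dict)   # addr -> (is_compressed, payload)
    hbm: dict = field(default_factory=dict)  # addr -> compressed payload
    compressions: int = 0
    decompressions: int = 0

    def write(self, addr: int, line: bytes) -> None:
        """Write path: L1 -> L2 -> Compress -> HBM."""
        if addr in self.l2:
            # Write hit: update L2 in place, no compression work.
            self.l2[addr] = (False, line)
            return
        # Write miss (assumed write-no-allocate): compress once, send to HBM.
        self.hbm[addr] = zlib.compress(line)
        self.compressions += 1

    def read(self, addr: int) -> bytes:
        """Read path: L1 <- Decompress <- L2 <- HBM."""
        if addr not in self.l2:
            # L2 read miss: fill L2 with the still-compressed line from HBM.
            self.l2[addr] = (True, self.hbm[addr])
        is_compressed, payload = self.l2[addr]
        if is_compressed:
            # Decompress only on the way from L2 to L1.
            self.decompressions += 1
            return zlib.decompress(payload)
        return payload

    def l2_bytes(self) -> int:
        """Bytes currently occupied in L2."""
        return sum(len(payload) for _, payload in self.l2.values())


mem = ToyHierarchy()
zero_line = bytes(LINE_BYTES)   # low-entropy line, compresses well
mem.write(0, zero_line)         # write miss: compressed once
restored = mem.read(0)          # L2 keeps the compressed form, L1 gets raw data
mem.write(0, zero_line)         # write hit: no additional compression
```

In this sketch the line stays compressed while it sits in L2 after a load, which is one way to read the "minimize L2 cache memory pressure" point, and repeated writes that hit in L2 never pay the compression cost.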
Discussion