[Quantization] AWQ
Key Contributions
Activation-aware quantization
Weights multiplied by high-magnitude activations have a large impact on the output
-> Keep those weights in FP16, selected by activation magnitude instead of weight magnitude
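The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`quantize_rtn`, `salient_channels`, `mixed_precision_quantize`), the symmetric per-row round-to-nearest scheme, and the 1% default are all assumptions made for the example, and "FP16" is simulated by simply leaving the selected columns unquantized.

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Symmetric round-to-nearest quantization per output row (returned dequantized)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def salient_channels(x, frac=0.01):
    """Indices of the input channels with the largest mean |activation|.

    x has shape (tokens, in_features) and is assumed to be calibration data.
    """
    k = max(1, int(round(frac * x.shape[1])))
    magnitude = np.abs(x).mean(axis=0)
    return np.argsort(magnitude)[-k:]

def mixed_precision_quantize(w, keep_idx, n_bits=4):
    """Quantize w (out_features, in_features) but leave the columns in keep_idx untouched.

    The untouched columns stand in for weights kept in FP16.
    """
    w_q = quantize_rtn(w, n_bits)
    w_q[:, keep_idx] = w[:, keep_idx]
    return w_q
```

To use it, collect activations `x` from a small calibration set, call `salient_channels(x)` to pick the columns to protect, and pass the result to `mixed_precision_quantize(w, keep_idx)`. Selecting by `np.abs(w).mean(axis=0)` instead gives the weight-magnitude baseline the note contrasts against.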
Reference
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
Discussion