💭

[Quantization] AWQ

Published:

Key Contributions

Activation-Aware Weight Quantization

Weights multiplied by high-magnitude activations have a large impact on the output
-> Keep those weights in FP16, selecting them by activation magnitude instead of weight magnitude
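The observation above can be sketched with a toy experiment. This is a minimal illustration, not the paper's actual method: it simulates round-to-nearest INT4 quantization on a random linear layer, keeps a few weight columns in full precision, and compares selecting those columns by activation magnitude versus weight magnitude. All names (`fake_int4_quant`, `quantize_mixed`) and the data setup are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: weight W (out, in) and a batch of input activations X (tokens, in).
W = rng.normal(size=(64, 128)).astype(np.float32)
X = rng.normal(size=(256, 128)).astype(np.float32)
X[:, :4] *= 20.0  # a few input channels carry unusually large activations

def fake_int4_quant(w):
    """Simulated round-to-nearest INT4 quantization with a per-tensor scale."""
    scale = np.abs(w).max() / 7.0
    return np.round(w / scale).clip(-8, 7) * scale

def quantize_mixed(W, salient_idx):
    """Quantize all weight columns to INT4 except the salient input
    channels, which are left in full precision (standing in for FP16)."""
    Wq = fake_int4_quant(W)
    Wq[:, salient_idx] = W[:, salient_idx]
    return Wq

# Salient channels chosen by mean activation magnitude (AWQ's observation) ...
act_idx = np.argsort(np.abs(X).mean(axis=0))[-4:]
# ... versus chosen by mean weight magnitude.
wgt_idx = np.argsort(np.abs(W).mean(axis=0))[-4:]

ref = X @ W.T  # full-precision reference output
err_act = np.abs(X @ quantize_mixed(W, act_idx).T - ref).mean()
err_wgt = np.abs(X @ quantize_mixed(W, wgt_idx).T - ref).mean()
print(f"output error, activation-selected channels: {err_act:.4f}")
print(f"output error, weight-selected channels:     {err_wgt:.4f}")
```

Because quantization error in a weight column is amplified by the magnitude of the activation it multiplies, protecting the activation-selected channels lowers the output error far more than protecting the largest weights.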

Reference

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration

Discussion