💭

[Quantization] AWQ

Published:

Key Contributions

Activation-Aware Weight Quantization

Weights multiplied by high-magnitude activations have a large impact on the output
-> Keep those weights in FP16, selecting them by activation magnitude instead of weight magnitude
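The observation above can be sketched with a toy experiment. This is a minimal illustration, not the paper's actual method: it simulates round-to-nearest INT4 quantization on a random linear layer, keeps a few weight columns in full precision, and compares selecting those columns by activation magnitude versus weight magnitude. All names (`fake_int4_quant`, `quantize_mixed`) and the data setup are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: weight W (out, in) and a batch of input activations X (tokens, in).
W = rng.normal(size=(64, 128)).astype(np.float32)
X = rng.normal(size=(256, 128)).astype(np.float32)
X[:, :4] *= 20.0  # a few input channels carry unusually large activations

def fake_int4_quant(w):
    """Simulated round-to-nearest INT4 quantization with a per-tensor scale."""
    scale = np.abs(w).max() / 7.0
    return np.round(w / scale).clip(-8, 7) * scale

def quantize_mixed(W, salient_idx):
    """Quantize all weight columns to INT4 except the salient input
    channels, which are left in full precision (standing in for FP16)."""
    Wq = fake_int4_quant(W)
    Wq[:, salient_idx] = W[:, salient_idx]
    return Wq

# Salient channels chosen by mean activation magnitude (AWQ's observation) ...
act_idx = np.argsort(np.abs(X).mean(axis=0))[-4:]
# ... versus chosen by mean weight magnitude.
wgt_idx = np.argsort(np.abs(W).mean(axis=0))[-4:]

ref = X @ W.T  # full-precision reference output
err_act = np.abs(X @ quantize_mixed(W, act_idx).T - ref).mean()
err_wgt = np.abs(X @ quantize_mixed(W, wgt_idx).T - ref).mean()
print(f"output error, activation-selected channels: {err_act:.4f}")
print(f"output error, weight-selected channels:     {err_wgt:.4f}")
```

Because quantization error in a weight column is amplified by the magnitude of the activation it multiplies, protecting the activation-selected channels lowers the output error far more than protecting the largest weights.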

Reference

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration

Discussion