
[Quantization] AutoRound


Overview

AutoRound trains a parameter V ∈ [-0.5, 0.5] per weight that decides whether the quantized value is rounded up or down.
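A minimal sketch of rounding with a learnable offset, assuming the clamp range [-0.5, 0.5]; the `quantize` function and its arguments are illustrative names, not the library's API:

```python
def quantize(w, scale, v):
    """Round w/scale with a learnable offset v (illustrative sketch);
    v in [-0.5, 0.5] decides whether the value rounds up or down."""
    v = max(-0.5, min(0.5, v))      # clamp v to the allowed range
    return round(w / scale + v)

# v at its upper bound pushes 1.1 up to 2; at its lower bound, down to 1
print(quantize(1.1, 1.0, +0.5))  # -> 2
print(quantize(1.1, 1.0, -0.5))  # -> 1
```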

SignSGD

SignSGD limits each gradient to -1 or +1 via the sign function.

  1. The learning rate can be derived from total_steps and the up/down range (= 1.0)

    total_update = lr * total_steps   (total_steps is arbitrary)
                 = 1.0 = the full width of [-0.5, 0.5] (down and up)

    ex) if the gradient has the same sign for all steps, V saturates at ±0.5

    all_positive: round(1.1 + 0.5) = 2 = up
    all_negative: round(1.1 - 0.5) = 1 = down
    
  2. Not sensitive to gradient magnitude, since each gradient becomes -1 or +1
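The update rule above can be sketched as follows; `signsgd_step` and the specific `lr`/`total_steps` values are illustrative assumptions, not the paper's exact settings:

```python
def sign(x):
    return (x > 0) - (x < 0)            # -1, 0, or +1

def signsgd_step(v, grad, lr):
    """Move v by a fixed step lr opposite the gradient's sign;
    the gradient's magnitude is discarded entirely."""
    v = v - lr * sign(grad)
    return max(-0.5, min(0.5, v))       # keep v inside the up/down range

total_steps = 200
lr = 1.0 / total_steps   # lr * total_steps = 1.0, the full width of [-0.5, 0.5]

v = 0.0
for _ in range(total_steps):
    # gradient is negative every step -> v always moves up, regardless of size
    v = signsgd_step(v, grad=-3.7, lr=lr)
print(v)  # v has saturated at the upper bound 0.5
```

Because only the sign is used, a huge gradient (-3.7 here) moves v exactly as far per step as a tiny one would, which is what makes the total movement predictable from lr and total_steps alone.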

Reference

Optimize Weight Rounding via Signed Gradient Descent for the
Quantization of LLMs
