📝
[Quantization] AutoRound
Overview
Train a per-weight parameter V ∈ [-0.5, 0.5] that decides whether each weight is rounded up or down.
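A minimal sketch of the idea (illustrative names, not the actual AutoRound API; V ∈ [-0.5, 0.5] as in the referenced paper): adding the learned V before rounding lets each weight choose between rounding up and rounding down.

```python
def quantize(w, s, v):
    # v in [-0.5, 0.5] shifts the rounding decision:
    # v -> +0.5 forces round-up, v -> -0.5 forces round-down
    return round(w / s + v) * s

w, s = 1.1, 1.0
print(quantize(w, s, 0.0))   # 1.0: plain round-to-nearest
print(quantize(w, s, 0.5))   # 2.0: pushed up
print(quantize(w, s, -0.5))  # 1.0: pushed down
```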
SignSGD
SignSGD limits each gradient to -1 or +1 via the sign function.
- lr can be defined from total_steps and the up/down range (width = 1.0):
  total_update = lr * total_steps = 1.0
  ex) if the update has the same sign at every step, V reaches an endpoint of [-0.5, 0.5]:
  all positive: round(1.1 + 0.5) = 2 = up
  all negative: round(1.1 - 0.5) = 1 = down
- not sensitive to gradient magnitude, since every gradient is mapped to -1 or +1
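The schedule above can be sketched as a toy loop (grad_fn is a hypothetical stand-in for the real reconstruction-loss gradient; lr is set so lr * total_steps equals the range width 1.0):

```python
def sign(x):
    # SignSGD uses only the sign, never the magnitude, of the gradient
    return (x > 0) - (x < 0)

def signsgd_train(grad_fn, total_steps=200):
    # lr * total_steps = 1.0, the width of [-0.5, 0.5]: V can traverse
    # the whole range, but a same-sign gradient cannot overshoot it
    lr = 1.0 / total_steps
    v = 0.0
    for _ in range(total_steps):
        v -= lr * sign(grad_fn(v))
        v = max(-0.5, min(0.5, v))  # keep V inside its range
    return v

# toy gradient that is negative at every step -> V is driven up to +0.5
v = signsgd_train(lambda v: -1.0)
print(v)               # 0.5
print(round(1.1 + v))  # 2: the weight 1.1 is rounded up
```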
Reference
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Discussion