
【SiLU】Swish activation function【Method】

Published on 2024/05/14

1. What is the Swish activation function?

Swish is an activation function similar to ReLU. It can be described as a "smooth ReLU".

・ Swish formula

$$
f(x) = x \cdot \sigma(\beta x)
$$

$\sigma$: sigmoid function

Here $\beta$ is a learnable parameter, but nearly all implementations omit $\beta$, so the formula becomes

$$
f(x) = x \cdot \sigma(x)
$$

This is called SiLU, and it is equivalent to Swish with $\beta = 1$.
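
As a minimal sketch (assuming PyTorch is available), Swish with an optional learnable $\beta$ can be written in a few lines; `torch.nn.SiLU` is the built-in version with $\beta = 1$:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x).
    With beta fixed to 1 this is exactly SiLU."""
    def __init__(self, beta: float = 1.0, learnable: bool = False):
        super().__init__()
        if learnable:
            # beta as a trainable parameter
            self.beta = nn.Parameter(torch.tensor(beta))
        else:
            # beta fixed as a constant buffer
            self.register_buffer("beta", torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)

x = torch.linspace(-3.0, 3.0, 7)
print(Swish()(x))    # manual SiLU (beta = 1)
print(nn.SiLU()(x))  # built-in SiLU, should match
```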

Swish achieved modest improvements over ReLU on several benchmarks, and these gains are often attributed to its smoothness.
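
To make the smoothness concrete, the derivative of Swish follows from the product rule and $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:

$$
f'(x) = \sigma(\beta x) + \beta x \, \sigma(\beta x)\bigl(1 - \sigma(\beta x)\bigr)
$$

This derivative is continuous everywhere, whereas ReLU's derivative jumps from 0 to 1 at $x = 0$.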

