Open8

高速な剰余演算

Mizar/みざーMizar/みざー

a\times b\ (\operatorname{mod}m)a^r\ (\operatorname{mod}m) のような計算( m は毎回異なる値ではなく、ある程度は同じ値で \operatorname{mod}m を行うものとする )を、なるべく除算を使わずに計算量を抑えるにはどうするか問題。

最適化の方法の例:

  • コンパイラ等で用いられる除算最適化(商・剰余の計算、整除性の判定など)
  • Montgomery Reduction
  • Barrett Reduction

何故、汎用の除算命令・除算ルーチンを使う事が避けられるかというと、遅いからである。用途を絞れば、加減乗算や比較などの命令を上手く組み合わせた方が速いケースが多く存在する。

Mizar/みざーMizar/みざー

Garnerのアルゴリズム, 中国剰余定理, Montgomery Reduction

Garnerのアルゴリズム

(\gcd(M_x,M_y)=1 の時、 z\equiv x\ (\operatorname{mod}M_x),\ z\equiv y\ (\operatorname{mod}M_y) が成り立つような 整数 z の値を求めるアルゴリズム)

関連: 中国剰余定理

https://ja.wikipedia.org/wiki/中国の剰余定理

https://qiita.com/drken/items/ae02240cd1f8edfc86fd#2-2-garner-のアルゴリズム

  • x,y\in\text{整数}\mathbb{Z}
  • M_x,M_y,\overline{M_x}\in\text{自然数}\mathbb{N}
  • \gcd(M_x,M_y)=1
  • \overline{M_x}M_x\equiv 1\quad(\operatorname{mod}M_y)
  • t=M_x(\overline{M_x}(x-y)\operatorname{mod}M_y)
  • z=\begin{cases}x-t&(x \ge t)\\x-t+M_xM_y&(x \lt t)\\\end{cases}

とすると

  • 0\le t\le M_x(M_y-1)
  • x-M_x(M_y-1)\le x-t\le x+M_xM_y
  • z\equiv x\quad(\operatorname{mod}M_x)\quad\because\quad t\equiv 0\quad(\operatorname{mod}M_x)
  • z\equiv y\quad(\operatorname{mod}M_y)\quad\because\quad t\equiv x-y\quad(\operatorname{mod}M_y)
  • 特に、 0\le x\lt M_xM_y であれば 0\le z\lt M_xM_y

Montgomery Reduction

(\operatorname{mod}2^{64} のような、コンピュータが計算しやすい除算に変形させた剰余計算)

上の Garner のアルゴリズムからさらに、

  • \overline{M_y}\in\text{自然数}\mathbb{N}
  • \overline{M_y}M_y\equiv 1\quad(\operatorname{mod}M_x)
  • M_x\lt M_y
  • a,b\in\text{整数}\mathbb{Z}
  • a'=M_ya\operatorname{mod}M_x=\overline{M_y}((M_y)^2\operatorname{mod}M_x)a\operatorname{mod}M_x
  • b'=M_yb\operatorname{mod}M_x=\overline{M_y}((M_y)^2\operatorname{mod}M_x)b\operatorname{mod}M_x
  • 0\le x=a'b'\le (M_x-1)^2
  • y=0

とすると

  • z\operatorname{mod}M_y=0
  • 0\le z/M_y\lt M_x
  • z/M_y=\overline{M_y}z\operatorname{mod}M_x=\overline{M_y}x\operatorname{mod}M_x
  • z/M_y=\overline{M_y}a'b'\operatorname{mod}M_x=M_yab\operatorname{mod}M_x

この結果を使うと、 M_y=2^{64} のような、コンピュータにとって除算を計算しやすい数に定めることで

  • M_ya\times M_yb\times \overline{M_y} \equiv z/M_y\ (\operatorname{mod}M_x)

の形の計算を行うことができる。同じ \operatorname{mod} での剰余計算の回数が多く、かつ M_x が奇数に限られる場合(偶数の場合の考察は https://www.mathenachia.blog/even-mod-montgomery-impl/ 参照)は Barrett Reduction よりこちらのやり方で剰余計算した方が効率が良い場合がある。

この式の前提を改めてまとめると以下の通り。

  • \gcd(M_x,M_y)=1
  • M_x\lt M_y
  • \overline{M_x}M_x\equiv 1\quad(\operatorname{mod}M_y)
  • \overline{M_y}M_y\equiv 1\quad(\operatorname{mod}M_x)
  • a'=M_ya\operatorname{mod}M_x=\overline{M_y}((M_y)^2\operatorname{mod}M_x)a\operatorname{mod}M_x
  • b'=M_yb\operatorname{mod}M_x=\overline{M_y}((M_y)^2\operatorname{mod}M_x)b\operatorname{mod}M_x
  • 0\le x=a'b'\le (M_x-1)^2
  • y=0
  • t=M_x(\overline{M_x}(x-y)\operatorname{mod}M_y)
  • z=\begin{cases}x-t&(x \ge t)\\x-t+M_xM_y&(x \lt t)\\\end{cases}

https://ja.wikipedia.org/wiki/モンゴメリ乗算

https://www.mathenachia.blog/even-mod-montgomery-impl/

https://zenn.dev/mizar/articles/791698ea860581

(C++, モンゴメリリダクション実装例)

https://yukicoder.me/submissions/793551

(Rust, モンゴメリリダクション実装例)

https://yukicoder.me/submissions/793560

Mizar/みざーMizar/みざー

Barrett Reduction

  • AtCoder Library では汎用的な剰余算のライブラリとして使われている
  • 同じ除数(AtCoder Library の場合は31bit整数までを想定)で剰余算を多く行う場合に有用、 \bar{m}' = \left\lfloor\frac{2^{64}-1}{m}\right\rfloor+1\mod2^{64}=\left\lceil\frac{2^{64}}{m}\right\rceil\operatorname{mod}2^{64} の値は剰余算を繰り返して行う前に事前計算しておく事を想定
  • 31bit整数 × 31bit整数 \operatorname{mod} 31bit整数 を計算するために途中で 128bit整数型 が使われている
    • ab-qm の 符号なし64bit整数同士 ( abqm ) の減算部分の繰り下がりチェックを厳格化して m を加算するべきか判定するようにすれば 符号なし32bit整数 × 符号なし32bit整数 \operatorname{mod} 符号なし32bit整数 に対応可能
  • \operatorname{mod} 偶数でも計算量は変わらない
  • \lfloor x\rfloor は床関数(floor)、 \lceil x\rceil は天井関数(ceil)

https://ja.wikipedia.org/wiki/床関数と天井関数

https://en.wikipedia.org/wiki/Barrett_reduction

(アルゴリズム)

m\in\text{自然数(natural number)}\mathbb{N},\quad 1\le m\lt 2^{32}\\ \lbrace a,b\rbrace\in\text{整数(integer)}\mathbb{Z},\quad 0\le \lbrace a, b\rbrace\lt m\\ \bar{m}' = \left\lfloor\frac{2^{64}-1}{m}\right\rfloor+1\mod2^{64}=\left\lceil\frac{2^{64}}{m}\right\rceil\operatorname{mod}2^{64}\\ x=\left\lfloor\frac{ab\hspace{.1em}\bar{m}'}{2^{64}}\right\rfloor\\ ab\operatorname{mod}m=\begin{cases}ab-xm&(ab\ge xm)\\ ab-xm+m&(ab\lt xm)\end{cases}\\ (証明)
  1. when m=1, a=b=\bar{m}'=0, so okey
  2. when 2\le m\lt 2^{32},
    • 2^{32}+2=\left\lceil\frac{2^{64}}{2^{32}-1}\right\rceil\le\bar{m}'=\left\lceil\frac{2^{64}}{m}\right\rceil\le \left\lceil\frac{2^{64}}{2}\right\rceil=2^{63}
    • \bar{m}'\hspace{.1em}m=2^{64}+r\quad(\lbrace r\rbrace\in\text{整数(integer)}\mathbb{Z},\quad 0\le r\lt m)
    • z = ab = cm + d\quad(\lbrace c,d\rbrace\in\text{整数(integer)}\mathbb{Z},\quad 0\le\lbrace c,d\rbrace\lt m)
    • z\hspace{.1em}\bar{m}'=ab\hspace{.1em}\bar{m}'=(cm+d)\hspace{.1em}\bar{m}'=c(\bar{m}'\hspace{.1em}m)+d\hspace{.1em}\bar{m}'=2^{64}c+c\hspace{.1em}r+d\hspace{.1em}\bar{m}'
    • 2^{64}c\le ab\hspace{.1em}\bar{m}'\lt 2^{64}(c+2)
      • ab\hspace{.1em}\bar{m}'=2^{64}c+c\hspace{.1em}r+d\hspace{.1em}\bar{m}'
      • 0\le c\hspace{.1em}r\le (m-1)^2\le(2^{32}-2)^2=2^{64}-2^{34}+4\lt 2^{64}
      • 0\le d\hspace{.1em}\bar{m}'\le\bar{m}'\hspace{.1em}(m-1)=2^{64}+r-\bar{m}'\le 2^{64}+(2^{32}-2)-(2^{32}+2)=2^{64}-4\lt 2^{64}
    • x=\left\lfloor\frac{ab\hspace{.1em}\bar{m}'}{2^{64}}\right\rfloor=\lbrace c or (c+1)\rbrace
    • ab-xm=ab-\left\lfloor\frac{ab\hspace{.1em}\bar{m}'}{2^{64}}\right\rfloor m=\lbrace d or (d-m)\rbrace

(元コード)

https://github.com/atcoder/ac-library/blob/6c88a70c8f95fef575af354900d107fbd0db1a12/atcoder/internal_math.hpp#L22-L62

https://github.com/rust-lang-ja/ac-library-rs/blob/b473a615a7d7cfe4d8b26e369808cb2aa2d2f5a0/src/internal_math.rs#L19-L84

(C++: 1\le m\lt 2^{32} 対応修正案)

https://godbolt.org/z/P8749355T

#ifdef _MSC_VER
#include <intrin.h>
#endif

// @param a `0 <= a < m`
// @param b `0 <= b < m`
// @return `a * b % m`
unsigned int barrett_mul_before(unsigned int a, unsigned int b, unsigned int _m, unsigned long long im) {
    // [1] m = 1
    // a = b = im = 0, so okay

    // [2] m >= 2
    // im = ceil(2^64 / m)
    // -> im * m = 2^64 + r (0 <= r < m)
    // let z = a*b = c*m + d (0 <= c, d < m)
    // a*b * im = (c*m + d) * im = c*(im*m) + d*im = c*2^64 + c*r + d*im
    // c*r + d*im < m * m + m * im < m * m + 2^64 + m <= 2^64 + m * (m + 1) < 2^64 * 2
    // ((ab * im) >> 64) == c or c + 1
    unsigned long long z = a;
    z *= b;
#ifdef _MSC_VER
    unsigned long long x;
    _umul128(z, im, &x);
#else
    unsigned long long x =
        (unsigned long long)(((unsigned __int128)(z)*im) >> 64);
#endif
    unsigned int v = (unsigned int)(z - x * _m);
    if (_m <= v) v += _m;
    return v;
}

// @param a `0 <= a < m`
// @param b `0 <= b < m`
// @return `a * b % m`
unsigned int barrett_mul_after(unsigned int a, unsigned int b, unsigned int _m, unsigned long long im) {
    // [1] m = 1
    // a = b = im = 0, so okay

    // [2] m >= 2
    // im = ceil(2^64 / m)
    // -> im * m = 2^64 + r (0 <= r < m)
    // let z = a*b = c*m + d (0 <= c, d < m)
    // a*b * im = (c*m + d) * im = c*(im*m) + d*im = c*2^64 + c*r + d*im
    // c*r + d*im < m * m + m * im < m * m + 2^64 + m <= 2^64 + m * (m + 1) < 2^64 * 2
    // ((ab * im) >> 64) == c or c + 1
    unsigned long long z = a;
    z *= b;
#ifdef _MSC_VER
    unsigned long long x;
    _umul128(z, im, &x);
#else
    unsigned long long x =
        (unsigned long long)(((unsigned __int128)(z)*im) >> 64);
#endif
    unsigned long long y = x * _m;
#ifdef __GNUC__
    // https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
    unsigned long long v;
    unsigned int w = __builtin_usubll_overflow(z, y, &v) ? _m : 0;
    return (unsigned int)(v + w);
#elif defined(_MSC_VER) && defined(_M_AMD64)
    // https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_subborrow_u64&ig_expand=7252
    unsigned long long v;
    unsigned int w = _subborrow_u64(0, z, y, &v) ? _m : 0;
    return (unsigned int)(v + w);
#else
    return (unsigned int)((z - y) + (z < y ? _m : 0));
#endif
}

(Rust: 1\le m\lt 2^{32} 対応修正案)

https://rust.godbolt.org/z/7P5rjahMn

/// Calculates `a * b % m`.
///
/// * `a` `0 <= a < m`
/// * `b` `0 <= b < m`
/// * `m` `1 <= m <= 2^31`
/// * `im` = ceil(2^64 / `m`)
#[allow(clippy::many_single_char_names)]
pub fn mul_mod_before(a: u32, b: u32, m: u32, im: u64) -> u32 {
    // [1] m = 1
    // a = b = im = 0, so okay

    // [2] m >= 2
    // im = ceil(2^64 / m)
    // -> im * m = 2^64 + r (0 <= r < m)
    // let z = a*b = c*m + d (0 <= c, d < m)
    // a*b * im = (c*m + d) * im = c*(im*m) + d*im = c*2^64 + c*r + d*im
    // c*r + d*im < m * m + m * im < m * m + 2^64 + m <= 2^64 + m * (m + 1) < 2^64 * 2
    // ((ab * im) >> 64) == c or c + 1
    let mut z = a as u64;
    z *= b as u64;
    let x = (((z as u128) * (im as u128)) >> 64) as u64;
    let mut v = z.wrapping_sub(x.wrapping_mul(m as u64)) as u32;
    if m <= v {
        v = v.wrapping_add(m);
    }
    v
}

/// Calculates `a * b % m`.
///
/// * `a` `0 <= a < m`
/// * `b` `0 <= b < m`
/// * `m` `1 <= m < 2^32`
/// * `im` = ceil(2^64 / `m`) = floor((2^64 - 1) / `m`) + 1
#[allow(clippy::many_single_char_names)]
pub fn mul_mod_after(a: u32, b: u32, m: u32, im: u64) -> u32 {
    // [1] m = 1
    // a = b = im = 0, so okay

    // [2] m >= 2
    // im = ceil(2^64 / m) = floor((2^64 - 1) / m) + 1
    // -> im * m = 2^64 + r (0 <= r < m)
    // let z = a*b = c*m + d (0 <= c, d < m)
    // a*b * im = (c*m + d) * im = c*(im*m) + d*im = c*2^64 + c*r + d*im
    // c*r + d*im < m * m + m * im < m * m + 2^64 + m <= 2^64 + m * (m + 1) < 2^64 * 2
    // ((ab * im) >> 64) == c or c + 1
    let z = (a as u64) * (b as u64);
    let x = (((z as u128) * (im as u128)) >> 64) as u64;
    match z.overflowing_sub(x.wrapping_mul(m as u64)) {
        (v, true) => (v as u32).wrapping_add(m),
        (v, false) => v as u32,
    }
}

https://github.com/atcoder/ac-library/issues/149

https://github.com/rust-lang-ja/ac-library-rs/issues/111

Mizar/みざーMizar/みざー

関数 function

fn barrett_reduction(z: u64, m: u64, im: u64) -> (u64, u64)

出力 output

return (\lfloor z/m\rfloor, z\mod m);

入力の制約 input constraints

  • \left\{z\in\mathbb{Z}\;\middle|\; 0 \le z \lt\max((im-1)\times m,1)\right\}
  • \left\{m\in\mathbb{Z}\;\middle|\; 2 \le m \lt 2^{32.5}\right\}
  • \left\{im\in\mathbb{Z}\;\middle|\; im\equiv\left\lceil 2^{64}/m\right\rceil\mod 2^{64}\equiv\left\lfloor(2^{64}-1)/m\right\rfloor +1\mod 2^{64}\right\}

アルゴリズム algorithm

  • v=\lfloor(z\times im)/2^{64}\rfloor
  • w=v\times m
  • \lfloor z/m\rfloor=\begin{cases}v&(z\ge w)\\v-1&(z\lt w)\end{cases}
  • z\mod m=\begin{cases}z-w&(z\ge w)\\z-w+m&(z\lt w)\end{cases}

実装例 implementation example

pub fn barrett_reduction(z: u64, m: u64, im: u64) -> (u64, u64) {
    // return `(floor(z / m), z \mod m)`
    // * `z` : `0 <= z < max((im - 1) * m, 1)`
    // * `m` : `1 <= m < 2^32.5`
    // * `im = ceil(2^64 / m) mod 2^64 = floor((2^64 - 1) / m) + 1 mod 2^64`

    // [1] m = 1
    // im = 0, so may fail (must be z = 0 to return correct result)

    // [2] m >= 2
    // im = ceil(2^64 / m) = floor((2^64 - 1) / m) + 1 >= 2
    // -> im * m = 2^64 + r (0 <= r < m)
    // let z = c*m + d (0 <= c < im, 0 <= d < m)
    // z * im = (c*m + d) * im = c*(im*m) + d*im = c*2^64 + c*r + d*im
    // c*r + d*im = (c+1)*r - (m-d)*im + 2^64 <= im*(m-2) + 2^64 = 2 * (2^64 - im) < 2 * 2^64
    // ((z * im) >> 64) == c or c + 1

    debug_assert_eq!(im, ((!0u64) / m).wrapping_add(1));
    debug_assert!(z < im.saturating_sub(1).saturating_mul(m).max(1));
    let v = (((z as u128) * (im as u128)) >> 64) as u64;
    let (t, f) = z.overflowing_sub(v.wrapping_mul(m));
    (
        v - (f as u64),
        t.wrapping_add((f as u64).wrapping_neg() & m),
    )
}

証明 proof

  1. m=0 \quad\to\quad im: 未定義 (Undefined)

    • ゼロ除算、imが制約を満たすことが出来ない
    • divide by zero, cannot satisfy the im constraint
  2. m=1 \quad\to\quad im=0 \quad\to\quad z=0

    • 制約を 0\le z\lt 2^{64} とすると、 z\ne 0の時、失敗する(z=0でないと正しい結果を返さない)
    • If the constraint is 0\le z<2^{64} then it fails when z\ne 0 (must be z=0 to return correct result)
  3. m\ge 2 \quad\to\quad im=\lceil 2^{64}/m\rceil=\lfloor(2^{64}-1)/m\rfloor+1

    • 0\le z\lt(im-1)\times m=\lfloor(2^{64}-1)/m\rfloor\times m\le 2^{64}-1\lt 2^{64}
    • 2^{31.5}\lt im\le 2^{63}
    • m\times im=2^{64}+r\quad\left\{r\in\mathbb{Z}\;\middle|\; 0\le r\lt m\right\}
      \to\quad r-m\times im+2^{64}=0
    • z=c\times m+d\quad\left\{c\in\mathbb{Z}\;\middle|\; 0\le c\lt im\right\},\left\{d\in\mathbb{Z}\;\middle|\; 0\le d\lt m\right\}
    • \left\{r\in\mathbb{Z}\;\middle|\; 0\le r\lt m\right\}\quad\to\quad r\le m-1
    • \left\{c\in\mathbb{Z}\;\middle|\; 0\le c\lt im\right\}\quad\to\quad c+1\le im
    • \left\{d\in\mathbb{Z}\;\middle|\; 0\le d\lt m\right\}\quad\to\quad d-m\le -1
    • z\times im=(c\times m+d)\times im=c\times (m\times im)+d\times im=2^{64}\times c+c\times r+d\times im
    • c\times r+d\times im=(c\times r+d\times im)+(r-m\times im+2^{64})=(c+1)\times r+(d-m)\times im+2^{64}
      \le im\times (m-1)+(-1)\times im+2^{64}=2\times 2^{64}-2\times im+m-1
      \lt 2\times 2^{64}-2\times 2^{31.5}+2^{32.5}-1= 2\times 2^{64}-1\lt 2\times 2^{64}
    • 2^{64}\times c\le z\times im\lt 2^{64}\times(c+2)
    • v=\lfloor(z\times im)/2^{64}\rfloor=c\text{ or }c+1
    • w=\lfloor(z\times im)/2^{64}\rfloor\times m\le\lfloor z/m\rfloor\times m+m
      \le (im-1)\times m=\lfloor(2^{64}-1)/m\rfloor\times m\le 2^{64}-1\lt 2^{64}
Mizar/みざーMizar/みざー

(下書き中)


a. 丁度の値を求める (d: 除数, x: 被除数, t: パラメータ [ x の範囲に影響 ], r: 補助 [ t,x から求まる ])

\displaystyle d,r,t,x\in\mathbb{Z},\quad 1\le d,\quad 0\le t,\quad 0\leq r=\left(-2^t\right)\operatorname{mod}d\lt d,\quad 0\le rx\lt 2^t\displaystyle\quad\longrightarrow\quad\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\times\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{2^t+r}{d}\times\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{x}{d}+\frac{rx}{2^t d}\right\rfloor=\left\lfloor\frac{x}{d}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{67}\lt\frac{2^{93}}{39898175}\quad\longrightarrow\quad \left\lfloor\left\lceil\frac{2^{93}}{998244353}\right\rceil\times\frac{x}{2^{93}}\right\rfloor=\left\lfloor\frac{2^{93}+39898175}{998244353}\times\frac{x}{2^{93}}\right\rfloor=\left\lfloor\frac{x}{998244353}\right\rfloor


b. 丁度か1小さい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad\begin{cases}0\lt 2^{(s+t)}\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s)}{2^{(s+t)}\operatorname{mod}d}+2^s\displaystyle\quad\longrightarrow\quad\left(\left\lfloor\frac{x}{d}\right\rfloor-1\right)\le\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\le\left\lfloor\frac{x}{d}\right\rfloor\\2^{(s+t)}\operatorname{mod}d=0,\quad 0\le \displaystyle\quad\longrightarrow\quad\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor=\left\lfloor\frac{x}{d}\right\rfloor\\ \end{cases}

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad 2^{(s+t)}\not\equiv 0\ (\operatorname{mod}d),\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s)}{2^{(s+t)}\operatorname{mod}d}+2^s

\displaystyle\quad\longrightarrow\quad\left(\frac{2^s\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}-1\right)\lt\left(-\frac{2^{(s+t)}\operatorname{mod}d}{2^{(s+t)}d}x\right)\leq 0 \quad\quad \displaystyle\left(-\frac{2^{(s+t)}\operatorname{mod}d}{2^{(s+t)}d}\text{ を掛ける}\right)

\displaystyle\quad\longrightarrow\quad\left(\frac{x}{d}-1\right)\lt\frac{\left(x-2^s\right)\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}=\left(\frac{x-2^s}{2^s}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)\leq\left(\frac{x}{d}-\frac{1}{2^t}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)\lt\frac{x}{d}\quad\quad \displaystyle\left(\left(\frac{x}{d}-\frac{1}{2^t}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)=\frac{2^{(s+t)}x-2^s\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\text{ を足す},\ \frac{2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)}{d}=\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)

\displaystyle\quad\longrightarrow\quad \left(\frac{x}{d}-1\right)\lt\left(\frac{x-2^s}{2^s}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)\lt\left(\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)\le\frac{x}{d}\quad\quad \displaystyle\left(\frac{x}{2^s}-1=\frac{x-2^s}{2^s}\lt\left\lfloor\frac{x}{2^s}\right\rfloor\le\frac{x}{2^s}\right)

\displaystyle\quad\longrightarrow\quad\left(\left\lfloor\frac{x}{d}\right\rfloor-1\right)\le\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\le\left\lfloor\frac{x}{d}\right\rfloor\quad\quad \left(\text{床関数(floor)を適用}\right)

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{64}\lt\frac{2^{64}\times(998244353-1)}{2^{64}\operatorname{mod}998244353}+1\quad\longrightarrow\quad\left\lfloor\frac{x}{998244353}\right\rfloor-1\leq\left\lfloor\left\lfloor\frac{2^{64}}{998244353}\right\rfloor\times\frac{x}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{998244353}\right\rfloor


c. 丁度か1大きい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad\begin{cases}0\lt\left(-2^{(s+t)}\right)\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s)}{\left(-2^{(s+t)}\right)\operatorname{mod}d}-2^s\displaystyle\quad\longrightarrow\quad\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\\left(-2^{(s+t)}\right)\operatorname{mod}d=0,\quad 0\le x\displaystyle\quad\longrightarrow\quad\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\\end{cases}

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad 2^{(s+t)}\not\equiv 0\ (\operatorname{mod}d),\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{\left(-2^{(s+t)}\right)\operatorname{mod}d}-2^s+1

\displaystyle\quad\longrightarrow\quad 0\leq \frac{\left(-2^{(s+t)}\right)\operatorname{mod}d}{2^{(s+t)}d}x\lt \left(1-\frac{(2^s-1)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\right) \quad\quad \displaystyle\left(\frac{\left(-2^{(s+t)}\right)\operatorname{mod}d}{2^{(s+t)}d}\text{ を掛ける}\right)

\displaystyle\quad\longrightarrow\quad\frac{x}{d}\lt\left(\frac{x}{d}+\frac{2^s-1}{2^{(s+t)}}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)\leq \frac{\left(x+2^s\right)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}=\left(\frac{x+2^s}{2^s}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)\lt\left(\frac{x}{d}+1\right)
\displaystyle\left(\left(\frac{x}{d}+\frac{2^s-1}{2^{(s+t)}}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)=\frac{2^{(s+t)}x+(2^s-1)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\text{ を足す},\ \frac{2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)}{d}=\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)

\displaystyle\quad\longrightarrow\quad\frac{x}{d}\leq\left(\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)\leq\left(\frac{x+2^s-1}{2^s}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)\lt\left(\frac{x}{d}+1\right)\quad\quad \displaystyle\left(\frac{x}{2^s}\leq\left\lceil\frac{x}{2^s}\right\rceil\leq\frac{x+2^s-1}{2^s}\lt\frac{x}{2^s}+1\right)

\displaystyle\quad\longrightarrow\quad\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\quad\quad \left(\text{床関数(floor)を適用}\right)

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{67}\lt\frac{2^{64}\times(998244353-1)}{(-2^{64})\operatorname{mod}998244353}-1\quad\longrightarrow\quad\left\lfloor\frac{x}{998244353}\right\rfloor\leq\left\lfloor\left\lceil\frac{2^{64}}{998244353}\right\rceil\times\frac{x}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{998244353}\right\rfloor+1


Mizar/みざーMizar/みざー

(下書き中)

a. 丁度の値を求める (d: 除数, x: 被除数, W: 整数型のbit長)

W,d,x\in\mathbb{Z},\quad 2\leq W,\quad 1\leq d\lt 2^W,\quad 0\leq x\lt 2^W において 除算の商 \left\lfloor x/d\right\rfloor を計算:

mulh(m, x) \displaystyle=\left\lfloor\frac{mx}{2^W}\right\rfloor\quad\left(m,x\in\mathbb{Z},\quad 0\le m\lt 2^W,\quad 0\le x\lt 2^W\right)

  • d=1 の場合:
    • \displaystyle\left\lfloor\frac{x}{d}\right\rfloor=x
    • 擬似コード: x
  • \log_2d=\lceil\log_2d\rceil (d2 の累乗数) の場合:
    • \displaystyle t=\log_2d,\quad\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\frac{x}{2^{\log_2d}}\right\rfloor
    • 前計算: s=\log_2d
    • 擬似コード: x >> s
  • \displaystyle\log_2d\ne\lceil\log_2d\rceil,\quad t=\lfloor\log_2((2^W-1)d)\rfloor,\quad\left(-2^t\right)\operatorname{mod}d\lt 2^{(t-W)} の場合:
    • \displaystyle t=\lfloor\log_2((2^W-1)d)\rfloor,\quad\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^t}\right\rfloor=\left\lfloor\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^W}\right\rfloor\frac{1}{2^{(t-W)}}\right\rfloor
    • 前計算: \displaystyle t=\lfloor\log_2((2^W-1)d)\rfloor,\quad s=t-W,\quad m=\left\lceil\frac{2^t}{d}\right\rceil
    • 擬似コード: mulh(m, x) >> s
  • 上記以外の場合:
    • \displaystyle t=W+\lceil\log_2d\rceil,\quad y=\left\lfloor\left(\left\lceil\frac{2^t}{d}\right\rceil-2^W\right)\frac{x}{2^W}\right\rfloor,
      \displaystyle\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^t}\right\rfloor=\left\lfloor\left(\left\lfloor\frac{x-y}{2}\right\rfloor+y\right)\frac{1}{2^{(t-W-1)}}\right\rfloor
    • 前計算: \displaystyle t=W+\lceil\log_2d\rceil,\quad s=t-W-1,\quad m=\left\lceil\frac{2^t}{d}\right\rceil-2^W
    • 擬似コード: y = mulh(m, x); (((x - y) >> 1) + y) >> s

b. 丁度か 1 小さい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},

\begin{cases}\displaystyle\left(\left\lfloor\frac{x}{d}\right\rfloor-1\right)\le\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\le\left\lfloor\frac{x}{d}\right\rfloor\\ \quad\quad\displaystyle\left(0\lt 2^{(s+t)}\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{2^{(s+t)}\operatorname{mod}d}+2^s-1\right)\\\displaystyle\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\\ \quad\quad\displaystyle\left(2^{(s+t)}\operatorname{mod}d=0,\quad 0\le x\displaystyle\right)\end{cases}

前計算: \displaystyle m=\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor

擬似コード: ((x >> s) * m) >> t

s=0 の場合の擬似コード: (x * m) >> t


c. 丁度か 1 大きい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},

\begin{cases}\displaystyle\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\ \quad\quad\displaystyle\left(0\lt\left(-2^{(s+t)}\right)\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{\left(-2^{(s+t)}\right)\operatorname{mod}d}-2^s+1\right)\\ \displaystyle\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\ \quad\quad\displaystyle\left(\left(-2^{(s+t)}\right)\operatorname{mod}d=0,\quad 0\le x\right)\end{cases}

前計算: \displaystyle u=2^s-1,\quad m=\left\lceil\frac{2^{(s+t)}}{d}\right\rceil

擬似コード: (((x + u) >> s) * m) >> t

s=0 の場合の擬似コード: (x * m) >> t

s=0 の場合のみの実用を想定。 (s\gt 0 だと b. の手法に比べて利点が薄いため)


(a * b) >> 64, x / 998244353, x mod 998244353, x / 7, x mod 7 を計算する関数とそのコンパイル結果の例 (C++): https://godbolt.org/z/6fbWGsEhr

#include <cstdint>
#ifdef _MSC_VER
#include <intrin.h>
#endif
// u64mulh_shim(a, b) == floor(a * b / 2**64)
uint64_t u64mulh_shim(uint64_t a, uint64_t b) {
#ifdef _MSC_VER // msvc https://learn.microsoft.com/ja-jp/cpp/intrinsics/umulh?view=msvc-170
    return __umulh(a, b);
#else // gcc/clang 
    return (unsigned __int128)a * (unsigned __int128)b >> 64;
#endif
}
// div998244353_shim(x) == floor(x / 998244353)
uint64_t div998244353_shim(uint64_t x) {
    // 0 <= x < (2**93 / ((-2**93) mod 998244353)) --> floor(ceil(2**93 / 998244353) * x / 2**93) == floor(x / 998244353)
    // ceil(2**93 / 998244353) == floor((2**93 - 1) / 998244353) + 1 == 9920937979283557439
    // floor(x / 998244353) == floor((x * 9920937979283557439) / 2**93) == floor(floor((x * 9920937979283557439) / 2**64) / 2**29)
    return u64mulh_shim(x, 9920937979283557439ULL) >> 29;
}
uint64_t div998244353(uint64_t x) {
    return x / 998244353ULL;
}
// rem998244353_shim(x) == (x mod 998244353)
uint64_t rem998244353_shim(uint64_t x) {
    return x - div998244353_shim(x) * 998244353ULL;
}
uint64_t rem998244353(uint64_t x) {
    return x % 998244353ULL;
}
// div7_shim(x) == floor(x / 7)
uint64_t div7_shim(uint64_t x) {
    // 0 <= x < (2**67 / ((-2**67) mod 7)) --> floor(ceil(2**67 / 7) * x / 2**67) == floor(x / 7)
    // y := floor((ceil(2**67 / 7) - 2**64) * x / 2**64)
    // floor(x / 7) == floor((x + y) / 2**3) == floor((floor((x - y) / 2) + y) / 2**2)
    // ceil(2**67 / 7) - 2**64 == 2635249153387078803
    uint64_t y = u64mulh_shim(x, 2635249153387078803ULL);
    return (((x - y) >> 1) + y) >> 2;
}
uint64_t div7(uint64_t x) {
    return x / 7;
}
// rem7_shim(x) == (x mod 7)
uint64_t rem7_shim(uint64_t x) {
    return x - div7_shim(x) * 7ULL;
}
uint64_t rem7(uint64_t x) {
    return x % 7ULL;
}
# x86-64 gcc
u64mulh_shim(unsigned long, unsigned long):
    mov     rax, rdi
    mul     rsi
    mov     rax, rdx
    ret
div998244353_shim(unsigned long):
    movabs  rax, -8525806094425994177
    mul     rdi
    mov     rax, rdx
    shr     rax, 29
    ret
div998244353(unsigned long):
    movabs  rax, -8525806094425994177
    mul     rdi
    mov     rax, rdx
    shr     rax, 29
    ret
rem998244353_shim(unsigned long):
    movabs  rax, -8525806094425994177
    mul     rdi
    mov     rax, rdi
    shr     rdx, 29
    imul    rdx, rdx, 998244353
    sub     rax, rdx
    ret
rem998244353(unsigned long):
    movabs  rax, -8525806094425994177
    mul     rdi
    mov     rax, rdx
    shr     rax, 29
    imul    rdx, rax, 998244353
    mov     rax, rdi
    sub     rax, rdx
    ret
div7_shim(unsigned long):
    movabs  rax, 2635249153387078803
    mul     rdi
    sub     rdi, rdx
    shr     rdi
    lea     rax, [rdi+rdx]
    shr     rax, 2
    ret
div7(unsigned long):
    movabs  rax, 2635249153387078803
    mul     rdi
    sub     rdi, rdx
    shr     rdi
    lea     rax, [rdx+rdi]
    shr     rax, 2
    ret
rem7_shim(unsigned long):
    movabs  rax, 2635249153387078803
    mov     rcx, rdi
    mul     rdi
    mov     rax, rdi
    sub     rcx, rdx
    shr     rcx
    add     rdx, rcx
    shr     rdx, 2
    lea     rcx, [0+rdx*8]
    sub     rcx, rdx
    sub     rax, rcx
    ret
rem7(unsigned long):
    movabs  rax, 2635249153387078803
    mul     rdi
    mov     rax, rdi
    sub     rax, rdx
    shr     rax
    add     rax, rdx
    shr     rax, 2
    lea     rdx, [0+rax*8]
    sub     rdx, rax
    mov     rax, rdi
    sub     rax, rdx
    ret
# ARM64 gcc
u64mulh_shim(unsigned long, unsigned long):
    umulh   x0, x0, x1
    ret
div998244353_shim(unsigned long):
    mov     x1, 52287
    movk    x1, 0x5de0, lsl 16
    movk    x1, 0x4087, lsl 32
    movk    x1, 0x89ae, lsl 48
    umulh   x0, x0, x1
    lsr     x0, x0, 29
    ret
div998244353(unsigned long):
    mov     x1, 52287
    movk    x1, 0x5de0, lsl 16
    movk    x1, 0x4087, lsl 32
    movk    x1, 0x89ae, lsl 48
    umulh   x0, x0, x1
    lsr     x0, x0, 29
    ret
rem998244353_shim(unsigned long):
    mov     x2, 52287
    movk    x2, 0x5de0, lsl 16
    movk    x2, 0x4087, lsl 32
    movk    x2, 0x89ae, lsl 48
    umulh   x2, x0, x2
    lsr     x2, x2, 29
    lsl     x1, x2, 4
    sub     x1, x1, x2
    lsl     x1, x1, 3
    sub     x1, x1, x2
    add     x1, x2, x1, lsl 23
    sub     x0, x0, x1
    ret
rem998244353(unsigned long):
    mov     x2, 52287
    movk    x2, 0x5de0, lsl 16
    movk    x2, 0x4087, lsl 32
    movk    x2, 0x89ae, lsl 48
    umulh   x2, x0, x2
    lsr     x2, x2, 29
    lsl     x1, x2, 4
    sub     x1, x1, x2
    lsl     x1, x1, 3
    sub     x1, x1, x2
    add     x1, x2, x1, lsl 23
    sub     x0, x0, x1
    ret
div7_shim(unsigned long):
    mov     x1, 9363
    movk    x1, 0x9249, lsl 16
    movk    x1, 0x4924, lsl 32
    movk    x1, 0x2492, lsl 48
    umulh   x1, x0, x1
    sub     x0, x0, x1
    add     x0, x1, x0, lsr 1
    lsr     x0, x0, 2
    ret
div7(unsigned long):
    mov     x1, 9363
    movk    x1, 0x9249, lsl 16
    movk    x1, 0x4924, lsl 32
    movk    x1, 0x2492, lsl 48
    umulh   x1, x0, x1
    sub     x0, x0, x1
    add     x0, x1, x0, lsr 1
    lsr     x0, x0, 2
    ret
rem7_shim(unsigned long):
    mov     x2, 9363
    movk    x2, 0x9249, lsl 16
    movk    x2, 0x4924, lsl 32
    movk    x2, 0x2492, lsl 48
    umulh   x2, x0, x2
    sub     x1, x0, x2
    add     x1, x2, x1, lsr 1
    lsr     x1, x1, 2
    lsl     x2, x1, 3
    sub     x1, x2, x1
    sub     x0, x0, x1
    ret
rem7(unsigned long):
    mov     x2, 9363
    movk    x2, 0x9249, lsl 16
    movk    x2, 0x4924, lsl 32
    movk    x2, 0x2492, lsl 48
    umulh   x2, x0, x2
    sub     x1, x0, x2
    add     x1, x2, x1, lsr 1
    lsr     x1, x1, 2
    lsl     x2, x1, 3
    sub     x1, x2, x1
    sub     x0, x0, x1
    ret
Mizar/みざーMizar/みざー

(下書き中)

除算の商を求める3つのアプローチ

この節では、除算の商 \lfloor x/d\rfloor を求めるアプローチを3つ考えてみます。

  • a. 丁度の値を求める
    • \lfloor x/d\rfloor=f_d(x)
  • b. 丁度か 1 小さい値を求める
    • \lfloor x/d\rfloor-1\leq f_d(x)\leq\lfloor x/d\rfloor
  • c. 丁度か 1 大きい値を求める
    • \lfloor x/d\rfloor\leq f_d(x)\leq\lfloor x/d\rfloor+1

コンパイラによる最適化で使われるパターンは主に 「a. 丁度の値を求める」 ですが、

後者の2つのアプローチは、求めた除算の商の近似値から、仮に除算の剰余を計算してみて、商が大きすぎたり小さすぎたりしたと分かれば商の近似値から1だけ調整して、正しい除算の商とするものです。後ほど、 「b. 丁度か 1 小さい値を求める」 パターンで 被除数 0\leq x\lt 2^{128} に対して 除数 d=10^{32} での除算最適化を行う例を紹介します。

また、以下のような略記を所々で使うので定義しておきます。

  • mulh(m, x) \displaystyle=\left\lfloor\frac{mx}{2^W}\right\rfloor\quad\left(m,x\in\mathbb{Z},\quad 0\le m\lt 2^W,\quad 0\le x\lt 2^W\right)
    • W ビット整数どうしの乗算の積の、上位 W ビットを取り出す関数。(主に W=64
  • \displaystyle\left(-2^t\right)\operatorname{mod}d=(d-1)-\left(\left(2^t-1\right)\operatorname{mod}d\right)\quad\left(d,t\in\mathbb{Z},\quad 1\leq d\lt 2^W,\quad 0\leq t\right)
    • 負の被除数に対する剰余計算。ここでは主に、「((2^t\text{ 以上で }d\text{ の倍数となる最小の整数}) - 2^t) はいくつ?」という文脈で使います。
    • 逆に、「(2^t-(2^t\text{ 以下で }d\text{ の倍数となる最大の整数})) はいくつ?」という文脈では、 2^t\operatorname{mod}d のように表記します。

a. 丁度の値を求める (d: 除数, x: 被除数, W: 整数型のbit長)

\displaystyle d,r,t,x\in\mathbb{Z},\quad 1\leq d,\quad 0\leq t,\quad 0\leq r=\left(-2^t\right)\operatorname{mod}d\lt d,\quad 0\le rx\lt 2^t

\displaystyle\quad\longrightarrow\quad\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\times\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{2^t+r}{d}\times\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{x}{d}+\frac{rx}{2^t d}\right\rfloor=\left\lfloor\frac{x}{d}\right\rfloor

2^t 以上で最小の d の倍数を (2^t+r) として、仮に 0\leq rx\lt 2^t だったとすると、 \displaystyle 0\leq\frac{rx}{2^t}\lt 1,\quad 0\leq\frac{rx}{2^td}\lt\frac{1}{d} が導かれることから、 \displaystyle\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\times\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{x}{d}\right\rfloor となります。

2^W 未満の整数を扱う場合の例

W,d,x\in\mathbb{Z},\quad 2\leq W,\quad 1\leq d\lt 2^W,\quad 0\leq x\lt 2^W において 除算の商 \left\lfloor x/d\right\rfloor を計算:

  • d=1 の場合:
    • \displaystyle\left\lfloor\frac{x}{d}\right\rfloor=x
    • 擬似コード: x
  • \log_2d=\lceil\log_2d\rceil (d2 の累乗数) の場合:
    • \displaystyle t=\log_2d,\quad\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\frac{x}{2^{\log_2d}}\right\rfloor
    • 前計算: s=\log_2d
    • 擬似コード: x >> s
  • \displaystyle 2\leq d\lt 2^W,\quad t=\lfloor\log_2(d-1)\rfloor+W,\quad\left((-2^t)\operatorname{mod}d\right)x\lt 2^t の場合:
    • 0\leq x\lt 2^{W-1}\quad\longrightarrow\quad\left((-2^t)\operatorname{mod}d\right)x\lt 2^t
    • 0\leq x\lt 2^W,\quad (-2^t)\operatorname{mod}d\lt 2^{(t-W)}\quad\longrightarrow\quad \left((-2^t)\operatorname{mod}d\right)x\lt 2^t
    • \displaystyle 2^{(t-W)}\leq(d-1)\lt d\leq 2^{(t-W+1)},\quad 2^{(W-1)}\leq\lceil 2^t/d\rceil\lt 2^W,\quad 0\leq t-W=\lfloor\log_2(d-1)\rfloor\lt W,
    • \displaystyle \left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^t}\right\rfloor=\left\lfloor\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^W}\right\rfloor\frac{1}{2^{(t-W)}}\right\rfloor
    • 前計算: \displaystyle s=\lfloor\log_2(d-1)\rfloor,\quad t=s+W,\quad m=\left\lceil 2^t/d\right\rceil
    • 擬似コード: mulh(m, x) >> s
  • \left(2\leq d\lt 2^W,\quad 0\leq x\lt 2^W\right) の範囲にて、主に上記以外の場合:
    • \displaystyle t=\lfloor\log_2(d-1)\rfloor+W+1,\quad
    • \displaystyle 2^{(t-W-1)}\leq(d-1)\lt d\leq 2^{(t-W)},\quad 2^W\leq\lceil 2^t/d\rceil\lt 2^{(W+1)},\quad
    • \displaystyle y=\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^W}\right\rfloor-x=\left\lfloor\left(\left\lceil\frac{2^t}{d}\right\rceil-2^W\right)\frac{x}{2^W}\right\rfloor,\quad 0\leq y\leq x\lt 2^W,\quad
    • \displaystyle\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lceil\frac{2^t}{d}\right\rceil\frac{x}{2^t}\right\rfloor=\left\lfloor\frac{x+y}{2^{(t-W)}}\right\rfloor=\left\lfloor\left(\left\lfloor\frac{x-y}{2}\right\rfloor+y\right)\frac{1}{2^{(t-W-1)}}\right\rfloor
    • 前計算: \displaystyle s=\lfloor\log_2(d-1)\rfloor,\quad t=s+W+1,\quad m=\left\lceil 2^t/d\right\rceil-2^W
    • 擬似コード: y = mulh(m, x); (((x - y) >> 1) + y) >> s

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{67}\lt\frac{2^{93}}{39898175}\quad\longrightarrow\quad \left\lfloor\left\lceil\frac{2^{93}}{998244353}\right\rceil\times\frac{x}{2^{93}}\right\rfloor=\left\lfloor\frac{2^{93}+39898175}{998244353}\times\frac{x}{2^{93}}\right\rfloor=\left\lfloor\frac{x}{998244353}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\le x\lt 2^{70}\lt\frac{2^{90}}{875776}\quad\longrightarrow\quad\left\lfloor\left\lceil\frac{2^{90}}{10^8}\right\rceil\times\frac{x}{2^{90}}\right\rfloor=\left\lfloor\frac{2^{90}+875776}{10^8}\times\frac{x}{2^{90}}\right\rfloor=\left\lfloor\frac{x}{10^8}+\frac{875776 x}{2^{90}\times 10^8}\right\rfloor=\left\lfloor\frac{x}{10^8}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\le x\lt 10^{10}\lt 2^{34}\lt\frac{2^{45}}{1168}\quad\longrightarrow\quad\left\lfloor\left\lceil\frac{2^{45}}{10^4}\right\rceil\times\frac{x}{2^{45}}\right\rfloor=\left\lfloor\frac{2^{45}+1168}{10^4}\times\frac{x}{2^{45}}\right\rfloor=\left\lfloor\frac{x}{10^4}+\frac{1168 x}{2^{45}\times 10^4}\right\rfloor=\left\lfloor\frac{x}{10^4}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\le x\lt 43690\lt\frac{2^{19}}{12}\quad\longrightarrow\quad\left\lfloor\left\lceil\frac{2^{19}}{10^2}\right\rceil\times\frac{x}{2^{19}}\right\rfloor=\left\lfloor\frac{2^{19}+12}{10^2}\times\frac{x}{2^{19}}\right\rfloor=\left\lfloor\frac{x}{10^2}+\frac{12 x}{2^{19}\times 10^2}\right\rfloor=\left\lfloor\frac{x}{10^2}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\le x\lt 1024=\frac{2^{11}}{2}\quad\longrightarrow\quad \left\lfloor\left\lceil\frac{2^{11}}{10}\right\rceil\times\frac{x}{2^{11}}\right\rfloor=\left\lfloor\frac{2^{11}+2}{10}\times\frac{x}{2^{11}}\right\rfloor=\left\lfloor\frac{x}{10}+\frac{2 x}{2^{11}\times 10}\right\rfloor=\left\lfloor\frac{x}{10}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\le x\lt 2^{64}\lt\frac{2^{67}}{5}\quad\longrightarrow\quad \left\lfloor\left\lceil\frac{2^{67}}{7}\right\rceil\times\frac{x}{2^{67}}\right\rfloor=\left\lfloor\frac{2^{67}+5}{7}\times\frac{x}{2^{67}}\right\rfloor=\left\lfloor\frac{x}{7}+\frac{5 x}{2^{67}\times 7}\right\rfloor=\left\lfloor\frac{x}{7}\right\rfloor \quad\displaystyle\left(2^{64}\lt\frac{2^{67}+5}{7}\lt 2^{65}\text{ である事に実装上注意}\right)

b. 丁度か 1 小さい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},

\begin{cases}\displaystyle\left(\left\lfloor\frac{x}{d}\right\rfloor-1\right)\le\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\le\left\lfloor\frac{x}{d}\right\rfloor\\ \quad\quad\displaystyle\left(0\lt 2^{(s+t)}\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{2^{(s+t)}\operatorname{mod}d}+2^s-1\right)\\\displaystyle\left\lfloor\frac{x}{d}\right\rfloor=\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\\ \quad\quad\displaystyle\left(2^{(s+t)}\operatorname{mod}d=0,\quad 0\le x\displaystyle\right)\end{cases}

  • 前計算: \displaystyle m=\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor
  • 擬似コード: ((x >> s) * m) >> t
  • s=0 の場合の擬似コード: (m * x) >> t
  • s=0,t=W の場合の擬似コード: mulh(m, x)d=1 の場合、 m=\lfloor 2^W/d\rfloor=2^W となる事に注意

前半の導出

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad 0\lt 2^{(s+t)}\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{2^{(s+t)}\operatorname{mod}d}+2^s-1

\displaystyle\quad\longrightarrow\quad\left(\frac{\left(2^s-1\right)\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}-1\right)\lt\left(-\frac{2^{(s+t)}\operatorname{mod}d}{2^{(s+t)}d}x\right)\leq 0 \quad\quad \displaystyle\left(-\frac{2^{(s+t)}\operatorname{mod}d}{2^{(s+t)}d}\text{ を掛ける}\right)

\displaystyle\quad\longrightarrow\quad\left(\frac{x}{d}-1\right)\lt\left(\frac{x-2^s+1}{2^s}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)=\frac{\left(x-2^s+1\right)\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\leq\left(\frac{x}{d}-\frac{2^s-1}{2^{(s+t)}}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)\leq\frac{x}{d}\quad\quad \displaystyle\left(\left(\frac{x}{d}-\frac{2^s-1}{2^{(s+t)}}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)=\frac{2^{(s+t)}x-(2^s-1)\left(2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\text{ を足す},\ \frac{2^{(s+t)}-\left(2^{(s+t)}\operatorname{mod}d\right)}{d}=\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\right)

\displaystyle\quad\longrightarrow\quad \left(\frac{x}{d}-1\right)\lt\left(\frac{x-2^s+1}{2^s}\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)\le\left(\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right)\le\frac{x}{d}\quad\quad \displaystyle\left(\frac{x}{2^s}-1\lt\frac{x-2^s+1}{2^s}\le\left\lfloor\frac{x}{2^s}\right\rfloor\le\frac{x}{2^s}\right)

\displaystyle\quad\longrightarrow\quad\left(\left\lfloor\frac{x}{d}\right\rfloor-1\right)\le\left\lfloor\left\lfloor\frac{x}{2^s}\right\rfloor\left\lfloor\frac{2^{(s+t)}}{d}\right\rfloor\frac{1}{2^t}\right\rfloor\le\left\lfloor\frac{x}{d}\right\rfloor\quad\quad \left(\text{床関数(floor)を適用}\right)

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{64}\lt\frac{2^{64}\times 998244353}{2^{64}\operatorname{mod}998244353}\quad\longrightarrow\quad\left\lfloor\frac{x}{998244353}\right\rfloor-1\leq\left\lfloor\left\lfloor\frac{2^{64}}{998244353}\right\rfloor\times\frac{x}{2^{64}}\right\rfloor=\left\lfloor\frac{2^{64}-932051910}{998244353}\times\frac{x}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{998244353}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{128}\displaystyle\quad\longrightarrow\quad\left(\left\lfloor\frac{x}{10^{32}}\right\rfloor-1\right)\leq\left\lfloor\left\lfloor\frac{x}{2^{64}}\right\rfloor\times\left\lfloor\frac{2^{128}}{10^{32}}\right\rfloor\times\frac{1}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{10^{32}}\right\rfloor

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 10^{32}\displaystyle\quad\longrightarrow\quad\begin{cases}\displaystyle\left(\left\lfloor\frac{x}{10^{16}}\right\rfloor-1\right)\leq\left\lfloor\left\lfloor\frac{x}{2^{43}}\right\rfloor\times\left\lfloor\frac{2^{107}}{10^{16}}\right\rfloor\times\frac{1}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{10^{16}}\right\rfloor\\\displaystyle\left(\left\lfloor\frac{x}{10^{16}}\right\rfloor-1\right)\leq\left\lfloor\left\lfloor\frac{x}{2^{48}}\right\rfloor\times\left\lfloor\frac{2^{112}}{10^{16}}\right\rfloor\times\frac{1}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{10^{16}}\right\rfloor\\\displaystyle\left(\left\lfloor\frac{x}{10^{16}}\right\rfloor-1\right)\leq\left\lfloor\left\lfloor\frac{x}{2^{52}}\right\rfloor\times\left\lfloor\frac{2^{116}}{10^{16}}\right\rfloor\times\frac{1}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{10^{16}}\right\rfloor\end{cases}

c. 丁度か 1 大きい値を求める (d: 除数, x: 被除数, s,t: パラメータ [ x の範囲に影響 ])

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},

\begin{cases}\displaystyle\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\ \quad\quad\displaystyle\left(0\lt\left(-2^{(s+t)}\right)\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{\left(-2^{(s+t)}\right)\operatorname{mod}d}-2^s+1\right)\\ \displaystyle\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\\ \quad\quad\displaystyle\left(\left(-2^{(s+t)}\right)\operatorname{mod}d=0,\quad 0\le x\right)\end{cases}

  • 前計算: \displaystyle u=2^s-1,\quad m=\left\lceil\frac{2^{(s+t)}}{d}\right\rceil
  • 擬似コード: (((x + u) >> s) * m) >> t
  • s=0 の場合の擬似コード: (x * m) >> t
  • s=0, t=W の場合の擬似コード: mulh(m, x)d=1 の場合、 m=\lceil 2^W/d\rceil=2^W となる事に注意

s=0 の場合のみの実用を想定。 (s\gt 0 だと b. の手法に比べて利点が薄いため)

s=0, t=W=64 の場合の利用例: ac-library の barrett reduction 実装における、除算の商の近似値を求める部分

1\leq d\lt 2^{64}:\text{ 除数},\quad 0\leq x\lt 2^{64}:\text{ 被除数} \displaystyle\quad\longrightarrow\quad\left\lfloor\frac{x}{d}\right\rfloor\leq\left\lfloor\left\lceil\frac{2^{64}}{d}\right\rceil\frac{x}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{d}\right\rfloor+1

https://github.com/atcoder/ac-library/blob/master/atcoder/internal_math.hpp#L51-L59

前半の導出

\displaystyle d,s,t,x\in\mathbb{Z},\quad 0\le s,\quad 0\le t,\quad 2^s\le d\le 2^{(s+t)},\quad 0\lt\left(-2^{(s+t)}\right)\operatorname{mod}d\lt d,\quad 0\le x\lt \frac{2^{(s+t)}(d-2^s+1)}{\left(-2^{(s+t)}\right)\operatorname{mod}d}-2^s+1

\displaystyle\quad\longrightarrow\quad 0\leq \frac{\left(-2^{(s+t)}\right)\operatorname{mod}d}{2^{(s+t)}d}x\lt \left(1-\frac{(2^s-1)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\right) \quad\quad \displaystyle\left(\frac{\left(-2^{(s+t)}\right)\operatorname{mod}d}{2^{(s+t)}d}\text{ を掛ける}\right)

\displaystyle\quad\longrightarrow\quad\frac{x}{d}\leq\left(\frac{x}{d}+\frac{2^s-1}{2^{(s+t)}}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)\leq\left(\frac{x+2^s-1}{2^s}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)=\frac{\left(x+2^s-1\right)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\lt\left(\frac{x}{d}+1\right)
\displaystyle\left(\left(\frac{x}{d}+\frac{2^s-1}{2^{(s+t)}}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)=\frac{2^{(s+t)}x+(2^s-1)\left(2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)\right)}{2^{(s+t)}d}\text{ を足す},\ \frac{2^{(s+t)}+\left(\left(-2^{(s+t)}\right)\operatorname{mod}d\right)}{d}=\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\right)

\displaystyle\quad\longrightarrow\quad\frac{x}{d}\leq\left(\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)\leq\left(\frac{x+2^s-1}{2^s}\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right)\lt\left(\frac{x}{d}+1\right)\quad\quad \displaystyle\left(\frac{x}{2^s}\leq\left\lceil\frac{x}{2^s}\right\rceil\leq\frac{x+2^s-1}{2^s}\lt\frac{x}{2^s}+1\right)

\displaystyle\quad\longrightarrow\quad\left\lfloor\frac{x}{d}\right\rfloor\le\left\lfloor\left\lceil\frac{x}{2^s}\right\rceil\left\lceil\frac{2^{(s+t)}}{d}\right\rceil\frac{1}{2^t}\right\rfloor\le\left(\left\lfloor\frac{x}{d}\right\rfloor+1\right)\quad\quad \left(\text{床関数(floor)を適用}\right)

\displaystyle x\in\mathbb{Z},\quad 0\leq x\lt 2^{67}\lt\frac{2^{64}\times 998244353}{(-2^{64})\operatorname{mod}998244353}\quad\longrightarrow\quad\left\lfloor\frac{x}{998244353}\right\rfloor\leq\left\lfloor\left\lceil\frac{2^{64}}{998244353}\right\rceil\times\frac{x}{2^{64}}\right\rfloor=\left\lfloor\frac{2^{64}+66192443}{998244353}\times\frac{x}{2^{64}}\right\rfloor\leq\left\lfloor\frac{x}{998244353}\right\rfloor+1