Introduction
Pollard's rho algorithm is a known randomized algorithm for efficiently computing prime factorizations and solving discrete logarithm problems. In this article, I explain the operating principle of Pollard's rho algorithm and specifically provide the sufficient conditions for the PRNG (pseudorandom number generator).
In this article, let p \le q be primes, and we consider the factorization of a semiprime N = p q. Implementation techniques are left to other resources.
Pollard's \rho Algorithm
The Birthday Paradox
The Birthday Paradox refers to the counterintuitive fact that "if you randomly gather 23 or more people, there is a 50\% probability that at least two share the same birthday." Generalizing this, we obtain the theorem that "if you choose \Theta(\sqrt{N}) elements randomly from natural numbers up to N, they will contain duplicates with a probability of about 50\%."
Simple Proof
The probability that k < N distinct numbers are generated is approximated as follows using Stirling's approximation:
p_k = \prod_{0 \le i < k} \frac{N-i}{N} = \frac{1}{N^k}\frac{N!}{(N-k)!} \sim \frac{1}{e^k} \left(\frac{N}{N-k}\right)^{N-k+\frac{1}{2}}
Here, we assume k \ll N.
\log p_k = -k - \left(N - k - \frac{1}{2} \right) \log \left(1 + \frac{k}{N}\right) \sim -k + \left(N - k - \frac{1}{2} \right)\frac{k}{N}
Therefore, the minimum k satisfying p_k \ge 1/2 is approximated by the following equation:
2 k^2 - k + 2 N \log 2 = 0 \space \therefore k = \frac{1 + \sqrt{1 + 16 N \log 2}}{4} = \Theta(\sqrt{N})
When N is sufficiently large, k \ll N holds.
Consider generating \Theta(\sqrt{p}) random numbers up to N. Let r_i denote the i-th random number. Noting that p is a divisor of N, the Birthday Paradox implies that we can find a pair satisfying r_i \equiv r_j \pmod p \, (i \neq j) with a probability of about 50\%. When r_i \not \equiv r_j \pmod N, we can discover a prime factor because \gcd(r_i - r_j, N) = p.
If one naively searches for a pair satisfying r_i \equiv r_j \pmod p from \Theta(\sqrt{p}) random numbers, the computational complexity is O(p). This is not much of an improvement over trial division. Alternatively, it can be viewed as being constant-factor worse than checking the GCD of N and numbers chosen randomly from [0, N). In short, using true random numbers is not a good method.
Pseudorandom Number Generator
Consider a special pseudorandom number generator (PRNG) that can generate the next random number from the previous one. Letting this be f, we can write r_{i+1} = f(r_i). Since there are only N possible values, a single loop will eventually appear in the automaton. This is the origin of the name "rho algorithm".
We add the following constraint to the PRNG. Note that this must also hold for any divisor of N.
x \equiv y \pmod N \Rightarrow f(x) \equiv f(y) \pmod N
When x and y are both on the loop, for any natural number k, f^k(x) \equiv f^k(y) \pmod N holds. Since we can shift while maintaining intervals on the loop, only the distance between x and y matters. Once we find one point on the loop, we can factor N in O(\sqrt{p}).
If there are no good pairs on the loop, we must try a different PRNG. From the above, we have obtained the sufficient conditions for a PRNG to be used in Pollard's \rho algorithm:
- The next random number must be determined solely from the previous one.
-
x \equiv y \pmod N \Rightarrow f(x) \equiv f(y) \pmod N must hold.
- The period of the loop must be long.
- It must have good statistical properties as a random number generator.
Conclusion
I have explained the operating principle of Pollard's rho algorithm. In particular, I discussed the sufficient conditions required for a PRNG. Many explanatory articles adopt f(x) = x^2 + c \pmod N as the PRNG, which satisfies these conditions. In general, any polynomial is sufficient.
It is worth noting that the linear congruential generator also satisfies these conditions. This is because a linear congruential generator can jump in logarithmic time, allowing one to obtain a point on the loop by jumping N steps from an arbitrary initial value. This leads to performance improvements. Unfortunately, as far as I have tested, the quadratic congruential generator showed superior performance, so the linear congruential generator does not seem to be a very good choice.
Discussion