Comparing naive and Pollard’s p-1 factoring
For small, randomly chosen composites, naive factoring, i.e. sequential divisibility checking, is many times quicker than Pollard’s p-1 method.
I implemented both algorithms in C++ using GMP to handle the big numbers. I generated numbers with 5-50 digits, three numbers for each digit-class. I filtered the numbers to be composite and odd. I evaluated both algorithms on each list of numbers, as seen in table 1. A couple numbers took so many times longer than the others to factor – hours compared to the usual seconds – that I left them blank.
Digits | N1 (\(\mu\)s) | P1 (\(\mu\)s) | N2 (\(\mu\)s) | P2 (\(\mu\)s) | P3 (\(\mu\)s) | N3 (\(\mu\)s) |
---|---|---|---|---|---|---|
5 | 50 | 430 | 27 | 335 | 398 | 30 |
6 | 2 | 290 | 2 | 289 | 372 | 6 |
7 | 2 | 280 | 5 | 312 | 350 | 4 |
8 | 2 | 273 | 2 | 276 | 329 | 2 |
9 | 2 | 277 | 1 | 275 | 348 | 2 |
10 | 3 | 275 | 17 | 305 | 347 | 3 |
11 | 2 | 276 | 2 | 277 | 5641 | 2026 |
12 | 1 | 277 | 1 | 283 | 424 | 3 |
13 | 1 | 278 | 1 | 301 | 366 | 3 |
14 | 2 | 275 | 7 | 339 | 1023 | 75 |
15 | 2 | 276 | 17 | 316 | 297 | 2 |
16 | 22 | 553 | 2 | 276 | 302 | 2 |
17 | 3 | 290 | 1 | 279 | 327 | 2 |
18 | 3 | 277 | 4 | 367 | 327 | 5 |
19 | 8 | 280 | 58 | 1136 | 367 | 30 |
20 | 2 | 277 | 2 | 283 | 320 | 4 |
21 | 2 | 291 | 11 | 363 | 330 | 5 |
22 | 34 | 411 | 184 | 835 | 3948 | 417 |
23 | 2 | 287 | 80 | 1074 | 306 | 2 |
24 | 64501813 | 2107606294 | 2 | 291 | 290 | 3 |
25 | 12 | 429 | 264611 | 1810016 | 292 | 2 |
26 | 2 | 278 | 3 | 496 | 286 | 2 |
27 | 2 | 275 | 136139284 | 1152083995 | 288 | 2 |
28 | 2 | 347 | 2419 | 6599 | 1539853 | 141744 |
29 | 73 | 1916 | 21 | 1324 | 342 | 4 |
30 | 9 | 1299 | 285 | 2 | ||
31 | 26 | 630 | 11 | 1411 | 271 | 2 |
32 | 87235 | 1090536 | 9 | 1265 | 276 | 2 |
33 | 6 | 505 | 45 | 1614 | 313 | 22 |
34 | 2 | 432 | 8 | 1305 | 2127 | 109 |
35 | 2 | 395 | 9 | 1297 | 310 | 2 |
36 | 3 | 336 | 8 | 1300 | 327 | 4 |
37 | 1 | 361 | 8 | 1334 | 369 | 3 |
38 | 8 | 386 | 9 | 1291 | 343 | 2 |
39 | 8 | 484 | 26 | 1566 | 419 | 8 |
40 | 9 | 595 | 10 | 1293 | 316 | 6 |
41 | 2 | 446 | 40 | 1298 | 358 | 2 |
42 | 3 | 357 | 12 | 1358 | 320 | 2 |
43 | 8 | 669 | 147963 | 877520 | 54677 | 1538 |
44 | 8 | 405 | 17 | 1386 | 384 | 4 |
45 | 2 | 365 | 23 | 1415 | 304 | 2 |
46 | 2 | 330 | 10 | 1319 | ||
47 | 2 | 418 | 12 | 1350 | 29 | 495 |
48 | 5 | 550 | 738 | 2472 | 3 | 441 |
49 | 2 | 330 | 13 | 1322 | 3 | 519 |
50 | 2 | 420 | 18 | 1324 | 462 | 959 |
In absolute terms, a few numbers stand out as taking way longer than the average. We can see in table 2 these are the ones with larger smaller factors. For example, the first 24-digit number, 776618064461348516680447, takes 5,375,151 times longer to factor with the naive algorithm than the first 25-digit number (4,912,835 times with Pollard’s). When choosing a number to be difficult to factor, the size of the number itself doesn’t matter – only the size of the smallest factor matters.
Naive factoring pretty much directly corresponds to the smallest factor of the number (c.f. table 2), which makes sense, but there is clearly a relatively large constant time factor for Pollard’s. Where does the time go? I profiled Pollard’s running on the first 24-digit number, and it turns out the program spent a whopping 64% of its time in the gcd function performing assignments and modulos.
/*
* Use the Euclidean algorithm to find the GCD.
*/
mpz_class Factorer::gcd(mpz_class a, mpz_class b) {
if (b < a) {
mpz_class temp;
temp = b;
b = a;
a = temp;
}
mpz_class r = a, old_r;
do {
old_r = r;
r = b % a;
b = a;
a = r;
} while (r != 0);
return old_r;
}
As a matter of fact, the naive method’s main speed advantage came from not having to call the gcd function. While we think of the EEA as having linear time complexity, in this case it was a major drag on performance.
All these results make sense when we compare the algorithms. The naive obviously just depends on the size of the smallest factor:
int naive(n) {
for (i=2; i <= sqrt(n); i++)
if (n % i == 0)
return i;
}
Pollard’s is a little more subtle. First let’s describe the algorithm.
Assume we factor \(n\). Choose an integer \(a \in (1, n)\). If \(\gcd(a, n) \neq 1\), we’ve gotten lucky and found a divisor of \(n\).
Otherwise, we iteratively compute \(a^{k!} \mod n\) for increasing values of \(k\):
\[\begin{aligned} a_1 &= a^{1!} = a \mod n \\ a_2 &= a_{1}^2 \mod n \\ &\vdots \\ a_i &= a_{i-1}^i \mod n \end{aligned}\] Notice how we use the previous number in the sequence to easily find the next. For each \(a_k\), we compute \(g = \gcd(a_k - 1, n)\). If \(g \in (1, n)\), \(g\) is a proper factor of \(n\). If \(g = n\), then the algorithm fails (we can retry with a different \(a\)). Otherwise we keep calculating \(a_k\) values.
At first, it’s not very clear what’s happening here. We seem to be cycling through arbitrary numbers until we stumble on a factor. And why take \(\gcd(a_k-1, n)\)?
Theorem 1. If \(p\) is a prime factor of \(n\) and \(p-1 \ | \ B!\) for some integer \(B\), then Pollard’s p-1 algorithm terminates in at most \(B\) steps.
Proof. We assume \(p\) is a factor of \(n\) and \(p-1 \ | \ B!\). When \(a_k = a^{k!} \mod n \iff \gcd(a_k - 1, n) \neq 1\), we want to show \(k \leq B\). Since by assumption \(B! = l(p-1)\),
\[\begin{aligned} &a^{B!} \equiv (a^{p-1})^l \equiv 1^l \mod p &\text{ by Fermat’s little theorem} \\ &\implies a^{B!} - 1 \equiv 0 \mod p \\ &\implies p \ | \ a^{B!} - 1 \\ &\implies \gcd(a^{B!} - 1, n) \neq 1 &\text{ since by assumption p divides n} \end{aligned}\] ◻
Now we can see the efficiency of Pollard’s depends on the factors of \(p-1\). If \(p-1\) has small factors, there will be a small \(B!\) for it it divide. Hence Wikipedia lists Pollard’s as a special-purpose algorithm; a well-chosen prime will be strong against it.
I didn’t implement elliptic curve factorization, but that would be the next obvious step. It would be interesting to see how EC would perform on the numbers I couldn’t factor here, and test EC factorization on even bigger numbers.
Digits | Num1 Smallest Factor | Num2 Smallest Factor | Num3 Smallest Factor |
---|---|---|---|
5 | 103 | 3 | 5 |
6 | 5 | 5 | 17 |
7 | 3 | 19 | 11 |
8 | 3 | 7 | 3 |
9 | 3 | 3 | 3 |
10 | 7 | 71 | 7 |
11 | 3 | 3 | 7529 |
12 | 3 | 3 | 3 |
13 | 3 | 3 | 3 |
14 | 3 | 17 | 239 |
15 | 3 | 71 | 3 |
16 | 89 | 3 | 5 |
17 | 7 | 3 | 3 |
18 | 7 | 13 | 11 |
19 | 29 | 233 | 17 |
20 | 3 | 3 | 7 |
21 | 5 | 43 | 17 |
22 | 137 | 787 | 1669 |
23 | 5 | 337 | 3 |
24 | 270609551 | 5 | 7 |
25 | 41 | 1103779 | 3 |
26 | 3 | 3 | 3 |
27 | 3 | 569306393 | 3 |
28 | 5 | 2251 | 561109 |
29 | 307 | 13 | 3 |
30 | 3 | 3 | |
31 | 17 | 5 | 3 |
32 | 342179 | 3 | 3 |
33 | 17 | 37 | 89 |
34 | 5 | 3 | 257 |
35 | 3 | 3 | 3 |
36 | 7 | 3 | 13 |
37 | 3 | 3 | 11 |
38 | 29 | 3 | 3 |
39 | 23 | 17 | 29 |
40 | 31 | 3 | 19 |
41 | 3 | 31 | 3 |
42 | 3 | 5 | 5 |
43 | 29 | 139291 | 5849 |
44 | 29 | 5 | 3 |
45 | 3 | 13 | 3 |
46 | 3 | 3 | |
47 | 3 | 5 | 3 |
48 | 17 | 673 | 3 |
49 | 5 | 5 | 3 |
50 | 5 | 11 | 1327 |
Acknowledgment: I learned about cryptography and factoring from Prof. Daniel Hernandez’s excellent course at KU.