Comparing naive and Pollard’s p-1 factoring

For small, randomly chosen composites, naive factoring, i.e. sequential divisibility checking, is many times quicker than Pollard’s p-1 method.

I implemented both algorithms in C++ using GMP to handle the big numbers. I generated numbers with 5-50 digits, three numbers for each digit-class. I filtered the numbers to be composite and odd. I evaluated both algorithms on each list of numbers, as seen in table 1. A couple numbers took so many times longer than the others to factor – hours compared to the usual seconds – that I left them blank.

Comparing Naive (Nx) and Pollard (Px) algorithms via clock time in microseconds, over three lists of numbers.
Digits	N1 (\(\mu\)s)	P1 (\(\mu\)s)	N2 (\(\mu\)s)	P2 (\(\mu\)s)	P3 (\(\mu\)s)	N3 (\(\mu\)s)
5	50	430	27	335	398	30
6	2	290	2	289	372	6
7	2	280	5	312	350	4
8	2	273	2	276	329	2
9	2	277	1	275	348	2
10	3	275	17	305	347	3
11	2	276	2	277	5641	2026
12	1	277	1	283	424	3
13	1	278	1	301	366	3
14	2	275	7	339	1023	75
15	2	276	17	316	297	2
16	22	553	2	276	302	2
17	3	290	1	279	327	2
18	3	277	4	367	327	5
19	8	280	58	1136	367	30
20	2	277	2	283	320	4
21	2	291	11	363	330	5
22	34	411	184	835	3948	417
23	2	287	80	1074	306	2
24	64501813	2107606294	2	291	290	3
25	12	429	264611	1810016	292	2
26	2	278	3	496	286	2
27	2	275	136139284	1152083995	288	2
28	2	347	2419	6599	1539853	141744
29	73	1916	21	1324	342	4
30			9	1299	285	2
31	26	630	11	1411	271	2
32	87235	1090536	9	1265	276	2
33	6	505	45	1614	313	22
34	2	432	8	1305	2127	109
35	2	395	9	1297	310	2
36	3	336	8	1300	327	4
37	1	361	8	1334	369	3
38	8	386	9	1291	343	2
39	8	484	26	1566	419	8
40	9	595	10	1293	316	6
41	2	446	40	1298	358	2
42	3	357	12	1358	320	2
43	8	669	147963	877520	54677	1538
44	8	405	17	1386	384	4
45	2	365	23	1415	304	2
46	2	330	10	1319
47	2	418	12	1350	29	495
48	5	550	738	2472	3	441
49	2	330	13	1322	3	519
50	2	420	18	1324	462	959

In absolute terms, a few numbers stand out as taking way longer than the average. We can see in table 2 these are the ones with larger smaller factors. For example, the first 24-digit number, 776618064461348516680447, takes 5,375,151 times longer to factor with the naive algorithm than the first 25-digit number (4,912,835 times with Pollard’s). When choosing a number to be difficult to factor, the size of the number itself doesn’t matter – only the size of the smallest factor matters.

Naive factoring pretty much directly corresponds to the smallest factor of the number (c.f. table 2), which makes sense, but there is clearly a relatively large constant time factor for Pollard’s. Where does the time go? I profiled Pollard’s running on the first 24-digit number, and it turns out the program spent a whopping 64% of its time in the gcd function performing assignments and modulos.

/*
 * Use the Euclidean algorithm to find the GCD.
 */
mpz_class Factorer::gcd(mpz_class a, mpz_class b) {
    if (b < a) {
        mpz_class temp;
        temp = b;
        b = a;
        a = temp;
    }

    mpz_class r = a, old_r;
    do {
        old_r = r;
        r = b % a;
        b = a;
        a = r;
    } while (r != 0);

    return old_r;
}

As a matter of fact, the naive method’s main speed advantage came from not having to call the gcd function. While we think of the EEA as having linear time complexity, in this case it was a major drag on performance.

All these results make sense when we compare the algorithms. The naive obviously just depends on the size of the smallest factor:

int naive(n) {
    for (i=2; i <= sqrt(n); i++)
        if (n % i == 0)
            return i;
}

Pollard’s is a little more subtle. First let’s describe the algorithm.

Assume we factor \(n\). Choose an integer \(a \in (1, n)\). If \(\gcd(a, n) \neq 1\), we’ve gotten lucky and found a divisor of \(n\).

Otherwise, we iteratively compute \(a^{k!} \mod n\) for increasing values of \(k\):

\[\begin{aligned} a_1 &= a^{1!} = a \mod n \\ a_2 &= a_{1}^2 \mod n \\ &\vdots \\ a_i &= a_{i-1}^i \mod n \end{aligned}\] Notice how we use the previous number in the sequence to easily find the next. For each \(a_k\), we compute \(g = \gcd(a_k - 1, n)\). If \(g \in (1, n)\), \(g\) is a proper factor of \(n\). If \(g = n\), then the algorithm fails (we can retry with a different \(a\)). Otherwise we keep calculating \(a_k\) values.

At first, it’s not very clear what’s happening here. We seem to be cycling through arbitrary numbers until we stumble on a factor. And why take \(\gcd(a_k-1, n)\)?

Theorem 1. If \(p\) is a prime factor of \(n\) and \(p-1 \ | \ B!\) for some integer \(B\), then Pollard’s p-1 algorithm terminates in at most \(B\) steps.

Proof. We assume \(p\) is a factor of \(n\) and \(p-1 \ | \ B!\). When \(a_k = a^{k!} \mod n \iff \gcd(a_k - 1, n) \neq 1\), we want to show \(k \leq B\). Since by assumption \(B! = l(p-1)\),

\[\begin{aligned} &a^{B!} \equiv (a^{p-1})^l \equiv 1^l \mod p &\text{ by Fermat’s little theorem} \\ &\implies a^{B!} - 1 \equiv 0 \mod p \\ &\implies p \ | \ a^{B!} - 1 \\ &\implies \gcd(a^{B!} - 1, n) \neq 1 &\text{ since by assumption p divides n} \end{aligned}\] ◻

Now we can see the efficiency of Pollard’s depends on the factors of \(p-1\). If \(p-1\) has small factors, there will be a small \(B!\) for it it divide. Hence Wikipedia lists Pollard’s as a special-purpose algorithm; a well-chosen prime will be strong against it.

I didn’t implement elliptic curve factorization, but that would be the next obvious step. It would be interesting to see how EC would perform on the numbers I couldn’t factor here, and test EC factorization on even bigger numbers.

Listing the smallest factor found via naive factoring.
Digits	Num1 Smallest Factor	Num2 Smallest Factor	Num3 Smallest Factor
5	103	3	5
6	5	5	17
7	3	19	11
8	3	7	3
9	3	3	3
10	7	71	7
11	3	3	7529
12	3	3	3
13	3	3	3
14	3	17	239
15	3	71	3
16	89	3	5
17	7	3	3
18	7	13	11
19	29	233	17
20	3	3	7
21	5	43	17
22	137	787	1669
23	5	337	3
24	270609551	5	7
25	41	1103779	3
26	3	3	3
27	3	569306393	3
28	5	2251	561109
29	307	13	3
30		3	3
31	17	5	3
32	342179	3	3
33	17	37	89
34	5	3	257
35	3	3	3
36	7	3	13
37	3	3	11
38	29	3	3
39	23	17	29
40	31	3	19
41	3	31	3
42	3	5	5
43	29	139291	5849
44	29	5	3
45	3	13	3
46	3	3
47	3	5	3
48	17	673	3
49	5	5	3
50	5	11	1327

Acknowledgment: I learned about cryptography and factoring from Prof. Daniel Hernandez’s excellent course at KU.