Two applications of finite fields to computational number theory

In this post I’d like to discuss how finite fields (and more generally finite rings) can be used to study two fundamental computational problems in number theory: computing square roots modulo a prime and proving primality for Mersenne numbers.  The corresponding algorithms (Cipolla’s algorithm and the Lucas-Lehmer test) are often omitted from undergraduate-level number theory courses because they appear, at first glance, to be relatively sophisticated.  This is a shame because these algorithms are really marvelous — not to mention useful — and I think they’re more accessible than most people realize. I’ll attempt to describe these two algorithms assuming only a standard first-semester course in elementary number theory and familiarity with complex numbers, without assuming any background in abstract algebra.  These algorithms could also be used in a first-semester abstract algebra course to help motivate the practical utility of concepts like rings and fields.

Anecdotal Prelude

In researching this blog post on Wikipedia, I learned a couple of interesting historical tidbits that I’d like to share before getting on with the math.

elucasThe French mathematician Edouard Lucas (of Lucas sequence fame) showed in 1876 that the 39-digit Mersenne number 2^{127}-1 is prime.  This stood for 75 years as the largest known prime.  Lucas also invented (or at least popularized) the Tower of Hanoi puzzle and published the first description of the Dots and Boxes game.  He died in quite an unusual way: apparently a waiter at a banquet dropped what he was carrying and a piece of broken plate cut Lucas on the cheek. Lucas developed a severe skin inflammation and died a few days later at age 47.

Derrick Henry Lehmer was a long-time professor of mathematics at U.C. Berkeley.  In 1950, Lehmer was one DHLehmerof 31 University of California faculty members fired for refusing to sign a loyalty oath during the McCarthy era.  In 1952, the California Supreme Court declared the oath unconstitutional, and Lehmer returned to Berkeley shortly thereafter.  Lehmer also built a number of fascinating mechanical sieve machines designed to factor large numbers.

Rings and fields

Let n \geq 2 be an integer and let {\mathbf Z} / n{\mathbf Z} denote the set \{ 0,1,\ldots, n-1 \} together with the operations of addition and multiplication modulo n.  Such a structure — a set together with two binary operations + and \cdot satisfying various axioms such as the associative, commutative, and distributive laws — is called a (commutative) ring.  (Click here for a more detailed and formal discussion.)  One of the axioms for a ring is that we can subtract any two elements, giving an inverse to the addition operation.  If we can also divide by any nonzero element, giving an inverse to multiplication, then we call the ring a field.  In this post we will be particularly interested in studying finite rings and fields.

The prototypical example of a finite field is {\mathbf F}_p := {\mathbf Z} / p{\mathbf Z} for p a prime number.  We can also construct fields with p^2 elements by analogy with the complex numbers.  Choose a non-square d \in {\mathbf F}_p and define {\mathbf F}_p[\sqrt{d}] to be the set of all formal expressions of the form a+b\sqrt{d} with a,b \in {\mathbf F}_p, together with the addition and multiplication laws (a_1 + b_1 \sqrt{d}) + (a_2 + b_2 \sqrt{d}) = (a_1 + a_2) + (b_1 + b_2)\sqrt{d} and (a_1 + b_1 \sqrt{d}) \cdot (a_2 + b_2 \sqrt{d}) = (a_1 a_2 + d b_1 b_2) + (a_1 b_2 + a_2 b_1)\sqrt{d}.  One can easily check that {\mathbf F}_p[\sqrt{d}] is a field, with the reciprocal of \alpha = a+b \sqrt{d} given by \bar{\alpha} / \| \alpha \|, where \bar{\alpha} = a-b\sqrt{d} and \| \alpha \| = \alpha \bar\alpha = a^2 + d b^2.  We can identify {\mathbf F}_p with the subfield \{a + 0 \cdot \sqrt{d} \} of {\mathbf F}_p[\sqrt{d}].

Here are two basic properties of fields, both of which are usually known as “Lagrange’s Theorem”:

Lagrange #1: A polynomial of degree k \geq 1 with coefficients in a field F can have at most k roots in F.  (This is completely false if we replace “field” by “ring”.)  See here for a proof in the special case F={\mathbf F}_p which generalizes immediately to any field.  We just need the following special case: the only roots of x^2 - b^2 in F are x = \pm b.  This follows from the fact that x^2 - b^2 = (x-b)(x+b)=0 in F implies that either x-b = 0 or x+b=0.

Lagrange #2:  Let F be a finite field with q elements and let F^\times denote the nonzero elements of F.  Then the order k of any \alpha \in F^\times (defined as the smallest positive integer \ell such that \alpha^\ell = 1) divides |F^\times| = q-1.  (This can be viewed as an extension of Fermat’s Little Theorem, the proof is basically the same: multiplication by \alpha permutes the elements of F^\times, and thus \prod_{\beta \in F^*} \alpha \beta = \prod_{\beta \in F^*} \beta.  Hence \alpha^{q-1} = 1.  Writing q-1 = k\ell + r with \ell,r integers such that 0 \leq r < \ell, it follows that \alpha^r=1 and hence r = 0.)

The following property of the fields {\mathbf F}_p[\sqrt{d}] will be crucial in what follows.

Key Lemma: For \alpha \in {\mathbf F}_p[\sqrt{d}], we have \alpha^p = \bar\alpha.

Proof: Since d is a quadratic non-residue modulo p, Euler’s Criterion tells us that d^{(p-1)/2} = -1 in {\mathbf F}_p.  Thus (\sqrt{d})^{p} = \sqrt{d} \cdot (\sqrt{d})^{p-1} = -\sqrt{d} in {\mathbf F}_p[\sqrt{d}].  From the binomial theorem, the fact that p=0 in {\mathbf F}_p[\sqrt{d}], and Fermat’s Little Theorem, it follows that (a+b\sqrt{d})^p = a^p + b^p (\sqrt{d})^p = a - b \sqrt{d} as desired.

Computing modular square roots

We now turn to Cipolla’s algorithm for finding a solution to x^2 \equiv a \pmod{p} when a is a nonzero square (quadratic residue) modulo the odd prime p.  If p \equiv 3 \pmod{4}, one checks easily that the two solutions are x \equiv \pm a^{(p+1)/4} \pmod{p}, which can be efficiently computed via repeated squaring.  However, the case p \equiv 1 \pmod{4} cannot be treated in a similarly elementary and explicit way.

The first step in Cipolla’s algorithm is to find an integer t with 0 \leq t \leq p-1 such that d := t^2 - a is not a square modulo p.  The only known method to do this efficiently is probabilistic: for a random value of t, the number t^2 - a will be a non-square modulo p with probability 1/2.  Thus, if t_1,\ldots,t_n are chosen randomly, the probability that none of the t_i^2 - a works will be 1/2^n.  So in practice, we will always very quickly be able to find a suitable value of t, since for any particular t_i we can efficiently decide whether or not t_i^2 - a is a square mod p (using, for example, the Law of Quadratic Reciprocity for the Jacobi symbol).

Theorem 1: The two solutions in {\mathbf F_p} to x^2 = a (i.e., the two solutions to x^2 \equiv a \pmod{p}) are x = \pm (t + \sqrt{d})^{(p+1)/2}.

Proof: Let x_0 = (t + \sqrt{d})^{(p+1)/2}.  Then x_0^2 = (t+\sqrt{d})^{p+1} = (t+\sqrt{d}) (t+\sqrt{d})^p , which by the Key Lemma equals (t+\sqrt{d}) (t-\sqrt{d}) = t^2 - d = a.  Thus \pm x_0 are square roots of a in {\mathbf F}_p[\sqrt{d}].  Moreover, \pm x_0 \in {\mathbf F}_p, since by assumption the polynomial x^2 - a has two roots in {\mathbf F}_p, and it follows from Lagrange #1 that these are the only roots in the larger field {\mathbf F}_p[\sqrt{d}].

We thus obtain:

Cipolla’s Algorithm: Let a be a non-zero quadratic residue modulo p.  Compute (t + \sqrt{d})^{(p+1)/2} using repeated squaring in the finite field {\mathbf F}_p[\sqrt{d}], where d = t^2 - a is a quadratic non-residue mod p for some integer t.  The result is an element of {\mathbf F}_p whose square is equal to a.

For computer implementation of this algorithm, it is useful to identify an element a+b\sqrt{d} \in {\mathbf F}_p[\sqrt{d}] with ordered pair (a,b), just as a complex number is often identified with a point in the Euclidean plane.  The addition law on ordered pairs is given by the rule (a_1,b_1) + (a_2,b_2) = (a_1+a_2,b_1+b_2) and the multiplication law  is given by (a_1, b_1) \cdot (a_2, b_2) = (a_1 a_2 + d b_1 b_2, a_1 b_2 + a_2 b_1).

We note, for use in the next section, that these addition and multiplication laws make sense on ordered pairs of integers modulo n even if n is not prime, or if d is a square modulo n.  Abusing terminology somewhat, we denote the corresponding ring by ({\mathbf Z} / n{\mathbf Z})[\sqrt{d}].

Primality proving for Mersenne numbers

We now turn to the problem of determining whether a large positive integer n is prime.  Though there are general algorithms for doing this (e.g. the celebrated polynomial-time test due to Agrawal, Kayal, and Saxena), such algorithms are much slower than special-purpose algorithms designed for numbers of a specific form.  To illustrate our point, all of the largest known prime numbers are Mersenne numbers, meaning that they have the form n = 2^m - 1 for some odd positive integer m.  More precisely, the largest number proven to be prime (as of November 2013) is the 17,425,170-digit Mersenne prime 2^{57885161} - 1.  It was discovered by the GIMPS project in January 2013.  The reason that Mersenne numbers can be tested for primality in a particularly efficient way is because of the Lucas-Lehmer test, which we now describe.

We begin by observing that if p is a Mersenne prime, then 2 is a quadratic residue modulo p and 3 is a quadratic non-residue modulo p.  This is a straightforward consequence of the Law of Quadratic Reciprocity, since one shows easily that p \equiv 1 \pmod{3} and p \equiv 7 \pmod{8}.

Theorem 2: Let n be a Mersenne number.  Then n is prime if and only if (2+\sqrt{3})^{(n+1)/2} = -1 in the ring ({\mathbf Z} / n{\mathbf Z}) [\sqrt{3}].

Proof: Suppose first that n = p is prime.  Let e be a square root of 2 in {\mathbf F}_p and let \tau = (1+\sqrt{3})/e in the field {\mathbf F}_p[\sqrt{3}].  Then \tau^2 = 2+\sqrt{3} and (2+\sqrt{3})^{(p+1)/2} = \tau^{p+1} = \tau \cdot \tau^p = \tau \bar{\tau} = -1.  Conversely, suppose (2+\sqrt{3})^{(n+1)/2} = -1 and let q be a prime divisor of n.  Let {\mathbf F} = {\mathbf F}_q if 3 is a square mod q, and let {\mathbf F} = {\mathbf F}_q [\sqrt{3}] otherwise.  Since (2+\sqrt{3})^{2^m} = 1 but (2+\sqrt{3})^{2^{m-1}} \neq 1, it follows that the order of 2+\sqrt{3} in {\mathbf F} is n+1 = 2^m.  By Lagrange #2, n+1 divides |{\mathbf F}^\times|, which is either q-1 or q^2 - 1.  In either case, q > \sqrt{n}.  Since this is true for all primes q dividing n, it follows that n is prime.

Since one can compute (2+\sqrt{3})^{2^{m-1}} in the ring ({\mathbf Z} / n{\mathbf Z}) [\sqrt{3}] via repeated squaring, the theorem gives a practical method for testing Mersenne numbers for primality.  However, there is a reformulation of the theorem which is more elementary and better suited to computer implementation; we describe this now.  (We first note that the theorem remains true, with the same proof, if we replace 2+\sqrt{3} by 2-\sqrt{3}.)

Let f(x) = x^2 - 2, and recursively define a sequence (S_j) of integers modulo n by the rule S_0 = 4 and S_j = f(S_{j-1}) for j \geq 1.  (In other words, S_j = f^{(j)}(4) where f^{(j)} is the j^{\rm th} iterate of f.)

Lucas-Lehmer Test: The Mersenne number n = 2^m - 1 with m odd is prime if and only if S_{m-2} \equiv 0 \pmod{n}.

Proof: We first observe that S_j = (2+\sqrt{3})^{2^j} + (2-\sqrt{3})^{2^j} in ({\mathbf Z} / n{\mathbf Z}) [\sqrt{3}] for all j \geq 0: for j=0 this is clear, and a simple calculation shows that \left( (2+\sqrt{3})^{2^{j-1}} + (2-\sqrt{3})^{2^{j-1}} \right)^2 - 2 = (2+\sqrt{3})^{2^j} + (2-\sqrt{3})^{2^j}.

If n is prime, then S_{m-1} = (2+\sqrt{3})^{(n+1)/2} + (2-\sqrt{3})^{(n+1)/2} = -1 + -1 = -2.  Thus S_{m-2}^2 =0 in {\mathbf Z} / n{\mathbf Z}, which implies that S_{m-2} = 0 since n is prime.

Conversely, suppose S_{m-2} = 0 in ({\mathbf Z} / n{\mathbf Z}) [\sqrt{3}].  Then (2+\sqrt{3})^{(n+1)/4} = -(2-\sqrt{3})^{(n+1)/4}, so (2+\sqrt{3})^{(n+1)/2} = (2-\sqrt{3})^{(n+1)/2}.  On the other hand, S_{m-2} = 0 implies that S_{m-1} = -2, which means that (2+\sqrt{3})^{(n+1)/2} + (2-\sqrt{3})^{(n+1)/2} = -2.  It follows that (2+\sqrt{3})^{(n+1)/2} = -1 in ({\mathbf Z} / n{\mathbf Z}) [\sqrt{3}], so by our theorem n is prime.

Concluding observations:

1. The following is a variant of Cipolla’s algorithm based on Lucas sequences (see Exercise 2.31 in the book “Prime Numbers: A Computational Perspective” by Crandall and Pomerance).  Choose h so that h^2 - 4a is a quadratic non-residue mod p.  Define a Lucas sequence (V_k) by V_0 = 2, V_1 = h, and V_k = hV_{k-1} - aV_{k-2} for k \geq 2.  Then \frac{1}{2} V_{(p+1)/2} is a square root of a modulo p.

2. The proof of the Lucas-Lehmer test presented above is due to Michael Rosen, with simplifications by J.W. Bruce.  Our exposition was influenced by this blog post written by Terry Tao, which the interested reader should definitely consult.

3. Our “Key Lemma” can also be proved using Galois theory: the Galois group of {\mathbf F}_p[\sqrt{d}] over {\mathbf F}_p has order 2 and both \alpha \mapsto \alpha^p and \alpha \mapsto \bar\alpha are non-trivial automorphisms, so they must coincide.

4. The complexity of Cipolla’s Algorithm is O(ln^3 p) bit operations, which is asymptotically better than the worst case of Tonelli-Shanks Algorithm.  There is no known deterministic polynomial-time algorithm for computing square roots modulo a prime.  However, assuming the generalized Riemann hypothesis, it is known that there is a quadratic non-residue less than 2 \ln^2 p, which makes it possible to determine suitable values of t and d in Cipolla’s algorithm in polynomial time.

5. Our statement and proof of the main theorem used in the Lucas-Lehmer test is reminiscent of another classical result attributed to Lucas: if a and n are relatively prime positive integers with a^{n-1} \equiv 1 \pmod{n} and a^{(n-1)/q} \not\equiv 1 \pmod{n} for all prime divisors q of n-1, then n is prime (and a is a primitive root mod n).  Lucas viewed this as the correct “converse” to Fermat’s Little Theorem.

6. The Chebyshev polynomial f(x) = x^2 - 2 which one iterates in the Lucas-Lehmer test is very special from the point of view of algebraic dynamics (for example, one can explicitly write down its preperiodic points and its Julia set is the real segment [-2,2]).  I don’t know if this connection has any significance…

3 thoughts on “Two applications of finite fields to computational number theory

  1. I updated this post based on helpful comments and corrections by Keith Conrad. Keith pointed out that in the proof of Theorem 2, a little more should be said about the compatibility of the calculation of (2+sqrt(3))^((n+1)/2) in (Z/n)[sqrt(3)], as the theorem requires, and its calculation in the field F, which is carried out only in the proof. I will clarify this in my next post (coming soon!). Keith also noted: “In the 4th observation, this use of GRH is for Dirichlet L-functions. If Dirichlet L-functions were proved to be zero free for Re(s) > 1 – epsilon then the least quadratic nonresidue mod p would be O((log p)^(1/epsilon)), which would make the algorithm polynomial-time as long as there were any vertical strip to the left of Re(s) = 1 that is zero-free for all Dirichlet L-functions.”

    Reply
  2. Pingback: Lucas sequences and Chebyshev polynomials | Matt Baker's Math Blog

Leave a comment