SIAM J. COMPUT.

1981 Society for Industrial and Applied Mathematics

Vol, 10, No. 1, February 1981

MARTIN H. ELLIS AND J. MICHAEL STEELE,

Abstract. An algorithm is given which makes only 0(log n) comparisons, and which will determine the ordering of the uniformly distributed (pseudo random) Weyl sequences given by (ka) mod 1 ,

where a is an unspecified irrational number. This result is shown to be best possible in the sense that no algorithm can perform the same task with fewer than Ol(log n) comparisons.

Key words. sorting, Weyl sequences, information theory lower bound, alpha-sort

1. Introduction. Any algorithm which sorts sets of n real numbers only on the basis of comparisons will always require, in the worst case, at least 1092 (n!) = 0(n 1092 n) comparisons. Similarly, if n reals are chosen at random from any continuous distribution, the expected number of comparisons required for sorting them is also 0(n 1092 n). These familiar facts may make it surprising that there are sequences which share many properties with random sequences, but whose order can always be determined with fewer than 4 1092 n comparisons.

The sequences considered here are the so called Weyl sequences given by XIC = ka mod 1, where a is an irrationafnumber. These sequences share with the independent uniformly distributed random variables the basic property that the number of elements from X,, X2, X,, in (a, b) is asymptotic to n (b a), f or 0 a < b

(For a purely probabilistic proof of this property, see Feller [2, p. 268].) Since the Weyl sequences are "uniformly distributed" in the sense described, Franklin [3] has further examined the pseudorandom virtues of {Xk} by a variety of statistical tests. This inherent randomness, together with their rich and well studied mathematical structure, makes it intriguing to see just how efficiently the Weyl sequences can be ordered.

The principal objective of this paper is to provide an algorithm which determines the order of X,, X2, X. on the basis of fewer than 4 1092 n comparisons. We further show that any algorithm for sorting {(ka) mod 1: 1 :_5 k :_5 n} by comparisons must make at least £1(10g2 n) comparisons, so the algorithm given here is the best possible.

One key motivation for studying the sorting of Weyl sequences is the general question: "How does one use the fact that a sequence is of a certain structure to provide a sorting algorithm which is information theoretically optimalT' This problem was explicitly posed in M. L. Friedman [4] and is implicit in Berlecamp's problem on sum set sorting (see, e.g., Harperi et al. [51]).

A second motivation for studying the sorting of Weyl sequences by comparisons is

provided by recent work of Papadimitriou on efficient search for rationals. Papadimi

triou [6] gives an elegant algorithm which establishes that 0 (log M) queries of the form

"is X :_5 plq ", where p, q 5 M, are sufficient to determine any rational x = alb with

a, b:SM. The present algorithm is quite distinct from Papadimitriou's in method

(relative comparisons vs. absolute comparisons) and in purpose (sort vs. search). Still,

there is a close connection s 1 ince (as the following sections implicitly show) the ordering

Received by the editors December 18, 1978 and in final revised form May 16, 1980. This research was supported in part by the National Science Foundation under grants MCS 7703659 and MCS 7716974.

t Professor Martin H. Ellis of the Department of Mathematics, Northeastern University, Boston, Massachusetts, died on February 15, 1980, after a brief illness.

Department of Statistics, Stanford University, Stanford, California 94305.

0, fl, 0, o denote "order of exactly", "order of at least", "order of at most", and "order of less than", respectively.

88

Jil 11,111111

FAST SORTING OF WEYL SEQUENCES USING COMPARISONS 89

of {(ka): 1 k 25 n} is closely connected to the location of the irrational a in the Farey dissection of the unit interval.

In the next section, we give an algorithm called AlphaSort, which is a very simple procedure which sorts any collection of the form {(ka)mod 1: l:5 k:5 n} with fewer than o(n) comparisons. The third section then uses the structures uncovered by AlphaSort to provide the required information theoretic lower bound f10092 n). The fourth section applies a binary search, speedup of AlphaSort which gives an explicit algorithm which perforras as well as the theoretlcal lower bound can permit. The final section makes a brief speculation about the use of sorting as an appropriate measure of complexity of a pseudo random sequence.

15k:5r. For brevity, we will subsequently write (ka) for the representative of (ka) mod 1 in [0, 1). The key idea for efficient sorting of {(ka): 1 '_ k:!:: n} is that the order structure can be completely determined from the largest and smallest elements of the set. We define L* and R * to be the integers in (1, 2, n} satisfying (L*a) = Iniril;s~k 5~ (ka) and (R *a

rtiaxl,5ks,, ~ka). The AlphaSort algorithm shows how one can compute L* and R*, and how these integers can be used to determine the ordering of (ka), 1 k n.

AlphaSort algorithm. Given Xk = (ka) mod 1, 1 t_s k:5 n, this algorithm returns il, 6 ' . . , i. such that Xi, < Xi:, < ... < Xi~.

Al. [Initialize] Set L x 1, R,( 1, M.( 1.

A2. [Compute L* and R*] While L + R:5 n, set R < L + R if XL+R1 < XL+R; otherwise, set L,( L+ R.

A3. Print ML mod (L + R) if ML mod (L + R) n.

A4. Set M < M + 1. If M < L + R go to A3; otherwise end program.

The fact that AlphaSort correctly performs the task of sorting {(ka): 1 k:5 n} with 0 (n) comparisons, will follow from the next two lemmas. These elementary results will form the theoretical core for the rest of the analysis.

LEMMA 1. Suppose min {(a), (2a), (ja)} = (La) and max {(a), (2a),

(ja)} = Q?a), L, R e {1, 2,. j}. Then

(i) min {(a), (2 a ((L + R 1)a) (La),

(ii) mak {(a), (2a), ((L + R 1)a)} = (Ra),

(iii) either ((R + L)a) < (La) or ((R + L)a) > (Ra).

Proof. (i). If 1 H < R and ((L + H)a) < (La), then ((L + H)a) 0 (La,) + (Ha); hence,

(1)

(La) + (Ha) > 1.

The definitions of L and R and (1) imply

(2)

This contradiction establishes (i).

(Ra)=(Ra)(Ha)+(Ha)

((RH.)a)+(Ha)

~t(La)+(Ha)

> 1.

(ii). If l:5 H < R and ((L + H)a) > (Ra), then the definitions of. L and R imply ((L + H R,)a) = ((L + H)a) (Ra) = (La) + (Ha) (Ra)

< (La).

90

MARTIN H. ELLIS AND J. MICHAEL STEELE

This is a contradiction to the fact that ~(L +H R)a~>~La~.

(iii). Either

~(R + L)a ~ = ~Ra) + ~La ~

> (Ra),

or

so (iii) is also established. n

((R + L)a ~ = ~Ra) + ~La ~ 1

< ~La),

LEMMA 2. For (La) = min {(ka): 1:5 k:5 n} and (Ra) = max {(ka): 1 S k n} we have (La) = (m a)< (M2a) < ... < ~msa) = (Ra~ where Mk = U mod (L +R) and S = L + R 1. Furthermore, L and R are relatively prime, and {mi: l:5 i S} is a permutation of the numbers 1, 2, . . . , S.

Proof. First we will show by induction on n that L and R are relatively prime. If n = 1 then L = R = 1, so L and R are relatively prime. Suppose the assertion is true for n = 1. If the maximum and minimum remain unchanged when n is raised to 1 + 1, they remain relatively prime. If not, then Lemma 1 implies that 1 + 1 = L + R and either L or R (but not both) must be replaced with L +R. Since

i

gcd {L, L + R} = gcd {L + R, R} = gcd {L, R} = 1,

the assertion is established.

Since L and R +L are relatively prime,

Emia): l:5 i:5 S} = ffla): 1 i S S}.

Suppose for some l:5 i S

. (1)

if

(2)

then (1) implies

(mia)>(mi+la).

(mi+la) = ((mi +L)a),

(mi+la)=(mia)+(La)1

< (La);

this is impossible since by Lemma 1 ffi, (La) must be minimal. Since (2) fails to hold,

(3)

(mi, a) = ((mi R)a).

Since R = ms and i < S, Lemma 1 (ii) impI ies (mia) < (Ra). But then,

(mi+la) = (t~ia) (Ra) + 1

> (mia),

contradicting (1). Thus, (mia)<(mi+ia) for 1:5i.~~
The first lemma proves that step (A2) of AlphaSort correctly determines L* and R*. The second lemma shows how these two quantities completely determine the ordering of {(ka): 1:5 k ;9 n}. We actually showed that L* and R* determine the ordering of the larger set {(ka): l:5 k S L* + R * 11; step (A3) of AlphaSort deletes the irrelevant members from the ordering of the larger set. ~~

1

1

FAST SORTING OF WEYL SEQUENCES USING COMPARISONS 91

Since each comparison perfornied byAilplillaSod either L or R, and since

these qu,2..,fitities never exceed n, the total number of comparisons perfoirned is at most

0(n). One can actually show that for any fixed a, as re inci.eases only o(n) comparisons

are required. In fact, one can show that for almost every a (in. the Sense of Lebesque

Ineasure), for every E>0 AlphaSort wiH fiDd L* LI* with

Still, by irrationals like 10"~

is a rapidly sequence, one e2A.~ also o(p) is the

which one can make albout tile of by tilpha

Sort. We do riot need to elaberate orh these points s~"r3(~e the section, wil'kl shov,7 that

0(n) is, far from thee'retically optinnal, and the final section will sharpen AlphsSort t3

attain that optimal rate.

In the second lemma of the preceding section, we saw that the two quantities L* and R* completely determine the ordering of {~ka): 1 5 k :5 n}. We will now show that this suggests that it may be possible to sort

125 k :_5 n} with only 0 (lo,,Y2 rt) comparisons, but no fewer.

the fact that 's hinary tree with m leaves rnast at least

1092 (m), and the fact that there are n! orderings of n real numbers, collectively imply

that at least 0(n 1092 n) comparisons are required to determine their order. This

information theoretic perspective makes it interesting to determine E,,, the total

number of orderings of {(ka): 1 k :_5 n} as a varies through all real values.

Explicitly, we let JAI denote the cardinality of a finite set A and let o, denote any

permutation of {1, 2, n}. For

1

A

1

1

1 1

En = 1{a: for some a E (0, 1)(o(1)a) < (o(2)a) < ... < (cr(n)a)}1,

we have the following fact.

2

PROPOSITION. n =c En ~ n

Proof. For any a, we have (by Lemma 2 above) that there are integers l:5 L* = n,

1 R n 2 which completely determine the ordering of {(ka): 1 k:5 n}. Since there

are only n such pairs L* and R*, the upper bound is established.

To see that E. n, we consider the n irrationals defined by CQ = 1 / k e for k = 1, 2,. . . , n and some very small positive irrational c. For CQ one can see that (ak) < (2ak) < ... < (kak) but ((k + 1W < (ak). The (ak) thus each yield a different ordering, so E,, n as claimed.

. Since the conclusions to be drawn from this proposition depend only on 0 (1092 E)q we have obtained only the simplest bounds. One can actually show that if .0 is the Euler phifunction, we have E. = X k...,, 0 (k) = 3 / ir 2 n 2 + 0(n log n) (for facts on 0 (k) see [11). The proposition immediately establishes the following result.

COROLLARY. At least f10092 n) comparisons are required in order to sort {(ka,): 1 k 5 n}.

The upper bound in the preceding proposition also suggests that, it might be possible to sort {(ka): 1;9 k:5 n} with only 0(log2 n) comparisons. The main objective of the next section will be to show that this is in fact the case.

4. Fast AlphaSort: An 0(log n) algorithm. In Lemma 2, we proved that the ordering of S = Ea), (2a), . . . , (na)} is completely determined by L* and R* where (L*.a)minl:gkl.(ka) and (R*a)=maxlk;,,,(ka). Now we will show how AlphaSort can be improved to compute these values with only 0(1092 n) comparisons.

The improvement over AlphaSort is made by replacing the linear process for computing L and R by a geometric process. The details are somewhat complicated due to the presence of several cases, but the conceptual essence of the matter is brought out

i

P

i

1

1

i

1

i

1

'i

1 1

i

1

92

MARTIN H. ELLIS AND J. MICHAEL STEELE

in the following Lemma 3, which shows essentially that if certain conditions are valid at

times k and 2k they are valid at afl tinies.

DEril'~,'ITXOIII. A value (ja) is called a left extreme (respectively right extrerne) if (ja)

is, Che (respectively of {~ic,,): 1 :_5 i j}.

LT7I,4D,4A 3. Letk be a positive s6~,prsose (R(,j) is a right extroine and eauk

(L 4. i.R ) a 0 :_5 i :~ k, is o IIt ex t7I.3 1~ ((L + 2 kR) < '(L, + kR 0~ea

1, 4. 1 '_5 g' n~ 2k, is a

F~. of~ L(: ', ~. ' ((!, 0:5 i is 21

'be, a in

Le~ f be Che. for ((L is noc 2,

ex t 2 c Yni e, a n ds' S U p rjnjs, e 2 k. Le n a 1 i i n~ ,,r) 1 i e 2 i a z {~ cl ), (2 01 )

(f 1)R)a)} = (Ra), and since ((L + 1),R)a~,) is a left extrerfle, 1 (iii) inifflies

((L +jR,)a) is an, extrerne; by cholee of 'i ill. is no! a left so it is a right extreme,

hence,

((L > (Ra).

((L + iR)a) ((L + (i + 1)R)a,) = 1 (Ra), for all 0 l~ i < k.

Since ((L+ 1R)a) >(La), (2) implies

(3) ((L + (i + i)R)a) ((L + (i + i + 1)R)a) = 1 (Ra), for all 0 i < k.

Expressing ((L + 2kR)a) and ((L + kR)a) as telescoping sums and applying (1),

(2), (3) and the fact that at k + 1, we have

(4) ((L + UR)a ((L +fR)a)

i

1 i

1

1

1

1 1

2k~il

+ Y_ (((L+(!+i+l)R)a)((L+(j+i)R)a))

i0

> (Ra) (2k j)Q (Ra))

= 1 (2k + 1 j)Q (Ra))

~ 1 k(l (Ra))

> (La) k(l (Ra))

k1

=(La)+ Y_ (((L + (i + 1)R)a) ((L + M)a))

i0

= ((L + Ma).

Inequality (4) shows that if there is a 1 E {i: k + l;9 i;g 2k} for which ((L +iR)a) is not a left extreme, then ((L + UR)a) > ((L + kR)a). The lemma follows by contraposi ion.

A similar argument establishes the following result.

LEmMA 4. Let 1 be a positive integer, and suppose that each ((R + iL)a), 0 i 9 1, is a right extreme. If ((R + 21L)a) > ((R + ILW, then each ((L + iR)a), 1 + 15 i 21, is a right extreme.

1 ' Besides serving to prove the validity of the following Fast AlphaSort algorithms, the preceding lemmas should also serve to motivatethe algorithm. As a tool for use within Fast. AlphaSort, we will require a binary search procedure which we . call SEARCHW R, v, z). The parameters L, R, v are integers provided in the course of the Fast AlphaSort Algorithm, and z is either 0 or 1, depending on whether the algorithm

1

1

i

1,

1

FAST SORTING OF WEYL SEQUENCES USING COMPARISONS 93

is lookincy for a new candidate for a left extreme or a right extreme. (The duality

c

between the left and right procedures can be immediately seen, but for clarity we will not strain to unify the two.)

SEARCH(L, R, v, z).

S1. If v = 1 set SEARCH(L, R, v, z).~ 1 and stop.

S2. If v = 2 set SEARCH(L, R, v, z) 3 and stop if z = 0 and L. + 3R :_5 n and ((L + 2R)a) > ((L + 3R)a), or if z = 1 and 3L + R S n and ((2L + R)a) < ((M + R)a ~; otherwise if v = 2 set SEARCH(L, R, v, z),~ 2 and stop.

S3. If v ~t 3 set S < 3 .2' 2 and set T < 2`3.

S4. If z = 0 and L + SR t_5 n and ((L + 21SR)a) > ((L + SR)a), or if z = 1 and SL+R_n and ((2'SL+R)a)<((SL+R)a), set SS+T; otherwise set SST

S5. Set T 12 T If T 1 go back to S4.

S6. If z = 0 and L + SR 5 n and ((L + (S 1)R)a) > ((L + SR)a), or if z = 1 and .SL+R25n and (((S1)L+R)a)<((SL+R)a), set SEARCH(L,R,v,z) x S and stop; otherwise set SEARCH(L, R, v, z) <. S 1 and stop.

The SEARCH subroutine is used in the Fast AlphaSort algorithm, and the role it plays there is described in the proof of Lemmas 6 and 7.

Fast AlphaSort. Given Xk = (ka), 1 t5 k:5 n, this algorithm returns L~ and R* such that (aL*) = minirk;g,, (ka), (aR*) = max,~k.,,, (ka).

FAL Set L * 0, R ~ 1.

FA2, Starting with k=l, increment k until either L+21'R>n or ((L+ 2 k 1R)a) < ((L + 2 k R)a).

FA3. Set L,( L + R SEARCH(L, R, k, 0). If L + R > n go to FA6.

FA4. Starting with 1 = 1, increment 1 until, either R + 2L > n or ((R + 2'_1L)a) > ((R + A)a).

FAS. Set R < R + L SEARCH(L, R, 1, 1). If L + R n, go to FA2.

FA6. Let L* = L and R* = R, then stop.

The main result of this section is the following:

THEOREM. The Fast AlphaSort Algorithm returns L* and R* after at most 0 (log n) comparisons between pairs in {(ia); 1 i 9 n}.

Before proving the theorem we will establish three lemmas.

LEMMA 5. Computing SEARCH(L, R, p, z) requires at most p 1 comparisons between pairs in {(ia): l:5 i n}.

Proof. We simply dissect the possibilities. If p = 1, no comparisons are made. If p 2, the only comparison that may be required is between ((L + 2R)a) and ((L + 3R)a) if z = 0, between ((2L + R)a) and ((M + R)a) if z = 1. If p ~ 3, then T is set to 2' 3 and single comparisons or no comparisons alternate with dividing T by 2, until T < 1. Thus, at most p 2 comparisons are made before T becomes less than one. A single additional comparison may be made in (S6), for a total of at most p 1 comparisons. E]

LEMMA 6. Suppose Fast AlphaSort has just ente * red step (FA2), L = p, R = q and

(qa).is a right extreme. If each ((p + iq)a), 1 = i;g j, is a left extreme but ((p + (1 + 1)q)a)

1 . S not a left extreme, Fast AlphaSort will setL equal to p+q * min (f, (n p)l )after

making at most 1 + 2 min Q292j p)lq) additional comparisons between pairs

j092 ((n In {(ia): 1:5 i :_5 n}.

Proof. Let b = 1092 and let c j092 ((n p)lq) .

b41

If p+2 , q;Sn, Fast AlphaSort will sequentially compare ((p+2' q)a) with

1

1

i .

i

1

1

94

MARTIN H. ELLIS AND J. MICHAEL STEELE

~(p+2'q)a~, It will then compute SEARCH(p,q,b+1,0), which by

Lemma 5 requires at most b additional comparisons, after which it will set L,~ p + qj.

The total number of comparisons made is thus at most 1 + 2b.

lfp+f125n

If n < p + iq, Fast Alpha Sort will sequentially compare ((p + 211q)a) with ((p +

2` q)a), Upon learning that p+2'+1q>n, it will then compute

SEARCH(p, q, c + 1, 0), which by Lemnia 5 requires at rnost c additional comparisons,

after which it will set Lp +J(n p)lq],q~ The total number of comparisons made, is

thus at most 2c.

In any case, the total number of comparisons is at most 1 + 2 min (b, c). G

The following lemmas can be proved analogously to Lemma 6.

LEM11/fA 7. Suppose Fast A lpha Sort has just en tered step (FA4), L = p, R = q, a nd ~pa, ) is a left extre~,~ie. If each ffip + q)a), 1 _:5 i S_ j is a righ t ext.,ewie bu t (((j + 1)p + q) a) is not a right extreme, Fast AlphaSort will set L equal to q + p * min (j,,, (r q)lp) Cfter making at most 1 + 2 min ~~, j092 ((n 1)1p) additional comparisons between pairs in Wa): 1 :_5 i 5 n}.

Proof of Theorem. Lemmas 6 and 7 imply that Fast AlphaSort correctly computes L* and R*. It remains to show that the number of comparisons between pairs in {(,ia): l:5 i_5 n} made by Fast AlphaSort in computing L* and R* is 0(log n).

Assume Fast AlphaSort has just computed L* and R*. Let

1 =qO

denote the values taken by R during the course of Fast AlphaSort, let

O=P1

denote the values taken by L during the course of Fast AlphaSort, and let m = max (2 V, 2 W 1). (Note that m = 2 V = 2 W if step (FA6) was entered from (FAS) and m = 2 W 1 = 2 V + 1 if step (FA6) was entered from (FA3).) For l:5 i 5_ m, let ji pilqil if i is odd and let jl,=,qilpil if i is even. Note

(1)

m

11 ii:s max (L*, R *):5 n.

J1

Lemmas 6 and 7 imply that the number of comparisons between pairs in {(ia): l:5 i 9 n} made by Fast AlphaSort in computing L* and R* is bounded above by m

(1 +,2 1092

which by (1) is bounded by

(2)

m + 2 1092 m + 2 log2 n.

The largest value m can have would occur if the pi afid qi grew as slowly as possible (i.e.,

ji = 1 for 1:5 i:s m), in whid case each pi or qi would be the (i + 1)st number in the

Fibonacci sequence. In this case,

(3)

1092 n

where 0 = (VS + 1)/2. Inequalities (2) and (3) show that the number of comparisons

1

1

1

FAST SORTING OF WEYL SEQUENCES USING COMPARISONS 95

between pairs in {(ia): i:5 n} made by Fast AlphaSort in computing L* and R* is

bounded above by

1 + (2 + (1092 (k) ') 1092 n,

which establishes the theorem. n

5. A brief speculation. The introduction isolated two motivations, for studying the sorting of Weyl sequences, and a third motivation was deferred until now. This comes from the problem of measuring the complexity of a class of sequences and using this measurement to aid one's choice of pseudorandom number generators. The Weyl sequences are not genuine candidates for pseudorandorn numbers, and this is reinforced by the speed with which they are sorted. One would especially like to determine the number of comparisons needed to sort sequences generated by the widely used classes of PRN generators. This analysis has many practical and conceptual complications, but the Weyl sequences can be considered a preliminary case in this wider program.

REFERENCES

[1] T. M. APOSTOL, Introduction to Analytic Number Theory, SpringerVerlag, New York, 1976.

[21 W. FELLER, An Introduction to Probability Theory and Its Applications, Vol. 2, Second Ed., John Wiley, New York, 1971.

[31 J. N. FRANKLIN, Deterministic simulation of random processes, Math. Comp., 17 (1963), pp. 2859.

[4] M. L. FRIEDMAN, How good is the information theory bound in sorting, Theoret. Coniput. Sci., 1 (1976), pp.355361.

[5] L. H. HARPER, T. H. PAYNE, J. E. SAVAGE AND E. STRAUSS, Sorting X+ Y, Comm. AM, 18 (1975), pp. 347349.

[6] C. H. PAPADIMITRI0u, Efficient Search for Rationals, Technical Report 0 1 7 8, Center for Research in Computing Technology, Harvard University, 1978.