Removing a number and reinserting all its multiples (by the primes in the set) into a priority queue is wrong (in the sense of the question) - i.e. it produces correct sequence but inefficiently so.
It is inefficient in two ways - first, it overproduces the sequence; second, each PriorityQueue operation incurs extra cost (the operations remove_top
and insert
are not usually both O(1), certainly not in any list- or tree-based PriorityQueue implementation).
The efficient O(n) algorithm maintains pointers back into the sequence itself as it is being produced, to find and append the next number in O(1) time. In pseudocode:
return array h where
h[0]=1; n=0; ps=[2,3,5, ... ]; // base primes
is=[0 for each p in ps]; // indices back into h
xs=[p for each p in ps] // next multiples: xs[k]==ps[k]*h[is[k]]
repeat:
h[++n] := minimum xs
for each ref (i,x,p) in (is,xs,ps):
if( x==h[n] )
{ x := p*h[++i]; } // advance the minimal multiple/pointer
For each minimal multiple it advances its pointer, while at the same time calculating its next multiple value. This too effectively implements a PriorityQueue but with crucial distinctions - it is before the end point, not after; it doesn't create any additional storage except for the sequence itself; and its size is constant (just k numbers, for k base primes) whereas the size of past-the-end PriorityQueue grows as we progress along the sequence (in the case of Hamming sequence, based on set of 3 primes, as n2/3, for n numbers of the sequence).
The classic Hamming sequence in Haskell is essentially the same algorithm:
h = 1 : map (2*) h `union` map (3*) h `union` map (5*) h
union a@(x:xs) b@(y:ys) = case compare x y of LT -> x : union xs b
EQ -> x : union xs ys
GT -> y : union a ys
We can generate the smooth numbers for arbitrary base primes using the foldi
function (see Wikipedia) to fold lists in a tree-like fashion for efficiency, creating a fixed sized tree of comparisons:
smooth base_primes = h where -- strictly increasing base_primes NB!
h = 1 : foldi g [] [map (p*) h | p <- base_primes]
g (x:xs) ys = x : union xs ys
foldi f z [] = z
foldi f z (x:xs) = f x (foldi f z (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
pairs f t = t
It is also possible to directly calculate a slice of Hamming sequence around its nth member in O(n2/3) time, by direct enumeration of the triples and assessing their values through logarithms, logval(i,j,k) = i*log 2+j*log 3+k*log 5
. This Ideone.com test entry calculates 1 billionth Hamming number in 1.12 0.05 seconds (2016-08-18: main speedup due to usage of Int
instead of the default Integer
where possible, even on 32-bit; additional 20% thanks to the tweak suggested by @GordonBGood, bringing band size complexity down to O(n1/3)).
This is discussed some more in this answer where we also find its full attribution:
slice hi w = (c, sortBy (compare `on` fst) b) where -- hi is a top log2 value
lb5=logBase 2 5 ; lb3=logBase 2 3 -- w<1 (NB!) is (log2 width)
(Sum c, b) = fold -- total count, the band
[ ( Sum (i+1), -- total triples w/this j,k
[ (r,(i,j,k)) | frac < w ] ) -- store it, if inside the band
| k <- [ 0 .. floor ( hi /lb5) ], let p = fromIntegral k*lb5,
j <- [ 0 .. floor ((hi-p)/lb3) ], let q = fromIntegral j*lb3 + p,
let (i,frac) = pr (hi-q) ; r = hi - frac -- r = i + q
] -- (sum . map fst &&& concat . map snd)
pr = properFraction
This can be generalized for k base primes as well, probably running in O(n(k-1)/k) time.
(see this SO entry for an important later development. also, this answer is interesting. and another related answer.)