As a matter of proper naming, merge
shouldn't be removing duplicates, as its name suggests its being part of mergesort
which ought to preserve them. Union
is a better name for such operation, working with sets represented (here) by increasing lists of unique numbers, which constraint it ought to preserve by removing the duplicates that can only come from both of its arguments.
Back to the problem itself, let's write it symbolically as
S235 = {1} ∪ 2*S235 ∪ 3*S235 ∪ 5*S235
Premature implementation is the mother of all evil! (wait, what?) We won't even yet try to establish how exactly those ∪
s do their job, not even in which order. Or even how many of the terms there are there:
S23 = {1} ∪ 2*S23 ∪ 3*S23
or even
S2 = {1} ∪ 2*S2
Now this looks simple enough. We can even fake-implement the union of A
and B
here simply as, first, taking all the elements of A
, and then -- of B
. And it will work just fine, here, because there's only one element in this ∪
's left input:
{1} ----∪-->--->--S₂--.--->S₂
/ \
\______*2_______/
---<----<---
How does this work? 1
enters the ∪
combiner, exits it first, unconditionally (NB this discovered requirement is important, for if ∪
had to examine both of its arguments right away we'd get ourselves an infinite loop, a black hole in Haskell argot), is split in two by the .
splitter, then the first copy of 1
continues forth to the output point while the second copy of 1
goes back through the *2
multiplier, the resulting 2
enters back the ∪
this time on the right, unopposed by anything on the left (which is at this time already empty), and continues on in the same fashion so 2
goes to the output point, then 4
, then 8
, etc. etc. (mumble mumble the looping control path of the recursive definition has become the data path looping back onto itself mumble mumble).
To put it differently, S₂
contains all elements of {1}
; plus all elements of {1}
that went through the *2
multiplier once; and twice; and three times; and so on and so forth -- all the powers of 2 in increasing order:
S2 = {1} ∪ 2*{1} ∪ 2*2*{1} ;; == {1, 2, 4, 8, 16, 32, ...}
∪ 2*2*2*{1}
∪ 2*2*2*2*{1}
∪ ..........
The two S₂
's in the diagram are the same because whatever we siphon from it at the splitter point does not affect it.
Wasn't this fun?
So how do we go about adding the multiples of 3
to it? One way to do it is
S23 = S2 ∪ 3*S23
{1} ----∪-->--->--S₂--.---S₂----∪-->--->--S₂₃--.--->S₂₃
/ \ / \
\______*2_______/ \______*3________/
---<----<--- ---<----<---
Here 1
from S₂
enters the second ∪
combiner and proceeds to the output point S₂₃
as well as back through the *3
multiplier, turning into 3
. Now the second ∪
has 2,4,8,...
and 3,...
as its inputs; 2
goes through as well as turning into 6
. Next, ∪
has 4,8,16,...
and 3,6,...
; 3
goes through. Next, 4
; etc., etc., and so on and so forth.
Thus all elements of S₂
are part of S₂₃
, but so are also all elements of S₂
that went through the *3
multiplier once, and twice, etc., -- all the powers of 2 and 3 multiplied together, in increasing order:
S23 = S2 ∪ 3*S2 ∪ 3*3*S2 ;; = S2 ∪ 3*( S2 ∪ 3*S2
∪ 3*3*3*S2 ;; ∪ 3*3*S2
∪ 3*3*3*3*S2 ;; ∪ 3*3*3*S2
∪ .......... ;; ∪ ........ ) !!
Why the increasing order? How? Why, that is the responsibility of ∪
! Hello, another discovered requirement. Whatever enters it on either side, it must produce the smaller element before the larger one.
And what is it to do in the event the two are equal? Do we even need to concern ourselves with this question in this here scheme? Can this ever happen, here?
It can't. And so we can implement the ∪
here as a merge
, not as a union
(but remember the first discovered requirement! -- is it still valid? needed? with the addition of new cases). Merge
ought to be more efficient than union
as it doesn't concern itself with the case of equals.
And for the multiples of 5 also? We continue, as
S235 = S23 ∪ 5*S235
{1} ----∪-->--->--S₂--.---S₂----∪-->--->--S₂₃--.---S₂₃----∪-->--->--S₂₃₅--.--->S₂₃₅
/ \ / \ / \
\______*2_______/ \______*3________/ \_______*5________/
---<----<--- ---<----<--- ---<----<---
- Does this describe the code from the book? _______
- Does this describe a code which is about twice faster than the one from the book? _______
- Why is it about twice faster than the code from the book? _______
- Does this answer your question? _______
- Does this help you answer your question? _______
(fill in the blanks).
See also:
And the signal processing block diagram for the book's code is:
1 --->---.
\
cons-stream ->-- S ---.---> S
.---->---->--- *2 --->---. / |
/ \ / |
/ union ->--* .
.-->-- *3 -->--. / /
| \ / /
| union ->--* /
| / /
.-->-- *5 -->--* /
\ /
*--------<--------<--------<--------<---------<---------*
where the duplicates-removing "union" is called merge
in the book, implementing the original formula
S235 = {1} ∪ (2*S235 ∪ (3*S235 ∪ 5*S235))
with a specific ordering, and correspondingly different implementation of ∪.
tee
function with its intricacies). then, switching back to the streams, you'll be able to appreciate how it is done automatically (and / or better) by the streams, and even see the different choices in the possible streams' implementations. – Olgaolguin