Building a Binary Tree (not BST) in Haskell Breadth-First

I recently started using Haskell, and it will probably only be for a short while: I've been asked to use it to better understand functional programming for a class I am taking at Uni.

Now I have a slight problem with what I am trying to do: I want to build the tree breadth-first, but I think my conditions are messed up, or just plain wrong.

So essentially, if I give it ["A1-Gate", "North-Region", "South-Region", "Convention Center", "Rectorate", "Academic Building1", "Academic Building2"] and [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2], my tree should come out like

[image: the tree layout I am trying to achieve]

But my test-run results are, haha, not what I expected, so maybe a sharp-eyed Haskell expert can help me spot what I am doing wrong. Output:

*Main> l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center", 
             "Rectorate", "Academic Building1", "Academic Building2"]
*Main> l3 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
*Main> parkingtree = createBinaryParkingTree l1 l3
*Main> parkingtree
Node "North-Region" 0.5 
   (Node "A1-Gate" 0.0 EmptyTree EmptyTree) 
   (Node "Convention Center" 0.3 
     (Node "South-Region" 0.7 EmptyTree EmptyTree) 
     (Node "Academic Building2" 1.4 
       (Node "Academic Building1" 1.2 EmptyTree EmptyTree) 
       (Node "Rectorate" 0.6 EmptyTree EmptyTree)))

A1-Gate should be the root, but it ends up as a child with no children, so my conditions are pretty messed up.

If I could get some guidance it would help. Below is what I've written so far:

data Tree = EmptyTree | Node [Char] Float Tree Tree deriving (Show,Eq,Ord)

insertElement location cost EmptyTree = 
   Node location cost EmptyTree EmptyTree
insertElement newlocation newcost (Node location cost left right) = 
   if (left == EmptyTree && right == EmptyTree)
   then Node location cost (insertElement newlocation newcost EmptyTree) 
                           right
   else if (left == EmptyTree && right /= EmptyTree)
        then Node location cost (insertElement newlocation newcost EmptyTree) 
                                right
        else if (left /= EmptyTree && right == EmptyTree)
             then Node location cost left 
                                (insertElement newlocation newcost EmptyTree)
             else Node newlocation newcost EmptyTree
                                (Node location cost left right)

buildBPT [] = EmptyTree
--buildBPT (xs:[]) = insertElement (fst xs) (snd xs) (buildBPT [])
buildBPT (x:xs) = insertElement (fst x) (snd x) (buildBPT xs)

createBinaryParkingTree a b = buildBPT (zip a b)

Thank you for any guidance that might be provided. Yes, I have looked at some of the similar questions; I do think my problem is different, but if you think a certain post has a clear answer that will help, I am willing to go and take a look at it.

Eyespot answered 3/3, 2020 at 22:31 Comment(5)
You say your results are wrong. Please include an example usage, and the incorrect output that it gives.Amor
Breadth-first is a difficult direction to build a tree in, whether in a functional or an imperative language. Functional trees are typically much easier to build bottom-up, creating a parent node to merge the results of the nodes below it. #2839170 looks relevant.Amor
@Amor Ah thanks, I'll check that question out, and sure, I will make an edit to show the output it gave.Eyespot
@Amor I can build a tree in bfs order quite easily in Prolog, or any language with mutable structure like Scheme or C. takes O(n) time and O(n) auxiliary space. It's essentially a one-liner in Prolog: bfst(Xs,T):- bfst(Xs,[T|R],R). bfst(Xs,Ns,Z):- Xs=[] -> Z=[], maplist(=(empty),Ns) ; Xs=[X|Xs2], Ns=[node(X,L,R)|Ns2], Z=[L,R|Z2], bfst(Xs2,Ns2,Z2)..Tridentine
@WillNess, it's not too hard to do efficiently in Haskell (see my solution, based on amalloy's general thought), but I haven't yet found a truly beautiful way.Mondragon

Here's a corecursive solution.

{-   bft(Xs,T) :- bft( Xs, [T|Q], Q).    % if you don't read Prolog, see (*)

     bft(     [],    Nodes ,      [])  :-  maplist( =(empty), Nodes).
     bft( [X|Xs], [N|Nodes], [L,R|Q])  :-  N = node(X,L,R),
        bft( Xs,     Nodes,       Q).
-}

data Tree a = Empty | Node a (Tree a) (Tree a) deriving Show

bft :: [a] -> Tree a
bft xs = head nodes    -- Breadth First Tree
  where
  nodes = zipWith g (map Just xs ++ repeat Nothing)  -- values and
                                                     --   Empty leaves...
                    (pairs $ tail nodes)             -- branches...
  g (Just x) (lt,rt)  =  Node x lt rt
  g Nothing  _        =  Empty
  pairs ~(a: ~(b:c))  =  (a,b) : pairs c
{-
  nodes!!0 = g (Just (xs!!0)) (nodes!!1, nodes!!2)            .
  nodes!!1 = g (Just (xs!!1)) (nodes!!3, nodes!!4)        .       .
  nodes!!2 = g (Just (xs!!2)) (nodes!!5, nodes!!6)      .   .   .   .
  ................                                    .................
-}

nodes is the breadth-first enumeration of all the subtrees of the result tree. The tree itself is the top subtree, i.e., the first in this list. We create Nodes from each x in the input xs, and when the input is exhausted we create Emptys by using an indefinite number of Nothings instead (the true number of Empty leaves will be length xs + 1, but we don't need to care about that).

And we didn't have to count at all.

Testing:

> bft [1..4]
Node 1 (Node 2 (Node 4 Empty Empty) Empty) (Node 3 Empty Empty)

> bft [1..10]
Node 1 
   (Node 2 
      (Node 4 
         (Node 8  Empty Empty) 
         (Node 9  Empty Empty)) 
      (Node 5 
         (Node 10 Empty Empty) 
         Empty)) 
   (Node 3 
      (Node 6 Empty Empty) 
      (Node 7 Empty Empty))

How does it work: the key is g's laziness: it forces neither lt's nor rt's value, while the tuple structure is readily served by pairs, which is very lazy in its own right. So both are just like the not-yet-set variables in that Prolog pseudocode (*) when served as the 2nd and 3rd arguments to g. But then, for the next x in xs, the node referred to by this lt becomes the next invocation of g's result.

And then it's rt's turn, etc. When xs ends and we hit the Nothings, g stops pulling values from pairs's output altogether, so pairs stops advancing along nodes as well. nodes is thus never traversed to the end, even though it is defined as an unending stream of Emptys past that point, just to be on the safe side.
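To see just how little pairs demands of its argument, float it out of the where clause and poke at it; a hypothetical GHCi session (the transcript below is mine, not from the original answer):

pairs :: [a] -> [(a, a)]
pairs ~(a: ~(b:c))  =  (a,b) : pairs c

> length (take 2 (pairs undefined))
2
> fst (head (pairs (1 : undefined)))
1

The irrefutable patterns let pairs hand out a cons cell, and even the pair inside it, before its argument has any defined structure at all; only selecting a component forces the corresponding piece of the input.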


(*) Prolog's variables are explicitly set-once: they are allowed to be in a not-yet-assigned state. Haskell's (x:xs) is Prolog's [X | Xs].

The pseudocode: maintain a queue; enqueue an "unassigned pointer"; for each x in xs: { set the pointer in the current head of the queue to Node(x, lt, rt), where lt and rt are fresh unassigned pointers; enqueue lt; enqueue rt; pop the queue }; set all pointers remaining in the queue to Empty; find the resulting tree in the original head of the queue, i.e. the original first "unassigned pointer" (thinking of an "empty box" instead of an "unassigned pointer" is another option).

This Prolog "queue" is of course fully persistent: "popping" does not mutate any data structure and doesn't change any outstanding references to the queue's former head -- it just advances the current pointer into the queue. So what's left in the wake of all this queuing is the bfs-enumeration of the built tree's nodes, with the tree itself as its head element -- the tree is its top node, with the two children fully instantiated down to the bottom leaves by the time the enumeration is done.


Update: @dfeuer came up with a much simplified version of it, much closer to the Prolog original (the one in the comment at the top of the post), which can be much clearer. Look for more efficient code and further discussion in his post. Using the plain [] instead of dfeuer's more efficient infinite stream type data IS a = a :+ IS a for the sub-trees queue, it becomes

bftree :: [a] -> Tree a
bftree xs = t
    where
    t : q  =  go xs q
    go []       _              =  repeat Empty
    go (x:ys) ~(l : ~(r : q))  =  Node x l r : go ys q

      ---READ-- ----READ----      ---WRITE---

{-
   xs    =  [ x  x2  x3  x4  x5  x6  x7  x8   … ]
   (t:q) =  [ t  l   r   ll  lr  rl  rr  llr  …  Empty Empty  … … ]
-}

For comparison, the opposite operation, breadth-first enumeration of a tree, is

bflist :: Tree a -> [a]
bflist t = [x | Node x _ _ <- q]
    where
    q  =  t : go 1 q
    go 0  _                =          []
    go i (Empty      : q)  =          go (i-1) q
    go i (Node _ l r : q)  =  l : r : go (i+1) q

         -----READ------     --WRITE--

How does bftree work: t : q is the list of the tree's sub-trees in breadth-first order. A particular invocation of go (x:ys) uses l and r before they are defined by subsequent invocations of go, either with another x further down ys, or by go [], which supplies the endless stream of Emptys. The result t is the very first element in this list, the topmost node of the tree, i.e. the tree itself.

This list of tree nodes is created by the recursive invocations of go at the same speed with which the input list of values xs is consumed, but is consumed as the input to go at twice that speed, because each node has two child nodes.

These extra nodes must therefore also be defined, as Empty leaves. We don't care how many are needed and simply create an infinite list of them to fulfill any need, although the actual number of Empty leaves will be one more than the number of xs.
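A quick sanity check that the two operations really are inverses; a hypothetical GHCi session (mine, not from the original answer):

> bftree [1,2,3]
Node 1 (Node 2 Empty Empty) (Node 3 Empty Empty)
> bflist (bftree [1..10]) == [1..10]
True

The round trip compares the plain lists, so it needs no Eq instance on Tree.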

This is actually the same scheme as has been used in computer science for decades for array-backed trees, where tree nodes are placed in a linear array in breadth-first order. Curiously, in that setting both conversions are a no-op: only our interpretation of the same data changes -- how we handle it and interact with it.
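To make the array view concrete, here is a minimal sketch (my own illustration; the names leftIx, rightIx and toTree are made up): the node at index i of the breadth-first list keeps its children at indices 2i+1 and 2i+2, so rebuilding the Tree view is just index arithmetic.

-- Implicit array-backed binary tree: the node at (0-based) index i
-- has its children at indices 2*i+1 and 2*i+2.
leftIx, rightIx :: Int -> Int
leftIx  i = 2 * i + 1
rightIx i = 2 * i + 2

-- Reconstruct the Tree view from the flat list, for illustration only
-- (the repeated drops make it quadratic, which is fine for a demo):
toTree :: [a] -> Tree a
toTree xs = go 0
  where
  go i = case drop i xs of
    []    -> Empty
    (x:_) -> Node x (go (leftIx i)) (go (rightIx i))

For any finite list, toTree xs produces the same tree as bftree xs; the list already is the tree, read through different glasses.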

Tridentine answered 6/3, 2020 at 9:49 Comment(1)
Comments are not for extended discussion; this conversation has been moved to chat.Housemother

Update: the solution below is big-O optimal and (I think) pretty easy to understand, so I'm leaving it here in case anyone's interested. However, Will Ness's solution is much more beautiful and, especially when optimized a bit, can be expected to perform better in practice. It is much more worthy of study!


I'm going to ignore the fake edge labels for now and just focus on the core of what's happening.

A common pattern in algorithm design is that it's sometimes easier to solve a more general problem. So instead of trying to build a tree, I'm going to look at how to build a forest (a list of trees) with a given number of trees. I'll make the node labels polymorphic to avoid having to think about what they look like; you can of course use the same building technique with your original tree type.

data Tree a = Empty | Node a (Tree a) (Tree a)

-- Build a tree from a breadth-first list
bft :: [a] -> Tree a
bft xs = case bff 1 xs of
  [] -> Empty
  [t] -> t
  _ -> error "something went wrong"

-- Build a forest of nonempty trees.
-- The given number indicates the (maximum)
-- number of trees to build.
bff :: Int -> [a] -> [Tree a]
bff _ [] = []
bff n xs = case splitAt n xs of
  (front, rear) -> combine front (bff (2 * n) rear)
  where
    combine :: [a] -> [Tree a] -> [Tree a]
    -- you write this

Here's a full, industrial-strength, maximally lazy implementation. This is the most efficient version I've been able to come up with that's as lazy as possible. A slight variant is less lazy but still works for fully-defined infinite inputs; I haven't tried to test which would be faster in practice.

{-# LANGUAGE BangPatterns #-}  -- needed for the !_ patterns below

bft' :: [a] -> Tree a
bft' xs = case bff' 1 xs of
  [] -> Empty
  [t] -> t
  _ -> error "whoops"

bff' :: Int -> [a] -> [Tree a]
bff' !_ [] = []
bff' n xs = combine n xs (bff' (2 * n) (drop n xs))
  where
    -- The "take" portion of the splitAt in the original
    -- bff is integrated into this version of combine. That
    -- lets us avoid allocating an intermediate list we don't
    -- really need.
    combine :: Int -> [a] -> [Tree a] -> [Tree a]
    combine 0 !_ ~[] = [] -- These two lazy patterns are just documentation
    combine _k [] ~[] = []
    combine k (y : ys) ts = Node y l r : combine (k - 1) ys dropped
      where
        (l, ~(r, dropped)) = case ts of  -- This lazy pattern matters.
          [] -> (Empty, (Empty, []))
          t1 : ts' -> (t1, case ts' of
            [] -> (Empty, [])
            t2 : ts'' -> (t2, ts''))

For the less-lazy variant, replace the lazy pattern (l, ~(r, dropped)) with a plain triple (l, r, dropped) and adjust the RHS accordingly.
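Spelled out, that adjustment could look like this (my reading of the variant, not code from the original answer); only the where clause of the last combine equation changes:

    combine k (y : ys) ts = Node y l r : combine (k - 1) ys dropped
      where
        (l, r, dropped) = case ts of
          []             -> (Empty, Empty, [])
          [t1]           -> (t1, Empty, [])
          t1 : t2 : ts'' -> (t1, t2, ts'')

Matching the triple forces the spine of ts two cells deep, which is what makes this version less lazy; as noted above, it still works for fully-defined infinite inputs.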

For true industrial strength, forests should be represented using lists strict in their elements:

data SL a = Cons !a (SL a) | Nil

And the pairs in the above (l, ~(r, dropped)) should both be represented using a type like

data LSP a b = LSP !a b

This should avoid some (pretty cheap) run-time checks. More importantly, it makes it easier to see where things are and aren't getting forced.
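As a quick check of the laziness claims, the maximally lazy version can peel the top off a tree built from an infinite list; a hypothetical GHCi session (mine):

> let Node x (Node y _ _) (Node z _ _) = bft' [1..] in (x, y, z)
(1,2,3)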

Mondragon answered 4/3, 2020 at 1:56 Comment(16)
Hmm, thanks for the idea, will try to implement it. Actually I had thought of something along those lines too, using the map function to sort of map the list of trees I create. Do you think that could also be possible, and if so, any hints on how I would then map the children to the correct parents?Eyespot
@JDMukiibs, I've filled in a bit more for you. See if you can finish the job.Mondragon
@JDMukiibs, map is a bit helpful in two cases of the combining function, but it's not really the right place to start thinking about the problem.Mondragon
oh I see well let me try fill in today and see how I could get around it thank you.Eyespot
good old recursive thinking! this could be good enough. I was trying to find the corecursive solution, mimicking the Prolog code.Tridentine
@WillNess, I don't speak Prolog. Do you think you could explain, roughly, what your solution does?Mondragon
P.S. With a little care writing the combining function, this can even be completely lazy.Mondragon
re your last comment, yes. that's why I said "good enough". :) (I have your lazy combine as combine xs ts = zipWith g xs (unfoldr (Just . splitAt 2) $ ts ++ repeat Empty) where g x ~[lt,rt] = Node x lt rt)Tridentine
Prolog's variables are explicitly set-once: they are allowed to be in a not-yet-assigned state. Haskell's (x:xs) is Prolog's [X | Xs]. ---- in pseudocode: maintain a queue; enqueue "unassigned pointer"; for each x in xs: { set pointer in current head of the queue to Node(x, lt, rt) where lt, rt are unassigned pointers; enqueue lt; enqueue rt; pop queue }; set all pointers remaining in queue to Empty; find resulting tree in the original head of the queue, i.e. the original first "unassigned pointer". (or "empty box" instead of "unassigned pointer" is another option).Tridentine
this Prolog's "queue" is of course fully persistent: "popping" does not mutate any data structure and doesn't change any outstanding references to the queue's former head -- it just advances the current pointer into the queue. so what's left in the wake of all this queuing, is the bfs-enumeration of the built tree's nodes, with the tree itself its head element -- the tree is its top node, with the two children fully instantiated to the bottom leaves by the time the enumeration is done.Tridentine
@WillNess, I caved and wrote an "industrial-strength" version for fun. Do you see any way to write something just as efficient more cleanly?Mondragon
I thought splitAt was already lazy enough, I tested take 1 . fst . splitAt 2 $ [1]++undefined and it returned [1]. of course it does have that space leak problem, and I dislike very much the fact that we as programmers are forced to contend with it by doing all that manual code refactoring, instead of the compiler doing it for us. splitAt should not allocate its first result, at all. if I were writing an equivalent code in C, I'd just use a counter to know where to stop, not actually looking for the actual []. but Haskell's datatypes are too married to their representation.Tridentine
So for me, it's not fun. :) I want to write high-level, declarative, semantics-oriented code. the zipWith code in my earlier comment is all I want to have to write in such situations. :)Tridentine
@WillNess, so find a maximally beautiful expression of the whole thing! Maybe one that avoids counting altogether? I've wondered if an intermediate representation as a higher-order nested (non-regular) datatype could do the trick. Unclear if that could be lazy enough.Mondragon
I think it should be possible to code up the queue thing, tying the knots somehow. BTW setting the empty children to Empty is a separate step; we could stop when the supply is exhausted, with the difference-tree result -- which can either be continued, grafted upon with some existing forest, or finished with the Emptys (just like difference-lists). But how? that is still very much in the fog for me. :) (we'll still need to count, to know how many children are ahead in the queue -- like in the Wikipedia's queue example in the Corecursion article)Tridentine
@Mondragon my reading of my own Prolog code was way too imperative, :( that's what was blocking me. the Haskell code in my answer is actually a close translation of it.Tridentine

The method that you appear to have chosen is to build the tree up backwards: from bottom-to-top, right-to-left; starting from the last element of your list. This makes your buildBPT function look nice, but requires your insertElement to be overly complex. To construct a binary tree in a breadth-first fashion this way would require some difficult pivots at every step past the first three.

Adding 8 nodes to the tree would require the following steps (see how the nodes are inserted from last to first):

   .              4
                5   6
   8           7 8 . .
  . .
                  3
   7            4   5
  8 .          6 7 8 .

   6              2
  7 8           3   4
               5 6 7 8
   5
 6   7            1
8 . . .       2       3
            4   5   6   7
           8 . . . . . . .

If, instead, you insert the nodes left-to-right, top-to-bottom, you end up with a much simpler solution, requiring no pivoting, but instead some tree structure introspection. See the insertion order; at all times, the existing values remain where they were:

   .              1
                2   3
   1           4 5 . .
  . .
                  1
   1            2   3
  2 .          4 5 6 .

   1              1
  2 3           2   3
               4 5 6 7
   1
 2   3            1
4 . . .       2       3
            4   5   6   7
           8 . . . . . . .

Building the tree this way has an asymptotic time complexity of O(n^2), where n is the number of nodes to insert: the nodes are inserted one by one, and each insertion iterates over the nodes already present in the tree.

As we insert left-to-right, the trick is to check whether the left sub-tree is complete:

  • if it is, and the right sub-tree is not complete, then recurse to the right.
  • if it is, and the right sub-tree is also complete, then recurse to the left (starting a new row).
  • if it is not, then recurse to the left.

Here is my (more generic) solution:

data Tree a = Leaf | Node a (Tree a) (Tree a)
            deriving (Eq, Show)

main = do
    let l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center", 
              "Rectorate", "Academic Building1", "Academic Building2"]
    let l2 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
    print $ treeFromList $ zip l1 l2

mkNode :: a -> Tree a
mkNode x = Node x Leaf Leaf

insertValue :: Tree a -> a -> Tree a
insertValue Leaf y = mkNode y
insertValue (Node x left right) y
    | isComplete left && nodeCount left /= nodeCount right = Node x left (insertValue right y)
    | otherwise = Node x (insertValue left y) right
    where nodeCount Leaf = 0
          nodeCount (Node _ left right) = 1 + nodeCount left + nodeCount right
          depth Leaf = 0
          depth (Node _ left right) = 1 + max (depth left) (depth right)
          isComplete n = nodeCount n == 2 ^ (depth n) - 1

treeFromList :: (Show a) => [a] -> Tree a
treeFromList = foldl insertValue Leaf

EDIT: more detailed explanation:

The idea is to remember in what order you insert nodes: left-to-right first, then top-to-bottom. I compressed the different cases in the actual function, but you can expand them into three:

  1. Is the left side complete? If not, then insert to the left side.
  2. Is the right side as full as the (complete) left side? If not, then insert to the right side.
  3. Both sides are complete, so we start a new level by inserting to the left side.

Because the function fills the nodes up from left-to-right and top-to-bottom, we always know (it's an invariant) that the left side must fill up before the right side, and that the left side can never be more than one level deeper than the right side (nor can it be shallower than the right side).

By following the growth of the second set of example trees, you can see how the values are inserted following this invariant. This is enough to describe the process recursively, so it extrapolates to a list of any size (the recursion is the magic).

Now, how do we determine whether a tree is 'complete'? Well, it is complete if it is perfectly balanced, or if -- visually -- its values form a triangle. As we are working with binary trees, the base of the triangle (when filled) must have a number of values equal to a power of two. More specifically, it must have 2^(depth-1) values. Count for yourself in the examples:

  • depth = 1 -> base = 1: 2^(1-1) = 1
  • depth = 2 -> base = 2: 2^(2-1) = 2
  • depth = 3 -> base = 4: 2^(3-1) = 4
  • depth = 4 -> base = 8: 2^(4-1) = 8

The total number of nodes above the base is one less than the width of the base: 2^(depth-1) - 1. The total number of nodes in the complete tree is therefore the number of nodes above the base plus those of the base, so:

num nodes in complete tree = 2^(depth-1) - 1 + 2^(depth-1)
                           = 2 × 2^(depth-1) - 1
                           = 2^depth - 1

So now we can say that a tree is complete if it has exactly 2^depth - 1 non-empty nodes in it.
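A quick numeric check of that formula (a hypothetical GHCi one-liner, not part of the original answer):

> [ (d, 2^(d-1) - 1 + 2^(d-1), 2^d - 1) | d <- [1..4] ]
[(1,1,1),(2,3,3),(3,7,7),(4,15,15)]

The second and third components agree at every depth, as the derivation says they must.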

Because we go left-to-right, top-to-bottom: when the left side is complete, we move to the right, and when the right side is just as complete as the left side (meaning that it has the same number of nodes, which means that it is also complete, because of the invariant), we know that the whole tree is complete, and therefore a new row must be added.

I originally had three special cases in there: when both nodes are empty, when the left node is empty (and therefore so is the right), and when the right node is empty (and therefore the left cannot be). These three special cases are superseded by the final case with the guards:

  • If both sides are empty, then nodeCount left == nodeCount right, so therefore we add another row (to the left).
  • If the left side is empty, then both sides are empty (see the previous point).
  • If the right side is empty, then the left side must have depth 1 and node count 1, meaning that it is complete, and 1 /= 0, so we add to the right side.
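To watch the guards in action, here is a hand trace of building from [1,2,3,4] with the code above (the step annotations are mine):

-- foldl insertValue Leaf [1,2,3,4], one step per insertion:
example :: Tree Int
example = treeFromList [1, 2, 3, 4]
-- insert 1 into Leaf:   Node 1 Leaf Leaf
-- insert 2: left (Leaf) is complete, but the node counts are equal
--           (0 == 0), so the guard fails and we recurse left:
--                       Node 1 (Node 2 Leaf Leaf) Leaf
-- insert 3: left is complete (1 == 2^1 - 1) and 1 /= 0, so we recurse right:
--                       Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)
-- insert 4: the counts are equal again (1 == 1), so we recurse left, where
--           the same test sends 4 into node 2's empty left slot:
--                       Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf) (Node 3 Leaf Leaf)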
Stockist answered 4/3, 2020 at 18:32 Comment(5)
Oh wow, you just literally lit the light bulb in my head. I was really struggling with the complexity required for the route I was taking. Could you just enlighten me more on what you have done with your insertValue function?Eyespot
I will edit my answer with a more in-depth explanation at the bottom.Stockist
This looks very inefficient!Mondragon
Can you see how to complete my solution so it works in O(n) time?Mondragon
@Mondragon Oh, I don't claim that it's efficient (in fact I mention that it's O(n^2) in the answer). I think that it is quite an intuitive solution, however; good to start with, but if efficiency is desired, then one could look for an O(n) solution instead.Stockist
