Can you formulate the insertion sort as a monoid in Clojure?
Asked Answered
B

2

5

This is the code for an insertion sort in Clojure:

(defn in-sort! [data]
  (letfn [(insert ([raw x](insert [] raw x))
          ([sorted [y & raw] x]
             (if (nil? y) (conj sorted x)
             (if (<= x y ) (concat sorted [x,y] raw)
                 (recur (conj sorted y)  raw x )))))]   
    (reduce insert [] data)))
;Usage:(in-sort! [6,8,5,9,3,2,1,4,7])
;Returns: [1 2 3 4 5 6 7 8 9]

This is the insertion sort formulated as a monoid in Haskell:

newtype OL x = OL [x]
instance Ord x => Monoid (OL x) where
  mempty = OL []
  mappend (OL xs) (OL ys) = OL (merge xs ys) where
    merge [] ys = ys
    merge xs [] = xs
    merge xs@(x : xs') ys@(y : ys')
       | x <= y = x : merge xs' ys
       | otherwise = y : merge xs ys'

isort :: Ord x => [x] -> OL x
isort = foldMap (OL . pure)

This is how to write a monoid in Clojure:

(def  mempty (+)) ;; 0
(def  mappend +)
(defn mconcat [ms]
    (reduce mappend mempty ms))

(mappend 3 4) ;; 7

(mconcat [2 3 4]) ;; 9

My question is: Can you formulate the insertion sort as a monoid in Clojure?

Braasch answered 24/2, 2014 at 10:17 Comment(3)
That haskell code implements a merge sort, not an insertion sort. I'm not sure, but insertion sort doesn't seem to have the monoidal structure that merge sort does.Levine
@Levine no, it's insertion sort, because the folding tree is skewed fully to the right: isort xs == foldr (merge . (:[])) [] xs (i.e. left list is always singleton).Motorbus
In other words, it maps the collection to sort into a collection of singleton lists for the monoid to work on. The fold then merges the singleton lists into the accumulator. That's the same action as an insert.Loralorain
L
2

Here, for reference, is another version which turns the tail recursion modulo cons into tail recursion with an accumulator. For the sake of variety, here is also one way to partially simulate the absent type-classes.

(defprotocol Monoid 
  (mempty [_] ) 
  (mappend [_ xs ys]))

(defn fold-map
  [monoid f xs]
  (reduce (partial mappend monoid) (mempty monoid) (map f xs)))

(defn- ord-mappend* 
  [[x & rx :as xs] [y & ry :as ys] a] 
  (cond
    (empty? xs) (concat a ys)
    (empty? ys) (concat a xs)
    :else (if (< x y) 
             (recur rx ys (conj a x))
             (recur xs ry (conj a y)))))

(def Ord 
  (reify Monoid 
    (mempty [_] (list))
    (mappend [_ xs ys] (ord-mappend* xs ys []))))

(defn isort [xs] (fold-map Ord list xs))

(defn is-sorted? [xs] (apply < xs))

(is-sorted? (isort (shuffle (range 10000))))
;=> true (sometime later)
Loralorain answered 24/2, 2014 at 19:25 Comment(5)
Could you expand on what you mean by 'one way to partially simulate the absent type-classes'Braasch
@Braasch Monoid in Haskell is a typeclass, a contract for types. As you are aware, Clojure eschews a formal type system at the language level (gradually being added at the library level). So, your series of questions - "Can you do X with typeclass Foo in Clojure?" - are somewhat awkward to answer. Often, the most truthful answer would be "Yes, but why bother when you don't have types?" Or, "That's not idiomatic for Clojure." Clojurians tend to roll their own DSLs to manage complexity as it arises rather than work in formalisms from the beginning.Loralorain
@Braasch Typeclasses in Haskell are extremely powerful for polymorphism. In Clojure, we take a lot of polymorphism for granted because we don't bother to specify type to begin with. But, there are protocols and multi-methods. These are very powerful, but not as powerful as Haskell's typeclasses. You can introduce something like a typeclass by specifying your contract in a protocol. This only dispatches on the first argument, whereas Haskell can dispatch on the entire signature, including return value. Multi-methods are more powerful than protocols, but not as powerful as Haskell's dispatch.Loralorain
@Braasch So, here I used a dummy first argument to dispatch the kind of Monoid I want and left the rest typeless. You can extend protocols to types in Clojure, but a type can be a Monoid in many different ways (e.g. [Integer, +, 0], [Integer, *, 1]). You'd need some way to distinguish which you are talking about. In Haskell, you can do this in the type signature. Here, I have you specify the desired Monoid at each use. When this becomes awkward, you can introduce a sugar macro as I did in answering one of your Monad questions.Loralorain
@Braasch This is only one possible solution. The various libraries that implement Monads, etc. in Clojure each seem to take a different approach.Loralorain
C
3

Here is my attempt, might not be the best one though :)

It's quite a direct translation of the Haskell monoid. Since we don't have auto-currying in Clojure, I needed to make a special comp-2 function.

(defn comp-2 [f g]
  (fn [x y] (f (g x) (g y))))

(defn pure-list [x]
  (cond
   (sequential? x) (if (empty? x) '() (seq x))
   :else (list x)))

(def OL-mempty (list))
(defn OL-mappend [xs ys]
  (letfn [(merge [xs ys]
            (cond
             (empty? xs) ys
             (empty? ys) xs
             :else (let [[x & xs'] xs
                         [y & ys'] ys]
                     (if (<= x y) 
                       (cons x (lazy-seq (merge xs' ys)))
                       (cons y (lazy-seq (merge xs ys')))))))]
    (doall (merge xs ys))))

(defn foldmap [mempty mappend l]
  (reduce mappend mempty l))

(def i-sort (partial foldmap OL-mempty (comp-2 OL-mappend pure-list)))
(i-sort (list 5 3 4 1 2 6)) ;; (1 2 3 4 5 6)

Here is a link to a very nice paper about morphisms in sorts.

Compatibility with reducers

If we want to go with Reducers style monoid then we could embed "mempty" in our "mappend" as a zero-arity branch. Once we do that, we can make our monoid fit right away in the Reducers library:

(require '[clojure.core.reducers :as re])

(defn pure-list [x]
  (cond
   (sequential? x) (if (empty? x) '() (seq x))
   :else (list x)))

(defn sort-monoid 
  ([] '())      ;; mempty
  ([xs ys]      ;; mappend
     (letfn [(merge [xs ys]
               (cond
                (empty? xs) ys
                (empty? ys) xs
                :else (let [[x & xs'] xs
                            [y & ys'] ys]
                        (if (<= x y) 
                          (cons x (lazy-seq (merge xs' ys)))
                          (cons y (lazy-seq (merge xs ys')))))))]
       (doall (merge (pure-list xs) (pure-list ys))))))

(re/reduce sort-monoid (list 2 4 1 2 5))
Cragsman answered 24/2, 2014 at 11:44 Comment(10)
This is nice. You should wrap the recursion in merge in a lazy-seq so you don't cause a stack overflow when sorting long sequences.Loralorain
Actually, just that won't solve the problem, sorry. I'll have to think about it more later.Loralorain
I've edited in the fix to save you one edit towards Community Wiki status (10), if you don't mind. Problem of possibly deep recursion was replaced by stacking of unrealized lazy sequences. I don't know enough about Haskell to know whether or not it would have the same problem. In any event, this would easily be solved by using an accumulator as this is tail recursive modulo a cons, but your version shows the parallel to the Haskell one.Loralorain
@A.Webb in Haskell the guarded recursion (lazy equivalent of TRMC optimization) is usually the most efficient in these situations.Motorbus
@WillNess Doing a lazy cons in these situations is usually not shied away from in Clojure either. This fails here in Clojure on larger inputs without some enforced strictness because the thunks become too deeply nested. If you iteratively build lazy sequences based on lazy sequences then you potentially blow the stack when forced to realize as the chain of thunks unravels. I rather suspect the same is true in Haskell based on your "beware the thunks blow-out!" comment back here #13042853.Loralorain
@A.Webb wow, the precision of your reference! :) :) I don't know Clojure, but in Haskell this problem doesn't happen with merge because e.g. in merge xs@(x : xs') ys@(y : ys') | x <= y = x : merge xs' ys the thunk merge xs' ys is not explored until x is consumed (the thunk creation is guarded against by the lazy cons-tructor) and then it is destroyed when its head is demanded; so there's no nesting of thunks. Nesting of thunks (e.g. created by foldl) is the problem; chaining of guarded definitions is not. :)Motorbus
@A.Webb btw I've re-written this kind of merge code in Scheme with explicit delay and it worked OK (here, towards the end of the post). It goes smth like (define (merge a b) (if ... (cons (car a) (lambda () (merge (cdr a) b))) ... )) and so there's no nesting of thunks either.Motorbus
@WillNess It is exactly the same here. A single merge is not a problem. Its the realization of the unrealized merge of an unrealized merge of an unrealized merge of an unrealized...from the fold that is. Does the Haskell version in the question blow the stack on a large shuffled input list? Do you fix it by forcing some strictness?Loralorain
@A.Webb ah, that's embarrassing. Yes, of course, the insertion will force the right argument i.e. the recursive result; so yes, it will be a problem in Haskell too. Even though an insertion doesn't force its right arg to its end, but it demands at least one element from it, and so the whole chain will be forced to the last element of the input list. But this is sort, it must force through all the elts to produce head of result... The only solution is rearranging the tree of insertions into more balanced form to limit the depth of thunk-nesting. But then it becomes mergesort.Motorbus
@A.Webb so for insertion sort, the foldr formulation is a disaster; foldl must be used, and exactly as you say, strictness better be enforced.Motorbus
L
2

Here, for reference, is another version which turns the tail recursion modulo cons into tail recursion with an accumulator. For the sake of variety, here is also one way to partially simulate the absent type-classes.

(defprotocol Monoid 
  (mempty [_] ) 
  (mappend [_ xs ys]))

(defn fold-map
  [monoid f xs]
  (reduce (partial mappend monoid) (mempty monoid) (map f xs)))

(defn- ord-mappend* 
  [[x & rx :as xs] [y & ry :as ys] a] 
  (cond
    (empty? xs) (concat a ys)
    (empty? ys) (concat a xs)
    :else (if (< x y) 
             (recur rx ys (conj a x))
             (recur xs ry (conj a y)))))

(def Ord 
  (reify Monoid 
    (mempty [_] (list))
    (mappend [_ xs ys] (ord-mappend* xs ys []))))

(defn isort [xs] (fold-map Ord list xs))

(defn is-sorted? [xs] (apply < xs))

(is-sorted? (isort (shuffle (range 10000))))
;=> true (sometime later)
Loralorain answered 24/2, 2014 at 19:25 Comment(5)
Could you expand on what you mean by 'one way to partially simulate the absent type-classes'Braasch
@Braasch Monoid in Haskell is a typeclass, a contract for types. As you are aware, Clojure eschews a formal type system at the language level (gradually being added at the library level). So, your series of questions - "Can you do X with typeclass Foo in Clojure?" - are somewhat awkward to answer. Often, the most truthful answer would be "Yes, but why bother when you don't have types?" Or, "That's not idiomatic for Clojure." Clojurians tend to roll their own DSLs to manage complexity as it arises rather than work in formalisms from the beginning.Loralorain
@Braasch Typeclasses in Haskell are extremely powerful for polymorphism. In Clojure, we take a lot of polymorphism for granted because we don't bother to specify type to begin with. But, there are protocols and multi-methods. These are very powerful, but not as powerful as Haskell's typeclasses. You can introduce something like a typeclass by specifying your contract in a protocol. This only dispatches on the first argument, whereas Haskell can dispatch on the entire signature, including return value. Multi-methods are more powerful than protocols, but not as powerful as Haskell's dispatch.Loralorain
@Braasch So, here I used a dummy first argument to dispatch the kind of Monoid I want and left the rest typeless. You can extend protocols to types in Clojure, but a type can be a Monoid in many different ways (e.g. [Integer, +, 0], [Integer, *, 1]). You'd need some way to distinguish which you are talking about. In Haskell, you can do this in the type signature. Here, I have you specify the desired Monoid at each use. When this becomes awkward, you can introduce a sugar macro as I did in answering one of your Monad questions.Loralorain
@Braasch This is only one possible solution. The various libraries that implement Monads, etc. in Clojure each seem to take a different approach.Loralorain

© 2022 - 2024 — McMap. All rights reserved.