Pseudo-quicksort time complexity
I know that quicksort has O(n log n) average time complexity. A pseudo-quicksort (which is only a quicksort when you look at it from far enough away, with a suitably high level of abstraction) that is often used to demonstrate the conciseness of functional languages is as follows (given in Haskell):

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = quicksort [y | y<-xs, y<p] ++ [p] ++ quicksort [y | y<-xs, y>=p]

Okay, so I know this thing has problems. The biggest problem with this is that it does not sort in place, which is normally a big advantage of quicksort. Even if that didn't matter, it would still take longer than a typical quicksort because it has to do two passes of the list when it partitions it, and it does costly append operations to splice it back together afterwards. Further, the choice of the first element as the pivot is not the best choice.

But even considering all of that, isn't the average time complexity of this quicksort the same as the standard quicksort? Namely, O(n log n)? Because the appends and the partition still have linear time complexity, even if they are inefficient.
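(Aside, for concreteness: the two list-comprehension passes can be fused into a single pass with `partition` from `Data.List`. This is just a sketch with a made-up name; it doesn't change the asymptotics, only the constant factor.)

```haskell
import Data.List (partition)

qsort1 :: Ord a => [a] -> [a]
qsort1 []     = []
qsort1 (p:xs) = qsort1 smaller ++ [p] ++ qsort1 larger
  where
    -- one traversal of xs instead of two
    (smaller, larger) = partition (< p) xs
```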

Exanthema answered 6/7, 2012 at 3:58 Comment(4)
In short: No, and use Vector + quicksort on vectors if you want n log n. See my answer here. – Sienna
@ThomasM.DuBuisson I know there are ways to get an efficient Haskell quicksort, but my question was really just: is this thing O(n log n)? Could you elaborate on why it isn't? The link you gave doesn't really answer that question. – Exanthema
Right, it was a comment, not an answer, because of that omission. I know there's a "Haskell 'quicksort' is not quicksort" analysis out there (PDF) but can't find a link right now. I would expect it to provide a nice, well-thought-out answer if anyone can give a pointer. – Sienna
"it would still take longer than a typical quicksort because it has to do two passes of the list when it partitions it" No it wouldn't. – Policewoman
This "quicksort" is actually a deforested tree sort: http://www.reddit.com/r/programming/comments/2h0j2/real_quicksort_in_haskell

data Tree a = Leaf | Node a (Tree a) (Tree a)

mkTree [] = Leaf
mkTree (x:xs) = Node x (mkTree (filter (<= x) xs)) (mkTree (filter (x <) xs))

The binary tree is unbalanced, so building the search tree has O(N^2) worst-case and O(N log N) average-case complexity.

foldTree f g Leaf = g
foldTree f g (Node x l r) = f x (foldTree f g l) (foldTree f g r)

treeSort l = foldTree (\x lft rht -> lft++[x]++rht) [] (mkTree l)

The retrieval algorithm has O(N^2) worst-case and O(N log N) average-case complexity.
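To make the deforestation claim concrete: inlining foldTree into mkTree eliminates the intermediate Tree and recovers the list-based quicksort. (A sketch; note that here elements equal to the pivot go left, whereas the question's version sends them right.)

```haskell
-- mkTree followed by the fold, with the intermediate Tree fused away:
treeSortFused :: Ord a => [a] -> [a]
treeSortFused []     = []
treeSortFused (x:xs) =
  treeSortFused (filter (<= x) xs) ++ [x] ++ treeSortFused (filter (x <) xs)
```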

Well-balanced:

Prelude> let rnds = iterate step where step x = (75*x) `mod` 65537
Prelude> length . quicksort . take 4000 . rnds $ 1
4000
(0.08 secs, 10859016 bytes)
Prelude> length . quicksort . take 8000 . rnds $ 1
8000
(0.12 secs, 21183208 bytes)
Prelude> length . quicksort . take 16000 . rnds $ 1
16000
(0.25 secs, 42322744 bytes)

Not-so-well-balanced:

Prelude> length . quicksort . map (`mod` 10) $ [1..4000]
4000
(0.62 secs, 65024528 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..8000]
8000
(2.45 secs, 241906856 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..16000]
16000
(9.52 secs, 941667704 bytes)
Flourishing answered 6/7, 2012 at 7:8 Comment(7)
This doesn't seem to answer the question (which is about average-case, not worst-case complexity). Also, the claim that insertion into an unbalanced binary tree is O(n^2) seems wrong to me. – Campobello
@DanielWagner, I think the claim is about building the tree, so O(n) insertions. – Ashy
@dbaupp It wasn't so obvious before I edited my answer, so DanielWagner's criticism was completely reasonable. – Flourishing
@DanielWagner The complexity of the retrieval part (the catamorphism) is average-case O(N^2). For example, this implementation en.literateprograms.org/… has O(N) complexity in the retrieval part of the tree sort. Therefore the overall complexity for the "accumulator" implementation is O(N log N) (average) and O(N^2) (worst). For the "naive" one, O(N^2) average-case. – Flourishing
Why is the average case for the retrieval O(n^2)? If the tree is approximately balanced, you get n log n levels, and at each level you have to append many groups of elements together. Since append has time complexity O(n), wouldn't the whole thing have time complexity O(n log n)? This Wikipedia article (en.wikipedia.org/wiki/Tree_sort) says that a tree sort is average-case O(n log n), and even gives a Haskell implementation (with exactly the same retrieval algorithm), but it does say that the worst case for that implementation is O(n^2)... – Exanthema
Sorry, I meant you get log n levels! – Exanthema
@Ord, you're right and I was wrong. The average-case complexity for the retrieval actually is O(N log N). – Flourishing

I agree with your assumption that the average time complexity is still O(n log n). I'm not an expert and not 100% sure, but these are my thoughts:

Here is pseudocode for the in-place quicksort (call Quicksort with l=1 and r=length of the array):

Quicksort(l,r)  
--------------
IF r-l>=1 THEN  
    choose pivot element x of {x_l,x_l+1,...,x_r-1,x_r}   
    order the array-segment x_l,...x_r in such a way that  
        all elements < x are on the left side of x // line 6  
        all elements > x are on the right side of x // line 7  
    let m be the position of x in the 'sorted' array (as said in the two lines above)  
    Quicksort(l,m-1);  
    Quicksort(m+1,r)  
FI  

The average-case analysis then reasons by selecting the "<"-comparisons in lines 6 and 7 as the dominant operation of the algorithm, and finally concludes that the average time complexity is O(n log n). Since the cost of the line "order the array-segment x_l,...x_r in such a way that..." is not considered (only the dominant operation matters in time-complexity analysis if you want to find bounds), I think "because it has to do two passes of the list when it partitions it" is not a problem, especially as your Haskell version would just take approximately twice as long in this step. The same holds true for the append operation, and I agree with you that it adds nothing to the asymptotic costs:

Because the appends and the partition still have linear time complexity, even if they are inefficient.

For the sake of convenience, let's assume that this adds "n" to our time-complexity costs, so that we have "O(n log n + n)". Since there exists a natural number n0 such that n log n > n holds for all natural numbers greater than n0, you can bound n log n + n from above by 2 n log n and from below by n log n; therefore n log n + n = O(n log n).

Further, the choice of the first element as the pivot is not the best choice.

I think the choice of the pivot element is irrelevant here, because the average-case analysis assumes a uniform distribution of the elements in the array. You can't know from which place in the array you should select the pivot, and you therefore have to consider all the cases in which your pivot element (independently of which position in the list you take it from) is the i-th smallest element of the list, for i=1...r.

Schiro answered 6/7, 2012 at 7:25 Comment(0)

I can offer you a run-time test on Ideone.com which seems to show more or less linearithmic run-times for both (++)-based versions, for the one using the accumulator technique from Landei's answer, and for another one using one-pass three-way partitioning. On ordered data this turns quadratic or worse for all of them.

-- random:   100k        200k         400k         800k
-- _O    0.35s-11MB  0.85s-29MB   1.80s-53MB   3.71s-87MB   n^1.3  1.1  1.0
-- _P    0.36s-12MB  0.80s-20MB   1.66s-45MB   3.76s-67MB   n^1.2  1.1  1.2
-- _A    0.31s-14MB  0.62s-20MB   1.58s-54MB   3.22s-95MB   n^1.0  1.3  1.0
-- _3    0.20s- 9MB  0.41s-14MB   0.88s-24MB   1.92s-49MB   n^1.0  1.1  1.1

-- ordered:   230     460     900     1800
-- _P        0.09s   0.33s   1.43s    6.89s                 n^1.9  2.1  2.3
-- _A        0.09s   0.33s   1.44s    6.90s                 n^1.9  2.1  2.3
-- _3        0.05s   0.15s   0.63s    3.14s                 n^1.6  2.1  2.3

quicksortO xs = go xs  where
  go []  =  []
  go (x:xs) = go [y | y<-xs, y<x] ++ [x] ++ go [y | y<-xs, y>=x]

quicksortP xs = go xs  where
  go []  =  []
  go (x:xs) = go [y | y<-xs, y<x] ++ (x : go [y | y<-xs, y>=x])

quicksortA xs = go xs [] where
  go [] acc = acc
  go (x:xs) acc = go [y | y<-xs, y<x] (x : go [y | y<-xs, y>=x] acc)

quicksort3 xs = go xs [] where
  go     (x:xs) zs = part x xs zs [] [] []
  go     []     zs = zs
  part x []     zs a b c = go a ((x : b) ++ go c zs)
  part x (y:ys) zs a b c =
      case compare y x of
                  LT -> part x ys zs (y:a) b c
                  EQ -> part x ys zs a (y:b) c
                  GT -> part x ys zs a b (y:c)

The empirical run-time complexities are estimated here as O(n^a) where a = log(t2/t1) / log(n2/n1). The timings are very approximate, as Ideone isn't very reliable, with occasional far outliers, but for checking the time complexity they're enough.
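That estimate can be computed directly; a minimal helper (my sketch; the name orderOfGrowth is made up):

```haskell
-- Empirical order of growth a in t ~ n^a, from two (size, time) points:
orderOfGrowth :: (Double, Double) -> (Double, Double) -> Double
orderOfGrowth (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
```

For the _O row above, orderOfGrowth (100000, 0.35) (200000, 0.85) comes out at roughly 1.28, matching the n^1.3 entry.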

Thus these data seem to indicate that the one-pass partition is faster by 1.5x-2x than the two-pass schemes, and that using (++) is in no way slowing things down, at all. I.e. the "append operations" are not "costly" at all. The quadratic behaviour of (++)/append seems to be an urban myth, in the Haskell context of course (edit: ... i.e. in the context of guarded recursion / tail recursion modulo cons; cf. this answer) (update: as user:AndrewC explains, it really is quadratic with left folding, and linear when (++) is used with right folding; more about this here and here).
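The folding-direction point can be illustrated in a few lines (my sketch, not from the linked answers):

```haskell
-- Right-nested (++) is linear overall: each cons cell is copied at most once.
concatR :: [[a]] -> [a]
concatR = foldr (++) []

-- Left-nested (++) is quadratic: the growing prefix is re-copied at every step.
concatL :: [[a]] -> [a]
concatL = foldl (++) []
```

Both produce the same list; only the amount of copying differs.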


later addition: To be stable, the three-way partitioning quicksort version should also build its parts in the top-down manner:

q3s xs = go xs [] where
  go     (x:xs) z = part x xs  go  (x:)  (`go` z)
  go     []     z = z
  part x []      a  b  c = a [] (b (c []))
  part x (y:ys)  a  b  c =
      case compare y x of
                  LT -> part x ys  (a . (y:))  b  c
                  EQ -> part x ys  a  (b . (y:))  c
                  GT -> part x ys  a  b  (c . (y:))

(performance not tested).

Carte answered 7/7, 2012 at 8:36 Comment(1)
+1 for excellent decision to use actual performance data. You can get quadratic behaviour by using (++) repeatedly with foldl or the manual foldl of a badly-written recursive String-generating and appending function. I can vouch for this being a non-myth, because it was in my first major real-world Haskell program, back before Haskell 98, and it didn't output anything for ages until I realised my mistake and corrected it. If you use it sensibly, of course, with foldr, it's linear.Saturated

I don't know how much this improves the runtime complexity, but by using an accumulator you can avoid the expensive (++):

quicksort xs = go xs [] where
  go [] acc = acc
  go (x:xs) acc = go [y | y<-xs, y<x] (x : go [y | y<-xs, y>=x] acc)
Kraft answered 6/7, 2012 at 10:15 Comment(0)

Look here for a true O(n log n) quicksort that will work on both arrays and lists: http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.23.4398&rep=rep1&type=pdf It is quite easy to implement in Common Lisp, and it outperforms the sort implementation of many commercial Lisps.

Moist answered 2/9, 2012 at 20:9 Comment(1)
Interesting, but I'm not sure how this answers the actual question in any way--the question was about a specific algorithm that superficially resembles quicksort, not about implementing a true quicksort.Ecru

Yes, this version has the same asymptotic complexity as the classic version: you replace the linear-time partition with two passes (< and >=), and you add the linear-time ++ (which involves linear re-allocation/copying). So it's a hefty constant factor worse than an in-place partition, but it's still linear. All the other aspects of the algorithm are the same, so the same analysis that gives O(n log n) average-case for "true" (i.e. in-place) quicksort still holds here.

Inhumation answered 3/9, 2012 at 21:7 Comment(1)
(++) is not linear in time complexity due to lazy evaluation. Also, < and >= can easily be replaced by partition from Data.List, which makes only one pass. – Barna
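To illustrate the lazy-evaluation point (my sketch, hypothetical names): because (++) and the recursive calls are lazy, demanding only the head of the result forces only the left branch of each recursive call, so taking the minimum this way does expected O(n) work rather than performing the full sort:

```haskell
quicksortL :: Ord a => [a] -> [a]
quicksortL []     = []
quicksortL (p:xs) =
  quicksortL [y | y <- xs, y < p] ++ [p] ++ quicksortL [y | y <- xs, y >= p]

-- Only as much of the sort is evaluated as is needed to produce the head.
minimumVia :: Ord a => [a] -> a
minimumVia = head . quicksortL
```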
