Looking for a sort algorithm with as few compare operations as possible

I want to sort items where the comparison is performed by humans:

  • Pictures
  • Priority of work items
  • ...

For these tasks the number of comparisons is the limiting factor for performance.

  • What is the minimum number of comparisons needed (I assume > N for N items)?
  • Which algorithm guarantees this minimum number?
Havener answered 15/5, 2009 at 6:19 Comment(5)
Is the person also doing the sort, or just performing the comparison? Some sorts are 'easier' than others in this regard, and would affect my selection.Broken
If you're talking about physical objects that the person also has to move around as they get sorted, don't underestimate the cost of shuffling the objects.Certifiable
I assume the sorting is done by a computer with one of the well known sorting algorithms. No physical objects are moved.Havener
@David, good point. The human equivalent of fetch and store could be much more expensive than the equivalent of compare. The cost of comparison also depends on the type of object under consideration and the number of possible variations. Sorting coins by value is just a bit easier than sorting grains of sand by weight ;)Broken
duplicate of Sorting an array with minimal number of comparisonsPepi

I don't think you're likely to get a better answer than the Wikipedia page on sorting.

Summary:

  • For arbitrary comparisons (where you can't use something like radix sorting) the best you can achieve is O(n log n)
  • Various algorithms achieve this - see the "comparison of algorithms" section.
  • The commonly used QuickSort is O(n log n) in a typical case, but O(n^2) in the worst case; there are often ways to avoid this, but if you're really worried about the cost of comparisons, I'd go with something like MergeSort or a HeapSort. It partly depends on your existing data structures.

If humans are doing the comparisons, are they also doing the sorting? Do you have a fixed data structure you need to use, or could you effectively create a copy using a balanced binary tree insertion sort? What are the storage requirements?
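
If the idea is that the computer runs the sort and a human supplies each comparison, a minimal sketch could look like the following (Python's sorted() is Timsort, a merge-sort hybrid; the ask_human prompt and its format are made up for illustration):

```python
# Minimal sketch: the computer sorts, a human answers each comparison.
# ask_human and the prompt wording are illustrative assumptions.
from functools import cmp_to_key

def ask_human(a, b) -> int:
    """Ask the user which item should come first; return -1, 0, or 1."""
    answer = input(f"Which is better: [1] {a}  or  [2] {b}?  (1/2/=) ")
    return {"1": -1, "2": 1}.get(answer.strip(), 0)

def human_sort(items):
    return sorted(items, key=cmp_to_key(ask_human))   # best item first

# Example: human_sort(["picture_a.jpg", "picture_b.jpg", "picture_c.jpg"])
```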

Narayan answered 15/5, 2009 at 6:23 Comment(4)
O(n log n) is only the best general sort. There are a few sorts, such as pigeon holing, that are o(n) albeit limited to specific types of data.Broken
Hence the "For arbitrary comparisons" part of my first point.Narayan
Fair enough, but if you have human interaction at every comparison based on recognising images, I would doubt the applicability of many arbitrary methods. Many manual sorts, e.g. filing, aim for o(n) even if they fail to achieve it. As you have asked, we need to know more about the specifics of the problem to give a good answer.Broken
Yup - it's definitely a case where the details could make a huge difference.Narayan

To answer this, we need to make a lot of assumptions.

Let's assume we are sorting pictures by cuteness. The goal is to get the maximum usable information from the human in the least amount of time. This interaction will dominate all other computation, so it's the only one that counts.

As someone else mentioned, humans can deal well with ordering several items in one interaction. Let's say we can get eight items in relative order per round.

Each round introduces seven edges into a directed graph where the nodes are the pictures. If node A is reachable from node B, then node A is cuter than node B. Keep this graph in mind.

Now, let me tell you about a problem the Navy and the Air Force solve differently. They both want to get a group of people in height order, and quickly. The Navy tells people to get in line; if you're shorter than the guy in front of you, switch places, and repeat until done. In the worst case, that's N*N comparisons.

The Air Force tells people to stand in a square grid. They shuffle front-to-back within columns of sqrt(N) people, which means a worst case of sqrt(N)*sqrt(N) == N comparisons. However, the people are then only sorted along one dimension, so they face left and do the same shuffle again. Now we're up to 2*N comparisons, and the sort is still imperfect, but it's good enough for government work. There's a short corner, a tall corner opposite, and a clear diagonal height gradient.

You can see how the Air Force method gets results in less time if you don't care about perfection. You can also see how to reach perfection efficiently from there. You already know that the very shortest and very tallest people are in two corners. The second-shortest might be behind or beside the shortest, the third-shortest might be behind or beside him. In general, someone's height rank is also his maximum possible Manhattan distance from the short corner.

Looking back at the graph analogy, the eight nodes to present each round are eight of those with the currently most common length of longest inbound path. The length of the longest inbound path also represents the node's minimum possible sorted rank.

You'll use a lot of CPU following this plan, but you will make the best possible use of your human resources.
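
A rough sketch of that selection rule, assuming a rank_batch callback that stands in for the human and returns up to eight items ordered from least to most cute (the function and variable names are illustrative, not part of the original answer):

```python
# Sketch of the batch-selection idea: edges point from less cute to cuter,
# and each round presents items drawn from the currently most common
# longest-inbound-path length. rank_batch stands in for the human.
from collections import defaultdict

def longest_inbound_lengths(items, edges):
    """Length of the longest path ending at each node of the DAG."""
    succs, indeg = defaultdict(list), {x: 0 for x in items}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    dist = {x: 0 for x in items}
    ready = [x for x in items if indeg[x] == 0]   # Kahn-style topological order
    while ready:
        u = ready.pop()
        for v in succs[u]:
            dist[v] = max(dist[v], dist[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return dist

def one_round(items, edges, rank_batch, batch_size=8):
    dist = longest_inbound_lengths(items, edges)
    groups = defaultdict(list)
    for x, d in dist.items():
        groups[d].append(x)
    # Pick up to eight nodes from the most common longest-path length.
    candidates = max(groups.values(), key=len)[:batch_size]
    ordered = rank_batch(candidates)              # least cute ... cutest
    edges.extend(zip(ordered, ordered[1:]))       # up to seven new edges
    return edges
```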

Overt answered 15/5, 2009 at 9:30 Comment(3)
In retrospect, cuteness cycles are absolutely possible.Overt
I absolutely love how you came back a decade later to comment on the fallibility of a cuteness sorting algorithm.Tug
These days, @CamiloMartin , I have concluded that cuteness is one big strongly-connected component. If you haven't murdered an axe, then there is a directed path of cuteness through yourself, to me, and back to yourself. Or to put it a completely different way, all humans are beautiful.Overt

From an assignment I once did on this very subject ...

The comparison counts below are for various sorting algorithms operating on data in random order:

Size      QkSort    HpSort   MrgSort     ModQk   InsrtSort
  2500     31388     48792     25105     27646     1554230
  5000     67818    107632     55216     65706     6082243
 10000    153838    235641    120394    141623    25430257
 20000    320535    510824    260995    300319   100361684
 40000    759202   1101835    561676    685937
 80000   1561245   2363171   1203335   1438017
160000   3295500   5045861   2567554   3047186

These comparison counts are for various sorting algorithms operating on data that starts out 'nearly sorted'. Amongst other things it shows the pathological case of quicksort:

Size      QkSort    HpSort   MrgSort     ModQk   InsrtSort
  2500     72029     46428     16001     70618      76050
  5000    181370    102934     34503    190391    3016042
 10000    383228    226223     74006    303128   12793735
 20000    940771    491648    158015    744557   50456526
 40000   2208720   1065689    336031   1634659  
 80000   4669465   2289350    712062   3820384
160000  11748287   4878598   1504127  10173850

From this we can see that merge sort is the best by number of comparisons.

I can't remember what the modifications to the quick sort algorithm were, but I believe it was something that used insertion sorts once the individual chunks got down to a certain size. This sort of thing is commonly done to optimise quicksort.
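
As a rough illustration of that kind of modification (not the exact ModQk from the assignment), a quicksort that hands small partitions to insertion sort might look like this; the cutoff of 16 is an arbitrary choice:

```python
# Sketch of a quicksort/insertion-sort hybrid; CUTOFF is an arbitrary choice.
CUTOFF = 16

def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def mod_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:          # small chunk: switch to insertion sort
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                       # classic in-place partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    mod_quicksort(a, lo, j)
    mod_quicksort(a, i, hi)
```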

You might also want to look up Tadao Takaoka's 'Minimal Merge Sort', which is a more efficient version of the merge sort.

Foursquare answered 15/5, 2009 at 7:4 Comment(0)

Pigeon hole sorting is order N and works well with humans if the data can be pigeon holed. A good example would be counting votes in an election.
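
A tiny sketch of the vote-counting example: every ballot drops straight into its candidate's pigeonhole, so no ballot is ever compared with another ballot (the candidate names are made up):

```python
# Pigeonhole counting of ballots: one "hole" per candidate, no comparisons
# between ballots. Candidate names are illustrative.
from collections import Counter

ballots = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]
tally = Counter(ballots)              # one pigeonhole per candidate
ranking = tally.most_common()         # [('Alice', 3), ('Bob', 2), ('Carol', 1)]
print(ranking)
```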

Broken answered 15/5, 2009 at 6:23 Comment(0)

You should consider that humans might make non-transitive comparisons, e.g. they favor A over B, B over C but also C over A. So when choosing your sort algorithm, make sure it doesn't completely break when that happens.
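
One cheap safeguard, assuming you log every human judgement as a (winner, loser) pair, is to check the recorded preferences for a cycle such as A > B, B > C, C > A before trusting the final order; a small sketch:

```python
# Sketch: detect non-transitive (cyclic) preferences in logged comparisons.
# The (winner, loser) logging format is an assumption, not from the answer.
from collections import defaultdict

def has_preference_cycle(preferences):
    """Depth-first search for a cycle in the 'beats' graph."""
    beats = defaultdict(list)
    for winner, loser in preferences:
        beats[winner].append(loser)
    state = {}                                    # node -> "visiting" | "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True                           # back edge => cycle
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(n) for n in beats[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(n) for n in list(beats))

print(has_preference_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # True
```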

Litchi answered 15/5, 2009 at 6:37 Comment(2)
This should probably be a comment rather than an answer, but it's an important point nonetheless.Synchrotron
Absolutely true, but look at the date... back then, the rules were not that strict.Litchi

People are really good at ordering 5-10 things from best to worst, and they come up with more consistent results when doing so. I think trying to apply a classical sorting algorithm might not work here because of the typically human multi-compare approach.

I'd argue that you should have a round robin type approach and try to bucket things into their most consistent groups each time. Each iteration would only make the result more certain.

It'd be interesting to write too :)
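
One possible (loose) reading of that round-robin idea, with a rank_batch callback standing in for the human and a running score per item; this is an illustrative sketch rather than a worked-out algorithm:

```python
# Loose sketch: each round the human ranks small batches, scores accumulate,
# and later rounds re-batch items with similar scores. All names are assumptions.
import random

def refine_order(items, rank_batch, rounds=5, batch_size=7):
    scores = {x: 0.0 for x in items}
    for _ in range(rounds):
        # Group items with similar running scores so later rounds compare
        # near-neighbours instead of obvious mismatches.
        pool = sorted(items, key=lambda x: scores[x])
        for i in range(0, len(pool), batch_size):
            batch = pool[i:i + batch_size]
            random.shuffle(batch)
            ordered = rank_batch(batch)            # human returns worst ... best
            for rank, item in enumerate(ordered):
                scores[item] += rank / max(len(ordered) - 1, 1)
    return sorted(items, key=lambda x: scores[x])  # worst-to-best
```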

Widespread answered 15/5, 2009 at 6:45 Comment(1)
It's an interesting point. Most sorting algorithms only compare two things at a time, whereas people seem to be able to rank a small number of items quite quickly, relatively speaking. Maybe we're a little bit parallel ;) Incidentally, bucket sort and pigeon sort are pretty much the same thing.Broken

If comparisons are expensive relative to book-keeping costs, you might try the following algorithm which I call "tournament sort". First, some definitions:

  1. Every node has a numeric "score" property (which must be able to hold values from 1 to the number of nodes), and "last-beat" and "fellow-loser" properties, which must be able to hold node references.
  2. A node is "better" than another node if it should be output before the other.
  3. An element is considered "eligible" if no element which has not been output is known to be better than it, and "ineligible" if any element which has not been output is known to be better than it.
  4. The "score" of a node is the number of nodes it's known to be better than, plus one.

To run the algorithm, initially assign every node a score of 1. Repeatedly compare the two lowest-scoring eligible nodes; after each comparison, mark the loser "ineligible", and add the loser's score to the winner's (the loser's score is unaltered). Set the loser's "fellow-loser" property to the winner's "last-beat", and the winner's "last-beat" property to the loser. Iterate this until only one eligible node remains. Output that node, and make eligible all nodes the winner beat (using the winner's "last-beat" and the chain of "fellow-loser" properties). Then continue the algorithm on the remaining nodes.
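
A sketch of that bookkeeping in Python, assuming a plain better(a, b) callback that answers "should a be output before b?" (the names are illustrative, not from the original write-up):

```python
# Sketch of the "tournament sort" bookkeeping described above.
# better(a, b) stands in for the expensive (e.g. human) comparison.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(eq=False)
class Node:
    value: Any
    score: int = 1                       # 1 + nodes this node is known to beat
    last_beat: Optional["Node"] = None
    fellow_loser: Optional["Node"] = None

def tournament_sort(items, better):
    eligible = [Node(v) for v in items]
    output = []                          # output order: best first
    while eligible:
        # Compare the two lowest-scoring eligible nodes until one remains.
        while len(eligible) > 1:
            eligible.sort(key=lambda n: n.score)   # small lists; a heap also works
            a, b = eligible[0], eligible[1]
            winner, loser = (a, b) if better(a.value, b.value) else (b, a)
            winner.score += loser.score            # loser's score is unaltered
            loser.fellow_loser = winner.last_beat  # chain of the winner's victims
            winner.last_beat = loser
            eligible.pop(1 if loser is b else 0)   # loser becomes ineligible
        champion = eligible.pop()
        output.append(champion.value)
        # Everything the champion beat becomes eligible again.
        node = champion.last_beat
        while node is not None:
            eligible.append(node)
            node = node.fellow_loser
    return output
```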

The number of comparisons with 1,000,000 items was slightly lower than that of a stock library implementation of Quicksort; I'm not sure how the algorithm would compare against a more modern version of QuickSort. Bookkeeping costs are significant, but if comparisons are sufficiently expensive the savings could possibly be worth it. One interesting feature of this algorithm is that it will only perform comparisons relevant to determining the next node to be output; I know of no other algorithm with that feature.

Twosided answered 8/2, 2011 at 18:23 Comment(4)
Interesting idea. Did you read about it somewhere or make it up? If made up, will you publish more formally? What's the complexity analysis? Have you in mind any realistic scenarios for this? Does this extend naturally to multiway comparison primitives? etc.Overt
@Ian: I came up with the idea after watching the Olympics, sometime in the 1990s, when I had a 16MB machine on my desk at work. I don't think this would be a practical method of sorting, and don't think it would offer any particularly useful insights toward developing better, so I never felt it was worth any particular kind of formal write-up. The big under-exploited concept I would think would be worth writing up would be stateful comparators which could be given information about partitions. If one is sorting things alphabetically and knows that [simplistic example] all items...Twosided
...in a partition are between HUMBLE and HUMPH, then when comparing items within the partition there would be no need to compare the first three letters. Not a useful performance enhancement with short keys, but there are many real-world situations with long keys where thousands or millions of items will have the same value in the first 90% of the key, and having comparisons ignore that portion could offer a useful performance boost.Twosided
@Ian: BTW, here's a fun little challenge if you haven't seen it yet: how many comparisons are required to sort five items?Twosided

Here is a comparison of algorithms. The two strongest candidates are Quick Sort and Merge Sort. Quick Sort is generally better, but has worse worst-case performance.

Soil answered 15/5, 2009 at 6:27 Comment(1)
+1 agreed... I usually use a combination of quicksort (for large sets) and mergesort (for small sets) myself, though I never tried to figure out if it was the optimal way to go.Certifiable

Merge sort is definitely the way to go here, as you can use a Map/Reduce-type algorithm to have several humans doing the comparisons in parallel.

Quicksort is essentially a single threaded sort algorithm.

You could also tweak the merge sort algorithm so that instead of comparing two objects you present your human with a list of say five items and ask him or her to rank them.

Another possibility would be to use a ranking system like the one used by the famous "Hot or Not" web site. This requires many more comparisons, but the comparisons can happen in any sequence and in parallel, so this would work faster than a classic sort provided you have enough humanoids at your disposal.
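
As a concrete example of such a rating system, here is a minimal Elo-style update (the K-factor of 32 is just a conventional choice, not something prescribed by this answer); each comparison updates only the two ratings involved, so comparisons can run in any order and in parallel:

```python
# Minimal Elo-style rating update; K=32 is a conventional, not mandated, choice.
def elo_update(rating_winner, rating_loser, k=32):
    expected_win = 1 / (1 + 10 ** ((rating_loser - rating_winner) / 400))
    rating_winner += k * (1 - expected_win)
    rating_loser  -= k * (1 - expected_win)
    return rating_winner, rating_loser

ratings = {"pic_a": 1000.0, "pic_b": 1000.0, "pic_c": 1000.0}
# One human comparison: pic_a beats pic_b.
ratings["pic_a"], ratings["pic_b"] = elo_update(ratings["pic_a"], ratings["pic_b"])
ranking = sorted(ratings, key=ratings.get, reverse=True)   # best first
```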

Telium answered 15/5, 2009 at 6:47 Comment(1)
Sure, m humans can start mergesorting n/m items each "right away", while for quicksort there is a "ramping up" period at the start -- you need log(m) partitioning steps before you have enough tasks for m people. But doesn't mergesort have the same problem at the end of the algorithm? The final merge step must be performed by a single person, right? Quicksort OTOH keeps everyone busy till the end.Nevermore

The question raises more questions, really.

Are we talking about a single human performing the comparisons? It's a very different challenge if you are talking about a group of humans trying to arrange objects in order.

What about the questions of trust and error? Not everyone can be trusted, or relied on, to get everything right - certain sorts would go catastrophically wrong if at any given point you provided the wrong answer to a single comparison.

What about subjectivity? "Rank these pictures in order of cuteness". Once you get to this point, it could get really complex. As someone else mentions, something like "hot or not" is the simplest conceptually, but isn't very efficient. At its most complex, I'd say that Google is a way of sorting objects into an order, where the search engine is inferring the comparisons made by humans.

Glavin answered 15/5, 2009 at 7:27 Comment(1)
I assumed that a single human makes the comparisons. So I expect them to be consistent (as far as a human can be...). Of course they are subjective and maybe wrong sometimes. If many persons do the (subjective) comparison, I would use something like the chess ELO numbering, as mentioned in #165331Havener

The best one would be merge sort.

The minimum run time is n*log(n) [base 2]. The way it is implemented is:

If the list is of length 0 or 1, then it is already sorted.

Otherwise:

  1. Divide the unsorted list into two sublists of about half the size.
  2. Sort each sublist recursively by re-applying merge sort.
  3. Merge the two sublists back into one sorted list.
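
A direct sketch of those steps, with the comparison left pluggable so a human (or anything else) can supply it; better(a, b) is an assumed callback meaning "a should come before b":

```python
# Sketch of merge sort following the steps above; better(a, b) is an assumed
# comparison callback ("a should come before b").
def merge_sort(items, better):
    if len(items) <= 1:                      # length 0 or 1: already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], better)   # sort each half recursively
    right = merge_sort(items[mid:], better)
    merged, i, j = [], 0, 0                  # merge the two sorted halves
    while i < len(left) and j < len(right):
        if better(right[j], left[i]):
            merged.append(right[j]); j += 1
        else:
            merged.append(left[i]); i += 1
    return merged + left[i:] + right[j:]
```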

Jaffe answered 15/5, 2009 at 6:26 Comment(0)

https://mcmap.net/q/550845/-which-sorting-algorithm-uses-the-fewest-comparisons points to Merge-insertion sort, as "perhaps still the best freely-documented sorting algorithm for minimal comparisons."

The answer also notes:

Glenn K. Manacher's algorithm “and later record-breaking sorting algorithms” (none of which, unfortunately, seem to be freely documented at this time) are reported to adapt the merge-insertion sort algorithm to offer even fewer comparisons.

However, you might consider how many things you're sorting. In your case, it's probably not too many. The asymptotic results in books may not apply. Instead, you (or whoever is reading this answer years later) might want to generate random permutations of realistic size for your problem and see how many comparisons it takes to sort them. For 500 items, either Ford-Johnson or a traditional merge sort already needs about 3,800 comparisons. That's a lot for a human.
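
A sketch of that experiment: count the comparisons a library sort actually makes on random permutations of a realistic size (Python's sorted(), i.e. Timsort, is used here only as a convenient stand-in for a traditional merge sort):

```python
# Count comparisons made by a library sort on random permutations.
import random
from functools import cmp_to_key

def count_comparisons(n, trials=10):
    counts = []
    for _ in range(trials):
        items = list(range(n))
        random.shuffle(items)
        comparisons = 0

        def cmp(a, b):
            nonlocal comparisons
            comparisons += 1
            return (a > b) - (a < b)

        sorted(items, key=cmp_to_key(cmp))
        counts.append(comparisons)
    return sum(counts) / len(counts)

print(count_comparisons(500))   # typically a few thousand comparisons
```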

Messenia answered 20/6 at 20:22 Comment(0)
