Why does array_unique sort the values?
Asked Answered
O

2

9

This refers to one of my previous questions: array_unique vs array_flip - This states that array_flip(array_flip()) is much quicker than array_unique() when dealing with simple strings and integers.

What I would like to know is why array_unique() creates a copy of the array, sorts it then removed the duplicates

The source for both functions is available here.

Thanks in advance!

Orthodontics answered 1/12, 2011 at 21:39 Comment(0)
E
18

If you think about it algorithmically, the way to remove duplicates is to go through a list, keep track of items you find, and get rid of things that are already in that "found this" list. One easy way to accomplish this is to sort a list. That way it's obvious where to remove duplicates efficiently. Think about you, let alone a computer; which one of these lists is easier to remove duplicates from?

apple
banana
cantaloupe
apple
durian
apple
banana
cantaloupe

or

apple
apple
apple
banana
banana
cantaloupe
cantaloupe
durian

Edit: After looking into it a bit (and finding this article), it looks like while the two both get the job done, they are not functionally equivalent, or at least they aren't always. To paraphrase a couple of these points:

  1. array_unique() sorts the values, as you noted, so array_flip(array_flip()) wouldn't return the same-ordered array -- but this might be desired.
  2. If the values are objects, then you can't make them keys (right?), i.e. the flip method wouldn't work out of the box on all arrays, whereas the sort method works fine, regardless of the value types.
Effie answered 1/12, 2011 at 21:43 Comment(6)
I would have to agree, looking at the documentation, there is an optional parameter for sorting, almost a dead give away that they do the comparisons internally with sorting.Increment
using array_flip(array_flip()) gives you unique values without the need for sorting though. There must be a better way surely?Orthodontics
Well that makes sense, since the values will have to be "squashed" since keys can't have duplicates. Thinking about it, that would leave the operation at O(n) if the assignments to the array are constant time. To answer your question, I'm not sure why the built-in function doesn't do that off the top of my head.Effie
@Dan Fego, I think you understand that there is no 'O(n)' sorting, ever. PHP arrays are implemented as hash maps and there is always non-linear speed penalty for inserting a new array member (the bigger the array is the bigger it is). If you are familliar with MySql(or almost any other) database, then you can think about it as an index on array keys (with mostly all the upsides and downsides of it). About why that double flip is not used: it will require much more(in worst case more then twice) memory to do it, while sorting requires almost none additional memory.Forenoon
@Lizard, array_flip uses sorting too, but it implicit as it is done on lower-level (implementation of arrays). I think array_unique() is a common trade-off in programming: more cpu time for less memory usage.Forenoon
it is possible to remove duplicates without sorting, however why would you? Array_unique is only for removing duplicates, so they formed a system to make it easier for them to find the duplicates. I don't know why exactly array_flip(array_flip()) removes duplicates, but if it does, and does it faster I must just wonder why you want to know why the other way sorts them?Alumnus
F
0

I think Dan Fego gave a wonderful answered as to why one would sort an array prior to removing duplicates; however, I’d like to examine what array_flip() does. I’ll be using the following array to illustrate:

'a' => 'apple'
'b' => 'banana'
'c' => 'apple'
'd' => 'date'

array_flip() exhanges the keys and values producing

'apple'  => 'a'
'banana' => 'b'
'apple'  => 'c'
'date'   => 'd'

However, keys must be unique. The manual describes how array_flip() handles this:

If a value has several occurrences, the latest key will be used as its values, and all others will be lost.

So we get something like this:

'banana' => 'b'
'apple' => 'c'
'date' => 'd'

So if we use array_flip(array_flip()) we get:

'b' => 'banana'
'c' => 'apple'
'd' => 'date'

As for the motivation behind array_unique(), we can only speculate unless Rasmus Lerdorf or someone currently working on PHP development cares to answer.

Fadeless answered 10/12, 2011 at 12:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.