numpy unique without sort [duplicate]

Asked 17/10, 2012 at 3:58 Answered 17/10, 2012 at 4:13

How can I use numpy unique without sorting the result but just in the order they appear in the sequence? Something like this?

a = [4,2,1,3,1,2,3,4]

np.unique(a) = [4,2,1,3]

rather than

np.unique(a) = [1,2,3,4]

Use naive solution should be fine to write a simple function. But as I need to do this multiple times, are there any fast and neat way to do this?

Excavator answered 17/10, 2012 at 3:58 Comment(0)

You can do this with the return_index parameter:

>>> import numpy as np
>>> a = [4,2,1,3,1,2,3,4]
>>> np.unique(a)
array([1, 2, 3, 4])
>>> indexes = np.unique(a, return_index=True)[1]
>>> [a[index] for index in sorted(indexes)]
[4, 2, 1, 3]

Elm answered 17/10, 2012 at 4:11 Comment(6)

Always helpful to link the docs: numpy.unique – Subterfuge 15/11, 2017 at 16:2

Yes this gets the unique indices, but is the sorting necessary? The iterations req'd to sort are the same as just searching through the array for unique items, so the time complexity can't be avoided. But numpy.unique returns a new array object. We should be able to avoid this space complexity. – Subterfuge 15/11, 2017 at 16:9

Works but this really should be built-in as an option to np.unique. – Insociable 4/2, 2020 at 12:58

From this answer just use pandas.unique(). It doesn't sort by default. – Hausa 18/5, 2021 at 19:18

FYI: this answer (linked above) even presents a shorter solution using np.unique (and calling it only once): a[np.sort(np.unique(a, return_index=True)[1])] – Lahdidah 10/11, 2021 at 9:21

@AlaaM. pandas is a very large module; adding such a requirement for installation of your package is not suitable if you want to keep it simple – Phalangeal 6/1, 2023 at 8:24

You could do this using numpy by doing something like this, the mergsort is stable so it'll let you pick out the first or last occurrence of each value:

def unique(array, orderby='first'):
    array = np.asarray(array)
    order = array.argsort(kind='mergesort')
    array = array[order]
    diff = array[1:] != array[:-1]
    if orderby == 'first':
        diff = np.concatenate([[True], diff])
    elif orderby == 'last':
        diff = np.concatenate([diff, [True]])
    else:
        raise ValueError
    uniq = array[diff]
    index = order[diff]
    return uniq[index.argsort()]

This answer is very similar to:

def unique(array):
    uniq, index = np.unique(array, return_index=True)
    return uniq[index.argsort()]

But, numpy.unique uses an unstable sort internally so you're not guaranteed to get any specific index, ie first or last.

I think an ordered dict might also work:

def unique(array):
    uniq = OrderedDict()
    for i in array:
         uniq[i] = 1
    return uniq.keys()

Thermosiphon answered 17/10, 2012 at 4:13 Comment(3)

Thanks for your quick reply. I have thought about the first one, but I am not sure whether it is the fastest. The second one should suffer from putting a numpy object into a python object implicitly :) – Excavator 17/10, 2012 at 4:21

Is the problem with the second unique using np.unique's return_index argument that it might produce incorrect results? That this unique might return a sequence with some elements not respecting the order imposed by the original sequence, e.g., (purely for demonstration) unique([1,0,1]) --> [0, 1]? – Seasick 6/10, 2013 at 12:29

np.unique's documentation (docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) states that the indexes returned with return_index=True will indicate the first occurrences, so your second unique should be safe and correct, right? – Seasick 6/10, 2013 at 12:36

Recommended topics

Hot tags