numpy unique without sort [duplicate]
Asked Answered
E

2

50

How can I use numpy unique without sorting the result but just in the order they appear in the sequence? Something like this?

a = [4,2,1,3,1,2,3,4]

np.unique(a) = [4,2,1,3]

rather than

np.unique(a) = [1,2,3,4]

Use naive solution should be fine to write a simple function. But as I need to do this multiple times, are there any fast and neat way to do this?

Excavator answered 17/10, 2012 at 3:58 Comment(0)
E
72

You can do this with the return_index parameter:

>>> import numpy as np
>>> a = [4,2,1,3,1,2,3,4]
>>> np.unique(a)
array([1, 2, 3, 4])
>>> indexes = np.unique(a, return_index=True)[1]
>>> [a[index] for index in sorted(indexes)]
[4, 2, 1, 3]
Elm answered 17/10, 2012 at 4:11 Comment(6)
Always helpful to link the docs: numpy.uniqueSubterfuge
Yes this gets the unique indices, but is the sorting necessary? The iterations req'd to sort are the same as just searching through the array for unique items, so the time complexity can't be avoided. But numpy.unique returns a new array object. We should be able to avoid this space complexity.Subterfuge
Works but this really should be built-in as an option to np.unique.Insociable
From this answer just use pandas.unique(). It doesn't sort by default.Hausa
FYI: this answer (linked above) even presents a shorter solution using np.unique (and calling it only once): a[np.sort(np.unique(a, return_index=True)[1])]Lahdidah
@AlaaM. pandas is a very large module; adding such a requirement for installation of your package is not suitable if you want to keep it simplePhalangeal
T
7

You could do this using numpy by doing something like this, the mergsort is stable so it'll let you pick out the first or last occurrence of each value:

def unique(array, orderby='first'):
    array = np.asarray(array)
    order = array.argsort(kind='mergesort')
    array = array[order]
    diff = array[1:] != array[:-1]
    if orderby == 'first':
        diff = np.concatenate([[True], diff])
    elif orderby == 'last':
        diff = np.concatenate([diff, [True]])
    else:
        raise ValueError
    uniq = array[diff]
    index = order[diff]
    return uniq[index.argsort()]

This answer is very similar to:

def unique(array):
    uniq, index = np.unique(array, return_index=True)
    return uniq[index.argsort()]

But, numpy.unique uses an unstable sort internally so you're not guaranteed to get any specific index, ie first or last.

I think an ordered dict might also work:

def unique(array):
    uniq = OrderedDict()
    for i in array:
         uniq[i] = 1
    return uniq.keys()
Thermosiphon answered 17/10, 2012 at 4:13 Comment(3)
Thanks for your quick reply. I have thought about the first one, but I am not sure whether it is the fastest. The second one should suffer from putting a numpy object into a python object implicitly :)Excavator
Is the problem with the second unique using np.unique's return_index argument that it might produce incorrect results? That this unique might return a sequence with some elements not respecting the order imposed by the original sequence, e.g., (purely for demonstration) unique([1,0,1]) --> [0, 1]?Seasick
np.unique's documentation (docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) states that the indexes returned with return_index=True will indicate the first occurrences, so your second unique should be safe and correct, right?Seasick

© 2022 - 2024 — McMap. All rights reserved.