numpy.unique with order preserved

P

7

69

['b','b','b','a','a','c','c']

numpy.unique gives

['a','b','c']

How can I get the original order preserved?

['b','a','c']

Great answers. Bonus question: why do none of these methods work with this dataset? http://www.uploadmb.com/dw.php?id=1364341573 Here's the question: numpy sort wierd behavior

Pareu answered 26/3, 2013 at 12:41 Comment(1)

See this numpy bug report. – Jewel 19/8, 2021 at 7:40

U

114

np.unique() is slow, O(N log N), but you can do this with the following code:

import numpy as np
a = np.array(['b','a','b','b','d','a','a','c','c'])
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])

output:

['b' 'a' 'd' 'c']
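
The two lines above can be wrapped in a small helper. This is only a minimal sketch: the name unique_in_order is made up for illustration and is not part of NumPy, and it assumes a one-dimensional input.

```python
import numpy as np

def unique_in_order(values):
    """Return the unique elements of a 1-D input in order of first appearance.

    Hypothetical helper for illustration, not a NumPy function.
    """
    arr = np.asarray(values)
    # return_index yields, for each unique value, the index of its first occurrence
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those indices restores the order in which the values first appeared
    return arr[np.sort(first_idx)]

result = unique_in_order(['b', 'a', 'b', 'b', 'd', 'a', 'a', 'c', 'c'])
print(result)
```

The helper still pays for the sort inside np.unique, so it has the same O(N log N) cost as the snippet above.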

pandas.unique() is much faster for big arrays, O(N):

import numpy as np
import pandas as pd

a = np.random.randint(0, 1000, 10000)
# %timeit is an IPython/Jupyter magic, so run these two lines in IPython
%timeit np.unique(a)
%timeit pd.unique(a)

1000 loops, best of 3: 644 us per loop
10000 loops, best of 3: 144 us per loop

Uribe answered 26/3, 2013 at 12:50 Comment(5)

The O(N) complexity is not mentioned anywhere and is thus only an implementation detail. The documentation simply states that it is significantly faster than numpy.unique, but this may simply mean that it has smaller constants or the complexity might be between linear and NlogN. – Marmot 26/3, 2013 at 17:57

It's mentioned here: slideshare.net/fullscreen/wesm/… – Uribe 26/3, 2013 at 22:40

How would you preserve the ordering with pandas.unique()? As far as I can tell it does not allow any parameters. – Kaine 23/11, 2016 at 17:2

@F Lekschas, pandas.unique() seems to preserve the ordering by default – Alwitt 12/4, 2018 at 9:5

@Uribe - The link is broken, need to remove the "/fullscreen": slideshare.net/wesm/a-look-at-pandas-design-and-development/41 – Falter 6/1, 2023 at 12:35
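
On the ordering question raised in the comments: the pandas documentation states that pd.unique returns values in order of appearance and does not sort. The sketch below compares it with the np.unique approach on a small integer array chosen for illustration (the variable names are arbitrary).

```python
import numpy as np
import pandas as pd

a = np.array([3, 1, 3, 2, 1, 2])

# pandas: hash-table based, keeps order of first appearance
by_pandas = pd.unique(a)

# NumPy: sort-based, so the original order is restored via the first-occurrence indices
_, idx = np.unique(a, return_index=True)
by_numpy = a[np.sort(idx)]

print(by_pandas)
print(by_numpy)
```

Both results list the values as 3, 1, 2, so no extra parameter is needed on the pandas side.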

S

27

Use the return_index functionality of np.unique. That returns the indices at which the elements first occurred in the input. Then argsort those indices.

>>> u, ind = np.unique(['b','b','b','a','a','c','c'], return_index=True)
>>> u[np.argsort(ind)]
array(['b', 'a', 'c'], 
      dtype='|S1')
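
To make the two steps easier to follow, here is the same example with the intermediate values spelled out in comments. It is just an annotated sketch of the snippet above; the variable names order and ordered are arbitrary.

```python
import numpy as np

a = np.array(['b', 'b', 'b', 'a', 'a', 'c', 'c'])

u, ind = np.unique(a, return_index=True)
# u   holds the sorted unique values:             ['a', 'b', 'c']
# ind holds the first position of each in `a`:    [3, 0, 5]

order = np.argsort(ind)
# order ranks the unique values by first position: [1, 0, 2]

ordered = u[order]
# ordered is the unique values as first seen:      ['b', 'a', 'c']
print(ordered)
```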

Shluh answered 26/3, 2013 at 12:49 Comment(0)

A

9

import numpy as np
a = ['b','b','b','a','a','c','c']
[a[i] for i in sorted(np.unique(a, return_index=True)[1])]

Assay answered 26/3, 2013 at 12:44 Comment(1)

This is just a slower version of the accepted answer – Homeopathist 16/2, 2017 at 14:30

D

4

If you're trying to remove duplicates from an already sorted iterable, you can use the itertools.groupby function:

>>> from itertools import groupby
>>> a = ['b','b','b','a','a','c','c']
>>> [x[0] for x in groupby(a)]
['b', 'a', 'c']

This works more like the Unix 'uniq' command, because it assumes the list is already sorted. When you try it on an unsorted list, you will get something like this:

>>> b = ['b','b','b','a','a','c','c','a','a']
>>> [x[0] for x in groupby(b)]
['b', 'a', 'c', 'a']

Duer answered 26/3, 2013 at 12:54 Comment(1)

Almost all of the time, numpy problems get solved much faster using numpy; pure Python solutions will be slow, since numpy is specialised. – Delight 26/3, 2013 at 13:9

A

3

# List we need to remove duplicates from while preserving order
x = ['key1', 'key3', 'key3', 'key2']

thisdict = dict.fromkeys(x)  # dictionary keys are unique and order is preserved (Python 3.7+)

print(list(thisdict))  # convert back to list

output: ['key1', 'key3', 'key2']

Altorilievo answered 16/11, 2020 at 17:52 Comment(0)

E

2

If you want to delete consecutive repeated entries, like the Unix tool uniq does, this is a solution:

def uniq(seq):
  """
  Like Unix tool uniq. Removes repeated entries.
  :param seq: numpy.array
  :return: seq
  """
  diffs = np.ones_like(seq)
  diffs[1:] = seq[1:] - seq[:-1]
  idx = diffs.nonzero()
  return seq[idx]

Electrophysiology answered 10/7, 2015 at 13:40 Comment(1)

This only works for numbers. Use != instead of - – Homeopathist 16/2, 2017 at 14:31
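
Following the suggestion in the comment, a variant that compares neighbours with != instead of subtracting them might look like the sketch below. The name uniq_adjacent is hypothetical, and a one-dimensional input is assumed.

```python
import numpy as np

def uniq_adjacent(seq):
    """Drop consecutive duplicates from a 1-D array, like Unix uniq.

    Hypothetical variant of the function above, using != so that
    non-numeric dtypes such as strings also work.
    """
    seq = np.asarray(seq)
    # Boolean mask: True where an element differs from its predecessor
    keep = np.ones(seq.shape[0], dtype=bool)
    keep[1:] = seq[1:] != seq[:-1]
    return seq[keep]

words = uniq_adjacent(['b', 'b', 'a', 'a', 'c', 'a'])
nums = uniq_adjacent([1, 1, 2, 2, 1])
print(words)
print(nums)
```

As with uniq, a value that reappears after a different value is kept, so this does not give globally unique elements unless equal values are already grouped together.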

D

2

Use an OrderedDict (faster than a list comprehension):

from collections import OrderedDict  
a = ['b','a','b','a','a','c','c']
list(OrderedDict.fromkeys(a))
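
If the data starts out as a NumPy array, the same idea can be applied after converting it to a list. The sketch below also shows that a plain dict gives the same result on Python 3.7 and later, where insertion order is guaranteed; the variable names are chosen only for illustration.

```python
from collections import OrderedDict
import numpy as np

a = np.array(['b', 'a', 'b', 'a', 'a', 'c', 'c'])

# tolist() turns NumPy scalars into plain Python objects before hashing
via_ordered = list(OrderedDict.fromkeys(a.tolist()))
via_dict = list(dict.fromkeys(a.tolist()))  # relies on Python 3.7+ ordering

# Convert back to an array if the rest of the code expects one
back_to_array = np.array(via_dict)
print(back_to_array)
```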

Downwash answered 17/9, 2019 at 16:21 Comment(0)
