numpy.unique with order preserved
['b','b','b','a','a','c','c']

numpy.unique gives

['a','b','c']

How can I preserve the original order, so that the result is

['b','a','c']

Great answers. Bonus question: why do none of these methods work with this dataset? http://www.uploadmb.com/dw.php?id=1364341573 Here's the question: numpy sort wierd behavior

Pareu asked 26/3, 2013 at 12:41 Comment(1)
See this numpy bug report. - Jewel
U
114

np.unique() is slow, O(N log N), but you can preserve the original order with the following code:

import numpy as np
a = np.array(['b','a','b','b','d','a','a','c','c'])
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])

output:

['b' 'a' 'd' 'c']

pandas.unique() is much faster for big arrays, O(N), and returns values in order of appearance:

import numpy as np
import pandas as pd

a = np.random.randint(0, 1000, 10000)
# %timeit is an IPython/Jupyter magic, not plain Python
%timeit np.unique(a)
%timeit pd.unique(a)

1000 loops, best of 3: 644 us per loop
10000 loops, best of 3: 144 us per loop
Uribe answered 26/3, 2013 at 12:50 Comment(5)
The O(N) complexity is not mentioned anywhere in the documentation and is thus only an implementation detail. The documentation simply states that it is significantly faster than numpy.unique, which may just mean that it has smaller constants, or that the complexity lies between linear and N log N. - Marmot
It's mentioned here: slideshare.net/fullscreen/wesm/… - Uribe
How would you preserve the ordering with pandas.unique()? As far as I can tell it does not accept any parameters. - Kaine
@F Lekschas, pandas.unique() seems to preserve the ordering by default. - Alwitt
@Uribe - The link is broken; you need to remove the "/fullscreen": slideshare.net/wesm/a-look-at-pandas-design-and-development/41 - Falter
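The ordering question raised in the comments above can be checked directly. The following is a minimal sketch, assuming pandas is installed; it compares pd.unique against the NumPy-only approach from the answer on the same sample array. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.array(['b', 'a', 'b', 'b', 'd', 'a', 'a', 'c', 'c'])

# pd.unique returns values in order of first appearance, with no extra arguments
ordered = pd.unique(a)

# NumPy-only approach from the answer above, used here as a reference
_, idx = np.unique(a, return_index=True)
reference = a[np.sort(idx)]

# Compare as plain lists to avoid depending on the exact dtype pandas returns
print(list(ordered) == list(reference))
```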
S
27

Use the return_index functionality of np.unique. That returns the indices at which the elements first occurred in the input. Then argsort those indices.

>>> import numpy as np
>>> u, ind = np.unique(['b','b','b','a','a','c','c'], return_index=True)
>>> u[np.argsort(ind)]
array(['b', 'a', 'c'], 
      dtype='|S1')
Shluh answered 26/3, 2013 at 12:49 Comment(0)
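The same argsort permutation can reorder any other output of np.unique that is aligned with the sorted unique values. A minimal sketch, assuming you also want the number of occurrences of each value in first-seen order (the names order, ordered_values and ordered_counts are illustrative only):

```python
import numpy as np

a = np.array(['b', 'b', 'b', 'a', 'a', 'c', 'c'])

# u is sorted; ind holds first-occurrence positions; counts is aligned with u
u, ind, counts = np.unique(a, return_index=True, return_counts=True)

order = np.argsort(ind)           # permutation that restores first-seen order
ordered_values = u[order]
ordered_counts = counts[order]

print(ordered_values)
print(ordered_counts)
```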
A
9
import numpy as np
a = ['b','b','b','a','a','c','c']
# take the first-occurrence indices, sort them, and index back into a
[a[i] for i in sorted(np.unique(a, return_index=True)[1])]
Assay answered 26/3, 2013 at 12:44 Comment(1)
This is just a slower version of the accepted answer. - Homeopathist
D
4

If you're trying to remove duplicates from an already sorted iterable, you can use the itertools.groupby function:

>>> from itertools import groupby
>>> a = ['b','b','b','a','a','c','c']
>>> [x[0] for x in groupby(a)]
['b', 'a', 'c']

This works more like the Unix 'uniq' command, because it only collapses adjacent duplicates and therefore assumes the list is already sorted. When you try it on an unsorted list, you will get something like this:

>>> b = ['b','b','b','a','a','c','c','a','a']
>>> [x[0] for x in groupby(b)]
['b', 'a', 'c', 'a']
Duer answered 26/3, 2013 at 12:54 Comment(1)
Almost all of the time, numpy problems get solved way faster using numpy. Pure Python solutions will be slow, since numpy is specialised. - Delight
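When equal items are not adjacent, groupby is not enough. A pure Python alternative is to remember what has already been seen. This is a minimal sketch with a hypothetical helper name, unique_in_order, and it assumes the items are hashable:

```python
def unique_in_order(items):
    """Return the distinct items in order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:    # membership test requires hashable items
            seen.add(item)
            result.append(item)
    return result


b = ['b', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']
print(unique_in_order(b))
```

Unlike the groupby version, this drops the trailing 'a' entries because 'a' was already seen earlier in the list.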
A
3
#List we need to remove duplicates from while preserving order

x = ['key1', 'key3', 'key3', 'key2'] 

thisdict = dict.fromkeys(x) #dictionary keys are unique; insertion order is preserved (guaranteed since Python 3.7)

print(list(thisdict)) #convert back to list

output: ['key1', 'key3', 'key2']
Altorilievo answered 16/11, 2020 at 17:52 Comment(0)
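The same idea can be applied to a NumPy array by going through a plain list first. This is a sketch under the assumption that the array is one-dimensional and its elements are hashable; the variable names are illustrative:

```python
import numpy as np

a = np.array(['b', 'b', 'b', 'a', 'a', 'c', 'c'])

# .tolist() converts NumPy scalars to plain Python objects before they are used as keys
ordered = np.array(list(dict.fromkeys(a.tolist())))
print(ordered)
```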
E
2

If you want to delete consecutive repeated entries, like the Unix tool uniq, this is a solution:

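import numpy as np

def uniq(seq):
  """
  Like Unix tool uniq. Removes consecutive repeated entries.
  :param seq: numeric numpy.array
  :return: seq without consecutive repeats
  """
  diffs = np.ones_like(seq)        # the first entry is always kept
  diffs[1:] = seq[1:] - seq[:-1]   # zero wherever an entry repeats its predecessor
  idx = diffs.nonzero()
  return seq[idx]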
Electrophysiology answered 10/7, 2015 at 13:40 Comment(1)
This only works for numbers. Use "!=" instead of "-". - Homeopathist
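The suggestion in the comment above could look like the following. It is a minimal sketch, not the answer author's code: the name uniq_any_dtype is hypothetical, and a one-dimensional input is assumed.

```python
import numpy as np

def uniq_any_dtype(seq):
    """Drop consecutive repeats using != so non-numeric arrays work too."""
    seq = np.asarray(seq)
    keep = np.ones(len(seq), dtype=bool)   # the first element is always kept
    keep[1:] = seq[1:] != seq[:-1]         # True where an entry differs from its predecessor
    return seq[keep]


print(uniq_any_dtype(['b', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']))
print(uniq_any_dtype([1, 1, 2, 2, 1]))
```

As with Unix uniq, only adjacent duplicates are removed, so a value that reappears later in the input is kept.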
D
2

Use an OrderedDict (faster than a list comprehension)

from collections import OrderedDict  
a = ['b','a','b','a','a','c','c']
list(OrderedDict.fromkeys(a))
Downwash answered 17/9, 2019 at 16:21 Comment(0)
