List of unique dictionaries

Let's say I have a list of dictionaries:

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

How can I obtain a list of unique dictionaries (removing the duplicates)?

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

See How can I properly hash dictionaries with a common set of keys, for deduplication purposes? for in-depth, technical discussion of why the usual approach for deduplicating a list (explained at Removing duplicates in lists) does not work.

Autobiographical answered 18/6, 2012 at 23:30 Comment(4)
How extensive are these dictionaries? Do you need individual attribute checking to determine duplicates, or is checking a single value in them sufficient?Bogosian
These dicts got 8 key:value pairs and the list got 200 dicts. They actually got an ID and it's safe for me to remove the dict from list if the ID value found is a duplicate.Autobiographical
Possible duplicate of How to make values in list of dictionary unique?Syngamy
frozenset is an effective option: set(frozenset(i.items()) for i in list)Syngamy

So, make a temporary dict with the id as the key. Since dict keys are unique, this filters out the duplicates, and the values() of the dict are the deduplicated list.

In Python 2.7

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 3

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 2.5/2.6

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
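A minimal Python 3 sketch of the id-keyed idea, assuming Python 3.7+ (where dicts keep insertion order). With the comprehension, a later dict with the same id overwrites the earlier value, although the id keeps the position where it was first seen. If you would rather keep the first dict for each id, dict.setdefault only stores a value for keys that are not present yet. The two function names are hypothetical, and the second sample dict is changed (hypothetical data) so that the difference is visible.

```python
def dedup_by_id_keep_last(dicts):
    # Later dicts with the same id overwrite earlier ones; each id keeps
    # the position where it was first inserted.
    return list({d['id']: d for d in dicts}.values())


def dedup_by_id_keep_first(dicts):
    # setdefault only inserts when the id is not in `seen` yet,
    # so the first dict for each id wins.
    seen = {}
    for d in dicts:
        seen.setdefault(d['id'], d)
    return list(seen.values())


L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'johnny', 'age': 35},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(dedup_by_id_keep_last(L))   # keeps the later id-1 dict ('johnny')
print(dedup_by_id_keep_first(L))  # keeps the earlier id-1 dict ('john')
```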
Gantry answered 18/6, 2012 at 23:42 Comment(9)
@John La Rooy - how could one use the same approach to remove dictionaries from a list based on multiple attributes? I tried this but it seems not to work: {v['flight']['lon']['lat']: v for v in stream}.values()Gaelan
@JorgeVidinha assuming each could be cast to str (or unicode), try this: {str(v['flight'])+':'+str(v['lon'])+','+str(v['lat']): v for v in stream}.values() This just creates a unique key based on your values. Like 'MH370:-21.474370,86.325589'Reggy
@JorgeVidinha, you can use a tuple as the dictionary key {(v['flight'], v['lon'], v['lat']): v for v in stream}.values()Gantry
note that this may alter the order of the dictionaries in the list! use OrderedDict from collections list(OrderedDict((v['id'], v) for v in L).values()) or sort the resulting list if that works better for youHerwig
If you need all values considered and not just the ID you can use list({str(i):i for i in L}.values()) Here we use str(i) to create a unique string that represents the dictionary which is used to filter the duplicates.Milka
@DelboyJay, dicts are unordered, so you'd need to use str(sorted(i.items()))Gantry
This does not actually de-duplicate identical dictionaries (where dict1 == dict2 returns true). The solution only works if you have identified a key to compare.Dotson
Hey can someone explain what is actually happening here? I don't know. list({v['id']:v for v in L}.values())Helgeson
v['id']:v for v in L creates new dictionary with ids as keys, and whole dicts as values. By default, keys in dictionaries are unique, so if the dict with the same id is being added to this new dictionary, it overwrites previous dict with the same id. .values() returns a view object that displays a list of all the values in the dictionary - here a list of whole unique (by id) dicts. And list(...) just converts the dict_values object of returned view to simple Python list.Pawnbroker

The usual way to keep just the unique elements of a list is to use Python's set class. Just add all the elements to the set, then convert the set to a list, and bam, the duplicates are gone.

The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.

If I had this problem, my solution would be to convert each dict into a string that represents the dict, then add all the strings to a set() then read out the string values as a list() and convert back to dict.

A good representation of a dict in string form is JSON format. And Python has a built-in module for JSON (called json of course).

The remaining problem is that the elements in a dict are not ordered, and when Python converts the dict to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True when you call json.dumps().
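A minimal sketch of this JSON approach, assuming every value is JSON-serializable. It uses dict.fromkeys instead of a set so that the first-seen order is kept (Python 3.7+); a plain set works too if order doesn't matter. The name dedup_via_json is hypothetical.

```python
import json


def dedup_via_json(dicts):
    # sort_keys=True makes dicts with the same items serialize to the same
    # string, whatever order their keys were inserted in.
    strings = (json.dumps(d, sort_keys=True) for d in dicts)
    # dict.fromkeys drops repeated strings but keeps first-seen order
    # (a plain set() would also work if order doesn't matter).
    return [json.loads(s) for s in dict.fromkeys(strings)]


L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'age': 34, 'name': 'john', 'id': 1},  # same items, different key order
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(dedup_via_json(L))

# Caveat: the JSON round trip turns non-string keys into strings
# and tuples into lists.
print(json.loads(json.dumps({1: (2, 3)})))  # {'1': [2, 3]}
```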

EDIT: This solution was assuming that a given dict could have any part different. If we can assume that every dict with the same "id" value will match every other dict with the same "id" value, then this is overkill; @gnibbler's solution would be faster and easier.

EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it's safe to assume that the whole dict is a duplicate. So this answer is overkill and I recommend @gnibbler's answer.

Corum answered 18/6, 2012 at 23:44 Comment(6)
While overkill given the ID in this particular case, this is still an excellent answer!Onagraceous
This helps me since my dictionary does not have a key, and is only uniquely identified by all of its entries. Thanks!Christianachristiane
This solution works most of the time, but there may be performance issues when scaling up, which I think is why the author recommends the solution with "id". Performance concerns: this solution serializes to a string and then deserializes, and serializing/deserializing is an expensive computation that does not usually scale well (number of items n>1e6, or each dictionary contains >1e6 items, or both), or if you have to execute this many times (>1e6) or often.Dhobi
Just as a short aside this solution illustrates a great canonical example of why you would want to design your solution... i.e. if you have an id that is unique... then you can efficiently access the data... if you are lazy and don't have an id then your data access is more expensive.Dhobi
Implementation: output_lod = {json.dumps(d, sort_keys=True) for d in lod}; output_lod = [json.loads(x) for x in output_lod]Armindaarming
list(map(json.loads, set(map(lambda x: json.dumps(x, sort_keys=True), [{1:2}, {3:4}, {1:2}])))) only problem with this solution is when keys are not strings.Particolored

In case the dictionaries are only uniquely identified by all their items (no ID is available), you can use the JSON-based answer. The following is an alternative that does not use JSON, and will work as long as all dictionary values are hashable (e.g. numbers, strings, or tuples of those, but not lists or dicts):

[dict(s) for s in set(frozenset(d.items()) for d in L)]
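If the order of the list matters, or you want to keep the original dict objects (and their key order) instead of rebuilding them from frozensets, a minimal sketch is to use the frozenset only as a "seen" marker. This still assumes every value is hashable; dedup_hashable_values is a hypothetical name.

```python
def dedup_hashable_values(dicts):
    seen = set()
    result = []
    for d in dicts:
        # frozenset(d.items()) ignores key order and is hashable
        # as long as every value in d is hashable.
        marker = frozenset(d.items())
        if marker not in seen:
            seen.add(marker)
            result.append(d)  # keep the original dict, in list order
    return result


L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(dedup_hashable_values(L))

# A list value makes frozenset(d.items()) unhashable:
try:
    dedup_hashable_values([{'id': 1, 'tags': ['a']}])
except TypeError as exc:
    print('TypeError:', exc)
```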
Weatherboarding answered 22/7, 2016 at 8:0 Comment(4)
I ended up going with this, very elegant working solutionFoulk
This way I lose the order of keys inside the dictionary.Padraic
How can I use JSON to tackle this problem?Padraic
Note - you cannot rely on the order here - That is, the items in this result expression are not guaranteed to be in the same order as the original list L. So be sure to sort the list after if needed, example: sorted(final_list, key=lambda i: i['some_key'], reverse=True)Heaume

Here's a reasonably compact solution, though I suspect not particularly efficient (to put it mildly):

>>> ds = [{'id':1,'name':'john', 'age':34},
...       {'id':1,'name':'john', 'age':34},
...       {'id':2,'name':'hanna', 'age':30}
...       ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
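A set does not preserve the order of the list. A Python 3.7+ sketch that keeps the first occurrence of each dict in order replaces the set with dict.fromkeys, whose keys are unique and keep insertion order. Like the original, it rebuilds each dict from its sorted items, so keys inside each dict come out alphabetical, and sorted() requires the keys of each dict to be mutually comparable.

```python
ds = [{'id': 1, 'name': 'john', 'age': 34},
      {'id': 1, 'name': 'john', 'age': 34},
      {'id': 2, 'name': 'hanna', 'age': 30}]

# tuple(sorted(...)) gives a hashable, key-order-independent signature;
# dict.fromkeys keeps one entry per signature, in first-seen order.
unique = [dict(t) for t in dict.fromkeys(tuple(sorted(d.items())) for d in ds)]
print(unique)
# [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
```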
Forearm answered 18/6, 2012 at 23:47 Comment(3)
Surround the map() call with list() in Python 3 to get a list back, otherwise it's a map object.Indeterminacy
an additional benefit of this approach in python 3.6+ is that the list ordering is preservedDehydrogenase
@Dehydrogenase I'm using Python 3.8.6 and list ordering is not preserved! My list: x=[{'a':15}, {'a':15}, {'b':30}] Converting: list(map(dict, set(tuple(sorted(i.items())) for i in x))) which returns: [{'b': 30}, {'a': 15}]Cauchy

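You can use the numpy library (this first version works in Python 2.x only):

import numpy as np

list_of_unique_dicts = list(np.unique(np.array(list_of_dicts)))

To get it working with Python 3.x (and recent versions of numpy), you need to convert the array of dicts to a numpy array of strings, because np.unique sorts its input and dicts cannot be ordered in Python 3. The result then contains the string form of each dict, which ast.literal_eval can turn back into a dict (note that dicts with the same items but a different key insertion order produce different strings):

import ast
list_of_unique_dicts = [ast.literal_eval(s) for s in np.unique(np.array(list_of_dicts).astype(str))]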
Cavallaro answered 6/11, 2013 at 4:25 Comment(3)
Get the error TypeError: unorderable types: dict() > dict() when doing this in Python 3.5.Frederiksen
You might have forgotten the .astype(str) element!Sippet
#55695979Cauchy
a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

b = list({x['id']: x for x in a}.values())

print(b)

outputs (Python 3.7+):

[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]

Trilinear answered 18/6, 2012 at 23:52 Comment(4)
In the same example. how can I get the dicts containing only the similar IDs ?Exercise
@user8162, what would you want the output to look like?Trilinear
Sometimes I will have the same ID but a different age, so I want the output to be [{'age': [34, 40], 'id': 1, 'name': ['john', 'Peter']}]. In short, if the IDs are the same, combine the contents of the other fields into a list as I mentioned here. Thanks in advance.Exercise
b = {x['id']:[y for y in a if y['id'] == x['id'] ] for x in a} is one way to group them together.Trilinear
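The comprehension in the previous comment rescans the whole list once per element. A one-pass sketch of the grouping asked about above uses collections.defaultdict; it assumes (hypothetically) that all dicts sharing an id have the same keys as the first one, and ids that appear only once also get single-element lists.

```python
from collections import defaultdict

a = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 40},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# Group the dicts by id in a single pass.
groups = defaultdict(list)
for x in a:
    groups[x['id']].append(x)

# Merge each group: keep the id, turn every other field into a list of values.
merged = []
for key, rows in groups.items():
    entry = {'id': key}
    for field in rows[0]:
        if field != 'id':
            entry[field] = [row[field] for row in rows]
    merged.append(entry)

print(merged)
```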

Since the id is sufficient for detecting duplicates, and the id is hashable: run 'em through a dictionary that has the id as the key. The value for each key is the original dictionary.

deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()

In Python 3, values() doesn't return a list; you'll need to wrap the whole right-hand-side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:

deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())

Note that before Python 3.7 (where dict insertion order became a language guarantee), the result will likely not be in the same order as the original. If order is a requirement on those versions, you could use a collections.OrderedDict instead of a dict.
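A sketch of the OrderedDict variant (it runs on Python 2.7 and 3; on 3.7+ the plain dict version above already keeps order). As with a plain dict, a later item with the same id replaces the earlier value but keeps the id's first position.

```python
from collections import OrderedDict

list_of_dicts = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# OrderedDict remembers the order in which ids were first inserted,
# even on Python versions where a plain dict does not.
deduped_dicts = list(OrderedDict((item["id"], item) for item in list_of_dicts).values())
print(deduped_dicts)
```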

As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id as key to begin with.

Arleanarlee answered 18/6, 2012 at 23:45 Comment(0)

We can do this with pandas:

import pandas as pd
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Note that this behaves slightly differently from the accepted answer.

drop_duplicates compares all columns: a row is dropped only if every column matches an earlier row.

For example :

If we change the name in the second dict from john to peter:

L=[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[295]: 
[{'age': 34, 'id': 1, 'name': 'john'},
 {'age': 34, 'id': 1, 'name': 'peter'},  # this dict is kept in the output
 {'age': 30, 'id': 2, 'name': 'hanna'}]
Yippee answered 7/6, 2019 at 1:37 Comment(1)
This is a good trick, but it should be noted that this will not work for nested dictionaries.Samanthasamanthia

In Python 3, a simple trick, based on a unique field (id):

data = [ {'id': 1}, {'id': 1}]

list({ item['id'] : item for item in data}.values())
Lindahl answered 2/4, 2021 at 8:30 Comment(0)

I have summarized my favorites to try out:

https://repl.it/@SmaMa/Python-List-of-unique-dictionaries

# ----------------------------------------------
# Setup
# ----------------------------------------------

myList = [
  {"id":"1", "lala": "value_1"},
  {"id": "2", "lala": "value_2"}, 
  {"id": "2", "lala": "value_2"}, 
  {"id": "3", "lala": "value_3"}
]
print("myList:", myList)

# -----------------------------------------------
# Option 1: if the objects have a unique identifier
# -----------------------------------------------

myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)

# -----------------------------------------------
# Option 2: if uniquely identified by the whole object (all values must be hashable)
# -----------------------------------------------

myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)

# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------

myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashableObjects:", myHashableObjects)
Animation answered 13/12, 2019 at 14:33 Comment(0)

There are a lot of answers here, so let me add another:

import json
from typing import List

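def dedup_dicts(items: List[dict]) -> List[dict]:
    # sort_keys=True makes equal dicts serialize to identical strings; the set's order is arbitrary
    deduped = [json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
    return deduped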

items = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)
Raptor answered 13/3, 2019 at 13:24 Comment(0)

I don't know if you only want the id of your dicts in the list to be unique, but if the goal is to have a set of dicts where uniqueness is based on all the keys' values, you can use a tuple key like this in your comprehension:

>>> L=[
...     {'id':1,'name':'john', 'age':34},
...     {'id':1,'name':'john', 'age':34},
...     {'id':2,'name':'hanna', 'age':30},
...     {'id':2,'name':'hanna', 'age':50}
... ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>> L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>> len(L)
3

Hope it helps you or another person having the concern....

Shend answered 26/6, 2018 at 17:11 Comment(1)
Similar with comprehensive answers above BUT, this is more generic and might provide full unique option. So this is upvoted.Galactometer

Expanding on John La Rooy's answer (Python - List of unique dictionaries), making it a bit more flexible:

def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
    # A tuple of the selected values is a hashable key; unlike ''.join it works
    # for non-string values and can't merge ('ab', 'c') with ('a', 'bc')
    return list({tuple(row[column] for column in columns): row
                 for row in list_of_dicts}.values())

Calling the function:

deduped_list_of_dicts = dedup_dict_list(
    list_of_dicts, ['id', 'name'])
Write answered 4/9, 2017 at 16:14 Comment(0)

If there is not a unique id in the dictionaries, then I'd keep it simple and define a function as follows:

def unique(sequence):
    result = []
    for item in sequence:
        if item not in result:
            result.append(item)
    return result

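The advantage of this approach is that you can reuse this function for any objects that support == comparison, hashable or not. It makes your code very readable, works in all modern versions of Python, and preserves the order of the list. Note, however, that `item not in result` scans the whole result list, so the function runs in O(n²) time: fine for a few hundred dicts, but slow for large lists.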

>>> L = [
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 2, 'name': 'hanna', 'age': 30},
... ] 
>>> unique(L)
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]
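The `item not in result` check scans the whole result list, so the loop is quadratic. When some part of each item is hashable, a seen set avoids that scan while keeping the same order-preserving behaviour. A minimal sketch, where unique_by and its key parameter are hypothetical names: key must return a hashable value, e.g. the id, a tuple of several fields, or frozenset(d.items()) when all values are hashable.

```python
def unique_by(sequence, key):
    """Keep the first item for each key(item), preserving the original order."""
    seen = set()
    result = []
    for item in sequence:
        k = key(item)
        # Average O(1) set lookup instead of scanning `result`.
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(unique_by(L, key=lambda d: d['id']))
print(unique_by(L, key=lambda d: (d['id'], d['name'])))
print(unique_by(L, key=lambda d: frozenset(d.items())))
```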
Bate answered 18/2, 2022 at 12:11 Comment(0)

In python 3.6+ (what I've tested), just use:

import json

#Toy example, but will also work for your case 
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
#Start by sorting each dictionary by keys (dict() keeps the sorted order, so json.loads gives dicts back)
myListOfDictsSorted = [dict(sorted(d.items())) for d in myListOfDicts]

#Using json methods with set() to get unique dict
myListOfUniqueDicts = list(map(json.loads,set(map(json.dumps, myListOfDictsSorted))))

print(myListOfUniqueDicts)

Explanation: we're mapping json.dumps to encode the dictionaries as JSON strings, which are immutable and hashable. set can then be used to produce an iterable of unique strings. Finally, we convert back to our dictionary representation using json.loads. Note that initially, one must sort each dictionary by keys to arrange the dictionaries in a unique form; rebuilding it with dict() keeps that sorted order because dictionaries preserve insertion order in Python 3.6+ (guaranteed by the language from 3.7).

Manifold answered 2/10, 2018 at 19:47 Comment(1)
Remember to sort the keys before dumping to JSON. You also don't need to convert to list before doing set.Framing

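Well, all the answers mentioned here are good, but some of them raise an error if the dictionary values contain nested lists or dictionaries, so I propose a simple answer (note that str(i) depends on key insertion order, so equal dicts built with a different key order are not merged):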

a = [str(i) for i in a]
a = list(set(a))
a = [eval(i) for i in a]
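An alternative sketch that avoids the str/eval round trip for nested values: recursively convert each dict into a hashable "frozen" form, use that only as a seen marker, and keep the original dicts in order. freeze and dedup_nested are hypothetical helpers; as an assumption of this sketch, a list and a tuple with the same elements are treated as equal, and any other value must already be hashable.

```python
def freeze(value):
    """Recursively convert dicts, lists, tuples and sets into hashable equivalents."""
    if isinstance(value, dict):
        # Key order doesn't matter, so a frozenset of (key, frozen value) pairs.
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        # Element order matters for sequences, so keep a tuple.
        return tuple(freeze(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(freeze(v) for v in value)
    return value


def dedup_nested(dicts):
    seen = set()
    result = []
    for d in dicts:
        marker = freeze(d)
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


a = [
    {'id': 1, 'tags': ['x', 'y'], 'meta': {'k': 1}},
    {'meta': {'k': 1}, 'id': 1, 'tags': ['x', 'y']},  # same content, different key order
    {'id': 1, 'tags': ['y', 'x'], 'meta': {'k': 1}},  # list order differs
]
print(dedup_nested(a))
```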
Surinam answered 27/5, 2020 at 21:20 Comment(1)
Best Answer, except I would use literal_eval from ast just to be safe as eval isn't safe.Tankersley

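Objects can fit into sets, as long as they define __eq__ and __hash__ (by default, instances compare and hash by identity, so two Person objects with the same fields would both be kept). You can work with objects instead of dicts and, if needed, convert back to a list of dicts after all the set insertions. Example:

class Person:
    def __init__(self, id, age, name):
        self.id = id
        self.age = age
        self.name = name

    # Persons with the same fields are equal and hash the same,
    # so a set keeps only one of them
    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return (self.id, self.age, self.name) == (other.id, other.age, other.name)

    def __hash__(self):
        return hash((self.id, self.age, self.name))

my_set = {Person(id=2, age=3, name='Jhon')}

my_set.add(Person(id=3, age=34, name='Guy'))

my_set.add(Person(id=2, age=3, name='Jhon'))  # duplicate, so the set is unchanged

# if needed convert to list of dicts
list_of_dict = [{'id': obj.id,
                 'name': obj.name,
                 'age': obj.age} for obj in my_set]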
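On Python 3.7+, a frozen dataclass gives Person value-based equality and hashing with less code: with frozen=True and the default eq=True, the dataclass generates __eq__ and __hash__ from its fields, so equal instances collapse in a set, and dataclasses.asdict converts an instance back to a dict. A sketch, with the fields taken from the example above and hypothetical sample data.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Person:
    # frozen=True + eq=True (the default) => generated __eq__ and __hash__
    id: int
    name: str
    age: int


people = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

unique_people = {Person(**d) for d in people}  # equal Persons collapse into one
# Set iteration order is arbitrary; sort if you need a stable order.
list_of_dict = sorted((asdict(p) for p in unique_people), key=lambda d: d['id'])
print(list_of_dict)
```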
Clayborne answered 25/9, 2021 at 0:26 Comment(1)
A shorter way to define Person: Person = collections.namedtuple('Person', ['id', 'age', 'name'])Tinny

A quick-and-dirty solution is to build a new list, skipping any item that is already in it:

deduped_list = []

for item in list_with_duplicates:
    if item not in deduped_list:  # linear scan of deduped_list, so O(n²) overall
        deduped_list.append(item)
Jovita answered 17/9, 2016 at 23:58 Comment(0)

Let me add mine.

  1. sort each dict by key so that {'a': 1, 'b': 2} and {'b': 2, 'a': 1} are not treated differently

  2. serialize it to JSON

  3. deduplicate via a set (dicts themselves are unhashable, so they cannot go into a set directly)

  4. turn each string back into a dict via json.loads

import json

[json.loads(s) for s in set([json.dumps(d) for d in [dict(sorted(item.items())) for item in target_dicts]])]
Pomiculture answered 27/9, 2021 at 8:8 Comment(0)

There may be more elegant solutions, but I thought it might be nice to add a more verbose solution that is easier to follow. This assumes there is no unique key, that you have a simple k,v structure in which every dict has the same keys in the same order, and that you are using a version of Python that guarantees dict insertion order (3.7+). This would work for the original post.

data_set = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# list of keys (assumes every dict has the same keys in the same order)
keys = list(data_set[0])

# Create a list of lists of the values from the data set
data_set_list = [list(d.values()) for d in data_set]

# Dedupe
new_data_set = []
for lst in data_set_list:
    # Skip lists that already exist in the new data set
    if lst in new_data_set:
        continue
    # Add list to new data set
    new_data_set.append(lst)

# Create dicts
new_data_set = [dict(zip(keys,lst)) for lst in new_data_set]    

print(new_data_set)
Tamqrah answered 6/2, 2023 at 0:28 Comment(0)

Pretty straightforward option:

L = [
    {'id':1,'name':'john', 'age':34},
    {'id':1,'name':'john', 'age':34},
    {'id':2,'name':'hanna', 'age':30},
    ]


D = dict()
for item in L:
    D[item['id']] = item
output = list(D.values())
print(output)
Teary answered 18/6, 2012 at 23:48 Comment(0)

Here's an implementation that deduplicates the list in place, with little memory overhead, at the cost of not being as compact as the rest.

values = [ {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},
           {'id':1,'name':'john', 'age':34},
           {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
    if values[index]['id'] in count:
        del values[index]
    else:
        count[values[index]['id']] = 1
        index += 1

output:

[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
Oldworld answered 18/6, 2012 at 23:52 Comment(4)
You need to test this a bit more. Modifying the list while you are iterating over it might not always work as you expectGantry
@gnibbler very good point! I'll delete the answer and test it more thoroughly.Oldworld
Looks better. You can use a set to keep track of the ids instead of the dict. Consider starting the index at len(values) and counting backwards, that means that you can always decrement index whether you del or not. eg for index in reversed(range(len(values))):Gantry
@gnibbler interesting, do sets have near constant look up like dictionaries?Oldworld

This is the solution I found:

usedID = []

x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

for each in x[:]:  # iterate over a copy; removing from x while looping over x itself skips items
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])

print(x)

Basically, you check whether the ID is already in the list: if it is, delete the dictionary; if not, append the ID to the list.

Nugatory answered 18/6, 2012 at 23:43 Comment(6)
I'd use a set rather than list for usedID. It's a faster lookup, and more readableSclerous
Yea i didnt know about sets... but I am learning... I was just looking at @gnibbler answer...Nugatory
You need to test this a bit more. Modifying the list while you are iterating over it might not always work as you expectGantry
Yea I don't understand why it doesn't work... Any ideas what I'm doing wrong?Nugatory
No I caught the problem... its just that I dont understand why its giving that problem... do you know?Nugatory
When you remove an item from the list, all the remaining items are moved down one place, so each never references the item following one that is removedGantry
