Retaining order while using Python's set difference
Asked Answered
N

3

21

I'm doing a set difference operation in Python:

x = [1, 5, 3, 4]
y = [3]

result = list(set(x) - set(y))
print(result)

I'm getting:

[1, 4, 5]

As you can see, the order of the list elements has changed. How can I retain the list x in original format?

Nigercongo answered 4/4, 2012 at 5:25 Comment(5)
Sets are by definition unordered.Debroahdebs
And you shouldn't ever be using the sets module. Use the builtin set type.Rattletrap
The sets.Set type is a reasonable choice for someone needing compatibility with older versions of Python. The built-in set type was modeled after sets.Set -- they both work fine for most applications (though the built-in version is faster).Sabin
Note that "older" means "Python 2.3 or before", which is quite a lot older.Spiers
But not as old as any of usWilmot
S
25

It looks like you need an ordered set instead of a regular set.

>>> x = [1, 5, 3, 4]
>>> y = [3]
>>> print(list(OrderedSet(x) - OrderedSet(y)))
[1, 5, 4]

Python doesn't come with an ordered set, but it is easy to make one:

import collections.abc

class OrderedSet(collections.abc.Set):
    def __init__(self, iterable=()):
        self.d = collections.OrderedDict.fromkeys(iterable)

    def __len__(self):
        return len(self.d)

    def __contains__(self, element):
        return element in self.d

    def __iter__(self):
        return iter(self.d)

Hope this helps :-)

Sabin answered 4/4, 2012 at 7:27 Comment(3)
You rock, awesome.Gentianella
this one did not work for some inexplicable reason. The list comprehension did.Seppala
@Seppala I just made an update. The import had changed a little bit from when this answer was first written.Sabin
C
22

Sets are unordered, so you will need to put the results back in the correct order after doing your set difference. Fortunately you already have the elements in the order you want, so this is easy.

diff = set(x) - set(y)
result = [o for o in x if o in diff]

But this can be streamlined; you can do the difference as part of the list comprehension (though it is arguably slightly less clear that that's what you're doing).

sety = set(y)
result = [o for o in x if o not in sety]

You could even do it without creating the set from y, but the set will provide fast membership tests, which will save you significant time if either list is large.

Circumcise answered 4/4, 2012 at 5:36 Comment(2)
nvm, figured it must be faster.Shugart
Slightly faster, yes. It will only need to traverse the list x once instead of twice.Circumcise
S
11

You could just do this

diff = set(x) - set(y)
[item for item in x if item in diff]

or

filter(diff.__contains__, x)
Shugart answered 4/4, 2012 at 5:35 Comment(9)
And if you do it with a large number of elements in y or lots of times, working on set(y) rather than y may be faster.Rattletrap
Alright, i wasn't sure about the speed but if you are sure about it then i guess that is best.Shugart
Why would you ever go for the filter version? Not Pythonic. Also the list() wrapping would make it less efficient on Python 2 where a list is already returned by filter().Rattletrap
Oh right, forgot i was using python 3... ill change it back. I thought it would be useful in python 3 as a generator. Also why is filter not pythonic?Shugart
Except that you're then immediately putting it in a list, so the list comprehension is just as good. filter and the other functions like it are taken from functional programming; for general programming in Python they are considered to be less friendly than list comprehensions or generator expressions. There was even discussion about removing them altogether in Python 3.Rattletrap
Ya that was just a mixup with python 3, i forgot i should use python 2 when answering SO questions, just had it as an option for the generator... Less friendly... hmm... well they weren't removed and python has a bunch of useful things that make writing the code easier for small things like map. If they are still in python we can still use them.Shugart
Updated mine to operate on the result of set.Shugart
Your filter would be a little smoother if you did filter(diff.__contains__, x). Of course, it's equivalent to do set_y = set(y); filter(lambda item: item not in set_y, x) as well. (Too bad Python doesn't have a not that lifts to functions or the "opposite" of filter, or that'd be nicer.)Trombidiasis
That would work if you reversed the set operation. But isn't it considered bad practise to use private methods?Shugart

© 2022 - 2024 — McMap. All rights reserved.