Retaining order while using Python's set difference

Go To StackoverFlow.com

13

I'm doing a set difference operation in Python:

from sets import Set
from mongokit import ObjectId
x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
y = [ObjectId("4f7acde943f1e51fb6000003")]
print list(Set(x).difference(Set(y)))

I'm getting:

[ObjectId('4f7abaa043f1e51544000007'), ObjectId('4f7ac02543f1e51a44000001'), ObjectId('4f7aba8a43f1e51544000006')]

I need to get the first element for next operation which is important. How can I retain the list x in original format?

2012-04-04 05:25
by Avinash
Sets are by definition unordered - icktoofay 2012-04-04 05:28
And you shouldn't ever be using the sets module. Use the builtin set type - Chris Morgan 2012-04-04 05:30
The sets.Set type is a reasonable choice for someone needing compatibility with older versions of Python. The built-in set type was modeled after sets.Set -- they both work fine for most applications (though the built-in version is faster) - Raymond Hettinger 2012-04-04 07:32
Note that "older" means "Python 2.3 or before", which is quite a lot older - Thomas Wouters 2012-04-04 23:08
But not as old as any of u - holdenweb 2014-03-15 18:14


19

It looks like you need an ordered set instead of a regular set.

>>> x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
>>> y = [ObjectId("4f7acde943f1e51fb6000003")]
>>> print list(OrderedSet(x) - OrderedSet(y))
[ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]

Python doesn't come with an ordered set, but it is easy to make one:

import collections

class OrderedSet(collections.Set):

    def __init__(self, iterable=()):
        self.d = collections.OrderedDict.fromkeys(iterable)

    def __len__(self):
        return len(self.d)

    def __contains__(self, element):
        return element in self.d

    def __iter__(self):
        return iter(self.d)

Hope this helps :-)

2012-04-04 07:27
by Raymond Hettinger
You rock, awesome - Bruce Dean 2018-06-27 04:07


13

Sets are unordered, so you will need to put the results back in the correct order after doing your set difference. Fortunately you already have the elements in the order you want, so this is easy.

diff = set(x) - set(y)
result = [o for o in x if o in diff]

But this can be streamlined; you can do the difference as part of the list comprehension (though it is arguably slightly less clear that that's what you're doing).

sety = set(y)
result = [o for o in x if o not in sety]

You could even do it without creating the set from y, but the set will provide fast membership tests, which will save you significant time if either list is large.

2012-04-04 05:36
by kindall
When you say streamlined, do you mean in performance - jamylak 2012-04-04 06:05
nvm, figured it must be faster - jamylak 2012-04-04 06:21
Slightly faster, yes. It will only need to traverse the list x once instead of twice - kindall 2012-04-04 16:22


4

You could just do this

diff = set(x) - set(y)
[item for item in x if item in diff]

or

filter(diff.__contains__, x)
2012-04-04 05:35
by jamylak
And if you do it with a large number of elements in y or lots of times, working on set(y) rather than y may be faster - Chris Morgan 2012-04-04 05:36
Alright, i wasn't sure about the speed but if you are sure about it then i guess that is best - jamylak 2012-04-04 05:39
It's something you'd want to check performance of - Chris Morgan 2012-04-04 05:40
Why would you ever go for the filter version? Not Pythonic. Also the list() wrapping would make it less efficient on Python 2 where a list is already returned by filter() - Chris Morgan 2012-04-04 05:41
Oh right, forgot i was using python 3... ill change it back. I thought it would be useful in python 3 as a generator. Also why is filter not pythonic - jamylak 2012-04-04 05:42
Except that you're then immediately putting it in a list, so the list comprehension is just as good. filter and the other functions like it are taken from functional programming; for general programming in Python they are considered to be less friendly than list comprehensions or generator expressions. There was even discussion about removing them altogether in Python 3 - Chris Morgan 2012-04-04 05:43
Ya that was just a mixup with python 3, i forgot i should use python 2 when answering SO questions, just had it as an option for the generator... Less friendly... hmm... well they weren't removed and python has a bunch of useful things that make writing the code easier for small things like map. If they are still in python we can still use them - jamylak 2012-04-04 05:45
Updated mine to operate on the result of set - jamylak 2012-04-04 05:50
Your filter would be a little smoother if you did filter(diff.__contains__, x). Of course, it's equivalent to do set_y = set(y); filter(lambda item: item not in set_y, x) as well. (Too bad Python doesn't have a not that lifts to functions or the "opposite" of filter, or that'd be nicer. - Dougal 2012-04-04 06:02
That would work if you reversed the set operation. But isn't it considered bad practise to use private methods - jamylak 2012-04-04 06:09
Ads