I'm doing a set difference operation in Python:
from sets import Set
from mongokit import ObjectId
x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
y = [ObjectId("4f7acde943f1e51fb6000003")]
print list(Set(x).difference(Set(y)))
I'm getting:
[ObjectId('4f7abaa043f1e51544000007'), ObjectId('4f7ac02543f1e51a44000001'), ObjectId('4f7aba8a43f1e51544000006')]
I need the first element for the next operation, so the order matters. How can I keep the elements of x in their original order?
Don't use the sets module. Use the builtin set type. - Chris Morgan 2012-04-04 05:30
It looks like you need an ordered set instead of a regular set.
>>> x = [ObjectId("4f7aba8a43f1e51544000006"), ObjectId("4f7abaa043f1e51544000007"), ObjectId("4f7ac02543f1e51a44000001")]
>>> y = [ObjectId("4f7acde943f1e51fb6000003")]
>>> print list(OrderedSet(x) - OrderedSet(y))
[ObjectId('4f7aba8a43f1e51544000006'), ObjectId('4f7abaa043f1e51544000007'), ObjectId('4f7ac02543f1e51a44000001')]
Python doesn't come with an ordered set, but it is easy to make one:
import collections

class OrderedSet(collections.Set):
    # Python 2.7 spelling; on Python 3 the base class is collections.abc.Set.
    def __init__(self, iterable=()):
        # The keys of an OrderedDict act as an insertion-ordered set.
        self.d = collections.OrderedDict.fromkeys(iterable)

    def __len__(self):
        return len(self.d)

    def __contains__(self, element):
        return element in self.d

    def __iter__(self):
        return iter(self.d)
Hope this helps :-)
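On Python 3.7 and later, plain dicts keep insertion order, so the same idea can be sketched without OrderedDict. The version below is an adaptation under stated assumptions, not the answer's original code: it subclasses collections.abc.Set, and it uses short strings as hypothetical stand-ins for the ObjectId values so that it runs without mongokit.

```python
from collections.abc import Set


class OrderedSet(Set):
    """Minimal insertion-ordered set; a sketch, not a full implementation."""

    def __init__(self, iterable=()):
        # dict keys preserve insertion order on Python 3.7+.
        self._d = dict.fromkeys(iterable)

    def __len__(self):
        return len(self._d)

    def __contains__(self, element):
        return element in self._d

    def __iter__(self):
        return iter(self._d)


# Hypothetical stand-ins for the ObjectId values in the question.
x = ["4f7aba8a", "4f7abaa0", "4f7ac025"]
y = ["4f7abaa0"]

# The Set mixin supplies `-`, which walks the left operand in order.
ordered_diff = list(OrderedSet(x) - OrderedSet(y))
print(ordered_diff)  # ['4f7aba8a', '4f7ac025']
```

Because the mixin builds the result by iterating over the left operand, the difference comes back in the order of x.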
Sets are unordered, so you will need to put the results back in the correct order after doing your set difference. Fortunately you already have the elements in the order you want, so this is easy.
diff = set(x) - set(y)
result = [o for o in x if o in diff]
But this can be streamlined; you can do the difference as part of the list comprehension (though it is arguably slightly less clear that that's what you're doing).
sety = set(y)
result = [o for o in x if o not in sety]
You could even do it without creating the set from y, but the set will provide fast membership tests, which will save you significant time if either list is large.
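As a rough illustration of the comprehension approach, here is a sketch that wraps it in a helper and then takes the first element. The name ordered_difference and the string values are assumptions made for the example (the strings stand in for ObjectId instances), and it is written for Python 3.

```python
def ordered_difference(x, y):
    """Return the items of x that are not in y, in x's original order."""
    excluded = set(y)  # built once; membership tests are O(1) on average
    return [item for item in x if item not in excluded]


# Hypothetical stand-ins for the ObjectId values in the question.
x = ["4f7aba8a", "4f7abaa0", "4f7ac025"]
y = ["4f7abaa0"]

result = ordered_difference(x, y)
first = result[0] if result else None  # guard against an empty result
print(result)  # ['4f7aba8a', '4f7ac025']
print(first)   # 4f7aba8a
```

Note that, unlike a true set difference, this keeps any duplicates that appear in x.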
x once instead of twice - kindall 2012-04-04 16:22
You could just do this
diff = set(x) - set(y)
[item for item in x if item in diff]
or
filter(diff.__contains__, x)
y or lots of times, working on set(y) rather than y may be faster - Chris Morgan 2012-04-04 05:36
filter version? Not Pythonic. Also the list() wrapping would make it less efficient on Python 2 where a list is already returned by filter() - Chris Morgan 2012-04-04 05:41
filter and the other functions like it are taken from functional programming; for general programming in Python they are considered to be less friendly than list comprehensions or generator expressions. There was even discussion about removing them altogether in Python 3 - Chris Morgan 2012-04-04 05:43
filter(diff.__contains__, x). Of course, it's equivalent to do set_y = set(y); filter(lambda item: item not in set_y, x) as well. (Too bad Python doesn't have a not that lifts to functions or the "opposite" of filter, or that'd be nicer.) - Dougal 2012-04-04 06:02
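For reference, the standard library does offer something close to an "opposite" of filter: itertools.filterfalse on Python 3 (itertools.ifilterfalse on Python 2). A small sketch for Python 3, where filter and filterfalse both return lazy iterators, so list() is needed to materialise the result; the string values are placeholders for the ObjectId instances.

```python
from itertools import filterfalse

# Hypothetical stand-ins for the ObjectId values in the question.
x = ["4f7aba8a", "4f7abaa0", "4f7ac025"]
y = ["4f7abaa0"]
set_y = set(y)

# Keep the items for which the predicate is false, i.e. those not in set_y.
kept = list(filterfalse(set_y.__contains__, x))

# Same result with filter and an explicit negation.
kept_with_filter = list(filter(lambda item: item not in set_y, x))

print(kept)  # ['4f7aba8a', '4f7ac025']
```

Both forms walk x in order, so the original ordering is preserved.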